Get logprobs at output token level

👋 Hey there, I was wondering if it is possible to get logprobs for each individual token in the output of a Gemini model.
In all the examples I find online (here), it seems I can only get avg_logprobs, i.e. an average of the logprobs over all output tokens. That defeats the purpose of fine-grained control over what we receive and how we use it.
In my case specifically, I receive a complex JSON output and need to compute model confidence (from logprobs) for some specific entries in the JSON only.
Essentially what OpenAI has offered since day one (here).
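To make it concrete, here is roughly the computation I'd like to run once per-token logprobs are available (the tokens, logprob values, and field name below are made up for illustration):

import math

# Made-up per-token output and logprobs for a tiny JSON response
tokens   = ['{"', 'diagnosis', '":', ' "', 'flu', '"}']
logprobs = [-0.01, -0.20, -0.01, -0.02, -0.95, -0.03]

# Confidence for the "diagnosis" value only: average the logprobs of the
# tokens that spell out that value (here just 'flu') and exponentiate.
value_indices = [4]
avg_lp = sum(logprobs[i] for i in value_indices) / len(value_indices)
print(f"diagnosis confidence ~ {math.exp(avg_lp):.3f}")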
Any hints please?

Addendum
It seems with gemini-1.5-flash-002 I can invoke the model with

generation_config = genai.GenerationConfig(response_logprobs=True)
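
For context, the full call is roughly the following (a sketch with the legacy google-generativeai SDK; model name and prompt are just placeholders):

import os
import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash-002")

generation_config = genai.GenerationConfig(response_logprobs=True)
response = model.generate_content(
    "What type of food is a tomato?",
    generation_config=generation_config,
)
# hoping to find per-token logprobs on the candidate rather than just avg_logprobs
print(response.candidates[0])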

This might go in the right direction according to the docs, but it's unusable. After 3 calls I get

ResourceExhausted: 429 Unable to submit request because you've reached the maximum number of requests with logprobs you can make per day. Remove logprobs from the request or try again tomorrow.

Are there any updates?

Unfortunately not that I know of

Would love an update on this too

Logan had posted ~4 months ago that this would be available (x.com), but I can't seem to find anything in the documentation.

Hi there,

Enabling the responseLogprobs setting in the generationConfig with the models gemini-1.5-pro-002, gemini-1.5-flash-002, gemini-2.0-flash-exp, or gemini-exp-1206 gives me an HTTP 400 response:

{
  "error": {
    "code": 400,
    "message": "Logprobs is not supported for the current model.",
    "status": "INVALID_ARGUMENT"
  }
}

Does anyone have a successful request?

Cheers


Hi jkirstaetter, we have never used the models you mentioned. We use the OpenAI SDK with gemini-2.0-flash-001 and gpt-4o. This combination works fine, aside from the fact that we ran into quota problems with Gemini quite early.
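
For reference, the call through the OpenAI-compatibility layer looks roughly like this (a sketch; the base URL is the documented compatibility endpoint, and whether per-token logprobs actually come back for a given Gemini model may still depend on model support and quota):

import os
from openai import OpenAI

# Gemini served through the OpenAI-compatibility endpoint
client = OpenAI(
    api_key=os.getenv("GEMINI_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

completion = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "What type of food is a tomato?"}],
    logprobs=True,    # ask for per-token logprobs
    top_logprobs=1,   # number of alternatives per position
)

# each entry carries one output token and its log probability
for tok in completion.choices[0].logprobs.content:
    print(tok.token, tok.logprob)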


Hi there,

Did we ever get a solution, or an explanation of why it always returns "Logprobs is not supported for the current model." for seemingly every model I try when calling with Google's GenAI library (not OpenAI's SDK)?

Here is a minimal example using Python's google-genai library:

from google import genai
import os

# create client
client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents='What type of food is a tomato?',
    config={
        'response_mime_type': 'application/json',
        'response_logprobs': True
    },
)

Which returns ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'Logprobs is not supported for the current model.', 'status': 'INVALID_ARGUMENT'}}

Do we also know which models are meant to support this?

Thanks in advance.

Hi @jkirstaetter, did you find a solution to this with google’s genai?
Thanks in advance

Last I checked, you need to use Vertex and you only get one request per day that resets at 12am PDT. It also only works on some of the models, but I think flash-2.0 was working for me when I checked last week.

I think the only change would be that you need to set up Vertex access and add the Vertex args to your client after authenticating.

client = genai.Client(
    vertexai=True,
    project="project_name",
    location="us-central1",
)

Generate content with the Vertex AI Gemini API | Generative AI on Vertex AI | Google Cloud

Here is a complete untested script:

from google import genai
from google.genai import types

"""
A simple example using Google's Vertex client, which allows one to generate logprobs once per day.
"""
PROJECT_NAME = "my_project"

def generate():
  client = genai.Client(
      vertexai=True,
      project=PROJECT_NAME,
      location="us-central1",
  )

  model = "gemini-2.0-flash"
  contents = [
    types.Content(
      role="user",
      parts=[
        types.Part.from_text(text="hello")
      ]
    ),
  ]
  generate_content_config = types.GenerateContentConfig(
    temperature=1,
    top_p=0.95,
    response_logprobs=True,  # return logprobs for the chosen output tokens
    logprobs=1,              # number of top alternatives to return per position
    max_output_tokens=8192,
    response_modalities=["TEXT"],
    safety_settings=[
      types.SafetySetting(category="HARM_CATEGORY_HATE_SPEECH", threshold="OFF"),
      types.SafetySetting(category="HARM_CATEGORY_DANGEROUS_CONTENT", threshold="OFF"),
      types.SafetySetting(category="HARM_CATEGORY_SEXUALLY_EXPLICIT", threshold="OFF"),
      types.SafetySetting(category="HARM_CATEGORY_HARASSMENT", threshold="OFF"),
    ],
  )

  # Stream the response, printing the text and keeping every chunk so the
  # caller can inspect logprobs_result on the candidates afterwards.
  chunks = []
  for chunk in client.models.generate_content_stream(
      model=model,
      contents=contents,
      config=generate_content_config,
  ):
    print(chunk.text, end="")
    chunks.append(chunk)
  return chunks

x = generate()
print(x)

It prints something like:

candidates=[Candidate(content=Content(parts=[Part(video_metadata=None, thought=None, code_execution_result=None, executable_code=None, file_data=None, function_call=None, function_response=None, inline_data=None, text=‘Hello! How can I help you today?’)], role=‘model’), citation_metadata=None, finish_message=None, token_count=None, finish_reason=<FinishReason.STOP: ‘STOP’>, avg_logprobs=-7.311527676052517, grounding_metadata=None, index=None, logprobs_result=LogprobsResult(chosen_candidates=[LogprobsResultCandidate(log_probability=-0.00019188585, token=‘Hello’, token_id=None), LogprobsResultCandidate(log_probability=-0.0030728758, token=‘!’, token_id=None), LogprobsResultCandidate(log_probability=-0.01058189, token=’ How’, token_id=None), LogprobsResultCandidate(log_probability=-7.783962e-05, token=’ can’, token_id=None), LogprobsResultCandidate(log_probability=-2.0266912e-06, token=’ I’, token_id=None), LogprobsResultCandidate(log_probability=-0.00032465608, token=’ help’, token_id=None), LogprobsResultCandidate(log_probability=-1.0729074e-06, token=’ you’, token_id=None), LogprobsResultCandidate(log_probability=-5.2448504e-06, token=’ today’, token_id=None), LogprobsResultCandidate(log_probability=-2.3844768e-07, token=‘?’, token_id=None)], top_candidates=[LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-0.00019188585, token=‘Hello’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-0.0030728758, token=‘!’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-0.01058189, token=’ How’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-7.783962e-05, token=’ can’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-2.0266912e-06, token=’ I’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-0.00032465608, token=’ help’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-1.0729074e-06, token=’ you’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-5.2448504e-06, token=’ today’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-2.3844768e-07, token=‘?’, token_id=None)])]), safety_ratings=None)] create_time=datetime.datetime(2025, 5, 7, 7, 47, 37, 19485, tzinfo=TzInfo(UTC)) response_id=‘GRAbaJ2YAZPw-O4Poo_T-Ac’ model_version=‘gemini-2.5-flash-preview-04-17’ prompt_feedback=None usage_metadata=GenerateContentResponseUsageMetadata(cache_tokens_details=None, cached_content_token_count=None, candidates_token_count=9, candidates_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: ‘TEXT’>, token_count=9)], prompt_token_count=1, prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: ‘TEXT’>, token_count=1)], thoughts_token_count=245, tool_use_prompt_token_count=None, tool_use_prompt_tokens_details=None, total_token_count=255, traffic_type=<TrafficType.ON_DEMAND: ‘ON_DEMAND’>) automatic_function_calling_history= parsed=None
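
The per-token values can then be read back out of logprobs_result on each candidate, for example (a sketch based on the dump above; chunks without logprobs are skipped):

# Read the per-token logprobs back out of the chunks returned by generate() above.
def print_token_logprobs(resp):
  if not resp.candidates:
    return
  candidate = resp.candidates[0]
  if candidate.logprobs_result is None:
    return
  for chosen in candidate.logprobs_result.chosen_candidates:
    print(f"{chosen.token!r}: {chosen.log_probability:.6f}")

for chunk in x:
  print_token_logprobs(chunk)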

Thanks for letting me know. Unfortunately, 1 request a day is far too limited for my application.
How frustrating that Gemini has removed support for this in seemingly all their models via the genai API!