Get logprobs at output token level

👋 Hey there, I was wondering if it is possible to get logprobs for each individual token in the output of a Gemini model.
In all the examples I find online (here), it seems I can only get avg_logprobs, i.e. an average of the logprobs over all output tokens. That defeats the purpose of fine-grained control over what we receive and how we use it.
In my case specifically, I receive a complex JSON output and need to compute model confidence (from logprobs) for some specific entries in the JSON only.
Essentially what OpenAI has offered since day one (here).
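To make it concrete, here is roughly the computation I'd like to run once per-token logprobs are available (the tokens, logprob values, and field name below are made up for illustration):

import math

# Made-up per-token output and logprobs for a tiny JSON response
tokens   = ['{"', 'diagnosis', '":', ' "', 'flu', '"}']
logprobs = [-0.01, -0.20, -0.01, -0.02, -0.95, -0.03]

# Confidence for the "diagnosis" value only: average the logprobs of the
# tokens that spell out that value (here just 'flu') and exponentiate.
value_indices = [4]
avg_lp = sum(logprobs[i] for i in value_indices) / len(value_indices)
print(f"diagnosis confidence ~ {math.exp(avg_lp):.3f}")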
Any hints please?

Addendum
It seems with gemini-1.5-flash-002 I can invoke the model with

generation_config = genai.GenerationConfig(response_logprobs=True)
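
For context, the full call is roughly the following (a sketch with the legacy google-generativeai SDK; model name and prompt are just placeholders):

import os
import google.generativeai as genai

genai.configure(api_key=os.getenv("GEMINI_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash-002")

generation_config = genai.GenerationConfig(response_logprobs=True)
response = model.generate_content(
    "What type of food is a tomato?",
    generation_config=generation_config,
)
# hoping to find per-token logprobs on the candidate rather than just avg_logprobs
print(response.candidates[0])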

This might go in the right direction according to the docs, but it's unusable. After 3 calls I get

ResourceExhausted: 429 Unable to submit request because you've reached the maximum number of requests with logprobs you can make per day. Remove logprobs from the request or try again tomorrow.

Are there any updates?

Unfortunately not that I know of

Would love an update on this too

Logan had posted ~4 months ago that this would be available (x.com), but I can't seem to find anything in the documentation.

Hi there,

Enabling the responseLogprobs setting in the generationConfig with the models gemini-1.5-pro-002, gemini-1.5-flash-002, gemini-2.0-flash-exp, or gemini-exp-1206 gives me an HTTP 400 response:

{
  "error": {
    "code": 400,
    "message": "Logprobs is not supported for the current model.",
    "status": "INVALID_ARGUMENT"
  }
}

Does anyone have a successful request?

Cheers


Hi jkirstaetter, we have never used the models you mentioned. We use the OpenAI SDK with gemini-2.0-flash-001 and gpt-4o. This combination works fine, aside from the fact that we ran into quota problems with Gemini quite early.
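
For reference, the call through the OpenAI-compatibility layer looks roughly like this (a sketch; the base URL is the documented compatibility endpoint, and whether per-token logprobs actually come back for a given Gemini model may still depend on model support and quota):

import os
from openai import OpenAI

# Gemini served through the OpenAI-compatibility endpoint
client = OpenAI(
    api_key=os.getenv("GEMINI_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

completion = client.chat.completions.create(
    model="gemini-2.0-flash-001",
    messages=[{"role": "user", "content": "What type of food is a tomato?"}],
    logprobs=True,    # ask for per-token logprobs
    top_logprobs=1,   # number of alternatives per position
)

# each entry carries one output token and its log probability
for tok in completion.choices[0].logprobs.content:
    print(tok.token, tok.logprob)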


Hi there,

Did we ever get a solution, or an explanation of why it always returns "Logprobs is not supported for the current model." for seemingly every model I try when calling with Google's GenAI library (not OpenAI's SDK)?

Here is a minimal example using Python's google-genai library:

from google import genai
import os

# create client
client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

response = client.models.generate_content(
    model='gemini-2.0-flash',
    contents='What type of food is a tomato?',
    config={
        'response_mime_type': 'application/json',
        'response_logprobs': True
    },
)

Which returns ClientError: 400 INVALID_ARGUMENT. {'error': {'code': 400, 'message': 'Logprobs is not supported for the current model.', 'status': 'INVALID_ARGUMENT'}}

Do we also know which models are meant to support this?

Thanks in advance.

Hi @jkirstaetter, did you find a solution to this with google’s genai?
Thanks in advance

Last I checked, you need to use Vertex and you only get one request per day that resets at 12am PDT. It also only works on some of the models, but I think flash-2.0 was working for me when I checked last week.

I think the only change would be that you need to set up Vertex access and add the Vertex args to your client after authenticating.

client = genai.Client(
    vertexai=True,
    project="project_name",
    location="us-central1",
)

Generate content with the Vertex AI Gemini API | Generative AI on Vertex AI | Google Cloud

Here is a complete untested script:

from google import genai
from google.genai import types

"""
A simple example using Google's Vertex client, which allows one to generate logprobs once per day.
"""
PROJECT_NAME = "my_project"

def generate():
  client = genai.Client(
      vertexai=True,
      project=PROJECT_NAME,
      location="us-central1",
  )

  model = "gemini-2.0-flash"
  contents = [
    types.Content(
      role="user",
      parts=[
        types.Part.from_text(text="hello")
      ]
    ),
  ]
  generate_content_config = types.GenerateContentConfig(
    temperature=1,
    top_p=0.95,
    response_logprobs=True,  # return logprobs for the chosen output tokens
    logprobs=1,              # number of top alternatives to return per position
    max_output_tokens=8192,
    response_modalities=["TEXT"],
    safety_settings=[
      types.SafetySetting(category="HARM_CATEGORY_HATE_SPEECH", threshold="OFF"),
      types.SafetySetting(category="HARM_CATEGORY_DANGEROUS_CONTENT", threshold="OFF"),
      types.SafetySetting(category="HARM_CATEGORY_SEXUALLY_EXPLICIT", threshold="OFF"),
      types.SafetySetting(category="HARM_CATEGORY_HARASSMENT", threshold="OFF"),
    ],
  )

  # Stream the response, printing the text and keeping every chunk so the
  # caller can inspect logprobs_result on the candidates afterwards.
  chunks = []
  for chunk in client.models.generate_content_stream(
      model=model,
      contents=contents,
      config=generate_content_config,
  ):
    print(chunk.text, end="")
    chunks.append(chunk)
  return chunks

x = generate()
print(x)

It prints something like:

candidates=[Candidate(content=Content(parts=[Part(video_metadata=None, thought=None, code_execution_result=None, executable_code=None, file_data=None, function_call=None, function_response=None, inline_data=None, text=‘Hello! How can I help you today?’)], role=‘model’), citation_metadata=None, finish_message=None, token_count=None, finish_reason=<FinishReason.STOP: ‘STOP’>, avg_logprobs=-7.311527676052517, grounding_metadata=None, index=None, logprobs_result=LogprobsResult(chosen_candidates=[LogprobsResultCandidate(log_probability=-0.00019188585, token=‘Hello’, token_id=None), LogprobsResultCandidate(log_probability=-0.0030728758, token=‘!’, token_id=None), LogprobsResultCandidate(log_probability=-0.01058189, token=’ How’, token_id=None), LogprobsResultCandidate(log_probability=-7.783962e-05, token=’ can’, token_id=None), LogprobsResultCandidate(log_probability=-2.0266912e-06, token=’ I’, token_id=None), LogprobsResultCandidate(log_probability=-0.00032465608, token=’ help’, token_id=None), LogprobsResultCandidate(log_probability=-1.0729074e-06, token=’ you’, token_id=None), LogprobsResultCandidate(log_probability=-5.2448504e-06, token=’ today’, token_id=None), LogprobsResultCandidate(log_probability=-2.3844768e-07, token=‘?’, token_id=None)], top_candidates=[LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-0.00019188585, token=‘Hello’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-0.0030728758, token=‘!’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-0.01058189, token=’ How’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-7.783962e-05, token=’ can’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-2.0266912e-06, token=’ I’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-0.00032465608, token=’ help’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-1.0729074e-06, token=’ you’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-5.2448504e-06, token=’ today’, token_id=None)]), LogprobsResultTopCandidates(candidates=[LogprobsResultCandidate(log_probability=-2.3844768e-07, token=‘?’, token_id=None)])]), safety_ratings=None)] create_time=datetime.datetime(2025, 5, 7, 7, 47, 37, 19485, tzinfo=TzInfo(UTC)) response_id=‘GRAbaJ2YAZPw-O4Poo_T-Ac’ model_version=‘gemini-2.5-flash-preview-04-17’ prompt_feedback=None usage_metadata=GenerateContentResponseUsageMetadata(cache_tokens_details=None, cached_content_token_count=None, candidates_token_count=9, candidates_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: ‘TEXT’>, token_count=9)], prompt_token_count=1, prompt_tokens_details=[ModalityTokenCount(modality=<MediaModality.TEXT: ‘TEXT’>, token_count=1)], thoughts_token_count=245, tool_use_prompt_token_count=None, tool_use_prompt_tokens_details=None, total_token_count=255, traffic_type=<TrafficType.ON_DEMAND: ‘ON_DEMAND’>) automatic_function_calling_history= parsed=None
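
The per-token values can then be read back out of logprobs_result on each candidate, for example (a sketch based on the dump above; chunks without logprobs are skipped):

# Read the per-token logprobs back out of the chunks returned by generate() above.
def print_token_logprobs(resp):
  if not resp.candidates:
    return
  candidate = resp.candidates[0]
  if candidate.logprobs_result is None:
    return
  for chosen in candidate.logprobs_result.chosen_candidates:
    print(f"{chosen.token!r}: {chosen.log_probability:.6f}")

for chunk in x:
  print_token_logprobs(chunk)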

Thanks for letting me know. Unfortunately, 1 request a day is far too limited for my application.
How frustrating that Gemini has removed support for this in seemingly all their models via the genai API!