This notebook demonstrates the use of the `logprobs` parameter in the Chat Completions API. When `logprobs` is enabled, the API returns the log probabilities of each output token, along with a limited number of the most likely tokens at each token position and their log probabilities. The relevant request parameters are:
- `logprobs`: Whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the `content` of `message`. This option is currently not available on the `gpt-4-vision-preview` model.
- `top_logprobs`: An integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. `logprobs` must be set to `true` if this parameter is used.
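As a minimal sketch of a request with both parameters enabled (assuming the official `openai` Python SDK; the model name and prompt here are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: any chat model that supports logprobs
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    logprobs=True,     # return the logprob of each output token
    top_logprobs=2,    # also return the 2 most likely tokens at each position
)

# Each entry exposes .token, .logprob, .bytes, and .top_logprobs
for token_logprob in completion.choices[0].logprobs.content:
    print(token_logprob.token, token_logprob.logprob)
```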
Log probabilities of output tokens indicate the likelihood of each token occurring in the sequence given the context. To simplify, a logprob is `log(p)`, where `p` is the probability of a token occurring at a specific position, given the previous tokens in the context. Some key points about `logprobs`:
- Higher log probabilities suggest a higher likelihood of the token in that context. This allows users to gauge the model's confidence in its output or explore alternative responses the model considered.
- A logprob can be any negative number or `0.0`, where `0.0` corresponds to 100% probability.
- Logprobs allow us to compute the joint probability of a sequence as the sum of the logprobs of the individual tokens, which is useful for scoring and ranking model outputs. Another common approach is to take the average per-token logprob of a sentence to choose the best generation (see the sketch after this list).
- We can examine the `logprobs` assigned to different candidate tokens to understand which options the model considered plausible or implausible.
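For instance, given the per-token logprobs of one generation, the joint log probability is just their sum, and `math.exp` converts back to a linear probability. A quick sketch with made-up logprob values:

```python
import math

# Hypothetical per-token logprobs for one generation
token_logprobs = [-0.01, -0.52, -0.13]

joint_logprob = sum(token_logprobs)                # log P(sequence)
joint_prob = math.exp(joint_logprob)               # back to a linear probability
avg_logprob = joint_logprob / len(token_logprobs)  # length-normalized score

print(f"joint probability: {joint_prob:.4f}")
print(f"average per-token logprob: {avg_logprob:.4f}")
```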
While there is a wide array of use cases for `logprobs`, this notebook will focus on its use for:
- Classification tasks: Large Language Models excel at many classification tasks, but accurately measuring the model's confidence in its outputs can be challenging. `logprobs` provide a probability associated with each class prediction, enabling users to set their own classification or confidence thresholds (see the first sketch after this list).
- Retrieval (Q&A) evaluation: `logprobs` can assist with self-evaluation in retrieval applications. In the Q&A example, the model outputs a contrived `has_sufficient_context_for_answer` boolean, which can serve as a confidence score for whether the answer is contained in the retrieved content. Evaluations of this type can reduce retrieval-based hallucinations and enhance accuracy.
- Autocomplete: `logprobs` could help us decide how to suggest words as a user is typing.
- Token highlighting and outputting bytes: Users can easily create a token highlighter using the built-in tokenization that comes with enabling `logprobs`. Additionally, the `bytes` field includes the UTF-8 byte values of each output token, which is particularly useful for reproducing emojis and special characters (see the byte-decoding sketch after this list).
- Calculating perplexity: `logprobs` can be used to assess the model's overall confidence in a result and to compare the confidence of results from different prompts (see the final sketch after this list).
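To illustrate the classification use case, the `top_logprobs` of the first output token can be converted into linear probabilities for each candidate label. A sketch assuming each label fits in a single token; the model name and prompt are placeholders:

```python
import math

from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": "Classify the sentiment as Positive, Negative, or Neutral. Reply with one word."},
        {"role": "user", "content": "I loved this movie!"},
    ],
    logprobs=True,
    top_logprobs=3,
    max_tokens=1,  # assumption: each candidate label is a single token
)

# Convert each candidate's logprob into a percentage usable as a confidence threshold
first_token = completion.choices[0].logprobs.content[0]
for candidate in first_token.top_logprobs:
    print(f"{candidate.token!r}: {math.exp(candidate.logprob):.2%}")
```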
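For the bytes output, the integers returned for consecutive tokens can be concatenated and decoded as UTF-8 to reconstruct characters that span multiple tokens, such as emojis. A small sketch with hand-written byte values:

```python
# Hypothetical `bytes` values from two consecutive tokens that together form one emoji
token_bytes = [[240, 159], [146, 169]]

joint_bytes = [b for token in token_bytes for b in token]
print(bytes(joint_bytes).decode("utf-8"))  # the four bytes decode to U+1F4A9
```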
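Finally, perplexity can be computed from the same per-token logprobs as the exponential of the negative average logprob; lower perplexity suggests higher model confidence. A sketch with made-up values:

```python
import math

token_logprobs = [-0.12, -0.03, -0.45, -0.08]  # hypothetical per-token logprobs

perplexity = math.exp(-sum(token_logprobs) / len(token_logprobs))
print(f"perplexity: {perplexity:.3f}")
```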