Authorizations
Authorization (header): Bearer authentication header of the form "Bearer <token>", where <token> is your auth token.
Body
model
ID of the model to use.
Example: "meta-llama/Llama-3.3-70B-Instruct"
prompt
The prompt to generate completions for. A single string prompt.
best_of
Default: 1
echo
Default: false
frequency_penalty
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far.
Default: 0
logit_bias
Modify the likelihood of specified tokens appearing in the completion.
Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. Mathematically, the bias is added to the logits generated by the model prior to sampling. The exact effect will vary per model, but values between -1 and 1 should decrease or increase likelihood of selection; values like -100 or 100 should result in a ban or exclusive selection of the relevant token.
Example: { "1234567890": 0.5, "1234567891": -0.5 }
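The bias arithmetic described above can be sketched in a few lines: the bias is added to the raw logits before softmax normalization, so moderate values nudge probabilities while -100 effectively bans a token. The logit values below are made up for illustration; only the bias mechanics follow the description above.

```python
import math

def apply_logit_bias(logits, logit_bias):
    """Add per-token-ID bias to raw logits, then normalize with softmax."""
    biased = {tok: v + logit_bias.get(tok, 0.0) for tok, v in logits.items()}
    z = sum(math.exp(v) for v in biased.values())
    return {tok: math.exp(v) / z for tok, v in biased.items()}

# Hypothetical raw logits for three candidate token IDs (all equally likely).
logits = {"1234567890": 1.0, "1234567891": 1.0, "1234567892": 1.0}

# The example body from above: nudge one token up, one down.
probs = apply_logit_bias(logits, {"1234567890": 0.5, "1234567891": -0.5})
assert probs["1234567890"] > probs["1234567892"] > probs["1234567891"]

# A bias of -100 effectively bans the token.
banned = apply_logit_bias(logits, {"1234567890": -100})
assert banned["1234567890"] < 1e-6
```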
logprobs
An integer between 0 and 20 specifying the number of most likely tokens to return at each token position, each with an associated log probability.
Example: 1
max_tokens
The maximum number of tokens to generate in the completion.
Default: 4096
n
How many completion choices to generate for each prompt.
Default: 1
presence_penalty
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far.
Default: 0
seed
If specified, our system will make a best effort to sample deterministically.
Example: 123
stop
Up to 4 sequences where the API will stop generating further tokens.
Example: ["stop", "halt"]
stream
Whether to stream back partial progress.
Default: false
stream_options
Options for the streaming response. Set this only when stream is set to true.
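When stream is true, partial progress is typically delivered as server-sent events: each chunk arrives on a line prefixed with "data:", and the stream ends with a "[DONE]" sentinel. A hedged sketch of consuming such a stream (the exact wire format may vary by provider; the chunk contents below are simulated):

```python
import json

def parse_sse_stream(lines):
    """Yield decoded completion chunks from an SSE-style event stream."""
    for line in lines:
        line = line.strip()
        if not line or not line.startswith("data:"):
            continue  # skip blank lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        yield json.loads(payload)

# Simulated stream of partial progress.
raw = [
    'data: {"choices": [{"text": "This", "index": 0}]}',
    "",
    'data: {"choices": [{"text": " is a test", "index": 0}]}',
    "data: [DONE]",
]
text = "".join(chunk["choices"][0]["text"] for chunk in parse_sse_stream(raw))
assert text == "This is a test"
```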
suffix
The suffix that comes after a completion of inserted text.
Example: "\n"
temperature
What sampling temperature to use, between 0 and 2.
Default: 0.7
top_p
An alternative to sampling with temperature, where the model considers only the tokens comprising the top_p probability mass.
Default: 1
user
A unique identifier representing your end-user.
Example: "user-1234"
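Putting the body parameters together, a minimal sketch of assembling (not sending) a request body from the fields above. The endpoint URL and HTTP transport are omitted, since only the body schema is documented here; the prompt text is a placeholder.

```python
import json

def build_completion_body(model, prompt, **options):
    """Assemble the JSON body for a completions request."""
    body = {"model": model, "prompt": prompt}
    body.update(options)  # optional fields: max_tokens, temperature, stop, ...
    return json.dumps(body)

body = build_completion_body(
    "meta-llama/Llama-3.3-70B-Instruct",
    "Say this is a test",  # placeholder prompt
    max_tokens=4096,
    temperature=0.7,
    stop=["stop", "halt"],
    stream=False,
    user="user-1234",
)
decoded = json.loads(body)
assert decoded["model"] == "meta-llama/Llama-3.3-70B-Instruct"
assert decoded["stop"] == ["stop", "halt"]
```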
Response
Completions response
choices
Array of completion choices:
[
{
"text": "This is a test",
"index": 0,
"logprobs": null,
"finish_reason": "stop"
}
]
created
The creation time of the request.
Example: "2021-01-01T00:00:00.000Z"
id
The ID of the request.
Example: "cmpl-1234567890"
model
The model used for the request.
Example: "meta-llama/Llama-3.3-70B-Instruct"
object
The object type.
Example: "text_completion"
system_fingerprint
The system fingerprint.
Example: "system-fingerprint"
usage
The usage information for the request.
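The response fields above can be read back directly from the returned JSON. A short sketch using the example values from this page:

```python
import json

# Example response assembled from the values documented above.
response = json.loads("""
{
  "id": "cmpl-1234567890",
  "object": "text_completion",
  "created": "2021-01-01T00:00:00.000Z",
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "choices": [
    {"text": "This is a test", "index": 0, "logprobs": null, "finish_reason": "stop"}
  ],
  "system_fingerprint": "system-fingerprint"
}
""")

# Pull the generated text out of the first choice.
first = response["choices"][0]
assert first["text"] == "This is a test"
assert first["finish_reason"] == "stop"
assert response["object"] == "text_completion"
```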