Sampling Parameters

Sampling parameters are a dictionary of settings that control the sampling process and let you customize model outputs. They are passed to the model as a JSON object. We support every sampling parameter in vLLM’s SamplingParams class; see the vLLM documentation for the full list and the meaning of each parameter.
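As a minimal sketch of what such a JSON object can look like, the snippet below builds a dictionary and serializes it. The keys temperature, top_p and max_tokens are fields of vLLM’s SamplingParams class; the values are arbitrary illustrations, and how the object is attached to a request is not specified here, so that part is left out.

```python
import json

# Illustrative values only; the keys are SamplingParams fields.
sampling_params = {
    "temperature": 0.2,   # lower values make output less random
    "top_p": 0.9,         # nucleus sampling threshold
    "max_tokens": 256,    # upper bound on generated tokens
}

# Serialize the dictionary into the JSON object that is passed along.
payload = json.dumps(sampling_params)
print(payload)
```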

When no sampling parameters are provided, vLLM’s default values are used, with the following exceptions:

  • temperature = 0.75

  • max_tokens = 1024

  • repetition_penalty = 1.15 when a JSON schema is used; otherwise 1.0

When you do provide sampling parameters, we use only vLLM’s default values, with the values you provide applied as overrides.
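The resolution rules above can be sketched roughly as follows. This is an illustration of one reading of those rules, not the service’s actual implementation: the function name resolve_sampling_params, the uses_json_schema flag, and the small ILLUSTRATIVE_VLLM_DEFAULTS dictionary are all hypothetical, and the sketch assumes that an empty dictionary counts as "not provided" and that the service-level overrides are skipped entirely once any user value is present.

```python
# Hypothetical, partial stand-in for vLLM's defaults; check the vLLM
# documentation for the authoritative values in your version.
ILLUSTRATIVE_VLLM_DEFAULTS = {
    "temperature": 1.0,
    "max_tokens": 16,
    "repetition_penalty": 1.0,
}

# Service-level overrides applied only when the user provides nothing.
SERVICE_OVERRIDES = {"temperature": 0.75, "max_tokens": 1024}
JSON_SCHEMA_REPETITION_PENALTY = 1.15


def resolve_sampling_params(vllm_defaults, user_params=None, uses_json_schema=False):
    """Return the effective sampling parameters under the assumed rules."""
    resolved = dict(vllm_defaults)  # copy so the inputs are not mutated
    if user_params:
        # User values present: vLLM defaults plus user overrides only.
        resolved.update(user_params)
        return resolved
    # Nothing provided: apply the service-level overrides.
    resolved.update(SERVICE_OVERRIDES)
    if uses_json_schema:
        resolved["repetition_penalty"] = JSON_SCHEMA_REPETITION_PENALTY
    return resolved


print(resolve_sampling_params(ILLUSTRATIVE_VLLM_DEFAULTS))
print(resolve_sampling_params(ILLUSTRATIVE_VLLM_DEFAULTS, {"temperature": 0.2}))
```

Under this reading, a request that sets only temperature falls back to vLLM’s own max_tokens default rather than 1024, so it can be worth setting max_tokens explicitly whenever you override anything.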

The default overrides were chosen through testing to improve the general quality of outputs, but we recommend experimenting with these parameters to find the best settings for your specific use case.