Create Batch Inference Jobs

Batch jobs are the core interface to Materialized Intelligence. Use this API to run batch inference over large lists of prompts. The system is designed to handle very large numbers of prompts at low cost and high speed. If you plan to run more than 1 million prompts or 100 million tokens at a time, please reach out to us at team@materialized.dev so we can help you scale.

POST https://api.materialized.dev/batch-inference

Run batch inference on a list of prompts.

Parameters (a combined example payload follows this list):
  • inputs (list, required) – The list of prompts to run inference on.

  • model (str, optional, default=llama-3.1-8b) – The Model ID to use for inference. See Available Models for a list of available models.

  • system_prompt (str, optional, default=None) – A system prompt to use for the inference. Use this parameter to provide consistent, task-specific instructions to the model. See System Prompts for more information.

  • json_schema (object, optional, default=None) – If supplied, a JSON schema that the output must adhere to. Must follow the json-schema.org specification. See Structured Outputs for more information.

  • sampling_params (object, optional, default=None) – If supplied, a dictionary of sampling parameters to use for the inference. See Sampling Parameters for more information.

  • job_priority (int, optional, default=0) – The priority of the job. Currently, only priority 0 and 1 are supported. See Job Priority for more information.

  • dryrun (boolean, optional, default=False) – If True, the API will return cost estimates instead of running inference. See Cost Estimates for more information.

  • random_seed_per_input (boolean, optional, default=False) – If True, a random seed will be generated for each input. This is useful for diversity in outputs.
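When you combine several of these options, they all go in the same request body. The sketch below shows one such payload; the sampling parameter keys (temperature, max_tokens) and the example schema are illustrative assumptions rather than values documented on this page, so check Sampling Parameters and Structured Outputs for the supported options.

# A request body combining the optional parameters above. The sampling
# parameter keys are assumptions; see Sampling Parameters for the
# supported names.
payload = {
    "model": "llama-3.1-8b",
    "inputs": [
        "Summarize the plot of Hamlet.",
        "Summarize the plot of Macbeth."
    ],
    "system_prompt": "Reply with a one-sentence summary.",
    "json_schema": {                       # see Structured Outputs
        "type": "object",
        "properties": {"summary": {"type": "string"}},
        "required": ["summary"]
    },
    "sampling_params": {"temperature": 0.2, "max_tokens": 128},  # assumed keys
    "job_priority": 1,
    "random_seed_per_input": True,
    "dryrun": True   # return a cost estimate instead of running the job
}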

Request Headers:
  • Accept – application/json

Returns:

A job_id that can be used to poll for the status and results of the job (see the polling sketch after the basic example below).

Basic Example

import requests

url = "https://api.materialized.dev/batch-inference"

# Minimal request body: a model ID and the list of prompts.
payload = {
    "model": "llama-3.1-8b",
    "inputs": [
        "What is the meaning of life?",
        "What is the capital of France?",
        "What is the best way to cook a steak?"
    ]
}
headers = {
    "Authorization": "Key <YOUR_API_KEY>",  # replace with your API key
    "Content-Type": "application/json",
    "Accept": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # surface HTTP errors early
results = response.json()    # contains the job_id used to poll the job
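Continuing from the example above, the job_id can be read from the response and polled until the job finishes. The polling endpoint is not documented in this section, so the GET URL, the status values, and the job_id/status/results field names below are hypothetical placeholders for illustration only; consult the job status documentation for the actual interface.

import time

# ASSUMPTION: the create response carries the identifier under "job_id",
# per the Returns section above; the exact key name may differ.
job_id = results["job_id"]

# ASSUMPTION: a per-job GET endpoint with "status"/"results" fields;
# both are hypothetical, not documented on this page.
status_url = f"{url}/{job_id}"

while True:
    poll = requests.get(status_url, headers=headers)
    poll.raise_for_status()
    job = poll.json()
    if job.get("status") in ("completed", "failed"):
        break
    time.sleep(10)  # batch jobs are asynchronous; poll sparingly

print(job.get("results"))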