Python SDK#

The Python SDK provides a Pythonic interface to the Materialized Intelligence API. For prototyping, the Python SDK and the CLI are often the most convenient way to work with the service.

See the Installation guide to install the SDK.

Basic Methods#

Setting your API key#

After initializing the SDK, set your API key by calling the set_api_key method. Alternatively, set it by running the mi login command in the CLI.

set_api_key(self, api_key: str)#

Set the API key for the Materialized Intelligence API.

Parameters:
  • api_key (str): The API key to set.

Returns: None
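
The following is a minimal sketch of supplying the key from an environment variable instead of hard-coding it. It assumes `client` is whatever object exposes set_api_key in your installation (how that object is imported is not covered in this section). The variable name `MI_API_KEY` is a hypothetical choice for this example, not something the SDK is known to read on its own.

```python
import os


def configure_api_key(client, env_var="MI_API_KEY"):
    """Read the API key from the environment and pass it to the SDK.

    `client` is assumed to expose the documented set_api_key(api_key) method.
    `env_var` is a hypothetical variable name chosen for this example.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message rather than sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    client.set_api_key(api_key)
```

Keeping the key out of source files makes it less likely to end up in version control.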

Running batch inference#

infer(self, data, model='llama-3.1-8b', column=None, output_column='inference_result', job_priority=0, json_schema=None, system_prompt=None, dry_run=False)#

Run LLM inference on a large list, table, dataframe, or file.

Parameters:
  • data (Union[List, pd.DataFrame, pl.DataFrame, str]): The data to run inference on.

  • model (str, optional): The model to use for inference. Default is “llama-3.1-8b”.

  • column (str, optional): The column name to use for inference. Required if data is a DataFrame or file path.

  • output_column (str, optional): The column name to store the inference results in if input is a DataFrame. Defaults to “inference_result”.

  • job_priority (int, optional): The priority of the job. Default is 0.

  • json_schema (dict, optional): A JSON schema for the output. Defaults to None.

  • system_prompt (str, optional): A system prompt to add to all inputs. This allows you to define the behavior of the model. Defaults to None.

  • sampling_params (dict, optional): A dictionary of sampling parameters to use for the inference. Defaults to None, which uses the default sampling parameters.

  • random_seed_per_input (bool, optional): If True, a random seed will be generated for each input. This is useful for diversity in outputs. Defaults to False.

  • dry_run (bool, optional): If True, return cost estimates instead of running inference. Default is False.

  • stay_attached (bool, optional): If True, the SDK will stay attached to the job and update you on the status and results as they become available. Default is True for priority 0 jobs, and False for priority 1 jobs.

Returns: Union[List, pd.DataFrame, pl.DataFrame, str]: The results of the inference or job ID.
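
As an illustration of how the parameters above combine, the sketch below requests a cost estimate with dry_run=True and then submits the same job for real. It assumes `client` is an object exposing the documented infer method. The prompt text, the schema, and the helper name `estimate_then_run` are made up for this example, and the shape of the returned estimate is not specified here, so it is passed back unchanged.

```python
def estimate_then_run(client, reviews):
    """Request a cost estimate with dry_run=True, then run the same job.

    `client` is assumed to expose the documented infer(...) method.
    The prompt, schema, and field names below are illustrative only.
    """
    options = {
        "model": "llama-3.1-8b",
        "system_prompt": "Classify the sentiment of the review.",
        "json_schema": {
            "type": "object",
            "properties": {
                "sentiment": {"type": "string", "enum": ["positive", "negative"]}
            },
            "required": ["sentiment"],
        },
    }
    # First call: cost estimate only, no inference is run.
    estimate = client.infer(reviews, dry_run=True, **options)
    # Second call: the actual inference with identical settings.
    results = client.infer(reviews, **options)
    return estimate, results
```

When `data` is a DataFrame or a file path, the same call would also need column (and optionally output_column), as described in the parameter list.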

Getting quotas#

get_quotas(self)#

Get your current quotas.

Returns: list: A list of quotas, one per priority level. Each entry contains row_quota and token_quota.
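
A small sketch of turning the returned list into readable lines. It rests on two assumptions that this page does not confirm: that each entry is dict-like with row_quota and token_quota keys, and that the position in the list corresponds to the priority level. The helper name `summarize_quotas` is hypothetical.

```python
def summarize_quotas(quotas):
    """Format a quota list as one human-readable line per priority level.

    Assumes each entry is dict-like with 'row_quota' and 'token_quota' keys,
    and that the list index matches the priority level (both are assumptions).
    """
    lines = []
    for priority, quota in enumerate(quotas):
        lines.append(
            f"priority {priority}: "
            f"{quota['row_quota']} rows, {quota['token_quota']} tokens"
        )
    return lines
```

It would be called as `summarize_quotas(client.get_quotas())`, where `client` is your initialized SDK object.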

Job Methods#

Listing jobs#

list_jobs(self)#

List all jobs associated with the API key.

Returns: list: A list of job details.

Getting job status#

get_job_status(self, job_id: str)#

Get the status of a job by its ID.

Parameters:
  • job_id (str): The ID of the job to retrieve the status for.

Returns: dict: The status of the job.
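
For jobs that are not run attached, one common pattern is to poll get_job_status until the job is finished. The sketch below assumes `client` exposes the documented method. Because the keys and values of the status dictionary are not listed on this page, the caller supplies an `is_finished` predicate instead of the code guessing at status names. `wait_for_job`, `poll_seconds`, and `max_polls` are hypothetical names for this example.

```python
import time


def wait_for_job(client, job_id, is_finished, poll_seconds=5.0, max_polls=120):
    """Poll a job's status until the caller-supplied predicate is satisfied.

    `client` is assumed to expose the documented get_job_status(job_id).
    `is_finished` receives the status dict and returns True when done; it is
    left to the caller because the status format is not specified here.
    """
    for _ in range(max_polls):
        status = client.get_job_status(job_id)
        if is_finished(status):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")
```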

Getting job results#

get_job_results(self, job_id: str)#

Get the results of a job by its ID.

Parameters:
  • job_id (str): The ID of the job to retrieve the results for.

  • include_inputs (bool, optional): Whether to include the inputs in the results. Defaults to False.

  • include_cumulative_logprobs (bool, optional): Whether to include the cumulative logprobs in the results. Defaults to False.

Returns: Union[List, Dict]: The results of the job. If include_inputs is True, the results will be a dictionary with inputs and outputs keys. If include_inputs is False, the results will be a list of outputs, in the same order as the inputs.
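
Since include_inputs=True returns a dictionary with inputs and outputs keys, the two lists can be zipped into pairs for inspection. This is a minimal sketch assuming `client` exposes the documented method and that the two lists are aligned by position. The helper name `fetch_input_output_pairs` is hypothetical.

```python
def fetch_input_output_pairs(client, job_id):
    """Return a list of (input, output) tuples for a finished job.

    `client` is assumed to expose the documented get_job_results(...).
    Relies on the documented 'inputs' and 'outputs' keys that are returned
    when include_inputs=True; positional alignment is an assumption.
    """
    results = client.get_job_results(job_id, include_inputs=True)
    return list(zip(results["inputs"], results["outputs"]))
```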

Cancelling jobs#

cancel_job(self, job_id: str)#

Cancel a job by its ID.

Parameters:
  • job_id (str): The ID of the job to cancel.

Returns: dict: The status of the job cancellation.

Stage Methods#

Creating a stage#

create_stage(self)#

Create a new internal stage.

Returns: dict: A dictionary containing the stage ID.

Listing all stages#

list_stages(self)#

List all stages.

Returns: list: A list of stage IDs.

Listing all files in a stage#

list_stage_files(self, stage_id: str)#

List all files in a stage.

Parameters:
  • stage_id (str): The ID of the stage to list the files in.

Returns: list: A list of file names in the stage.

Uploading files to a stage#

upload_to_stage(self, stage_id: List[str] | str = None, file_paths: List[str] | str = None)#

Upload files to a stage.

Accepts a stage ID and one or more file paths. If only a single argument is provided, it is interpreted as the file paths.

Parameters:
  • stage_id (Union[List[str], str], optional): The ID of the stage to upload the files to. If not provided, the files will be uploaded to a new stage.

  • file_paths (Union[List[str], str], optional): A file path or a list of file paths to upload.

Returns: list: A list of file names in the stage.
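
The sketch below uploads every file in a local directory that matches a pattern, then lists the stage contents to confirm the upload. It assumes `client` exposes the documented upload_to_stage and list_stage_files methods and that `stage_id` refers to an existing stage. The helper name `upload_directory` and the default `*.csv` pattern are choices made for this example.

```python
from pathlib import Path


def upload_directory(client, stage_id, directory, pattern="*.csv"):
    """Upload all files in `directory` matching `pattern` to a stage.

    `client` is assumed to expose the documented upload_to_stage(...) and
    list_stage_files(...) methods. Returns the stage's file listing.
    """
    file_paths = sorted(str(p) for p in Path(directory).glob(pattern))
    if not file_paths:
        raise FileNotFoundError(f"No files matching {pattern} in {directory}")
    client.upload_to_stage(stage_id, file_paths)
    # Listing afterwards makes it easy to verify what the stage now holds.
    return client.list_stage_files(stage_id)
```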

Downloading files from a stage#

download_from_stage(self, stage_id: str, files: List[str] | str = None, output_path: str = None)#

Download files from a stage.

Accepts a stage ID and, optionally, one or more file names. If no file name is provided, all files in the stage are downloaded.

Parameters:
  • stage_id (str): The ID of the stage to download the files from.

  • files (Union[List[str], str], optional): The name(s) of the file(s) to download. If not provided, all files in the stage will be downloaded.

  • output_path (str, optional): The directory to save the downloaded files to. If not provided, the files will be saved to the current working directory.

Returns: None
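
Because the method returns None, a thin wrapper can report what actually landed on disk. The sketch below assumes `client` exposes the documented download_from_stage method. It creates the output directory beforehand, since this page does not say whether the SDK creates it. The helper name `download_stage` is hypothetical.

```python
from pathlib import Path


def download_stage(client, stage_id, output_dir, files=None):
    """Download files from a stage and return the names found in output_dir.

    `client` is assumed to expose the documented download_from_stage(...).
    With files=None, the documented behavior is to download every file.
    """
    target = Path(output_dir)
    # Create the directory up front; whether the SDK does this is unknown.
    target.mkdir(parents=True, exist_ok=True)
    client.download_from_stage(stage_id, files=files, output_path=str(target))
    return sorted(p.name for p in target.iterdir())
```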