Cost Estimates

A significant benefit of batch inference is lower cost and transparent pricing. We aim to significantly reduce costs for latency-insensitive tasks, and to provide a transparent pricing model so you know in advance how much a batch job will cost.

To get a cost estimate for a batch job, set the dryrun parameter to true. Instead of running inference, the API returns a lower and upper bound on the cost of the job. Dry runs are free, so we recommend setting this parameter to true before running a job to make sure you understand the cost beforehand.
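As a rough sketch of how a dry-run request might be assembled: only the dryrun parameter is documented here, so the field names model and inputs below are hypothetical placeholders for whatever your batch job actually contains.

```python
def build_batch_request(model: str, inputs: list[str], dryrun: bool = True) -> dict:
    """Assemble a batch job payload.

    With dryrun=True, the API skips inference and returns only a
    lower and upper bound on the job's cost (dry runs are free).
    """
    return {
        "model": model,    # hypothetical field name
        "inputs": inputs,  # hypothetical field name
        "dryrun": dryrun,  # documented parameter: request a cost estimate only
    }

payload = build_batch_request("example-model", ["first input", "second input"])
```

Once the estimate looks acceptable, the same payload can be resubmitted with dryrun set to false to run the job for real.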

The lower bound is based on the number of input tokens and the model's per-million-token price. The upper bound is twice that, estimating that the number of output tokens will equal the number of input tokens.

In the future, we’ll add more sophisticated cost prediction methods, but for now this provides a simple way to understand the cost of a batch job. If you need a more accurate estimate, please reach out to us at team@materialized.dev.