Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

featured

text-generation

automatic-speech-recognition

text-to-speech

embeddings

text-to-video

text-to-image

reranker

zero-shot-image-classification

multimodal

featured

fp4

128k

$0.09/$0.45 in/out Mtoken

openai/

gpt-oss-120b

text-generation

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
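The `in/out Mtoken` prices shown on each card can be turned into a rough per-request cost estimate. The sketch below assumes the two figures are USD per million input tokens and USD per million output tokens, billed separately. The function name `estimate_cost` and the token counts are hypothetical, chosen only for illustration. The prices are the ones listed for gpt-oss-120b above.

```python
from decimal import Decimal


def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price: str, out_price: str) -> Decimal:
    """Estimate the cost in USD of one workload.

    Assumption: in_price and out_price are USD per one million tokens,
    applied separately to input and output tokens.
    """
    per_million = Decimal(1_000_000)
    total = (Decimal(input_tokens) * Decimal(in_price)
             + Decimal(output_tokens) * Decimal(out_price))
    return total / per_million


# Hypothetical workload: 200,000 input tokens and 50,000 output tokens
# at the listed gpt-oss-120b prices of $0.09 (in) and $0.45 (out).
cost = estimate_cost(200_000, 50_000, "0.09", "0.45")
print(f"Estimated cost: ${cost}")
```

`Decimal` is used instead of floats so that small per-token amounts do not pick up rounding noise when summed over many requests.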

featured

fp4

128k

$0.04/$0.16 in/out Mtoken

openai/

gpt-oss-20b

text-generation

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.

featured

fp4

256k

$0.30/$1.20 in/out Mtoken

Qwen/

Qwen3-Coder-480B-A35B-Instruct-Turbo

text-generation

Qwen3-Coder-480B-A35B-Instruct is Qwen3's most agentic code model, with strong performance on agentic coding, agentic browser use, and other foundational coding tasks, achieving results comparable to Claude Sonnet.

featured

16k

$0.30/$1.80 in/out Mtoken

allenai/

olmOCR-7B-0725-FP8

text-generation

olmOCR is a specialized AI tool that converts PDF documents into clean, structured text while preserving important formatting and layout information. What makes olmOCR particularly valuable for developers is its ability to handle challenging PDFs that traditional OCR tools struggle with—including complex layouts, poor-quality scans, handwritten text, and documents with mixed content types. Built on a fine-tuned 7B vision-language model, olmOCR provides enterprise-grade PDF processing at a fraction of the cost of proprietary solutions.

featured

fp8

128k

$0.55/$2.00 in/out Mtoken

zai-org/

GLM-4.5

text-generation

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.

featured

mixed: fp8/fp4

128k

$0.50/$2.00 in/out Mtoken

moonshotai/

Kimi-K2-Instruct

text-generation

Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks.

featured

fp8

256k

$0.13/$0.60 in/out Mtoken

Qwen/

Qwen3-235B-A22B-Thinking-2507

text-generation

Qwen3-235B-A22B-Thinking-2507 is a new Qwen3 model that scales up the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning.

featured

fp8

256k

$0.40/$1.60 in/out Mtoken

Qwen/

Qwen3-Coder-480B-A35B-Instruct

text-generation

featured

fp8

128k

$0.20/$1.10 in/out Mtoken

zai-org/

GLM-4.5-Air

text-generation

featured

bf16

$0.00300 / minute

mistralai/

Voxtral-Small-24B-2507

automatic-speech-recognition

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.

featured

bf16

$0.00100 / minute

mistralai/

Voxtral-Mini-3B-2507

automatic-speech-recognition

Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.

featured

fp4

32k

$1.00/$3.00 in/out Mtoken

deepseek-ai/

DeepSeek-R1-0528-Turbo

text-generation

The DeepSeek R1 0528 Turbo model is a state-of-the-art reasoning model that can generate very quick responses.

featured

fp8

256k

$0.13/$0.60 in/out Mtoken

Qwen/

Qwen3-235B-A22B-Instruct-2507

text-generation

Qwen3-235B-A22B-Instruct-2507 is the updated version of the Qwen3-235B-A22B non-thinking mode, featuring significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.

featured

fp8

40k

$0.08/$0.29 in/out Mtoken

Qwen/

Qwen3-30B-A3B

text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

featured

fp8

40k

$0.10/$0.30 in/out Mtoken

Qwen/

Qwen3-32B

text-generation

featured

fp8

40k

$0.06/$0.24 in/out Mtoken

Qwen/

Qwen3-14B

text-generation

featured

fp4

32k

$1.00/$3.00 in/out Mtoken

deepseek-ai/

DeepSeek-V3-0324-Turbo

text-generation

featured

fp8

$0.50 / Mtoken

meta-llama/

Llama-4-Maverick-17B-128E-Instruct-Turbo

text-generation

The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick, a 17 billion parameter model with 128 experts

Latest Models

Gryphe/

MythoMax-L2-13b

Phind/

Phind-CodeLlama-34B-v2

bigcode/

starcoder2-15b

openchat/

openchat_3.5

openai/

whisper-tiny

Featured Models

mistralai/

Mistral-Small-3.2-24B-Instruct-2506

meta-llama/

Llama-4-Maverick-17B-128E-Instruct-Turbo

meta-llama/

Llama-Guard-4-12B

Qwen/

Qwen3-Coder-480B-A35B-Instruct-Turbo

sesame/

csm-1b

microsoft/

phi-4
