Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

openai/gpt-oss-120b
featured
fp4
128k
$0.09/$0.45 in/out Mtoken
  • text-generation

gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

openai/gpt-oss-20b
featured
fp4
128k
$0.04/$0.16 in/out Mtoken
  • text-generation

gpt-oss-20b is an open-weight 21B parameter model released by OpenAI under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference. The model is trained in OpenAI’s Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.

Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo
featured
fp4
256k
$0.30/$1.20 in/out Mtoken
  • text-generation

Qwen3-Coder-480B-A35B-Instruct is Qwen3's most agentic code model, delivering strong performance on agentic coding, agentic browser use, and other foundational coding tasks, with results comparable to Claude Sonnet.

allenai/olmOCR-7B-0725-FP8
featured
16k
$0.30/$1.80 in/out Mtoken
  • text-generation

olmOCR is a specialized AI tool that converts PDF documents into clean, structured text while preserving important formatting and layout information. What makes olmOCR particularly valuable for developers is its ability to handle challenging PDFs that traditional OCR tools struggle with—including complex layouts, poor-quality scans, handwritten text, and documents with mixed content types. Built on a fine-tuned 7B vision-language model, olmOCR provides enterprise-grade PDF processing at a fraction of the cost of proprietary solutions.

zai-org/GLM-4.5
featured
fp8
128k
$0.55/$2.00 in/out Mtoken
  • text-generation

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.

moonshotai/Kimi-K2-Instruct
featured
mixed: fp8/fp4
128k
$0.50/$2.00 in/out Mtoken
  • text-generation

Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks.

Qwen/Qwen3-235B-A22B-Thinking-2507
featured
fp8
256k
$0.13/$0.60 in/out Mtoken
  • text-generation

Qwen3-235B-A22B-Thinking-2507 is a new Qwen3 model that scales up the thinking capability of Qwen3-235B-A22B, improving both the quality and depth of reasoning.

Qwen/Qwen3-Coder-480B-A35B-Instruct
featured
fp8
256k
$0.40/$1.60 in/out Mtoken
  • text-generation

Qwen3-Coder-480B-A35B-Instruct is Qwen3's most agentic code model, delivering strong performance on agentic coding, agentic browser use, and other foundational coding tasks, with results comparable to Claude Sonnet.

zai-org/GLM-4.5-Air
featured
fp8
128k
$0.20/$1.10 in/out Mtoken
  • text-generation

The GLM-4.5 series models are foundation models designed for intelligent agents. GLM-4.5 has 355 billion total parameters with 32 billion active parameters, while GLM-4.5-Air adopts a more compact design with 106 billion total parameters and 12 billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.

mistralai/Voxtral-Small-24B-2507
featured
bf16
$0.00300 / minute
  • automatic-speech-recognition

Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.

mistralai/Voxtral-Mini-3B-2507
featured
bf16
$0.00100 / minute
  • automatic-speech-recognition

Voxtral Mini is an enhancement of Ministral 3B, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding.

deepseek-ai/DeepSeek-R1-0528-Turbo
featured
fp4
32k
$1.00/$3.00 in/out Mtoken
  • text-generation

DeepSeek R1 0528 Turbo is a state-of-the-art reasoning model that can generate very quick responses.

Qwen/Qwen3-235B-A22B-Instruct-2507
featured
fp8
256k
$0.13/$0.60 in/out Mtoken
  • text-generation

Qwen3-235B-A22B-Instruct-2507 is the updated version of the Qwen3-235B-A22B non-thinking mode, featuring significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding, and tool usage.

Qwen/Qwen3-30B-A3B
featured
fp8
40k
$0.08/$0.29 in/out Mtoken
  • text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

Qwen/Qwen3-32B
featured
fp8
40k
$0.10/$0.30 in/out Mtoken
  • text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

Qwen/Qwen3-14B
featured
fp8
40k
$0.06/$0.24 in/out Mtoken
  • text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

meta-llama/Llama-4-Maverick-17B-128E-Instruct-Turbo
featured
fp8
8k
$0.50 / Mtoken
  • text-generation

The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick, a 17 billion parameter model with 128 experts

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.
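
Text-generation prices in the listing are quoted as "in/out Mtoken". The sketch below assumes this means US dollars per million input tokens and per million output tokens, and estimates the cost of a single workload from those two rates. The function name `estimate_cost_usd` and the token counts are hypothetical and chosen only for illustration. Actual billing may round or meter usage differently.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      in_price_per_mtoken: float,
                      out_price_per_mtoken: float) -> float:
    """Estimate cost in USD, assuming prices are per one million tokens."""
    total = (input_tokens * in_price_per_mtoken
             + output_tokens * out_price_per_mtoken)
    return total / 1_000_000


# Example with the rates listed for openai/gpt-oss-120b ($0.09/$0.45):
# 200,000 input tokens and 50,000 output tokens (hypothetical workload).
cost = estimate_cost_usd(200_000, 50_000, 0.09, 0.45)
print(f"${cost:.4f}")
```

Under these assumptions, the input side contributes 200,000 × 0.09 / 1,000,000 dollars and the output side 50,000 × 0.45 / 1,000,000 dollars. Models priced with a single rate (for example "$0.50 / Mtoken") can be handled by passing the same value for both prices.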