Compare chat, image, and video models by composite score and category-specific metrics. Rankings are editorially maintained for reference.
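As a rough illustration of comparing rows on a category-specific metric, the Python sketch below parses a few cells copied from the table. It treats the `—` placeholder as "not reported" and lists models whose reported HumanEval pass rate meets a threshold, cheapest output price first. The helper names (`parse_metric`, `cheapest_meeting`) and the tie-break on composite score are hypothetical choices for illustration. They do not describe how the editorial rankings are produced. Unreported values are excluded on purpose, because a missing score is not evidence of a low one.

```python
from dataclasses import dataclass
from typing import Optional


def parse_metric(cell: str) -> Optional[float]:
    """Convert a table cell to float; the table's "—" placeholder means 'not reported'."""
    cell = cell.strip()
    return None if cell in ("—", "") else float(cell)


@dataclass
class Row:
    name: str
    score: float                  # editorial composite score
    output_usd: float             # USD per 1M output tokens
    humaneval: Optional[float]    # pass rate (%), None if not reported


# Cells copied from a few rows of the table: (Model, Score, Output $, HumanEval).
RAW = [
    ("GPT-5.5", "98.0", "30", "91.40"),
    ("Claude Fable 5", "97.0", "50", "—"),
    ("Gemini 3.5 Pro", "96.0", "9", "89.50"),
    ("Claude Opus 4.7", "93.0", "25", "92.50"),
    ("GPT-5.5 Instant", "88.0", "3", "88.20"),
    ("DeepSeek V4 Pro", "87.0", "0.87", "—"),
]
ROWS = [Row(n, float(s), float(o), parse_metric(h)) for n, s, o, h in RAW]


def cheapest_meeting(rows, min_humaneval):
    """Rows with a *reported* HumanEval >= min_humaneval, cheapest output price first.

    Ties on output price are broken by higher composite score (an illustrative choice).
    """
    eligible = [r for r in rows if r.humaneval is not None and r.humaneval >= min_humaneval]
    return sorted(eligible, key=lambda r: (r.output_usd, -r.score))


for r in cheapest_meeting(ROWS, 89.0):
    print(f"{r.name}: HumanEval {r.humaneval}%, output ${r.output_usd}/1M tokens")
```

With `min_humaneval=89.0`, the eligible rows come back as Gemini 3.5 Pro, Claude Opus 4.7, then GPT-5.5. Claude Fable 5 is skipped because it reports no HumanEval figure.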
| Rank | Model | Provider | Score (editorial composite) | Context (K tokens) | Input ($ per 1M tokens) | Output ($ per 1M tokens) | MMLU accuracy (%) | HumanEval pass rate (%) | Arena Elo (human preference) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 | OpenAI | 98.0 | 1050 | 5 | 30 | 91.50 | 91.40 | 1420 |
| 2 | Claude Fable 5: Anthropic's most capable model, positioned above the Opus tier. Public Mythos-class model with 1M context, scoring more than 10% higher than Claude Opus 4.8 on key benchmarks. Adaptive thinking only. | Anthropic | 97.0 | 1000 | 10 | 50 | — | — | — |
| 3 | Gemini 3.5 Pro | Google DeepMind | 96.0 | 1000 | 1.50 | 9 | 91 | 89.50 | 1400 |
| 4 | Claude Opus 4.8: Anthropic's latest flagship reasoning model. | Anthropic | 95.0 | 1000 | 5 | 25 | — | — | — |
| 5 | Claude Opus 4.7 | Anthropic | 93.0 | 1000 | 5 | 25 | 91 | 92.50 | 1400 |
| 6 | GLM-5.2: New standard-setter among open-source reasoning models, MIT-licensed. Its AA Intelligence Index score of 51 ranks first among open-source models. MoE architecture with 753B total / 40B active parameters and 1M context. GDPval-AA v2 score of 1524, on par with GPT-5.5 (1514). Strong scientific reasoning: GPQA Diamond 89%, HLE 40%. | Z.ai (Zhipu AI) | 91.0 | 1000 | 1.40 | 4.40 | — | — | — |
| 7 | GPT-5.4 Pro: OpenAI's strongest reasoning model, with 1M+ context. It has solved frontier math problems (Ramsey hypergraphs, Erdős problems). | OpenAI | 90.0 | 1050 | 30 | 180 | — | — | — |
| 8 | Gemini 3.5 Flash | Google DeepMind | 90.0 | 1049 | 1.50 | 9 | 92.30 | 86.80 | 1370 |
| 9 | GPT-5.5 Instant | OpenAI | 88.0 | 922 | 0.75 | 3 | 89.50 | 88.20 | 1350 |
| 10 | DeepSeek V4 Pro: Deep reasoning model, MIT open-source license, 1M context window, MoE architecture with 1.6T total / 49B active parameters. AA Intelligence Index score of 44, second among open-source models behind GLM-5.2. Very low cache-hit price ($0.004/M tokens). | DeepSeek | 87.0 | 1000 | 0.43 | 0.87 | — | — | — |
| 11 | Kimi K2.7 Code: 1T MoE model specialized for coding, 256K context, open source under a Modified MIT license, 30% lower reasoning-token consumption. | Moonshot AI | 85.0 | 256 | 0.74 | 3.50 | — | — | — |
| 12 | Qwen3.7 Max | Alibaba (Qwen) | 85.0 | 1000 | 1.25 | 3.75 | 87 | 87 | 1300 |
| 13 | Gemini 3.1 Pro | Google DeepMind | 85.0 | 1049 | 2 | 12 | 87.50 | 85 | 1300 |
| 14 | DeepSeek V4 Flash: Cost-efficient reasoning model, MIT open-source license, 1M context window, MoE architecture with 284B total / 13B active parameters. AA Intelligence Index score of 40. Output priced at only $0.28/M tokens and cache hits at $0.003/M tokens, for exceptional value. | DeepSeek | 83.0 | 1000 | 0.09 | 0.18 | — | — | — |
| 15 | Cursor Composer 2.5 | Cursor | 82.0 | 256 | 0 | 0 | 85 | 86 | 1260 |
| 16 | Kimi K2.6 | Moonshot AI | 82.0 | 262 | 0.68 | 3.42 | 85.50 | 84.50 | 1280 |
| 17 | GPT-5.4 | OpenAI | 82.0 | 1050 | 2.50 | 15 | 88.20 | 87.50 | 1320 |
| 18 | Grok 4.20: xAI reasoning model with 2M context and the lowest hallucination rate. Supports agentic tool calling. | xAI | 81.0 | 2000 | 1.25 | 2.50 | — | — | — |
| 19 | Claude Sonnet 4.6 | Anthropic | 80.0 | 1000 | 3 | 15 | 86.50 | 88 | 1280 |
| 20 | MiniMax M3: First open-weights model combining frontier coding, 1M context, and native multimodality. MSA sparse attention architecture. SWE-Bench Pro 59.0%, TerminalBench 66.0%. Aggressively priced. | MiniMax | 80.0 | 1000 | 0.30 | 1.20 | — | — | — |
| 21 | Windsurf SWE-1.6 | Windsurf (Codeium) | 80.0 | 200 | 0 | 0 | — | — | — |
| 22 | Grok 4.3 | xAI | 80.0 | 1000 | 1.25 | 2.50 | 86 | 85 | 1270 |
| 23 | MiMo-V2.5 Pro | Xiaomi | 78.0 | 1000 | 0.44 | 0.88 | 85 | 84 | 1260 |
| 24 | Qwen3.6 Plus | Alibaba (Qwen) | 76.0 | 1000 | 0.33 | 1.95 | 84 | 84 | 1250 |
| 25 | GPT-4o | OpenAI | 75.0 | 128 | 2.50 | 10 | 88.70 | 90.20 | 1287 |
| 26 | Qwen3.7 Plus: Cost-efficient model in Alibaba's Tongyi Qianwen (Qwen) 3.7 series, with 1M context and support for multimodal agents. | Alibaba (Qwen) | 75.0 | 1000 | 0.32 | 1.28 | — | — | — |
| 27 | GLM-5.1 | Zhipu AI | 75.0 | 200 | 0.40 | 1.20 | 83 | 82 | 1240 |
| 28 | Cursor Composer 2 | Cursor | 72.0 | 256 | 0 | 0 | 82 | 82 | 1220 |
| 29 | MiMo-V2.5 | Xiaomi | 72.0 | 1049 | 0.15 | 0.29 | — | — | — |
| 30 | MiniMax-M2.7 | MiniMax | 72.0 | 205 | 0.28 | 1.20 | 82 | 81 | 1220 |
| 31 | Kimi K2.5 | Moonshot AI | 72.0 | 262 | 0.40 | 1.90 | 82 | 82 | 1220 |
| 32 | Gemini 3 Flash | Google DeepMind | 70.0 | 1000 | 0.15 | 0.60 | 82 | 80.50 | 1220 |
| 33 | GLM-5 | Zhipu AI | 70.0 | 200 | 0.30 | 0.90 | 81 | 79 | 1210 |
| 34 | Qwen3.5 397B | Alibaba (Qwen) | 68.0 | 262 | 0.45 | 1.35 | 80.50 | 80.50 | 1200 |
| 35 | GPT-5.4 Mini: Efficient GPT-5.4 variant with 400K context, optimized for high-throughput workloads. | OpenAI | 67.0 | 400 | 0.75 | 4.50 | — | — | — |
| 36 | Qwen3 Coder 480B A35B: Qwen's most powerful open-source coding model. 480B MoE with 35B active parameters, native 256K context (scalable to 1M with YaRN). Strong SWE-Bench performance. Apache 2.0 licensed. Ships with Qwen Code CLI. | Alibaba (Qwen) | 66.0 | 256 | 0.22 | 1.80 | — | — | — |
| 37 | Gemini 2.5 Pro | Google DeepMind | 65.0 | 1000 | 0.35 | 1.40 | 80.50 | 78 | 1180 |
| 38 | Grok 3 | xAI | 65.0 | 1000 | 0.15 | 0.60 | 80 | 80 | 1180 |
| 39 | Hunyuan Hy3 Preview | Tencent Hunyuan | 65.0 | 256 | 0.06 | 0.18 | 79 | 78 | 1180 |
| 40 | Claude Haiku 4.5 | Anthropic | 60.0 | 200 | 0.80 | 4 | 78 | 75 | 1150 |
| 41 | Step 3.7 Flash: StepFun's latest multimodal MoE model, with a 196B-parameter language backbone and a vision encoder for native image/video understanding. | StepFun | 60.0 | 256 | 0.20 | 1.15 | 78 | 75 | — |
| 42 | GPT-5.4 Nano: Lightest GPT-5.4 variant, with 400K context. Very fast and low-cost. | OpenAI | 58.0 | 400 | 0.20 | 1.25 | — | — | — |
| 43 | Cursor Composer 1.5 | Cursor | 58.0 | 200 | 0 | 0 | 76 | 74 | 1150 |
| 44 | DeepSeek R1 | DeepSeek | 55.0 | 128 | 0.55 | 2.19 | 78.50 | 78.50 | 1100 |
| 45 | Nova 2.0 Pro | Amazon | 55.0 | 256 | 0.80 | 3.20 | 76 | 72 | 1120 |
| 46 | Nemotron 3 Super | NVIDIA | 55.0 | 1000 | 0.14 | 0.42 | 76 | 74 | 1120 |
| 47 | Mistral Medium 3.5: Mistral's 128B dense instruction model supporting text+image input, optimized for agentic workflows, coding, and complex reasoning. | Mistral AI | 55.0 | 256 | 1.50 | 7.50 | 76 | 72 | — |
| 48 | Step 3.5 Flash | StepFun | 55.0 | 256 | 0.03 | 0.09 | 75 | 72 | 1100 |
| 49 | Doubao Seed Code | ByteDance | 55.0 | 256 | 0.10 | 0.30 | 76 | 74 | 1120 |
| 50 | Mistral Large 3 | Mistral AI | 50.0 | 256 | 0.30 | 0.90 | 75 | 70 | 1100 |
| 51 | Grok Build 0.1: xAI's coding-focused model trained for agentic software-engineering workflows. Supports text+image input. | xAI | 50.0 | 256 | 1 | 2 | — | — | — |
| 52 | Command A+ | Cohere | 48.0 | 128 | 0 | 0 | — | — | — |
| 53 | Llama 4 Maverick | Meta | 45.0 | 1000 | 0.17 | 0.50 | 72 | 72 | 1080 |
| 54 | ERNIE 5.0 Thinking | Baidu | 45.0 | 128 | 0.25 | 0.75 | 70 | 68 | 1050 |
| 55 | Llama 4 Scout | Meta | 35.0 | 10000 | 0.11 | 0.33 | 65 | 65 | 1000 |
| 56 | Mistral Small 4 | Mistral AI | 35.0 | 256 | 0.10 | 0.30 | 65 | 62 | 980 |
| 57 | Command A | Cohere | 35.0 | 256 | 1.50 | 4.50 | 62 | 60 | 970 |
| 58 | Phi-4 | Microsoft | 30.0 | 16 | 0.08 | 0.24 | 60 | 65 | 950 |
| 59 | Jamba 1.7 Large | AI21 Labs | 30.0 | 256 | 1.30 | 3.90 | 58 | 60 | 930 |
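Prices are quoted per 1M tokens, so the list-price cost of a single request is `input_tokens / 1e6 * input price + output_tokens / 1e6 * output price`. The sketch below computes that cost. It also converts an Elo gap into a head-to-head expectation using the standard Elo logistic formula on a 400-point scale. Whether the arena ratings above use exactly that scale is an assumption. The function names are hypothetical. The cost function also ignores cache-hit, batch, and long-context pricing tiers, such as the cache-hit prices quoted for DeepSeek V4.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float, output_per_m: float) -> float:
    """List-price cost of one request, with prices given in USD per 1M tokens.

    Assumption: no caching, batch, or long-context pricing adjustments.
    """
    return input_tokens / 1_000_000 * input_per_m + output_tokens / 1_000_000 * output_per_m


def elo_expected(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (a tie counts as half a win).

    Assumption: ratings use the conventional 400-point logistic scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# 100K input + 10K output tokens at each model's listed prices.
gpt = request_cost_usd(100_000, 10_000, 5, 30)        # 0.5 + 0.3 = 0.80
gemini = request_cost_usd(100_000, 10_000, 1.50, 9)   # 0.15 + 0.09 = 0.24
print(f"GPT-5.5: ${gpt:.2f}, Gemini 3.5 Pro: ${gemini:.2f}")

# Arena Elo 1420 vs 1400.
print(f"Expected score of GPT-5.5 vs Gemini 3.5 Pro: {elo_expected(1420, 1400):.3f}")
```

For this workload, GPT-5.5 costs about 3.3 times as much as Gemini 3.5 Pro ($0.80 vs $0.24). Their 20-point Elo gap corresponds to an expected score of only about 0.53 for GPT-5.5, which is a small edge in head-to-head preference.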