AI model leaderboard · LinkWord

Compare chat, image, and video models by composite score and category-specific metrics. Rankings are editorially maintained for reference.

Columns: Score is an editorial composite score (higher ranks higher). Context (K) is the context window size in thousands of tokens. Input $ and Output $ are token prices in USD per 1M tokens. MMLU is Massive Multitask Language Understanding accuracy (%). HumanEval is the code generation benchmark pass rate (%). Elo is the Arena Elo from human preference battles (higher is stronger).

Rank	Model	Provider	Score	Context (K)	Input $	Output $	MMLU	HumanEval	Elo
1	G GPT-5.5	—	98.0	1050	5	30	91.50	91.40	1420
2	C Claude Fable 5 Anthropic's most capable model, positioned above Opus tier. Public Mythos-class model with 1M context, scoring >10% higher than Claude Opus 4.8 on key benchmarks. Adaptive thinking only.	Anthropic	97.0	1000	10	50	—	—	—
3	G Gemini 3.5 Pro	Google DeepMind	96.0	1000	1.50	9	91	89.50	1400
4	C Claude Opus 4.8 Anthropic's latest flagship reasoning model	Anthropic	95.0	1000	5	25	0	—	0
5	C Claude Opus 4.7	—	93.0	1000	5	25	91	92.50	1400
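6	G GLM-5.2 A new benchmark for open-source reasoning models, MIT licensed, ranking first among open-source models with an AA Intelligence Index score of 51. MoE architecture with 753B total / 40B active parameters and a 1M context window. Scores 1524 on GDPval-AA v2, on par with GPT-5.5 (1514). Strong scientific reasoning: GPQA Diamond 89%, HLE 40%.	Z.ai (Zhipu AI)	91.0	1000	1.40	4.40	—	—	—
7	G GPT-5.4 Pro OpenAI's strongest reasoning model, 1M+ context; has solved frontier math problems (Ramsey hypergraphs, Erdős problems)	OpenAI	90.0	1050	30	180	—	—	—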
8	G Gemini 3.5 Flash	—	90.0	1049	1.50	9	92.30	86.80	1370
9	G GPT-5.5 Instant	OpenAI	88.0	922	0.75	3	89.50	88.20	1350
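10	D DeepSeek V4 Pro Deep reasoning model, MIT open-source license, 1M context window, MoE architecture with 1.6T total / 49B active parameters. AA Intelligence Index score of 44, second among open-source models behind GLM-5.2. Very low cache-hit price ($0.004/M tokens).	DeepSeek	87.0	1000	0.43	0.87	—	—	—
11	K Kimi K2.7 Code 1T MoE model specialized for coding, 256K context, open source under a Modified MIT license, 30% lower reasoning-token consumption	Moonshot AI	85.0	256	0.74	3.50	—	—	—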
12	Q Qwen3.7 Max	—	85.0	1000	1.25	3.75	87	87	1300
13	G Gemini 3.1 Pro	—	85.0	1049	2	12	87.50	85	1300
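14	D DeepSeek V4 Flash Cost-effective reasoning model, MIT open-source license, 1M context window, MoE architecture with 284B total / 13B active parameters. AA Intelligence Index score of 40; output price of only $0.28/M tokens and cache-hit price of $0.003/M tokens, for exceptional value.	DeepSeek	83.0	1000	0.09	0.18	—	—	—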
15	C Cursor Composer 2.5	Cursor	82.0	256	0	0	85	86	1260
16	K Kimi K2.6	—	82.0	262	0.68	3.42	85.50	84.50	1280
17	G GPT-5.4	—	82.0	1050	2.50	15	88.20	87.50	1320
18	G Grok 4.20 xAI reasoning model, 2M context, lowest hallucination rate, supports agent tool calling	xAI	81.0	2000	1.25	2.50	—	—	—
19	C Claude Sonnet 4.6	Anthropic	80.0	1000	3	15	86.50	88	1280
20	M MiniMax M3 First open-weights model combining frontier coding, 1M context, and native multimodality. MSA sparse attention architecture. SWE-Bench Pro 59.0%, TerminalBench 66.0%. Aggressively priced.	MiniMax	80.0	1000	0.30	1.20	—	—	—
21	W Windsurf SWE-1.6	Windsurf (Codeium)	80.0	200	0	0	0	0	0
22	G Grok 4.3	—	80.0	1000	1.25	2.50	86	85	1270
23	M MiMo-V2.5 Pro	Xiaomi	78.0	1000	0.44	0.88	85	84	1260
24	Q Qwen3.6 Plus	—	76.0	1000	0.33	1.95	84	84	1250
25	G GPT-4o	OpenAI	75.0	128	2.50	10	88.70	90.20	1287
26	Q Qwen3.7 Plus Cost-effective model in Alibaba's Tongyi Qianwen (Qwen) 3.7 series, 1M context, supports multimodal agents	Alibaba (Qwen)	75.0	1000	0.32	1.28	—	—	—
27	G GLM-5.1	Zhipu AI	75.0	200	0.40	1.20	83	82	1240
28	C Cursor Composer 2	Cursor	72.0	256	0	0	82	82	1220
29	M MiMo-V2.5	—	72.0	1049	0.15	0.29	—	—	—
30	M MiniMax-M2.7	—	72.0	205	0.28	1.20	82	81	1220
31	K Kimi K2.5	—	72.0	262	0.40	1.90	82	82	1220
32	G Gemini 3 Flash	Google DeepMind	70.0	1000	0.15	0.60	82	80.50	1220
33	G GLM-5	Zhipu AI	70.0	200	0.30	0.90	81	79	1210
34	Q Qwen3.5 397B	Alibaba (Qwen)	68.0	262	0.45	1.35	80.50	80.50	1200
35	G GPT-5.4 Mini Efficient GPT-5.4 variant, 400K context, optimized for high-throughput workloads	OpenAI	67.0	400	0.75	4.50	—	—	—
36	Q Qwen3 Coder 480B A35B Qwen's most powerful open-source coding model. 480B MoE with 35B active params, native 256K context (YaRN scalable to 1M). Strong SWE-Bench performance. Apache 2.0 licensed. Ships with Qwen Code CLI.	Alibaba (Qwen)	66.0	256	0.22	1.80	—	—	—
37	G Gemini 2.5 Pro	Google DeepMind	65.0	1000	0.35	1.40	80.50	78	1180
38	G Grok 3	xAI	65.0	1000	0.15	0.60	80	80	1180
39	H Hunyuan Hy3 Preview	Tencent Hunyuan	65.0	256	0.06	0.18	79	78	1180
40	C Claude 4.5 Haiku	Anthropic	60.0	200	0.80	4	78	75	1150
41	S Step 3.7 Flash StepFun's latest multimodal MoE model with 196B parameter language backbone and vision encoder for native image/video understanding	StepFun	60.0	256	0.20	1.15	78	75	—
42	G GPT-5.4 Nano Lightest GPT-5.4 variant, 400K context, very fast and low cost	OpenAI	58.0	400	0.20	1.25	—	—	—
43	C Cursor Composer 1.5	Cursor	58.0	200	0	0	76	74	1150
44	D DeepSeek R1	DeepSeek	55.0	128	0.55	2.19	78.50	78.50	1100
45	N Nova 2.0 Pro	Amazon	55.0	256	0.80	3.20	76	72	1120
46	N Nemotron 3 Super	NVIDIA	55.0	1000	0.14	0.42	76	74	1120
47	M Mistral Medium 3.5 Mistral's 128B dense instruction model supporting text+image input, optimized for agentic workflows, coding, and complex reasoning	Mistral AI	55.0	256	1.50	7.50	76	72	—
48	S Step 3.5 Flash	StepFun	55.0	256	0.03	0.09	75	72	1100
49	D Doubao Seed Code	ByteDance	55.0	256	0.10	0.30	76	74	1120
50	M Mistral Large 3	Mistral AI	50.0	256	0.30	0.90	75	70	1100
51	G Grok Build 0.1 xAI's coding-focused model trained for agentic software engineering workflows, supports text+image input	xAI	50.0	256	1	2	—	—	—
52	C Command A+	Cohere	48.0	128	0	0	0	—	0
53	L Llama 4 Maverick	Meta	45.0	1000	0.17	0.50	72	72	1080
54	E ERNIE 5.0 Thinking	Baidu	45.0	128	0.25	0.75	70	68	1050
55	L Llama 4 Scout	Meta	35.0	10000	0.11	0.33	65	65	1000
56	M Mistral Small 4	Mistral AI	35.0	256	0.10	0.30	65	62	980
57	C Command A	Cohere	35.0	256	1.50	4.50	62	60	970
58	P Phi-4	Microsoft	30.0	16	0.08	0.24	60	65	950
59	J Jamba 1.7 Large	AI21 Labs	30.0	256	1.30	3.90	58	60	930
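
Because the Input $ and Output $ columns are quoted in USD per 1M tokens, the cost of a single request can be estimated with simple arithmetic. The sketch below is illustrative only: the function name `request_cost_usd`, the `PRICES` dictionary, and the example token counts are assumptions made for this example, while the prices are copied from the GPT-5.5 and DeepSeek V4 Flash rows of the table. It ignores cache-hit discounts and any other pricing tiers.

```python
def request_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate the cost of one request from per-1M-token prices (hypothetical helper)."""
    # Prices are per 1,000,000 tokens, so sum the weighted token counts first
    # and divide once at the end.
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# (Input $, Output $) pairs copied from the table above.
PRICES = {
    "GPT-5.5": (5, 30),
    "DeepSeek V4 Flash": (0.09, 0.18),
}

if __name__ == "__main__":
    # Assumed workload: 10,000 input tokens and 2,000 output tokens per request.
    for model, (input_price, output_price) in PRICES.items():
        cost = request_cost_usd(10_000, 2_000, input_price, output_price)
        print(f"{model}: ${cost:.5f} per request")
```

For real budgeting, the same calculation would need current prices from each provider, since the figures here are editorially maintained for reference.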
