GLM-5.2 Open Weights Released: 753B MoE Model Nears Claude Opus 4.8 in Coding

After making GLM-5.2 available to GLM Coding Plan subscribers on June 13, Z.ai officially released the full model weights on HuggingFace and ModelScope on June 17 under the MIT license. The release also came with the first official benchmark table, which had been withheld at launch — and the numbers show a significant jump over May's GLM-5.1.

Model Specs

GLM-5.2 is a 753B-parameter mixture-of-experts (MoE) model with 44B parameters active per token. The context window grows from GLM-5.1's 200K to 1M tokens, with a 128K output limit. The key architectural innovation is IndexShare: every 4 Transformer layers share a lightweight indexer, reducing per-token FLOPs by 2.9x at 1M context length. The MTP (multi-token prediction) layer has also been improved, boosting speculative decoding acceptance length by 20%.
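To make the layer-sharing idea concrete, here is a minimal sketch of how layers could map onto shared indexers. Only the group size of 4 comes from the description above; the function names and the 12-layer depth are hypothetical, and this is not Z.ai's implementation.

```python
# Illustrative sketch only. The group size of 4 is from the article;
# the layer count and all names below are hypothetical.
GROUP_SIZE = 4


def indexer_for_layer(layer: int, group_size: int = GROUP_SIZE) -> int:
    """Return the id of the shared indexer that a given layer would use."""
    return layer // group_size


def num_indexers(num_layers: int, group_size: int = GROUP_SIZE) -> int:
    """Count the indexers needed when every `group_size` layers share one."""
    return -(-num_layers // group_size)  # ceiling division


example_layers = 12  # hypothetical depth, not GLM-5.2's real layer count
print([indexer_for_layer(i) for i in range(example_layers)])
print(num_indexers(example_layers))
```

Under this assumption a 12-layer stack needs 3 indexers instead of 12, which shows where the savings would come from. The reported 2.9x FLOPs reduction depends on details the release summary does not spell out.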

The model introduces three-tier effort control: Lite, Pro, and Max. Lite provides fast responses for simple tasks, while Max allocates more compute for complex coding work — a step up from the previous binary thinking/standard modes.

Benchmarks: New Open-Source Ceiling

The benchmark table covers three long-horizon coding evaluations and four standard coding benchmarks:

Long-Horizon Coding (Hours-Level)

| Benchmark | GLM-5.2 | Claude Opus 4.8 | GPT-5.5 |
| --- | --- | --- | --- |
| FrontierSWE | 74.4 | 75.1 | 73.4 |
| PostTrainBench | 34.3 | #1 | Below GLM-5.2 |
| SWE-Marathon | 13.0 | #1 | - |

FrontierSWE measures a model's ability to complete open-ended technical projects spanning hours. GLM-5.2 trails Opus 4.8 by just 0.7 points (74.4 vs 75.1), beats GPT-5.5 by 1.0 point (74.4 vs 73.4), and surpasses Opus 4.7 by 11 points. On PostTrainBench, GLM-5.2 ranks second, outperforming both Opus 4.7 and GPT-5.5. SWE-Marathon shows a larger gap, about 13 points behind Opus 4.8, but GLM-5.2 remains the highest-ranked open-source model.

Standard Coding

| Benchmark | GLM-5.2 | GLM-5.1 | Claude Opus 4.8 |
| --- | --- | --- | --- |
| Terminal-Bench 2.1 | 81.0 | 63.5 | 85.0 |
| SWE-bench Pro | 62.1 | 58.4 | 69.2 |
| FrontierSWE | 74.4 | 30.5 | 75.1 |
| MCP-Atlas | 76.8 | 71.8 | 77.8 |

On Terminal-Bench 2.1, GLM-5.2 scores 81.0 against Opus 4.8's 85.0, a 4-point gap. On SWE-bench Pro it trails by about 7 points (62.1 vs 69.2), and on MCP-Atlas by just 1 point (76.8 vs 77.8). For an MIT-licensed model, this is remarkably competitive.
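The gaps can be recomputed directly from the published scores. The short script below does that for the Standard Coding table; the scores are copied from the table, while the dictionary layout and the `gaps` helper are just one illustrative way to organize them.

```python
# Scores copied from the Standard Coding table; the structure is illustrative.
scores = {
    "Terminal-Bench 2.1": {"glm_5_2": 81.0, "glm_5_1": 63.5, "opus_4_8": 85.0},
    "SWE-bench Pro": {"glm_5_2": 62.1, "glm_5_1": 58.4, "opus_4_8": 69.2},
    "FrontierSWE": {"glm_5_2": 74.4, "glm_5_1": 30.5, "opus_4_8": 75.1},
    "MCP-Atlas": {"glm_5_2": 76.8, "glm_5_1": 71.8, "opus_4_8": 77.8},
}


def gaps(row: dict) -> dict:
    """Return GLM-5.2's deficit to Opus 4.8 and its gain over GLM-5.1."""
    return {
        "behind_opus": round(row["opus_4_8"] - row["glm_5_2"], 1),
        "gain_over_5_1": round(row["glm_5_2"] - row["glm_5_1"], 1),
    }


for name, row in scores.items():
    g = gaps(row)
    print(f"{name}: {g['behind_opus']} behind Opus 4.8, "
          f"+{g['gain_over_5_1']} over GLM-5.1")
```

The largest generational gain is on FrontierSWE (30.5 to 74.4), while the largest remaining deficit to Opus 4.8 is on SWE-bench Pro.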

The Pony Alpha Incident

Before the official release, Z.ai anonymously posted GLM-5.2 on OpenRouter under the codename "Pony Alpha." In blind community testing, 25% of users guessed it was Claude Sonnet 5, 20% thought it was a new Grok version, and only a few correctly identified GLM-5. The test suggests that without brand labels, users rate GLM-5.2's capabilities on par with closed-source frontier models.

API Pricing

GLM-5 is priced at $0.60 (approx. ¥4.08) per million input tokens and $1.92 (approx. ¥13.06) per million output tokens. GLM-5.2's API pricing is expected to be similar. For comparison, Claude Opus 4.8 costs roughly $15 (approx. ¥102) per million input tokens — a 25x difference.
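As a rough illustration of what these prices mean in practice, the sketch below recomputes the input-price ratio and the cost of a sample job at GLM-5 rates. The per-million prices are the ones quoted above; the workload size (2M input tokens, 0.5M output tokens) is a made-up example, and GLM-5.2's actual pricing may differ.

```python
# Prices per million tokens, as quoted in the article (USD).
GLM5_INPUT_USD_PER_M = 0.60
GLM5_OUTPUT_USD_PER_M = 1.92
OPUS_INPUT_USD_PER_M = 15.0  # the article quotes only the input price


def cost_usd(input_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    """Total cost in USD given token counts and per-million-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price


# Ratio of input prices, the basis for the "25x" figure.
input_ratio = OPUS_INPUT_USD_PER_M / GLM5_INPUT_USD_PER_M
print(f"Input price ratio: {input_ratio:.0f}x")

# Hypothetical workload: 2M input tokens and 0.5M output tokens.
job = cost_usd(2_000_000, 500_000, GLM5_INPUT_USD_PER_M, GLM5_OUTPUT_USD_PER_M)
print(f"Sample job at GLM-5 rates: ${job:.2f}")
```

Note that the 25x figure compares input prices only. A full comparison would also need Opus 4.8's output price, which is not given here.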

Z.ai also launched ZCode 3.0, providing GLM Coding Plan users with 3 million free tokens per day.

Deployment

Weights are available on HuggingFace (zai-org/GLM-5.2) and ModelScope. The full-precision model requires approximately 1.5TB of GPU memory; quantization lowers the barrier significantly. Supported inference frameworks include SGLang, vLLM, and xLLM.
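The 1.5TB figure is consistent with a back-of-the-envelope estimate, assuming "full precision" here means 16-bit weights (an assumption, since the release notes quoted above do not name the dtype). The sketch below counts weight storage only; the KV cache, activations, and framework overhead would come on top, and real quantized checkpoints often keep some tensors at higher precision.

```python
# Weight-only memory estimate. Assumes 16-bit "full precision" weights and
# uniform quantization; both are simplifying assumptions.
TOTAL_PARAMS = 753e9  # parameter count from the article


def weight_memory_tb(params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in terabytes (1 TB = 1e12 bytes)."""
    return params * bits_per_param / 8 / 1e12


for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_tb(TOTAL_PARAMS, bits):.3f} TB")
```

By this estimate, 8-bit weights need roughly 0.75TB and 4-bit weights roughly 0.38TB, which is why quantization lowers the hardware barrier so much.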

Competitive Landscape

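| Model | Parameters | Context | FrontierSWE | Terminal-Bench | Open Source |
| --- | --- | --- | --- | --- | --- |
| GLM-5.2 | 753B MoE | 1M | 74.4 | 81.0 | MIT |
| Claude Opus 4.8 | Undisclosed | 1M | 75.1 | 85.0 | No |
| GPT-5.5 | Undisclosed | Undisclosed | 73.4 | - | No |
| Kimi K2.7 Code | Undisclosed | 256K | - | - | Yes |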

GLM-5.2 is now the ceiling for open-source coding models. It doesn't surpass Opus 4.8, but it beats GPT-5.5 on FrontierSWE — and it's MIT-licensed. For enterprises that need local deployment, this is the strongest open-source option available today.