
How to Run OpenClaw and Production Agents on GLM-5.2 (and Cut Your API Bill ~65%)

Your coding tools stay on their subscription - that's what you build with. GLM-5.2 is for the agents you run in production: OpenClaw and your own, routing ~70-80% to GLM and the hard rest to a smarter model. Cut your API bill ~65%.

By Dima Bilous, Founder · Jun 22, 2026 · 10 min read

If your AI agents run on an API in production, most of what they do all day does not need a frontier model. You build those agents with coding tools like Claude Code, Cursor, and Codex - on your own subscription - but once they ship and run on the API, every task is burning Opus-priced credits for work a cheaper model handles just as well.

This week I moved my production default to GLM-5.2, the new open-weight model from Z.ai. It took about 10 minutes. This is the full step-by-step: how to run OpenClaw and your own production agents on GLM-5.2 through OpenRouter, routing the easy 70-80% to GLM and keeping Claude Opus 4.8 one line away for the hard rest.

The short version

  • Two buckets. Your coding tools - Claude Code, Cursor, Codex - run on your own subscription; that is what you build with, and you leave them alone. GLM-5.2 is for the agents you run in production.
  • OpenClaw and your shipped agents are the target. They run on API credits all day, so that is where switching off a frontier model cuts the bill. OpenClaw especially - it is metered on every action.
  • It is a routing setup. Send the easy 70-80% of an agent's calls to GLM-5.2 and route the hard rest to a smarter model. One OpenRouter key does both.
  • The savings: GLM-5.2 is roughly a sixth of the price of the closed frontier models - about 60-65% off your blended production bill.

What is GLM-5.2?

GLM-5.2 is an open-weight large language model from Z.ai (Zhipu), released in June 2026. It is MIT licensed, ships a 1 million-token context window, and was built for long-horizon coding and agent work. Under the hood it is a roughly 750-billion-parameter mixture-of-experts model with about 40 billion parameters active per token, so you get frontier-class reasoning at a fraction of the inference cost.

"Open weight" means the model is freely available, but you do not have to host it yourself (that needs serious GPUs). You call it through a hosted API like OpenRouter or Z.ai and pay a small per-use fee, far below frontier prices. On independent coding benchmarks GLM-5.2 scores around 62 on SWE-bench Pro and around 81 on Terminal-Bench 2.1, landing just behind Claude Opus 4.8 and at the top of the open-weight pack.

GLM-5.2 vs Claude (Opus 4.8 and Sonnet 4.6)

This is the comparison that matters for your bill. GLM-5.2 sits between Claude Sonnet 4.6 and Claude Opus 4.8 on capability for most agent work, while costing less than either. Here is how the per-token pricing stacks up:

| Model | Input / 1M tokens | Output / 1M tokens |
| --- | --- | --- |
| GLM-5.2 (Z.ai) | $1.40 | $4.40 |
| GLM-5.2 (OpenRouter) | ~$1.00 | ~$4.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Opus 4.8 | $5.00 | $25.00 |

Compare the output column: at Z.ai's list price, GLM-5.2 output tokens are roughly 3.4x cheaper than Sonnet 4.6 and about 5.7x cheaper than Opus 4.8 (about 3.75x and 6.25x at OpenRouter's ~$4.00). For agent work, where output and tool-call volume dominate, that gap is most of your bill. The trade is small: on the hardest reasoning and longest-horizon tasks, Opus 4.8 still wins. So you default to GLM-5.2 and route only the hard calls to a smarter model.
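If you want to check those ratios yourself, here is a minimal sketch. It only divides the per-token prices from the pricing table; the `PRICES` dictionary and the `times_cheaper` helper are names made up for this example.

```python
# Per-1M-token prices in USD, copied from the pricing table in this section.
PRICES = {
    "GLM-5.2 (Z.ai)": {"input": 1.40, "output": 4.40},
    "GLM-5.2 (OpenRouter)": {"input": 1.00, "output": 4.00},
    "Claude Sonnet 4.6": {"input": 3.00, "output": 15.00},
    "Claude Opus 4.8": {"input": 5.00, "output": 25.00},
}


def times_cheaper(cheap: str, pricey: str, kind: str = "output") -> float:
    """How many times cheaper `cheap` is than `pricey` for one token kind."""
    return PRICES[pricey][kind] / PRICES[cheap][kind]


for glm in ("GLM-5.2 (Z.ai)", "GLM-5.2 (OpenRouter)"):
    for claude in ("Claude Sonnet 4.6", "Claude Opus 4.8"):
        ratio = times_cheaper(glm, claude)
        print(f"{glm} vs {claude}: {ratio:.2f}x cheaper on output")
```

Pass `kind="input"` to run the same comparison on input tokens, where the gap is narrower.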

GLM-5.2 vs GPT-5.5

GLM-5.2 also went straight at OpenAI's frontier. On several multi-hour, long-horizon coding benchmarks (FrontierSWE, PostTrainBench, and SWE-Marathon among them) GLM-5.2 outscores GPT-5.5 - at roughly one-sixth of GPT-5.5's cost. For an open-weight model you can run through any provider, that is the headline: near-frontier performance without frontier pricing or lock-in.

Build tools vs. production agents: where GLM-5.2 fits

Here is the distinction most people get backwards, and it is the whole point. Your coding tools - Claude Code, Cursor, and Codex - are what you *build* agents with, and they run on the subscription you already pay for. You do not route those through GLM-5.2. There is nothing to save there (a flat subscription is not metered per token), and you would only degrade the tools you actually build with.

GLM-5.2 is for the other side: the agents you *run* in production. That means OpenClaw and any agent you wrote in code that hits an API per request all day - enriching leads, triaging an inbox, calling tools in a loop. That metered, per-token work is your entire bill, and it is exactly what you move onto GLM-5.2.

So there are two separate setups, and this guide walks through both: pointing OpenClaw at GLM-5.2 (Part 3), and the routing setup for the agents you built in code (Part 4). The goal is the same in both: GLM-5.2 handles the routine 70-80%, and only the hard calls go to a smarter model.

What you will need

  • An OpenRouter account (one key reaches GLM-5.2 and almost every other model).
  • For OpenClaw: a server to run it on. Any VPS works (this guide uses Hostinger, but any Docker host is fine).
  • About 10 to 15 minutes.

Part 1: Get an OpenRouter API key

OpenRouter is a single gateway that lets one API key reach almost any model, including GLM-5.2. It is OpenAI-compatible, so most agents and tools connect by just changing a base URL and key.

  • Go to openrouter.ai and sign in (Google or email).
  • Add a small balance: openrouter.ai/credits, then Manage Billing, add $5 to $10. GLM-5.2 is a paid model, and a balance also raises your rate limit.
  • Create the key: profile icon, then Keys, then Create Key. Name it something like agents. Optionally set a credit limit on the key so it can never overspend.
  • Copy the key (it starts with sk-or-...). OpenRouter shows it once.

That one key is all you need for both setups below. On OpenRouter, GLM-5.2's model name is z-ai/glm-5.2.
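Before wiring the key into anything, it can help to send one tiny request and see that it works. The sketch below uses only the Python standard library and OpenRouter's OpenAI-compatible chat completions endpoint. The helper names `build_request` and `smoke_test` are made up for this example, and `smoke_test` makes a real, billed call when you run it.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_request(prompt: str, api_key: str,
                  model: str = "z-ai/glm-5.2") -> urllib.request.Request:
    """Build (but do not send) a chat completions request for OpenRouter."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def smoke_test(api_key: str, timeout: float = 60.0) -> str:
    """Send one short prompt and return the reply text (a paid network call)."""
    req = build_request("Reply with the single word: ok", api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Uncomment to run against your real key:
# print(smoke_test(os.environ["OPENROUTER_API_KEY"]))
```

If the call fails with an HTTP 401, the key was most likely copied incorrectly; a 402 usually means the account has no balance yet.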

Part 2: Set up OpenClaw (skip if you already have it)

OpenClaw is an open-source, self-hosted AI agent you run in production - and it is model-agnostic, so you plug in whatever model you want, which makes it a perfect fit for GLM-5.2. If you already run it, jump to Part 3. The fastest path is a one-click install on a VPS:

  • In your VPS provider's panel (Hostinger has this under VPS, then Docker Manager / App marketplace), deploy the OpenClaw template.
  • During setup it generates a gateway token. Save it - that is how you log into the OpenClaw dashboard.
  • Once it shows "Running," open the app to confirm it loads.

OpenClaw stores its settings in a single file, openclaw.json. On a Docker install it lives on the server, commonly under the project's data folder, for example /docker/<project>/data/.openclaw/openclaw.json.

Part 3: Point OpenClaw at GLM-5.2

Setup one of two. You add OpenRouter as a model provider and set GLM-5.2 as the default. You will edit openclaw.json on the server and restart the container.

Step 3.1 - Back up the config first. Open a terminal on your server (your VPS panel has a Terminal button) and run:

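```bash
# Replace <project> with your actual project folder before running;
# left as-is, the shell treats < and > as redirections.
cp /docker/<project>/data/.openclaw/openclaw.json /tmp/openclaw.json.bak

# Not sure of the path? Find it with:
find / -name openclaw.json 2>/dev/null
```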

Step 3.2 - Add the OpenRouter provider and set GLM-5.2 as default. The cleanest way is a tiny script so you do not hand-edit JSON and risk breaking your gateway token. Set your key first, then run the script:

```bash
export OPENROUTER_API_KEY='sk-or-your-key-here'
python3 - <<'PY'
import json, os

p = "/docker/<project>/data/.openclaw/openclaw.json"   # <-- your path

# Load the existing config so the gateway token and other settings survive.
with open(p) as f:
    d = json.load(f)

# Add OpenRouter as a model provider.
d.setdefault("models", {})["mode"] = "merge"
d["models"].setdefault("providers", {})["openrouter"] = {
    "baseUrl": "https://openrouter.ai/api/v1",
    "apiKey": os.environ["OPENROUTER_API_KEY"],
    "api": "openai-completions",
    # Every model entry needs both an id and a name string.
    "models": [{"id": "z-ai/glm-5.2", "name": "GLM-5.2"}],
}

# Make GLM-5.2 the default model for agents.
dd = d.setdefault("agents", {}).setdefault("defaults", {})
dd["model"] = {"primary": "openrouter/z-ai/glm-5.2"}
dd["models"] = {"openrouter/z-ai/glm-5.2": {"alias": "GLM-5.2"}}

# Write the file back and close it cleanly.
with open(p, "w") as f:
    json.dump(d, f, indent=2)
print("OK: default = openrouter/z-ai/glm-5.2")
PY
```

Two details that trip people up: each model entry must have a name (a string), not just an id, or OpenClaw rejects the config. And if your web terminal mangles long pastes, build the script from short echo '...' >> /tmp/fix.py lines instead, then run python3 /tmp/fix.py - short lines paste cleanly.
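To catch the missing-name problem before you restart, you can lint the config first. This is a rough sketch that assumes the layout written by the script in Step 3.2 (`models.providers.<provider>.models` as a list of entries). The function name `find_model_entry_problems` is made up for this example, and it checks only the `id` and `name` fields, not OpenClaw's full schema.

```python
import json


def find_model_entry_problems(config: dict) -> list[str]:
    """Return readable problems with provider model entries (empty if none)."""
    problems = []
    providers = config.get("models", {}).get("providers", {})
    for provider_name, provider in providers.items():
        for i, entry in enumerate(provider.get("models", [])):
            where = f"models.providers.{provider_name}.models[{i}]"
            # Each field must be present and a non-empty string.
            for field in ("id", "name"):
                value = entry.get(field)
                if not isinstance(value, str) or not value:
                    problems.append(f"{where}: missing '{field}' string")
    return problems


good = {"models": {"providers": {"openrouter": {
    "models": [{"id": "z-ai/glm-5.2", "name": "GLM-5.2"}]}}}}
bad = {"models": {"providers": {"openrouter": {
    "models": [{"id": "z-ai/glm-5.2"}]}}}}

print(find_model_entry_problems(good))
print(find_model_entry_problems(bad))
```

To lint the real file, load it with `json.load` and pass the resulting dictionary to the function before restarting the container.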

Step 3.3 - Restart OpenClaw:

```bash
docker restart $(docker ps -q --filter name=openclaw)
```

Step 3.4 - Confirm it booted on GLM-5.2:

```bash
docker logs --tail 40 $(docker ps -q --filter name=openclaw) 2>&1 | grep -i "agent model"
```

You want to see agent model: openrouter/z-ai/glm-5.2. Then open the OpenClaw web app and send "hi" to confirm it responds. OpenClaw now runs its whole loop on GLM-5.2 instead of a frontier model.

Part 4: The routing setup for your own production agents

Setup two of two, and this is where the 70-80% routing lives. If you built your agents in code, the switch is one base URL and one model name. Point your OpenAI-compatible client at OpenRouter, default the model to z-ai/glm-5.2, and add a single flag that routes the genuinely hard calls up to a smarter model.

```python
# agent.py
import os
from openai import OpenAI

# One OpenAI-compatible client pointed at OpenRouter reaches both models.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

def run(prompt, important=False):
    # GLM-5.2 by default; route only the hard calls to a smarter model.
    # Confirm the exact Opus slug on OpenRouter's model list before shipping.
    model = "anthropic/claude-opus-4-8" if important else "z-ai/glm-5.2"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # Return just the reply text.
    return response.choices[0].message.content

# The routine 70-80% runs on GLM-5.2
print(run("Summarize this thread and draft a reply."))

# The hard rest gets routed to Opus 4.8
print(run("Plan the migration across these 12 services.", important=True))
```

That is the whole routing pattern. The same idea works in any framework - LangChain, the Vercel AI SDK, your own loop - because OpenRouter speaks the OpenAI format. The only changes are the base URL, the key, and the model string, and your agent keeps doing exactly what it did before, for a fraction of the per-token cost.
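The 70-80% figure is a target, not something the flag enforces, so it is worth measuring what your agent actually does. Here is a small sketch that counts routing decisions. `RouteStats` and its methods are hypothetical names for this example, and the model strings are the ones used in agent.py.

```python
from collections import Counter

DEFAULT_MODEL = "z-ai/glm-5.2"
HARD_MODEL = "anthropic/claude-opus-4-8"  # same slug as in agent.py


class RouteStats:
    """Counts routing decisions so you can see the real default/hard split."""

    def __init__(self) -> None:
        self.calls = Counter()

    def pick(self, important: bool = False) -> str:
        """Choose the model for one call and record the decision."""
        model = HARD_MODEL if important else DEFAULT_MODEL
        self.calls[model] += 1
        return model

    def default_share(self) -> float:
        """Fraction of recorded calls that went to the default model."""
        total = sum(self.calls.values())
        return self.calls[DEFAULT_MODEL] / total if total else 0.0


stats = RouteStats()
for flag in [False, False, False, True]:
    stats.pick(important=flag)
print(f"{stats.default_share():.0%} of calls went to {DEFAULT_MODEL}")  # prints 75%
```

In a real agent you would call `stats.pick(important)` inside `run` in place of the inline conditional, then log `default_share()` periodically. If the share sits well below 70%, too many calls are being flagged as hard.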

Part 5: Routing the hard calls to a smarter model

The whole point is GLM-5.2 by default, a smarter model only when it matters. How you route the hard rest depends on which setup it is:

  • In OpenClaw: keep GLM-5.2 as the primary. When a task needs a frontier model, switch in the model picker (or set a different primary and restart). On older OpenClaw versions, adding the newest Claude slugs through OpenRouter can crash startup (see Troubleshooting), so the simplest reliable setup is GLM-5.2 as the default with a manual switch for the rare hard task.
  • In your own agents: use the important=True flag from Part 4. The default call goes to GLM-5.2; only the calls you flag get routed to Opus 4.8. Cheap by default, premium on demand.

The cost breakdown

Take a representative production agent task using 100k input and 20k output tokens. Here is what that one task costs on each model:

| Model | Cost for 100k in + 20k out |
| --- | --- |
| GLM-5.2 | ~$0.23 |
| Claude Sonnet 4.6 | ~$0.60 |
| Claude Opus 4.8 | ~$1.00 |

That is roughly 60-65% cheaper than Sonnet 4.6 for the same job, and about 4-5x cheaper than Opus 4.8. Route the routine 70-80% of your agent's calls to GLM-5.2 and keep the hard rest on Opus 4.8, and your blended bill drops by roughly 55-65% against an all-Opus baseline; the top of that range needs an 80% split at OpenRouter's lower GLM-5.2 price. OpenRouter's prompt caching cuts repeated-context costs further (cached input on GLM-5.2 runs around $0.26 per 1M tokens).
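The blended figure depends on the split and on what you compare against, so it is worth running the arithmetic for your own numbers. A minimal sketch, assuming the sample task above (100k input and 20k output tokens), Z.ai list prices, and an all-Opus baseline; `task_cost` and `blended_saving` are names made up for this example.

```python
def task_cost(in_price: float, out_price: float,
              in_tokens: int = 100_000, out_tokens: int = 20_000) -> float:
    """Cost in USD of one task, given per-1M-token prices."""
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price


def blended_saving(glm_share: float, cheap: float, baseline: float) -> float:
    """Fractional saving versus running every task on the baseline model."""
    blended = glm_share * cheap + (1 - glm_share) * baseline
    return 1 - blended / baseline


glm = task_cost(1.40, 4.40)     # GLM-5.2 at Z.ai list prices
opus = task_cost(5.00, 25.00)   # Claude Opus 4.8

for share in (0.70, 0.75, 0.80):
    saved = blended_saving(share, glm, opus)
    print(f"{share:.0%} of tasks on GLM-5.2 -> {saved:.0%} saved")
```

With these inputs the saving lands between roughly 54% and 62%. Swapping in OpenRouter's ~$1.00 / ~$4.00 rate pushes the 80% case to about 66%, and comparing against an all-Sonnet baseline gives a smaller number.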

Troubleshooting (the real gotchas)

  • Config rejected, "name expected, received undefined": every model entry needs a name string, not just an id.
  • OpenClaw crashes with `ANTHROPIC_MODEL_ALIASES before initialization`: your OpenClaw version is too old to recognize the newest Claude slugs. Fix: keep OpenClaw on GLM-5.2 only, or update OpenClaw to the latest version, then re-add the Claude models.
  • Web terminal breaks long pastes: build scripts from short echo '...' >> file lines, then run the file. Never paste a 500-character one-liner into a browser terminal.
  • First call is slow or times out: with a 1M-token context, the first token can take a while on long agent runs. Raise your client's request timeout (for OpenClaw, the per-request timeout in the config; for your own code, the client timeout setting).
  • Always back up `openclaw.json` before editing - it holds your gateway token.
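For the slow-first-call case in your own code, one option beyond a single larger timeout is to retry with a longer one each time. This is only a sketch of that idea, not something OpenClaw or OpenRouter provides. `call_with_growing_timeout` is a hypothetical helper, and `send` stands in for whatever function makes your request with a given timeout in seconds.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_growing_timeout(
    send: Callable[[float], T],
    first_timeout: float = 60.0,
    attempts: int = 3,
    factor: float = 2.0,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError,),
) -> T:
    """Call send(timeout); on a timeout, retry with a longer timeout."""
    timeout = first_timeout
    for attempt in range(1, attempts + 1):
        try:
            return send(timeout)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last timeout
            timeout *= factor
    raise ValueError("attempts must be at least 1")


# Stand-in for a real request: times out twice, then succeeds.
seen = []

def fake_send(timeout: float) -> str:
    seen.append(timeout)
    if len(seen) < 3:
        raise TimeoutError
    return "ok"

print(call_with_growing_timeout(fake_send, first_timeout=10.0), seen)
```

Set `retry_on` to the timeout exception your client library actually raises. Keep `attempts` low, since a request that timed out on your side may still be billed.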

When to use GLM-5.2 (and when not)

Default to GLM-5.2 for the bulk of production agent work: enrichment, triage, drafting, research, tool-calling loops, anything an agent does at volume. It is fast, cheap, and good enough that you will rarely notice the difference. Route to Claude Opus 4.8 on the genuinely hard, high-stakes calls where a small quality edge is worth the price. And keep building with Claude Code, Cursor, and Codex on their own subscriptions - the win here is in what your agents run on once they ship, not in what you write them with.

Frequently asked questions

Do I run my coding tools (Claude Code, Cursor) on GLM-5.2?

No. Claude Code, Cursor, and Codex are what you build with, on a flat subscription - you leave them as they are. GLM-5.2 is for the agents you ship that hit an API per token all day. That production, per-token work is where switching off a frontier model actually cuts the bill.

How much does GLM-5.2 cost?

About $1.40 per million input tokens and $4.40 per million output tokens on Z.ai (and roughly $1 / $4 on OpenRouter, depending on provider). On a typical agent task that works out to about 60-65% cheaper than Claude Sonnet 4.6 and 4-5x cheaper than Claude Opus 4.8; per output token it is roughly a sixth of the price of a frontier model like Opus 4.8.

How do I switch OpenClaw to GLM-5.2?

Add OpenRouter as a provider in openclaw.json with your key, set the default model to z-ai/glm-5.2, and restart the container. The full step-by-step is in Part 3 above.

What is the OpenRouter model name for GLM-5.2?

z-ai/glm-5.2. Use that exact string as the model id when you configure OpenClaw or any OpenAI-compatible agent through OpenRouter.

Is GLM the same as GGML?

No. GGML is a tensor library, and an older model file format since superseded by GGUF, used to run models locally in projects like llama.cpp. GLM is Z.ai's model family. They are unrelated.

Do I have to host GLM-5.2 myself?

No. It is open-weight, but you can call it through a hosted API like OpenRouter or Z.ai and pay per use. Self-hosting only makes sense if you have serious GPUs and a reason to keep inference in-house.

Wrap-up

You now have OpenClaw and your production agents running on GLM-5.2, routing the routine 70-80% to it and only the hard calls to a smarter model. Most teams cut their agent API bill by more than half this way without losing real quality - while still building with whatever coding tools they like, on their own subscriptions.

If you want help wiring this into a production agent setup (multi-model routing, cost controls, the works), that is exactly what we build at Anfloy. Book a call or browse the rest of our guides.

About Dima Bilous

Founder of Anfloy. Builds custom AI agent systems for B2B GTM, content, and internal ops. Forward-deployed AI engineering, not an agency.
