Controlling Model Usage in OpenClaw

February 22, 2026 | 5 minutes

When I first set up my OpenClaw bot, Clyde, I didn't think much about which model he was running. I configured things once, saw responses coming through, and moved on to building features.

A few days later, I started getting 429 errors. When I reviewed my Anthropic usage data, it was significantly higher than I expected. Clyde had been running a higher-usage model than I intended, and a few architectural choices I'd made early on were quietly multiplying token consumption, leaving me frequently waiting for my usage limit to reset.

This post covers what I learned about controlling model selection and token usage in OpenClaw: the things that surprised me, the mistakes I made, and the fixes that actually worked.

Lesson 1: Set a Default Model

Don't assume you know which model your agent is running. Set it explicitly in your configuration and verify it. If you skip this step, OpenClaw will pick a default for you, and it might not be the one you expect.

OpenClaw stores model settings in its configuration hierarchy, typically in openclaw.json. The part of the structure that governs model selection looks like this:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "provider/model-id"
      },
      "models": {
        "provider/model-id": {}
      }
    }
  }
}

primary is the default model for new sessions. models is the set of allowed models for that agent. If you only want your agent to use one model, set both to the same value:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-sonnet-4-6"
      },
      "models": {
        "anthropic/claude-sonnet-4-6": {}
      }
    }
  }
}

You can also manage this from the CLI:

openclaw models list
openclaw models set anthropic/claude-sonnet-4-6

Model IDs follow the pattern provider/model-id. Always verify after setting the model, because a typo here won't necessarily fail loudly, which brings us to the next lesson.

Lesson 2: Silent Fallbacks Will Surprise You

If your configured model is unavailable for any reason, such as a typo in the model ID, the provider is rate-limiting you, or the model is down, OpenClaw may silently fall back to a different model. That fallback might consume more tokens and behave differently, which makes debugging confusing because you're looking at outputs from a model you didn't expect to be running.

You can check which model is active at any time from the CLI:

openclaw models

This prints the model currently in use, not just the one you configured. If they don't match, you've hit a fallback.

For programmatic checks, you can query the active model at runtime and log it at startup:

const activeModel = await agent.getActiveModel();
console.log(`Active model: ${activeModel.id}`);

I'd also recommend adding a periodic check that compares the running model against your configured one:

// Compare the model that's actually running against the configured one.
const configured = config.agents.defaults.model.primary;
const active = await agent.getActiveModel();

if (active.id !== configured) {
  console.warn(`Model mismatch: expected ${configured}, got ${active.id}`);
}

If the model that's actually running doesn't match what you configured, you'll know immediately instead of finding out when you hit your usage limit.
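
If you want that check running continuously rather than only at startup, a timer is enough. Here's a minimal sketch reusing the agent and config objects from the snippets above; the five-minute interval is just a starting point:

// Periodically verify the running model against the configured one.
const CHECK_INTERVAL_MS = 5 * 60 * 1000;

setInterval(async () => {
  const configured = config.agents.defaults.model.primary;
  const active = await agent.getActiveModel();
  if (active.id !== configured) {
    console.warn(`Model mismatch: expected ${configured}, got ${active.id}`);
  }
}, CHECK_INTERVAL_MS);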

Lesson 3: Discord Bots Burn Tokens in Ways You Don't Expect

This is where most of my usage surprises came from. Setting the right model was necessary but not sufficient. The real issue was how often the model was being called.

Cron Jobs Add Up Fast

I had one task scheduled to run every minute. That's 1,440 LLM calls per day for that one task. The task was lightweight, often just checking for a tiny bit of data, but even tiny calls accumulate tokens quickly at that volume.

The fix was straightforward: increase the intervals to 15–30 minutes, only trigger on state changes rather than on a timer, and cache results where possible. It didn't really need to run every minute. I just hadn't thought about how even checking for tiny data payloads would add up over time.
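
Here's a rough sketch of the state-change approach. The fetchStatus and askModel helpers are hypothetical stand-ins for whatever your task actually does; the point is that the cheap data fetch runs every cycle, but the model is only called when something changed:

// Only call the LLM when the watched data actually changes.
// `fetchStatus` and `askModel` are hypothetical placeholders.
let lastSeen = null;

async function checkForChanges() {
  const current = await fetchStatus(); // cheap, non-LLM data fetch
  if (current === lastSeen) return;    // nothing changed, skip the model call
  lastSeen = current;
  await askModel(`Status changed to: ${current}`);
}

// Run every 30 minutes instead of every minute.
setInterval(checkForChanges, 30 * 60 * 1000);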

Listening to Too Many Channels

Clyde was listening to every message in every channel he had access to. Even when he wasn't responding, background evaluation of incoming messages was consuming tokens. In a busy Discord server, that adds up.

I restricted Clyde to specific channels and required either a slash command or an @mention to trigger a response. This cut token usage dramatically without changing the user experience much. For the most part, people were already mentioning him when they wanted his attention.
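
If you're wiring that gate yourself with discord.js rather than through OpenClaw's own channel settings, it's a few lines in front of the model call. The channel names and the handleWithModel helper here are hypothetical:

// Only forward allowlisted-channel messages that mention the bot.
const ALLOWED_CHANNELS = new Set(['bot-commands', 'clyde-playground']);

client.on('messageCreate', async (message) => {
  if (message.author.bot) return;                       // ignore other bots
  if (!ALLOWED_CHANNELS.has(message.channel.name)) return;
  if (!message.mentions.has(client.user)) return;       // require an @mention
  await handleWithModel(message);                       // hypothetical model call
});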

Multiple Bots in One Server

For a while, I had three OpenClaw bots running in the same Discord server. Each one was independently processing the same messages, tripling the token usage for no benefit.

Keep your environments isolated. Run different bots in separate servers or in separate channels unless they need to interact with each other.
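
One way to make that isolation hard to violate is a per-bot guard in code rather than a convention. A minimal sketch, assuming discord.js and a HOME_GUILD_ID environment variable set differently for each bot:

// Each bot ignores everything outside its own home server.
// HOME_GUILD_ID is a hypothetical env var, set per bot.
const HOME_GUILD_ID = process.env.HOME_GUILD_ID;

client.on('messageCreate', (message) => {
  if (message.guildId !== HOME_GUILD_ID) return; // not our server, skip
  // ...normal message handling
});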

What I'd Do Differently From the Start

If I were setting up a new OpenClaw agent today, I'd do a few things before writing any features:

  • Set an explicit model per agent. Don't rely on defaults you haven't verified.
  • Log the active model ID and token usage on every request. This makes usage surprises visible immediately instead of when you start hitting errors (see the sketch after this list).
  • Set conservative cron intervals. Start at 30 minutes and decrease only if you need to.
  • Restrict Discord channels from the start. It's easier to add channels later than to retroactively cut them.
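
On the logging point, here's a minimal sketch of a per-request wrapper. The agent.complete call and the shape of response.usage are assumptions, not OpenClaw's documented API; adapt the names to whatever your version actually exposes:

// Hypothetical wrapper that logs model ID and token usage per request.
async function loggedComplete(agent, prompt) {
  const active = await agent.getActiveModel();
  const response = await agent.complete(prompt); // assumed API
  const usage = response.usage ?? {};            // assumed response shape
  console.log(
    `[usage] model=${active.id} in=${usage.inputTokens ?? '?'} out=${usage.outputTokens ?? '?'}`
  );
  return response;
}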

Final Thoughts

Setting a default model in OpenClaw is the easy part. The real work is understanding the full picture of what drives token usage, and most of it isn't the model itself. It's the architecture around it: how often the model gets called, how much context it receives, and whether your environment is accidentally duplicating work.

I treat my OpenClaw bots like production infrastructure now. They have monitoring, usage controls, and isolated environments. It took a few hard lessons to get there, but the system is much more predictable as a result.