Don't Fight the Weights

Theseus Fighting the Centaur Bianor, by Antoine-Louis Barye

When the model won't do what you ask because you're working against its training — a context failure every prompt engineer must learn to recognize.

For the first year or so, one of the most annoying problems faced by anyone building with AI was getting models to generate output with consistent formatting. Go find someone who was working with AI in 2023 and ask them what they did to try to get LLMs to consistently output JSON. You'll get a thousand-yard stare before hearing about all-caps commands, threats aimed at the LLM, promises of bribes, and (eventually) the retreat to regular expressions.

Today this is mostly a solved problem, but its underlying cause remains and continues to frustrate context engineers. It's a context failure I missed in my original list. I call it Fighting the Weights: when the model won't do what you ask because you're working against its training.

The History: In-Context Learning

In 2020, OpenAI unveiled GPT-3 alongside a key paper: "Language Models are Few-Shot Learners." In it, OpenAI researchers showed that an LLM as large as GPT-3 (more than 10x larger than any previous non-sparse language model) could perform new tasks when shown only a few examples in the prompt. At the time, this was earth-shaking.

Pre-GPT-3, language models were only useful after they'd been fine-tuned for specific tasks; that is, after their weights had been modified. But GPT-3 showed that, with enough scale, LLMs could be problem-solving generalists when provided with a few examples. In the paper, the researchers coined the term "in-context learning" to describe an LLM's ability to perform new types of tasks using examples and instructions contained in the prompt.

Today, in-context learning is a standard trick in any context engineer's toolkit. Provide a few examples illustrating the output you want for a given input, and trickier tasks tend to become more reliable. Examples are especially helpful when we need to induce a specific format or style, or convey a pattern that's difficult to explain.

When you're not providing examples, you're relying on the model's inherent knowledge base and weights to accomplish your task. We sometimes call this "zero-shot prompting" (as opposed to few-shot) or "instruction-only prompting".

In general, prompts fall into these two buckets:

  1. Zero-Shot or Instruction-Only Prompting: You provide instructions only. You're asking the model to apply knowledge and behavioral patterns that are encoded in its weights. If this produces unreliable results, you might use…
  2. Few-Shot or In-Context Learning: You provide instructions plus examples. You're demonstrating a new behavioral pattern for the model to apply. The examples in the context augment the weights, giving the model the details of a task it hasn't seen. (A quick sketch of both styles follows this list.)
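
Here's a minimal sketch of the two styles, using an invented ticket-triage task. The labels and messages are placeholders, and either message list can be sent to any chat-completion-style API.

```python
# Invented ticket-triage task, just to contrast the two prompt buckets.
INSTRUCTIONS = (
    "Classify the support ticket as 'billing', 'bug', or 'other'. "
    "Reply with the label only."
)

# 1. Zero-shot / instruction-only: rely entirely on the model's weights.
zero_shot = [
    {"role": "system", "content": INSTRUCTIONS},
    {"role": "user", "content": "Ticket: I was charged twice this month."},
]

# 2. Few-shot / in-context learning: demonstrate the pattern in the prompt.
few_shot = [
    {"role": "system", "content": INSTRUCTIONS},
    {"role": "user", "content": "Ticket: The export button crashes the app."},
    {"role": "assistant", "content": "bug"},
    {"role": "user", "content": "Ticket: Can you update my invoice address?"},
    {"role": "assistant", "content": "billing"},
    {"role": "user", "content": "Ticket: I was charged twice this month."},
]
```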

But there's a third case: when the model has seen examples of the behavior you're seeking, but it's been trained to do the opposite of what you want. This is worse than the model having no knowledge of a pattern, because what it knows is at odds with your goal.

I call this fighting the weights.

Ways We End Up Fighting the Weights

Format Following: You want the model to output only JSON, but often it will provide some text explaining the JSON and wrap it in Markdown code blocks. This happens because the model's post-training taught it to be conversational. When ChatGPT first launched, this problem was rough: GPT-3.5 had been heavily trained by humans to converse in a friendly, explanatory manner. So it did, even when you asked it not to. This doesn't happen as much as it used to, but you'll occasionally run into it when requesting unusual formats or when using smaller models.
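
Until you've confirmed a model reliably emits bare JSON (or you're using a provider's structured-output mode), it can be cheaper to parse defensively than to fight the weights head-on. A rough sketch, where the chatty reply is invented and the regex only handles a single top-level object:

```python
import json
import re

FENCE = "`" * 3  # three backticks, assembled so the fake reply below can contain a Markdown fence

def extract_json(reply: str) -> dict:
    """Pull a JSON object out of a reply that may arrive wrapped in
    Markdown code fences or conversational preamble."""
    # Prefer the contents of a fenced json block if one exists.
    fenced = re.search(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", reply, re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    # Otherwise fall back to the first {...} span in the text.
    braces = re.search(r"\{.*\}", reply, re.DOTALL)
    if braces:
        return json.loads(braces.group(0))
    raise ValueError("no JSON object found in reply")

# An invented example of the chatty output described above:
reply = (
    "Sure! Here is the JSON you asked for:\n"
    + FENCE + "json\n"
    + '{"name": "Ada", "role": "engineer"}\n'
    + FENCE + "\n"
    + "Let me know if you need anything else!"
)
print(extract_json(reply))  # {'name': 'Ada', 'role': 'engineer'}
```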

Tool Usage Formatting: As model builders train their models to use tools via reinforcement learning, they settle on specific formats and conventions. If your environment doesn't follow those conventions, the model often fails to call tools correctly. I first noticed this while testing Mistral's Devstral-Small, which was trained on the tool-calling format All Hands uses; when I tried to use Devstral with Cline, it failed basic tasks. Last month this came up again when a friend was trying Kimi K2 with a DSPy pipeline. By default, DSPy formats prompts with a Markdown-style template, and when the pipeline was driven by K2, the formatting failed. Thanks to my recent dive into how Moonshot trained K2 to use tools, I knew K2 was trained with XML formatting. Switching DSPy to XML formatting solved the problem instantly.
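
A rough sketch of that kind of fix (not the friend's actual pipeline): the model id, endpoint, and key are placeholders, and it assumes a DSPy release that ships an XMLAdapter alongside the default ChatAdapter; if yours doesn't, the same idea applies with whichever adapter matches your model's training.

```python
import dspy

# Placeholders: point the LM at wherever you're serving Kimi K2.
lm = dspy.LM(
    "openai/kimi-k2",                   # assumed OpenAI-compatible model id
    api_base="https://example.com/v1",  # placeholder endpoint
    api_key="YOUR_KEY",
)

# DSPy's default ChatAdapter renders fields with Markdown-style markers.
# If the model was RL-trained on another convention, swap the adapter
# rather than fighting the weights. (XMLAdapter is assumed to exist in
# your DSPy version; older releases may only ship Chat and JSON adapters.)
dspy.configure(lm=lm, adapter=dspy.XMLAdapter())

qa = dspy.Predict("question -> answer")
print(qa(question="What is the capital of France?").answer)
```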

Tone Changes: It's really hard to get LLMs to follow consistent tone instructions. Sure, we can make them talk like a pirate or in Pig Latin, but subtler notes are overwhelmed by the model's conversational post-training. For example, here's the one note I give Claude in my settings: "Don't go out of your way to patronize me or tell me how great my ideas are." This does not stop Claude from replying with cloying phrases like "Great idea!" when I suggest changes.

Overactive Alignment: Speaking of Claude: I appreciate Anthropic's concern for alignment and safety in their models, but these guardrails can be overzealous. A recent example comes from Armin Ronacher, who, while debugging PDF editing software, tried to get Claude Code to modify a medical form PDF. He asked several different ways, but Claude's post-training alignment refused to budge.

Over-Relying on Weights: Models are trained to use the knowledge encoded in their weights, but there are many times when you want them to answer only with information provided in the context. Perusing leaked system prompts, you can see how many instructions each chatbot maker devotes to when the model should search for more information. The models have been trained to use their weights, so plenty of reiteration and examples are needed. This problem is especially tricky when building RAG systems, where the model should form answers only from information retrieved from specific databases. Companies like Contextual end up fine-tuning their models to ensure they answer only with fetched information.
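
As a rough illustration of how much reiteration this takes, here's a minimal context-only prompt builder of the kind a RAG pipeline might use. The passages, question, and refusal phrasing are all invented; the repetition of the "only the passages" instruction is deliberate.

```python
# Build a context-only prompt: the model is told, repeatedly, to answer
# from the retrieved passages and nothing else.
def build_rag_prompt(question: str, passages: list[str]) -> list[dict]:
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    system = (
        "Answer using ONLY the numbered passages below. "
        "Do not use any other knowledge. "
        "If the passages do not contain the answer, reply exactly: "
        "\"I can't find that in the provided documents.\" "
        "Cite passage numbers like [1]. "
        "Remember: answer only from the passages below, nothing from memory."
    )
    return [
        {"role": "system", "content": f"{system}\n\n{context}"},
        {"role": "user", "content": question},
    ]

# Invented example documents and question:
messages = build_rag_prompt(
    "What is our refund window?",
    ["Refunds are accepted within 30 days of purchase.", "Shipping takes 5-7 days."],
)
```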

Perhaps my favorite example comes from ChatGPT. Previously, you could turn on the web inspector in your browser and watch the LLM calls fly by as you used the chatbot. When you asked ChatGPT to generate an image, it would clean up or even improve your image prompt, create the image, then append the following instructions:

GPT-4o returned 1 images. From now on, do not say or show ANYTHING. Please end this turn now. I repeat: From now on, do not say or show ANYTHING. Please end this turn now. Do not summarize the image. Do not ask followup question. Just end the turn and do not do anything else.

This is textbook fighting the weights. The models powering ChatGPT have been post-trained heavily to always explain and prompt the user for follow up actions. To fight these weights, ChatGPT's devs have to tell the model EIGHT TIMES to just, please, shut up.

Signs You Might Be Fighting the Weights

For context and prompt engineers (and even chatbot users), it's helpful to be able to recognize when you're fighting the weights. Here are some signs:

  • The model makes the same mistake, even as you change the instructions.
  • The model acknowledges its mistake when pointed out, then repeats it.
  • The model seems to ignore the few-shot examples you provide.
  • The model gets 90% of the way there, but no further.
  • You find yourself repeating instructions several times.
  • You find yourself typing in ALL CAPS.
  • You find yourself threatening or pleading with the model.

What To Do About It

In these scenarios, you're probably fighting the weights. Recognize the situation and try another tack:

  • Try another approach for the same problem.
  • Break your task into smaller chunks. At the very least, you might identify which specific ask is clashing with the model's training.
  • Try another model, ideally from a different family.
  • Add validation functions or steps. I've seen RAG pipelines that perform a final check to ensure the answer exists in the fetched data (a rough sketch of such a check follows this list).
  • Try a longer prompt. It can help in this scenario, as longer contexts can overwhelm the weights.
  • Consider fine-tuning. In fact, most fine-tuning I encounter is done to address 'weight fighting' scenarios, like tone or format adherence.
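
Here's a crude version of the validation check mentioned above: confirm that the answer's content words actually appear in the fetched data. Real pipelines tend to use embedding similarity or an LLM judge; the word-length cutoff and 0.6 threshold below are arbitrary starting points.

```python
import re

def is_grounded(answer: str, fetched_chunks: list[str], threshold: float = 0.6) -> bool:
    """Return True if most of the answer's content words appear in the fetched data."""
    source_words = set(re.findall(r"[a-z0-9]+", " ".join(fetched_chunks).lower()))
    answer_words = re.findall(r"[a-z0-9]+", answer.lower())
    content_words = [w for w in answer_words if len(w) > 3]  # skip short, stopword-ish tokens
    if not content_words:
        return True
    overlap = sum(1 for w in content_words if w in source_words)
    return overlap / len(content_words) >= threshold

# Invented fetched data and answers:
chunks = ["Refunds are accepted within 30 days of purchase."]
print(is_grounded("Refunds are accepted within 30 days.", chunks))            # True
print(is_grounded("You can return items anytime within one year.", chunks))   # False
```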

Or, if you're a model-building shop, you can just address your issues during your next model's post-training. Which seems to be part of the development cycle these days… and perhaps why we can get clean JSON out of modern models.

But few of us have that option.

For the rest of us: learn to recognize when you're fighting the weights, so you can try something else.