Context Window

The context window is the total amount of text, measured in tokens, that a model can process at once: your messages, the AI's replies, any instructions, and any extra information you've provided. Once a conversation exceeds this limit, the earliest content is dropped and the model can no longer see it. This is the single most important constraint to understand: the model has no memory outside its context window.

Videos explaining this concept

E007

Notes on AI

What AI Receives When You Send a Prompt

A prompt is commonly misunderstood as the sole input to an AI model. In reality, it is only the visible top slice of a larger input stack, best understood as a "Prompt Sandwich".

E011

Tokens

AI models don't read words; they read tokens, the basic unit of text a model processes. A token is close to a word but not identical: one word can be a single token or several tokens, and in some cases several words merge into one token. Input, output, context window size, and pricing are all measured in tokens. One token is roughly four characters in English. Once you understand tokens, the limits and costs of AI stop feeling arbitrary.
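
The four-characters-per-token rule of thumb can be turned into a quick estimate. The sketch below is only an approximation: `estimate_tokens` and `CHARS_PER_TOKEN` are hypothetical names, and a real tokenizer will give different counts, especially for code or non-English text.

```python
import math

# Rule of thumb from the text: roughly four characters per token in English.
# This ratio is an assumption; actual ratios vary by tokenizer and language.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for English text (not an exact count)."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)


if __name__ == "__main__":
    message = "AI models don't read words, they read tokens."
    print(estimate_tokens(message))
```

An estimate like this is good enough for judging whether a document is likely to fit in a context window; for billing or hard limits, the provider's own tokenizer is the reliable source.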

E012

Tokenization

Tokenization is the process of turning raw text into tokens before an AI model processes it. It is preprocessing, not thinking — the model only sees the resulting pieces.

E014

Context Window

The context window is the model's working space, not its memory — only what is currently visible can be reasoned about. Think of it as a desk: only the papers currently on it can be used, and as new papers arrive, old ones slide off the edge. This explains why instructions seem to disappear, why answers contradict earlier statements, and why long conversations slowly fall apart — the model isn't being careless, it simply no longer sees what you think it should remember.
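
The desk analogy can be sketched as a sliding window over the conversation. This is a minimal illustration, assuming the oldest messages are dropped first and using a rough four-characters-per-token estimate; `fit_to_window` is a hypothetical name, and real systems may trim or summarise differently.

```python
def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest; stop when the next message would overflow.
    for message in reversed(messages):
        # Rough estimate (assumption): ceil(len / 4), at least one token.
        cost = max(1, -(-len(message) // 4))
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    chat = ["first instruction " * 5, "a question", "a follow-up"]
    print(fit_to_window(chat, max_tokens=10))
```

Note that the first message, often the one carrying the instructions, is the first to slide off the desk.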

E015

Context Engineering

Context engineering is the practice of shaping the entire information environment the model operates in — not just writing better prompts. The model never sees just your prompt: it sees system instructions, safety policies, retrieved documents, tool outputs, and your message all at once. Managing what is visible, what is repeated, and what is emphasised matters more than clever phrasing, because the model responds to what it can see — not to what you intended.
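
One way to picture the "entire information environment" is as a stack of layers flattened into one input. The sketch below is an assumption-laden illustration: `ContextStack`, its field names, and the bracketed labels are hypothetical, and real systems format and order these layers in their own ways.

```python
from dataclasses import dataclass, field


@dataclass
class ContextStack:
    """Hypothetical container for everything the model sees in one request."""

    system_instructions: str
    user_message: str
    retrieved_documents: list[str] = field(default_factory=list)
    tool_outputs: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the non-empty layers into the single text the model receives."""
        sections = [
            ("SYSTEM", [self.system_instructions]),
            ("DOCUMENTS", self.retrieved_documents),
            ("TOOLS", self.tool_outputs),
            ("USER", [self.user_message]),
        ]
        parts = []
        for label, items in sections:
            for item in items:
                parts.append(f"[{label}]\n{item}")
        return "\n\n".join(parts)


if __name__ == "__main__":
    stack = ContextStack(
        system_instructions="Be concise.",
        user_message="Summarise the report.",
        retrieved_documents=["Report text"],
    )
    print(stack.render())
```

The user's message is only one of several layers here, which is why changing what goes into the other layers can matter more than rephrasing the prompt.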

E016

Why Long Chats Drift

Long conversations degrade AI output because the context gets overcrowded — not because the model loses intelligence. As a conversation grows, instructions get diluted, topics blend together, and eventually early content falls outside the context window entirely. The fix is not to argue harder or add more text: open a new conversation, reintroduce only what matters, and clarity returns.

E017

"Forgetting" vs "Never Knew"

When AI doesn't know something, there are three structural causes: a training gap (the information was never in the training data), an injection gap (the information exists but was never added to the system), or a visibility limit (it fell outside the context window mid-conversation). In all three cases the model isn't being unintelligent; it's operating within structural constraints. Identifying which cause applies tells you how to fix it.

E020

Why Long Chats Get Confused

The LLM inside a chatbot has no memory — every response is generated fresh from the same frozen model. What feels like memory is the surrounding system assembling a prompt from conversation history and re-sending the whole thing each time; the model re-reads, it doesn't remember. When conversations grow too long for the context window, a different approach is used: retrieval-augmented generation (RAG) finds only the relevant pieces and adds them to the prompt at the moment they're needed.
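
The "re-reads, doesn't remember" behaviour can be sketched with a toy chat loop. Everything here is hypothetical: `ChatSession`, `echo_model`, and the `role: text` prompt layout are illustrative stand-ins, not any vendor's API. The model is a plain function of its input, and the application keeps the history and re-sends it each turn.

```python
from typing import Callable

# Hypothetical stand-in for a model call: a pure function of the prompt it is given.
ModelFn = Callable[[str], str]


class ChatSession:
    """Keeps the history on the application side and re-sends it every turn."""

    def __init__(self, model: ModelFn, system: str) -> None:
        self.model = model
        self.system = system
        self.history: list[tuple[str, str]] = []  # (role, text) pairs

    def build_prompt(self, user_message: str) -> str:
        lines = [f"system: {self.system}"]
        lines += [f"{role}: {text}" for role, text in self.history]
        lines.append(f"user: {user_message}")
        return "\n".join(lines)

    def send(self, user_message: str) -> str:
        prompt = self.build_prompt(user_message)  # full history, every time
        reply = self.model(prompt)
        self.history.append(("user", user_message))
        self.history.append(("assistant", reply))
        return reply


def echo_model(prompt: str) -> str:
    """Toy model: reports how many lines it was shown; it stores nothing."""
    return f"I can see {len(prompt.splitlines())} lines"


if __name__ == "__main__":
    session = ChatSession(echo_model, system="Be brief.")
    print(session.send("Hi"))
    print(session.send("Do you remember me?"))
```

The toy model's second reply reports more lines than its first only because the session passed the earlier turns back in; nothing is retained inside `echo_model` between calls.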

E021

Grounding

Pasting a document into AI gives the model access to it, but the model still draws on everything it was trained on and fills gaps silently. Grounding adds a boundary with two instructions: tell the model to answer only from the document, and tell it to say "I don't know" when something is missing. Without the second instruction, the model moves from your source to its training data without any signal. Two practical limits: documents larger than the context window can't be fully read, and scanned PDFs are images, so the model can't read the words inside unless the text is first extracted (for example with OCR).
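
The two grounding instructions can be packaged into a reusable prompt template. This is a minimal sketch: `grounded_prompt` and the exact wording are hypothetical, and instructions like these reduce silent gap-filling but do not guarantee the model will comply.

```python
def grounded_prompt(document: str, question: str) -> str:
    """Wrap a document and question with the two grounding instructions."""
    return (
        # Instruction 1: restrict the answer to the supplied document.
        "Answer using only the document below.\n"
        # Instruction 2: give the model an explicit way out when content is missing.
        'If the answer is not in the document, reply "I don\'t know".\n\n'
        f"Document:\n{document}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(grounded_prompt("The sky is blue.", "What colour is the sky?"))
```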

How to Keep AI Sharp

Once you have chosen a model, output quality depends less on the model, which is fixed, than on the one variable you control: the context, meaning what you give the model to read. Context degrades in three ways: accumulation of irrelevant old messages, dilution when earlier instructions compete with new ones, and noise from errors or contradictions in your documents. The fix for each is simple: start fresh, include only what's relevant, and check what you're handing the model before you ask.

E033

What Hallucinations Are

A hallucination is what happens when an AI model produces plausible-sounding text that isn't actually true. The word suggests something unusual — a drift, a malfunction. But the model isn't doing a...
