Full Explanation
The LLM inside a chatbot has no memory. Every response is generated fresh from the same frozen trained model -- it doesn't retain anything between responses and doesn't become smarter as you chat. What feels like memory is actually the surrounding system assembling a prompt that includes the conversation history and re-sending the whole thing to the model each time. The model re-reads; it doesn't remember.
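The re-sending loop can be sketched in a few lines. This is a minimal illustration, not a real chatbot: call_model is a hypothetical stand-in for any LLM API, and the history format is invented for the example. The point is where the memory lives -- in the application's list, never in the model.

```python
def call_model(prompt: str) -> str:
    # Placeholder for a real LLM API call. The model receives the full
    # prompt, generates a reply, and retains nothing afterward.
    return f"(reply generated from {len(prompt)} characters of prompt)"

history = []  # the "memory" lives here, in the application, not in the model

def chat_turn(user_message: str) -> str:
    history.append(("user", user_message))
    # Rebuild the entire prompt from scratch every turn: the model
    # re-reads the whole conversation each time it responds.
    prompt = "\n".join(f"{role}: {text}" for role, text in history)
    reply = call_model(prompt)
    history.append(("assistant", reply))
    return reply
```

Each call to chat_turn sends a longer prompt than the last, which is exactly why long conversations eventually press against the context window.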
This architecture has a practical constraint: the context window. The model can only see a limited amount of text at once. When conversations grow long, earlier content gets compressed or removed -- and once it leaves the context window, the model cannot access it. For large documents and knowledge bases too big to fit in the prompt, a different approach is used: retrieval-augmented generation (RAG). A retrieval system searches for the relevant pieces and adds only those to the prompt at the moment they're needed. The model answers using that context, without retaining any of it.
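The RAG flow above can be sketched with a toy retriever. This sketch scores documents by keyword overlap purely for illustration; production systems typically use vector embeddings and a similarity search, and the document list here is invented for the example.

```python
documents = [
    "The context window limits how much text the model sees at once.",
    "RAG retrieves relevant passages and adds them to the prompt.",
    "Chat history is re-sent to the model on every turn.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Toy relevance score: count how many words the query shares with
    # each document, then keep the top k. Real retrievers use embeddings.
    q_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    # Only the retrieved snippets enter the prompt. The model never
    # stores the knowledge base; it just reads what is pasted in.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The key property is that the knowledge base can be arbitrarily large: only the handful of retrieved passages has to fit inside the context window.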


