Latency
Latency is the delay between sending a prompt and receiving a response. For simple prompts with smaller models it can be under a second; for complex reasoning tasks with large models it can be 10–30 seconds. In production AI systems, latency is a real design constraint: not every use case can wait, and users notice delays. Streaming (showing text as it is generated) is one common technique for reducing perceived latency. It does not shorten the total generation time, but the user sees the first words sooner.
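To make the difference between total latency and perceived latency concrete, here is a minimal Python sketch. It does not call any real model API: `fake_stream` is a hypothetical stand-in that yields one token at a fixed interval, and `measure` is an illustrative helper that records the time to the first token alongside the time to the full response. The token list and delay are made-up values.

```python
import time
from typing import Iterator, List, Tuple


def fake_stream(tokens: List[str], delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: one token every delay_s seconds."""
    for tok in tokens:
        time.sleep(delay_s)  # simulated per-token generation time
        yield tok


def measure(stream: Iterator[str]) -> Tuple[float, float, str]:
    """Return (time_to_first_token, total_latency, full_text), times in seconds."""
    start = time.perf_counter()
    first = None
    parts = []
    for tok in stream:
        if first is None:
            # With streaming, this is when the user first sees output.
            first = time.perf_counter() - start
        parts.append(tok)
    total = time.perf_counter() - start
    # An empty stream has no first token; fall back to the total time.
    return (first if first is not None else total), total, "".join(parts)


if __name__ == "__main__":
    tokens = ["Latency ", "is ", "a ", "design ", "constraint."]
    ttft, total, text = measure(fake_stream(tokens, 0.05))
    print(f"first token after {ttft:.2f}s, full response after {total:.2f}s")
```

Without streaming, the user waits for the full response time before seeing anything. With streaming, the wait they perceive is closer to the time to first token, even though the total is unchanged.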