
Latency

Latency is the delay between sending a prompt and receiving a response. For simple prompts with smaller models it can be under a second; for complex reasoning tasks with large models it can be 10–30 seconds. In production AI systems, latency is a real design constraint: interactive features such as chat cannot wait that long, and users notice the delay. Streaming (showing text as it is generated) is one common technique for reducing perceived latency. The total generation time stays roughly the same, but the user sees the first words much sooner.
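To make the difference between total latency and perceived latency concrete, here is a minimal Python sketch. It does not call any real model API: `fake_stream` is a hypothetical stand-in that yields tokens with an artificial delay, and `measure_latency` is an illustrative helper name. The sketch records the time to the first token (roughly what a user perceives when streaming) and the time to the full response (what a user waits for without streaming).

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_stream(tokens: Iterable[str], delay_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: one token per delay_s seconds."""
    for tok in tokens:
        time.sleep(delay_s)  # simulated generation time per token (an assumption)
        yield tok


def measure_latency(stream: Iterator[str]) -> Tuple[float, float, str]:
    """Return (time_to_first_token, total_time, full_text), times in seconds."""
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for tok in stream:
        if first_token_at is None:
            # With streaming, the user starts reading at this moment.
            first_token_at = time.perf_counter() - start
        parts.append(tok)
    total = time.perf_counter() - start
    if first_token_at is None:
        # Empty stream: no token ever arrived, so both times coincide.
        first_token_at = total
    return first_token_at, total, "".join(parts)


if __name__ == "__main__":
    ttft, total, text = measure_latency(fake_stream(["Hel", "lo", ",", " world"]))
    print(f"first token after {ttft:.3f}s, full response after {total:.3f}s: {text!r}")
```

With a real streaming API, the same two measurements would apply: the gap between them is the waiting time that streaming hides from the user, even though the full response takes just as long to generate.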


Related concepts

Inference Cost (Tokens & Pricing)
