
Multimodality

Multimodal models can process, and in some cases generate, several types of data (text, images, audio, and video) within a single system. Instead of relying on a separate AI tool for each format, a multimodal model can look at an image and describe it, listen to speech and transcribe it, or answer questions that combine visual and textual information. This significantly expands what AI can perceive and respond to.
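To make "within a single system" more concrete, here is a minimal sketch of one common approach, assuming ViT-style patching: an image is cut into small square patches, each patch is flattened into a vector, and those vectors are placed in the same input sequence as the text tokens. All function and type names are hypothetical and for illustration only. Real models also pass every patch and text token through learned layers that map them into embeddings of a shared size, and they use a proper tokenizer rather than splitting on spaces. This sketch omits both steps.

```python
# Minimal sketch (assumption): ViT-style patching as one way a single model
# can receive image content and text in the same input sequence.
# All names are hypothetical; learned embedding layers are omitted.
from typing import List, Tuple, Union

Pixel = Tuple[int, int, int]           # (R, G, B)
Image = List[List[Pixel]]              # rows of pixels, H x W

def patchify(image: Image, patch: int) -> List[List[int]]:
    """Split an H x W RGB image into non-overlapping patch x patch squares
    and flatten each square into one vector of patch * patch * 3 numbers."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            vec: List[int] = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    vec.extend(image[r][c])
            patches.append(vec)
    return patches

# A token is either ("text", word) or ("image", flattened patch vector).
Token = Tuple[str, Union[str, List[int]]]

def build_sequence(prompt: str, image: Image, patch: int) -> List[Token]:
    """Put image patches first, then (naively space-split) text tokens,
    producing one sequence that a single model could consume."""
    seq: List[Token] = [("image", p) for p in patchify(image, patch)]
    seq += [("text", word) for word in prompt.split()]
    return seq

if __name__ == "__main__":
    # A 4x4 all-black image split into 2x2 patches yields 4 patches.
    img = [[(0, 0, 0)] * 4 for _ in range(4)]
    seq = build_sequence("What is in this image?", img, patch=2)
    print(len(seq))                        # 4 image patches + 5 words = 9
    print([kind for kind, _ in seq][:5])   # ['image', 'image', 'image', 'image', 'text']
```

The point of the design is that, once both modalities are reduced to sequence elements, the same model can attend across image patches and words together, which is what lets it answer a question about a picture.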

Videos explaining this concept

E004

Notes on AI

Why GenAI Advanced All at Once

GenAI is commonly associated with chatbots, but it is actually a broad family of models that share a single "DNA": prediction.

Related concepts

Tool Calling
