A
Learn what a chatbot is, the difference between rules-based and AI-powered chatbots, and how they use Large Language Models to simulate human conve...
Learn what a Diffusion Model is in AI. Discover how AI art generators like Midjourney and Stable Diffusion create images by reversing noise through...
Learn what a feature is in machine learning and AI. Understand feature types, feature engineering, feature selection, and why the right features ar...
Learn what a Foundation Model is in AI. Understand how massive pre-trained models like GPT-4 and Claude serve as a base for thousands of specialize...
Learn what a GAN (Generative Adversarial Network) is in AI. Understand how two competing neural networks—a Generator and a Discriminator—create rea...
Learn what a GPU (Graphics Processing Unit) is and why it is the essential hardware powering modern AI training and deep learning workloads.
Learn what a hidden layer is in neural networks, why they are called 'hidden,' how they transform data, and the difference between deep and shallow...
Understand what a kernel is in AI and machine learning. Learn about kernels in SVMs, the kernel trick for nonlinear classification, kernels in CNNs...
Learn what a Large Language Model (LLM) is, how it works by predicting the next word, its capabilities in writing and coding, and its key limitations.
Learn what a loss function is in machine learning, common loss functions like MSE and cross-entropy, how loss guides learning through backpropagati...
Learn what a neural network is and how it works. Understand layers, weights, biases, activation functions, forward propagation, and backpropagation...
Learn what a prompt is in AI, the anatomy of a good prompt, zero-shot vs few-shot prompting, and the difference between system and user prompts. A ...
Learn what a Transformer is in AI. Understand the self-attention mechanism, encoder-decoder architecture, and why Transformers power GPT, BERT, Cla...
Learn what a validation set is in machine learning. Understand the train/validation/test split, why you cannot use the test set for tuning, and how...
Learn about vector databases: how they store and search embeddings using similarity search, power RAG systems, and compare popular options like Pin...
Learn what a Vision Transformer (ViT) is. Understand how ViT splits images into patches, applies self-attention, and rivals CNNs for image recognit...
Learn what weights are in neural networks. Understand how weights control the strength of connections between neurons, how they are initialized, an...
Learn what word embeddings are. Understand how words become vectors, how semantic similarity works in vector space, and how Word2Vec, GloVe, and mo...
A controlled experiment comparing two model variants to determine which performs better on real users, essential for validating ML improvements in ...
Comparing two ML model versions in production by routing traffic between them and measuring outcomes.
A systematic experiment where components of a model or system are removed one at a time to measure the contribution of each component to overall pe...
An NLP technique that generates new, concise text capturing key information from a source document, rather than simply extracting existing sentences.
A mathematical function applied to a neuron's output that introduces non-linearity, enabling neural networks to learn complex patterns.
A machine learning approach where the model identifies which unlabeled examples would be most informative to label next, minimizing annotation effort.
An RL architecture combining a policy network (actor) with a value network (critic) for stable training.
An adaptive learning rate optimization algorithm that combines momentum and RMSProp, widely used as the default optimizer for training neural netwo...
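The combination of momentum and RMSProp can be sketched in a few lines of plain Python. A minimal sketch, assuming the standard default hyperparameters; the function name and toy quadratic objective are illustrative:

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Momentum piece: exponential moving average of gradients.
    m = b1 * m + (1 - b1) * grad
    # RMSProp piece: exponential moving average of squared gradients.
    v = b2 * v + (1 - b2) * grad ** 2
    # Bias correction for the zero-initialized averages.
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # Step size adapts per parameter to the gradient magnitude history.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2, whose gradient is 2w, starting from w = 1.0.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
print(w)  # close to the minimum at 0
```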
Small, trainable modules inserted into a pre-trained model that enable task-specific fine-tuning without modifying the original model weights.
Deliberately crafted inputs designed to cause AI models to make incorrect predictions, often imperceptible to humans but devastating to models.
A model's ability to maintain correct predictions when inputs are intentionally perturbed by an attacker.
An AI system that autonomously plans, executes, and iterates on multi-step tasks using tools and reasoning.
Understand Artificial General Intelligence (AGI) - the theoretical leap from narrow AI specialists to a single, unified intelligence capable of rea...
Specialized hardware designed to speed up AI and machine learning workloads, including GPUs, TPUs, and custom AI chips.
Software libraries that provide the building blocks for creating autonomous AI agents with tools, memory, and planning.
Learn about the AI Alignment problem - the critical challenge of ensuring AI goals match human values. Understand misalignment risks, the paperclip...
Visual artwork created with the assistance of artificial intelligence, typically using generative models like diffusion models or GANs to produce i...
Understand AI bias - how machine learning models can develop systematic prejudices from unbalanced training data. Learn about real-world consequenc...
A specialized processor designed specifically for AI workloads, optimized for the matrix operations and parallel computation that neural networks r...
Specialized semiconductor hardware designed to accelerate AI workloads including training and inference.
AI tools that help developers write, debug, review, and understand code using large language models.
AI systems designed for ongoing conversational interaction, emotional support, or personal assistance.
A virtual replica of a physical system or process, enhanced with AI for simulation, prediction, and optimization.
The significant and growing energy demands of training and running AI models in data centers worldwide.
The study of moral principles and values that should guide the design, development, and deployment of artificial intelligence systems.
Using AI to accelerate scientific discovery across physics, chemistry, biology, and other disciplines.
The frameworks, policies, and practices that organizations and governments use to ensure AI systems are developed and deployed responsibly.
Programmatic safety constraints that filter, validate, or modify LLM inputs and outputs.
When an AI model generates confident but factually incorrect, fabricated, or nonsensical information that has no basis in its training data or real...
Learn what AI inference is - the process of using a trained machine learning model to make predictions on new data. Understand the difference betwe...
The ability to understand, evaluate, and effectively interact with AI systems in daily life and work.
The interdisciplinary field focused on ensuring AI systems behave as intended and do not cause unintended harm, especially as systems become more c...
Isolated environments for testing AI systems safely before deployment, limiting their access and capabilities.
Search engines enhanced with AI that provide synthesized answers, summaries, and conversational search experiences.
The practice of making AI systems' operations, decisions, and limitations understandable to stakeholders.
Embedding hidden signals in AI-generated content to enable detection and attribution.
Physical devices worn on the body that integrate AI for real-time assistance, monitoring, or augmented experiences.
A period of reduced funding and interest in artificial intelligence research, historically occurring when AI fails to meet inflated expectations.
Understand the AI Zeitgeist: the prevailing spirit and dominant trends shaping artificial intelligence today. Explore the current AI wave including...
Using AI and machine learning to automate and enhance IT operations, monitoring, and incident management.
The principle that organizations deploying AI should be answerable for the outcomes of their systems.
Learn about AI agents: autonomous systems that perceive, reason, plan, and act. Understand the agent loop, types of agents, tool use, function call...
Understand what AI hallucinations are, why large language models generate false or fabricated information with confidence, and how to mitigate the ...
Learn what an AI model is - the trained artifact produced by machine learning that contains the knowledge and parameters needed to make predictions...
Learn what an algorithm is with this visual, beginner-friendly guide. Understand how algorithms work as step-by-step instructions, from simple reci...
Learn what an encoder is in AI and deep learning, how it compresses data into meaningful representations, the encoder-decoder architecture, and use...
Understand what an iteration means in AI and machine learning. Learn how training loops, epochs, and batches work together to teach neural networks...
Learn what an optimizer is in machine learning, how gradient descent works, popular optimizers like SGD and Adam, the connection to learning rate, ...
The process of identifying data points, events, or observations that deviate significantly from expected patterns.
An AI safety company founded by former OpenAI researchers, creator of the Claude family of language models, focused on building reliable and safe A...
An Application Programming Interface that allows developers to access AI model capabilities programmatically through structured requests and respon...
A hypothetical AI system with human-level cognitive abilities across all domains, capable of learning any intellectual task that a human can perform.
A computing system inspired by biological neural networks, consisting of interconnected nodes (neurons) organized in layers that process information.
An open-access repository where researchers publish preprints of scientific papers, serving as the primary venue for sharing AI and machine learnin...
An individual attention mechanism within a multi-head attention layer, learning to focus on different aspects of the input like syntax, semantics, ...
An individual attention computation within a multi-head attention layer, each head learning to focus on different types of relationships in the input.
A numerical weight that determines how much a token attends to (or focuses on) another token in the attention mechanism of transformers.
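How scores become weights can be sketched with scaled dot-product attention in plain Python. The toy query and key vectors are illustrative:

```python
import math

def attention_weights(query, keys):
    # Raw scores: dot product of the query with each key, scaled by sqrt(d).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax turns raw scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# The key most aligned with the query receives the largest weight.
weights = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(weights)
```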
The phenomenon where LLMs disproportionately attend to the first few tokens regardless of their content.
Area Under the Receiver Operating Characteristic Curve -- a metric that measures a classification model's ability to distinguish between classes ac...
A neural network trained to compress input data into a compact representation (encoding) and then reconstruct the original data from that represent...
Automated Machine Learning -- tools and techniques that automate the process of building machine learning models, from feature engineering to model...
An AI system that can independently perceive its environment, make decisions, and take actions to achieve goals without constant human supervision.
AI systems that enable self-driving vehicles through perception, planning, and decision-making algorithms.
C
Gradually rolling out a new model version to a small subset of traffic before full deployment.
A neural network architecture proposed by Geoffrey Hinton that uses groups of neurons (capsules) to better capture spatial hierarchies and part-who...
The tendency of neural networks to abruptly lose previously learned knowledge when trained on new data or tasks.
The tendency of neural networks to abruptly forget previously learned information when trained on new data.
Methods for determining cause-and-effect relationships from data, going beyond correlation to understand why things happen and predict intervention...
A language model that generates text left-to-right, predicting each token based only on the tokens that came before it (never looking ahead).
A prompting technique that encourages language models to show their reasoning step-by-step before arriving at a final answer, significantly improvi...
A prompting technique where LLMs solve problems by generating intermediate reasoning steps.
OpenAI's conversational AI product that brought large language models to mainstream awareness, powered by GPT-3.5 and later GPT-4.
A saved snapshot of a model's weights and training state at a specific point during training, enabling recovery from failures and selection of the ...
AI models developed by Chinese companies including DeepSeek, Qwen, Yi, and Baichuan, increasingly competitive globally.
A dataset condition where some classes have significantly more examples than others, causing models to be biased toward the majority class.
Learn what Classification means in AI. Understand how machine learning models sort data into categories like spam detection and image recognition, ...
A technique that improves conditional generation quality by interpolating between conditional and unconditional predictions.
Anthropic's family of large language models designed with a focus on safety, helpfulness, and honesty, using Constitutional AI alignment techniques.
Contrastive Language-Image Pre-training -- an OpenAI model that learns to connect images and text by training on 400 million image-text pairs from ...
AI services and infrastructure provided through cloud platforms, enabling organizations to train, deploy, and scale AI models without owning specia...
An unsupervised learning technique that groups similar data points together without predefined labels.
AI systems that automatically write programming code from natural language descriptions, transforming software development with tools like GitHub C...
A recommendation technique that predicts user preferences based on the collective behavior of many users, assuming that users who agreed in the pas...
The total computational resources (measured in FLOPs, GPU-hours, or dollars) allocated for training or running an AI model.
A group of interconnected computers (typically GPU servers) working together to train large AI models, connected by high-speed networking.
Learn what Computer Vision means in AI. Discover how machines interpret images through pixels, patterns, and perception with interactive visual exa...
The change in the relationship between input features and the target variable over time.
When an AI model generates plausible-sounding but factually incorrect information, also known as hallucination — a major reliability challenge for ...
A table that visualizes a classification model's predictions versus actual labels, showing true positives, true negatives, false positives, and fal...
An alignment technique developed by Anthropic where an AI system is guided by a set of written principles (a 'constitution') rather than relying so...
An alignment technique where AI systems self-improve using a set of principles (a constitution) as guidance.
A recommendation approach that suggests items similar to what a user has previously liked, based on item features rather than other users' behavior.
Techniques that allow models to handle longer sequences at inference than they saw during training.
The maximum amount of text (measured in tokens) that a language model can process in a single input-output interaction.
The ability of a model to learn from a continuous stream of data over time, accumulating knowledge without forgetting previously learned information.
An inference optimization that dynamically adds and removes requests from a batch as they complete.
Automatically retraining ML models when new data arrives or performance degrades below a threshold.
A self-supervised learning approach that trains models to bring similar examples closer together and push dissimilar examples apart in a learned re...
A neural network that adds spatial conditioning controls (edges, poses, depth) to diffusion models.
A neural network layer that applies learnable filters (kernels) across input data to detect local patterns like edges, textures, and shapes.
A neural network architecture that uses learnable filters to detect spatial patterns in data, primarily used for image processing and computer visi...
A neural network architecture specialized for processing grid-like data such as images, using learned filters that detect features like edges, text...
A metric based on the cosine of the angle between two vectors, quantifying how similar their directions are regardless of magnitude. Values range from -1 (o...
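The computation is a dot product divided by the two vector norms; a minimal plain-Python sketch (function name and toy vectors are illustrative):

```python
import math

def cosine_similarity(a, b):
    # Dot product captures directional agreement.
    dot = sum(x * y for x, y in zip(a, b))
    # Dividing by both magnitudes removes the effect of vector length.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))   # same direction -> 1.0
print(cosine_similarity([1, 0], [0, 1]))   # orthogonal -> 0.0
print(cosine_similarity([1, 0], [-1, 0]))  # opposite -> -1.0
```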
The most common loss function for classification tasks, measuring the difference between predicted probability distributions and true labels.
The most common loss function for classification tasks, measuring the difference between predicted probability distributions and actual labels.
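For a single example, cross-entropy reduces to the negative log-probability assigned to the correct class; a minimal sketch in plain Python (function name and toy distributions are illustrative):

```python
import math

def cross_entropy(true_class, predicted_probs):
    # Loss is the negative log of the probability given to the correct class.
    return -math.log(predicted_probs[true_class])

# Confident and correct -> low loss.
print(cross_entropy(0, [0.9, 0.05, 0.05]))
# Hesitant -> higher loss; confidence in the wrong class drives loss toward infinity.
print(cross_entropy(0, [0.4, 0.3, 0.3]))
```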
A resampling technique that trains and evaluates a model on multiple splits of the data for robust performance estimates.
A training strategy that presents examples to the model in a meaningful order -- typically from easy to hard -- mimicking how humans learn.
D
OpenAI's text-to-image generation model that creates original images from text descriptions, named after Salvador Dalí and WALL-E.
Learn what data annotation is and why it matters for AI. Understand classification, object detection, and semantic segmentation - the three levels ...
Techniques for artificially increasing the size and diversity of training data by creating modified versions of existing examples.
The process of selecting, cleaning, and organizing training data to maximize AI model quality.
A change in the statistical properties of the data a model receives in production compared to the data it was trained on, causing model performance...
A virtuous cycle where a product generates data that improves its ML models, which attract more users and generate more data.
Learn what data means in artificial intelligence, the types of data (structured, unstructured, semi-structured), data quality, data pipelines, and ...
The process of annotating raw data with meaningful tags or categories, creating the labeled datasets needed to train supervised machine learning mo...
A centralized repository that stores vast amounts of raw data in its native format until needed for analysis or model training.
When information from outside the training set improperly influences model training, leading to overly optimistic performance estimates that don't ...
A decentralized data architecture where domain teams own and serve their data as products for organization-wide use.
A distributed training strategy that replicates the model on multiple GPUs, with each GPU processing a different subset of the training data.
An automated series of steps that collect, process, transform, and deliver data from source systems to where it's needed for AI model training or i...
An adversarial attack where malicious data is injected into a training dataset to corrupt the model's learned behavior.
A centralized repository that stores structured, processed data from multiple sources for analytics and ML training.
A neuron in a neural network that always outputs zero (or a constant), contributing nothing to the model's predictions, typically caused by large n...
A BERT variant using disentangled attention that separates content and position information.
A supervised learning algorithm that makes predictions by learning a series of if-then-else decision rules from features, visualizable as a tree st...
In transformer architecture, a decoder generates output tokens one at a time using masked self-attention to prevent looking at future tokens, plus cross-attention to the encoder's output in encoder-decoder models.
Learn what Deep Learning means in AI. Understand how neural networks use multiple layers to build hierarchical understanding, from edges to complex...
A deep learning extension of Q-learning that uses neural networks to approximate the action-value function.
AI-generated synthetic media that replaces a person's likeness in existing images or videos, typically using deep learning techniques like GANs or ...
The process of removing noise from data, a fundamental operation in diffusion models where the network learns to progressively remove Gaussian nois...
A computer vision task that predicts the distance of each pixel from the camera, creating a depth map from a single image or stereo pair.
A deterministic process always produces the same output for the same input, while a stochastic process involves randomness and may produce differen...
A mathematical framework that provides formal guarantees that individual data cannot be identified in model outputs.
Techniques that reduce the number of features in data while preserving the most important information, making data easier to visualize and process.
Techniques that reduce the number of features in a dataset while preserving important information, enabling visualization and combating the curse o...
An alignment technique that optimizes language models from human preferences without training a separate reward model.
An alignment technique that trains language models directly on human preference data without needing a separate reward model, simplifying the RLHF ...
Training AI models across multiple GPUs or machines simultaneously, essential for large models that exceed the memory and compute capacity of a sin...
A fine-tuning technique that teaches diffusion models to generate specific subjects from a few photos.
E
A regularization technique that halts training when validation performance stops improving, preventing the model from overfitting to the training d...
Running AI models directly on local devices (phones, IoT sensors, cameras) rather than sending data to the cloud for processing.
A pre-training method that uses replaced token detection instead of masked language modeling.
A ranking system adapted from chess that scores AI models based on head-to-head comparisons from human evaluators, used in the Chatbot Arena leader...
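The core of the rating system fits in two small functions; a minimal plain-Python sketch (function names are illustrative, and K=32 is just one common choice of update size):

```python
def expected_score(rating_a, rating_b):
    # Probability that A beats B under the Elo model (400-point scale).
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a, rating_b, score_a, k=32):
    # score_a: 1 if A won the head-to-head comparison, 0 if A lost, 0.5 for a tie.
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))

# Two equally rated models: the winner gains half the K-factor.
print(update(1000, 1000, 1))  # 1016.0
# Beating a much stronger model earns far more points.
print(update(1000, 1400, 1))
```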
A model specifically designed to convert text, images, or other data into dense vector representations (embeddings) that capture semantic meaning.
A continuous vector space where items (words, images, users) are represented as points, with spatial relationships encoding semantic similarity and...
A capability that appears in large AI models but is absent in smaller ones, seemingly arising unpredictably as model scale increases.
Unexpected capabilities that appear in large AI models but are absent in smaller versions of the same architecture.
A transformer architecture with separate encoder and decoder components, where the encoder processes input and the decoder generates output conditi...
A transformer architecture that processes the entire input bidirectionally to produce rich contextual representations, used for understanding tasks...
A technique that combines predictions from multiple models to achieve better performance than any single model, leveraging the 'wisdom of crowds' p...
The process of identifying and categorizing key information elements (entities) from unstructured text, such as names, dates, amounts, and organiza...
One complete pass through the entire training dataset during model training.
The practice of developing and deploying AI systems that are fair, transparent, accountable, and aligned with human values and societal well-being.
The practice of designing, developing, and deploying AI systems that are fair, transparent, accountable, and aligned with human values and societal...
A quantitative measure used to assess how well a machine learning model performs on a given task, guiding model selection and improvement.
The systematic recording of ML experiment parameters, metrics, and artifacts for comparison and reproducibility.
Techniques and methods for making AI model decisions understandable to humans, answering the question 'why did the model make this prediction?'
Learn what Explainable AI (XAI) is. Understand why AI transparency matters, how methods like SHAP, LIME, and attention visualization work, and how ...
The fundamental RL dilemma between trying new actions (exploration) and using known good actions (exploitation).
An NLP technique that creates summaries by selecting and combining the most important sentences directly from the source text, without generating n...
M
Learn what machine learning is - a method of teaching computers to learn from data and make predictions without explicit programming. Understand tr...
Automatic translation of text or speech between languages using AI, from early rule-based systems to modern neural approaches achieving near-human ...
Techniques for removing the influence of specific training data from a trained model.
A sequence modeling architecture based on selective state spaces that processes sequences in linear time, offering an alternative to the quadratic ...
A programming model for processing large datasets in parallel across a cluster, and in AI contexts, a pattern for processing data that exceeds a mo...
An extension of Faster R-CNN that adds pixel-level instance segmentation to object detection.
A pre-training objective where random tokens in the input are replaced with a [MASK] token, and the model must predict the original tokens from con...
A regression metric that calculates the average absolute difference between predicted and actual values, giving equal weight to all errors.
A regression loss function and evaluation metric that calculates the average of squared differences between predicted and actual values, penalizing...
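The two regression metrics above differ only in how they weight errors; a minimal plain-Python sketch with toy values (function names are illustrative):

```python
def mae(y_true, y_pred):
    # Mean of absolute errors: every error counts equally.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    # Mean of squared errors: large errors are penalized disproportionately.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
print(mae(y_true, y_pred))  # (1 + 0 + 2) / 3 = 1.0
print(mse(y_true, y_pred))  # (1 + 0 + 4) / 3
```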
AI systems designed for healthcare applications including diagnosis, drug discovery, and clinical decision support.
Learning to learn — training AI systems that can quickly adapt to new tasks with minimal data by leveraging experience from previous tasks.
Learning a distance function that maps similar inputs close together and dissimilar inputs far apart.
A popular AI image generation service known for its distinctive artistic style and high-quality outputs, accessed primarily through a Discord bot i...
A small subset of the training dataset used for a single parameter update during gradient descent, balancing the efficiency of batch processing wit...
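The relationship between mini-batches and a full epoch can be sketched in plain Python (the function name is illustrative):

```python
def minibatches(dataset, batch_size):
    # One yielded slice feeds one parameter update; a full pass
    # over all the slices is one epoch.
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]

data = list(range(10))
print(list(minibatches(data, 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```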
A French AI company known for producing efficient, high-performing open-weight language models that punch above their weight class in benchmarks.
Training neural networks using both 16-bit and 32-bit floating-point numbers to reduce memory and increase speed.
An approach where multiple LLMs collaborate on a task, with each model contributing its strengths and a combining mechanism selecting or merging th...
A model architecture that routes each input to a subset of specialized sub-networks (experts) for efficient scaling.
Software that optimizes machine learning model computations for specific hardware, translating high-level model descriptions into efficient low-lev...
An automated workflow that orchestrates the steps of training, evaluating, and deploying ML models.
An open-source platform for managing the complete machine learning lifecycle, including experiment tracking, model packaging, deployment, and regis...
Systematic evaluation of AI systems for bias, fairness, safety, and compliance with standards.
How well a model's predicted probabilities match actual outcome frequencies.
A documentation framework for machine learning models that describes their intended use, performance characteristics, limitations, and ethical cons...
Performance degradation when AI models are trained on data generated by previous AI models.
A phenomenon where models trained on AI-generated data progressively lose quality and diversity over generations, eventually producing degenerate o...
A standard protocol (MCP) for connecting AI models to external data sources, tools, and services through a unified interface.
A technique where a smaller 'student' model is trained to mimic the behavior of a larger 'teacher' model, achieving comparable performance with few...
Learn what model evaluation is in machine learning. Understand accuracy, precision, recall, F1 score, confusion matrices, cross-validation, and how...
The systematic process of assessing an AI model's performance using metrics, test sets, and human judgment to determine if it meets quality and saf...
Combining the weights of multiple fine-tuned models into a single model that inherits capabilities from all parent models, without additional train...
The continuous tracking of deployed ML model performance to detect degradation and data drift.
Distributing a single AI model across multiple GPUs or machines, necessary when a model is too large to fit in the memory of a single device.
A centralized repository for managing, versioning, and tracking machine learning models.
The infrastructure and process of deploying trained ML models to handle real-time or batch prediction requests in production environments.
Splitting a large model across multiple devices so each device holds only a portion of the parameters.
Tracking and managing different iterations of ML models with their associated code, data, and configurations.
Computational algorithms that use repeated random sampling to estimate numerical results, widely used in reinforcement learning and probabilistic i...
A search algorithm that uses random sampling to explore decision trees for optimal action selection.
An AI architecture where multiple specialized agents collaborate, debate, or coordinate to solve complex problems that are difficult for a single a...
A simplified RL framework for making sequential decisions among multiple options with uncertain rewards.
Training a single model to perform multiple related tasks simultaneously, sharing representations across tasks to improve generalization and effici...
An extended dialogue between a user and AI spanning multiple exchanges, requiring context tracking across turns.
Learn what Multimodal AI is - a unified AI system that can process and understand multiple types of data like text, images, audio, and video simult...
Training AI models to understand and generate content across multiple data types (text, images, audio, video).
An AI model that can process and generate multiple types of data -- text, images, audio, video -- within a single unified architecture.
Extending RAG to retrieve and process multiple data types including text, images, tables, and audio.
P
Adding extra values (typically zeros) around the edges of input data to control output dimensions in convolutions, or to the end of sequences to cr...
A memory management technique for LLM inference that handles KV cache like virtual memory pages.
The learnable values within a neural network that are adjusted during training to minimize the loss function. Model size is typically measured in p...
A family of techniques that fine-tune only a small number of model parameters while keeping most of the pre-trained model frozen, dramatically redu...
A metric for evaluating language models that measures how 'surprised' the model is by test data. Lower perplexity means the model predicts the text...
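Formally, perplexity is the exponential of the average negative log-probability over the test tokens; a minimal plain-Python sketch (function name and toy probabilities are illustrative):

```python
import math

def perplexity(token_probs):
    # token_probs: probability the model assigned to each actual next token.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Assigning 0.25 to every correct token is like guessing among 4 options.
print(perplexity([0.25, 0.25, 0.25]))  # 4.0
# Higher probability on the right tokens -> lower perplexity (less surprise).
print(perplexity([0.9, 0.8, 0.95]))
```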
A distributed training strategy that splits a model's layers across multiple GPUs, with each GPU processing a different stage of the forward/backwa...
A set of 3D data points representing the surface of objects, captured by LiDAR or depth sensors.
A class of RL algorithms that directly optimize the policy by estimating the gradient of expected reward.
A layer in neural networks that reduces spatial dimensions by aggregating values in local regions, decreasing computation and providing translation...
A mechanism that injects information about token positions into transformer models, since the attention mechanism itself has no inherent sense of o...
The initial phase of training a model on a large, general-purpose dataset to learn broad representations before fine-tuning on specific tasks.
Two complementary classification metrics. Precision measures correctness of positive predictions; recall measures completeness of detecting actual ...
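Both metrics fall out of the confusion-matrix counts; a minimal plain-Python sketch over binary labels (function name and toy labels are illustrative):

```python
def precision_recall(y_true, y_pred):
    # Counts over binary labels (1 = positive, 0 = negative).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)  # of everything flagged positive, how much was right
    recall = tp / (tp + fn)     # of all actual positives, how much was found
    return precision, recall

y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(precision_recall(y_true, y_pred))  # (2/3, 2/3)
```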
Reusing computed KV cache from shared prompt prefixes across multiple LLM requests.
Learn what pretraining is in AI and deep learning. Understand self-supervised learning, the two-stage process of pretraining and fine-tuning, and w...
Storing and reusing computed representations of repeated prompt prefixes to reduce latency and cost.
Learn prompt engineering: the practice of designing inputs to get optimal AI outputs. Master zero-shot, few-shot, chain-of-thought, system prompts, and more.
A security vulnerability where malicious input tricks a language model into ignoring its instructions and following attacker-provided instructions instead.
Systematic techniques for improving prompt effectiveness through testing, iteration, and automated optimization.
A reusable structure for formatting inputs to language models, with placeholders for dynamic content, ensuring consistent and effective prompting across applications.
A stable and efficient policy gradient algorithm that constrains policy updates to a trust region.
A model compression technique that removes unnecessary weights, neurons, or layers from a trained neural network to reduce size and improve inference speed.
R
A regression metric that indicates what proportion of the variance in the target variable is explained by the model, ranging from 0 (no explanatory power) to 1 (perfect prediction).
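The formula is R² = 1 - SS_res / SS_tot, the residual sum of squares over the total sum of squares; as a sketch:

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)          # variance around the mean
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # prediction error
    return 1 - ss_res / ss_tot

score = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```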
An ensemble learning method that builds many decision trees using random subsets of data and features, then combines their predictions for robust, accurate predictions.
The ability of AI models to draw logical conclusions, solve multi-step problems, and make inferences beyond simple pattern matching.
An LLM specifically trained or prompted to perform explicit step-by-step reasoning before producing answers.
A retrieval metric measuring what fraction of relevant items appear in the top-K results returned by a search or recommendation system.
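A sketch of the metric, assuming document IDs as plain strings (the helper name is illustrative):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant items that appear in the top-k retrieved results."""
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

# 2 of the 4 relevant docs appear in the top 3 results:
score = recall_at_k(["d1", "d9", "d4", "d2"], ["d1", "d2", "d4", "d7"], k=3)
```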
A neural network architecture with loops that allow information to persist across time steps, designed for processing sequential data.
Systematically probing AI systems for vulnerabilities, failures, and harmful behaviors by simulating adversarial attacks and edge cases.
Learn what regression is in machine learning. Understand linear regression, non-linear regression, how the best-fit line works, loss functions, and more.
Learn what regularization is in machine learning. Understand L1 (Lasso) and L2 (Ridge) regularization, dropout, and how these techniques prevent overfitting.
Learn what Reinforcement Learning (RL) is -- a type of machine learning where an AI agent learns by interacting with an environment and receiving rewards or penalties.
The automatic discovery of useful features and representations from raw data, eliminating the need for manual feature engineering.
A framework using a fixed random recurrent network (reservoir) with only the output layer trained.
A deep CNN architecture using skip connections that enabled training of networks with 100+ layers.
An approach to AI development that prioritizes safety, fairness, transparency, privacy, and accountability.
Enhancing LLM outputs by retrieving relevant external documents and including them as context.
An architecture that enhances LLM responses by first retrieving relevant documents from a knowledge base, then using them as context for generation.
A model trained to predict human preferences, used in RLHF to provide a scalar reward signal that guides language model training toward more helpful outputs.
Designing intermediate reward signals to guide RL agents toward desired behavior more efficiently.
Learn how RLHF (Reinforcement Learning from Human Feedback) aligns large language models with human preferences through reward modeling and PPO optimization.
A training methodology that aligns language models with human preferences by using human feedback to train a reward model, then optimizing the LM against it.
A training technique that aligns language models with human preferences by using human feedback to train a reward model that guides further model optimization.
An optimized version of BERT that achieves better performance through improved training methodology.
A metric measuring a classifier's ability to distinguish between classes across all thresholds.
A positional encoding method that encodes position information through rotation of the query and key vectors, enabling relative position awareness in attention.
Recall-Oriented Understudy for Gisting Evaluation -- a set of metrics for evaluating text summarization by measuring overlap between generated and reference summaries.
S
Systematic testing of AI models for harmful outputs, vulnerabilities, and alignment with safety requirements.
A mechanism that screens AI model inputs and outputs to prevent harmful content, typically using classifiers, rule-based systems, or secondary models.
The method used to select the next token during language model text generation, controlling the balance between quality, diversity, and creativity.
State-Action-Reward-State-Action -- an on-policy reinforcement learning algorithm that updates Q-values based on the action actually taken in the n...
Empirical relationships showing that model performance improves predictably as a power law with more compute, data, or parameters.
The mechanism within transformers where each token in a sequence computes attention scores with every other token, determining how much to 'attend to' each of the others.
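Scaled dot-product self-attention in a few lines of NumPy (single head, no masking; the weight shapes here are illustrative):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Every token's output is a weighted mix of all tokens' value vectors."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])         # (seq, seq) attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 tokens, dim 8
w = [rng.normal(size=(8, 8)) for _ in range(3)]
out = self_attention(x, *w)
# out has the same shape as the input: one mixed vector per token
```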
A training approach where an AI agent improves by playing against copies of itself.
A training paradigm where models learn from unlabeled data by creating their own supervisory signal from the data itself, such as predicting masked or next tokens.
A search approach that understands the meaning and intent behind a query rather than just matching keywords, powered by embeddings and vector similarity.
A computer vision task that assigns a class label to every pixel in an image, creating a detailed understanding of the scene at pixel-level granularity.
A measure of how closely related the meaning of two pieces of text (or other data) are, typically computed using embeddings and distance metrics like cosine similarity.
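Cosine similarity, the most common choice, compares the directions of two embedding vectors regardless of their magnitudes:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # parallel vectors
```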
A training approach that combines a small amount of labeled data with a large amount of unlabeled data.
A framework that produces semantically meaningful sentence embeddings for comparison and search.
A language-independent tokenizer that treats text as raw bytes, requiring no pre-tokenization.
An NLP task that determines the emotional tone or opinion expressed in text, classifying it as positive, negative, neutral, or along more nuanced dimensions.
The task of processing, understanding, or generating ordered sequences of data, including text, time series, audio, and genomic data.
A model architecture that transforms one sequence into another, originally using encoder-decoder RNNs and now primarily using transformers.
The foundational optimization algorithm that updates model weights using the gradient computed from a single example or mini-batch, introducing beneficial noise that can help escape local minima.
Running a new model in parallel with production, processing real traffic without serving its predictions.
A shortcut that adds a layer's input directly to its output, allowing gradients to flow through deep networks and enabling the training of very deep architectures.
A mathematical function that converts a vector of raw scores (logits) into a probability distribution, where all values are between 0 and 1 and sum to 1.
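A minimal numerically stable implementation:

```python
import numpy as np

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    z = np.asarray(logits, dtype=float)
    z = np.exp(z - z.max())   # subtract the max for numerical stability
    return z / z.sum()

probs = softmax([2.0, 1.0, 0.1])  # largest logit gets the largest probability
```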
OpenAI's text-to-video generation model that creates photorealistic videos from text descriptions, demonstrating emergent understanding of physics.
Attention mechanisms that attend to only a subset of positions rather than all positions, reducing the quadratic cost of standard attention to sub-quadratic.
Accelerating LLM inference by using a small draft model to predict multiple tokens verified in parallel by the main model.
An inference acceleration technique where a smaller, faster model drafts multiple tokens that are then verified in parallel by the larger model.
AI technology that converts spoken language into text, enabling voice interfaces, transcription services, and real-time captioning with increasing accuracy.
Automatic Speech Recognition technology that converts spoken audio into written text, using neural networks to transcribe human speech.
An open-source text-to-image generation model that works by diffusing and denoising in a compressed latent space, making it faster and more accessible.
A sequence model architecture based on continuous state space representations, offering linear-time computation.
A measure of whether observed differences in model performance are likely real or due to random chance.
Understand Stochastic Gradient Descent (SGD), the optimization algorithm that powers deep learning. Learn about batch vs mini-batch, and the update rule.
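The update rule w <- w - lr * grad, sketched on a toy one-parameter regression (the learning rate and synthetic data here are illustrative):

```python
import numpy as np

# Fit y = w*x with per-example SGD on squared error; the true w is 3.
rng = np.random.default_rng(0)
xs = rng.normal(size=200)
ys = 3.0 * xs

w, lr = 0.0, 0.1
for x, y in zip(xs, ys):
    grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2 for one example
    w -= lr * grad               # the SGD update rule
```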
Constraining LLM outputs to follow specific formats like JSON, XML, or custom schemas.
Techniques for constraining language model outputs to follow specific formats like JSON schemas, XML, or custom grammars, ensuring machine-parseable output.
The challenge of aligning AI systems that are significantly smarter than humans with human values and intentions.
Learn what Supervised Learning is -- the most widely used machine learning approach. Understand training with labeled data, classification vs regression, and more.
A classical machine learning algorithm that finds the optimal hyperplane separating different classes, maximizing the margin between the closest data points (support vectors).
Artificially generated data that mimics real-world data patterns, used for training when real data is scarce or sensitive.
Artificially generated data that mimics the statistical properties of real data, used when real data is scarce, expensive, sensitive, or imbalanced.
Creating artificial training data using algorithms or AI models to augment or replace real-world data.
Initial instructions that define an LLM's behavior, personality, and constraints for a conversation.
Instructions given to a language model that set its behavior, persona, constraints, and capabilities for an entire conversation, separate from user messages.
T
A transformer model that frames all NLP tasks as text-to-text problems with a unified architecture.
A parameter that controls the randomness of language model outputs. Lower temperature produces more focused, deterministic responses; higher temperatures produce more varied, creative outputs.
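Temperature divides the logits before the softmax: T < 1 sharpens the distribution, T > 1 flattens it toward uniform; a sketch:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then convert to probabilities."""
    z = np.asarray(logits, dtype=float) / temperature
    z = np.exp(z - z.max())
    return z / z.sum()

logits = [4.0, 2.0, 1.0]
cold = softmax_with_temperature(logits, 0.5)  # more peaked on the top token
hot = softmax_with_temperature(logits, 2.0)   # flatter, more random sampling
```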
A multi-dimensional array of numbers that serves as the fundamental data structure in deep learning frameworks like PyTorch and TensorFlow.
A distributed training strategy that splits individual layers of a model across multiple GPUs, enabling training of models too large to fit on a single GPU.
Using additional computation during inference (not just training) to improve model outputs, through techniques like chain-of-thought, self-reflection, and repeated sampling.
Adapting a model's parameters on each test input to improve predictions at inference time.
The task of assigning predefined categories or labels to text documents.
A dense vector representation of text (word, sentence, or document) that captures semantic meaning in a numerical format suitable for machine learning models.
The process of producing coherent, contextually appropriate text using language models, encompassing everything from chatbot responses to creative writing.
AI models that generate audio content (speech, music, sound effects) from text descriptions.
AI models that create images from natural language text descriptions (prompts).
AI technology that converts written text into natural-sounding spoken audio, using neural networks to generate human-like voice patterns.
AI systems that generate video content from natural language descriptions.
A method that learns new text embeddings to represent specific visual concepts for image generation.
Learn about the AI black box problem - why complex AI models make decisions we cannot explain. Discover Explainable AI (XAI) and why transparency matters.
Understand the AI Singularity: the hypothetical point where artificial intelligence surpasses human intelligence and triggers an unstoppable intelligence explosion.
Learn what the attention mechanism is in AI. Understand Query-Key-Value, self-attention vs cross-attention, multi-head attention, and scaled dot-product attention.
The basic unit of text that language models process -- typically a word, subword, or character, depending on the tokenizer used.
The process of breaking text into smaller units (tokens) that AI models can process, a fundamental preprocessing step that significantly impacts model performance.
Learn what tokenization is in AI and NLP. Understand how text is broken into tokens using BPE, WordPiece, and SentencePiece. Discover why token limits matter.
A component that converts raw text into a sequence of tokens (integers) that a language model can process, and converts token IDs back into text.
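A toy word-level tokenizer showing the encode/decode contract (real tokenizers use subword schemes like BPE; this class and its vocabulary are illustrative):

```python
class ToyTokenizer:
    """Minimal word-level tokenizer: text -> token IDs -> text."""
    def __init__(self, texts):
        words = sorted({w for t in texts for w in t.split()})
        self.to_id = {w: i for i, w in enumerate(words)}    # word -> integer ID
        self.to_word = {i: w for w, i in self.to_id.items()}

    def encode(self, text):
        return [self.to_id[w] for w in text.split()]

    def decode(self, ids):
        return " ".join(self.to_word[i] for i in ids)

tok = ToyTokenizer(["the cat sat", "the dog ran"])
ids = tok.encode("the cat ran")
round_trip = tok.decode(ids)  # "the cat ran"
```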
The economics and pricing structure of AI API services based on input and output token counts.
The ability of language models to invoke external tools, APIs, and functions to extend their capabilities beyond pure text generation.
LLMs that can invoke external tools (APIs, code execution, search) to extend their capabilities.
A text generation strategy that samples from the smallest set of tokens whose cumulative probability exceeds a threshold p, dynamically adjusting the candidate pool.
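A sketch of the nucleus filter (the helper name is illustrative); sampling is then restricted to the renormalised nucleus:

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability exceeds p,
    zero out the rest, and renormalise."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]              # tokens from most to least likely
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # first index where cumsum >= p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

nucleus = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
# the 0.05 tail token is dropped; the remaining three are renormalised
```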
Learn what training data is in machine learning, the difference between labeled and unlabeled data, how datasets are split into train/validation/test sets.
A numerical measure of how wrong the model's predictions are on training data, which the optimization algorithm works to minimize during training.
Understand transfer learning in AI: how pre-trained models transfer knowledge between tasks, dramatically reducing training time and data needs.
An LLM reasoning framework that explores multiple reasoning branches and evaluates which paths are most promising.
A test of machine intelligence proposed by Alan Turing in 1950, where a human evaluator tries to distinguish between a machine and a human based solely on text conversation.