A
Learn what a chatbot is, the difference between rules-based and AI-powered chatbots, and how they use Large Language Models to simulate human conve...
Learn what a Diffusion Model is in AI. Discover how AI art generators like Midjourney and Stable Diffusion create images by reversing noise through...
Learn what a feature is in machine learning and AI. Understand feature types, feature engineering, feature selection, and why the right features ar...
Learn what a Foundation Model is in AI. Understand how massive pre-trained models like GPT-4 and Claude serve as a base for thousands of specialize...
Learn what a GAN (Generative Adversarial Network) is in AI. Understand how two competing neural networks—a Generator and a Discriminator—create rea...
Learn what a GPU (Graphics Processing Unit) is and why it is the essential hardware powering modern AI training and deep learning workloads.
Learn what a hidden layer is in neural networks, why they are called 'hidden,' how they transform data, and the difference between deep and shallow...
Understand what a kernel is in AI and machine learning. Learn about kernels in SVMs, the kernel trick for nonlinear classification, kernels in CNNs...
Learn what a Large Language Model (LLM) is, how it works by predicting the next word, its capabilities in writing and coding, and its key limitations.
Learn what a loss function is in machine learning, common loss functions like MSE and cross-entropy, how loss guides learning through backpropagati...
Learn what a neural network is and how it works. Understand layers, weights, biases, activation functions, forward propagation, and backpropagation...
Learn what a prompt is in AI, the anatomy of a good prompt, zero-shot vs few-shot prompting, and the difference between system and user prompts. A ...
Learn what a Transformer is in AI. Understand the self-attention mechanism, encoder-decoder architecture, and why Transformers power GPT, BERT, Cla...
Learn what a validation set is in machine learning. Understand the train/validation/test split, why you cannot use the test set for tuning, and how...
Learn about vector databases: how they store and search embeddings using similarity search, power RAG systems, and compare popular options like Pin...
Learn what a Vision Transformer (ViT) is. Understand how ViT splits images into patches, applies self-attention, and rivals CNNs for image recognit...
Learn what weights are in neural networks. Understand how weights control the strength of connections between neurons, how they are initialized, an...
Learn what word embeddings are. Understand how words become vectors, how semantic similarity works in vector space, and how Word2Vec, GloVe, and mo...
A controlled experiment comparing two model variants to determine which performs better on real users, essential for validating ML improvements in ...
Comparing two ML model versions in production by routing traffic between them and measuring outcomes.
A systematic experiment where components of a model or system are removed one at a time to measure the contribution of each component to overall pe...
An NLP technique that generates new, concise text capturing key information from a source document, rather than simply extracting existing sentences.
A mathematical function applied to a neuron's output that introduces non-linearity, enabling neural networks to learn complex patterns.
A machine learning approach where the model identifies which unlabeled examples would be most informative to label next, minimizing annotation effort.
An RL architecture combining a policy network (actor) with a value network (critic) for stable training.
An adaptive learning rate optimization algorithm that combines momentum and RMSProp, widely used as the default optimizer for training neural netwo...
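The combination of momentum and RMSProp can be sketched in a few lines of plain Python. A minimal sketch, assuming the standard default hyperparameters; the function name and toy quadratic objective are illustrative:

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Momentum piece: exponential moving average of gradients.
    m = b1 * m + (1 - b1) * grad
    # RMSProp piece: exponential moving average of squared gradients.
    v = b2 * v + (1 - b2) * grad ** 2
    # Bias correction for the zero-initialized averages.
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # Step size adapts per parameter to the gradient magnitude history.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2, whose gradient is 2w, starting from w = 1.0.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
print(w)  # close to the minimum at 0
```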
Small, trainable modules inserted into a pre-trained model that enable task-specific fine-tuning without modifying the original model weights.
Deliberately crafted inputs designed to cause AI models to make incorrect predictions, often imperceptible to humans but devastating to models.
A model's ability to maintain correct predictions when inputs are intentionally perturbed by an attacker.
An AI system that autonomously plans, executes, and iterates on multi-step tasks using tools and reasoning.
Understand Artificial General Intelligence (AGI) - the theoretical leap from narrow AI specialists to a single, unified intelligence capable of rea...
Specialized hardware designed to speed up AI and machine learning workloads, including GPUs, TPUs, and custom AI chips.
Software libraries that provide the building blocks for creating autonomous AI agents with tools, memory, and planning.
Learn about the AI Alignment problem - the critical challenge of ensuring AI goals match human values. Understand misalignment risks, the paperclip...
Visual artwork created with the assistance of artificial intelligence, typically using generative models like diffusion models or GANs to produce i...
Understand AI bias - how machine learning models can develop systematic prejudices from unbalanced training data. Learn about real-world consequenc...
A specialized processor designed specifically for AI workloads, optimized for the matrix operations and parallel computation that neural networks r...
Specialized semiconductor hardware designed to accelerate AI workloads including training and inference.
AI tools that help developers write, debug, review, and understand code using large language models.
AI systems designed for ongoing conversational interaction, emotional support, or personal assistance.
A virtual replica of a physical system or process, enhanced with AI for simulation, prediction, and optimization.
The significant and growing energy demands of training and running AI models in data centers worldwide.
The study of moral principles and values that should guide the design, development, and deployment of artificial intelligence systems.
Using AI to accelerate scientific discovery across physics, chemistry, biology, and other disciplines.
The frameworks, policies, and practices that organizations and governments use to ensure AI systems are developed and deployed responsibly.
Programmatic safety constraints that filter, validate, or modify LLM inputs and outputs.
When an AI model generates confident but factually incorrect, fabricated, or nonsensical information that has no basis in its training data or real...
Learn what AI inference is - the process of using a trained machine learning model to make predictions on new data. Understand the difference betwe...
The ability to understand, evaluate, and effectively interact with AI systems in daily life and work.
The interdisciplinary field focused on ensuring AI systems behave as intended and do not cause unintended harm, especially as systems become more c...
Isolated environments for testing AI systems safely before deployment, limiting their access and capabilities.
Search engines enhanced with AI that provide synthesized answers, summaries, and conversational search experiences.
The practice of making AI systems' operations, decisions, and limitations understandable to stakeholders.
Embedding hidden signals in AI-generated content to enable detection and attribution.
Physical devices worn on the body that integrate AI for real-time assistance, monitoring, or augmented experiences.
A period of reduced funding and interest in artificial intelligence research, historically occurring when AI fails to meet inflated expectations.
Understand the AI Zeitgeist: the prevailing spirit and dominant trends shaping artificial intelligence today. Explore the current AI wave including...
Using AI and machine learning to automate and enhance IT operations, monitoring, and incident management.
The principle that organizations deploying AI should be answerable for the outcomes of their systems.
Learn about AI agents: autonomous systems that perceive, reason, plan, and act. Understand the agent loop, types of agents, tool use, function call...
Understand what AI hallucinations are, why large language models generate false or fabricated information with confidence, and how to mitigate the ...
Learn what an AI model is - the trained artifact produced by machine learning that contains the knowledge and parameters needed to make predictions...
Learn what an algorithm is with this visual, beginner-friendly guide. Understand how algorithms work as step-by-step instructions, from simple reci...
Learn what an encoder is in AI and deep learning, how it compresses data into meaningful representations, the encoder-decoder architecture, and use...
Understand what an iteration means in AI and machine learning. Learn how training loops, epochs, and batches work together to teach neural networks...
Learn what an optimizer is in machine learning, how gradient descent works, popular optimizers like SGD and Adam, the connection to learning rate, ...
The process of identifying data points, events, or observations that deviate significantly from expected patterns.
An AI safety company founded by former OpenAI researchers, creator of the Claude family of language models, focused on building reliable and safe A...
An Application Programming Interface that allows developers to access AI model capabilities programmatically through structured requests and respon...
A hypothetical AI system with human-level cognitive abilities across all domains, capable of learning any intellectual task that a human can perform.
A computing system inspired by biological neural networks, consisting of interconnected nodes (neurons) organized in layers that process information.
An open-access repository where researchers publish preprints of scientific papers, serving as the primary venue for sharing AI and machine learnin...
An individual attention mechanism within a multi-head attention layer, learning to focus on different aspects of the input like syntax, semantics, ...
An individual attention computation within a multi-head attention layer, each head learning to focus on different types of relationships in the input.
A numerical weight that determines how much a token attends to (or focuses on) another token in the attention mechanism of transformers.
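How scores become weights can be sketched with scaled dot-product attention in plain Python. The toy query and key vectors are illustrative:

```python
import math

def attention_weights(query, keys):
    # Raw scores: dot product of the query with each key, scaled by sqrt(d).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax turns raw scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# The key most aligned with the query receives the largest weight.
weights = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(weights)
```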
The phenomenon where LLMs disproportionately attend to the first few tokens regardless of their content.
Area Under the Receiver Operating Characteristic Curve -- a metric that measures a classification model's ability to distinguish between classes ac...
A neural network trained to compress input data into a compact representation (encoding) and then reconstruct the original data from that represent...
Automated Machine Learning -- tools and techniques that automate the process of building machine learning models, from feature engineering to model...
An AI system that can independently perceive its environment, make decisions, and take actions to achieve goals without constant human supervision.
AI systems that enable self-driving vehicles through perception, planning, and decision-making algorithms.
C
Gradually rolling out a new model version to a small subset of traffic before full deployment.
A neural network architecture proposed by Geoffrey Hinton that uses groups of neurons (capsules) to better capture spatial hierarchies and part-who...
The tendency of neural networks to abruptly lose previously learned knowledge when trained on new data or tasks.
The tendency of neural networks to abruptly forget previously learned information when trained on new data.
Methods for determining cause-and-effect relationships from data, going beyond correlation to understand why things happen and predict intervention...
A language model that generates text left-to-right, predicting each token based only on the tokens that came before it (never looking ahead).
A prompting technique that encourages language models to show their reasoning step-by-step before arriving at a final answer, significantly improvi...
A prompting technique where LLMs solve problems by generating intermediate reasoning steps.
OpenAI's conversational AI product that brought large language models to mainstream awareness, powered by GPT-3.5 and later GPT-4.
A saved snapshot of a model's weights and training state at a specific point during training, enabling recovery from failures and selection of the ...
AI models developed by Chinese companies including DeepSeek, Qwen, Yi, and Baichuan, increasingly competitive globally.
A dataset condition where some classes have significantly more examples than others, causing models to be biased toward the majority class.
Learn what Classification means in AI. Understand how machine learning models sort data into categories like spam detection and image recognition, ...
A technique that improves conditional generation quality by interpolating between conditional and unconditional predictions.
Anthropic's family of large language models designed with a focus on safety, helpfulness, and honesty, using Constitutional AI alignment techniques.
Contrastive Language-Image Pre-training -- an OpenAI model that learns to connect images and text by training on 400 million image-text pairs from ...
AI services and infrastructure provided through cloud platforms, enabling organizations to train, deploy, and scale AI models without owning specia...
An unsupervised learning technique that groups similar data points together without predefined labels.
AI systems that automatically write programming code from natural language descriptions, transforming software development with tools like GitHub C...
A recommendation technique that predicts user preferences based on the collective behavior of many users, assuming that users who agreed in the pas...
The total computational resources (measured in FLOPs, GPU-hours, or dollars) allocated for training or running an AI model.
A group of interconnected computers (typically GPU servers) working together to train large AI models, connected by high-speed networking.
Learn what Computer Vision means in AI. Discover how machines interpret images through pixels, patterns, and perception with interactive visual exa...
The change in the relationship between input features and the target variable over time.
When an AI model generates plausible-sounding but factually incorrect information, also known as hallucination — a major reliability challenge for ...
A table that visualizes a classification model's predictions versus actual labels, showing true positives, true negatives, false positives, and fal...
An alignment technique developed by Anthropic where an AI system is guided by a set of written principles (a 'constitution') rather than relying so...
An alignment technique where AI systems self-improve using a set of principles (a constitution) as guidance.
A recommendation approach that suggests items similar to what a user has previously liked, based on item features rather than other users' behavior.
Techniques that allow models to handle longer sequences at inference than they saw during training.
The maximum amount of text (measured in tokens) that a language model can process in a single input-output interaction.
The ability of a model to learn from a continuous stream of data over time, accumulating knowledge without forgetting previously learned information.
An inference optimization that dynamically adds and removes requests from a batch as they complete.
Automatically retraining ML models when new data arrives or performance degrades below a threshold.
A self-supervised learning approach that trains models to bring similar examples closer together and push dissimilar examples apart in a learned re...
A neural network that adds spatial conditioning controls (edges, poses, depth) to diffusion models.
A neural network layer that applies learnable filters (kernels) across input data to detect local patterns like edges, textures, and shapes.
A neural network architecture that uses learnable filters to detect spatial patterns in data, primarily used for image processing and computer visi...
A neural network architecture specialized for processing grid-like data such as images, using learned filters that detect features like edges, text...
A metric based on the cosine of the angle between two vectors, quantifying how similar their directions are regardless of magnitude. Values range from -1 (o...
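The computation is a dot product divided by the two vector norms; a minimal plain-Python sketch (function name and toy vectors are illustrative):

```python
import math

def cosine_similarity(a, b):
    # Dot product captures directional agreement.
    dot = sum(x * y for x, y in zip(a, b))
    # Dividing by both magnitudes removes the effect of vector length.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))   # same direction -> 1.0
print(cosine_similarity([1, 0], [0, 1]))   # orthogonal -> 0.0
print(cosine_similarity([1, 0], [-1, 0]))  # opposite -> -1.0
```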
The most common loss function for classification tasks, measuring the difference between predicted probability distributions and true labels.
The most common loss function for classification tasks, measuring the difference between predicted probability distributions and actual labels.
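For a single example, cross-entropy reduces to the negative log-probability assigned to the correct class; a minimal sketch in plain Python (function name and toy distributions are illustrative):

```python
import math

def cross_entropy(true_class, predicted_probs):
    # Loss is the negative log of the probability given to the correct class.
    return -math.log(predicted_probs[true_class])

# Confident and correct -> low loss.
print(cross_entropy(0, [0.9, 0.05, 0.05]))
# Hesitant -> higher loss; confidence in the wrong class drives loss toward infinity.
print(cross_entropy(0, [0.4, 0.3, 0.3]))
```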
A resampling technique that trains and evaluates a model on multiple splits of the data for robust performance estimates.
A training strategy that presents examples to the model in a meaningful order -- typically from easy to hard -- mimicking how humans learn.
D
OpenAI's text-to-image generation model that creates original images from text descriptions, named after Salvador Dalí and WALL-E.
Learn what data annotation is and why it matters for AI. Understand classification, object detection, and semantic segmentation - the three levels ...
Techniques for artificially increasing the size and diversity of training data by creating modified versions of existing examples.
The process of selecting, cleaning, and organizing training data to maximize AI model quality.
A change in the statistical properties of the data a model receives in production compared to the data it was trained on, causing model performance...
A virtuous cycle where a product generates data that improves its ML models, which attract more users and generate more data.
Learn what data means in artificial intelligence, the types of data (structured, unstructured, semi-structured), data quality, data pipelines, and ...
The process of annotating raw data with meaningful tags or categories, creating the labeled datasets needed to train supervised machine learning mo...
A centralized repository that stores vast amounts of raw data in its native format until needed for analysis or model training.
When information from outside the training set improperly influences model training, leading to overly optimistic performance estimates that don't ...
A decentralized data architecture where domain teams own and serve their data as products for organization-wide use.
A distributed training strategy that replicates the model on multiple GPUs, with each GPU processing a different subset of the training data.
An automated series of steps that collect, process, transform, and deliver data from source systems to where it's needed for AI model training or i...
An adversarial attack where malicious data is injected into a training dataset to corrupt the model's learned behavior.
A centralized repository that stores structured, processed data from multiple sources for analytics and ML training.
A neuron in a neural network that always outputs zero (or a constant), contributing nothing to the model's predictions, typically caused by large n...
A BERT variant using disentangled attention that separates content and position information.
A supervised learning algorithm that makes predictions by learning a series of if-then-else decision rules from features, visualizable as a tree st...
In transformer architecture, a decoder generates output tokens one at a time using masked self-attention to prevent looking at future tokens, plus cross-attention to the encoder's output in encoder-decoder models.
Learn what Deep Learning means in AI. Understand how neural networks use multiple layers to build hierarchical understanding, from edges to complex...
A deep learning extension of Q-learning that uses neural networks to approximate the action-value function.
AI-generated synthetic media that replaces a person's likeness in existing images or videos, typically using deep learning techniques like GANs or ...
The process of removing noise from data, a fundamental operation in diffusion models where the network learns to progressively remove Gaussian nois...
A computer vision task that predicts the distance of each pixel from the camera, creating a depth map from a single image or stereo pair.
A deterministic process always produces the same output for the same input, while a stochastic process involves randomness and may produce differen...
A mathematical framework that provides formal guarantees that individual data cannot be identified in model outputs.
Techniques that reduce the number of features in data while preserving the most important information, making data easier to visualize and process.
Techniques that reduce the number of features in a dataset while preserving important information, enabling visualization and combating the curse o...
An alignment technique that optimizes language models from human preferences without training a separate reward model.
An alignment technique that trains language models directly on human preference data without needing a separate reward model, simplifying the RLHF ...
Training AI models across multiple GPUs or machines simultaneously, essential for large models that exceed the memory and compute capacity of a sin...
A fine-tuning technique that teaches diffusion models to generate specific subjects from a few photos.
E
A regularization technique that halts training when validation performance stops improving, preventing the model from overfitting to the training d...
Running AI models directly on local devices (phones, IoT sensors, cameras) rather than sending data to the cloud for processing.
A pre-training method that uses replaced token detection instead of masked language modeling.
A ranking system adapted from chess that scores AI models based on head-to-head comparisons from human evaluators, used in the Chatbot Arena leader...
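The core of the rating system fits in two small functions; a minimal plain-Python sketch (function names are illustrative, and K=32 is just one common choice of update size):

```python
def expected_score(rating_a, rating_b):
    # Probability that A beats B under the Elo model (400-point scale).
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a, rating_b, score_a, k=32):
    # score_a: 1 if A won the head-to-head comparison, 0 if A lost, 0.5 for a tie.
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))

# Two equally rated models: the winner gains half the K-factor.
print(update(1000, 1000, 1))  # 1016.0
# Beating a much stronger model earns far more points.
print(update(1000, 1400, 1))
```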
A model specifically designed to convert text, images, or other data into dense vector representations (embeddings) that capture semantic meaning.
A continuous vector space where items (words, images, users) are represented as points, with spatial relationships encoding semantic similarity and...
A capability that appears in large AI models but is absent in smaller ones, seemingly arising unpredictably as model scale increases.
Unexpected capabilities that appear in large AI models but are absent in smaller versions of the same architecture.
A transformer architecture with separate encoder and decoder components, where the encoder processes input and the decoder generates output conditi...
A transformer architecture that processes the entire input bidirectionally to produce rich contextual representations, used for understanding tasks...
A technique that combines predictions from multiple models to achieve better performance than any single model, leveraging the 'wisdom of crowds' p...
The process of identifying and categorizing key information elements (entities) from unstructured text, such as names, dates, amounts, and organiza...
One complete pass through the entire training dataset during model training.
The practice of developing and deploying AI systems that are fair, transparent, accountable, and aligned with human values and societal well-being.
The practice of designing, developing, and deploying AI systems that are fair, transparent, accountable, and aligned with human values and societal...
A quantitative measure used to assess how well a machine learning model performs on a given task, guiding model selection and improvement.
The systematic recording of ML experiment parameters, metrics, and artifacts for comparison and reproducibility.
Techniques and methods for making AI model decisions understandable to humans, answering the question 'why did the model make this prediction?'
Learn what Explainable AI (XAI) is. Understand why AI transparency matters, how methods like SHAP, LIME, and attention visualization work, and how ...
The fundamental RL dilemma between trying new actions (exploration) and using known good actions (exploitation).
An NLP technique that creates summaries by selecting and combining the most important sentences directly from the source text, without generating n...
M
Learn what machine learning is - a method of teaching computers to learn from data and make predictions without explicit programming. Understand tr...
Automatic translation of text or speech between languages using AI, from early rule-based systems to modern neural approaches achieving near-human ...
Techniques for removing the influence of specific training data from a trained model.
A sequence modeling architecture based on selective state spaces that processes sequences in linear time, offering an alternative to the quadratic ...
A programming model for processing large datasets in parallel across a cluster, and in AI contexts, a pattern for processing data that exceeds a mo...
An extension of Faster R-CNN that adds pixel-level instance segmentation to object detection.
A pre-training objective where random tokens in the input are replaced with a [MASK] token, and the model must predict the original tokens from con...
A regression metric that calculates the average absolute difference between predicted and actual values, giving equal weight to all errors.
A regression loss function and evaluation metric that calculates the average of squared differences between predicted and actual values, penalizing...
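The two regression metrics above differ only in how they weight errors; a minimal plain-Python sketch with toy values (function names are illustrative):

```python
def mae(y_true, y_pred):
    # Mean of absolute errors: every error counts equally.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    # Mean of squared errors: large errors are penalized disproportionately.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
print(mae(y_true, y_pred))  # (1 + 0 + 2) / 3 = 1.0
print(mse(y_true, y_pred))  # (1 + 0 + 4) / 3
```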
AI systems designed for healthcare applications including diagnosis, drug discovery, and clinical decision support.
Learning to learn — training AI systems that can quickly adapt to new tasks with minimal data by leveraging experience from previous tasks.
Learning a distance function that maps similar inputs close together and dissimilar inputs far apart.
A popular AI image generation service known for its distinctive artistic style and high-quality outputs, accessed primarily through a Discord bot i...
A small subset of the training dataset used for a single parameter update during gradient descent, balancing the efficiency of batch processing wit...
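The relationship between mini-batches and a full epoch can be sketched in plain Python (the function name is illustrative):

```python
def minibatches(dataset, batch_size):
    # One yielded slice feeds one parameter update; a full pass
    # over all the slices is one epoch.
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]

data = list(range(10))
print(list(minibatches(data, 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```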
A French AI company known for producing efficient, high-performing open-weight language models that punch above their weight class in benchmarks.
Training neural networks using both 16-bit and 32-bit floating-point numbers to reduce memory and increase speed.
An approach where multiple LLMs collaborate on a task, with each model contributing its strengths and a combining mechanism selecting or merging th...
A model architecture that routes each input to a subset of specialized sub-networks (experts) for efficient scaling.
Software that optimizes machine learning model computations for specific hardware, translating high-level model descriptions into efficient low-lev...
An automated workflow that orchestrates the steps of training, evaluating, and deploying ML models.
An open-source platform for managing the complete machine learning lifecycle, including experiment tracking, model packaging, deployment, and regis...
Systematic evaluation of AI systems for bias, fairness, safety, and compliance with standards.
How well a model's predicted probabilities match actual outcome frequencies.
A documentation framework for machine learning models that describes their intended use, performance characteristics, limitations, and ethical cons...
Performance degradation when AI models are trained on data generated by previous AI models.
A phenomenon where models trained on AI-generated data progressively lose quality and diversity over generations, eventually producing degenerate o...
A standard protocol (MCP) for connecting AI models to external data sources, tools, and services through a unified interface.
A technique where a smaller 'student' model is trained to mimic the behavior of a larger 'teacher' model, achieving comparable performance with few...
Learn what model evaluation is in machine learning. Understand accuracy, precision, recall, F1 score, confusion matrices, cross-validation, and how...
The systematic process of assessing an AI model's performance using metrics, test sets, and human judgment to determine if it meets quality and saf...
Combining the weights of multiple fine-tuned models into a single model that inherits capabilities from all parent models, without additional train...
The continuous tracking of deployed ML model performance to detect degradation and data drift.
Distributing a single AI model across multiple GPUs or machines, necessary when a model is too large to fit in the memory of a single device.
A centralized repository for managing, versioning, and tracking machine learning models.
The infrastructure and process of deploying trained ML models to handle real-time or batch prediction requests in production environments.
Splitting a large model across multiple devices so each device holds only a portion of the parameters.
Tracking and managing different iterations of ML models with their associated code, data, and configurations.
Computational algorithms that use repeated random sampling to estimate numerical results, widely used in reinforcement learning and probabilistic i...
A search algorithm that uses random sampling to explore decision trees for optimal action selection.
An AI architecture where multiple specialized agents collaborate, debate, or coordinate to solve complex problems that are difficult for a single a...
A simplified RL framework for making sequential decisions among multiple options with uncertain rewards.
Training a single model to perform multiple related tasks simultaneously, sharing representations across tasks to improve generalization and effici...
An extended dialogue between a user and AI spanning multiple exchanges, requiring context tracking across turns.
Learn what Multimodal AI is - a unified AI system that can process and understand multiple types of data like text, images, audio, and video simult...
Training AI models to understand and generate content across multiple data types (text, images, audio, video).
An AI model that can process and generate multiple types of data -- text, images, audio, video -- within a single unified architecture.
Extending RAG to retrieve and process multiple data types including text, images, tables, and audio.
P
Adding extra values (typically zeros) around the edges of input data to control output dimensions in convolutions, or to the end of sequences to cr...
A memory management technique for LLM inference that handles KV cache like virtual memory pages.
The learnable values within a neural network that are adjusted during training to minimize the loss function. Model size is typically measured in p...
A family of techniques that fine-tune only a small number of model parameters while keeping most of the pre-trained model frozen, dramatically redu...
A metric for evaluating language models that measures how 'surprised' the model is by test data. Lower perplexity means the model predicts the text...
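Formally, perplexity is the exponential of the average negative log-probability over the test tokens; a minimal plain-Python sketch (function name and toy probabilities are illustrative):

```python
import math

def perplexity(token_probs):
    # token_probs: probability the model assigned to each actual next token.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Assigning 0.25 to every correct token is like guessing among 4 options.
print(perplexity([0.25, 0.25, 0.25]))  # 4.0
# Higher probability on the right tokens -> lower perplexity (less surprise).
print(perplexity([0.9, 0.8, 0.95]))
```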
A distributed training strategy that splits a model's layers across multiple GPUs, with each GPU processing a different stage of the forward/backwa...
A set of 3D data points representing the surface of objects, captured by LiDAR or depth sensors.
A class of RL algorithms that directly optimize the policy by estimating the gradient of expected reward.
A layer in neural networks that reduces spatial dimensions by aggregating values in local regions, decreasing computation and providing translation...
A mechanism that injects information about token positions into transformer models, since the attention mechanism itself has no inherent sense of o...
The initial phase of training a model on a large, general-purpose dataset to learn broad representations before fine-tuning on specific tasks.
Two complementary classification metrics. Precision measures correctness of positive predictions; recall measures completeness of detecting actual ...
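Both metrics fall out of the confusion-matrix counts; a minimal plain-Python sketch over binary labels (function name and toy labels are illustrative):

```python
def precision_recall(y_true, y_pred):
    # Counts over binary labels (1 = positive, 0 = negative).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)  # of everything flagged positive, how much was right
    recall = tp / (tp + fn)     # of all actual positives, how much was found
    return precision, recall

y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
print(precision_recall(y_true, y_pred))  # (2/3, 2/3)
```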
Reusing computed KV cache from shared prompt prefixes across multiple LLM requests.
Learn what pretraining is in AI and deep learning. Understand self-supervised learning, the two-stage process of pretraining and fine-tuning, and w...
Storing and reusing computed representations of repeated prompt prefixes to reduce latency and cost.
Learn prompt engineering: the practice of designing inputs to get optimal AI outputs. Master zero-shot, few-shot, chain-of-thought, system prompts, and more.
A security vulnerability where malicious input tricks a language model into ignoring its instructions and following attacker-provided instructions instead.
Systematic techniques for improving prompt effectiveness through testing, iteration, and automated optimization.
A reusable structure for formatting inputs to language models, with placeholders for dynamic content, ensuring consistent and effective prompting across applications.
A stable and efficient policy gradient algorithm that constrains policy updates to a trust region.
A model compression technique that removes unnecessary weights, neurons, or layers from a trained neural network to reduce size and improve inference speed.
R
A regression metric that indicates what proportion of the variance in the target variable is explained by the model, ranging from 0 (no explanatory power) to 1 (perfect prediction).
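The formula is R² = 1 - SS_res / SS_tot, the residual sum of squares over the total sum of squares; as a sketch:

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)          # variance around the mean
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # prediction error
    return 1 - ss_res / ss_tot

score = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```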
An ensemble learning method that builds many decision trees using random subsets of data and features, then combines their predictions for robust, accurate predictions.
The ability of AI models to draw logical conclusions, solve multi-step problems, and make inferences beyond simple pattern matching.
An LLM specifically trained or prompted to perform explicit step-by-step reasoning before producing answers.
A retrieval metric measuring what fraction of relevant items appear in the top-K results returned by a search or recommendation system.
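A sketch of the metric, assuming document IDs as plain strings (the helper name is illustrative):

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant items that appear in the top-k retrieved results."""
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

# 2 of the 4 relevant docs appear in the top 3 results:
score = recall_at_k(["d1", "d9", "d4", "d2"], ["d1", "d2", "d4", "d7"], k=3)
```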
A neural network architecture with loops that allow information to persist across time steps, designed for processing sequential data.
Systematically probing AI systems for vulnerabilities, failures, and harmful behaviors by simulating adversarial attacks and edge cases.
Learn what regression is in machine learning. Understand linear regression, non-linear regression, how the best-fit line works, loss functions, and more.
Learn what regularization is in machine learning. Understand L1 (Lasso) and L2 (Ridge) regularization, dropout, and how these techniques prevent overfitting.
Learn what Reinforcement Learning (RL) is -- a type of machine learning where an AI agent learns by interacting with an environment and receiving rewards or penalties.
The automatic discovery of useful features and representations from raw data, eliminating the need for manual feature engineering.
A framework using a fixed random recurrent network (reservoir) with only the output layer trained.
A deep CNN architecture using skip connections that enabled training of networks with 100+ layers.
An approach to AI development that prioritizes safety, fairness, transparency, privacy, and accountability.
Enhancing LLM outputs by retrieving relevant external documents and including them as context.
An architecture that enhances LLM responses by first retrieving relevant documents from a knowledge base, then using them as context for generation.
A model trained to predict human preferences, used in RLHF to provide a scalar reward signal that guides language model training toward more helpful outputs.
Designing intermediate reward signals to guide RL agents toward desired behavior more efficiently.
Learn how RLHF (Reinforcement Learning from Human Feedback) aligns large language models with human preferences through reward modeling and PPO optimization.
A training methodology that aligns language models with human preferences by using human feedback to train a reward model, then optimizing the LM against it.
A training technique that aligns language models with human preferences by using human feedback to train a reward model that guides further model optimization.
An optimized version of BERT that achieves better performance through improved training methodology.
A metric measuring a classifier's ability to distinguish between classes across all thresholds.
A positional encoding method that encodes position information through rotation of the query and key vectors, enabling relative position awareness in attention.
Recall-Oriented Understudy for Gisting Evaluation -- a set of metrics for evaluating text summarization by measuring overlap between generated and reference summaries.
S
Systematic testing of AI models for harmful outputs, vulnerabilities, and alignment with safety requirements.
A mechanism that screens AI model inputs and outputs to prevent harmful content, typically using classifiers, rule-based systems, or secondary models.
The method used to select the next token during language model text generation, controlling the balance between quality, diversity, and creativity.
State-Action-Reward-State-Action -- an on-policy reinforcement learning algorithm that updates Q-values based on the action actually taken in the n...
Empirical relationships showing that model performance improves predictably as a power law with more compute, data, or parameters.
The mechanism within transformers where each token in a sequence computes attention scores with every other token, determining how much to 'attend to' each of the others.
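Scaled dot-product self-attention in a few lines of NumPy (single head, no masking; the weight shapes here are illustrative):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Every token's output is a weighted mix of all tokens' value vectors."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])         # (seq, seq) attention logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 tokens, dim 8
w = [rng.normal(size=(8, 8)) for _ in range(3)]
out = self_attention(x, *w)
# out has the same shape as the input: one mixed vector per token
```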
A training approach where an AI agent improves by playing against copies of itself.
A training paradigm where models learn from unlabeled data by creating their own supervisory signal from the data itself, such as predicting masked or next tokens.
A search approach that understands the meaning and intent behind a query rather than just matching keywords, powered by embeddings and vector similarity.
A computer vision task that assigns a class label to every pixel in an image, creating a detailed understanding of the scene at pixel-level granularity.
A measure of how closely related the meaning of two pieces of text (or other data) are, typically computed using embeddings and distance metrics like cosine similarity.
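Cosine similarity, the most common choice, compares the directions of two embedding vectors regardless of their magnitudes:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # parallel vectors
```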
A training approach that combines a small amount of labeled data with a large amount of unlabeled data.
A framework that produces semantically meaningful sentence embeddings for comparison and search.
A language-independent tokenizer that treats text as raw bytes, requiring no pre-tokenization.
An NLP task that determines the emotional tone or opinion expressed in text, classifying it as positive, negative, neutral, or along more nuanced dimensions.
The task of processing, understanding, or generating ordered sequences of data, including text, time series, audio, and genomic data.
A model architecture that transforms one sequence into another, originally using encoder-decoder RNNs and now primarily using transformers.
The foundational optimization algorithm that updates model weights using the gradient computed from a single example or mini-batch, introducing beneficial noise that can help escape local minima.
Running a new model in parallel with production, processing real traffic without serving its predictions.
A shortcut that adds a layer's input directly to its output, allowing gradients to flow through deep networks and enabling the training of very deep architectures.
A mathematical function that converts a vector of raw scores (logits) into a probability distribution, where all values are between 0 and 1 and sum to 1.
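A minimal numerically stable implementation:

```python
import numpy as np

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    z = np.asarray(logits, dtype=float)
    z = np.exp(z - z.max())   # subtract the max for numerical stability
    return z / z.sum()

probs = softmax([2.0, 1.0, 0.1])  # largest logit gets the largest probability
```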
OpenAI's text-to-video generation model that creates photorealistic videos from text descriptions, demonstrating emergent understanding of physics.
Attention mechanisms that attend to only a subset of positions rather than all positions, reducing the quadratic cost of standard attention to sub-quadratic.
Accelerating LLM inference by using a small draft model to predict multiple tokens verified in parallel by the main model.
An inference acceleration technique where a smaller, faster model drafts multiple tokens that are then verified in parallel by the larger model.
AI technology that converts spoken language into text, enabling voice interfaces, transcription services, and real-time captioning with increasing accuracy.
Automatic Speech Recognition technology that converts spoken audio into written text, using neural networks to transcribe human speech.
An open-source text-to-image generation model that works by diffusing and denoising in a compressed latent space, making it faster and more accessible.
A sequence model architecture based on continuous state space representations, offering linear-time computation.
A measure of whether observed differences in model performance are likely real or due to random chance.
Understand Stochastic Gradient Descent (SGD), the optimization algorithm that powers deep learning. Learn about batch vs mini-batch, and the update rule.
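The update rule w <- w - lr * grad, sketched on a toy one-parameter regression (the learning rate and synthetic data here are illustrative):

```python
import numpy as np

# Fit y = w*x with per-example SGD on squared error; the true w is 3.
rng = np.random.default_rng(0)
xs = rng.normal(size=200)
ys = 3.0 * xs

w, lr = 0.0, 0.1
for x, y in zip(xs, ys):
    grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2 for one example
    w -= lr * grad               # the SGD update rule
```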
Constraining LLM outputs to follow specific formats like JSON, XML, or custom schemas.
Techniques for constraining language model outputs to follow specific formats like JSON schemas, XML, or custom grammars, ensuring machine-parseable output.
The challenge of aligning AI systems that are significantly smarter than humans with human values and intentions.
Learn what Supervised Learning is -- the most widely used machine learning approach. Understand training with labeled data, classification vs regression, and more.
A classical machine learning algorithm that finds the optimal hyperplane separating different classes, maximizing the margin between the closest data points (support vectors).
Artificially generated data that mimics real-world data patterns, used for training when real data is scarce or sensitive.
Artificially generated data that mimics the statistical properties of real data, used when real data is scarce, expensive, sensitive, or imbalanced.
Creating artificial training data using algorithms or AI models to augment or replace real-world data.
Initial instructions that define an LLM's behavior, personality, and constraints for a conversation.
Instructions given to a language model that set its behavior, persona, constraints, and capabilities for an entire conversation, separate from user messages.
T
A transformer model that frames all NLP tasks as text-to-text problems with a unified architecture.
A parameter that controls the randomness of language model outputs. Lower temperature produces more focused, deterministic responses; higher temperatures produce more varied, creative outputs.
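Temperature divides the logits before the softmax: T < 1 sharpens the distribution, T > 1 flattens it toward uniform; a sketch:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then convert to probabilities."""
    z = np.asarray(logits, dtype=float) / temperature
    z = np.exp(z - z.max())
    return z / z.sum()

logits = [4.0, 2.0, 1.0]
cold = softmax_with_temperature(logits, 0.5)  # more peaked on the top token
hot = softmax_with_temperature(logits, 2.0)   # flatter, more random sampling
```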
A multi-dimensional array of numbers that serves as the fundamental data structure in deep learning frameworks like PyTorch and TensorFlow.
A distributed training strategy that splits individual layers of a model across multiple GPUs, enabling training of models too large to fit on a single GPU.
Using additional computation during inference (not just training) to improve model outputs, through techniques like chain-of-thought, self-reflection, and repeated sampling.
Adapting a model's parameters on each test input to improve predictions at inference time.
The task of assigning predefined categories or labels to text documents.
A dense vector representation of text (word, sentence, or document) that captures semantic meaning in a numerical format suitable for machine learning models.
The process of producing coherent, contextually appropriate text using language models, encompassing everything from chatbot responses to creative writing.
AI models that generate audio content (speech, music, sound effects) from text descriptions.
AI models that create images from natural language text descriptions (prompts).
AI technology that converts written text into natural-sounding spoken audio, using neural networks to generate human-like voice patterns.
AI systems that generate video content from natural language descriptions.
A method that learns new text embeddings to represent specific visual concepts for image generation.
Learn about the AI black box problem - why complex AI models make decisions we cannot explain. Discover Explainable AI (XAI) and why transparency matters.
Understand the AI Singularity: the hypothetical point where artificial intelligence surpasses human intelligence and triggers an unstoppable intelligence explosion.
Learn what the attention mechanism is in AI. Understand Query-Key-Value, self-attention vs cross-attention, multi-head attention, and scaled dot-product attention.
The basic unit of text that language models process -- typically a word, subword, or character, depending on the tokenizer used.
The process of breaking text into smaller units (tokens) that AI models can process, a fundamental preprocessing step that significantly impacts model performance.
Learn what tokenization is in AI and NLP. Understand how text is broken into tokens using BPE, WordPiece, and SentencePiece. Discover why token limits matter.
A component that converts raw text into a sequence of tokens (integers) that a language model can process, and converts token IDs back into text.
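A toy word-level tokenizer showing the encode/decode contract (real tokenizers use subword schemes like BPE; this class and its vocabulary are illustrative):

```python
class ToyTokenizer:
    """Minimal word-level tokenizer: text -> token IDs -> text."""
    def __init__(self, texts):
        words = sorted({w for t in texts for w in t.split()})
        self.to_id = {w: i for i, w in enumerate(words)}    # word -> integer ID
        self.to_word = {i: w for w, i in self.to_id.items()}

    def encode(self, text):
        return [self.to_id[w] for w in text.split()]

    def decode(self, ids):
        return " ".join(self.to_word[i] for i in ids)

tok = ToyTokenizer(["the cat sat", "the dog ran"])
ids = tok.encode("the cat ran")
round_trip = tok.decode(ids)  # "the cat ran"
```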
The economics and pricing structure of AI API services based on input and output token counts.
The ability of language models to invoke external tools, APIs, and functions to extend their capabilities beyond pure text generation.
LLMs that can invoke external tools (APIs, code execution, search) to extend their capabilities.
A text generation strategy that samples from the smallest set of tokens whose cumulative probability exceeds a threshold p, dynamically adjusting the candidate pool.
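A sketch of the nucleus filter (the helper name is illustrative); sampling is then restricted to the renormalised nucleus:

```python
import numpy as np

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability exceeds p,
    zero out the rest, and renormalise."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]              # tokens from most to least likely
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # first index where cumsum >= p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

nucleus = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
# the 0.05 tail token is dropped; the remaining three are renormalised
```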
Learn what training data is in machine learning, the difference between labeled and unlabeled data, how datasets are split into train/validation/test sets.
A numerical measure of how wrong the model's predictions are on training data, which the optimization algorithm works to minimize during training.
Understand transfer learning in AI: how pre-trained models transfer knowledge between tasks, dramatically reducing training time and data needs.
An LLM reasoning framework that explores multiple reasoning branches and evaluates which paths are most promising.
A test of machine intelligence proposed by Alan Turing in 1950, where a human evaluator tries to distinguish between a machine and a human based solely on text conversation.