Feature engineering consumes the majority of time in most machine learning projects, yet the features themselves are often managed ad hoc: computed in notebooks, duplicated across teams, and implemented differently between training and serving environments. Feature stores solve this problem by providing a centralized repository for storing, managing, and serving machine learning features, ensuring consistency, reusability, and correctness across the ML lifecycle.
The Problem Feature Stores Solve
Training-Serving Skew
The most insidious problem in production ML is training-serving skew: when the features used during training differ subtly from those used during inference. Perhaps the training pipeline computes a rolling average over 30 days of history, but the serving pipeline inadvertently includes the current day. Or the training code handles missing values differently from the serving code. These discrepancies cause models to perform worse in production than in evaluation, and they are extremely difficult to diagnose.
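The rolling-average discrepancy described above can be made concrete. In this sketch (the data, dates, and function name are illustrative), the training path averages the 30 days before the prediction date, while the serving path accidentally includes the current day in the window:

```python
from datetime import date, timedelta

def rolling_avg(history, as_of, days=30, include_current=False):
    """Average daily amounts over a window ending at `as_of`.

    history: dict mapping date -> amount.
    include_current=True reproduces the serving-side bug of
    counting the current day inside the window.
    """
    end = as_of if include_current else as_of - timedelta(days=1)
    start = end - timedelta(days=days - 1)
    values = [amt for d, amt in history.items() if start <= d <= end]
    return sum(values) / len(values) if values else 0.0

# 30 ordinary days, then a large purchase on the current day.
history = {date(2024, 1, 1) + timedelta(days=i): 100.0 for i in range(30)}
history[date(2024, 1, 31)] = 1000.0

as_of = date(2024, 1, 31)
train_value = rolling_avg(history, as_of)                        # 100.0
serve_value = rolling_avg(history, as_of, include_current=True)  # 130.0
```

Both values are "a 30-day rolling average," yet the model was trained against one definition and served another, which is exactly the kind of skew that is invisible in offline evaluation.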
Feature Duplication
Without a central feature store, different teams independently compute the same features. A fraud detection team and a credit scoring team might both compute "average transaction amount over 7 days" using slightly different logic, different time windows, or different data sources. This duplication wastes compute, introduces inconsistencies, and makes it impossible to know which version of a feature is "correct."
Point-in-Time Correctness
Training ML models requires point-in-time correct features: the feature values that would have been available at each historical timestamp, without future information leaking into past examples. Computing this correctly is surprisingly tricky, especially for time-windowed aggregations. Feature stores handle this complexity through time-travel capabilities.
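The core point-in-time rule can be illustrated without any framework: for each training example, look up the most recent feature value recorded at or before the example's timestamp, never after it. A minimal sketch (the helper is hypothetical, not a feature-store API):

```python
import bisect

def point_in_time_lookup(feature_history, event_time):
    """Return the latest feature value with timestamp <= event_time.

    feature_history: list of (timestamp, value) sorted by timestamp.
    Returns None when no value existed yet -- substituting a later
    value would leak future information into the training example.
    """
    times = [t for t, _ in feature_history]
    i = bisect.bisect_right(times, event_time)
    return feature_history[i - 1][1] if i > 0 else None

# Feature recomputed on days 1, 5, and 9; training examples fall between updates.
history = [(1, 0.2), (5, 0.7), (9, 0.4)]
value_day_6 = point_in_time_lookup(history, 6)   # value as of day 5
value_day_0 = point_in_time_lookup(history, 0)   # no value existed yet
```

Feature stores apply this same as-of join at scale, across many entities and time-windowed aggregations, via their time-travel query APIs.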
"A feature store is to ML what a database is to application development: the centralized, reliable layer that ensures data consistency and enables teams to build on each other's work."
How Feature Stores Work
Dual Storage: Offline and Online
Feature stores maintain two storage layers. The offline store contains historical feature values, optimized for batch reads during model training. This is typically a data warehouse or data lake. The online store contains the latest feature values, optimized for low-latency reads during real-time inference. This is typically a key-value store like Redis or DynamoDB.
Because a single write path populates both stores from the same feature definitions, the values served for real-time predictions match those used during training, eliminating training-serving skew by design.
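A toy model of the dual-store design (a simplification, not any particular product's API): every materialized row is appended to the offline store for training, while the online store keeps only the latest value per entity key.

```python
class ToyFeatureStore:
    """Toy dual-store: append-only offline log, latest-value online map."""

    def __init__(self):
        self.offline = []   # full history: (entity_key, timestamp, value)
        self.online = {}    # entity_key -> (timestamp, value)

    def materialize(self, entity_key, timestamp, value):
        # One write path feeds both stores, keeping them consistent.
        self.offline.append((entity_key, timestamp, value))
        current = self.online.get(entity_key)
        if current is None or timestamp >= current[0]:
            self.online[entity_key] = (timestamp, value)

    def get_online(self, entity_key):
        """Low-latency read path: latest value, for real-time inference."""
        entry = self.online.get(entity_key)
        return entry[1] if entry else None

    def get_historical(self, entity_key):
        """Batch read path: full history, for model training."""
        return [(t, v) for k, t, v in self.offline if k == entity_key]

store = ToyFeatureStore()
store.materialize("user_1", 1, 0.2)
store.materialize("user_1", 2, 0.9)
```

Production systems add materialization jobs, TTLs, and point-in-time queries on top, but the shape is the same: one definition, two synchronized read paths.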
Feature Definitions
Features are defined as code, specifying the data source, transformation logic, entity keys, and metadata. These definitions serve as documentation, enable version control, and provide the contract between feature producers and consumers.
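As a concrete example, a Feast-style definition declares the entity key, data source, schema, and freshness in code. This is a sketch: the feature names, the parquet path, and the TTL are illustrative, not prescribed values.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Entity key that features are joined on.
user = Entity(name="user", join_keys=["user_id"])

# Offline source with an event timestamp for point-in-time joins.
source = FileSource(
    path="user_stats.parquet",           # illustrative path
    timestamp_field="event_timestamp",
)

# The versioned contract between feature producers and consumers.
user_transaction_stats = FeatureView(
    name="user_transaction_stats",
    entities=[user],
    ttl=timedelta(days=1),               # how long online values stay fresh
    schema=[
        Field(name="avg_txn_amount_7d", dtype=Float32),
        Field(name="txn_count_7d", dtype=Int64),
    ],
    source=source,
)
```

Checked into version control, a definition like this doubles as documentation and gives both training and serving code one authoritative place to resolve a feature's meaning.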
Key Takeaway
Feature stores solve training-serving skew by providing a single source of truth for features that serves both training (batch, historical) and inference (real-time, current) from consistent definitions and synchronized storage.
Popular Feature Store Platforms
Feast (Open Source)
Feast is the most popular open-source feature store. It provides a Python SDK for defining features, materialization jobs for populating online stores from offline data sources, and low-latency serving for real-time inference. Feast integrates with various data sources (BigQuery, Snowflake, Redshift) and online stores (Redis, DynamoDB, SQLite). Its lightweight architecture makes it suitable for teams of all sizes.
Tecton
Tecton is a managed feature platform built by the creators of Uber's Michelangelo ML platform. It excels at real-time feature engineering, computing features from streaming data with exactly-once semantics. Tecton handles the complexity of time-windowed aggregations, backfills, and monitoring automatically. It is the most feature-complete commercial option but comes with significant cost.
Hopsworks
Hopsworks provides an open-source feature store with a managed cloud option. It integrates feature store capabilities with a broader ML platform including model serving and experiment tracking. Hopsworks' feature pipeline framework supports both batch and streaming transformations.
Cloud-Native Options
Each major cloud provider offers feature store capabilities: SageMaker Feature Store on AWS, Vertex AI Feature Store on GCP, and Azure ML Feature Store. These integrate tightly with their respective ML platforms, offering convenience for teams already committed to a cloud provider.
Feature Engineering Patterns
Batch Features
Features computed on a schedule from historical data. Examples include daily aggregations, weekly summaries, and periodic model predictions used as features for downstream models. Batch features are straightforward to compute and manage.
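A batch feature job is often little more than a scheduled group-by over raw events. A pure-Python sketch of a daily aggregation (the field names are illustrative):

```python
from collections import defaultdict
from datetime import date

def daily_totals(transactions):
    """Batch feature: total spend per (user, day) from raw events."""
    totals = defaultdict(float)
    for user_id, day, amount in transactions:
        totals[(user_id, day)] += amount
    return dict(totals)

events = [
    ("u1", date(2024, 3, 1), 20.0),
    ("u1", date(2024, 3, 1), 5.0),
    ("u1", date(2024, 3, 2), 7.5),
]
features = daily_totals(events)
```

In practice the same logic runs as scheduled SQL or Spark against the warehouse, with results materialized into the feature store's offline and online layers.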
Streaming Features
Features computed in real time from event streams. Examples include "number of transactions in the last 5 minutes" for fraud detection or "current page views per second" for content ranking. Streaming features are more complex to implement but essential for time-sensitive applications.
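A streaming count over a sliding window can be sketched with a deque of recent event timestamps. This is a deliberate simplification: it assumes timestamps arrive in order, and real streaming systems must also handle out-of-order events, checkpointing, and exactly-once delivery.

```python
from collections import deque

class SlidingWindowCounter:
    """Streaming feature: count of events in the trailing window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # timestamps, oldest first (assumed in order)

    def observe(self, timestamp):
        self.events.append(timestamp)

    def count(self, now):
        # Evict events that have aged out of the window.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events)

counter = SlidingWindowCounter(window_seconds=300)  # "last 5 minutes"
for t in (0, 100, 200, 290):
    counter.observe(t)

recent = counter.count(now=310)   # the event at t=0 has aged out
```

Stream processors such as Flink or Spark Structured Streaming maintain windows like this durably at scale, writing results into the online store as events arrive.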
On-Demand Features
Features computed at request time from the input data itself. These include transformations of the raw request (text length, URL parsing) and lookups from external services. On-demand features do not need storage because they are computed fresh for each prediction.
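On-demand features are plain functions of the incoming request. A sketch that extracts features from a submitted URL at prediction time (the feature names are illustrative):

```python
from urllib.parse import urlparse

def request_features(url: str) -> dict:
    """On-demand features computed from the raw request; nothing is stored."""
    parsed = urlparse(url)
    return {
        "url_length": len(url),
        "is_https": parsed.scheme == "https",
        "domain": parsed.netloc,
        "path_depth": len([p for p in parsed.path.split("/") if p]),
    }

feats = request_features("https://example.com/a/b/c?q=1")
```

Because these are computed fresh per request, the only consistency requirement is that training and serving call the same function, which is why feature stores often let you register on-demand transformations alongside stored features.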
When Do You Need a Feature Store?
Not every team needs a feature store. Consider adopting one when:
- Multiple models share features: reuse across models is what justifies centralization; if only one model uses each feature, the overhead of a feature store may not be justified
- Real-time serving is required: the online store exists to serve low-latency lookups; if all your models run batch predictions, that layer adds unnecessary complexity
- Training-serving skew is a real problem: If you have experienced production issues caused by feature inconsistencies, a feature store provides structural prevention
- Multiple teams produce and consume features: Feature stores provide the most value in organizations where feature sharing accelerates development
For small teams with a few models and batch-only inference, a well-organized data pipeline may be sufficient. As your ML practice grows in scale and complexity, the investment in a feature store increasingly pays for itself through reduced duplication, fewer production issues, and faster time to deployment for new models.
Key Takeaway
Feature stores centralize feature management, eliminate training-serving skew, and enable feature reuse across teams. Start with Feast for open-source simplicity, consider Tecton for advanced real-time features, or use cloud-native options for tight platform integration.
