AI Glossary

Data Lake

A centralized repository that stores vast amounts of raw data in its native format until needed for analysis or model training.
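
Storing data "in its native format until needed" is often described as schema-on-read: files are written as received, and structure is applied only when a consumer reads them. The sketch below illustrates that idea with the Python standard library. The directory layout (raw/<source>/), the function names, and the field names user and ms are hypothetical, chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Store bytes exactly as received; nothing is parsed or validated on write."""
    target = lake_root / "raw" / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_events(lake_root: Path, source: str) -> list:
    """Apply a schema only when the data is read (schema-on-read)."""
    rows = []
    for path in sorted((lake_root / "raw" / source).glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            record = json.loads(line)
            # The reader decides which fields matter and how to type them;
            # unknown fields in the raw file are simply ignored here.
            rows.append({"user": str(record["user"]), "ms": int(record["ms"])})
    return rows


def demo() -> list:
    """Land one raw file in a temporary 'lake' and read it back with a schema."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        land_raw(
            lake,
            "clickstream",
            "part-0.jsonl",
            b'{"user": "a", "ms": "120", "extra": 1}\n{"user": "b", "ms": 95}\n',
        )
        return read_events(lake, "clickstream")
```

Calling demo() writes the raw bytes untouched and returns typed rows; a different consumer could read the same file with a different schema without rewriting it.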

ML Data Lakes

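For machine learning, a data lake typically holds raw training data (text, images, logs), processed features, model artifacts, and experiment metadata. Tabular data is commonly stored in a columnar file format such as Parquet, often managed through a table format such as Delta Lake or Apache Iceberg, which makes querying more efficient.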
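
One common way a lake layout supports efficient querying is partitioning: files are grouped into directories by a key such as date, so a reader can skip whole directories that cannot match a filter. The sketch below shows the idea with Hive-style dt=... directories. It uses JSON Lines files from the standard library as a stand-in for Parquet, and the table path, function names, and sample rows are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_partition(table_dir: Path, dt: str, rows: list) -> Path:
    """Write rows under a Hive-style partition directory, e.g. dt=2026-03-01/."""
    part_dir = table_dir / f"dt={dt}"
    part_dir.mkdir(parents=True, exist_ok=True)
    out = part_dir / "part-0.jsonl"
    out.write_text("\n".join(json.dumps(r) for r in rows) + "\n", encoding="utf-8")
    return out


def scan(table_dir: Path, dt_from: str, dt_to: str):
    """Read only partitions whose dt lies in [dt_from, dt_to].

    Returns (rows, files_opened) so the effect of pruning is visible.
    """
    rows, files_opened = [], 0
    for part_dir in sorted(table_dir.glob("dt=*")):
        dt = part_dir.name.split("=", 1)[1]
        # ISO dates compare correctly as strings, so no date parsing is needed.
        if not (dt_from <= dt <= dt_to):
            continue  # pruned: files in this directory are never opened
        for path in sorted(part_dir.glob("*.jsonl")):
            files_opened += 1
            text = path.read_text(encoding="utf-8")
            rows.extend(json.loads(line) for line in text.splitlines() if line)
    return rows, files_opened


def demo():
    """Write three daily partitions, then scan only the last two days."""
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "features" / "clicks"
        write_partition(table, "2026-03-01", [{"user": "a", "clicks": 3}])
        write_partition(table, "2026-03-02", [{"user": "b", "clicks": 5}])
        write_partition(table, "2026-03-03", [{"user": "c", "clicks": 1}])
        return scan(table, "2026-03-02", "2026-03-03")
```

Here the scan opens two of the three files. Table formats and query engines apply the same principle at larger scale, using file-level metadata rather than directory names alone.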

Data Lakehouse

The 'lakehouse' pattern combines data lake storage with data warehouse capabilities such as SQL querying and transactional tables. Databricks is built around this pattern, and warehouses such as Snowflake and BigQuery offer comparable features for ML workflows.


Last updated: March 5, 2026