Data Lake
A centralized repository that stores vast amounts of raw data in its native format until needed for analysis or model training.
ML Data Lakes
For machine learning, data lakes store raw training data (text, images, logs), processed features, model artifacts, and experiment metadata. They use formats like Parquet, Delta Lake, or Iceberg for efficient querying.
Data Lakehouse
The modern 'lakehouse' pattern combines data lake storage with data warehouse querying capabilities. Platforms like Databricks, Snowflake, and BigQuery implement this for ML workflows.