
Data Warehouse vs. Lake vs. Lakehouse?

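  • Data Warehouse (1986, IBM): a large, structured store of enterprise data. Uses “schema on write”: data is transformed to fit a predefined schema before it is loaded (ETL: extract, transform, load). Can become a data silo, because strict schemas and extensive ETL code make it hard to unify data from different sources.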
  • Data Lake (2011, Pentaho/Hitachi): a flexible, low-cost store of structured and unstructured data, typically built on Apache Hadoop. Uses “schema on read”: raw data is loaded first and a schema is applied only when the data is queried (ELT: extract, load, transform). Without structure and governance, it can degrade into an unusable “Data Swamp”.
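  • Data Lakehouse (2017, Jellyvision/Snowflake): aims to combine the best of both worlds, pairing the flexible, low-cost storage of a data lake with the structure and governance of a data warehouse. Formalized in the 2021 Lakehouse Paper, published in collaboration between Databricks, UC Berkeley, and Stanford University.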
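
The difference between “schema on write” and “schema on read” can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not how any particular warehouse or lake product works: the `SCHEMA` columns, the in-memory lists standing in for storage, and the helper names `conform`, `warehouse_write`, `lake_write`, and `lake_read` are all hypothetical.

```python
import json

# Hypothetical schema for illustration: column name -> required Python type.
SCHEMA = {"order_id": int, "amount": float}


def conform(record, schema):
    """Cast a record to the schema; raises KeyError, ValueError, or TypeError
    if a column is missing or a value cannot be cast."""
    return {col: typ(record[col]) for col, typ in schema.items()}


def warehouse_write(table, record):
    # Schema on write: the record must fit the schema BEFORE it is stored,
    # so bad records are rejected at load time.
    table.append(conform(record, SCHEMA))


def lake_write(store, record):
    # Schema on read: store the raw record as-is, with no validation.
    store.append(json.dumps(record))


def lake_read(store, schema):
    # The schema is applied only at query time; records that do not fit
    # are skipped here (a real system might quarantine or report them).
    rows = []
    for line in store:
        try:
            rows.append(conform(json.loads(line), schema))
        except (KeyError, ValueError, TypeError):
            continue
    return rows


if __name__ == "__main__":
    good = {"order_id": "1", "amount": "9.5"}
    bad = {"order_id": "x", "note": "free text"}

    table, store = [], []
    warehouse_write(table, good)
    try:
        warehouse_write(table, bad)
    except (KeyError, ValueError, TypeError) as exc:
        print("rejected on write:", exc)

    lake_write(store, good)
    lake_write(store, bad)
    print("warehouse rows:", table)
    print("lake raw records:", len(store))
    print("lake rows on read:", lake_read(store, SCHEMA))
```

In this sketch the warehouse never stores the malformed record, while the lake accepts everything and defers the problem to read time. That deferral is the flexibility that, without governance, tends toward the “Data Swamp” described above.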
