To build its data warehouse, Tubular ingests raw data from multiple sources through Kafka and stores it permanently. The data is then normalized with Spark: duplicates are removed, records are partitioned by time, and related sources are joined together. A metadata layer backed by Hive Metastore provides unified access to datasets stored in formats such as Parquet and Avro. This centralized repository lets engineers, analysts, and services access and analyze otherwise disparate data.
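The ingestion step could look roughly like the following Spark Structured Streaming sketch. The topic name, broker address, and storage paths are hypothetical, and the post does not specify how Tubular consumes from Kafka, so treat this as one plausible approach rather than their actual pipeline:

```python
# A minimal sketch of landing raw Kafka events into permanent storage
# with Spark Structured Streaming. Topic, broker, and S3 paths are
# illustrative assumptions. Requires the spark-sql-kafka connector
# package on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingest-raw-events").getOrCreate()

raw_stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "video_events")               # hypothetical topic
    .option("startingOffsets", "earliest")
    .load()
)

# Persist raw payloads as-is; normalization happens in a later batch job.
query = (
    raw_stream
    .selectExpr("CAST(value AS STRING) AS payload", "timestamp")
    .writeStream
    .format("parquet")
    .option("path", "s3://warehouse/raw/video_events/")
    .option("checkpointLocation", "s3://checkpoints/video_events/")
    .start()
)
```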
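Downstream, a batch Spark job could perform the normalization described above (deduplication, time partitioning, joining sources) and register the result in the Hive Metastore. Again, the table names, columns, and join key below are illustrative assumptions, not Tubular's actual schema:

```python
# A hedged sketch of the normalization flow: dedupe, partition by time,
# join sources, then register the output in the Hive Metastore.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# enableHiveSupport() lets Spark read and write Hive Metastore tables.
spark = (
    SparkSession.builder
    .appName("normalize-raw-events")
    .enableHiveSupport()
    .getOrCreate()
)
spark.sql("CREATE DATABASE IF NOT EXISTS warehouse")

# Raw events previously landed from Kafka (path and schema are assumptions).
raw = spark.read.parquet("s3://warehouse/raw/video_events/")

normalized = (
    raw
    # Remove duplicates, e.g. from at-least-once Kafka delivery.
    .dropDuplicates(["event_id"])
    # Derive a date column so the output can be partitioned by time.
    .withColumn("event_date", F.to_date(F.col("event_ts")))
)

# Join with another ingested source (a hypothetical dimension table).
videos = spark.read.parquet("s3://warehouse/raw/video_metadata/")
enriched = normalized.join(videos, on="video_id", how="left")

# Write partitioned Parquet and register the table in the Hive Metastore
# so other engines and services can discover and query it by name.
(
    enriched.write
    .mode("overwrite")
    .partitionBy("event_date")
    .format("parquet")
    .saveAsTable("warehouse.video_events")
)
```

Because `saveAsTable` records the table's location, format, and partitioning in the Metastore, any Metastore-aware engine (Hive, Presto, or another Spark job) can then query `warehouse.video_events` by name, without needing to know whether the files underneath are Parquet or Avro. This is the "unified access" role the metadata layer plays.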