The document provides an overview of key concepts related to data warehousing and online analytical processing (OLAP). It discusses the components and architecture of a data warehouse, including source systems, data staging, the data warehouse layer containing data marts and a metadata repository, and analytical tools. It also covers OLAP concepts like MOLAP, ROLAP, and HOLAP systems as well as common OLAP operations like roll-up, drill-down, slice, dice, and pivot. Dimensional data models including star, snowflake, and fact constellation schemas are also summarized.
DWM Unit 1 (2023)
Unit I
1.1 Data Warehousing:
Data Warehousing Components, Building a Data Warehouse, Data Warehouse Architecture, DBMS Schemas for Decision Support, Data Extraction, Clean-up, and Transformation Tools, Metadata, Reporting, Query Tools and Applications
1.2 Online Analytical Processing (OLAP):
OLAP and OLTP, Concept Hierarchies, Characteristics of OLAP Systems, Typical OLAP Operations, Multidimensional Data Analysis.

Data Warehouse
• A data warehouse is built by combining data from multiple diverse sources to support analytical reporting, structured and unstructured queries, and decision making for the organization. Data warehousing is the step-by-step approach for constructing and using a data warehouse.
• A data warehouse is mainly a data management system designed to enable and support business intelligence (BI) activities, particularly analytics.
• Data warehouses are intended for querying, cleaning, manipulating, transforming, and analyzing data, and they contain large amounts of historical data.

Need of Data Warehousing

Characteristics of a Data Warehouse
• Subject-Oriented
• Time-Variant
• Non-volatile
• Integrated

Architecture & Components of Data Warehouse
• The architecture of a data warehouse is the arrangement of its elements, chosen so that the software and hardware components together form an efficient data warehouse.
• The elements and components may vary with the requirements of the organization.

Architecture
• Source layer
• Data staging (ETL)
• Data warehouse layer (data marts and metadata repository)
• Analysis: issue reports, dynamically analyze information, and simulate hypothetical business scenarios.

Source Data Component
• External data
• Internal data
• Operational system data
• Flat files

Data Staging
• The data staging area performs three primary functions: extraction, transformation, and loading (ETL).

Data Storage in Warehouse
• Metadata: metadata repositories store information on sources, access procedures, data staging, users, data mart schemas, and so on.
• Metadata helps users understand the content and find the data.
• Metadata is stored in a separate data store, known as the information directory or metadata repository, which helps to integrate, maintain, and view the contents of the data warehouse.

Extraction, Transformation, and Loading
• Data extraction, which typically gathers data from multiple, heterogeneous, and external sources.
• Data cleaning, which detects errors in the data and rectifies them when possible.
• Data transformation, which converts data from legacy or host format to warehouse format.
• Load, which sorts, summarizes, consolidates, computes views, checks integrity, and builds indices and partitions.
• Refresh, which propagates the updates from the data sources to the warehouse.

ETL: Loading
• Loading is the process of writing the data into the target database.
• Loading can be carried out in two ways:
1. Refresh: the data warehouse data is completely rewritten.
2. Update: only the changes applied to the source information are added to the data warehouse.

Update
• An update is typically carried out without deleting or modifying preexisting data.
• This method is used in combination with incremental extraction to update data warehouses regularly.

Data Warehouse
• Data warehouses are built using dimensional data models, which consist of fact and dimension tables.
• Fact tables are used for analysis and decision making, while dimension tables store information about a business and its processes.

Fact Table
• In data warehousing, a fact table is one that contains the measurements, metrics, or facts of a business operation.
• It is surrounded by dimension tables and is found at the core of a star or snowflake schema.
• A fact table contains summarized numerical and historical data (facts) and a multipart index composed of foreign keys that reference the primary keys of the related dimension tables.
• A fact table typically has two types of columns: foreign keys to dimension tables, and measures, which contain the numeric facts.
• Dimensions are categories by which summarized data can be viewed. For example, a profit summary in a fact table can be viewed by a Time dimension (profit by month, quarter, year) or by a Region dimension.

DBMS Schemas for Decision Support
• A schema is a logical description of the entire database.
• A data warehouse uses the following schemas:
• Star schema
• Snowflake schema
• Fact constellation schema

Star Schema
• Each dimension in a star schema is represented by only one dimension table.
• This dimension table contains the set of attributes.
• There is a fact table at the center. It contains the keys to each of the four dimensions.
• The fact table also contains the measures, namely dollars sold and units sold.

Snowflake Schema
• Some dimension tables in the snowflake schema are normalized.
• The normalization splits up the data into additional tables.
• Because of this normalization, redundancy is reduced, so the schema becomes easier to maintain and saves storage space.

Fact Constellation Schema
• A fact constellation has multiple fact tables.
• It is also known as a galaxy schema.

OLAP
• An Online Analytical Processing (OLAP) server is based on the multidimensional data model.
• Online Analytical Processing (OLAP) is a category of software that allows users to analyze information from multiple database systems at the same time.
There are three types of OLAP servers:
1. Relational OLAP (ROLAP)
2. Multidimensional OLAP (MOLAP)
3. Hybrid OLAP (HOLAP)

ROLAP
• ROLAP works with data that exists in a relational database.
• Facts and dimension tables are stored as relational tables.

MOLAP
• MOLAP uses array-based multidimensional storage engines to present multidimensional views of data. In practice, this means it uses an OLAP cube.

HOLAP
• Hybrid OLAP is a mixture of ROLAP and MOLAP. It offers the fast computation of MOLAP and the higher scalability of ROLAP.
• HOLAP uses two databases:
• Aggregated or computed data is stored in a multidimensional OLAP cube.
• Detailed information is stored in a relational database.

OLAP Cube
• The OLAP cube is a data structure optimized for very quick data analysis.
• The OLAP cube consists of numeric facts, called measures, which are categorized by dimensions.
• The OLAP cube is also called a hypercube.

Cube
• A 3-D table can be represented as a 3-D data cube.

OLAP Operations
The list of OLAP operations:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)

Roll-up
• Roll-up is also known as "consolidation" or "aggregation."
• The roll-up operation can be performed in two ways:
1. Reducing dimensions
2. Climbing up a concept hierarchy
• A concept hierarchy is a system of grouping things based on their order or level.

Drill-down
• In drill-down, data is fragmented into smaller parts. It is the opposite of the roll-up process.
• It can be done by:
• Moving down the concept hierarchy
• Adding a dimension

Slice
• One dimension is selected, and a new sub-cube is created.
• For example, the Time dimension is sliced with Q1 as the filter.

Dice
• The dice operation selects two or more dimensions, which results in the creation of a sub-cube.

Pivot
• In pivot, the data axes are rotated to provide an alternative presentation of the data.
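
The OLAP operations above can be sketched in plain Python over a tiny in-memory fact list. This is only an illustrative sketch, not how any OLAP server implements them: the sample rows, the dimension names (quarter, city, country, item), the units_sold measure, and the helper functions aggregate, slice_cube, dice_cube, and pivot are all assumptions made for this example. The city-to-country grouping stands in for a concept hierarchy.

```python
from collections import defaultdict

# Hypothetical fact rows: dimensions (quarter, city, country, item)
# plus one measure (units_sold). The values are made up for illustration.
FACTS = [
    {"quarter": "Q1", "city": "Delhi", "country": "India", "item": "Phone", "units_sold": 10},
    {"quarter": "Q1", "city": "Mumbai", "country": "India", "item": "Phone", "units_sold": 20},
    {"quarter": "Q1", "city": "Toronto", "country": "Canada", "item": "Laptop", "units_sold": 5},
    {"quarter": "Q2", "city": "Delhi", "country": "India", "item": "Laptop", "units_sold": 7},
    {"quarter": "Q2", "city": "Toronto", "country": "Canada", "item": "Phone", "units_sold": 8},
]


def aggregate(rows, dims, measure="units_sold"):
    """Sum the measure for every combination of the given dimension values."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def slice_cube(rows, dim, value):
    """Slice: fix one dimension to a single value, yielding a sub-cube."""
    return [r for r in rows if r[dim] == value]


def dice_cube(rows, criteria):
    """Dice: filter on two or more dimensions; criteria maps dim -> allowed values."""
    return [r for r in rows if all(r[d] in allowed for d, allowed in criteria.items())]


def pivot(cells):
    """Pivot: swap the two axes of a 2-D aggregate."""
    return {(b, a): v for (a, b), v in cells.items()}


# Detailed view (drill-down level): units sold per city and quarter.
by_city_quarter = aggregate(FACTS, ["city", "quarter"])

# Roll-up by climbing the concept hierarchy: city -> country.
by_country_quarter = aggregate(FACTS, ["country", "quarter"])

# Roll-up by reducing dimensions: drop location entirely.
by_quarter = aggregate(FACTS, ["quarter"])

# Slice: Time dimension filtered to Q1.
q1_rows = slice_cube(FACTS, "quarter", "Q1")

# Dice: restrict two dimensions at once.
q1_phones = dice_cube(FACTS, {"quarter": {"Q1"}, "item": {"Phone"}})

# Pivot: present quarters as the first axis instead of countries.
by_quarter_country = pivot(by_country_quarter)

print(by_country_quarter[("India", "Q1")])  # 30
print(by_quarter[("Q1",)])                  # 35
print(len(q1_rows))                         # 3
```

Going from by_country_quarter back to by_city_quarter is the drill-down direction: the same measure is shown at a finer level of the hierarchy. A real MOLAP engine would precompute and store such aggregates in array form rather than rescanning the rows on every call.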