In shot what is DWH (Data warehouse)?
A DWH is a Large Database which
consists of Dimension and Fact tables.
Or Collection of DataMart is called as
a DWH.
Father of Data Warehouse:
Dr. William H. Inmon
Dr. Ralph Kimball
-Introduced Dimension Modelling to design
DWH in 1990.
What is Data Warehouse?
As per Dr. William Inmon,
Data Warehouse is a
- Subject
Oriented (Loans, Mutual Funds etc.)
- Integrated
(Get data from various sources)
- Time
Variant (Different time period data)
·
*Datamart is Subject Oriented
·
*DWH is a collection of Datamart
·
*DWH is a superset of Datamart
·
*Datamart is also another database.
Top-down
Approach: (DWH->Datamarts)
- Proposed
by Dr. Bill Inmon (used by 20% companies)
- DWH is created first and then the Data Marts are derived from
it.
- Data Marts have consistent data and subject Oriented Data
- It is robust against business changes. The cost of the
project is very high.
Bottom-up
Approach: (Datamarts->DWH)
- Proposed
by Dr.Ralph Kimball (used by 80% companies)
- Data Marts are created first and then the integrated and a
comprehensive DWH is created.
- As the data marts are created first, business solutions can
be answered quickly.
Dimension
Modeling:
- Developed by Dr. Kimball
- It’s a DWH design concept
- It is a process
to convert OLTP data into OLAP data in the form of Dimension tables and
Fact tables and maintain relation b/w them.
- Dimension Modeling is heart of data warehouse.
What is the dimension modeling?
- Divide the data into dimension and Facts
- Have relation between them.
Fact or Measure:
- Measures are the
numeric values that we want to aggregate, slice, dice and analyze.
- Measure has following properties:
è How they
should be formatted
è How they
should aggregate up
è How they
interact with specific dimensions and so on
ð Measure
always answers questions for ‘How much’ or ‘How many’ etc
ð Data in
the fact table can be filtered and grouped (“Sliced and diced”)
Dimension:
- Dimension is a string,
which always answer the questions like ‘When’,’Where’,’What’,’Who’,’Which’
etc
- Examples for Dimensions:
- Customer,
Product
- Time,
Geography etc
- Dimension Table PK is FK of Fact table.
- Dimension contains multiple Hierarchies
- Hierarchy supports drilling up and down.
- Hierarchy speeds the aggregation in a cube while processing.
- Even though Hierarchies are optional; these are recommended.
Master tables -> Dimension table (20% data)
Transactional tables-> Fact tables (80% data)
Master/Dimension
tables:
- Primary key
- String data
- Less Data
Transactional/Fact
tables:
- Foreign Keys
- Numeric data
- Large data
To create
best DWH:
- Study OLTP database
- Study Client data Analytics/Reports/Data science/Dashboards/
MIS/Visualizations requirement.
OLTP to
OLAP Table conversion:
ð OLTP
contains
o Master
Tables
§ Product
§ Course
§ Student
etc
o Transactional
Tables
§ Payment
Details
§ Order
Details etc
ð In OLAP
o Master
Table converted as Dimensional Tables
o Transactional
Tables converted as Fact Tables
Why DWH must contain less no. of tables?
=> More tables’ means more joins, more joins means reading will
be slow.
DWH – Best Practice to create a DWH:
=========================
- What does Fact table contain?
Ans: FK and Measures only.
- DWH must contain less no of
tables.
Dim
-- More --- reduce no of Dim tables
Fact
-- Less
- Reduce no of columns in Fact tables (Fact table size need to
be reduced)
DWH
Design Best Practices:
==================
- Reduce no of Tables in DWH.
- Study fully OLTP DB
- Study Reporting Requirements/Data Analytics
- Proper Relationships b/w tables (Dimension and Fact tables)
- Reduce no. of columns in Fact Table
- Reduce size of keys (PK and FKs)
- Have Junk Dimensions
- Have Degenerate Dimensions