Skip to main content

Introduction to Datawarehouse

In shot what is DWH (Data warehouse)?

A DWH is a Large Database which consists of Dimension and Fact tables.

Or Collection of DataMart is called as a DWH.

Father of Data Warehouse:

Dr. William H. Inmon

Dr. Ralph Kimball

-Introduced Dimension Modelling to design DWH in 1990.

What is Data Warehouse?

As per Dr. William Inmon,

Data Warehouse is a

-      Subject Oriented (Loans, Mutual Funds etc.)

-      Integrated (Get data from various sources)

-      Time Variant (Different time period data)

 

·         *Datamart is Subject Oriented

·         *DWH is a collection of Datamart

·         *DWH is a superset of Datamart

·         *Datamart is also another database.

Top-down Approach: (DWH->Datamarts)

 

  • Proposed by Dr. Bill Inmon (used by 20% companies)
  • DWH is created first and then the Data Marts are derived from it.
  • Data Marts have consistent data and subject Oriented Data
  • It is robust against business changes. The cost of the project is very high.

 

Bottom-up Approach: (Datamarts->DWH)

 

  • Proposed by Dr.Ralph Kimball (used by 80% companies)
  • Data Marts are created first and then the integrated and a comprehensive DWH is created.
  • As the data marts are created first, business solutions can be answered quickly.

Dimension Modeling:

 

  • Developed by Dr. Kimball
  • It’s a DWH design concept
  • It is a process to convert OLTP data into OLAP data in the form of Dimension tables and Fact tables and maintain relation b/w them.
  • Dimension Modeling is heart of data warehouse.

 

What is the dimension modeling?

  • Divide the data into dimension and Facts
  • Have relation between them.

 

Fact or Measure:

 

  • Measures are the numeric values that we want to aggregate, slice, dice and analyze.
  • Measure has following properties:

è How they should be formatted

è How they should aggregate up

è How they interact with specific dimensions and so on

 

ð  Measure always answers questions for ‘How much’ or ‘How many’ etc

ð  Data in the fact table can be filtered and grouped (“Sliced and diced”)

 

Dimension:

 

  • Dimension is a string, which always answer the questions like ‘When’,’Where’,’What’,’Who’,’Which’ etc
  • Examples for Dimensions:

-      Customer, Product

-      Time, Geography etc

  • Dimension Table PK is FK of Fact table.
  • Dimension contains multiple Hierarchies
  • Hierarchy supports drilling up and down.
  • Hierarchy speeds the aggregation in a cube while processing.
  • Even though Hierarchies are optional; these are recommended.

 

Master tables -> Dimension table (20% data)

Transactional tables-> Fact tables (80% data)

 

Master/Dimension tables:

  • Primary key
  • String data
  • Less Data

 

Transactional/Fact tables:

  • Foreign Keys
  • Numeric data
  • Large data

 

To create best DWH:

  1. Study OLTP database
  2. Study Client data Analytics/Reports/Data science/Dashboards/ MIS/Visualizations requirement.

 

OLTP to OLAP Table conversion:

 

ð  OLTP contains

o   Master Tables

§  Product

§  Course

§  Student etc

 

o   Transactional Tables

§  Payment Details

§  Order Details etc

ð  In OLAP

o   Master Table converted as Dimensional Tables

o   Transactional Tables converted as Fact Tables

 

Why DWH must contain less no. of tables?

=> More tables’ means more joins, more joins means reading will be slow.

 

DWH – Best Practice to create a DWH:

=========================

 

  1. What does Fact table contain?

Ans: FK and Measures only.

 

  1. DWH must contain less no of tables.

                        Dim -- More  --- reduce no of Dim tables

                                Fact --  Less

  1. Reduce no of columns in Fact tables (Fact table size need to be reduced)

 

DWH Design Best Practices:

==================

  • Reduce no of Tables in DWH.
  • Study fully OLTP DB
  • Study Reporting Requirements/Data Analytics
  • Proper Relationships b/w tables (Dimension and Fact tables)
  • Reduce no. of columns in Fact Table
  • Reduce size of keys (PK and FKs)
  • Have Junk Dimensions
  • Have Degenerate Dimensions


Popular posts from this blog

DBMS

Database : database can be defined as a collection of information stored at a particular place written in a predetermined manner, belongs to a particular topic.  DBMS: (DATA BASE MANAGEMENT SYSTEM):  it is defined as software which is present inside the database which is used o maintain or manage the data within the database .                                 In the above diagram, if the user needs any info from the database. He has to contact with  dbms , then the dbms goes to the database, takes the information from the  database and gives it back to the user. Database models :  these models are broadly divided into the following types: 1.         FMS/FDMS 2.         HMS/HDMS 3.         NDBMS 4.         RDBMS 5.      ...

About Me

I am a Data Analyst /Azure Data Engineer with more than 4 years of full-time work experience and around 3.4+ years of experience in different Microsoft Technologies ( Azure Data Factory , Azure SQL, Azure Data lake Gen 2).  I currently work mainly with Azure SQL server, Power bi / Tableau ,Azure Data Factory, Azure SQL, T-SQL, Blob Storages as well as XML and working on Development and Deployment.  My Technical skills are Azure SQL, Power Bi, Tableau ADLS,  Azure Data Factory,    SSIS, SSRS. Keywords Relevant to me : • Business Intelligence Developer • Data warehouse Developer •   Data Analyst • Azure Data engineer I like Learning new Tech, so in free time i explore learning about tech , taking courses on Digital Platforms and attending online workshops. I like Singing and Cycling as my hobby. If you want to get in touch, find me at 'Gauravthamada gmail com'