Prepare for your exams
Get points
Guidelines and tips

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search Store documents

The best documents sold by students who completed their studies

Search through all study resources

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

University Rankings

Discover the best universities in your country according to Docsity users

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

From our blog

Exams and Study

Go to the blog

Data Warehouse Design: Understanding Dimensional Modeling and Indexing, Slides of Data Warehousing

University of Lucknow Data Warehousing

An in-depth exploration of data warehouse design, focusing on dimensional modeling and indexing. Topics include the differences between operational and decision support systems, multi-dimensional data, dimensional modeling using relational dbms and special-purpose models, and index structures like inverted, bit map, and join indexes. The document also covers aggregation and data cube data models.

Typology: Slides

2012/2013

Uploaded on 04/25/2013

khushia 🇮🇳

4.3

(9)

112 documents

1 / 39

This page cannot be seen from the preview

Don't miss anything!

Data Warehousing Design

Docsity.com

Partial preview of the text

Download Data Warehouse Design: Understanding Dimensional Modeling and Indexing and more Slides Data Warehousing in PDF only on Docsity!

Data Warehousing Design

Warehouse Design

 What to materialize in the warehouse

– Which source data?

– Which summary tables?

– Which indices?

 Influenced by both querying and

maintenance

 Trade storage space and update time for

query speed

Why is DW Design Different?

 DSS: few transactions, each accessing a large

number of records

 Typical ER designs tend to be complex and

difficult to navigate

Topic/Function Operational Decision Support

Data Content Current values Archival data

Data Organization Application by application Subject areas across enterprises

Nature of Data Dynamic Static until refreshed

Data Structure, Format Complex: suitable for operational

computation

Simple: suitable for business

analysis

Random Access Probability High Moderate to low

Data Update Updated on a field-by-field basis Accessed and read: no direct

update

Usage Highly structured repeptive

processing

Highly unstructured analytical

processing

Response Time Sub-second to 2-3 seconds Seconds to minutes, hours

Multi-Dimensional Data

 Measures - numerical data being tracked

 Dimensions - business parameters that define

a transaction

 Example: Analyst may want to view sales

data (measure) by geography , by time , and by

product (dimensions)

 Dimensional modeling is a technique for

structuring data around the business concepts

 ER models describe “entities” and

“relationships”

 Dimensional models describe “measures”

and “dimensions”

Dimensional Modeling Using

Special-Purpose Model (MDDB)

 Facts stored in multi-dimensional arrays

 Dimensions used to index array

 Sometimes on top of relational DB

 Products

– Pilot, Arbor Essbase, Gentia

Example

―Sales by product line over the past six months‖

―Sales by account between 1990 and 1995‖

Prod Code Time Code Acct Code Sales Qty

Account Info

Product Info

Time Info

Numerical Measures

Key columns joining fact table

to dimension tables

Fact table for

measures

Dimension tables

Example Cont’d

Sales

Time

Time Code

Quarter Code

Quarter Name

Week Code

Day Code

Day name

Account

Account Code

Key Account Code

Account Name

Account Type

Account Market

Geography

Geography Code

Region Code

Region Mgr

City Code

City Name

Product

Product Code

Product Name

Brand Mgr

Brand Code

Prod. Line Code

Prod. Line Name

Prod. Name

...

Geography Code

Time Code

Account Code

Product Code

Dollar Amount

Units

Dimensional Modeling Cont’d

 Fact tables are fully normalized

 Dimension tables are denormalized

– Repetitively stored for sake of simplicity and

performance

Product_Code Product_Name Product_Color Brand_Code 101 Widget Blue XYZ 102 Gadget Blue XYZ 103 Snicket Orange ABC 104 Graplit Orange ABC

Brand_Code Brand_Mgr XYZ J. Smith ABC T. Jones

Product_Code Product_Name Product_Color Brand_Code Brand_Mgr 101 Widget Blue XYZ J. Smith 102 Gadget Blue XYZ J. Smith 103 Gadget Green XYZ J. Smith 104 Snicket Orange ABC T. Jones 105 Graplit Orange ABC T. Jones

Advantages of Dimensional Modeling

 Define complex, multi-dimensional data with

simple model

 Reduces the number of phycial joins a query

has to process

 Allows the data warehouse to evolve with rel.

low maintenance

 HOWEVER! Star schema and rel. DBMS are

not the magic solution

– Query optimization is still problematic

Index Structures

 Traditional access methods

– B-trees, hash tables, grid files, etc.

 Popular in warehouses

– Inverted indexes (lists)

– Bit map indexes

– Join indexes

Bit Map Index

 Developed for Model 204 DBMS in 1987

Id Name Age 1 Joe 20 2 Fred 20 3 Sally 21 4 Nancy 20 5 Tom 20 6 Pat 25 7 Dave 21

Data Warehouse Design: Understanding Dimensional Modeling and Indexing, Slides of Data Warehousing

Related documents

Partial preview of the text

Download Data Warehouse Design: Understanding Dimensional Modeling and Indexing and more Slides Data Warehousing in PDF only on Docsity!

Data Warehousing Design

Warehouse Design

 What to materialize in the warehouse

– Which source data?

– Which summary tables?

– Which indices?

 Influenced by both querying and

maintenance

 Trade storage space and update time for

query speed

Why is DW Design Different?

 DSS: few transactions, each accessing a large

number of records

 Typical ER designs tend to be complex and

difficult to navigate

Topic/Function Operational Decision Support

Data Content Current values Archival data

Data Organization Application by application Subject areas across enterprises

Nature of Data Dynamic Static until refreshed

Data Structure, Format Complex: suitable for operational

computation

Simple: suitable for business

analysis

Random Access Probability High Moderate to low

Data Update Updated on a field-by-field basis Accessed and read: no direct

update

Usage Highly structured repeptive

processing

Highly unstructured analytical

processing

Response Time Sub-second to 2-3 seconds Seconds to minutes, hours

Multi-Dimensional Data

 Measures - numerical data being tracked

 Dimensions - business parameters that define

a transaction

 Example: Analyst may want to view sales

data (measure) by geography , by time , and by

product (dimensions)

 Dimensional modeling is a technique for

structuring data around the business concepts

 ER models describe “entities” and

“relationships”

 Dimensional models describe “measures”

and “dimensions”

Dimensional Modeling Using

Special-Purpose Model (MDDB)

 Facts stored in multi-dimensional arrays

 Dimensions used to index array

 Sometimes on top of relational DB

 Products

– Pilot, Arbor Essbase, Gentia

Example

―Sales by product line over the past six months‖

―Sales by account between 1990 and 1995‖

Prod Code Time Code Acct Code Sales Qty

Account Info

Product Info

Time Info

Fact table for

measures

Dimension tables

Example Cont’d

Sales

Time

Account

Geography

Product

Dimensional Modeling Cont’d

 Fact tables are fully normalized

 Dimension tables are denormalized

– Repetitively stored for sake of simplicity and

performance

Advantages of Dimensional Modeling

 Define complex, multi-dimensional data with

simple model

 Reduces the number of phycial joins a query