Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Data Warehouse Design: Understanding Dimensional Modeling and Indexing, Slides of Data Warehousing

An in-depth exploration of data warehouse design, focusing on dimensional modeling and indexing. Topics include the differences between operational and decision support systems, multi-dimensional data, dimensional modeling using relational dbms and special-purpose models, and index structures like inverted, bit map, and join indexes. The document also covers aggregation and data cube data models.

Typology: Slides

2012/2013

Uploaded on 04/25/2013

khushia
khushia 🇮🇳

4.3

(9)

112 documents

1 / 39

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Data Warehousing Design
Docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27

Partial preview of the text

Download Data Warehouse Design: Understanding Dimensional Modeling and Indexing and more Slides Data Warehousing in PDF only on Docsity!

Data Warehousing Design

Warehouse Design

 What to materialize in the warehouse

– Which source data?

– Which summary tables?

– Which indices?

 Influenced by both querying and

maintenance

 Trade storage space and update time for

query speed

Why is DW Design Different?

 DSS: few transactions, each accessing a large

number of records

 Typical ER designs tend to be complex and

difficult to navigate

Topic/Function Operational Decision Support

Data Content Current values Archival data

Data Organization Application by application Subject areas across enterprises

Nature of Data Dynamic Static until refreshed

Data Structure, Format Complex: suitable for operational

computation

Simple: suitable for business

analysis

Random Access Probability High Moderate to low

Data Update Updated on a field-by-field basis Accessed and read: no direct

update

Usage Highly structured repeptive

processing

Highly unstructured analytical

processing

Response Time Sub-second to 2-3 seconds Seconds to minutes, hours

Multi-Dimensional Data

 Measures - numerical data being tracked

 Dimensions - business parameters that define

a transaction

 Example: Analyst may want to view sales

data (measure) by geography , by time , and by

product (dimensions)

 Dimensional modeling is a technique for

structuring data around the business concepts

 ER models describe “entities” and

“relationships”

 Dimensional models describe “measures”

and “dimensions”

Dimensional Modeling Using

Special-Purpose Model (MDDB)

 Facts stored in multi-dimensional arrays

 Dimensions used to index array

 Sometimes on top of relational DB

 Products

– Pilot, Arbor Essbase, Gentia

Example

―Sales by product line over the past six months‖

―Sales by account between 1990 and 1995‖

Prod Code Time Code Acct Code Sales Qty

Account Info

Product Info

Time Info

Numerical Measures

Key columns joining fact table

to dimension tables

Fact table for

measures

Dimension tables

Example Cont’d

Sales

Time

Time Code

Quarter Code

Quarter Name

Week Code

Day Code

Day name

Account

Account Code

Key Account Code

Account Name

Account Type

Account Market

Geography

Geography Code

Region Code

Region Mgr

City Code

City Name

Product

Product Code

Product Name

Brand Mgr

Brand Code

Prod. Line Code

Prod. Line Name

Prod. Name

...

Geography Code

Time Code

Account Code

Product Code

Dollar Amount

Units

Dimensional Modeling Cont’d

 Fact tables are fully normalized

 Dimension tables are denormalized

– Repetitively stored for sake of simplicity and

performance

Product_Code Product_Name Product_Color Brand_Code 101 Widget Blue XYZ 102 Gadget Blue XYZ 103 Snicket Orange ABC 104 Graplit Orange ABC

Brand_Code Brand_Mgr XYZ J. Smith ABC T. Jones

Product_Code Product_Name Product_Color Brand_Code Brand_Mgr 101 Widget Blue XYZ J. Smith 102 Gadget Blue XYZ J. Smith 103 Gadget Green XYZ J. Smith 104 Snicket Orange ABC T. Jones 105 Graplit Orange ABC T. Jones

Advantages of Dimensional Modeling

 Define complex, multi-dimensional data with

simple model

 Reduces the number of phycial joins a query

has to process

 Allows the data warehouse to evolve with rel.

low maintenance

 HOWEVER! Star schema and rel. DBMS are

not the magic solution

– Query optimization is still problematic

Index Structures

 Traditional access methods

– B-trees, hash tables, grid files, etc.

 Popular in warehouses

– Inverted indexes (lists)

– Bit map indexes

– Join indexes

Bit Map Index

 Developed for Model 204 DBMS in 1987

Id Name Age 1 Joe 20 2 Fred 20 3 Sally 21 4 Nancy 20 5 Tom 20 6 Pat 25 7 Dave 21

Using Bit Maps

 Query:

– “Get people with age=20 and name =„Fred‟”

 (1) Bit map for age =20: 1101100

 (2) Bit map for name=„Fred‟: 0100000

 (3) Answer is intersection: 0100000

 Good if domain cardinality is small

 Bit vectors can be compressed

Aggregation

 Process by which low-level data is summarized in

advanced and placed into intermediate tables

 Speeds up query processing, less ad-hoc

– “Show me total US sales for 1990”

 How much to aggregate?

 Data cube data model

– All possible aggregations along all dimensions

– Cells contain aggregated values

– How much of the cells in cube should be pre-computed?

Aggregation Cont’d

 Special operators to navigate the hierarchies

– Roll-up: remove a dimension element

– e.g., Roll-up products to brands

– Drill-down (opposite of roll-up),

– Slice (defines a subcube)

– Various visualization ops (e.g., pivot)