Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

CS 347: Database Concurrency Control and Recovery Notes - Locking and Transactions, Slides of Distributed Database Management Systems

Notes on database concurrency control and recovery, focusing on locking and transactions. It covers topics such as write locks, read locks, commit protocols, and the difference between local and distributed commit. The document also discusses the importance of serializability and the potential problem of lost transactions.

Typology: Slides

2011/2012

Uploaded on 07/16/2012

sambandam
sambandam 🇮🇳

4.3

(37)

162 documents

1 / 16

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
2
CS 347 Notes07 7
Write(X):
get exclusive X1 lock
get exclusive X2 lock
get exclusive X3 lock
write new value into X1, X2, X3
at end of transaction, release X1, X2, X3 locks
X1 X3
X2
lock lock
lock
write write write
CS 347 Notes07 8
Correctness OK
–2PL serializability
–2PC atomic transactions
Problem: Low availability
X1 X3
X2 down!
cannot access X!
CS 347 Notes07 9
Readers lock and access a single copy
Writers lock all copies
and update all copies
Good availability for reads
Poor availability for writes
Basic Solution — Improvement
X1 X2 X3
reader has lock writer will conflict!
CS 347 Notes07 10
Reminder
With basic solution
use standard 2PL
use standard commit protocols
CS 347 Notes07 11
Variation on Basic: Primary copy
Select primary site (static for now)
Readers lock and access primary copy
Writers lock primary copy
and update all copies
X1 * X2 X3
reader
write
lock
writer
CS 347 Notes07 12
Commit Options for Primary Site Scheme
Local Commit
X1 * X2 X3
propagate update
lock,
write, commit
writer
docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff

Partial preview of the text

Download CS 347: Database Concurrency Control and Recovery Notes - Locking and Transactions and more Slides Distributed Database Management Systems in PDF only on Docsity!

CS 347 Notes07 7

  • Write(X):
    • get exclusive X1 lock
    • get exclusive X2 lock
    • get exclusive X3 lock
    • write new value into X1, X2, X
    • at end of transaction, release X1, X2, X3 locks

X1 X2 X

lock lock lock

write (^) write (^) write

CS 347 Notes07 8

  • Correctness OK
    • 2PL  serializability
    • 2PC  atomic transactions
  • Problem: Low availability

X1 X2 X

down!

 cannot access X!

CS 347 Notes07 9

  • Readers lock and access a single copy
  • Writers lock all copies

and update all copies

  • Good availability for reads
  • Poor availability for writes

Basic Solution — Improvement

X1 X2 X

reader has lock (^) writer will conflict!

CS 347 Notes07 10

Reminder

  • With basic solution
    • use standard 2PL
    • use standard commit protocols

CS 347 Notes07 11

Variation on Basic: Primary copy

  • Select primary site (static for now)
  • Readers lock and access primary copy
  • Writers lock primary copy

and update all copies

X1 * X2 X

reader

lock write writer

CS 347 Notes07 12

Commit Options for Primary Site Scheme

  • Local Commit

X1 * X2 X

propagate update

lock, write, commit writer

CS 347 Notes07 13

Commit Options for Primary Site Scheme

  • Local Commit

X1 * X2 X

propagate update

lock, write, commit writer

Write(X):

  • Get exclusive X1* lock
  • Write new value into X1*

•Commit at primary; get sequence number

•Perform X2, X3 updates in sequence number order

...

...

CS 347 Notes07 14

Example t = 0

X1: 0

Y1: 0

Z1: 0

* X2: 0

Y2: 0

Z2: 0

T1: X  1; Y  1;

T2: Y  2;

T3: Z  3;

CS 347 Notes07 15

Example t = 1

X1: 1

Y1: 1

Z1: 3

* X2: 0

Y2: 0

Z2: 0

T1: X  1; Y  1;

T2: Y  2;

T3: Z  3;

active at node 1

waiting for lock at node 1

active at node 1

CS 347 Notes07 16

Example t = 2

X1: 1

Y1: 2

Z1: 3

* X2: 0

Y2: 0

Z2: 0

T1: X  1; Y  1;

T2: Y  2;

T3: Z  3;

committed

active at 1

committed

#2: X  1; Y  1
#1: Z  3

CS 347 Notes07 17

Example t = 3

X1: 1

Y1: 2

Z1: 3

* X2: 1

Y2: 2

Z2: 3

T1: X  1; Y  1;

T2: Y  2;

T3: Z  3;

committed

committed

committed

#1: Z  3
#2: X  1; Y  1
#3: Y  2

CS 347 Notes07 18

What good is RPWP-LC?

primary backup backup

updates can’t read!

CS 347 Notes07 25

Comparison

N = number of nodes with copies P = probability that a node is operational

RAWA
ROWA
RPWP:LC
RPWP:DC

Probability can read

Probability can write P N P N

P N
P N
1 - (1-P)N
P P
P

CS 347 Notes07 26

Comparison

N = 5 = number of nodes with copies P = 0.99 = probability that a node is operational

Read Prob. Write Prob.

RAWA 0.9510 0.

ROWA 1.0000 0.

RPWP:LC 0.9900 0.

RPWP:DC 0.9900 0.

CS 347 Notes07 27

Comparison

N = 100 = number of nodes with copies P = 0.99 = probability that a node is operational

Read Prob. Write Prob.

RAWA 0.3660 0.

ROWA 1.0000 0.

RPWP:LC 0.9900 0.

RPWP:DC 0.9900 0.

CS 347 Notes07 28

Comparison

N = 5 = number of nodes with copies P = 0.90 = probability that a node is operational

Read Prob. Write Prob.

RAWA 0.5905 0.

ROWA 1.0000 0.

RPWP:LC 0.9000 0.

RPWP:DC 0.9000 0.

CS 347 Notes07 29

Outline

  • Basic Algorithms
  • Improved (Higher Availability) Algorithms
    • Mobile Primary
    • Available Copies
  • Multiple Fragments & Other Issues

CS 347 Notes07 30

Mobile Primary (with RPWP)

(1) Elect new primary

(2) Ensure new primary has seen all

previously committed transactions

(3) Resolve pending transactions

(4) Resume processing

primary backup backup

CS 347 Notes07 31

(1) Elections

  • Can be tricky...
  • One idea:
    • Nodes have IDs
    • Largest ID wins

CS 347 Notes07 32

(1) Elections: One scheme

(a) Broadcast “I want to be primary, ID=X”

(b) Wait long enough so anyone with

larger ID can stop my takeover

(c) If I see “I want to be primary” message

with smaller ID, kill that takeover

(d) After wait without seeing bigger ID,

I am new primary!

CS 347 Notes07 33

(1) Elections: Epoch Number

It is useful to attach an epoch or version

number to messages:

primary: n3 epoch# = 1

primary: n5 epoch# = 2

primary: n3 epoch# = 3

CS 347 Notes07 34

(2) Ensure new primary has

previously committed transactions

primary new primary backup

committed: T1, T

need to get and apply: T1, T

 How can we make sure new primary

is up to date? More on this coming up...

CS 347 Notes07 35

(3) Resolve pending transactions

primary new primary backup

T3? T3 in “W” state

T3 in “W” state

CS 347 Notes07 36

Failed Nodes: Example

now: primary backup backup

  • down- -down-

later:

-down- -down- -down-

later primary backup

still:

P1 P2 P

P1 P2 P

P1 P2 P -down-

commits T 1

commit T 2 (unaware of T1!)

CS 347 Notes07 43

Guarantee

  • After transaction T commits at primary,

any future primary will “see” T

primary backup 1 backup 2

primary next primary backup 2

now:

T1, T2, T

T1, T2, T3, T

later:

CS 347 Notes07 44

Performance Hit

  • 3PC is very expensive
    • many messages
    • locks held longer (less concurrency)

[Note: group commit may help]

  • Can use 2PC
    • may have blocking
    • 2PC still expensive

[up to 1 second reported]

CS 347 Notes07 45

Alternative: 1-safe (RPWP:LC)

  • Commit transactions unilaterally at primary
  • Send updates to backups as soon as possible

primary backup 1 backup 2

(1) T end work (2) T commit (3) send data (4) purge data

time

CS 347 Notes07 46

Problem: Lost Transactions

primary backup 1 backup 2

primary next primary backup 2

now:

T1, T2, T

T1, T4, T

later:

T1 T

T1, T

CS 347 Notes07 47

Claim

  • Lost transaction problem tolerable
    • failures rare
    • only a “few” transactions lost

CS 347 Notes07 48

Primary Recovery with 1-safe

  • When failed primary recovers, need to

“compensate” for missed transactions

primary backup 2

backup 3 next primary^ backup 2

now:

T1, T2, T

T1, T4, T

later:

T1, T4, T5 T1, T

T1, T4, T

next primary

T1, T2, T3, T3-1, T2-1, T4, T compensation

CS 347 Notes07 49

Log Shipping

  • “Log shipping:” propagate updates to backup
    • Backup replays log
    • How to replay log efficiently?
      • e.g., elevator disk sweeps
      • e.g., avoid overwrites

primary log backup

CS 347 Notes07 50

So Far in Data Replication

  • RAWA
  • ROWA
  • Primary copy
    • static
      • local commit
      • distributed commit
    • mobile primary
      • 2-safe (distributed commit) blocking or non-blocking
      • 1-safe (local commit)

CS 347 Notes07 51

Outline

  • Basic Algorithms
  • Improved (Higher Availability) Algorithms
    • Mobile Primary
    • Available Copies
  • Multiple Fragments & Other Issues

CS 347 Notes07 52

PC-lock available copies

  • Transactions write lock at all available copies
  • Transactions read lock at any available copy
  • Primary site (static) manages

U – set of available copies

X1 X2 X
X

primary down

CS 347 Notes07 53

Update Transaction

(1) Get U from primary

(2) Get write locks from U nodes

(3) Commit at U nodes

C 0

Primary

C 1

Backup

C 2

Backup

Trans T 3 , U={C 0 , C 1 }

U={C 0 , C 1 }
U

updates, 2PC

CS 347 Notes07 54

A potential problem - example

Now:

U={C 0 , C 1 }

-recovering-

C 0

Primary

C 1

Backup

C 2

Backup

Trans T 3 , U={C 0 , C 1 }

I am recovering

CS 347 Notes07 61

Node Recovery (no primary)

  • Get missed updates from any active node
  • No unique sequence of transactions
  • If all nodes fail, wait for - all to recover
    • majority to recover

CS 347 Notes07 62

recovering node

 How much information (update values) must be

remembered? By whom?

Committed: A,B,C,D,E,F

Pending: G

Committed: A,C,B,E,D

Pending: F,G,H

Committed: A,B

Example

CS 347 Notes07 63

Correctness with replicated data

S 1 : r 1 [X 1 ]  r 2 [X 2 ]  w 1 [X 1 ]  w 2 [X 2 ]

 Is this schedule serializable?

X 1 X^2

CS 347 Notes07 64

S 1 : r 1 [X 1 ]  r 2 [X 2 ]  w 1 [X 1 ]  w 2 [X 2 ]

 Is this schedule serializable?

X 1 X^2

One idea: Require transactions to update all copies S 1 : r 1 [X 1 ]  r 2 [X 2 ]  w 1 [X 1 ]  w 2 [X 2 ] w 1 [X 2 ]  w 2 [X 1 ]

CS 347 Notes07 65

S 1 : r 1 [X 1 ]  r 2 [X 2 ]  w 1 [X 1 ]  w 2 [X 2 ]

 Is this schedule serializable?

X 1 X^2

One idea: Require transactions to update all copies S 1 : r 1 [X 1 ]  r 2 [X 2 ]  w 1 [X 1 ]  w 2 [X 2 ] w 1 [X 2 ]  w 2 [X 1 ] (not a good idea for high-availability algorithms)

Another idea: Build in copy-semantics into notion of serializability

CS 347 Notes07 66

One copy serializable (1SR)

A schedule S on replicated data is 1SR if

it is equivalent to a serial history of the

same transactions on a one-copy

database

CS 347 Notes07 67

To check 1SR

  • Take schedule
  • Treat ri[Xj ] as ri[X] Xj is copy of X

w i[Xj ] as w i[X]

  • Compute P(S)
  • If P(S) acyclic, S is 1SR

CS 347 Notes07 68

S 1 : r 1 [X 1 ]  r 2 [X 2 ]  w 1 [X 1 ]  w 2 [X 2 ]

S 1 ’: r 1 [X]  r 2 [X]  w 1 [X]  w 2 [X]

S 1 is not 1SR!

Example

T 1 T 2

T 2 T 1

CS 347 Notes07 69

Second example

S 2 : r 1 [X 1 ]  w 1 [X 1 ]  w 1 [X 2 ]

r 2 [X 1 ]  w 2 [X 1 ]  w 2 [X 2 ]

S 2 ’: r 1 [X]  w 1 [X]  w 1 [X]

r 2 [X]  w 2 [X]  w 2 [X]

P(S 2 ): T 1  T 2

S 2 is 1SR

CS 347 Notes07 70

Second example

S 2 : r 1 [X 1 ]  w 1 [X 1 ]  w 1 [X 2 ]

r 2 [X 1 ]  w 2 [X 1 ]  w 2 [X 2 ]

S 2 ’: r 1 [X]  w 1 [X]  w 1 [X]

r 2 [X]  w 2 [X]  w 2 [X]

  • Equivalent serial schedule

SS : r 1 [X]  w 1 [X]

r 2 [X]  w 2 [X]

CS 347 Notes07 71

Question: Is this a “good” schedule?

S 3 : r 1 [X 1 ]  w 1 [X 1 ]  w 1 [X 2 ]

r 2 [X 1 ]  w 2 [X 1 ]

CS 347 Notes07 72

Question: Is this a “good” schedule?

S 3 : r 1 [X 1 ]  w 1 [X 1 ]  w 1 [X 2 ]

r 2 [X 1 ]  w 2 [X 1 ]

S 3 : r 1 [X]  w 1 [X]  w 1 [X]

r 2 [X]  w 2 [X]

S 3 : r 1 [X]  w 1 [X]

r 2 [X]  w 2 [X]

to be valid schedule, need precedence edge between w1(X) and w2(X)

CS 347 Notes07 79

Outline

  • Basic Algorithms
  • Improved (Higher Availability) Algorithms
    • Mobile Primary
    • Available Copies (and 1SR)
  • Multiple Fragments & Other Issues

CS 347 Notes07 80

Multiple fragments

Fragment 1

Fragment 2

Node 1 Node 2 Node 3 Node 4

  • A transaction spanning multiple fragments must
    • follow locking rules for each fragment
    • commit must involve “majority” in each fragment

CS 347 Notes07 81

  • Careful with update transactions that

read but do not modify a fragment

Example:

F

T T

C F

C

C3 C

Read lock F1 at C1 (^) Read lock F2 at C

CS 347 Notes07 82

C1,C3 fail…

F

T

T

C

F

C

C3 C

Writes F1 at C Commits F1 at C2 (^) Writes F2 at C Commits F2 at C

CS 347 Notes07 83

Equivalent history:

r 1 [F 1 ] r 2 [F 2 ] w 1 [F 2 ] w 2 [F 1 ]

not serializable!

Solution: commit at read sites too

CS 347 Notes07 84

C1,C3 fail…

F

T

T

C

F

C

C3 C

Writes F1 at C Commits F1 at C2 (^) Writes F2 at C Commits F2 at C cannot commit at F because U= {C1} is out of date...

CS 347 Notes07 85

Read-Only Transactions

  • Can provide “weaker correctness”
  • Does not impact values stored in DB

0 0

0 0

C1: primary C2: backup

T1: A  3
T2: B  5
A:
B: B:
A:

CS 347 Notes07 86

Later on:

0 3 0 5

0 3 0

C1: primary C2: backup

T1: A  3
T2: B  5

R1 read transaction sees current sate

R2 read transaction at backup sees “old” but “valid” state

B  5
A:
B: :B
:A

CS 347 Notes07 87

States Are Equivalent

  • States at Primary:
    • no transactions
    • T
    • T1, T
    • T1, T2, T
  • States at Backup:
    • no transactions
    • T
    • T1, T
    • T1, T2, T

CS 347 Notes07 88

States Are Equivalent

  • States at Primary:
    • no transactions
    • T
    • T1, T
    • T1, T2, T
  • States at Backup:
    • no transactions
    • T
    • T1, T
    • T1, T2, T

at this point in time, backup may be behind...

CS 347 Notes07 89

Schedule is Serializable

  • S1 = T1 R1 T2 T3 R2 T4 ...(R1)...(R2)...

CS 347 Notes07 90

Example 2

  • A, B have different primaries now
  • 1-safe protocol used

0 0

0 0

C1 C
T1: A  3
T2: B  5

primary B: B: primary

:A :A

Paxos

  • Paxos is a replicated data management

algorithm supposedly used in

ZooKeeper.

CS 347 Notes07 97

Paxos with 2 Phases

CS 347 Notes07 98

Paxos with 2 Phases

CS 347 Notes07 99

primary (leader)

3 phase commit!

  • uses majority voting
  • can handle partitions