Understanding Sequential Consistency in Shared Memory Multiprocessors, Slides of Computer Science

An introduction to sequential consistency (SC) in shared memory multiprocessors. It explains how SC ensures that all processors see the same total order of memory operations and discusses the importance of write atomicity. The document also includes examples and sufficient conditions for achieving SC.

Typology: Slides

2012/2013

Uploaded on 03/28/2013

ekanath 🇮🇳

file:///D|/...audhary,%20Dr.%20Sanjeev%20K%20Aggrwal%20&%20Dr.%20Rajat%20Moona/Multi-core_Architecture/lecture11/11_1.htm[6/14/2012 11:57:04 AM]
Module 6: Shared Memory Multiprocessors: Consistency and Coherence
Lecture 11: Introduction to Snoopy Coherence
The Lecture Contains:
Snoopy Protocols
Write Through Caches
State Transition
Ordering Memory op
Write Through is Bad
Memory Consistency
Consistency Model
Sequential Consistency
What is Program Order?
OOO and SC
SC Example
Implementing SC
Write Atomicity
Summary of SC
Back to Shared Bus

Snoopy Protocols

Cache coherence protocols implemented in bus-based machines are called snoopy protocols. The processors snoop (monitor) the bus and take appropriate protocol actions based on the snoop results. The cache controller now receives requests from both the processor and the bus.

Cache state is maintained on a per-line basis, and that also dictates the coherence granularity: a coherence action cannot normally be taken on part of a cache line. The coherence protocol is implemented as a finite state machine on a per-cache-line basis. The snoop logic in each processor grabs the address from the bus and decides whether any action should be taken on the cache line containing that address (only if the line is in the cache).

Write Through Caches

There are only two cache line states:
Invalid (I): not in cache
Valid (V): present in cache, may be present in other caches also

A read access to a cache line in I state generates a BusRd request on the bus. The memory controller responds to the request and, after reading from memory, launches the line on the bus. The requester matches the address, picks up the line from the bus, and fills the cache in V state.

A store to a line always generates a BusWr transaction on the bus (since the cache is write through); other sharers either invalidate the line in their caches or update the line with the new value.
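The two-state protocol described above can be sketched as a small simulation. This is only an illustrative model, not the lecture's implementation: the class names (`Bus`, `Cache`, `State`) are hypothetical, addresses stand for whole cache lines, the invalidation variant is chosen over the update variant, and a store to a line in I state is assumed not to allocate the line.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"  # line not in cache
    VALID = "V"    # line present; other caches may also hold it


class Bus:
    """Hypothetical shared bus: every transaction is seen by all caches."""

    def __init__(self, memory):
        self.memory = memory      # dict: line address -> value
        self.caches = []
        self.log = []             # transactions that appeared on the bus

    def bus_rd(self, addr):
        self.log.append(("BusRd", addr))
        return self.memory.get(addr, 0)   # memory controller supplies the line

    def bus_wr(self, writer, addr, value):
        self.log.append(("BusWr", addr))
        self.memory[addr] = value         # write through: memory is updated
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_bus_wr(addr)  # other caches snoop the write


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}           # line address -> (State, value)
        bus.caches.append(self)

    def state(self, addr):
        return self.lines.get(addr, (State.INVALID, None))[0]

    def load(self, addr):
        state, value = self.lines.get(addr, (State.INVALID, None))
        if state is State.INVALID:        # read miss: I -> V via BusRd
            value = self.bus.bus_rd(addr)
            self.lines[addr] = (State.VALID, value)
        return value

    def store(self, addr, value):
        # Assumption: no write-allocate, so a store to an I line leaves it
        # in I. A V line is updated in place. Either way BusWr is issued.
        if self.state(addr) is State.VALID:
            self.lines[addr] = (State.VALID, value)
        self.bus.bus_wr(self, addr, value)

    def snoop_bus_wr(self, addr):
        # Invalidation-based variant: V -> I when another cache writes.
        if self.state(addr) is State.VALID:
            self.lines[addr] = (State.INVALID, None)


bus = Bus(memory={0x40: 0})
c0, c1 = Cache(bus), Cache(bus)
c0.load(0x40)                        # c0: I -> V
c1.load(0x40)                        # c1: I -> V
c0.store(0x40, 7)                    # BusWr; c1 snoops and invalidates
value_seen_by_c1 = c1.load(0x40)     # c1 misses again and refetches
```

In this run every store shows up in `bus.log`, which is the property that makes write-through caches simple to keep coherent and expensive in bus bandwidth.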

Write Through is Bad

High bandwidth requirement: every write appears on the bus. Assume a 3 GHz processor running an application with 10% store instructions and a CPI of 1. If the application runs for 100 cycles it generates 10 stores; at 4 bytes per store, 40 bytes are generated every 100/3 ns, i.e. a bandwidth of 1.2 GB/s. A 1 GB/s bus cannot support even one processor, and there are multiple processors and also read misses.

Writeback caches absorb most of the write traffic. Writes that hit in the cache do not go on the bus (and are not visible to others). The cost is a complicated coherence protocol with many choices.
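The write-through bandwidth estimate can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are the ones assumed in the text (3 GHz, CPI of 1, 10% stores, 4 bytes per store, a 1 GB/s bus).

```python
clock_hz = 3e9            # 3 GHz processor
cpi = 1.0                 # cycles per instruction
store_fraction = 0.10     # 10% of instructions are stores
bytes_per_store = 4
bus_bandwidth = 1e9       # 1 GB/s bus, in bytes per second

instructions_per_second = clock_hz / cpi
stores_per_second = instructions_per_second * store_fraction
write_bandwidth = stores_per_second * bytes_per_store   # bytes per second

print(f"write traffic: {write_bandwidth / 1e9:.1f} GB/s")
print("bus saturated by one processor:", write_bandwidth > bus_bandwidth)
```

The result is the same 1.2 GB/s as the per-100-cycle calculation: 40 bytes every 100/3 ns.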

Memory Consistency

A more formal description of memory ordering is needed. How is the order established between reads and writes from different processors? The clearest way is to use synchronization:

P0: A=1; flag=1;
P1: while (!flag); print A;

Another example (assume A=0, B=0 initially):

P0: A=1; print B;
P1: B=1; print A;

What do you expect? A memory consistency model is a contract between the programmer and the hardware regarding memory ordering.

Consistency Model

A multiprocessor normally advertises the memory consistency model it supports. This essentially tells the programmer what the possible correct outcomes of a program could be when run on that machine. Cache coherence deals with memory operations to the same location, but not to different locations. Without a formally defined order across all memory operations, it often becomes impossible to argue about what is correct and what is wrong in shared memory.

There are various memory consistency models. Sequential consistency (SC) is the most intuitive one and is the focus here (more consistency models later).

Sequential Consistency

A total order is achieved by interleaving accesses from different processors. The accesses from the same processor are presented to the memory system in program order. Essentially, the system behaves like a randomly moving switch connecting the processors to memory: the switch picks the next access from a randomly chosen processor.

Lamport's definition of SC: a multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
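The "randomly moving switch" picture can be sketched as follows. This is a toy model under stated assumptions: `sc_switch` is a hypothetical helper, operations are plain strings, and a seeded `random.Random` stands in for the switch so that a run is repeatable.

```python
import random


def sc_switch(programs, seed=0):
    """Interleave per-processor programs the way the switch model does.

    programs: dict mapping a processor name to its list of operations,
    already in program order. Returns one total order of (processor, op).
    """
    rng = random.Random(seed)
    next_index = {p: 0 for p in programs}    # next op to issue, per processor
    total_order = []
    while True:
        ready = [p for p in programs if next_index[p] < len(programs[p])]
        if not ready:
            return total_order
        p = rng.choice(ready)                # the switch picks a processor
        total_order.append((p, programs[p][next_index[p]]))
        next_index[p] += 1                   # program order is preserved


programs = {"P0": ["A=1", "print B"], "P1": ["B=1", "print A"]}
order = sc_switch(programs, seed=1)
for processor, op in order:
    print(processor, op)
```

Whatever the seed, each processor's operations appear in the total order exactly as its program lists them; only the interleaving between processors changes.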

SC Example

Consider the following example (A=0, B=0 initially):

P0: A=1; print B;
P1: B=1; print A;

Possible outcomes for an SC machine, written as the printed values (A, B):

(A, B) = (0,1); interleaving: B=1; print A; A=1; print B
(A, B) = (1,0); interleaving: A=1; print B; B=1; print A
(A, B) = (1,1); interleaving: A=1; B=1; print A; print B, or A=1; B=1; print B; print A

(A, B) = (0,0) is impossible. To print 0 for both, the read of A must occur before the write of A and the read of B must occur before the write of B, i.e. print A < A=1 and print B < B=1. Program order gives A=1 < print B and B=1 < print A. Thus print B < B=1 < print A < A=1 < print B, which implies print B < print B, a contradiction.
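The same conclusion can be checked by brute force. The sketch below enumerates every interleaving of the two programs that preserves program order and records the printed values; the function name `sc_outcomes` and the tuple encoding of operations are illustrative choices, not part of the lecture.

```python
from itertools import combinations


def sc_outcomes(p0, p1):
    """Run every interleaving of two programs that keeps program order.

    Each op is ("write", var, value) or ("read", var).
    Returns {(printed A, printed B): [interleavings producing it]}.
    """
    n = len(p0) + len(p1)
    results = {}
    for slots in combinations(range(n), len(p0)):   # positions taken by P0
        it0, it1 = iter(p0), iter(p1)
        order = [next(it0) if i in slots else next(it1) for i in range(n)]
        memory = {"A": 0, "B": 0}                    # A=0, B=0 initially
        printed = {}
        for op in order:
            if op[0] == "write":
                memory[op[1]] = op[2]
            else:
                printed[op[1]] = memory[op[1]]       # value seen by the print
        outcome = (printed["A"], printed["B"])
        results.setdefault(outcome, []).append(order)
    return results


P0 = [("write", "A", 1), ("read", "B")]   # A=1; print B
P1 = [("write", "B", 1), ("read", "A")]   # B=1; print A

outcomes = sc_outcomes(P0, P1)
for outcome in sorted(outcomes):
    print(outcome, len(outcomes[outcome]), "interleaving(s)")
```

There are six legal interleavings in total: one yields (0,1), one yields (1,0), and four yield (1,1). None yields (0,0), in agreement with the contradiction argument.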

Implementing SC

Two basic requirements:
Memory operations issued by a processor must become visible to others in program order.
All processors must see the same total order of memory operations. In the previous example, for the (0,1) case both P0 and P1 should see the same interleaving: B=1; print A; A=1; print B

The tricky part is to make sure that writes become visible in the same order to all processors. This is write atomicity: each write behaves as if it were an atomic operation. Otherwise, two processors may end up using different values (which may still be correct from the viewpoint of cache coherence, but will violate SC).

Write Atomicity

Example (A=0, B=0 initially)

P0: A=1;
P1: while (!A); B=1;
P2: while (!B); print A;

A correct execution on an SC machine should print A=1. A=0 will be printed only if the write to A is not visible to P2, but clearly it is visible to P1, since P1 came out of the loop. Thus A=0 is possible only if P1 sees the order A=1 < B=1 and P2 sees the order B=1 < A=1, i.e. from the viewpoint of the whole system the write A=1 was not “atomic”. Without write atomicity, P2 may proceed to print 0 using a stale value from its cache.
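A rough way to see the stale-value scenario is to give each processor its own copy of memory and let a write reach the copies at different times. The sketch below is a hand-picked schedule, not a model of real hardware: the `run` function, the per-processor `views`, and the `slow_targets` parameter are all assumptions introduced for illustration.

```python
def run(atomic_writes):
    """Return the value of A printed by P2 in one hand-picked schedule.

    Each processor reads from its own copy of memory (a stand-in for its
    cache). A write is delivered to every copy at once when writes are
    atomic; otherwise delivery to the listed slow targets is delayed.
    """
    views = {p: {"A": 0, "B": 0} for p in ("P0", "P1", "P2")}
    delayed = []                      # (target, var, value) not yet delivered

    def write(var, value, slow_targets=()):
        for p in views:
            if p in slow_targets and not atomic_writes:
                delayed.append((p, var, value))   # target keeps its stale copy
            else:
                views[p][var] = value

    # P0: A=1, with the update to P2's copy arriving late.
    write("A", 1, slow_targets=("P2",))
    # P1: while (!A); B=1. P1 already sees A=1, so it leaves the loop.
    assert views["P1"]["A"] == 1
    write("B", 1)
    # P2: while (!B); print A. P2 sees B=1 and leaves the loop.
    assert views["P2"]["B"] == 1
    printed = views["P2"]["A"]
    # The delayed update finally arrives, too late for the print.
    for p, var, value in delayed:
        views[p][var] = value
    return printed


print("atomic writes:     P2 prints", run(atomic_writes=True))
print("non-atomic writes: P2 prints", run(atomic_writes=False))
```

With atomic writes P2 prints 1; with the delayed delivery P2 prints 0, because it observes B=1 before A=1 while P1 observed A=1 before writing B.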

Summary of SC

Program order from each processor creates a partial order among memory operations. An interleaving of these partial orders defines a total order. A sequentially consistent execution corresponds to one of these many possible total orders. A multiprocessor is said to be SC if any execution on this machine is SC compliant.

Sufficient but not necessary conditions for SC:
Issue memory operations in program order.
Every processor waits for a write to complete before issuing the next operation.
Every processor waits for a read to complete, and for the write that affects the returned value to complete, before issuing the next operation (important for write atomicity).