





An introduction to the concept of sequential consistency (SC) in the context of shared memory multiprocessors. It explains how SC ensures that all processors see the same total order of memory operations and discusses the importance of write atomicity. The document also includes examples and sufficient conditions for achieving SC.
Typology: Slides
Cache coherence protocols implemented in bus-based machines are called snoopy protocols. The processors snoop (monitor) the bus and take appropriate protocol actions based on the snoop results. The cache controller now receives requests from both the processor and the bus. Cache state is maintained on a per-line basis, which also dictates the coherence granularity: a coherence action cannot normally be taken on part of a cache line. The coherence protocol is implemented as a finite state machine on a per-cache-line basis. The snoop logic in each processor grabs the address from the bus and decides whether any action should be taken on the cache line containing that address (only if the line is in the cache).
With write-through caches there are only two cache line states. Invalid (I): not in the cache. Valid (V): present in the cache, and possibly present in other caches as well. A read access to a cache line in the I state generates a BusRd request on the bus. The memory controller responds to the request and, after reading from memory, launches the line on the bus. The requester matches the address, picks up the line from the bus, and fills its cache in the V state. A store to a line always generates a BusWr transaction on the bus (since the cache is write-through); other sharers either invalidate the line in their caches or update it with the new value.
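The two-state machine described above can be sketched in a few lines. This is only an illustrative model, not a description of any real controller: the class names Bus and Cache, the one-word-per-line simplification, the choice of the invalidate variant (rather than update), and the no-write-allocate behaviour on a write miss are all assumptions made for the sketch.

```python
class Bus:
    """Hypothetical shared bus: memory plus every attached cache snoops it."""

    def __init__(self):
        self.memory = {}   # address -> value (one word per line, for brevity)
        self.caches = []   # every cache attached to this bus
        self.log = []      # bus transactions, in the order they appeared

    def bus_rd(self, addr):
        self.log.append(("BusRd", addr))
        return self.memory.get(addr, 0)   # memory supplies the line

    def bus_wr(self, requester, addr, value):
        self.log.append(("BusWr", addr))
        self.memory[addr] = value         # write-through: memory is always updated
        for cache in self.caches:
            if cache is not requester:
                cache.snoop_bus_wr(addr)  # other caches snoop the write


class Cache:
    """Per-line state is implicit: present in self.lines means V, absent means I."""

    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def state(self, addr):
        return "V" if addr in self.lines else "I"

    def load(self, addr):
        if addr not in self.lines:        # read miss in I: BusRd, then fill in V
            self.lines[addr] = self.bus.bus_rd(addr)
        return self.lines[addr]

    def store(self, addr, value):
        if addr in self.lines:            # write hit: update the local copy
            self.lines[addr] = value
        # Assumption: no-write-allocate, so a write miss leaves the line in I.
        self.bus.bus_wr(self, addr, value)  # every store goes on the bus

    def snoop_bus_wr(self, addr):
        self.lines.pop(addr, None)        # V -> I (invalidate variant)


bus = Bus()
c0, c1 = Cache(bus), Cache(bus)
c0.load(0x40)
c1.load(0x40)
c0.store(0x40, 7)
print(c0.state(0x40), c1.state(0x40))  # V I
```

After the store, c1 misses on its next load of the same address and picks up the new value from memory with another BusRd.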
The write-through scheme has a high bandwidth requirement because every write appears on the bus. Assume a 3 GHz processor running an application with 10% store instructions, and assume a CPI of 1. If the application runs for 100 cycles it generates 10 stores; assuming each store is 4 bytes, 40 bytes are generated every 100/3 ns, i.e. a bandwidth of 1.2 GB/s. A 1 GB/s bus cannot support even one processor, and in practice there are multiple processors as well as read misses. Writeback caches absorb most of the write traffic: writes that hit in the cache do not go on the bus (and so are not visible to others). The price is a more complicated coherence protocol with many design choices.
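The arithmetic above can be checked with a short calculation. The function name and parameter names below are made up for this sketch; the numbers are the ones assumed in the text.

```python
def store_bandwidth_bytes_per_sec(clock_hz, cpi, store_fraction, bytes_per_store):
    """Bus bandwidth needed if every store goes on the bus (write-through)."""
    instructions_per_sec = clock_hz / cpi
    stores_per_sec = instructions_per_sec * store_fraction
    return stores_per_sec * bytes_per_store


# 3 GHz clock, CPI of 1, 10% stores, 4 bytes per store
bw = store_bandwidth_bytes_per_sec(3e9, 1.0, 0.10, 4)
print(f"{bw / 1e9:.1f} GB/s")  # 1.2 GB/s
```

Changing the inputs (for example a lower store fraction or a higher CPI) shows how quickly a single processor approaches the capacity of a 1 GB/s bus.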
We need a more formal description of memory ordering. How do we establish the order between reads and writes from different processors? The clearest way is to use synchronization.
P0: A=1; flag=1;
P1: while (!flag); print A;
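The same producer/consumer shape can be written with threads. This is a sketch under assumptions: threading.Event stands in for the flag variable (it is a library synchronization primitive, not a plain shared variable), and the function names p0 and p1 simply mirror the processor labels. It shows the ordering the programmer expects, not how hardware provides it.

```python
import threading

A = 0
flag = threading.Event()   # plays the role of the flag variable
seen = []                  # what P1 "prints"


def p0():
    global A
    A = 1        # write the data first
    flag.set()   # then publish it by raising the flag


def p1():
    flag.wait()      # corresponds to: while (!flag);
    seen.append(A)   # corresponds to: print A


t1 = threading.Thread(target=p1)
t0 = threading.Thread(target=p0)
t1.start()
t0.start()
t0.join()
t1.join()
print(seen)  # [1]
```

Because the Event is set only after A has been written, P1 can never observe the old value of A. The question for the hardware is whether a plain flag variable gives the same guarantee.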
Another example (assume A=0, B=0 initially)
P0: A=1; print B;
P1: B=1; print A;
What do you expect to be printed? A memory consistency model is a contract between the programmer and the hardware regarding memory ordering.
A multiprocessor normally advertises the memory consistency model it supports. This essentially tells the programmer what the possible correct outcomes of a program could be when run on that machine. Cache coherence deals with memory operations to the same location, but not to different locations. Without a formally defined order across all memory operations, it often becomes impossible to argue about what is correct and what is wrong in shared memory. There are various memory consistency models. Sequential consistency (SC) is the most intuitive one and we will focus on it now (more consistency models later).
A total order is achieved by interleaving accesses from different processors. The accesses from the same processor are presented to the memory system in program order. Essentially, the memory system behaves like a randomly moving switch connecting the processors to memory: it picks the next access from a randomly chosen processor. Lamport's definition of SC: a multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
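The "randomly moving switch" picture can be simulated directly. The sketch below is a toy model, not a hardware description: run_switch and the tuple encoding of operations are invented for illustration, and the program is the two-processor example from earlier (P0: A=1; print B; P1: B=1; print A; with A=0 and B=0 initially).

```python
import random


def run_switch(programs, rng):
    """Run one execution: a random switch picks which processor goes next."""
    memory = {"A": 0, "B": 0}
    printed = {}
    pc = [0] * len(programs)   # next operation index for each processor
    while True:
        ready = [p for p in range(len(programs)) if pc[p] < len(programs[p])]
        if not ready:
            return printed
        p = rng.choice(ready)            # the switch moves to a random processor
        kind, var = programs[p][pc[p]]   # operations are taken in program order
        if kind == "write":
            memory[var] = 1
        else:                            # "print": read the current value
            printed[var] = memory[var]
        pc[p] += 1


PROGRAMS = [
    [("write", "A"), ("print", "B")],   # P0
    [("write", "B"), ("print", "A")],   # P1
]

rng = random.Random(0)   # fixed seed so runs are repeatable
outcomes = set()
for _ in range(1000):
    printed = run_switch(PROGRAMS, rng)
    outcomes.add((printed["A"], printed["B"]))
print(sorted(outcomes))
```

However the switch moves, each processor's own operations stay in program order, so the pair (0, 0) never shows up among the outcomes.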
Consider the following example (A=0, B=0 initially)
P0: A=1; print B;
P1: B=1; print A;
Possible outcomes for an SC machine
(A, B) = (0,1); interleaving: B=1; print A; A=1; print B
(A, B) = (1,0); interleaving: A=1; print B; B=1; print A
(A, B) = (1,1); interleavings: A=1; B=1; print A; print B or A=1; B=1; print B; print A
(A, B) = (0,0) is impossible: the read of A must occur before the write of A and the read of B must occur before the write of B, i.e. print A < A=1 and print B < B=1. But program order gives A=1 < print B and B=1 < print A. Thus print B < B=1 < print A < A=1 < print B, which implies print B < print B, a contradiction.
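The case analysis can also be checked by brute force: enumerate every interleaving that respects program order and record what gets printed. This is a small sketch; the tuple encoding of operations and the function name sc_outcomes are assumptions made for illustration.

```python
from itertools import permutations

P0 = [("P0", "write", "A"), ("P0", "print", "B")]
P1 = [("P1", "write", "B"), ("P1", "print", "A")]


def sc_outcomes():
    """Map each printed pair (A, B) to the interleavings that produce it."""
    outcomes = {}
    for order in permutations(P0 + P1):
        # Keep only total orders that respect each processor's program order.
        if order.index(P0[0]) > order.index(P0[1]):
            continue
        if order.index(P1[0]) > order.index(P1[1]):
            continue
        memory = {"A": 0, "B": 0}
        printed = {}
        for _, kind, var in order:
            if kind == "write":
                memory[var] = 1
            else:
                printed[var] = memory[var]
        outcomes.setdefault((printed["A"], printed["B"]), []).append(order)
    return outcomes


results = sc_outcomes()
for pair in sorted(results):
    print(pair, len(results[pair]))
```

Of the six legal interleavings, one yields (0,1), one yields (1,0), and four yield (1,1); none yields (0,0).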
There are two basic requirements. First, memory operations issued by a processor must become visible to others in program order. Second, all processors must see the same total order of memory operations: in the previous example, for the (0,1) case both P0 and P1 should see the same interleaving: B=1; print A; A=1; print B. The tricky part is to make sure that writes become visible in the same order to all processors. This is write atomicity: each write behaves as if it were an atomic operation. Otherwise, two processors may end up using different values (which may still be correct from the viewpoint of cache coherence, but will violate SC).
Example (A=0, B=0 initially)
P0: A=1;
P1: while (!A); B=1;
P2: while (!B); print A;
A correct execution on an SC machine should print A=1. A=0 will be printed only if the write to A is not visible to P2, but clearly it is visible to P1 since P1 came out of its loop. Thus A=0 is possible only if P1 sees the order A=1 < B=1 and P2 sees the order B=1 < A=1, i.e. from the viewpoint of the whole system the write A=1 was not “atomic”. Without write atomicity, P2 may proceed to print 0 using a stale value from its cache.
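A toy model makes the role of write atomicity concrete. The sketch below treats the arrival of a written value at each other processor as a separate event and enumerates the possible orders. The event names, the function printed_values, and the way atomicity is modelled (P1 may not issue B=1 until the write of A is visible to P2 as well) are all assumptions made for illustration; P0's write A=1 is taken to have been issued before any listed event.

```python
from itertools import permutations

# Events after P0 has issued A=1. "X->Pn" means the new value of X
# becomes visible to processor Pn.
EVENTS = ("A->P1", "A->P2", "P1: B=1", "B->P2", "P2: print A")


def printed_values(write_atomic):
    """Return the set of values P2 can print under this toy model."""
    results = set()
    for order in permutations(EVENTS):
        pos = {event: i for i, event in enumerate(order)}
        # Causal chain: P1 leaves its loop only after seeing A=1, then writes
        # B; P2 leaves its loop only after seeing B=1, then prints A.
        if not (pos["A->P1"] < pos["P1: B=1"] < pos["B->P2"] < pos["P2: print A"]):
            continue
        # Write atomicity (as modelled here): A=1 must be visible to every
        # processor before P1 is allowed to act on it.
        if write_atomic and not pos["A->P2"] < pos["P1: B=1"]:
            continue
        # P2 prints 1 only if A=1 reached it before the print; otherwise it
        # reads the stale value 0.
        results.add(1 if pos["A->P2"] < pos["P2: print A"] else 0)
    return results


print(printed_values(write_atomic=True))    # {1}
print(printed_values(write_atomic=False))   # {0, 1}
```

With atomic writes the only printable value is 1; once the two arrivals of A=1 are allowed to drift apart, the stale value 0 becomes possible.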
Program order from each processor creates a partial order among memory operations. Interleaving of these partial orders defines a total order. A sequentially consistent execution corresponds to one of the many possible total orders. A multiprocessor is said to be SC if any execution on this machine is SC compliant. Sufficient but not necessary conditions for SC: (1) every processor issues memory operations in program order; (2) every processor waits for a write to complete before issuing the next operation; (3) every processor waits for a read to complete, and for the write that affects the returned value to complete, before issuing the next operation (important for write atomicity).