An in-depth exploration of cache coherence in multi-core architecture. The lecture covers four organizations of shared memory systems: bus-based SMP, shared cache, dancehall, and distributed shared memory. It discusses the advantages and disadvantages of each organization and explains the concept of cache coherence, which ensures that all processors see the same memory values. The lecture also includes examples and explanations of write-through and writeback caches, as well as the importance of ordering memory operations.
Shared cache
The switch is a simple controller for granting access to cache banks
Interconnect is between the processors and the shared cache
Which level of cache hierarchy is shared depends on the design: chip multiprocessors today normally share the outermost level (L2 or L3 cache)
The cache and memory are interleaved to improve bandwidth by allowing multiple concurrent accesses
Normally small scale due to heavy bandwidth demand on the switch and the shared cache
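As a rough illustration of interleaving, the sketch below computes which bank a physical address maps to when consecutive cache-line-sized blocks are spread round-robin across banks. The constants (64-byte lines, 8 banks) are assumptions chosen for the example, not values from the lecture.

#include <stdint.h>
#include <stdio.h>

/* Illustrative parameters, not from the lecture. */
#define LINE_SIZE  64u   /* bytes per cache line               */
#define NUM_BANKS  8u    /* banks in the shared cache / memory */

/* Consecutive cache lines are spread round-robin across banks,
 * so independent accesses can proceed in parallel. */
static unsigned bank_of(uint64_t paddr)
{
    return (unsigned)((paddr / LINE_SIZE) % NUM_BANKS);
}

int main(void)
{
    /* Two addresses one line apart land in different banks. */
    printf("bank of 0x1000 = %u\n", bank_of(0x1000)); /* bank 0 */
    printf("bank of 0x1040 = %u\n", bank_of(0x1040)); /* bank 1 */
    return 0;
}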
Bus-based SMP
Scalability is limited by the shared bus bandwidth
Interconnect is a shared bus located between the private cache hierarchies and the memory controller
The most popular organization for small to medium-scale servers
Possible to connect 30 or so processors with smart bus design
Bus bandwidth requirement is lower compared to the shared cache approach. Why?
In all four organizations caches play an important role in reducing latency and bandwidth requirements
If an access is satisfied in cache, the transaction will not appear on the interconnect and hence the bandwidth requirement of the interconnect will be lower (a shared L1 cache does not have this advantage)
In distributed shared memory (DSM) the cache and local memory should be used cleverly
Bus-based SMP and DSM are the two designs supported today by industry vendors
In bus-based SMP every cache miss is launched on the shared bus so that all processors can see all transactions
In DSM this is not the case
Possible to combine bus-based SMP and DSM to build hierarchical shared memory
Sun Wildfire connects four large SMPs (28 processors each) over a scalable interconnect to form a 112-processor multiprocessor
IBM POWER4 has two processors on-chip with private L1 caches but shared L2 and L3 caches (this is called a chip multiprocessor); connecting these chips over a network forms scalable multiprocessors
The next few lectures will focus on bus-based SMPs only
Intuitive memory model
For sequential programs we expect a memory location to return the latest value written to that location
For concurrent programs running as multiple threads or processes on a single processor we expect the same model to hold, because all threads see the same cache hierarchy (same as a shared L1 cache)
For multiprocessors there remains a danger of using a stale value: in SMP or DSM the caches are not shared and processors are allowed to replicate data independently in each cache; hardware must ensure that cached values are coherent across the system and satisfy the programmers' intuitive memory model
Assume a write-through cache, i.e. every store updates the value in the cache as well as in memory
P0: reads x from memory, puts it in its cache, and gets the value 5
P1: reads x from memory, puts it in its cache, and gets the value 5
P1: writes x=7, updating its cached value and the memory value
P0: reads x from its cache and gets the stale value 5
P2: reads x from memory, puts it in its cache, and gets the value 7 (now the system is completely incoherent)
P2: writes x=10, updating its cached value and the memory value
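To make the staleness concrete, here is a minimal sketch (my own toy simulation, not part of the lecture) of three private write-through caches with no invalidation, replaying the trace above; P0's second read returns the stale value 5.

#include <stdio.h>

/* Toy model: one shared location x, three private single-entry caches.
 * Write-through: a store updates the writer's cache and memory, but no
 * other cache is invalidated or updated -- exactly the broken setup above. */
static int memory_x = 5;
static int cache_x[3];
static int cached[3];        /* 1 if processor p has x in its cache */

static int read_x(int p)
{
    if (!cached[p]) {                 /* miss: fetch from memory */
        cache_x[p] = memory_x;
        cached[p] = 1;
    }
    return cache_x[p];                /* hit: may return a stale value */
}

static void write_x(int p, int v)
{
    cache_x[p] = v;                   /* update own cache ...            */
    cached[p] = 1;
    memory_x = v;                     /* ... and write through to memory */
}

int main(void)
{
    printf("P0 reads x -> %d\n", read_x(0));          /* 5 */
    printf("P1 reads x -> %d\n", read_x(1));          /* 5 */
    write_x(1, 7);                                    /* P1 writes x = 7 */
    printf("P0 reads x -> %d (stale)\n", read_x(0));  /* still 5 */
    printf("P2 reads x -> %d\n", read_x(2));          /* 7, from memory */
    write_x(2, 10);                                   /* P2 writes x = 10 */
    return 0;
}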
Need to formalize the intuitive memory model
In sequential programs the order of reads and writes is defined by the program order; the notion of the "last write" is well-defined
For multiprocessors, how do you define the "last write to a memory location" in the presence of independent caches? Within a processor it is still fine, but how do you order reads and writes across processors?
Memory operation: a read (load), a write (store), or a read-modify-write; assumed to take place atomically
A memory operation is said to issue when it leaves the issue queue and looks up the cache
A memory operation is said to perform with respect to a processor when that processor can tell it has happened, relative to its other issued memory operations
A read is said to perform with respect to a processor when subsequent writes issued by that processor cannot affect the returned read value
A write is said to perform with respect to a processor when a subsequent read from that processor to the same address returns the new value
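The three kinds of operation also exist at the language level; the sketch below uses C11 atomics purely as an illustration (my choice, the lecture itself stays at the hardware level). The fetch-add is a read-modify-write that must appear atomic: no other operation can slip between its read and its write.

#include <stdatomic.h>
#include <stdio.h>

int main(void)
{
    atomic_int x = 5;

    int r   = atomic_load(&x);           /* read (load)                       */
    atomic_store(&x, 7);                  /* write (store)                     */
    int old = atomic_fetch_add(&x, 3);    /* read-modify-write, done atomically:
                                             reads 7 and writes 10 as one
                                             indivisible operation             */
    printf("r=%d old=%d new=%d\n", r, old, atomic_load(&x));
    return 0;
}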
A memory operation is said to complete when it has performed with respect to all processors in the system
Assume that there is a single shared memory and no caches
Memory operations complete in shared memory when they access the corresponding memory locations
Operations from the same processor complete in program order: this imposes a partial order among the memory operations
Operations from different processors are interleaved in such a way that the program order is maintained for each processor: memory imposes some total order (many are possible)
P0: x=8; u=y; v=9;
P1: r=5; y=4; t=v;
Legal total order: x=8; u=y; r=5; y=4; t=v; v=9;
Another legal total order: x=8; r=5; y=4; u=y; v=9; t=v;
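A legal total order is simply an interleaving of the two program orders that keeps each one intact. The sketch below (my own illustration, not from the lecture) enumerates all such interleavings for the two three-operation sequences above; it prints C(6,3) = 20 total orders, including the two listed.

#include <stdio.h>

/* Operations of P0 and P1 in program order (from the example above). */
static const char *p0[] = { "x=8", "u=y", "v=9" };
static const char *p1[] = { "r=5", "y=4", "t=v" };
static const char *out[6];
static int count = 0;

/* Recursively merge the two sequences, preserving each program order. */
static void interleave(int i, int j, int k)
{
    if (i == 3 && j == 3) {
        printf("%2d: ", ++count);
        for (int n = 0; n < 6; n++) printf("%s; ", out[n]);
        printf("\n");
        return;
    }
    if (i < 3) { out[k] = p0[i]; interleave(i + 1, j, k + 1); }
    if (j < 3) { out[k] = p1[j]; interleave(i, j + 1, k + 1); }
}

int main(void)
{
    interleave(0, 0, 0);   /* prints all C(6,3) = 20 legal total orders */
    return 0;
}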
"Last" means the most recent in some legal total order
A system is coherent if:
Reads get the last written value in the total order
All processors see writes to a location in the same order
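A minimal sketch (not from the lecture) of the first condition: walk a candidate total order of operations on one location and check that every read returns the value of the most recent preceding write. Applied to the write-through trace above, the check fails because P0's stale read of 5 follows P1's write of 7.

#include <stdio.h>

/* One memory operation on a single location: a write of 'value', or a
 * read that observed 'value'. */
struct op { int is_write; int value; };

/* Returns 1 if every read in the total order returns the most recently
 * written value (the first coherence condition), 0 otherwise. */
static int coherent(const struct op *order, int n, int initial)
{
    int last = initial;
    for (int i = 0; i < n; i++) {
        if (order[i].is_write)
            last = order[i].value;
        else if (order[i].value != last)
            return 0;   /* a read saw something other than the last write */
    }
    return 1;
}

int main(void)
{
    /* The write-through trace: reads observe 5, 5, (write 7), 5, 7, (write 10). */
    struct op trace[] = {
        {0, 5}, {0, 5}, {1, 7}, {0, 5}, {0, 7}, {1, 10}
    };
    printf("coherent = %d\n", coherent(trace, 6, 5));  /* prints 0 */
    return 0;
}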