ECE 320 - Performance Optimization and Amdahl's Law: Solutions to Assignment 1, Assignments of Computer Architecture and Organization

Detailed solutions to Assignment 1 in ECE 320, focusing on performance optimization and Amdahl's Law. The solutions show how optimizing different portions of a program affects overall system speedup and why optimizing the common case matters. They apply Amdahl's Law to calculate speedup and to compare optimization strategies, and they examine the relationship between clock rate, MIPS, and execution time.

Typology: Assignments

2024/2025

Uploaded on 02/10/2025

seannn
Assignment 1 - Performance - Solutions
ECE 320
Question 1
(Question 1.17 from Hennessy and Patterson 5th Edition)
Your company has just bought a new Intel Core i5 dual-core processor, and you have been tasked with
optimizing your software for this processor. You will run two applications on this processor, Application A
and Application B. When run together, Application A requires 80% of the resources while Application B
requires only 20%. 40% of Application A and 99% of Application B can be optimized. When a portion of a
program is optimized, that portion is sped up by a factor of two. The application details are summarized in
the table below.
|                      | Application A | Application B |
|----------------------|---------------|---------------|
| Resource Requirement | 80%           | 20%           |
| Optimizable Portion  | 40%           | 99%           |
(a) Assume only Application A is optimized. How much speedup would you achieve with Application A if
it is run in isolation? How much overall system speedup would you observe?
(b) Assume only Application B is optimized. How much speedup would you achieve with Application B if
it is run in isolation? How much overall system speedup would you observe?
(c) What observations can you make between the results from (a) and (b)?
Solution
Recall Amdahl’s Law, which states

$$S = \frac{1}{(1 - f) + \frac{f}{s}}$$

where

  • $S$ represents the speedup,
  • $f$ represents the fraction of the program that is optimized, and
  • $s$ represents the speedup of the optimized portion.
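
For readers who want to see where this formula comes from, the following is a short derivation sketch. It assumes the original execution time, written here as $T_{old}$, splits cleanly into an optimizable fraction $f$ and a remainder that the optimization does not touch. The symbols $T_{old}$ and $T_{new}$ are the ones the solution to Question 2 uses.

```latex
\begin{align*}
T_{new} &= (1 - f)\,T_{old} + \frac{f\,T_{old}}{s} \\
S &= \frac{T_{old}}{T_{new}} = \frac{1}{(1 - f) + \dfrac{f}{s}}
\end{align*}
```

As $s$ grows without bound, $S$ approaches $1 / (1 - f)$, so the unoptimized fraction limits the achievable speedup.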
(a) The speedup of Application A in isolation, $S_A$, is

$$S_A = \frac{1}{(1 - 0.4) + \frac{0.4}{2}} = 1.25$$

The total speedup of the overall system, $S_{tot-A}$, is

$$S_{tot-A} = \frac{1}{(1 - 0.8) + \frac{0.8}{1.25}} \approx 1.19$$

Note that $S_A$ is used as the speedup of the optimized portion, $s$, of the overall system.

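(b) The same approach can be used as in (a), starting with the speedup of Application B in isolation.

$$S_B = \frac{1}{(1 - 0.99) + \frac{0.99}{2}} \approx 1.98$$

The total speedup is

$$S_{tot-B} = \frac{1}{(1 - 0.2) + \frac{0.2}{1.98}} \approx 1.11$$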

(c) $S_{tot-A} = 1.19$ and $S_{tot-B} = 1.11$. Although 99% of Application B is sped up, the overall speedup is lower than in the Application A case, because Application B accounts for only 20% of the system's resources. This demonstrates that, under Amdahl’s Law, the portion with the largest fraction $f$ of total execution time should be optimized first, favouring common-case improvements.
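
The four speedups in Question 1 can be reproduced with a short script. This is a minimal sketch: the function name `amdahl_speedup` and the constant names are illustrative choices, and the numbers are the ones given in the problem statement.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of execution time is sped up by s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# Values from the problem statement.
SHARE_A, SHARE_B = 0.80, 0.20   # share of system resources per application
OPT_A, OPT_B = 0.40, 0.99       # optimizable portion of each application
PORTION_SPEEDUP = 2.0           # speedup of an optimized portion

# Speedup of each application in isolation.
s_a = amdahl_speedup(OPT_A, PORTION_SPEEDUP)
s_b = amdahl_speedup(OPT_B, PORTION_SPEEDUP)

# System speedup: the isolated speedup plays the role of s for the system.
s_tot_a = amdahl_speedup(SHARE_A, s_a)
s_tot_b = amdahl_speedup(SHARE_B, s_b)

print(f"S_A = {s_a:.2f}, S_tot-A = {s_tot_a:.2f}")
print(f"S_B = {s_b:.2f}, S_tot-B = {s_tot_b:.2f}")
```

Applying the same function twice mirrors the two-level use of Amdahl’s Law in the solution: once within an application, and once across the system.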

Question 2

(Question 1.15 from Hennessy and Patterson 5th Edition) Assume that we make an enhancement to a computer that improves some mode of execution by a factor of ten. Enhanced mode is used 50% of the time, measured as a percentage of the execution time when the enhanced mode is in use.

(a) What percentage of the original unenhanced execution time has been converted to enhanced mode?

(b) What is the speedup we have obtained from enhanced mode?

Solution

Recall Amdahl’s Law, which states

$$S = \frac{T_{old}}{T_{new}} = \frac{1}{(1 - f) + \frac{f}{s}}$$

where

  • $S$ represents the speedup,
  • $T_{old}$ and $T_{new}$ represent the application time before and after optimization,
  • $f$ represents the fraction of the program that is optimized, and
  • $s$ represents the speedup of the optimized portion.
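
One way to tie these symbols to the numbers in the problem statement is sketched below. It is an illustration worked directly from the problem statement, not necessarily the route the assignment's own solution takes, and all variable names are assumptions. It normalizes $T_{new}$ to 1, splits it into enhanced and unenhanced parts, and scales the enhanced part back up by $s$ to recover $T_{old}$.

```python
S_ENHANCED = 10.0           # enhanced mode is ten times faster (problem statement)
ENHANCED_SHARE_NEW = 0.50   # share of T_new spent in enhanced mode (problem statement)

t_new = 1.0                                   # normalize the enhanced run to 1
t_enhanced_new = ENHANCED_SHARE_NEW * t_new   # time spent in enhanced mode
t_unenhanced = t_new - t_enhanced_new         # time the enhancement does not touch

# Without the enhancement, the enhanced part would have taken s times longer.
t_enhanced_old = S_ENHANCED * t_enhanced_new
t_old = t_unenhanced + t_enhanced_old

f = t_enhanced_old / t_old   # fraction of T_old converted to enhanced mode
speedup = t_old / t_new      # S = T_old / T_new

# Cross-check against Amdahl's Law written in terms of f and s.
amdahl = 1.0 / ((1.0 - f) + f / S_ENHANCED)

print(f"f = {f:.1%}, S = {speedup:.2f}, Amdahl check = {amdahl:.2f}")
```

Under these assumptions the two routes to $S$ agree, which is a useful sanity check that $f$ is measured against $T_{old}$ and not against $T_{new}$.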

(a) According to the problem statement, enhanced mode accounts for 50% of the execution time measured with the enhancement in use, so the optimized portion takes 50% of $T_{new}$. To express this in terms of $T_{old}$, we use $f$, the fraction of the original (unenhanced) execution time that can be optimized. The value $f \times T_{old}$ corresponds to the amount of execution time of the original application

(a) 1.25 times faster than that of CPU B?

(b) 1.1 times faster than that of CPU B?

Solution

Let t represent the clock cycle time of each CPU.

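CPU A

| Instruction Type | CPI | IC           |
|------------------|-----|--------------|
| Compare          | 1   | 20% × $IC_A$ |
| Branch           | 2   | 20% × $IC_A$ |
| Other            | 1   | 60% × $IC_A$ |

CPU B

| Instruction Type | CPI | IC           |
|------------------|-----|--------------|
| Compare + Branch | 2   | 20% × $IC_A$ |
| Other            | 1   | 60% × $IC_A$ |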

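(a) The total execution time is computed using the data for each CPU in the tables above.

$$
\begin{aligned}
T_A &= t_A \sum_{i=1}^{3} \left[ IC_i \times CPI_i \right] \\
    &= t_A \times IC_A (0.2 \times 1 + 0.2 \times 2 + 0.6 \times 1) \\
    &= 1.2 \times IC_A \times t_A
\end{aligned}
$$

$$
\begin{aligned}
T_B &= t_B \sum_{i=1}^{2} \left[ IC_i \times CPI_i \right] \\
    &= 1.25\, t_A \times IC_A (0.2 \times 2 + 0.6 \times 1) \\
    &= 1.25 \times IC_A \times t_A
\end{aligned}
$$

Because $T_A < T_B$, CPU A is faster.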

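(b) The execution time $T_A$ is the same as in (a).

$$
\begin{aligned}
T_B &= t_B \sum_{i=1}^{2} \left[ IC_i \times CPI_i \right] \\
    &= 1.1\, t_A \times IC_A (0.2 \times 2 + 0.6 \times 1) \\
    &= 1.1 \times IC_A \times t_A
\end{aligned}
$$

Because $T_A > T_B$, CPU B is faster.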
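
The two cases can be checked numerically with a small sketch. It assumes, as the solution above does, that CPU B's instruction counts are expressed as fractions of $IC_A$ and that $t_B$ is 1.25 or 1.1 times $t_A$. The function and variable names are illustrative, and times are reported in units of $IC_A \times t_A$.

```python
# Instruction mixes as (CPI, fraction of IC_A) pairs, taken from the tables above.
CPU_A_MIX = [(1, 0.20), (2, 0.20), (1, 0.60)]   # compare, branch, other
CPU_B_MIX = [(2, 0.20), (1, 0.60)]              # compare + branch, other


def exec_time(mix, cycle_time):
    """Execution time in units of IC_A * t_A: cycle_time * sum(IC_i * CPI_i)."""
    return cycle_time * sum(cpi * frac for cpi, frac in mix)


t_a = exec_time(CPU_A_MIX, cycle_time=1.0)      # t_A is the reference cycle time

for ratio in (1.25, 1.1):                       # t_B = ratio * t_A
    t_b = exec_time(CPU_B_MIX, cycle_time=ratio)
    faster = "CPU A" if t_a < t_b else "CPU B"
    print(f"t_B = {ratio} t_A: T_A = {t_a:.2f}, T_B = {t_b:.2f}, {faster} is faster")
```

Keeping the instruction mix as data makes it easy to try other cycle-time ratios and see where the crossover between the two CPUs falls.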

Question 4

(a) Consider two competing processors. Processor A has a higher clock rate and a higher MIPS (millions of instructions per second) than Processor B. Under what conditions, if any, will Processor A always execute faster than Processor B?

(b) Suppose that there are two implementations of the same instruction set architecture, Machine A and Machine B. The table below shows their effective CPIs for a particular program and the clock cycle time for each machine.

|                  | Machine A | Machine B |
|------------------|-----------|-----------|
| Clock Cycle Time | 20 ns     | 15 ns     |
| CPI              | 1.5       | 1.0       |

Which machine is faster for this program and by how much?

Solution

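(a) The total execution time of a program is $T_{prog} = IC \times CPI \times t_{clk}$, where $t_{clk}$ is the clock cycle time. The CPI can be expressed in terms of MIPS as follows, where $f$ represents the clock rate.

$$MIPS = \frac{f}{CPI \times 10^6} \implies CPI = \frac{f}{MIPS \times 10^6}$$

Substituting into the execution time and using $f \times t_{clk} = 1$ gives

$$
\begin{aligned}
T_{prog} &= IC \times CPI \times t_{clk} \\
         &= IC \times \frac{f}{MIPS \times 10^6} \times t_{clk} \\
         &= IC \times \frac{1}{MIPS \times 10^6}
\end{aligned}
$$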

Therefore, once the MIPS rating is known, the clock rate has no separate effect on overall execution time. The processor with the higher MIPS is always faster if

  • the number of instructions in the program is constant, and
  • both processors use the same benchmarks, ISA, compiler, and OS.
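
The cancellation of the clock rate can be checked numerically. The sketch below uses two hypothetical processors whose clock rates and CPIs are made-up values chosen so that both have the same MIPS rating. None of these numbers come from the assignment, and the function names are illustrative.

```python
def mips(clock_hz: float, cpi: float) -> float:
    """MIPS = f / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


def exec_time(ic: float, cpi: float, clock_hz: float) -> float:
    """T = IC * CPI * t_clk, with t_clk = 1 / f."""
    return ic * cpi / clock_hz


IC = 1_000_000_000  # hypothetical instruction count, same program on both

# Hypothetical processors: different clock rates, same MIPS rating.
fast_clock = {"clock_hz": 4e9, "cpi": 2.0}
slow_clock = {"clock_hz": 2e9, "cpi": 1.0}

for name, p in (("fast clock", fast_clock), ("slow clock", slow_clock)):
    rating = mips(**p)
    t_direct = exec_time(IC, p["cpi"], p["clock_hz"])
    t_from_mips = IC / (rating * 1e6)   # T = IC / (MIPS * 10^6)
    print(f"{name}: MIPS = {rating:.0f}, T = {t_direct:.3f} s, via MIPS = {t_from_mips:.3f} s")
```

Both hypothetical processors finish in the same time despite a twofold difference in clock rate, which is the point of the derivation: for a fixed instruction count, the MIPS rating alone determines execution time.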

(b) The total execution time of a program is $T_{prog} = IC \times CPI \times t_{clk}$, where $t_{clk}$ is the clock cycle time. Given that both Machine A and B execute the same program, the instruction count will be the same; thus, $IC$ is not needed.

$$T_A = 1.5 \times IC \times 20\ \text{ns} = 30 \times IC\ \text{ns}$$

$$T_B = 1.0 \times IC \times 15\ \text{ns} = 15 \times IC\ \text{ns}$$

Since $T_B < T_A$, Machine B is faster.

$$\frac{T_A}{T_B} = \frac{30 \times IC\ \text{ns}}{15 \times IC\ \text{ns}} = 2$$

Therefore, Machine B is twice as fast as Machine A.
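
The comparison can also be written as a few lines of code. This is a minimal sketch with illustrative names. The cycle times and Machine A's CPI come from the table, and Machine B's CPI of 1.0 is taken from the arithmetic in the solution.

```python
CYCLE_NS = {"A": 20.0, "B": 15.0}   # clock cycle time in ns
CPI = {"A": 1.5, "B": 1.0}          # effective CPI for the program

# Time per instruction in ns; the shared instruction count IC cancels in the ratio.
ns_per_instr = {m: CPI[m] * CYCLE_NS[m] for m in CPI}

ratio = ns_per_instr["A"] / ns_per_instr["B"]   # T_A / T_B
print(f"T_A = {ns_per_instr['A']:.0f} x IC ns, T_B = {ns_per_instr['B']:.0f} x IC ns")
print(f"T_A / T_B = {ratio:.1f}")
```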