



Detailed solutions to Assignment 1 in ECE 320, focusing on performance optimization concepts and Amdahl's Law. The solutions show how optimizing different portions of a program affects overall system speedup and why common-case improvements matter most. They apply Amdahl's Law to calculate speedup and compare optimization strategies, and they examine the relationship between clock rate, MIPS, and execution time.
Typology: Assignments
(Question 1.17 from Hennessy and Patterson 5th Edition) Your company has just bought a new Intel Core i5 dual-core processor, and you have been tasked with optimizing your software for this processor. You will run two applications on this processor, Application A and Application B. When run together, Application A requires 80% of the resources while Application B requires only 20%. 40% of Application A and 99% of Application B can be optimized. When a portion of a program is optimized, that portion is sped up by a factor of two. The application details are summarized in the table below.
                        Application A    Application B
Resource Requirement    80%              20%
Optimizable Portion     40%              99%
(a) Assume only Application A is optimized. How much speedup would you achieve with Application A if it is run in isolation? How much overall system speedup would you observe?
(b) Assume only Application B is optimized. How much speedup would you achieve with Application B if it is run in isolation? How much overall system speedup would you observe?
(c) What observations can you make between the results from (a) and (b)?
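Recall Amdahl's Law, which states
$$S = \frac{1}{(1 - f) + \frac{f}{s}}$$
where $f$ is the fraction of the original execution time that can be optimized and $s$ is the speedup of that portion.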
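(a) The speedup of Application A in isolation, $S_A$, is
$$S_A = \frac{1}{(1 - 0.4) + \frac{0.4}{2}} = \frac{1}{0.8} = 1.25$$
The total speedup of the overall system, $S_{tot-A}$, is
$$S_{tot-A} = \frac{1}{(1 - 0.8) + \frac{0.8}{1.25}} = \frac{1}{0.84} \approx 1.19$$
Note that $S_A$ is used as the speedup of the optimized portion, $s$, of the overall system.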
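(b) The same approach can be used as in (a), starting with the speedup of Application B in isolation.
$$S_B = \frac{1}{(1 - 0.99) + \frac{0.99}{2}} = \frac{1}{0.505} \approx 1.98$$
The total speedup is
$$S_{tot-B} = \frac{1}{(1 - 0.2) + \frac{0.2}{1.98}} \approx \frac{1}{0.901} \approx 1.11$$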
(c) $S_{tot-A} = 1.19$ and $S_{tot-B} = 1.11$. Although 99% of Application B was sped up, the overall speedup is lower than in the Application A case, because Application B accounts for only 20% of the system's resources. This demonstrates that, under Amdahl's Law, the portion with the highest $f$ should be chosen for optimization, favouring common-case improvements.
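The arithmetic in (a) through (c) can be checked with a short Python sketch. The function name `amdahl_speedup` and the variable names are illustrative choices, not part of the original solution; the numeric inputs come from the problem statement.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of execution time is sped up by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


# Values taken from the problem statement.
share_a, share_b = 0.80, 0.20   # resource share of each application
opt_a, opt_b = 0.40, 0.99       # optimizable fraction of each application
portion_speedup = 2.0           # optimized portions run twice as fast

# Speedup of each application when run in isolation.
s_a = amdahl_speedup(opt_a, portion_speedup)
s_b = amdahl_speedup(opt_b, portion_speedup)

# System-level speedup: the isolated speedup plays the role of s,
# and the application's resource share plays the role of f.
s_tot_a = amdahl_speedup(share_a, s_a)
s_tot_b = amdahl_speedup(share_b, s_b)

print(f"S_A = {s_a:.2f}, S_tot-A = {s_tot_a:.2f}")
print(f"S_B = {s_b:.2f}, S_tot-B = {s_tot_b:.2f}")
```

Applying the same function twice mirrors the nested use of Amdahl's Law in the solution: once within the application, then once across the whole system.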
(Question 1.15 from Hennessy and Patterson 5th Edition) Assume that we make an enhancement to a computer that improves some mode of execution by a factor of ten. Enhanced mode is used 50% of the time, measured as a percentage of the execution time when the enhanced mode is in use.
(a) What percentage of the original unenhanced execution time has been converted to enhanced mode?
(b) What is the speedup we have obtained from enhanced mode?
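Recall Amdahl's Law, which states
$$S = \frac{T_{old}}{T_{new}} = \frac{1}{(1 - f) + \frac{f}{s}}$$
where $f$ is the fraction of the original execution time that can be enhanced and $s$ is the speedup of the enhanced portion.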
(a) Based on the problem statement, enhanced mode is used 50% of the time when the enhanced mode is in use. This means the execution time of the optimized portion is 50% of $T_{new}$. To get an expression in terms of $T_{old}$, we have $f$, which is the fraction of the original application (without enhanced mode) that can be optimized. The value $f \times T_{old}$ corresponds to the amount of execution time of the original application
(a) 1.25 times faster than that of CPU B?
(b) 1.1 times faster than that of CPU B?
Let t represent the clock cycle time of each CPU.
CPU A
Instruction Type     CPI    IC
Compare              1      20% × IC_A
Branch               2      20% × IC_A
Other                1      60% × IC_A

CPU B
Instruction Type     CPI    IC
Compare + Branch     2      20% × IC_A
Other                1      60% × IC_A
(a) The total execution time is computed using the data for each CPU in the tables above.
$$T_A = t_A \sum_i [IC_i \times CPI_i] = t_A \times IC_A (0.2 \times 1 + 0.2 \times 2 + 0.6 \times 1) = 1.2 \times IC_A \times t_A$$
$$T_B = t_B \sum_i [IC_i \times CPI_i] = 1.25\, t_A \times IC_A (0.2 \times 2 + 0.6 \times 1) = 1.25 \times IC_A \times t_A$$
Because $T_A < T_B$, CPU A is faster.
(b) $T_A$ is the same as in (a).
$$T_B = t_B \sum_i [IC_i \times CPI_i] = 1.1\, t_A \times IC_A (0.2 \times 2 + 0.6 \times 1) = 1.1 \times IC_A \times t_A$$
Because $T_A > T_B$, CPU B is faster.
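Both cases can be reproduced with a small Python sketch that expresses every execution time in units of IC_A × t_A. The helper names are hypothetical, and the clock ratios t_B = 1.25 t_A and t_B = 1.1 t_A are taken from the working above rather than from the full question text, which is not shown here.

```python
# (CPI, fraction of IC_A) per instruction type, from the tables above.
cpu_a_mix = {"compare": (1, 0.20), "branch": (2, 0.20), "other": (1, 0.60)}
cpu_b_mix = {"compare+branch": (2, 0.20), "other": (1, 0.60)}


def cycles_per_ic_a(mix):
    """Total cycles divided by IC_A: the sum of CPI_i * (IC_i / IC_A)."""
    return sum(cpi * fraction for cpi, fraction in mix.values())


def relative_time(mix, clock_ratio):
    """Execution time in units of IC_A * t_A, where clock_ratio = t / t_A."""
    return clock_ratio * cycles_per_ic_a(mix)


t_a = relative_time(cpu_a_mix, 1.0)          # CPU A is the reference clock
t_b_case_a = relative_time(cpu_b_mix, 1.25)  # part (a): t_B = 1.25 t_A
t_b_case_b = relative_time(cpu_b_mix, 1.10)  # part (b): t_B = 1.1 t_A

for label, t_b in (("a", t_b_case_a), ("b", t_b_case_b)):
    faster = "CPU A" if t_a < t_b else "CPU B"
    print(f"({label}) T_A = {t_a:.2f}, T_B = {t_b:.2f} -> {faster} is faster")
```

Normalizing by IC_A × t_A keeps the comparison independent of the actual instruction count and clock period, which the problem does not specify.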
(a) Consider two competing processors. Processor A has a higher clock rate and a higher MIPS (millions of instructions per second) than Processor B. Under what conditions, if any, will Processor A always execute faster than Processor B?
(b) Suppose that there are two implementations of the same instruction set architecture, Machine A and Machine B. The table below shows their effective CPIs for a particular program and the clock cycle time for each machine.
                    Machine A    Machine B
Clock Cycle Time    20 ns        15 ns
CPI                 1.5          1.0
Which machine is faster for this program and by how much?
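(a) The total execution time of a program is $T_{prog} = IC \times CPI \times t_{clk}$, where $t_{clk}$ is the clock cycle time. The CPI can be expressed in terms of MIPS as follows, where $f$ represents the clock rate.
$$MIPS = \frac{f}{CPI \times 10^6} \implies CPI = \frac{f}{MIPS \times 10^6}$$
$$T_{prog} = IC \times CPI \times t_{clk} = IC \times \frac{f}{MIPS \times 10^6} \times t_{clk} = \frac{IC}{MIPS \times 10^6}$$
The last step uses $f \times t_{clk} = 1$.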
Therefore, the clock rate does not have an effect on overall execution time. The processor with the higher MIPS is always faster if
(b) The total execution time of a program is $T_{prog} = IC \times CPI \times t_{clk}$, where $t_{clk}$ is the clock cycle time. Because Machine A and Machine B execute the same program, the instruction count is the same, so IC cancels when the two times are compared.
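$$T_A = 1.5 \times IC \times 20\ \text{ns} = 30 \times IC\ \text{ns}$$
$$T_B = 1.0 \times IC \times 15\ \text{ns} = 15 \times IC\ \text{ns}$$
Since $T_B < T_A$, Machine B is faster. The ratio is $T_A / T_B = 30 / 15 = 2$; therefore, Machine B is twice as fast as Machine A.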
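As a rough check, the following Python sketch computes both execution times and also confirms the relation from part (a) that execution time equals IC / (MIPS × 10^6). The instruction count of one million is a hypothetical value chosen only for illustration (it cancels in the ratio), and the function names are not from the original.

```python
import math


def exec_time_s(ic, cpi, cycle_time_s):
    """Execution time in seconds: IC * CPI * clock cycle time."""
    return ic * cpi * cycle_time_s


def mips(cpi, cycle_time_s):
    """MIPS = clock rate / (CPI * 10^6), with clock rate = 1 / cycle time."""
    clock_rate_hz = 1.0 / cycle_time_s
    return clock_rate_hz / (cpi * 1e6)


ic = 1_000_000  # hypothetical instruction count, not given in the problem

t_a = exec_time_s(ic, cpi=1.5, cycle_time_s=20e-9)  # Machine A
t_b = exec_time_s(ic, cpi=1.0, cycle_time_s=15e-9)  # Machine B

mips_a = mips(cpi=1.5, cycle_time_s=20e-9)
mips_b = mips(cpi=1.0, cycle_time_s=15e-9)

# Both formulas for execution time should agree.
time_from_mips_a = ic / (mips_a * 1e6)
time_from_mips_b = ic / (mips_b * 1e6)

print(f"T_A / T_B = {t_a / t_b:.2f}")
print(f"MIPS_A = {mips_a:.2f}, MIPS_B = {mips_b:.2f}")
print("formulas agree:",
      math.isclose(t_a, time_from_mips_a) and math.isclose(t_b, time_from_mips_b))
```

For a fixed instruction count, the machine with the higher MIPS rating finishes first, which matches the conclusion that Machine B is the faster machine here.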