5.7 Understanding Modern Processors


As we seek to push the performance further, we must consider optimizations that exploit the microarchitecture of the processor—that is, the underlying system design by which a processor executes instructions.
We will find that two different lower bounds characterize the maximum performance of a program:
The latency bound is encountered when a series of operations must be performed in strict sequence, because the result of one operation is required before the next one can begin.
The throughput bound characterizes the raw computing capacity of the processor’s functional units. This bound becomes the ultimate limit on program performance.
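To make the two bounds concrete, here is a minimal C sketch; it is not taken from the text above. The function names `product_serial` and `product_two_chains` are hypothetical, chosen only for illustration. The first loop forms a single dependency chain, so each multiplication must wait for the previous one to finish and the loop is limited by multiplier latency. The second loop keeps two independent accumulators, which gives the hardware a chance to overlap the two chains and move closer to the throughput bound. Whether it actually runs faster depends on the processor and the compiler.

```c
#include <stddef.h>

/* Single dependency chain: every multiply needs the previous value
   of acc, so the multiplications must execute in strict sequence. */
double product_serial(const double *v, size_t n)
{
    double acc = 1.0;
    for (size_t i = 0; i < n; i++)
        acc = acc * v[i];
    return acc;
}

/* Two independent chains: acc0 and acc1 never depend on each other
   inside the loop, so their multiplies may overlap in the hardware. */
double product_two_chains(const double *v, size_t n)
{
    double acc0 = 1.0, acc1 = 1.0;
    size_t i = 0;

    /* Process elements in pairs, one per accumulator. */
    for (; i + 1 < n; i += 2) {
        acc0 = acc0 * v[i];
        acc1 = acc1 * v[i + 1];
    }

    /* Handle a leftover element when n is odd. */
    for (; i < n; i++)
        acc0 = acc0 * v[i];

    return acc0 * acc1;
}
```

Both functions compute the same product mathematically, but the second one multiplies in a different order, so for floating-point data the results can differ in the last bits.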

5.7.1 Overall Operation

The overall design has two main parts: the instruction control unit (ICU), which is responsible for reading a sequence of instructions from memory and generating from these a set of primitive operations to perform on program data, and the execution unit (EU), which then executes these operations.
Compared to the simple in-order pipeline we studied in Chapter 4, out-of-order processors require far greater and more complex hardware, but they are better at achieving higher degrees of instruction-level parallelism.
The ICU reads the instructions from an instruction cache, a special high-speed memory containing the most recently accessed instructions.
The ICU fetches well ahead of the currently executing instructions, so that it has enough time to decode these and send operations down to the EU.
The EU receives operations from the instruction fetch unit.


정구지닷컴