- As we seek to push the performance further, we must consider optimizations that exploit the microarchitecture of the processor—that is, the underlying system design by which a processor executes instructions.
- We will find that two different lower bounds characterize the maximum performance of a program:
- The latency bound is encountered when a series of operations must be performed in strict sequence, because the result of one operation is required before the next one can begin.
- The throughput bound characterizes the raw computing capacity of the processor’s functional units. This bound becomes the ultimate limit on program performance.
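The following is a minimal C sketch of the two bounds. It assumes `unsigned long` products, so overflow wraps and the result does not depend on how the multiplies are grouped. In `prod_sequential`, every multiply needs the previous product, so the loop can run no faster than one multiply latency per element. That makes it latency-bound. `prod_two_chains` keeps two independent products, so a pipelined multiplier can overlap them and the loop can move toward the throughput bound. Both function names are hypothetical. An optimizing compiler may also restructure either loop on its own, so actual timings depend on the compiler and processor.

```c
#include <stddef.h>

/* Latency-bound: each multiply uses the product from the previous
   iteration, so the multiplies form one long dependency chain. */
unsigned long prod_sequential(const unsigned long *data, size_t n) {
    unsigned long acc = 1;
    for (size_t i = 0; i < n; i++)
        acc = acc * data[i];
    return acc;
}

/* Two independent chains (acc0, acc1): neither waits on the other,
   so their multiplies can be in flight at the same time. */
unsigned long prod_two_chains(const unsigned long *data, size_t n) {
    unsigned long acc0 = 1, acc1 = 1;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 = acc0 * data[i];
        acc1 = acc1 * data[i + 1];
    }
    for (; i < n; i++)          /* leftover element when n is odd */
        acc0 = acc0 * data[i];
    return acc0 * acc1;         /* combine the two partial products */
}
```

Both functions compute the same product. Only the shape of the dependency chain differs, and that shape decides which bound limits the loop.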
5.7.1 Overall Operation
- The overall design has two main parts: the instruction control unit (ICU), which is responsible for reading a sequence of instructions from memory and generating from these a set of primitive operations to perform on program data, and the execution unit (EU), which then executes these operations.
- Compared to the simple in-order pipeline we studied in Chapter 4, out-of-order processors require far greater and more complex hardware, but they are better at achieving higher degrees of instruction-level parallelism.
- The ICU reads the instructions from an instruction cache, a special high-speed memory containing the most recently accessed instructions.
- The ICU fetches well ahead of the currently executing instructions, so that it has enough time to decode these and send operations down to the EU.
- The EU receives operations from the instruction fetch unit.
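A toy timing model can show why fetching ahead matters. The model is an illustrative assumption, not how any real ICU is specified. Suppose instructions arrive in groups of `width`. Each group needs `decode_delay` cycles between fetch and being ready for the EU, and the EU then executes a ready group in one cycle. If the ICU waited for the EU before fetching the next group, every group would pay the decode delay. If it starts a new fetch every cycle, the delay is paid only once. All function and parameter names here are hypothetical.

```c
#include <stddef.h>

/* Toy model only: hypothetical parameters, no real processor implied.
   width must be > 0. */

/* Number of instruction groups needed to hold n operations. */
static size_t num_groups(size_t n, size_t width) {
    return (n + width - 1) / width;
}

/* The ICU fetches the next group only after the EU has executed the
   previous one, so each group costs decode_delay cycles plus 1 execute cycle. */
size_t cycles_without_fetch_ahead(size_t n, size_t width, size_t decode_delay) {
    return num_groups(n, width) * (decode_delay + 1);
}

/* The ICU starts a new group every cycle: group k is fetched in cycle k,
   becomes ready in cycle k + decode_delay, and executes then. The last
   group finishes after (groups - 1) + decode_delay + 1 cycles. */
size_t cycles_with_fetch_ahead(size_t n, size_t width, size_t decode_delay) {
    size_t groups = num_groups(n, width);
    return groups == 0 ? 0 : groups + decode_delay;
}
```

Take 8 operations with `width = 2` and `decode_delay = 3`. The model gives 16 cycles without fetching ahead and 7 cycles with it, because the decode delay overlaps with execution of earlier groups instead of repeating for every group.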