5. Optimizing Program Performance

5.1 Capabilities and Limitations of Optimizing Compilers

Invoking gcc with option -O1 or higher (e.g., -O2 or -O3) will cause it to apply more extensive optimizations.
These can further improve program performance, but they may expand the program size and they may make the program more difficult to debug using standard debugging tools.
To appreciate the challenges of deciding which program transformations are safe or not, consider the following two procedures:

The function twiddle2 is more efficient. It requires only three memory references (read *xp, read *yp, write *xp), whereas twiddle1 requires six (two reads of *xp, two reads of *yp, and two writes of *xp).
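
The two procedures can be sketched as follows. This is a minimal version, assuming the signatures implied by the memory references counted above (two long pointers, xp and yp). The helper run_aliased is hypothetical and exists only to show the case where both pointers refer to the same location.

```c
/* Adds *yp to *xp twice: two reads of *xp, two reads of *yp, two writes of *xp. */
void twiddle1(long *xp, long *yp) {
    *xp += *yp;
    *xp += *yp;
}

/* Adds 2 * *yp to *xp once: one read of *xp, one read of *yp, one write of *xp. */
void twiddle2(long *xp, long *yp) {
    *xp += 2 * *yp;
}

/* Hypothetical helper: calls fn with both arguments pointing at the same variable. */
long run_aliased(void (*fn)(long *, long *), long start) {
    long x = start;
    fn(&x, &x);
    return x;
}
```

When xp and yp point to different variables, both functions produce the same result. When they point to the same variable, twiddle1 quadruples the value while twiddle2 triples it, so run_aliased(twiddle1, 3) yields 12 and run_aliased(twiddle2, 3) yields 9. Because the compiler generally cannot rule out this aliasing, it cannot safely rewrite twiddle1 as twiddle2.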

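Similarly, for func1 and func2: func1 evaluates to 0 + 1 + 2 + 3 = 6, whereas func2 evaluates to 4 * 0 = 0. The two look equivalent at first glance, but this counterexample means the compiler cannot replace one with the other.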
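
A minimal sketch that reproduces these two results. Only the names func1 and func2 come from the notes above; the helper f and the global counter are assumptions chosen so that successive calls return 0, 1, 2, 3.

```c
/* Assumed global state modified by f (the side effect). */
long counter = 0;

/* Returns the current counter value, then increments it. */
long f(void) {
    return counter++;
}

/* Calls f four times: with counter starting at 0 the sum is 0 + 1 + 2 + 3. */
long func1(void) {
    return f() + f() + f() + f();
}

/* Calls f once: with counter starting at 0 the result is 4 * 0. */
long func2(void) {
    return 4 * f();
}
```

With counter reset to 0 before each call, func1() returns 6 and leaves counter at 4, while func2() returns 0 and leaves counter at 1. Both the return value and the global state differ, so collapsing the four calls into one is not a safe transformation.
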
Among compilers, gcc is considered adequate, but not exceptional, in terms of its optimization capabilities. It performs basic optimizations, but it does not perform the radical transformations on programs that more “aggressive” compilers do. As a consequence, programmers using gcc must put more effort into writing programs in a way that simplifies the compiler’s task of generating efficient code.

5.2 Expressing Program Performance

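We introduce the metric cycles per element, abbreviated CPE, to express program performance in a way that can guide us in improving the code.
The psum2 function lowers its CPE by using loop unrolling.
What is loop unrolling? It is a transformation that reduces the number of loop iterations by computing more elements in each iteration.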
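
Loop unrolling can be sketched with the two functions named above. This version assumes that psum1 and psum2 compute a prefix sum over float arrays (p[i] = a[0] + ... + a[i]); the exact signatures and the n <= 0 guard are assumptions, not taken from these notes.

```c
/* Prefix sum, one element per iteration. */
void psum1(const float a[], float p[], long n) {
    long i;
    if (n <= 0)
        return;                 /* added guard: nothing to compute */
    p[0] = a[0];
    for (i = 1; i < n; i++)
        p[i] = p[i - 1] + a[i];
}

/* Prefix sum unrolled by a factor of 2: two elements per iteration. */
void psum2(const float a[], float p[], long n) {
    long i;
    if (n <= 0)
        return;                 /* added guard: nothing to compute */
    p[0] = a[0];
    for (i = 1; i < n - 1; i += 2) {
        float mid_val = p[i - 1] + a[i];
        p[i] = mid_val;
        p[i + 1] = mid_val + a[i + 1];
    }
    /* When n is even, one element is left over after the unrolled loop. */
    if (i < n)
        p[i] = p[i - 1] + a[i];
}
```

Both functions write the same values into p. psum2 runs roughly half as many iterations, so the loop overhead (index update and termination test) is paid half as often.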

We focus our efforts on minimizing the CPE for our computations. By this measure, psum2, with a CPE of 6.0, is superior to psum1, with a CPE of 9.0.
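
One way to obtain a CPE figure from timing data is to treat it as the slope of cycle count versus number of elements. The sketch below is an assumption about how such a figure could be computed, using an ordinary least-squares fit; the function name estimate_cpe and its parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: least-squares slope of cycles versus element count.
 * n[i] is the number of elements in run i, cycles[i] the measured cycle count.
 * Returns 0.0 when the slope is undefined (fewer than two distinct sizes).
 */
double estimate_cpe(const double n[], const double cycles[], size_t count) {
    double sum_n = 0.0, sum_c = 0.0, sum_nn = 0.0, sum_nc = 0.0;
    for (size_t i = 0; i < count; i++) {
        sum_n += n[i];
        sum_c += cycles[i];
        sum_nn += n[i] * n[i];
        sum_nc += n[i] * cycles[i];
    }
    double denom = (double)count * sum_nn - sum_n * sum_n;
    if (denom == 0.0)
        return 0.0;
    return ((double)count * sum_nc - sum_n * sum_c) / denom;
}
```

The fixed overhead of a call (loop setup, timing code) ends up in the intercept of the fitted line rather than the slope, which is why the slope is a fair basis for comparing functions such as psum1 and psum2 on long inputs.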

