
5. Optimizing Program Performance

by 정구지개발자 2023. 5. 15.

5.1 Capabilities and Limitations of Optimizing Compilers

  • Invoking gcc with option -O1 or higher (e.g., -O2 or -O3) will cause it to apply more extensive optimizations.
  • These can further improve program performance, but they may expand the program size and they may make the program more difficult to debug using standard debugging tools.
  • To appreciate the challenges of deciding which program transformations are safe or not, consider the following two procedures:

  • Function twiddle2 is more efficient. It requires only three memory references (read *xp, read *yp, write *xp), whereas twiddle1 requires six (two reads of *xp, two reads of *yp, and two writes of *xp).

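  • A compiler might want to rewrite twiddle1 as twiddle2, but when xp and yp point to the same memory location (memory aliasing) the results differ: twiddle1 quadruples the value at xp, while twiddle2 only triples it. Because the compiler must assume this case can occur, it cannot apply the optimization.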
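The two procedures discussed above can be sketched in C as follows. This is a minimal reconstruction based on the memory-reference counts given in the text; the parameter type `long` is an assumption.

```c
/* Adds *yp to *xp twice: two reads of *xp, two reads of *yp,
 * and two writes of *xp (six memory references in total). */
void twiddle1(long *xp, long *yp) {
    *xp += *yp;
    *xp += *yp;
}

/* Adds 2 * *yp to *xp in one step: read *xp, read *yp,
 * write *xp (three memory references in total). */
void twiddle2(long *xp, long *yp) {
    *xp += 2 * *yp;
}
```

With distinct pointers both functions give the same result. When the same address is passed for both arguments, for example a variable holding 3, twiddle1 leaves 12 in it and twiddle2 leaves 9.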

  • The same applies to the function-call example: func1 returns 0 + 1 + 2 + 3 = 6, whereas func2 returns 4 * 0 = 0. The two look equivalent at first glance, but because the function they call has a side effect (it modifies global state), this counterexample shows that the optimization cannot be applied.
  • Among compilers, gcc is considered adequate, but not exceptional, in terms of its optimization capabilities. It performs basic optimizations, but it does not perform the radical transformations on programs that more “aggressive” compilers do. As a consequence, programmers using gcc must put more effort into writing programs in a way that simplifies the compiler’s task of generating efficient code.
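The func1 and func2 counterexample mentioned in the list above can be sketched like this. It is a hedged reconstruction: the global `counter` and the body of `f` are assumptions chosen to reproduce the 6 versus 0 results quoted in the text.

```c
/* Global state modified by f: this side effect is what blocks
 * the compiler from merging the four calls into one. */
long counter = 0;

/* Returns the current counter value, then increments it. */
long f(void) {
    return counter++;
}

/* Calls f four times; starting from counter == 0 the calls
 * return 0, 1, 2 and 3 in some order, so the sum is 6. */
long func1(void) {
    return f() + f() + f() + f();
}

/* Calls f once; starting from counter == 0 the result is 4 * 0. */
long func2(void) {
    return 4 * f();
}
```

Resetting `counter` to 0 before each call gives 6 from func1 and 0 from func2, so replacing one with the other would change program behavior.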

 

5.2 Expressing Program Performance

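  • We introduce the metric cycles per element, abbreviated CPE, to express program performance in a way that can guide us in improving the code.
  • The psum2 function lowers the CPE by using loop unrolling.
  • What is loop unrolling? It is a technique that computes more than one element per loop iteration, so the loop needs fewer iterations and spends less time on loop overhead.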
  • We focus our efforts on minimizing the CPE for our computations. By this measure, psum2, with a CPE of 6.0, is superior to psum1, with a CPE of 9.0.
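As an illustration of the two prefix-sum functions compared above, here is a minimal sketch in C. The signatures and the `float` element type are assumptions; psum2 unrolls the loop by a factor of two and handles a leftover element at the end. Both versions assume n is at least 1.

```c
/* Prefix sum, one element per iteration: p[i] = a[0] + ... + a[i]. */
void psum1(float a[], float p[], long n) {
    long i;
    p[0] = a[0];
    for (i = 1; i < n; i++)
        p[i] = p[i - 1] + a[i];
}

/* Prefix sum with 2x loop unrolling: two elements per iteration. */
void psum2(float a[], float p[], long n) {
    long i;
    p[0] = a[0];
    for (i = 1; i < n - 1; i += 2) {
        float mid_val = p[i - 1] + a[i];
        p[i] = mid_val;
        p[i + 1] = mid_val + a[i + 1];
    }
    /* For even n, one element remains after the unrolled loop. */
    if (i < n)
        p[i] = p[i - 1] + a[i];
}
```

Both functions produce the same output array; psum2 simply runs about half as many loop iterations, which is where the lower CPE comes from.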