5.1 Capabilities and Limitations of Optimizing Compilers
- Invoking gcc with option -O1 or higher (e.g., -O2 or -O3) will cause it to apply more extensive optimizations.
- These can further improve program performance, but they may expand the program size and they may make the program more difficult to debug using standard debugging tools.
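- To appreciate the challenges of deciding which program transformations are safe, consider two procedures, twiddle1 and twiddle2. Both appear to add twice the value stored at yp to the value stored at xp.
- Function twiddle2 is more efficient: it requires only three memory references (read *xp, read *yp, write *xp), whereas twiddle1 requires six (two reads of *xp, two reads of *yp, and two writes of *xp).
- However, the compiler cannot rewrite twiddle1 as twiddle2. When xp and yp refer to the same memory location, the two functions produce different results, so the optimization is not safe.
- Function calls raise the same problem. In the func1/func2 example (where each call to f increments a global counter), func1 returns 0 + 1 + 2 + 3 = 6, whereas func2 returns 4 * 0 = 0. The two look equivalent on the surface, but this counterexample shows they are not, so the compiler cannot perform this optimization either.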
- Among compilers, gcc is considered adequate, but not exceptional, in terms of its optimization capabilities. It performs basic optimizations, but it does not perform the radical transformations on programs that more “aggressive” compilers do. As a consequence, programmers using gcc must put more effort into writing programs in a way that simplifies the compiler’s task of generating efficient code.
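
The two counterexamples in this section can be reproduced with a small sketch. The function bodies below follow the textbook versions as best I recall them. The global `counter` and the side-effecting `f` are the assumed setup behind the func1/func2 numbers quoted above.

```c
/* Memory aliasing: both functions look like "add 2 * (*yp) to *xp". */
void twiddle1(long *xp, long *yp) {
    *xp += *yp;   /* read *xp, read *yp, write *xp */
    *xp += *yp;   /* same three references again: six in total */
}

void twiddle2(long *xp, long *yp) {
    *xp += 2 * *yp;   /* read *xp, read *yp, write *xp: three in total */
}

/* Side effects: f changes global state on every call. */
long counter = 0;

long f(void) {
    return counter++;   /* returns the old value, then increments */
}

long func1(void) {
    /* Four calls return 0, 1, 2, 3 in some order; the sum does not
       depend on that order. */
    return f() + f() + f() + f();
}

long func2(void) {
    return 4 * f();   /* one call only */
}
```

If `xp` and `yp` both point to a variable holding 3, twiddle1 leaves 12 in it (3 doubled twice) while twiddle2 leaves 9 (3 + 2 * 3). Starting from `counter == 0`, func1 returns 6 and func2 returns 0. Because the compiler generally cannot rule out these cases, it has to keep the slower forms.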
5.2 Expressing Program Performance

- We introduce the metric cycles per element, abbreviated CPE, to express program performance in a way that can guide us in improving the code.
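- The psum2 function lowers its CPE by using the loop unrolling technique.
- What is loop unrolling? It is a transformation that reduces the number of loop iterations by computing more elements in each iteration.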
- We focus our efforts on minimizing the CPE for our computations. By this measure, psum2, with a CPE of 6.0, is superior to psum1, with a CPE of 9.0.
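
As a rough sketch of what the two prefix-sum functions look like, the code below follows the textbook versions as best I recall them. It assumes `n >= 1` and that `p` has room for `n` elements. psum1 computes one element per iteration, while psum2 is unrolled to compute two.

```c
/* Prefix sum: p[i] = a[0] + a[1] + ... + a[i]. Assumes n >= 1. */
void psum1(float a[], float p[], long n) {
    long i;
    p[0] = a[0];
    for (i = 1; i < n; i++)
        p[i] = p[i - 1] + a[i];   /* one element per iteration */
}

/* Same result, with the loop unrolled by a factor of two. */
void psum2(float a[], float p[], long n) {
    long i;
    p[0] = a[0];
    for (i = 1; i < n - 1; i += 2) {
        float mid_val = p[i - 1] + a[i];
        p[i] = mid_val;               /* first element of the pair */
        p[i + 1] = mid_val + a[i + 1]; /* second element of the pair */
    }
    /* When n is even, one element is left over after the unrolled loop. */
    if (i < n)
        p[i] = p[i - 1] + a[i];
}
```

Both functions produce the same output. The difference is that psum2 runs about half as many iterations, which lowers the loop overhead per element and is consistent with the lower CPE quoted above.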