
csapp (34)

From Machine-Level Code to Data-Flow Graphs
We focus just on the computation performed by the loop, since this is the dominating factor in performance for large vectors. The compiled code for this loop consists of four instructions, with registers %rdx holding a pointer to the ith element of array data, %rax holding a pointer to the end of the array, and %xmm0 holding the accumulated value acc. with the initial multiplication instruction ..
2023. 5. 20.
5.7.2 Functional Unit Performance
These timings are typical for other processors as well. Each operation is characterized by its latency, meaning the total time required to perform the operation; the issue time, meaning the minimum number of clock cycles between two independent operations of the same type; and the capacity, indicating the number of functional units capable of performing that operation. This short issue time is achie..
2023. 5. 18.
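The three parameters above can be turned into rough cycle estimates. The following is only a sketch: the `op_timing` struct and both helper functions are hypothetical names introduced here, and the estimates assume the simple model in which a chain of dependent operations costs one full latency each, while independent operations are limited by capacity divided by issue time.

```c
#include <assert.h>

/* Hypothetical record of the three per-operation parameters. */
typedef struct {
    int latency;   /* cycles needed to complete one operation */
    int issue;     /* minimum cycles between two independent operations */
    int capacity;  /* number of functional units for this operation */
} op_timing;

/* Cycles for n operations when each one needs the previous result. */
long dependent_chain_cycles(op_timing t, long n) {
    return n * t.latency;
}

/* Upper limit on independent operations that can start per cycle. */
double peak_ops_per_cycle(op_timing t) {
    assert(t.issue > 0);
    return (double)t.capacity / t.issue;
}
```

For example, an operation with latency 3, issue time 1, and capacity 2 (made-up values) needs 300 cycles for a chain of 100 dependent operations, but could start up to 2 independent operations per cycle.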
5.7 Understanding Modern Processors
As we seek to push the performance further, we must consider optimizations that exploit the microarchitecture of the processor, that is, the underlying system design by which a processor executes instructions. We will find that two different lower bounds characterize the maximum performance of a program. The latency bound is encountered when a series of operations must be performed in strict seque..
2023. 5. 17.
5.3 Program Example
The declaration uses data_t to designate the data type of the underlying elements. We allocate the data array block to store the vector elements as an array of len objects of type data_t. An important feature to note is that get_vec_element, the vector access routine, performs bounds checking for every vector reference. 5.4 Eliminating Loop Inefficiencies: This optimization is an instance of a gen..
2023. 5. 16.
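A minimal sketch of the vector type described above may help. `data_t` and `get_vec_element` come from the excerpt; the struct layout, `new_vec`, `vec_length`, `sum_vec`, and the choice of `long` for `data_t` are assumptions made for illustration. `sum_vec` also shows one way to remove a loop inefficiency: the length is read once before the loop instead of on every iteration.

```c
#include <stdlib.h>

typedef long data_t;   /* assumption: element type chosen for this sketch */

typedef struct {
    long len;          /* number of elements */
    data_t *data;      /* array of len objects of type data_t */
} vec_rec, *vec_ptr;

/* Allocate a zero-initialized vector; returns NULL on failure. */
vec_ptr new_vec(long len) {
    if (len < 0) return NULL;
    vec_ptr v = malloc(sizeof(vec_rec));
    if (!v) return NULL;
    v->len = len;
    v->data = NULL;
    if (len > 0) {
        v->data = calloc((size_t)len, sizeof(data_t));
        if (!v->data) { free(v); return NULL; }
    }
    return v;
}

/* Bounds-checked access: returns 0 if index is out of range, 1 otherwise. */
int get_vec_element(vec_ptr v, long index, data_t *dest) {
    if (index < 0 || index >= v->len) return 0;
    *dest = v->data[index];
    return 1;
}

long vec_length(vec_ptr v) { return v->len; }

/* Sum all elements, calling vec_length once rather than per iteration. */
data_t sum_vec(vec_ptr v) {
    long length = vec_length(v);
    data_t acc = 0;
    for (long i = 0; i < length; i++) {
        data_t val;
        get_vec_element(v, i, &val);
        acc += val;
    }
    return acc;
}
```

The bounds check still runs on every access inside the loop; only the repeated length lookup has been moved out.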
5. Optimizing Program Performance
5.1 Capabilities and Limitations of Optimizing Compilers: Invoking gcc with option -O1 or higher (e.g., -O2 or -O3) will cause it to apply more extensive optimizations. These can further improve program performance, but they may expand the program size and they may make the program more difficult to debug using standard debugging tools. To appreciate the challenges of deciding which program trans..
2023. 5. 15.
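Diminishing Returns of Deep Pipelining
Even if the delay of the combinational logic is reduced, the delay of the pipeline registers cannot be reduced, so the achievable gain is limited. Modern processors use very deep pipelines (typically 15 or more stages). Processor designers divide instruction execution into a large number of simple steps so that each stage has a minimal delay. 4.4.4 Pipelining a System with Feedback: x86-64: the 64-bit version of the x86 instruction set architecture used by Intel's CPU family; the trailing 64 means 64-bit. Y86-64: a simplified version of x86-64. 4.5 Pipelined Y86-64 Implementations 4.5.1 SEQ+: Rearranging the Computation St..
2023. 5. 8.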
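The limit described above can be checked with a little arithmetic. This is a sketch under simplifying assumptions: the combinational logic splits evenly across stages, each stage adds one fixed pipeline-register delay, and the function names and the delay values in the example are illustrative, not taken from the post.

```c
/* Clock period in picoseconds when logic_ps of combinational delay is
   split evenly over `stages` stages, each followed by a register that
   costs reg_ps. */
double clock_period_ps(double logic_ps, double reg_ps, int stages) {
    return logic_ps / stages + reg_ps;
}

/* Throughput in giga-instructions per second: one instruction completes
   per clock period, and 1000 ps make one nanosecond. */
double throughput_gips(double logic_ps, double reg_ps, int stages) {
    return 1000.0 / clock_period_ps(logic_ps, reg_ps, stages);
}
```

With 300 ps of logic and a 20 ps register, one stage gives a 320 ps period, three stages give 120 ps, and six stages give 70 ps. Doubling the stage count from three to six does not double the throughput, and no number of stages can push the period below the 20 ps register delay.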
4.4 General Principles of Pipelining
A key feature of pipelining is that it increases the throughput of the system (i.e., the number of customers served per unit time), but it may also slightly increase the latency (i.e., the time required to service an individual customer). Example: the number of customers served per unit time goes up, while the time needed to serve each individual customer also goes up. 4.4.1 Computational Pipelines: Combinational logic: It consists of some logic that performs a computation..
2023. 5. 5.
4.2.5 Memory and Clocking
In hardware, a register is directly connected to the rest of the circuit by its input and output wires. In machine-level programming, the registers represent a small collection of addressable words in the CPU, where the addresses consist of register IDs. A key point is that the registers serve as barriers between the combinational logic in different parts of the circuit. This register file has t..
2023. 5. 4.
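4.2.2 Combinational Circuits and HCL Boolean Expressions
The upper AND gate passes signal b when s == 0. The lower AND gate passes signal a when s == 1. There are several notable differences between C logical expressions and HCL expressions. C expression: 1. It is evaluated only when it is encountered during program execution. 2. It allows arbitrary integers as arguments. 3. It may be only partially evaluated. Example: in (a && !a) && func(b,c), partial evaluation gives (a && !a) == 0, so func is never called. HCL expression: 1. The output changes continually, whenever the inputs change. 2. It does not allow arbitrary integers; it operates only on 0 and 1. 3. Partial evaluation is not possible. A gate only i..
2023. 5. 3.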
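The multiplexor behavior and the partial-evaluation example above can be modeled in C. This is only a software sketch of the circuit, not HCL: `mux_bit`, `func_calls`, and `short_circuit_demo` are hypothetical names, and `func` is given an arbitrary body just so that calls to it can be counted.

```c
/* Bit-level multiplexor: the output follows a when s == 1 and b when
   s == 0. The two && terms correspond to the two AND gates, and ||
   to the OR gate that combines them. */
int mux_bit(int s, int a, int b) {
    return (s && a) || (!s && b);
}

/* Counts how many times func is actually called. */
static int func_calls = 0;

int func(int b, int c) {
    func_calls++;
    return b || c;
}

/* (a && !a) is always 0, so C stops evaluating there and never
   calls func. */
int short_circuit_demo(int a, int b, int c) {
    return (a && !a) && func(b, c);
}
```

After any number of calls to `short_circuit_demo`, `func_calls` stays at 0. In hardware there is no such skipping: every gate keeps responding to its inputs all the time.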
4.2 Logic Design and the Hardware Control Language HCL
In hardware design, electronic circuits are used to compute functions on bits and to store bits in different kinds of memory elements. Logic value 1 is represented by a voltage of about 1.0 volt, and logic value 0 by about 0.0 volts. 4.2.1 Logic Gates: Logic gates are the basic computing elements of digital circuits. In the figure above, the output is 1 when inputs a and b are both 1, and the output is also 1 when a and b are both 0. 4.2.2 Combinational Circuits and HCL Boolean Expressions: Every logic gate input must be connected to exactly one of the following: (1) one of the system in..
2023. 5. 3.