Computer Systems - Georgia Institute of Technology
COMPUTER SYSTEMS: An Integrated Approach to Architecture and Operating Systems
Chapter 3: Processor Implementation
Copyright 2008 Umakishore Ramachandran and William D. Leahy Jr.

Processor Implementation
This chapter covers implementing a processor given an instruction set. The instruction set is not a description of how the processor is implemented. It is a contract between hardware and software, and that contract allows a compiler writer to generate code for different high-level languages that will execute on any processor implementing it. Can there be different implementations of the same instruction set?
3.1 Architecture versus Implementation
Why have different implementations of the same instruction set? Market demands, parallel hardware and software development, and maintaining compatibility for legacy software.

3.2 What is involved in Processor Implementation?
Organizing the electrical components (ALUs, buses, registers, etc.) commensurate with the expected price/performance characteristic of the processor, and handling the thermal and mechanical aspects, including cooling and the physical geometry for placement on motherboards.

Processor class     Primary objective
Supercomputers      High performance
Servers             Intermediate performance and cost
Desktops and PCs    Low cost
Embedded            Small size, low cost, and low power consumption
3.3 Key hardware concepts
A review of important design principles.

3.3.1 Circuits
Combinational logic: for a given set of inputs there is one unique output.
Sequential logic: the circuit contains elements that remember state, so the output depends on both the inputs and the current state.

3.3.2 Hardware resources of the datapath
Memory, ALU, register file, Program Counter (PC), and Instruction Register (IR).

3.3.3 Logic triggering
[Figure: clock, inputs, and outputs of a clocked logic element]
Level triggering: outputs change based on the inputs whenever the clock is high. Memory will be considered to be level triggered (for cost reasons).
Edge triggering: outputs change based on the inputs only when the clock transitions. Logic is positive-edge triggered when the leading edge causes triggering, and negative-edge triggered when the trailing edge causes triggering.

3.3.4 Connecting the datapath elements
[Figure: PC, Memory (Addr, Din, Dout), IR, register file, and ALU connected to one another]

3.3.5 Towards bus-based design
In principle we must make connections between circuit elements for every instruction. Numerous connections are expensive and take up valuable space. Instead, have a set of wires (a bus) that all elements connect to and share in order to transfer information.
[Figure: single-bus design connecting PC, MAR, register file (DPRF), IR, Memory (Addr, Din, Dout), and ALU]
[Figure: dual-bus design with PC, MAR, register file (DPRF), IR, Memory (Addr, Din), ALU, and two data outputs, Dout1 and Dout2]

3.3.6 Finite State Machine (FSM)
An FSM is an abstraction of a sequential logic circuit that captures the states, the outputs while in each state, a designated start state, the possible transitions, and the inputs that trigger those transitions.
[Figure: processor FSM cycling through Fetch, Decode, and Execute]

3.4 Datapath Design
The processing unit (CPU) consists of the datapath and the control unit. The datapath is the combination of hardware resources and their connections. An example for the LC-2200:
ALU capable of ADD, NAND, A - B, and A + 1
Register file with 16 registers (32-bit), shown in Figure 3.14
PC (32-bit)
Memory with 2^32 × 32-bit words
[Figure: sample LC-2200 datapath. A 32-bit bus connects PC (LdPC, DrPC), registers A (LdA) and B (LdB), MAR (LdMAR), IR (LdIR), memory (Din), the register file (Din, WrREG, 4-bit regno), and the ALU. ALU func: 00 ADD, 01 NAND, 10 A - B, 11 A + 1. IR fields Rx, Ry, Rz (4-bit register numbers) and OP (4-bit opcode) go to the control logic; the offset is sign-extended and driven onto the bus with DrOFF; Z, a 1-bit boolean from the zero-detect logic, goes to the control logic.]
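The ALU function codes, the zero-detect output Z, and the sign-extend path in the datapath above can be modeled in a few lines. This is a minimal sketch, not the book's design: each element is treated as pure combinational logic, values are masked to 32 bits, and the names alu, zero_detect, and sign_extend are hypothetical. The offset field width is passed as a parameter because it is not given here; the 20-bit width in the last usage line is only an example.

```python
MASK32 = 0xFFFF_FFFF  # every datapath value is 32 bits wide

def alu(func: int, a: int, b: int) -> int:
    """Combinational ALU: the 2-bit func input selects the operation."""
    if func == 0b00:
        result = a + b        # 00: ADD
    elif func == 0b01:
        result = ~(a & b)     # 01: NAND
    elif func == 0b10:
        result = a - b        # 10: A - B
    elif func == 0b11:
        result = a + 1        # 11: A + 1 (B is ignored)
    else:
        raise ValueError(f"func must be a 2-bit value, got {func}")
    return result & MASK32    # keep only the low 32 bits, as the hardware would

def zero_detect(value: int) -> int:
    """1-bit Z signal to the control logic: 1 when the value is zero."""
    return 1 if (value & MASK32) == 0 else 0

def sign_extend(field: int, bits: int) -> int:
    """Sign-extend a bits-wide offset field to 32 bits (the DrOFF path)."""
    field &= (1 << bits) - 1          # keep only the field itself
    sign_bit = 1 << (bits - 1)
    return ((field ^ sign_bit) - sign_bit) & MASK32

print(alu(0b11, 41, 0))                # 42
print(zero_detect(alu(0b10, 7, 7)))    # 1: equal operands give zero
print(hex(sign_extend(0xFFFFF, 20)))   # 0xffffffff: -1 in an example 20-bit field
```

A - B followed by zero detect is exactly what BEQ needs to compare two registers, and A + 1 is what FETCH uses to increment the PC.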
3.4.1 ISA and Datapath Width
We normally define a size for instructions, addresses, and data operands (e.g., 32 bits). An implementation could use buses and/or interconnects of a smaller size (e.g., 8 or 16 bits). That would require more operations to move a 32-bit value, but would take up less chip real estate: a trade-off of speed versus price.

3.4.2 Width of the Clock Pulse
Combinational logic elements have a propagation delay. Register files have an access time. Writing to a register requires the input to be stable both before and after the leading edge of the clock arrives (setup time and hold time). Wires have a transmission delay. The clock pulse must be wide enough to allow for all of the above.
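A small calculation makes both trade-offs concrete. The sketch below is a hypothetical model: transfers_needed counts bus transfers for one operand, and min_clock_width_ps budgets the clock as the slowest register-to-register path plus setup and hold time, following the "allow for all of the above" rule. Real timing analysis usually treats hold time as a separate minimum-delay check, so this budget is conservative. All delay values are made up for illustration.

```python
import math

def transfers_needed(operand_bits: int, bus_bits: int) -> int:
    """Bus transfers needed to move one operand over a bus of bus_bits wires."""
    return math.ceil(operand_bits / bus_bits)

def min_clock_width_ps(paths, setup_ps: int, hold_ps: int) -> int:
    """Conservative clock width: slowest path plus setup and hold time.

    Each path is a list of delays in picoseconds (register-file access,
    wire, combinational logic, ...) accumulated between two clock edges.
    """
    slowest_path = max(sum(delays) for delays in paths)
    return slowest_path + setup_ps + hold_ps

# Hypothetical delays in picoseconds, for illustration only.
paths = [
    [2000, 500, 3000, 500],  # register file -> wire -> ALU -> wire -> register
    [1000, 500],             # PC drive -> wire -> MAR
]
print(transfers_needed(32, 8))                                 # 4
print(min_clock_width_ps(paths, setup_ps=300, hold_ps=200))    # 6500
```

With these made-up numbers, a 6500 ps clock corresponds to roughly 154 MHz, and an 8-bit bus would need four transfers to move each 32-bit value.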
3.4.3 Checkpoint
You should now understand the following basic concepts: the basics of logic design, including combinational and sequential logic circuits; hardware resources for a datapath, such as the register file, ALU, and memory; edge-triggered logic and how to arrive at the width of a clock cycle; datapath interconnection and buses; and finite state machines.

3.5 Control Unit Design
The control unit is an implementation of the finite state machine. Depending on the current state and inputs, it moves to the correct next state. Typical outputs from the control unit (e.g., for the LC-2200):
Drive signals: DrPC, DrALU, DrREG, DrMEM, DrOFF
Load signals: LdPC, LdA, LdB, LdMAR, LdIR
Write memory signal: WrMEM
Write registers signal: WrREG
ALU function selector: func
Register selector: regno
Several alternatives exist for implementing the control unit.

3.5.1 ROM plus state register
[Figure: control ROM plus state register; the ROM outputs drive signals (PC, ALU, Reg, MEM, OFF), load signals (PC, A, B, MAR, IR), write signals (MEM, REG), Func, and RegSel]

3.5.2 FETCH macro state
What we need to do:
Send the PC to the memory.
Read the memory contents.
Bring the memory contents read into the IR.
Increment the PC.

Microstates to accomplish this:
ifetch1: PC → MAR
ifetch2: MEM[MAR] → IR
ifetch3: PC → A
ifetch4: A + 1 → PC

3.5.2 FETCH macro state (simplifying)
PC → MAR and PC → A both need only the PC driven onto the bus, so they can happen in the same clock cycle:
ifetch1: PC → MAR, PC → A
ifetch2: MEM[MAR] → IR
ifetch3: A + 1 → PC

3.5.2 FETCH macro state: adding in control signals
ifetch1: PC → MAR, PC → A. Control signals needed: DrPC, LdMAR, LdA
ifetch2: MEM[MAR] → IR. Control signals needed: DrMEM, LdIR
ifetch3: A + 1 → PC. Control signals needed: func = 11, DrALU, LdPC

3.5.3 DECODE macro state
[Figure: after Fetch, decode branches to the O-Type, R-Type, I-Type, or J-Type execute sequence]
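The simplified FETCH microstates and the DECODE dispatch can be traced in Python, one group of assignments per clock cycle. Assumptions for this sketch: the datapath is a plain dictionary, memory is a list of words indexed by address, the 4-bit opcode sits in IR[31..28], and the opcode-to-type table OPCODE_TYPE is hypothetical; the actual LC-2200 encoding is fixed by its ISA, not by this example.

```python
MASK32 = 0xFFFF_FFFF

# Hypothetical opcode assignment for IR[31..28], for illustration only.
OPCODE_TYPE = {
    0b0000: "R-Type",  # ADD
    0b0001: "R-Type",  # NAND
    0b0010: "I-Type",  # ADDI
    0b0011: "I-Type",  # LW
    0b0100: "I-Type",  # SW
    0b0101: "I-Type",  # BEQ
    0b0110: "J-Type",  # JALR
    0b0111: "O-Type",  # hypothetical O-Type instruction
}

def fetch(dp: dict, memory: list) -> None:
    """FETCH macro state as three microstates (one clock cycle each)."""
    # ifetch1: PC -> MAR, PC -> A    (DrPC, LdMAR, LdA)
    dp["MAR"] = dp["PC"]
    dp["A"] = dp["PC"]
    # ifetch2: MEM[MAR] -> IR        (DrMEM, LdIR)
    dp["IR"] = memory[dp["MAR"]]
    # ifetch3: A + 1 -> PC           (func = 11, DrALU, LdPC)
    dp["PC"] = (dp["A"] + 1) & MASK32

def decode(dp: dict) -> str:
    """DECODE: choose the EXECUTE sequence from the opcode field of IR."""
    opcode = (dp["IR"] >> 28) & 0xF   # assumes a 4-bit opcode in IR[31..28]
    return OPCODE_TYPE[opcode]

memory = [0x0000_0000, 0x3000_0000]   # hypothetical words: opcode 0, then opcode 3
dp = {"PC": 0, "MAR": 0, "A": 0, "IR": 0}
for _ in range(2):
    fetch(dp, memory)
    print(dp["PC"], decode(dp))       # "1 R-Type", then "2 I-Type"
```

Because the PC is incremented at the end of FETCH, every EXECUTE sequence starts with the PC already pointing at the next instruction.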
3.5.4 EXECUTE macro state: ADD instruction (part of R-Type)
Rx ← Ry + Rz
add1: Ry → A. Control signals needed: RegSel = 01, DrREG, LdA
add2: Rz → B. Control signals needed: RegSel = 10, DrREG, LdB
add3: A + B → Rx. Control signals needed: func = 00, DrALU, RegSel = 00, WrREG
[Figure: add1 → add2 → add3, then back to ifetch1]

3.5.5 EXECUTE macro state: NAND instruction (part of R-Type)
What must be changed in ADD to implement NAND?

3.5.6 EXECUTE macro state: JALR instruction (part of J-Type)
The JALR instruction does the following:
Ry ← PC + 1
PC ← Rx
Because FETCH has already incremented the PC, PC + 1 (relative to the JALR instruction) is simply the current PC value:
jalr1: PC → Ry. Control signals needed: DrPC, RegSel = 01, WrREG
jalr2: Rx → PC. Control signals needed: RegSel = 00, DrREG, LdPC

3.5.7 EXECUTE macro state: LW instruction (part of I-Type)
Rx ← MEMORY[Ry + signed address-offset]
lw1: Ry → A. Control signals needed: RegSel = 01, DrREG, LdA
lw2: Sign-extended offset → B. Control signals needed: DrOFF, LdB
lw3: A + B → MAR. Control signals needed: func = 00, DrALU, LdMAR
lw4: MEM[MAR] → Rx. Control signals needed: DrMEM, RegSel = 00, WrREG

3.5.8 EXECUTE macro state: SW and ADDI instructions (part of I-Type)
SW is similar to LW; ADDI is similar to ADD.

3.5.9 EXECUTE macro state: BEQ instruction (part of I-Type)
The BEQ instruction has the following semantics:
if (Rx == Ry)
    PC ← PC + 1 + signed offset
else
    nothing*
*The PC, already incremented during FETCH, remains unchanged, so execution continues with the next instruction in memory.

beq1: Rx → A. Control signals needed: RegSel = 00, DrREG, LdA
beq2: Ry → B. Control signals needed: RegSel = 01, DrREG, LdB
beq3: A - B; load the Z register with the result of the zero-detect logic. Control signals needed: func = 10, DrALU, LdZ
The following microsteps execute only if we are taking the branch:
beq4: PC → A. Control signals needed: DrPC, LdA
beq5: Sign-extended offset → B. Control signals needed: DrOFF, LdB
beq6: A + B → PC. Control signals needed: func = 00, DrALU, LdPC

3.5.10 Engineering a conditional branch in the microprogram
[Figure: microstate sequence ifetch1, beq1 through beq6; the ROM plus state register control unit with Z added as an input, producing drive signals (PC, ALU, Reg, MEM, OFF), load signals (PC, A, B, MAR, IR), write signals (MEM, REG), Func, and RegSel]

3.5.11 DECODE macro state revisited
[Figure: ROM plus state register control unit revisited for decoding, with the same drive, load, write, Func, and RegSel outputs]

3.6 Alternative Style of Control Unit Design
A number of different approaches may be used to implement the control unit.
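Before turning to the alternatives, the ROM plus state register approach of 3.5.1, with Z as an extra ROM input for BEQ (3.5.10), can be sketched as a lookup table keyed by (current state, Z). In this hypothetical Python model, each ROM entry holds one microstate's control signals and its next state; step() drives the single bus, performs the loads, and then consults the ROM for the next state. As a simplification, the next-state lookup already sees the Z value loaded in the same microstate (beq3); real hardware must arrange when Z is sampled. Register numbers, register contents, and the already sign-extended offset are made-up inputs.

```python
MASK32 = 0xFFFF_FFFF

# Control ROM sketch: (current state, Z) -> (control signals, next state).
_SAME_FOR_BOTH_Z = {
    "beq1": ({"RegSel": 0b00, "DrREG": 1, "LdA": 1}, "beq2"),
    "beq2": ({"RegSel": 0b01, "DrREG": 1, "LdB": 1}, "beq3"),
    "beq4": ({"DrPC": 1, "LdA": 1}, "beq5"),
    "beq5": ({"DrOFF": 1, "LdB": 1}, "beq6"),
    "beq6": ({"func": 0b00, "DrALU": 1, "LdPC": 1}, "ifetch1"),
}
ROM = {(state, z): entry for state, entry in _SAME_FOR_BOTH_Z.items() for z in (0, 1)}
_BEQ3 = {"func": 0b10, "DrALU": 1, "LdZ": 1}
ROM[("beq3", 0)] = (_BEQ3, "ifetch1")  # Rx != Ry: skip beq4..beq6
ROM[("beq3", 1)] = (_BEQ3, "beq4")     # Rx == Ry: take the branch

def drive_bus(dp: dict, signals: dict) -> int:
    """Value on the single shared bus; exactly one driver may be enabled."""
    drivers = [d for d in ("DrPC", "DrREG", "DrOFF", "DrALU") if signals.get(d)]
    assert len(drivers) == 1, f"bus conflict: {drivers}"
    if signals.get("DrPC"):
        return dp["PC"]
    if signals.get("DrREG"):
        regno = dp["fields"][signals["RegSel"]]   # RegSel picks Rx or Ry from IR
        return dp["regs"][regno]
    if signals.get("DrOFF"):
        return dp["offset"]                       # already sign-extended here
    func = signals["func"]                        # DrALU
    if func == 0b00:
        return (dp["A"] + dp["B"]) & MASK32       # ADD
    if func == 0b10:
        return (dp["A"] - dp["B"]) & MASK32       # A - B
    raise ValueError(f"func {func:02b} not modeled in this sketch")

def step(dp: dict, state: str) -> str:
    """One clock cycle: perform this microstate's transfers, return next state."""
    signals, _ = ROM[(state, dp["Z"])]
    bus = drive_bus(dp, signals)
    for reg in ("PC", "A", "B"):
        if signals.get("Ld" + reg):
            dp[reg] = bus
    if signals.get("LdZ"):
        dp["Z"] = 1 if bus == 0 else 0            # zero-detect logic
    # Simplification: the next-state lookup sees the Z value just loaded.
    _, next_state = ROM[(state, dp["Z"])]
    return next_state

def run_beq(rx: int, ry: int, pc: int, offset: int) -> int:
    """Run from beq1 until control returns to ifetch1; pc is the incremented PC."""
    dp = {"PC": pc, "A": 0, "B": 0, "Z": 0, "offset": offset & MASK32,
          "regs": {1: rx, 2: ry},            # hypothetical registers R1 and R2
          "fields": {0b00: 1, 0b01: 2}}      # IR fields: Rx = R1, Ry = R2
    state = "beq1"
    while state != "ifetch1":
        state = step(dp, state)
    return dp["PC"]

print(run_beq(5, 5, pc=11, offset=4))   # 15: branch taken
print(run_beq(5, 6, pc=11, offset=4))   # 11: not taken, PC unchanged
```

Only the beq3 entries differ between Z = 0 and Z = 1: a conditional branch in the microprogram is a ROM whose next-state field depends on one extra input bit.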
3.6.1 Microprogrammed Control
As presented, our design works.
Problem: too slow. Solution: prefetch the next microinstruction.
Problem: too much memory required. Solution: have bit positions control different things as a function of the opcode.

3.6.2 Hardwired Control
The state machine can be represented as a sequential logic truth table, and thus can be implemented using normal logic or an FPGA.

3.6.3 Choosing between the two control design styles

Microprogrammed control
Pros: simplicity, maintainability, flexibility; rapid prototyping
Cons: potential for space and time inefficiency
Comment: space inefficiency may be mitigated with vertical microcode; time inefficiency may be mitigated with prefetching
When to use: for complex instructions, and for quick nonpipelined prototyping of architectures
Examples: PDP-11 series, IBM 360 and 370 series, Motorola 68000, complex instructions in the Intel x86 architecture

Hardwired control
Pros: amenable to pipelined implementation; potential for higher performance
Cons: potentially harder to change the design; longer design time
Comment: maintainability can be increased with the use of structured hardware such as PLAs and FPGAs
When to use: for high-performance pipelined implementation of architectures
Examples: most modern processors, including the Intel Pentium series, IBM PowerPC, and MIPS

3.7 Historical Perspective
[Figure: timeline, 1940 to 1980. Cost eras: hardware and memory expensive; hardware less expensive, memory expensive; hardware and memory cheap (microprocessors, compilers getting good). Machine styles: accumulators (EDSAC, IBM 701); register-oriented (2-address) machines; register-memory CISC (IBM 360, DEC PDP-11, VAX, Motorola 68000, Intel 80x86); stack machines (Burroughs B-5000); RISC, initially a fringe element (IBM 801; Berkeley RISC and SPARC, Dave Patterson; Stanford MIPS and SGI, John Hennessy).]