32-bit Pipelined RISC Processor
Group 1 aka Go Us
CS m152b, Lab section 1
Alice Wang, Ann Ho, Jason Fong

General Review of a Pipelined Processor
(Block diagram: PC, Control, Instruction Memory, Register File, ALU and Data Memory, separated by the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers.)

Memory Controller Design
Design challenge: 32-bit processor with a 16-bit memory interface.
On every memory access, the processor issues one request but needs two 16-bit words from memory to form one 32-bit word.
(Diagram: one request goes to Data Memory through the 16-bit interface; two 16-bit words come back and form a 32-bit word.)
Solution: clock the memory controller twice as fast as the rest of the processor.
This results in a memory access on both the rising and the falling edge of the processor's clock cycle.
(Diagram: the 32-bit processor requests 32 bits; memory delivers them as two 16-bit transfers.)
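The double-clocked access can be pictured in software as two 16-bit reads for every 32-bit request. The sketch below is a behavioural Python model, not the group's VHDL. The class name `MemoryController16`, the half-word addressing scheme, and the choice to fetch the upper half first are all assumptions made for illustration.

```python
class MemoryController16:
    """Behavioural model: serves 32-bit requests from a 16-bit-wide memory."""

    def __init__(self, halfwords):
        # Backing store: a list of 16-bit values (assumed layout).
        self.mem = [h & 0xFFFF for h in halfwords]
        self.accesses = 0  # counts 16-bit memory accesses

    def _read16(self, addr):
        self.accesses += 1
        return self.mem[addr]

    def read32(self, word_addr):
        # One processor request becomes two memory accesses:
        # one on the rising edge, one on the falling edge.
        hi = self._read16(2 * word_addr)      # assumed: upper half first
        lo = self._read16(2 * word_addr + 1)
        return (hi << 16) | lo


mc = MemoryController16([0x1234, 0x5678, 0xDEAD, 0xBEEF])
print(hex(mc.read32(0)))  # 0x12345678
print(hex(mc.read32(1)))  # 0xdeadbeef
print(mc.accesses)        # 4
```

Swapping the two `_read16` lines reverses the half-word order, which is exactly the kind of detail that has to match the real memory layout.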

Instruction Format
General instruction format: a 4-bit opcode; the remaining 28 bits vary according to instruction type.

R-type instruction
  opcode   bits 31-28
  rs       bits 27-25
  rt       bits 24-22
  rd       bits 21-19
  unused   bits 18-4
  funct    bits 3-0

I-type instruction
  opcode   bits 31-28
  rs       bits 27-25
  rt       bits 24-22
  unused   bits 21-16
  imm16    bits 15-0

J-type instruction
  opcode   bits 31-28
  unused   bits 27-16
  imm16    bits 15-0
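The field boundaries above translate directly into shift-and-mask operations. The following Python sketch extracts every field a word could contain. The helper names `bits` and `decode` are hypothetical, and placing `funct` in bits 3-0 is an assumption read off the slide's bit labels.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word):
    """Split a 32-bit instruction word into every field it could contain."""
    return {
        "opcode": bits(word, 31, 28),
        "rs": bits(word, 27, 25),
        "rt": bits(word, 24, 22),
        "rd": bits(word, 21, 19),    # R-type only
        "funct": bits(word, 3, 0),   # R-type only; position is an assumption
        "imm16": bits(word, 15, 0),  # I-type and J-type
    }


# Hypothetical R-type word: opcode=1, rs=2, rt=3, rd=4, funct=5
word = (1 << 28) | (2 << 25) | (3 << 22) | (4 << 19) | 5
print(decode(word))
```

Because each register field is 3 bits wide, an instruction can name at most 8 registers.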

ALU with Multiplier
Our_mult multiplier: uses a series of shifts and additions; the result is held in the HI and LO registers.
Example: 13 x 11 = 01101 x 01011

         01101   multiplicand (13)
  x      01011   multiplier (11)
    ----------
         01101   multiplier bit 0 = 1
        01101    multiplier bit 1 = 1
       00000     multiplier bit 2 = 0
      01101      multiplier bit 3 = 1
     00000       multiplier bit 4 = 0
    ----------
    0010001111   = 143
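The shift-and-add scheme in this example can be written out as a short loop. The Python sketch below is a software illustration only, not the group's Our_mult hardware. The function name `shift_add_multiply`, the unsigned operands, and returning the product as a `(hi, lo)` pair are assumptions.

```python
def shift_add_multiply(multiplicand, multiplier, width=32):
    """Unsigned shift-and-add multiply; returns the product as (hi, lo)."""
    mask = (1 << width) - 1
    multiplicand &= mask
    multiplier &= mask
    product = 0
    for _ in range(width):
        if multiplier & 1:          # low multiplier bit set: add
            product += multiplicand
        multiplicand <<= 1          # shift multiplicand left
        multiplier >>= 1            # shift multiplier right
    return (product >> width) & mask, product & mask


hi, lo = shift_add_multiply(13, 11)
print(hi, lo)  # 0 143
```

The loop performs one conditional addition per multiplier bit, so a 32-bit multiply takes 32 iterations.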

Shift-and-add gives 13 x 11 = 143. An alternative multiplier would be more efficient, but needs more hardware.

Data Forwarding
Forward from ALU output
Forward from memory output
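The two forwarding sources can be expressed as a priority check on the destination registers of the instructions ahead in the pipeline. The sketch below follows the textbook forwarding rule for a five-stage pipeline; it is not taken from the group's design. The function name `forward_select`, the dictionary keys and the returned labels are hypothetical.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose where an ALU operand comes from.

    ex_mem and mem_wb are dicts with keys 'reg_write' (bool) and
    'dest' (register number) describing the instruction in that stage.
    """
    # The newest result wins, so check the ALU output (EX/MEM) first.
    if ex_mem["reg_write"] and ex_mem["dest"] == src_reg:
        return "ALU_OUTPUT"
    if mem_wb["reg_write"] and mem_wb["dest"] == src_reg:
        return "MEMORY_OUTPUT"
    return "REGISTER_FILE"


ex_mem = {"reg_write": True, "dest": 3}
mem_wb = {"reg_write": True, "dest": 5}
print(forward_select(3, ex_mem, mem_wb))  # ALU_OUTPUT
print(forward_select(5, ex_mem, mem_wb))  # MEMORY_OUTPUT
print(forward_select(1, ex_mem, mem_wb))  # REGISTER_FILE
```

In a classic five-stage pipeline, a loaded value that the very next instruction needs is not yet available at the ALU output, which is the case a stall has to cover.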

(Diagram: the ID/EX, EX/MEM and MEM/WB pipeline registers.)

Hardware NOP Insertion
Hold the PC value and insert a NOP.
(Diagram: PC, adder (+1), NOP and the IF/ID register, with "Hold PC value" and "Insert NOP" signals.)

Data Forwarding and Stall Insertions: Observed Results
Sample program: bubble-sort 6 numbers

Assembler insertion of NOPs:
  Machine code size: 66 words of memory
  Execution time: ~750 clock cycles
Hardware data forwarding and NOP insertion:
  Machine code size: 35 words of memory
  Execution time: ~400 clock cycles

Benefits:
  Savings in memory and execution time (about 47% fewer words and about 47% fewer clock cycles for this program)
  Much simpler assembler
Drawbacks:
  Hardware is now more complex
Tradeoff between hardware complexity and software complexity.
Also demonstrates the benefit of understanding the underlying architecture when designing an assembler.
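The hardware NOP insertion behind these results (hold the PC value, insert a NOP) can be sketched as a single fetch step. This Python model is illustrative only. The `stall` input stands in for whatever hazard detection the hardware performs, and the function name `fetch_step`, the all-zero NOP encoding and the word-addressed PC are assumptions.

```python
NOP = 0x00000000  # assumed encoding; the slides do not give one


def fetch_step(pc, imem, stall):
    """One fetch cycle: returns (next_pc, word latched into IF/ID)."""
    if stall:
        return pc, NOP          # hold PC value, insert NOP
    return pc + 1, imem[pc]     # normal case: adder computes PC + 1


imem = [0xA, 0xB, 0xC]          # placeholder instruction words
pc, trace = 0, []
for stall in (False, True, False, False):
    pc, latched = fetch_step(pc, imem, stall)
    trace.append(latched)
print(trace)  # [10, 0, 11, 12]
```

The NOP appears only in the executed instruction stream, not in the stored program, which is consistent with the smaller machine code size reported for the hardware approach.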

Conclusion
Some problems we encountered:
  Off by one stage in pipeline
  Lack of experience with VHDL
  Order of bits from memory
In conclusion...
  Knowledge from previous courses
  Further research
  Simple RISC processor
  Pipelining
  Multiplier
  Data forwarding and hardware NOPs

References
Patterson and Hennessy, Computer Organization and Design (2nd Ed.), 1998, pages 476-495
Donaldson, John L., Pipeline Hazards, http://occs.cs.oberlin.edu/faculty/jdonalds/317/lecture08.html
Ercegovac, Intro To Digital Systems
Oudjida, Abdelkrim Kamel, "High Speed and Very Compact Two's Complement Serial/Parallel Multipliers Using Xilinx's FPGA", Institute of Electronics, Information and Communication Engineers, 19 Nov. 2002, http://search.ieice.org/2001/pdf/e84-a_5_1339.pdf

Applause Please
Thank You

