Advanced ALU Lecture 5 CS365 RoadMap Implementation of MIPS ALU Signed and unsigned numbers (Section 3.2) Addition and subtraction (Section 3.3) Constructing an arithmetic logic unit (Appendix B.5, B.6) Multiplication (Section 3.4) Division (Section 3.5) This lecture Floating point (Section 3.6) Advanced ALU CS465 F08 2 D. Barbara Unsigned Multiply Paper and pencil example (unsigned): Multiplicand 1000 Multiplier 1001 1000 0000 0000

1000 Product 01001000 m bits x n bits = m+n bit product (at most) Binary makes it easy: 0 => place 0 1 => place a copy ( 0 x multiplicand) ( 1 x multiplicand) 4 versions of multiply hardware & algorithm: Successive refinement Advanced ALU CS465 F08 3 D. Barbara Unsigned Combinational Multiplier 0 A3 A3 A3

A3 P7 P6 A2 A2 A1 P5 A2 A1 0 A2 A1 0 A1 0 A0 B0 A0 B1

A0 B2 A0 P4 B3 P3 P2 P1 P0 Stage i accumulates A * 2i if Bi == 1 Advanced ALU CS465 F08 4 D. Barbara How Does It Work? 0 0 0 A3 A3 A3 P7

P6 A2 P5 A2 A1 P4 0 A3 A2 A1 0 A2 A1 0 A1 0 A0 A0 B1 A0 B2 A0 P3

B0 B3 P2 P1 P0 At each stage shift A left ( x 2) Use next bit of B to determine whether to add in shifted multiplicand Accumulate 2n bit partial product at each stage Advanced ALU CS465 F08 5 D. Barbara Unsigned Multiplier (Version 1) Shift-add multiplier 64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit Multiplier reg Shift Left Multiplicand 64 bits Multiplier 64-bit ALU

Product Shift Right 32 bits Write 64 bits Control Figure 3.5 Advanced ALU CS465 F08 6 D. Barbara Multiply Algorithm Version 1 Start Figure 3.6 Multiplier0 = 1 1. Test Multiplier0 Multiplier0 = 0 1a. Add multiplicand to product & place the result in Product register 0010 x 0011 Product 0000 0000 0000 0010

0000 0110 0000 0110 0000 0110 Multiplier Multiplicand 0011 0000 0010 0001 0000 0100 0000 0000 1000 0000 0001 0000 Advanced ALU 2. Shift the Multiplicand register left 1 bit. 3. Shift the Multiplier register right 1 bit. 32nd repetition? No: < 32 repetitions Yes: 32 repetitions CS465 F08 Done 7 D. Barbara Observations on Multiply Version 1 One clock cycle per step => ~100 clocks per

multiply Half bits in multiplicand always 0 => 64-bit adder is wasted 0 is inserted when shift the multiplicand left => least significant bits of product never changed once formed Instead of shifting multiplicand to left, shift product to right? Advanced ALU CS465 F08 8 D. Barbara How Does It Work? 0 0 0 0 A3 A2 A1

A0 A3 A2 A1 A0 B0 B1 A3 A2 A1 A0 A3 A2 A1 A0 P7 P6 P5 B2

B3 P4 P3 P2 P1 P0 Multiplicand stays still and product moves right Advanced ALU CS465 F08 9 D. Barbara Unsigned Multiplier Version 2 Refined from the first version: 32-bit Multiplicand reg, 32 -bit ALU, 64-bit Product reg, 32-bit Multiplier reg Multiplican d 32 bits Multiplier 32-bit ALU Shift Right 32 bits

Shift Right Product 64 bits Advanced ALU Control Write CS465 F08 10 D. Barbara Multiply Algorithm Version 2 Start Multiplier0 = 1 1: 2: 3: 1: 2: 3: 1: 2: 3: 1: 2: 3: 1. Test Multiplier0 0010 x 0011

1a. Add multiplicand to the left half of product & place the result in the left half of Product register Product Multiplier 0000 0000 0011 0010 0000 0011 0001 0000 0011 0001 0000 0001 0011 0000 0001 0001 1000 0001 0001 1000 0000 0001 1000 0000 0000 1100 0000 0000 1100 0000 0000 1100 0000 0000 0110 0000 0000 0110 0000 Multiplicand 0010 0010 0010 0010 0010 0010 0010 0010 0010 0010 0010 0010 0010 Advanced ALU CS465 F08

Multiplier0 = 0 2. Shift the Product register right 1 bit. 3. Shift the Multiplier register right 1 bit. 32nd repetition? No: < 32 repetitions Yes: 32 repetitions Done 11 D. Barbara Observations on Multiply Version 2 Useful portion of multiplier shrinks while valid portion of product grows in the process of multiplication Product register wastes space that exactly matches size of multiplier =>Combine Multiplier register and Product register? Advanced ALU CS465 F08 12

D. Barbara MULTIPLY Hardware Version 3 32-bit Multiplicand reg, 32 -bit ALU, 64-bit Product reg, (0-bit Multiplier reg) Multiplican d 32 bits 32-bit ALU Shift Right Product (Multiplier) Write 64 bits Advanced ALU Control CS465 F08 13 D. Barbara Multiply Algorithm Version 3 Start Product0 = 1 1. Test Product0 Product0 = 0

0010 x 0011 1a. Add multiplicand to the left half of product & place the result in the left half of Product register Multiplicand 0010 0010 0010 0010 0010 Advanced ALU Product 0000 0011 0010 0011 0001 0001 0011 0001 0001 1000 0000 1100 0000 0110 2. Shift the Product register right 1 bit. 32nd repetition? CS465 F08 No: < 32 repetitions Yes: 32 repetitions 14 Done D. Barbara Multiply Version 3

Two steps per bit because Multiplier & Product combined MIPS multiplication (unsigned) multu $t1, $t2 # t1 * t2 No destination register: product could be ~2 64; need two special registers to hold it Registers Hi and Lo are left and right half of Product To use the result: mfhi $t3 mflo $t4 Advanced ALU CS465 F08 15 D. Barbara Signed Multiplication What about signed multiplication? Easiest solution is to make both positive & remember whether to complement product when done Apply definition of 2s complement Need to sign-extend partial products Booths algorithm is an elegant way to multiply

signed and unsigned numbers using the same hardware and save cycles Can handle multiple bits at a time Advanced ALU CS465 F08 16 D. Barbara Motivation for Booths Algorithm Example 2 x 6 = 0010 x 0110: 0010 Can get the same x 0110 result in more than 0000 shift (0 in multiplier) one way:+ 6= +2 + 80010 add (1 in multiplier) + = 0010 0110 0010 + add 1000(1 in multiplier) + 0000 shift (0 in multiplier) 00001100

Basic idea: replace a string of 1s with an initial subtract on seeing a one and add after last one 0010 x 0110 0000 shift (0 in multiplier) x (-2)sub (first 1 in multiplier) 0010 0010 0000 shift (mid string of 1s) +0010 0010 add (prior step had last 1) x8 00001100 Advanced ALU CS465 F08 18 D. Barbara Booths Algorithm end of run Current bit 1 1 0 0 0

1 1 0 0 1 1 1 1 0 Explanation Example Begins run of 1s Middle of run of 1s End of run of 1s Middle of run of 0s beginning of run Op 0001111000 0001111000 0001111000 0001111000 sub none add none Originally for speed when shift was faster than add Bit to right middle of run

Big advantage: it handles signed number Why it works? Advanced ALU CS465 F08 1 + 10000 01111 19 D. Barbara Booths Algorithm 1. Depending on the current and previous bits, do one of the following: 00:Middle of a string of 0s, no arithmetic op 01:End of a string of 1s, so add multiplicand to the left half of the product 10:Beginning of a string of 1s, so subtract multiplicand from the left half of the product 11:Middle of a string of 1s, so no arithmetic op 2. As in the previous algorithm, shift the Product register right (arithmetically) 1 bit Advanced ALU CS465 F08 20 D. Barbara Booths Example (2 x 7) step

Multiplicand Product 0. initial value 0010 1. P = P - m 2. 3. 4. 1110 0010 0010 0010 0010 0010 Advanced ALU 0000 0111 0 +1110 1110 0111 0 1111 0011 1 1111 1001 1 1111 1100 1 +0010 0001 1100 1 0000 1110 0 CS465 F08 operation 10 -> sub shift P(sign ext) 11 -> nop, shift 11 -> nop, shift 01 -> add shift done

21 D. Barbara Booths Example (2 x -3) step Multiplicand Product 0. initial value 0010 1. P = P - m 1110 2. 0010 0010 3. 0010 1110 0010 0010 4. 0010 Advanced ALU 0000 1101 0 +1110 1110 1101 0 1111 0110 1 + 0010 0001 0110 1 0000 1011 0

+1110 1110 1011 0 1111 0101 1 1111 0101 1 1111 1010 1 CS465 F08 operation 10 -> sub shift P(sign ext) 01 -> add shift P 10 -> sub shift 11 -> nop shift done 22 D. Barbara Outline Implementation of MIPS ALU Signed and unsigned numbers (Section 3.2) Addition and subtraction (Section 3.3) Constructing an arithmetic logic unit (Appendix B.5, B.6) Multiplication (Section 3.4) Booths algorithm: CD: 3.23 In More Depth Division

(Section 3.5) Floating point (Section 3.6) Advanced ALU CS465 F08 23 D. Barbara DIVIDE: Paper & Pencil 1001 Divisor 1000 1001010 1000 10 101 1010 1000 10 Remainder (or Modulo result) See how big a number can be subtracted, creating quotient bit on each step Quotient Dividend Binary => 1 * divisor or 0 * divisor Dividend = Quotient x Divisor + Remainder

3 versions of divide, successive refinement Advanced ALU CS465 F08 24 D. Barbara DIVIDE Hardware Version 1 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg Shift Right Divisor 64 bits Quotient 64-bit ALU Remainder 32 bits Write 64 bits Advanced ALU Shift Left CS465 F08 Control 25

D. Barbara Divide Algorithm Version 1 Start: Place Dividend in Remainder Takes n+1 iterations for nbit Quotient & Remainder Quot. Divisor Rem. 0000 00100000 00000111 11100111 00000111 0000 00010000 00000111 11110111 00000111 0000 00001000 00000111 11111111 00000111 0000 00000100 00000111 00000011 0001 00000011 0001 00000010 00000011 00000001 0011 00000001 0011 00000001 00000001 Advanced ALU 1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register. Remainder 0 2a. Shift the

Quotient register to the left, setting the new rightmost bit to 1. Test Remainder Remainder < 0 2b. Restore the original value by adding the Divisor register to the Remainder register, & place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0. 3. Shift the Divisor register right 1 bit. n+1 repetition? No: < n+1 repetitions Yes: n+1 repetitions (n = 4 here) Done CS465 F08 26 D. Barbara Observations on Divide Version 1

1st step cannot produce a 1 in quotient bit (otherwise quotient is too big for the register) => switch order to shift first and then subtract => save 1 iteration 1/2 bits in divisor always 0 => 1/2 of 64-bit adder is wasted => 1/2 of divisor is wasted Instead of shifting divisor to right, shift remainder to left? Advanced ALU CS465 F08 27 D. Barbara DIVIDE Hardware Version 2 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg Divisor 32 bits Quotient 32-bit ALU Shift Left 32 bits Shift Left Remainder 64 bits Advanced ALU

Control Write CS465 F08 28 D. Barbara Divide Algorithm Version 2 Start: Place Dividend in Remainder 1. Shift the Remainder register left 1 bit. 2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register. Remainder 0 3a. Shift the Quotient register to the left setting the new rightmost bit to 1. Test Remainder Remainder < 0 3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, &place the sum in the left half of the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0. nth repetition? No: < n repetitions

Yes: n repetitions Done Advanced ALU CS465 F08 29 D. Barbara Observations on Divide Version 2 Remainder shrinks while Quotient grows Eliminate Quotient register by combining with Remainder register Start by shifting the Remainder left as before Only shifting once for both Remainder(left half) and Quotient(right half): 2 steps each iteration Another consequence is that the remainder will be shifted left one more time Thus the final correction step must shift back only the remainder in the left half of the register Advanced ALU CS465 F08 30 D. Barbara

DIVIDE HARDWARE Version 3 32-bit Divisor reg, 32 -bit ALU, 64-bit Remainder reg, (0-bit Quotient reg) Divisor 32 bits 32-bit ALU HI LO Shift Left Remainder (Quotient) Write 64 bits Advanced ALU Control CS465 F08 31 D. Barbara Divide Algorithm Version 3 Step 0 1.1 1.2 1.3b 2.2 2.3b 3.2

3.3a Remainder Div. 0000 0111 0010 0000 1110 1110 1110 0001 1100 1111 1100 0011 1000 0001 1000 0011 0001 3a. Shift the Remainder register to the left setting the new rightmost bit to 1. 4.2 0001 0001 4.3a 0010 0011 0001 0011 Advanced ALU Start: Place Dividend in Remainder 1. Shift the Remainder register left 1 bit. 2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register. Remainder 0 Test Remainder Remainder < 0 3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, &place the sum in the left half of the Remainder

register. Also shift the Remainder register to the left, setting the new least significant bit to 0. nth No: < n repetitions repetition? Yes: n repetitions (n = 4 here) Done. Shift left half of Remainder right 1 bit CS465 F08 32 D. Barbara More Division Issues Signed divides: simplest is to remember signs, make positive, and complement quotient and remainder if necessary Satisfy Dividend =Quotient x Divisor + Remainder Note: Dividend and Remainder must have same sign Note: Quotient negated if Divisor sign & Dividend sign disagree, e.g., 7 2 = 3, remainder = 1 Quotient = 4 and remainder =1 is not correct Possible for quotient to be too large If divide 64-bit integer by 1, quotient is 64 bits (called saturation) MIPS ignores overflow, SW detection and processing

Advanced ALU CS465 F08 33 D. Barbara Recap Same hardware for Divide as Multiply: just need ALU to add or subtract, and 64-bit register to shift left or shift right Hi and Lo registers in MIPS combine to act as 64-bit register for multiply and divide div $s2, $s3 #Lo=$S2/$s3, Hi=$s2 mod $s3 Successive refinement to see final design Booths algorithm to handle signed multiplies Overflow Both MIPS multiply and divide instructions ignore overflow Software must check for overflow scenarios Advanced ALU CS465 F08

34 D. Barbara Outline Implementation of MIPS ALU Signed and unsigned numbers (Section 3.2) Addition and subtraction (Section 3.3) Constructing an arithmetic logic unit (Appendix B.5, B.6) Multiplication (Section 3.4, CD: 3.23 In More Depth) Division (Section 3.5) Floating point (Section 3.6) Advanced ALU CS465 F08 36 D. Barbara Floating-Point: Motivation What can be represented in N bits? Unsigned 2s Complement 1s Complement Binary Coded Decimal

0 to - 2N-1 to -2N-1+1to 0 to 2N-1 2N-1 - 1 2N-1-1 10N/4 1 But, what about? Very large numbers? 9,349,398,989,787,762,244,859,087,678 Very small number? 0.0000000000000000000000045691 Rational numbers 2/3 Irrational numbers pi Advanced ALU CS465 F08 37 D. Barbara Recall Scientific Notation: Binary binary point exponent

1.01 x 223 Significand(Mantissa) Normalized form: no leading 0s, exactly one digit to left of decimal/binary point Alternatives to represent 1/1,000,000,000 radix (base) Normalized: 1.0 x 10-9 Not normalized: 0.1 x 10-8, 10.0 x 10-10 Computer arithmetic that supports scientific notation numbers is called floating point, because the binary point is not fixed, as it is for integers Advanced ALU CS465 F08 38 D. Barbara FP Representation Normalized format: 1.xxxxxxxxxxtwo 2yyyytwo Want to put it into multiple words: 32 bits for single-precision and 64 bits for double-precision

A simple single-precision representation: 3130 2322 0 S Exponent Fraction 1 bit 8 bits 23 bits S represents sign Exponent represents ys Fraction represents xs Represent numbers as small as 2.0 x 10 -38 to as large as 2.0 x 1038 39 Advanced ALU CS465 F08 D. Barbara Double Precision Representation Next multiple of word size (64 bits) 3130 20 19 S Exponent Fraction 1 bit 11 bits 20 bits Fraction (contd) 32 bits

0 Double precision (vs. single precision) Represent numbers almost as small as 2.0 x 10-308 to almost as large as 2.0 x 10308 But primary advantage is greater accuracy due to larger fraction Advanced ALU CS465 F08 40 D. Barbara IEEE 754 Standard (1/4) Regarding single precision, DP similar Sign bit: 1 means negative 0 means positive Significand: To pack more bits, leading 1 implicit for normalized numbers Significand = 1 + fraction: 1 + 23 bits single, 1 + 52 bits double Always true: 0 < fraction < 1 (for normalized numbers)

Note: 0 has no leading 1, so reserve exponent value 0 just for number 0 Advanced ALU CS465 F08 41 D. Barbara IEEE 754 Standard (2/4) Exponent: Need to represent positive and negative exponents Also want to compare FP numbers as if they were integers, to help in value comparisons If use 2s complement to represent? e.g., 1.0 x 2-1 versus 1.0 x2+1 (1/2 versus 2) 000 0000 0000 0000 0000 0000 1/2 0 1111 1111 000 0000 0000 0000 0000 0000 2 0 0000 0001 If we use integer comparison for these two words, we will conclude that 1/2 > 2!!! Advanced ALU CS465 F08 42 D. Barbara Biased (Excess) Notation

Remapping: shift all values by subtracting a bias from the unsigned representation Biased 7 Bit-pattern biased-7 value 0000 0001 0010 0011 0100 0101 0110 0111 Advanced ALU -7 -6 -5 -4 -3 -2 -1 0 Bit-pattern biased-7 value All-zero 1000 1 pattern is the 1001 2 smallest; all1010 3

one pattern is 1011 4 the largest 1100 5 Now integer comparison can 1101 6 be used 1110 7 directly to 1111 8 compare exponents CS465 F08 43 D. Barbara IEEE 754 Standard (3/4) Use biased notation for exponents, where bias is the number subtracted to get the real number IEEE 754 uses bias of 127 for single precision: subtract 127 from Exponent field to get actual value for exponent 1023 is the bias for double precision 000 0000 0000 0000 0000 0000 1/2 0 0111 1110 000 0000 0000 0000 0000 0000

2 0 1000 0000 All-zero and all-one exponent patterns are reserved for special numbers (Fig 3.15) Advanced ALU CS465 F08 44 D. Barbara IEEE 754 Standard (4/4) Summary (single precision): 3130 2322 S Exponent 1 bit 8 bits Fraction 23 bits 0 (-1)S x (1.Fraction) x 2(Exponent-127) Double precision identical, except longer fraction and exponent segments with exponent bias of 1023 Advanced ALU CS465 F08

45 D. Barbara Example: FP to Decimal 0 0110 1000 101 0101 0100 0011 0100 0010 Sign: 0 => positive Exponent: 0110 1000two = 104ten Bias adjustment: 104 - 127 = -23 Fraction: 1+2-1+2-3 +2-5 +2-7 +2-9 +2-14 +2-15 +2-17 +2-22 = 1.0 + 0.666115 Represents: 1.666115ten2-23 1.986 10-7 Advanced ALU CS465 F08 46 D. Barbara

Example 1: Decimal to FP Number = - 0.75 = - 0.11two 20 (scientific notation) = - 1.1two 2-1 (normalized scientific Sign: negative => 1 Exponent: Bias adjustment: -1 +127 = 126 126 ten = 0111 1110two notation) 1 0111 1110 100 0000 0000 0000 0000 0000 Advanced ALU CS465 F08 47 D. Barbara Example 2: Decimal to FP A more difficult case: representing 1/3? = 0.3333310 = 0.0101010101 2 20 = 1.0101010101 2 2-2 Sign: 0 Exponent = -2 + 127 = 12510=011111012

Fraction = 0101010101 0 0111 1101 0101 0101 0101 0101 0101 010 Advanced ALU CS465 F08 48 D. Barbara Special Numbers What have we defined so far? (single precision) Exponent Fraction 0 00 0 nonzero ??? 1-254 anything 255 0 ??? 255 nonzero Object +/- floating-point ??? Range:

1.0 2-126 1.8 10-38 What if result too small? (>0, < 1.8x10-38 => Underflow!) (2 2-23) 2127 3.4 1038 What if result too large? (> 3.4x1038 => Overflow!) Advanced ALU CS465 F08 49 D. Barbara Denormalized Numbers For single-precision f. p. (page 217) 0 is 00000000000000000000000000000000 Smallest positive normalized number is 1.0000 0000 0000 0000 0000 000 x 2-126 This is actually a pretty big gap... Gradual underflow: allow a number to degrade in significance until it becomes 0 By getting rid of implicit leading 1 in front of the fraction... we can represent a smaller number We denote a denormalized number by 0 exponent and a non-zero fraction The smallest denormalized number is 0.0000 0000 0000 0000 0000 001 x 2-126 or 1.0 x 2-149 For double-precision f. p.

Smallest denormalized number: 1.0 x 2-1074 Advanced ALU CS465 F08 50 D. Barbara Special Numbers What have we defined so far? (single precision) Exponent 0 0 1-254 255 255 Advanced ALU Fraction 0 nonzero anything 0 nonzero CS465 F08 Object 0 denorm +/- floating-point

??? ??? 51 D. Barbara Representation for +/- Infinity In FP, divide by zero should produce +/infinity, not overflow Why? OK to do further computations with infinity, e.g., X/0 > Y may be a valid comparison IEEE 754 represents +/- infinity by all-one exponent and all-zero fraction S 1111 1111 0000 0000 0000 0000 0000 000 Advanced ALU CS465 F08 52 D. Barbara Representation for Not a Number What do I get if I calculate sqrt(-4.0) or 0/0? If infinity is not an error, these should not be either

They are called Not a Number (NaN) IEEE 754: all-one Exponent (255), nonzero Fraction Why is this useful? A symbol for invalid operations: allow programmers to postpone some tests and decisions to later time They contaminate: op(NaN,X) = NaN Advanced ALU CS465 F08 53 D. Barbara Special Numbers What have we defined so far? (single-precision) Exponent 0 0 1-254 255 255 Fraction 0 nonzero anything 0 nonzero

Object 0 denom +/- fl. pt. # +/- infinity NaN Fig. 3.15 Advanced ALU CS465 F08 54 D. Barbara FP Operations Complexities Operations are somewhat more complicated In addition to overflow we can have underflow Absolute value too small Accuracy can be a big problem IEEE 754 keeps two extra bits, guard and round Four rounding modes Special cases:

Positive divided by zero yields infinity Zero divide by zero yields not a number Implementing the standard can be tricky Not using the standard can be even worse See text for description of 80x86 and Pentium bug! Advanced ALU CS465 F08 55 D. Barbara Basic Addition Algorithm (1) We'll need to convert from fraction to significand (2) We'll need to shift the smaller number so that we have the same exponent: num of shifts= Larger Number's exponent - Smaller Number's exponent -- right shift smaller number that many positions (3) Add or subtract the two numbers (4) Normalize: a. left shift result, decrement result exponent (e.g. 0.0001) b. right shift result, increment result exponent (e.g. 101.1) continue until MSB of data is 1 (will not be stored) (5) Check for overflow or underflow (6) If the result is a 0 significand, set the exponent to zero by special step Advanced ALU

CS465 F08 56 D. Barbara Floating Point Addition Example Example: Add 9.999 101 and 1.610 10-1 assuming 4 decimal digits Align decimal point of number with smaller exponent 1.610 10-1 = 0.161 100 = 0.0161 101 Shift smaller number to right Add significands 9.999 + 0.016 10.015 SUM = 10.015 101 NOTE: One digit of precision lost during shifting Shift sum to put it in normalized form 1.0015 102 Since significand only has 4 digits, we need to round the sum SUM = 1.002 102 NOTE: normalization maybe needed again after rounding, e.g,

rounding 9.9999 you get 10.000 Advanced ALU CS465 F08 57 D. Barbara FP Addition Algorithm: Fig 3.16 Start 1. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent would match the larger exponent 2. Add the significands 3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent Overflow or underflow? Yes No Exception 4. Round the significand to the appropriate number of bits

No Still normalized? Yes Done Advanced ALU CS465 F08 58 D. Barbara Sign Exponent Significand Sign Exponent Significand Compare exponents Small ALU Exponent difference 0 1

0 Control 1 0 Shift smaller number right Shift right Big ALU 0 1 0 Increment or decrement Advanced ALU Exponent Add 1 Shift left or right Rounding hardware Sign

1 Significand CS465 F08 Normalize Round Fig 3.17 59 D. Barbara Accurate Arithmetic Round accurately requires HW to include extra bits in the calculation IEEE 754 standard specifies the use of 2 extra bits on the right during intermediate calculations Guard bit and Round bit Example: Add 2.56 100 and 2.34 102, assuming 3 significant digits Shift the smaller number Without guard and round bits

2.56 100 = 0.0256 102(guard holds 5 and round holds 6) 2.34+ 0.02 = 2.36 102 With guard and round bits 2.34+ 0.0256 = 2.3656 102 ROUND 2.37102 Advanced ALU CS465 F08 60 D. Barbara Floating-Point Multiplication Algorithm: Z = X * Y Product exponent = X's exponent + Y's exponent Doubly-biased exponent must be corrected Product exponent = sum of stored exponents bias Xe = 7 Ye = -3 Bias 8 Xe = 1111

Ye = 0101 10100 = 15 = 5 20 = 7+8 = -3 + 8 4+8+8 Multiply the significands: Zm = Xm x Ym Normalization, rounding and exception checking (overflow and underflow) Determine the sign bit Advanced ALU CS465 F08 61 D. Barbara MIPS Floating-Point Instructions Floating-point instructions Computation: add.s, add.d, sub.s, sub.d, mul.s, mul.d, div.s, div.d Comparison: c.x.s, c.x.d (x=eq, neq, lt, le, gt, ge) FP conditional branch: bc1t, bc1f Floating-point data transfer: lwc1, swc1, ldc1, sdc1

32 floating-point registers: $f0, , $f31 Double precision: by convention, even/odd pair contain one DP FP number: $f0/$f1, $f2/$f3 See page 206 for details Advanced ALU CS465 F08 62 D. Barbara Summary Multiplication Refined hardware and algorithms Booths algorithm Division Similar hardware as multiplication Floating point Representing

numbers Addition and multiplication Precision is a big issue Advanced ALU CS465 F08 63 D. Barbara Next Lecture Topic: Assessing and understanding performance Reading Ch4 Patterson and Hennessy Advanced ALU CS465 F08 64 D. Barbara