ROAD : An Order-Impervious Optimal Detailed Router Hasan Arslan, Shantanu Dutt Electrical & Computer Eng. University of Illinois at Chicago ICCD 2003 1 OUTLINE Introduction Standard Single-Net Routing Mode Ripup-Reroute Detailed Routing Alg. Bump &Refit (B&R) Concepts B&R Paradigm Applying B&R to Complete Detailed Routing Optimality-Preserving Speedup Methods Lookahead TC functions Learning-Based Search Space Pruning Clique-Based Search Space Pruning Experimental Results Conclusion 2 INTRODUCTION Efficient Routing: Reducing total wiring area Lengths of critical path nets for performance opt. Detailed Routing: Net ordering problem (Std. Single-Net Routing ) Ripup-and-Reroute (R&R) New approach (Bump & Refit strategy) 3 Prior work on Detailed Routing (Cont.) 2 1 1 S B C

T E 2 1 0 E during detail routing Does not perturb or move existing nets D A 2 1 0 01 2 } 01 2 } 01 2 Cells Standard single-net routing mode Switchboxes Tracks #s Tracks 4 Prior work on Detailed Routing(Cont.) (RIPUP-and-REROUTE) 1 1 B

C E E 1 0 D A 1 0 01 01 01 1) Single Net Routing : Route new nets without removing any existing nets. 2) Rip & Reroute : If some nets cannot be routed, rip-up the existing nets which occupy the resources of new nets. Reroute the ripped up nets Changes net topology Net length can be increased Because of ripup-reroute solution time can be increased 5 Complete Detailed Router by using Bump-&-Refit strategy Basic B&R developed for incremental routing (Dutt etal. ICCAD99, ICCAD01, TODAES02) Incremental Routing Existing routed nets set R Some new nets set S (timing violation, noise) Route the nets in S by doing min. changes on nets in R. Bump-&-Refit Approach Does not rip-up and reroute Shift them (or their subnets) to other track positions --Bump-&-Refit (B&R) No change in topology, length of existing nets Optimal: Finds a detailed route if exists 6

Experimental Results (Dutt etal., TODAES 2002) Comparisons of STD, R&R, B&R Inc. Routing Alg. Unrouted Nets over B&R 10% new net, 10% unused tracks 800 700 600 500 400 300 R&R=(85x) 200 100 0 a 2 1 3 a 1 1 0 p e e 3 2 ss rd7 pm cs sao m4 erm s71 38. ex s82 lt3 cli m m t s8 STD i5 p2 m ex i4 A vg STD=(20x) R&R

7 Complete Detailed Router (B&R) Each step in complete detailed routing is an incremental routing problem B&R B&R B&R B&R Complete_Detailed_Routing() input: unrouted nets N: number of nets output: Routed nets. R0= for (i=1 to N) Ri=Do_Incremental_Routing(ni,Ri-1) B&R 8 B&R Concepts Some definitions and representation (cont.) n5 n4 3 2 1 n3 3 2 1 0 0 3 2 1 n6 012 3

n2 The overlap graph (OG) Nodes: existing nets Edges: a channel is shared by two nets Tj Tk, bumps to some nets 0 012 3 012 3 n2 T1 T0 n6 T0 T3 T1 T0 T1 T2 T2 T3 T3 T1 T1 T0 n5 T3 n3 T0 SP T2

T0 T2 T0 T3 n4 T2 9 B&R Concepts (i) TC1sum (ni TjTk ) = l(nj) nj adjTk (ni)total-length-of bumped nets E l(nj) nj adjTk (ni) (ii) TC1sqrt (ni TjTk ) = ----------------------------sqrt(| adjTk (ni)| l(nj) is the total length of nj in terms of the track segments New_Net 1 1 C adjT1(E)={B} B O_Net 0 D 1 0 01 01 01 adjT0(E)={C} B T1

1 A E X A T0 C T0 O_Net D T1 10 B&R Concepts 1st Level TC Functions n1 n3 n2 n4 n5 0 1 2 TC1=Total-length-of-bumped-nets T1 n2 12 n3 n1 T2 n4 n5 6 11

Example: B&R for Detailed Routing E New_Net 1 1 C C O_Net 3 B E X0 T B T1 O_Net C X T0 2 1 0 A T0 D A 1 0 01 01 01 D T1 D_Sp

T0 12 Example: B&R for Detailed Routing(Cont.) New_Net E T0 1 1 B C B T1 E O_Net C X1 T 1 0 A T0 D A D X01 T 1 0 D_Sp T0 01 01 01

13 B&R Concepts Navigating the OG New-net T2 If there is a solution, T2 1 T1 4 T3 8 2 T1 T1 5 T2 9 this process will find it. it optimizes the number of tracks 3 T3 6 T3 T3 10 11 12

Time Complexity: L= # of paths in OG (m=# of nodes, b:branching facto In worst case L=O((b-1) m) If OG is tree L is linear Sp Sp Sp Sp 14 Optimality-Preserving Speedup Methods Regular incremental routing: B&R applied to 1-10% of the nets---speed is good Complete detailed routing: B&R applied to 100% of nets--speed drops significantly Developed three optimality-preserving speedup methods to increase the speed of B&R 15 Optimality-Preserving Speedup Methods (Cont.) (1) Lookahead TC Functions n4 n5 T1 7 n2 T2 n6 3 T0 Sp 0 n1 n1 n3 n2 n6

0 1 2 T2 3 n 3 T0 T1 n4 3 n5 (i) TC1=Total-length-of-bumped-nets (i) TC2sum-sum (ni TjTk ) = nj adjTk (ni) min Tt (TC1sum (ni TkTt )) 5 16 Optimality-Preserving Speedup Methods (Cont.) (2) A T1 B T2 D T3 Y T3 C T0 Clique-Based Search Space Pruning C A X B Y D T0

T1 T2 T3 C Y A X B D Dynamically determines the presence of cliques in the OG among the longer nets CLIQUE: is a completely connected subgraph of OG Min. # of distinct tracks for succ. Routing (m=4) For each clique, maintain the number of common unused (CUT) track. (k) After a net in clique is bumped, If (k+m) > t , there is no solution to bump that net. (1+4) > 4 T0 T1 T2 T3 17 Optimality-Preserving Speedup Methods (Cont.) (3) Learning-Based Search Space Pruning D A T0 T1 T2 T3 C K B 1 A C D

D A C K A B T0 T1 X T2 Z T3 2 C D X Z K K B P B Q Theorem: if no soln. for bumping net B, and obstacle pattern OP1 is obtained If in another search path, B is bumped again and OP1 AP2 Then no solution exists 18 Optimality-Preserving Speedup Methods (Cont.) (3) Learning-Based Search Space Pruning Isomorphic Function: f: is a one-to-one and onto functions between tracks that maps T i Tk where Tk=Tf(i) D A C K B

T0 T1 T2 T3 C K A B D T X T0 Z T1 2 T3 Pattern Isomorphism: Let P1 and P2 be obstacle patterns, If all nets on each track of P1 appears on a unique, possibly different, track of P2, then P1 is isomorphic to P2. 19 Optimality-Preserving Speedup Methods (Cont.) (3) Learning-Based Search Space Pruning D A C K B T0 T1 T2 T3 C K A 1 A C D B

D A T X T0 Z T1 2 T3 2 C D X Z K K B P B Q Theorem: if no soln. for bumping net B, and obstacle pattern OP1 is obtained If in another search path, B is bumped again OP2 AP2 & OP1 isomorphic to OP2 Then no solution exists 20 Experimental Results (Characteristics of VPR Benchmark Circuits) Circuit Nam e C499 Opt. Circuit Nam e FPGA Size # net # pins track 10x10 147 443 7 vda Opt.

FPGA Size # net # pins track 18x18 399 1511 10 m m 9a 13x13 205 787 6 frg2 36x36 558 2177 6 alu2 15x15 236 1001 7 apex6 30x30 575 2199 6 s1 14x14 248

990 7 ex4p 22x22 586 2337 6 s1423 15x15 272 1133 6 m m 30a 23x23 651 2634 6 t481 15x15 280 1087 8 m isex3c 24x24 663 2773

9 sand 16x16 285 1236 7 ex5p 33x33 1072 5391 15 m m 9b 15x15 292 1107 6 tseng 33x33 1248 5409 7 planet 17x17 307 1357

6 m isex3 38x38 1658 7013 12 planet1 17x17 308 1357 7 alu4 40x40 1748 7632 12 x4 21x21 310 1136 6 diffeq 39x39 1786 7588 8

s1196 17x17 325 1354 7 des 63x63 1847 8456 8 i6 26x26 328 1115 4 apex2 44x44 2284 9431 13 duke2 16x16 328 1306 8

elliptic 61x61 4175 15565 11 s1488 18x18 340 1508 6 spla 61x61 4286 18512 14 21 Experimental Results (Speedup Method Results) ROAD-1: Basic B&R with 1st TC Function ROAD0: 1st TC Function 2nd TC Function ROAD: LBS + Clique-Based pruning mtd. Added to ROAD0 ROAD (bumped & Refit based OptimAl Detailed router) 22 Experimental Results (Internal Comparisons) Circuit Name planet1 mm9a x4 alu2 i6 s1 s1488 s1423 C499 sand

t481 planet mm9b planet1 s1196 x4 duke2 i6 s1488 vda Time in secs ROAD-1 ROAD-0 1.06 3299.22 18.06 4015.23 4.60 3163.87 12.21 537.34 131.76 2010.56 1.06 18.06 4.60 12.21 ROAD 2.24 130.03 3.41 1.44 4.56 1.40 2.97 4.65 51.52 4.90 37.80 2.29 10.05 2.24 87.70 3.41 238.20 4.56 1670.47 2.97

2.37 1.27 2.72 1.41 3.93 1.14 2.31 2.56 0.54 2.08 1.80 2.05 2.45 2.37 3.10 2.72 2.11 3.93 2.31 4.99 total(10 Ckt) 13193.91 157.89 Speed up over ROAD-1 83.56 total ( 16) ckt 2253.63 Speed up over ROAD-0 21.84 604.12 36.83 61.19 Extrapolated Speed up of ROAD over ROAD-1 83 x 61 = 5763 23 Experimental Results (Extracting VPR Global Routing Topology) VPR Placement 2 1 1 2 1

Flat-routing 2 1 ROAD SEGA 0 0 012 01 2 VPR Router 01 2 Comparisons of ROAD, VPR and SEGA 24 Experimental Results (Comparing with VPR-SEGA combine) Improvement over SEGA Comparisions of SEGA, ROAD track numbers 120% 100% 80% 60% 40% 20% 0% m 9a m s1 481 9b et1 196 ke 2 vda ex6 30a x5p sex ffeq ex2 otal t m n e mi di ap T ap m m pla s1 du m 25

Experimental Results (Comparing with VPR-SEGA combine) 25 20 15 VPR ROAD 10 ROAD=(7x) 5 To ta l VPR=(4x) de s pl an et s1 19 6 s1 48 8 ap ex 6 m ise x3 c m ise x t4 81 0 al u2

Speedup factor over SEGA Comparisions of VPR, ROAD, SEGA runtimes (SEGA runtime taken as baseline) 26 Approx. VPR detailed-runtime=(VPR flat-runtime) - (VPR global-runtime) Empirical Avg. Case Time Complexity of ROAD 800.00 700.00 y = 3E-05x2 + 0.0347x - 11.105 600.00 Time 500.00 y = 0.1409x - 56.939 400.00 300.00 200.00 y = 2E-11x4 - 1E-07x3 + 0.0003x2 - 0.1827x + 32.085 100.00 0.00 0 500 1000 1500 2000 2500 3000 3500 4000 4500

5000 -100.00 number of nets ROAD Linear (ROAD) Poly. (ROAD) Poly. (ROAD) 27 CONCLUSION ROAD: uses the bump-and-refit (B&R) Overcomes net-ordering problem Optimality-preserving search space pruning methods that give us orders of magnitude speedup Large circuits: VPRs flat routing time consuming Hence need two-stage routing (global followed by detailed routing) possibly interleaved across the nets ROAD prime candidate for detailed routing phase in this two-stage framework 28 THANK YOU 29