Repairing Write Performance on Flash Devices

Radu Stoica, Manos Athanassoulis, Ryan Johnson, Anastasia Ailamaki
École Polytechnique Fédérale de Lausanne; Carnegie Mellon

Tape is Dead, Disk is Tape, Flash is Disk*
  Slowly replacing HDDs (price ↓, capacity ↑)
  Fast, reliable, efficient
  Potentially huge impact
  Slow random write
  Read/write asymmetry -> not a HDD drop-in replacement
  *Jim Gray, CIDR 2007

DBMS I/O today
  [Diagram: I/O stack. Labels: Request; Data requirements; DBMS; HDD optimized I/O pattern; Block Device API; Flash optimized I/O pattern; Flash device; Flash memory access]
  Inadequate device abstraction
  Flash devices are not HDD drop-in replacements

Random Writes: Fusion ioDrive
  Microbenchmark: 8 KiB random writes
  [Figure: throughput (MiB/s) vs. time (hours), 0 to 20 hours; series: average over 1 s, moving average; with a zoomed inset]
  94% performance drop
  Unpredictability

Stabilizing Random Writes
  Change data placement
  Flash friendly I/O pattern
  Avoid all random writes
  Minimal changes to database engine
  6-9x speedup for OLTP-like access patterns

Overview
  Random Write: how big of a problem?
  Random Write: why still a problem?
  Append-Pack Data Placement
  Experimental results

Related work
  [Diagram: I/O stack annotated with related work. Labels: Request; DBMS; Data requirements; Flash-opt. DB Algs.; HDD optimized I/O pattern; Block Device API; Flash FS; Flash optimized I/O pattern; Data placement; Flash device; FTL; Flash memory access]
  No solution for OLTP workloads

Random Write: Other devices
  [Table: vendor advertised performance, Rand. Write and Rand. Read; values not recoverable]
  [Figure: response time (ms) vs. IO number on an Mtron SSD; phases: Seq. Reads, Random Writes, Seq. Reads; annotations: rt, pause length]
  Rand. Write causes unpredictability
  *Graph from uFlip, Bouganim et al., CIDR 2009

Random Writes: Fusion ioDrive
  Microbenchmark: 8 KiB random writes
  [Figure: throughput (MiB/s) vs. time (hours), 0 to 20 hours; series: average over 1 s, moving average]
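A random-write microbenchmark of the kind named on these slides can be approximated in a few lines. The sketch below is not the authors' benchmark: it writes 8 KiB blocks at random block-aligned offsets into a scratch file through the regular page cache (no Direct I/O), so on most systems it mainly measures the file system and cache rather than the flash device. The function name `random_write_throughput`, the file name, and all sizes are illustrative assumptions. It relies on `os.pwrite`, which is available on Unix-like systems.

```python
import os
import random
import tempfile
import time

BLOCK_SIZE = 8 * 1024  # 8 KiB, the I/O size used on the slides


def random_write_throughput(path, file_size, n_writes, seed=0):
    """Issue n_writes block-aligned random writes; return throughput in MiB/s.

    Assumption: buffered I/O followed by one fsync, not Direct I/O.
    """
    rng = random.Random(seed)
    n_blocks = file_size // BLOCK_SIZE
    payload = os.urandom(BLOCK_SIZE)
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        os.ftruncate(fd, file_size)  # reserve the address range up front
        start = time.perf_counter()
        for _ in range(n_writes):
            offset = rng.randrange(n_blocks) * BLOCK_SIZE
            os.pwrite(fd, payload, offset)
        os.fsync(fd)  # include the time to push the data out
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return (n_writes * BLOCK_SIZE) / (1024 * 1024) / elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "scratch.bin")  # hypothetical scratch file
        rate = random_write_throughput(target, 64 * 1024 * 1024, 2000)
        print(f"{rate:.1f} MiB/s")
```

Reproducing the long-term behavior shown on the slides would additionally require bypassing the cache and running for hours against the raw device, which this sketch does not attempt.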

Sequential Writes: Fusion ioDrive
  Microbenchmark: 128 KiB sequential writes
  [Figure: throughput (MiB/s) vs. time (s), 0 to 1200 s; series: average over 1 s, moving average]
  Seq. Writing: Good & Stable Performance

Idea: Change Data Placement
  Flash friendly I/O pattern
  Avoid all Random Writes
  Write in big chunks
  Tradeoffs (additional work):
    Give up seq. reads (SR and RR similar performance)
    More seq. writing
    Other overheads

Overview
  Random Write: how big of a problem?
  Random Write: why still a problem?
  Append-Pack Data Placement
    Theoretical model
  Experimental results

Append-Pack Algorithm
  No in-place updates: page updates are written sequentially (hot dataset)
  No more space: reclaim space, filter cold pages, write cold dataset
  [Diagram: log from Log start to Log end holding valid and invalid pages]
  How much additional work?

Theoretical Page Reclaiming Overhead
  Update pages uniformly
  Equal prob. to replace a page
  # valid pages?
  $sizeof(disk) = \alpha \cdot sizeof(hotset)$
  $prob(valid) = f(\alpha) \approx e^{-\alpha}$
  Worst case: 36%
  Easily achievable: 6-11%

Theoretical Speedup
  Traditional Random Write I/O latency: $T_{RW}$
  New latency: $T_{SW} + prob(valid) \cdot (T_{RR} + T_{SW})$
  Conservative assumption: $T_{RW} = 10 \, T_{SW}$
  $\alpha = sizeof(device) / sizeof(data)$
  Up to 7x speedup
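The Append-Pack steps and the reclaim-overhead estimate can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `AppendPackLog`, its fields, and the one-slot-at-a-time reclaim are hypothetical, the cold dataset is modeled as a plain dictionary that costs no space, and page contents are omitted. One plausible reading of the estimate is that, with uniform updates over N hot pages, a page survives roughly αN later updates with probability (1 - 1/N)^(αN), which is close to e^(-α).

```python
import math
import random
from collections import deque


class AppendPackLog:
    """Toy model of an append-only hot log with reclaim from the log start."""

    def __init__(self, capacity):
        self.capacity = capacity  # page slots in the hot log
        self.log = deque()        # (page_id, version); log start is on the left
        self.latest = {}          # page_id -> newest version written
        self.cold = {}            # still-valid pages packed out of the hot log
        self.reclaimed = 0        # slots examined during reclaim
        self.reclaimed_valid = 0  # of those, slots that held a valid page

    def update(self, page_id):
        """Write a new page version at the log end; never overwrite in place."""
        if len(self.log) >= self.capacity:
            self.reclaim_one()
        version = self.latest.get(page_id, 0) + 1
        self.latest[page_id] = version   # older copies are now invalid
        self.cold.pop(page_id, None)     # a cold copy, if any, is stale too
        self.log.append((page_id, version))

    def reclaim_one(self):
        """Free the slot at the log start, packing its page if still valid."""
        page_id, version = self.log.popleft()
        self.reclaimed += 1
        if self.latest[page_id] == version:
            self.cold[page_id] = version
            self.reclaimed_valid += 1


def measured_valid_fraction(hot_pages, alpha, n_updates, seed=0):
    """Fraction of reclaimed slots that still held a valid page."""
    rng = random.Random(seed)
    log = AppendPackLog(capacity=int(alpha * hot_pages))
    for _ in range(n_updates):
        log.update(rng.randrange(hot_pages))  # uniform updates, as on the slide
    return log.reclaimed_valid / max(log.reclaimed, 1)


if __name__ == "__main__":
    for alpha in (1.0, 2.0, 3.0):
        measured = measured_valid_fraction(200, alpha, 20000)
        print(f"alpha={alpha}: measured={measured:.3f}, e^-alpha={math.exp(-alpha):.3f}")
```

For α = 1 the estimate e^(-1) is about 0.368, in line with the 36% worst case on the slide. A real implementation would also have to account for the space and write cost of the cold dataset, which this toy ignores.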

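The latency model on the Theoretical Speedup slide can be evaluated directly. The sketch below encodes only what the slide states: new latency T_SW + prob(valid) (T_RR + T_SW), with prob(valid) estimated as e^(-α) and T_RW = 10 T_SW. The slides give no value for T_RR, so the value used in the example run (T_RR equal to T_SW) is purely an assumption for illustration, and the printed numbers are not expected to reproduce the "up to 7x" figure. Function names are hypothetical.

```python
import math


def prob_valid(alpha):
    """Estimated fraction of pages still valid at reclaim time: e^(-alpha)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return math.exp(-alpha)


def append_pack_latency(t_sw, t_rr, alpha):
    """Per-write latency from the slide: T_SW + prob(valid) * (T_RR + T_SW)."""
    return t_sw + prob_valid(alpha) * (t_rr + t_sw)


def speedup(t_rw, t_sw, t_rr, alpha):
    """Ratio of in-place random-write latency to Append-Pack latency."""
    return t_rw / append_pack_latency(t_sw, t_rr, alpha)


if __name__ == "__main__":
    t_sw = 1.0          # arbitrary time unit
    t_rw = 10.0 * t_sw  # conservative assumption from the slide
    t_rr = 1.0 * t_sw   # ASSUMPTION: not given on the slides
    for alpha in (1.0, 2.0, 3.0):
        print(f"alpha={alpha}: speedup={speedup(t_rw, t_sw, t_rr, alpha):.2f}")
```

The model makes the tradeoff explicit: a larger α (more device space per byte of data) lowers prob(valid) and pushes the speedup toward the T_RW / T_SW ceiling.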
Overview
  RW: how big of a problem?
  RW: why still a problem?
  Append-Pack Data Layout
  Experimental results

Experimental setup
  4x Quad-core Opteron
  x86_64 Linux v2.6.18
  Fusion ioDrive 160 GB PCIe
  8 KiB I/Os, Direct I/O
  16 parallel threads
  Firmware runs on host
  Append-Pack implemented as shim library

OLTP microbenchmark
  Microbenchmark: 50% Rand Write / 50% Rand Read
  [Figure: two panels of throughput (MiB/s) vs. time (s), 0 to 4000 s; series: average over 1 s, moving average; labels: FTL?, Append-Pack]
  9x improvement

OLTP Microbenchmark Overview
  Performance better than predicted

What to remember
  Flash ≠ HDD
  We leverage Sequential Writing to avoid Random Writing
  Random Reading as good as Sequential Reading
  Append-Pack eliminates Random Writes
  6-9x speedup

Thank you!
  http://dias.epfl.ch

Backup

FTLs
  Fully-associative sector translation [Lee et al. 07]
  Superblock FTL [Kang et al. 06]
  Locality-Aware Sector Translation [Lee et al. 08]
  No solution for all workloads:
    Static tradeoffs & workload independence
    Lack of semantic knowledge
  Wrong I/O patterns -> complicated software layers destroy predictability

Other Flash Devices - Backup

  Device           RR (IOPS)   RW (IOPS)            SW (MB/s)   SR (MB/s)
  Intel X25-E      35,000      3,300                170         250
  Memoright GT     10,000      500                  130         120
  Solidware        10,000      1,000                110         110
  Fusion ioDrive   116,046     93,199 (75/25 mix)   750         670

  Vendor advertised performance

Experimental Results - Backup

  RR/RW   Baseline    Append/Pack   Speedup   Prediction
  50/50   38 MiB/s    349 MiB/s     9.1       6.2
  75/25   48 MiB/s    397 MiB/s     8.3       4.3
  90/10   131 MiB/s   541 MiB/s     4.1       2.5

  ($\alpha = 2$ in all experiments)

OLTP microbenchmark - Backup
  [Figures: 50% RW / 50% RR - before; 50% RW / 50% RR - after]

OLTP Microbenchmark - Backup
  [Figure: Traditional I/O]

OLTP Microbenchmark - Backup
  [Figure: Append-Pack]
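The speedup column of the backup results table can be cross-checked against the two throughput columns. The short sketch below copies the table values verbatim and recomputes Append/Pack throughput divided by baseline throughput; the tuple layout and function name are illustrative choices, not part of the slides.

```python
# (RR/RW mix, baseline MiB/s, Append/Pack MiB/s, reported speedup, predicted speedup)
RESULTS = [
    ("50/50", 38, 349, 9.1, 6.2),
    ("75/25", 48, 397, 8.3, 4.3),
    ("90/10", 131, 541, 4.1, 2.5),
]


def measured_speedup(baseline_mib_s, append_pack_mib_s):
    """Throughput ratio of Append/Pack over the in-place baseline."""
    return append_pack_mib_s / baseline_mib_s


if __name__ == "__main__":
    for mix, baseline, append_pack, reported, predicted in RESULTS:
        ratio = measured_speedup(baseline, append_pack)
        print(f"{mix}: recomputed={ratio:.2f}, reported={reported}, predicted={predicted}")
```

The recomputed ratios agree with the reported speedups to within 0.1, and every row exceeds its prediction, consistent with the "Performance better than predicted" slide.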
