OceanStore: An Infrastructure for Global-Scale Persistent Storage

John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, Ben Zhao

A few slides have been borrowed from the authors' presentations.

Vision

What is OceanStore?
A utility infrastructure to span the globe and provide continuous access to persistent information.
Source: Berkeley OceanStore Website

data
  - all kinds of information
  - desktop, laptop, palmtop
  - cars, cellular phones, other devices
  - futuristic: embedded in environment
persistence
  - devices can be rebooted, lost, replaced
  - reliable, durable data (deep archival will last forever)
  - automatic maintenance
connectivity
  - even to tiniest devices, possibly intermittent
  - variable bandwidth, latency
availability
  - uniform access, comparable to LAN-based networked storage
  - fault-tolerant, DoS-tolerant
scale
  - geographically distributed
  - 10^10 users
  - 10^14 files / objects

Questions about information
Where is persistent information stored?
  - 20th-century tie between location and content is outdated
  - in a world-scale system, locality is key
How is it protected?
  - Can a disgruntled employee of an ISP sell your secrets?
  - Can't trust anyone (how paranoid are you?)

Can we make it indestructible?
  - want our data to survive the big one!
  - highly resistant to hackers (denial of service)
  - wide-scale disaster recovery
Is it hard to manage?
  - worst failures are human-related

First Observation: Want Utility Infrastructure
Mark Weiser from Xerox: transparent computing is the ultimate goal.
  - computers should disappear into the background
In the context of storage:
  - don't want to worry about backup
  - don't want to worry about obsolescence
  - need lots of resources to make data secure and highly available, BUT don't want to own them
  - outsourcing of storage already becoming popular
Pay a monthly fee and your data is out there.

Utility-based Infrastructure

[Figure: providers shown: Canadian OceanStore, Sprint, AT&T, Pac Bell, IBM]
  - service provided by confederation of companies
  - monthly fee paid to one service provider
  - companies buy and sell capacity from each other

Target applications
  - email
  - group calendar, contacts
  - distributed design tools
  - Computer Supported Cooperative Work
  - digital libraries
  - distributed/shared repositories

Assumptions
Untrusted infrastructure
  - a small number of servers may crash or leak information
  - most of the servers function correctly
  - a financially responsible party of servers ensures integrity
  - but only clients are trusted with cleartext

Nomadic data

  - data divorced from location
  - flows freely within the storage infrastructure
  - promiscuous caching: anywhere, anytime
  - location important for performance
  - dynamic system tuning through introspection

System overview
persistent object
  - GUID: 160-bit SHA-1 hash
  - secure identification: globally unique and unforgeable
  - 2^80 unique objects before collisions (birthday paradox)
floating object replicas: independent of location
encrypted data
read
  - try fast probabilistic replica search (Bloom filter)
  - fall back to slower deterministic search (Tapestry)
write
  - update with predicates [as in Bayou; what is Bayou?]
  - creates new version

What is Bayou?
The Bayou System (Xerox PARC) is a platform of replicated, highly available, variable-consistency databases on which collaborative applications can be built. It caters to portable devices having intermittent connections.

System overview
application interface
  - sessions: sequence of reads/writes
  - session guarantees [Bayou]
  - loose consistency levels, ACID
active and archival forms
  - active: latest version, with update handle
  - archive: erasure-coded read-only version
dynamic optimization
  - object location
  - degree of replication
Tentative updates: epidemic dissemination
Committed updates: multicast dissemination
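
As an illustration of the GUID bullets in the system overview, the following Python sketch derives a 160-bit identifier with SHA-1 and reports the birthday bound behind the 2^80 figure. The naming slide defines the GUID as a hash of the owner key and a readable name; everything else here (the function names, the length-prefix encoding, the sample key and path) is an assumption made for this example, since the slides do not say how OceanStore serializes the hash input.

```python
import hashlib

GUID_BITS = 160  # SHA-1 digest size, as stated in the system overview


def object_guid(owner_key: bytes, name: str) -> str:
    """Return a hex GUID: SHA-1 over the owner key and a readable name."""
    # Assumption: a 4-byte length prefix keeps (key, name) pairs unambiguous.
    # The real OceanStore encoding is not given in the slides.
    h = hashlib.sha1()
    h.update(len(owner_key).to_bytes(4, "big"))
    h.update(owner_key)
    h.update(name.encode("utf-8"))
    return h.hexdigest()


def birthday_bound_bits(hash_bits: int) -> int:
    """Collisions become likely after roughly 2**(hash_bits / 2) random values."""
    return hash_bits // 2


# Hypothetical owner key and path, for illustration only.
guid = object_guid(b"hypothetical-owner-public-key", "/home/alice/notes.txt")
print(len(guid) * 4)                   # number of bits in the GUID
print(birthday_bound_bits(GUID_BITS))  # exponent of the birthday bound
```

Because the GUID depends only on the key and the name, any party can recompute it and check that a replica was located under the right identifier, without consulting a central name service.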

naming
  - self-certifying path names (Mazières)
  - object GUID = hash of owner key and readable name
  - create hierarchies using directory objects
  - read restriction through client encryption of data
  - write restriction, access control: associate ACLs with object, respected by servers

addressing
  - address an object by its GUID
  - message: GUID, random number, small predicate
  - route to closest GUID replica matching predicate
  - combines data location and routing
      - no central name service to attack
      - saves one round-trip for location discovery

routing
  - fast, probabilistic search algorithm
  - slow, deterministic search algorithm

routing: fast, probabilistic search algorithm
Bloom filter
  - probabilistic set membership test using bit vector
  - n-bit vector generated from n hashes of each set element
  - filter is union (OR) of all bit vectors
attenuated Bloom filter
  - array of d Bloom filters
  - i-th Bloom filter is union of all
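
The Bloom filter bullets above can be made concrete with a small Python sketch. It is a minimal illustration, not OceanStore's implementation: the class and function names, the 256-bit vector, the three salted SHA-1 hashes, and the sample GUID strings are all assumptions. The attenuated part assumes that level i summarizes the objects reachable i hops away; the slide's last bullet is cut off, so treat that reading as an assumption too.

```python
import hashlib
from typing import Iterator, List, Optional


class BloomFilter:
    """Probabilistic set membership: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 256, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit vector

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-1 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # The union of two filters is the bitwise OR of their vectors.
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


def nearest_level(levels: List[BloomFilter], item: str) -> Optional[int]:
    """Return the first level of an attenuated filter that may hold item."""
    for distance, bloom in enumerate(levels):
        if bloom.might_contain(item):
            return distance
    return None  # no hint: fall back to the slow deterministic search


local, one_hop = BloomFilter(), BloomFilter()
local.add("guid-a")
one_hop.add("guid-b")
levels = [local, one_hop]
print(nearest_level(levels, "guid-a"))
```

A `None` result corresponds to the fallback path described earlier: when the probabilistic search gives no hint, the query is handed to the slower deterministic search.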

Updates
  - based on versioning and conflict resolution, i.e. no locking
  - update: actions with predicates
      - commit: apply action of first true predicate
      - abort: no true predicates
  - conflict resolution on encrypted data
      - possible predicates: compare-version, compare-size, compare-block, search
      - possible actions: replace-block, insert-block, delete-block, append
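
The commit and abort rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `ObjectVersion` type, the helper names, and the use of a block count for compare-size are invented for the example, and blocks are treated as opaque bytes (they could be ciphertext) rather than modeled after OceanStore's actual formats.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ObjectVersion:
    version: int
    blocks: Tuple[bytes, ...]  # opaque blocks; the server never decrypts them


Predicate = Callable[[ObjectVersion], bool]
Action = Callable[[ObjectVersion], Tuple[bytes, ...]]


def compare_version(expected: int) -> Predicate:
    return lambda obj: obj.version == expected


def compare_size(expected_blocks: int) -> Predicate:
    # Assumption: size is measured in blocks for this sketch.
    return lambda obj: len(obj.blocks) == expected_blocks


def replace_block(index: int, data: bytes) -> Action:
    return lambda obj: obj.blocks[:index] + (data,) + obj.blocks[index + 1:]


def append(data: bytes) -> Action:
    return lambda obj: obj.blocks + (data,)


def apply_update(
    obj: ObjectVersion, clauses: List[Tuple[Predicate, Action]]
) -> Optional[ObjectVersion]:
    """Commit the action of the first true predicate; abort if none is true."""
    for predicate, action in clauses:
        if predicate(obj):
            # Committing creates a new version instead of mutating the old one.
            return ObjectVersion(obj.version + 1, action(obj))
    return None  # abort: no predicate held


v1 = ObjectVersion(1, (b"a", b"b"))
update = [
    (compare_version(1), replace_block(0, b"A")),
    (compare_size(2), append(b"c")),
]
v2 = apply_update(v1, update)  # first clause matches
v3 = apply_update(v2, update)  # only the second clause matches
print(v2, v3, apply_update(v3, update))
```

Since no clause mutates the object in place, every commit yields a fresh version, which matches the slide's point that updates rely on versioning rather than locking.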

archival
  - produced when objects idle
  - use erasure codes (redundant fragmentation)
      - simplest example: parity bit, need any (n-1) out of n fragments
      - interleaved Reed-Solomon codes, Tornado codes
  - fragmentation improves reliability
  - deep archival storage
      - sweeper processes ensure replication is sustained over time
  - fragmentation improves performance

Erasure Codes
Simple parity bits or generalized Reed-Solomon codes can be used to implement erasure coding.

Floating Replica and Deep Archival Coding
[Figure: floating replicas, each with a full copy (Ver1: 0x34243, Ver2: 0x49873, Ver3) and conflict resolution logs, alongside erasure-coded fragments]
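
The parity example from the archival slide (any n-1 out of n fragments suffice) can be sketched in Python as below. It is a minimal single-parity scheme with hypothetical function names and zero padding chosen for the example; it is not the interleaved Reed-Solomon or Tornado coding the slides mention, which tolerate more than one lost fragment.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")  # assumption: zero padding
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    fragments.append(reduce(xor_bytes, fragments))  # parity = XOR of all
    return fragments


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the data when at most one of the n fragments is None."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    repaired = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        # XOR of the surviving n-1 fragments equals the lost one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return b"".join(repaired[:-1])[:length]


data = b"OceanStore archival"
frags = encode(data, 4)  # n = 5 fragments in total
damaged = list(frags)
damaged[2] = None        # lose any one fragment
print(decode(damaged, len(data)) == data)
```

The original length is passed to `decode` so the padding can be stripped; a real archive would store it as metadata alongside the fragments.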

dynamic optimization (introspection)
  - observation modules collect and summarize information
  - incrementally update system database
  - optimization modules periodically process the observation database
      - cluster recognition: group related objects
      - replica management: maintain replica number and location
      - periodic migration: work-home-work-home
  - maintenance: routing, dissemination, availability, durability
