OceanStore: An Infrastructure for Global-Scale Persistent Storage
John Kubiatowicz, David Bindel, Yan Chen, Steven Czerwinski, Patrick Eaton, Dennis Geels, Ramakrishna Gummadi, Sean Rhea, Hakim Weatherspoon, Westley Weimer, Chris Wells, Ben Zhao
A few slides have been borrowed from the authors' presentations.

Vision
What is OceanStore?
  a utility infrastructure to span the globe and provide continuous access to persistent information
  Source: Berkeley OceanStore Website
  data
    all kinds of information
    desktop, laptop, palmtop
    cars, cellular phones, other devices
    futuristic: embedded in environment
  persistence
    devices can be rebooted, lost, replaced
    reliable, durable data (deep archival will last forever)
    automatic maintenance
  connectivity
    even to tiniest devices, possibly intermittent
    variable bandwidth, latency
  availability
    uniform access, comparable to LAN-based networked storage
    fault-tolerant, DoS-tolerant
  scale
    geographically distributed
    10^10 users
    10^14 files / objects

Questions about information
  Where is persistent information stored?
    20th-century tie between location and content is outdated
    In a world-scale system, locality is key
  How is it protected?
    Can a disgruntled employee of an ISP sell your secrets?
    Can't trust anyone (how paranoid are you?)
  Can we make it indestructible?
    Want our data to survive the big one!
    Highly resistant to hackers (denial of service)
    Wide-scale disaster recovery
  Is it hard to manage?
    Worst failures are human-related

First Observation: Want Utility Infrastructure
  Mark Weiser from Xerox: transparent computing is the ultimate goal
    Computers should disappear into the background
  In the context of storage:
    Don't want to worry about backup
    Don't want to worry about obsolescence
    Need lots of resources to make data secure and highly available, BUT don't want to own them
    Outsourcing of storage already becoming popular
  Pay monthly fee and your data is out there

Utility-based Infrastructure
  [Figure: confederation of providers: Canadian OceanStore, Sprint, AT&T, Pac Bell, IBM]
  Service provided by confederation of companies
  Monthly fee paid to one service provider
  Companies buy and sell capacity from each other

Target applications
  Email
  Group calendar, contacts
  Distributed design tools
  Computer Supported Cooperative Work
  Digital libraries
  Distributed/shared repositories

Assumptions
  Untrusted infrastructure
    a small number of servers may crash or leak information
    most of the servers are functioning correctly
    a financially responsible party of servers ensures integrity, but only clients are trusted with cleartext
  Nomadic data
    data divorced from location
    flows freely within the storage infrastructure
    promiscuous caching: anywhere, anytime
    location important for performance
    dynamic system tuning through introspection

System overview
  persistent object
    GUID: 160-bit SHA-1 hash
    secure identification: globally unique and unforgeable
    2^80 unique objects before collisions (birthday paradox)
  floating object replicas
    independent of location
    encrypted data
  read
    try fast probabilistic replica search (Bloom filter)
    fall back to slower deterministic search (Tapestry)
  write
    update with predicates [as in Bayou]
    creates new version

What is Bayou?
  The Bayou System (Xerox PARC) is a platform of replicated, highly available, variable-consistency databases on which collaborative applications can be built. It caters to portable devices having intermittent connections.

System overview
  application interface
    sessions: sequence of reads/writes
    session guarantees [Bayou]
    loose consistency levels, ACID
  active and archival forms
    active: latest version, with update handle
    archive: erasure-coded read-only version
  dynamic optimization
    object location
    degree of replication
  Tentative updates: epidemic dissemination
  Committed updates: multicast dissemination
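
The GUID size and the birthday-paradox figure from the system overview can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it uses the standard approximation P ≈ 1 - exp(-n^2 / 2^(b+1)) for n random b-bit identifiers. Under that approximation, a collision becomes likely (about 39%) only around 2^80 objects, while the 10^14 objects targeted in the vision slide remain far below that threshold.

```python
import hashlib
import math

GUID_BITS = 160  # SHA-1 digest size in bits


def guid(data: bytes) -> str:
    """Return a 160-bit identifier as 40 hex characters (hypothetical helper)."""
    return hashlib.sha1(data).hexdigest()


def collision_probability(n_objects: int, bits: int = GUID_BITS) -> float:
    """Birthday approximation: P ~= 1 - exp(-n^2 / 2^(bits + 1))."""
    exponent = (n_objects * n_objects) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(-exponent)


if __name__ == "__main__":
    print(guid(b"example object"))          # 40 hex characters
    print(collision_probability(10 ** 14))  # vanishingly small
    print(collision_probability(2 ** 80))   # roughly 0.39
```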

naming
  self-certifying path names (Mazières)
  object GUID = hash of owner key and readable name
  create hierarchies using directory objects
  read restriction through client encryption of data
  write restriction, access control
    associate ACLs with object, respected by servers

addressing
  address an object by its GUID
  message: GUID, random number, small predicate
  route to closest GUID replica matching predicate
  combines data location and routing
    no central name service to attack
    save one round-trip for location discovery

routing
  fast, probabilistic search algorithm
  slow, deterministic search algorithm
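
The naming scheme above (GUID = hash of owner key and readable name, with directory objects forming hierarchies) might be sketched as follows. This is a minimal illustration, not the OceanStore implementation: the byte encoding of the key and name, the helper names, and the use of plain dictionaries as directory objects are all assumptions.

```python
import hashlib
from typing import Dict


def object_guid(owner_public_key: bytes, name: str) -> bytes:
    """GUID = SHA-1(owner key, readable name); returns 20 bytes."""
    h = hashlib.sha1()
    # Length-prefix the key so key/name boundaries are unambiguous (assumed encoding)
    h.update(len(owner_public_key).to_bytes(4, "big"))
    h.update(owner_public_key)
    h.update(name.encode("utf-8"))
    return h.digest()


def verify_name(guid: bytes, owner_public_key: bytes, name: str) -> bool:
    """Self-certifying check: anyone can recompute the hash and compare."""
    return guid == object_guid(owner_public_key, name)


def resolve(path: str, root: bytes, directories: Dict[bytes, Dict[str, bytes]]) -> bytes:
    """Walk directory objects (name -> child GUID) starting from a root GUID."""
    current = root
    for component in path.strip("/").split("/"):
        current = directories[current][component]
    return current


if __name__ == "__main__":
    key = b"hypothetical-owner-public-key"
    root = object_guid(key, "/")
    docs = object_guid(key, "/docs")
    report = object_guid(key, "/docs/report")
    directories = {root: {"docs": docs}, docs: {"report": report}}
    print(resolve("/docs/report", root, directories) == report)
```

Because the GUID is derived from the owner's key, a server cannot substitute a different object under the same name without the mismatch being detectable by the client.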

Bloom filter
  probabilistic set membership test using bit vector
  n-bit vector generated from n hashes of each set element
  filter is union (OR) of all bit vectors

attenuated Bloom filter
  array of d Bloom filters
  i-th Bloom filter is union of all
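
A Bloom filter of the kind described above can be sketched in a few lines. This is an illustration under stated assumptions: the vector size `m_bits`, the hash count `k_hashes`, and the salted-SHA-1 way of deriving positions are choices made here, not taken from the slides. The attenuated variant is modeled simply as a list of filters, and the interpretation that level i summarizes objects reachable i hops away is likewise an assumption, since the slide text is cut off.

```python
import hashlib
from typing import Iterator, List, Optional


class BloomFilter:
    def __init__(self, m_bits: int = 1024, k_hashes: int = 4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0  # a Python int serves as the bit vector

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive k bit positions by salting SHA-1 with the hash index
        for i in range(self.k):
            digest = hashlib.sha1(bytes([i]) + item).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: bytes) -> bool:
        # May return a false positive, never a false negative
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # The union of two sets is the bitwise OR of their vectors
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share size and hash count")
        merged = BloomFilter(self.m, self.k)
        merged.bits = self.bits | other.bits
        return merged


def first_matching_level(levels: List[BloomFilter], item: bytes) -> Optional[int]:
    """Return the smallest level whose filter may contain item, else None."""
    for depth, level in enumerate(levels):
        if item in level:
            return depth
    return None
```

A miss at every level is what would trigger the fallback to the slower deterministic search.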

Updates
  based on versioning and conflict resolution, i.e. no locking
  update: actions with predicates
    commit: apply action of first true predicate
    abort: no true predicates
  conflict resolution on encrypted data
    possible predicates: compare-version, compare-size, compare-block, search
    possible actions: replace-block, insert-block, delete-block, append
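
The commit/abort rule above can be sketched as a list of (predicate, action) pairs evaluated in order. This is a simplified model with hypothetical names: blocks are treated as opaque byte strings (so they could be ciphertext), size is measured in blocks, and only two predicates and two actions from the slide are shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ObjectVersion:
    version: int
    blocks: List[bytes] = field(default_factory=list)


Predicate = Callable[[ObjectVersion], bool]
Action = Callable[[List[bytes]], List[bytes]]


def compare_version(expected: int) -> Predicate:
    return lambda obj: obj.version == expected


def compare_size(expected_blocks: int) -> Predicate:
    # Size counted in blocks here; an assumption for illustration
    return lambda obj: len(obj.blocks) == expected_blocks


def append(block: bytes) -> Action:
    return lambda blocks: blocks + [block]


def replace_block(index: int, block: bytes) -> Action:
    def act(blocks: List[bytes]) -> List[bytes]:
        updated = list(blocks)
        updated[index] = block
        return updated
    return act


def apply_update(obj: ObjectVersion,
                 clauses: List[Tuple[Predicate, Action]]) -> Optional[ObjectVersion]:
    """Commit the action of the first true predicate; return None on abort."""
    for predicate, action in clauses:
        if predicate(obj):
            # A committed update creates a new version; the old one is untouched
            return ObjectVersion(obj.version + 1, action(obj.blocks))
    return None
```

For example, `apply_update(v1, [(compare_version(1), append(b"x"))])` yields version 2 when `v1.version == 1`, and `None` (abort) otherwise, without any locking.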

archival
  produced when objects idle
  use erasure codes (redundant fragmentation)
    simplest example: parity bit, need any (n-1) out of n fragments
    interleaved Reed-Solomon codes, Tornado codes
  fragmentation improves reliability
  deep archival storage
    sweeper processes ensure replication sustained over time
  fragmentation improves performance

Erasure Codes
  Simple parity bits or a generalized Reed-Solomon code can be used to implement them.

Floating Replica and Deep Archival Coding
  [Figure: full copy with versions Ver1: 0x34243 and Ver2: 0x49873; erasure-coded fragments]

dynamic optimization (introspection)
  observation modules
    collect and summarize information
    incrementally update system database
  optimization modules
    periodically process the observation database
    cluster recognition: group related objects
    replica management: maintain replica number and location
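
Returning to the simplest erasure code mentioned on the archival slide, the parity example (any n-1 of n fragments suffice) can be sketched with XOR. This is only the single-parity case with hypothetical function names and zero padding as an assumed convention; Reed-Solomon and Tornado codes tolerate more losses and are not shown.

```python
from typing import List, Optional


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero-padded) plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(size)
    for frag in fragments:
        parity = bytes(a ^ b for a, b in zip(parity, frag))
    return fragments + [parity]


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the data when at most one of the n fragments is missing (None)."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    restored = list(fragments)
    if missing:
        size = len(next(frag for frag in fragments if frag is not None))
        rebuilt = bytes(size)
        # XOR of all surviving fragments equals the missing one
        for frag in fragments:
            if frag is not None:
                rebuilt = bytes(a ^ b for a, b in zip(rebuilt, frag))
        restored[missing[0]] = rebuilt
    # Drop the parity fragment and the padding
    return b"".join(restored[:-1])[:length]
```

The original length is passed to `decode` so the zero padding can be stripped; a real archival format would store it alongside the fragments.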