Distributed Autonomic Systems - Columbia University

Autonomic Systems
- Autonomic: adaptive
  - Self-healing
  - Self-optimizing: variable encoding schemes for web audio streaming services
  - Self-regulating: cluster systems via node restart; the Apache web server periodically kills child processes
- Maintenance: expensive, time-consuming
  - "I want my availability, but I won't do it myself"
- Automated maintenance:
  - Cheaper
  - Quicker response than a human
  - 24/7 watch: can afford to forget and leave it running

Autonomic Systems ... Gaurav S. Kc ... September 26th, 2002

Items for discussion
- Can large-scale, distributed applications be self-healing, self-regulating, self-optimizing?
- Important issues with respect to automated maintenance of large-scale software systems
  - Harder to build: focus on reusable components
  - Specify maintenance operations during development
  - Consider maintenance as runtime adaptations
  - Gracefully handle unfamiliar, exceptional conditions
- Proposal: design methodology
  - Separation of concerns: application code vs. adaptation mechanisms {decision logic, implementation}
  - Introspection: communicate runtime data to decision logic
  - Intercession: transport reconfiguration code from decision logic

Build large-scale systems with reusable components
- Inherent problem with the development of large-scale systems
  - Hugely complex: unwise for one group of developers to create the whole thing from scratch
  - Outsource sub-projects to experts vs. license their technology
- Integrate with commercial off-the-shelf (COTS) components: cheaper than re-implementing them
  - Software engineering and practicality reasons: the component has already been implemented, is available immediately, and there is no duplication of effort
- 3 types of software components:
  - COTS
  - In-house
  - One-use, specific-purpose component

Component-based Software Engineering
- Software component: a unit of software that conforms to a component model, e.g. COM+, JavaBeans
  - The model defines standards for:
    - Composition: how components are composed together
    - Interaction: IDL description of interface elements
- Two stages of CBSE
  1. Component development
     - No feedback from customer
     - No waterfall model with iterations
     - Exhibit openness, adaptability
  2. Integrating the component into applications
     - Requirements analysis
     - Choose a component with the required functionality
     - Take it or leave it ... but then go on looking for another implementation

Component-based Software Engineering (ii)
- Imperfect match in functionality and requirements
  - Fixed contract
  - Active Interfaces [12]: adaptation interface, open policies
  - Static adaptation of component functionality
- Interface incompatibilities
  - No means for component evolution
  - Granularity of operations and data types, interaction mechanisms, implementation languages
  - Component wrappers, connectors [14]: SWIG, JNI, popen(..), system(..)
- Considerations
  - The application builder is not going to re-implement the component
  - Want to maintain encapsulation, information hiding
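The wrappers and connectors listed above (SWIG, JNI, popen(..), system(..)) let an application builder adapt a COTS component's interface without re-implementing it or breaking its encapsulation. Below is a minimal sketch in Python, where `subprocess.run` plays the role of `popen(..)`. The word-counting "legacy tool" and the `WordCountWrapper` class are hypothetical stand-ins for a real off-the-shelf program.

```python
import subprocess
import sys

# Hypothetical stand-in for an off-the-shelf command-line tool: it reads text on
# stdin and prints how many whitespace-separated words it saw.
LEGACY_TOOL = [sys.executable, "-c",
               "import sys; print(len(sys.stdin.read().split()))"]


class WordCountWrapper:
    """Hides an external program behind a typed, in-process interface.

    Callers never see pipes, bytes, or exit codes: the wrapper adapts the data
    granularity (list of strings in, int out) and the failure convention.
    """

    def __init__(self, command=None, timeout=5.0):
        self.command = list(command) if command is not None else list(LEGACY_TOOL)
        self.timeout = timeout

    def count_words(self, lines):
        # Marshal the call: join the caller's strings into the tool's stdin format.
        result = subprocess.run(
            self.command,
            input="\n".join(lines),
            capture_output=True,
            text=True,
            timeout=self.timeout,
        )
        # Translate the tool's failure convention (non-zero exit) into an exception.
        if result.returncode != 0:
            raise RuntimeError(f"wrapped tool failed: {result.stderr.strip()}")
        # Unmarshal the textual output back into a native Python value.
        return int(result.stdout.strip())


if __name__ == "__main__":
    wrapper = WordCountWrapper()
    print(wrapper.count_words(["self-healing systems", "adapt at runtime"]))
```

Running the module prints 5. All interface adaptation (marshalling, unmarshalling, error translation) lives in the wrapper, so the external program stays untouched.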

Static modeling of possible runtime reconfigurations
- Runtime adaptation of software
  - Ever-changing resource availability
  - Dynamic execution environment
- Separation of concerns: application logic vs. adaptation
- Granularity of adaptation
  - Micro-level: component developer-enabled mechanism, setting switches via Active Interfaces [12, 13, 16]
  - Medium-level: change how components interact with the system, modify the interface [13, 14]
  - Macro-level: phase (groups of) components in/out as part of the dynamic adaptation [13, 14]

Static modeling of possible runtime reconfigurations (ii)
- Self-contained adaptation within component
  - Automatic generation of adaptation code
  - Compiler and language support for high-level specification of the adaptation mechanism [13]
  - Pre-packaged adaptation mechanism [16]
- Automatic integration of new component versions
  - Configuration management [15]: installations, updates, un-installations
  - Tentative use of new versions [14]: transparent testing in the deployed environment
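Tentative use of new versions can be pictured as a connector that keeps the old version authoritative while it runs the new one on the same inputs. Below is a minimal sketch. It assumes components are plain, side-effect-free Python callables and that "agreement" means equal return values. `MultiVersionConnector` and its promotion threshold are hypothetical and do not reproduce the mechanism of [14].

```python
class MultiVersionConnector:
    """Keeps a trusted component version authoritative while trying a new one.

    Hypothetical sketch: every call goes to the stable version, whose result
    is returned to the caller. While in "testing" state the candidate version
    is invoked on the same arguments; it is promoted after
    `required_agreements` matching results and rejected on the first mismatch
    or crash.
    """

    def __init__(self, stable, candidate, required_agreements=3):
        self.stable = stable
        self.candidate = candidate
        self.required_agreements = required_agreements
        self.agreements = 0
        self.status = "testing"  # later "promoted" or "rejected"

    def __call__(self, *args):
        # The caller only ever sees the stable version's answer.
        expected = self.stable(*args)
        if self.status == "testing":
            try:
                agrees = self.candidate(*args) == expected
            except Exception:
                agrees = False  # a crashing candidate is never exposed to the caller
            if not agrees:
                self.status = "rejected"
            else:
                self.agreements += 1
                if self.agreements >= self.required_agreements:
                    self.stable = self.candidate  # phase the new version in
                    self.status = "promoted"
        return expected


if __name__ == "__main__":
    def old_abs(x):
        return x if x >= 0 else -x

    connector = MultiVersionConnector(old_abs, abs)
    for value in (-2, 0, 5):
        connector(value)
    print(connector.status)  # promoted
```

The testing stays transparent because the caller's results never depend on the candidate. A real deployment would also need to isolate any side effects the candidate performs.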

Writing code to implement dynamic adaptations
- Hard to dynamically adapt components
  - Lack proper understanding of the internals
  - Execute (un)trusted, unfamiliar code, with no idea how to fix it if things fail
- Recognize the need to adapt
- Utilize the available runtime mechanisms
  - Pre-existing reconfiguration mechanisms: dispatch directives to carry out local micro-adaptations
  - Use the adaptability of middleware to carry out medium- and macro-scale adaptations effectively
  - Architectural design-driven adaptation, guided by component-interaction specifications
- The inability to reconfigure when required is a form of failure
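The slides separate application code from adaptation mechanisms. The component only emits runtime data (introspection) and accepts directives through switches its developer provided (intercession), while separate decision logic decides what to change. Below is a minimal in-process sketch that reuses the self-optimizing audio-streaming example from the opening slide. The bitrate values, the 80% headroom rule, and all class names are hypothetical. In a distributed deployment, the probe and the dispatch call would travel over an event system or an agent rather than direct method calls.

```python
BITRATES_KBPS = (32, 64, 128, 256)  # hypothetical switch values


class StreamingEncoder:
    """Application logic; exposes one developer-provided switch (micro-level)."""

    def __init__(self, probe):
        self.bitrate_kbps = 128
        self.probe = probe  # introspection: where runtime data is emitted

    def set_switch(self, name, value):
        # Intercession entry point: the only way adaptation code touches us.
        if name != "bitrate_kbps" or value not in BITRATES_KBPS:
            raise ValueError(f"unsupported directive {name}={value}")
        self.bitrate_kbps = value

    def send_chunk(self, measured_kbps):
        # Real encoding and sending would happen here; then report runtime data.
        self.probe({"bitrate_kbps": self.bitrate_kbps,
                    "bandwidth_kbps": measured_kbps})


class DecisionLogic:
    """Adaptation subsystem: decides from runtime data, dispatches directives."""

    def __init__(self, headroom=0.8):
        self.headroom = headroom  # use at most this fraction of measured bandwidth
        self.dispatch = None      # wired to the component's switch after creation

    def observe(self, sample):
        budget = sample["bandwidth_kbps"] * self.headroom
        fitting = [b for b in BITRATES_KBPS if b <= budget]
        wanted = max(fitting) if fitting else BITRATES_KBPS[0]
        if wanted != sample["bitrate_kbps"]:
            self.dispatch("bitrate_kbps", wanted)


if __name__ == "__main__":
    logic = DecisionLogic()
    encoder = StreamingEncoder(probe=logic.observe)
    logic.dispatch = encoder.set_switch

    encoder.send_chunk(measured_kbps=100)  # budget 80 kbps: drop to 64
    print(encoder.bitrate_kbps)
    encoder.send_chunk(measured_kbps=400)  # budget 320 kbps: raise to 256
    print(encoder.bitrate_kbps)
```

The encoder never contains adaptation policy, and the decision logic never reaches into the encoder's internals. Each side can be replaced independently.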

Self-healing systems
- Failure is inevitable [20]
  - Human error:
    - probability of making a mistake proportional to stress level [22]
    - systems can shield against user error, but lack protection from administrators' errors [22]
  - Unanticipated problems:
    - beyond careful and thorough testing
    - directed security attacks
    - lack of a handling mechanism
  - Software aging:
    - transient bugs: recovery requires a restart
    - build-up of transient bugs: failure-prone state during execution

Self-healing systems (ii)
- Availability of system
  - Availability ratio: MTTF / (MTTF + MTTR), with MTTF the mean time to failure and MTTR the mean time to repair
  - Highly resilient: programmed to handle every expected problem
  - Self-heals: manages to survive unexpected situations
    - increase the base longevity period (BLP)
    - decrease recovery time
- Problem-handling mechanism:
  - Reactive, failure-driven: detect an occurred failure, then restart the affected subsystems from a stable state
  - Preventive/proactive, failure-avoidance: detect an increased likelihood of failure and gradual degradation of performance, and avert the imminent failure

Technique: Software Rejuvenation [18, 19]
- Graceful termination, immediate restart
  - Restart at a clean internal state
- Build-up of transient bugs
  - Numerical accumulation errors, unreleased system resources, memory leaks, data corruption
- Levels of rejuvenation
  - Total rejuvenation
    - Scheduled downtime can be fairly cheap
    - Minimal interruption during low-usage periods
  - Partial rejuvenation
    - Transparently rejuvenate selected subcomponents
    - Decoupling between subcomponents
    - Reduced recovery time, since only a subsystem restarts
  - Recursive rejuvenation [21]
    - Rejuvenate progressively larger subsystems recursively
    - Functional or data dependencies between subcomponents

Other self-healing techniques
- Program checkpointing
  - Periodically save program state to persistent storage
  - Can rewind to previous states
  - The power of hindsight enables retroactive repair: demonstrates "what if" semantics
  - Database systems: auditing, logs
  - Recover to a valid state, install a corrective patch, resume [22]
  - Roll back to a consistent state if it cannot commit safely
- Zero tolerance of system compromise
  - Pre-emptive defense against security attacks: randomized, but valid, binary code sequence
  - Sanity checking of control structures
  - Choose immediate shutdown rather than let the system get compromised
  - Immediate restart, with new randomized code
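The availability ratio MTTF / (MTTF + MTTR) makes the two self-healing levers explicit: raise MTTF (a longer base longevity period) or cut MTTR (faster recovery). The small sketch below uses hypothetical numbers chosen only for illustration: an MTTF of 500 hours, with a 30-minute full restart compared against a 30-second partial rejuvenation of one subcomponent. It assumes that the partial restart fully clears the fault.

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability ratio: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail):
    """Expected unavailable minutes in a 365-day year at the given availability."""
    return (1.0 - avail) * 365 * 24 * 60


# Hypothetical figures for one server process, chosen only for illustration.
MTTF_HOURS = 500.0
scenarios = {
    "total rejuvenation (30 min restart)": 0.5,
    "partial rejuvenation (30 s subsystem restart)": 30 / 3600,
}

for label, mttr in scenarios.items():
    a = availability(MTTF_HOURS, mttr)
    print(f"{label}: A = {a:.6f}, ~{downtime_minutes_per_year(a):.1f} min/year down")
```

With the MTTF unchanged, shrinking MTTR from 30 minutes to 30 seconds cuts expected downtime from roughly 525 to roughly 9 minutes per year. This is the "reduced recovery time" argument for partial and recursive rejuvenation.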

Dynamic profiling, generation of runtime data
- Adaptation subsystem: automated decision and implementation
  - Monitoring logic and decision-making
  - Execution of adaptation mechanism
  - Adaptation for recovery or otherwise, without human intervention
- Runtime model of the system architecture
  - Decisions based on the evolving model
- Runtime data generated by each component
  - Embedded probes: PSL
  - Static-adaptable Active Interfaces [12]
  - Context-dependent data format and content, e.g. for an e-mail management system: size, frequency, sender/recipient addresses, types of attachments, encryption strength

Communication of runtime data to decision logic
- Extended RPC-style communication
  - Client communicates with a server at an unknown location
  - RPC clients (execution logic) should be unaware of the presence of RPC servers (decision logic)
  - Need to multiplex emitted data
- Asynchronous callback: "I can't wait, let me know when you're done!"
- Basic message passing to unknown recipients
- Event notification system: subscribe to published events of interest
  - Item of interest: something that happened somewhere; runtime data
  - Generators of items of interest: core system execution, reporting runtime data
  - Consumers of items of interest: monitoring subsystem, interested in runtime data

Event systems
- Centralized event systems
  - Event-driven GUI programming: Event Delegation Model in AWT, Swing, JavaBeans
  - Stable execution environment
  - Tightly-coupled client-server model: Jini; indirection, anonymity of servers via a mediator object
  - Well-ordered delivery mechanisms: fast, reliable, predictable
- Distributed event systems
  - Supercharged mediator between decoupled entities: filtering, aggregating, store-and-forward, store-and-retrieve
  - Mutual anonymity
  - Unreliable execution environment: delayed delivery, data loss

Distributed event systems (ii)
- Channel-based routing: single channel per event type [9] ("birds of a feather flock together")
- Subject-based routing:
  - Faster turnaround time; simple, efficient delivery
  - Not scalable to large classes of events
  - NNTP: events on a common theme / interest
  - Mailing lists, CVS notifications
- Content-based (semantic) routing:
  - Interested in a subset of a class of events
  - Selective delivery by specifying acceptability criteria
  - Event data determines propagation
  - Data replication only if necessary [10, 11]
  - Event composition [8]

Content-based event routing topologies
- Centralized routing node: approximation of a localized event system
- Hierarchical collection of nodes: subscriptions only go up, notifications cascade down
  - Advantages: simple routing algorithms; simple client-server relationships amongst routing nodes
  - Disadvantages: overloading of higher-level routing nodes; network partitioning via single node failure
- (A)cyclic peer-to-peer network
  - Sophisticated routing algorithms
  - Improved fault tolerance
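Content-based routing moves the acceptability criteria into the event service: a subscription states which events it wants, and the event data alone decides who receives each event. Below is a minimal single-broker sketch of the centralized topology. It assumes events are flat attribute dictionaries and subscriptions are conjunctions of (attribute, operator, value) constraints. The broker class and the e-mail attribute names, borrowed from the runtime-data slide, are hypothetical.

```python
import operator

# Comparison operators a subscription constraint may use.
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


class ContentBasedBroker:
    """Centralized broker: delivers an event only to subscribers whose filter accepts it."""

    def __init__(self):
        self._subscriptions = []  # list of (constraints, callback) pairs

    def subscribe(self, callback, *constraints):
        # Each constraint is an (attribute, operator, value) triple; all must hold.
        for _, op, _ in constraints:
            if op not in OPS:
                raise ValueError(f"unknown operator {op!r}")
        self._subscriptions.append((constraints, callback))

    @staticmethod
    def _matches(event, constraints):
        # A missing attribute fails the constraint instead of raising.
        return all(attr in event and OPS[op](event[attr], value)
                   for attr, op, value in constraints)

    def publish(self, event):
        """Deliver `event` to every matching subscriber; return the delivery count."""
        delivered = 0
        for constraints, callback in self._subscriptions:
            if self._matches(event, constraints):
                callback(event)
                delivered += 1
        return delivered


if __name__ == "__main__":
    broker = ContentBasedBroker()
    large_mail, weak_crypto = [], []
    broker.subscribe(large_mail.append, ("type", "=", "email"), ("size_kb", ">", 1024))
    broker.subscribe(weak_crypto.append, ("type", "=", "email"), ("encryption_bits", "<", 128))

    broker.publish({"type": "email", "size_kb": 2048, "encryption_bits": 256})
    broker.publish({"type": "email", "size_kb": 12, "encryption_bits": 56})
    broker.publish({"type": "cvs-commit", "size_kb": 4096})
    print(len(large_mail), len(weak_crypto))
```

A channel-based system would hand every e-mail event to both subscribers and leave the filtering to them. In a hierarchical or peer-to-peer deployment, each routing node would evaluate the same kind of filter to decide whether to forward an event at all.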

Activation of reconfiguration code
- Re-use events?
  - The source (client/decision logic) determines who gets reconfigured, so the server (execution logic) cannot simply subscribe to these events
  - Event systems are not designed to carry large amounts of binary code, as needed for component installation, etc.
- Mobile agents [5]
  - Autonomous program that executes on someone's behalf
  - Decision logic instructs agents to carry out runtime reconfiguration tasks
  - Late binding of the reconfiguration mechanism at the target
  - Asynchronous
  - Primary advantage of agents: reconfiguration might involve a significant amount of computing, ideally performed locally at the execution logic rather than as a long series of RPC invocations

Mobile code infrastructures
- Constituents
  - Server: hosting, execution, transportation
    - Place [6]
    - Agent Server [1, 3, 7]
    - Worklet Virtual Machine: PSL
  - Agents
    - Incorporate dynamic interfaces: the agent installs specific-purpose interfaces to components for customized access
    - Wrapper while you wait, but can configure as needed

Automatic mobility of programs
- Strong mobility: OS support for process relocation [5]
- Weak mobility: state and code transfer at the application level
  - Programming-language, runtime support [6]
    - Special-purpose language [6]
    - Scripting languages [6]: agent code is in textual form
    - General-purpose language [23]: late binding of class definitions by dynamic code loading; serialization of objects
- Simulated strong mobility
  - Local function continuations [2]
  - Modified JVM [4]

Security issues: mobile code
- A greater vulnerability: unknown code
  - Protect agent from server, and vice versa [1, 3, 7]
- Language support
  - Bytecode verification in the JVM
  - Type-system protection from malicious classes
  - Integrity checking of bytecode instructions
  - Cannot define / load core system classes
- Application-level security considerations:
  - Authentication, authorization
  - Permissions model based on certification, credentials
  - Data encryption during transit
  - Tampering detection via digital signatures

Conclusions, future directions
- Autonomic large-scale, distributed systems
  - Criteria for construction and automated maintenance
- State-of-the-art research
  - Autonomic systems exist for specific domains
  - Technologies / tools are available for building a general framework for adaptation
- Dynamic architectural modeling
  - Accurate modeling of the system during execution
  - Decisions made on the evolving model
  - Adaptation heuristics based on historical patterns and temporal data
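The weak-mobility and tampering-detection points from the mobile-code slides can be combined in one small sketch. A reconfiguration agent's data state is serialized at the decision-logic host, tagged, verified at the execution-logic host, and resumed there. The sketch rests on three hypothetical assumptions:
- the agent's class is already installed at the target, so only state moves, not code;
- the two hosts share a secret key;
- an HMAC-SHA256 tag from Python's standard `hmac` module stands in for the digital signatures on the slide.

An HMAC detects tampering but, unlike a public-key signature, cannot tell the two key holders apart.

```python
import hashlib
import hmac
import json

# Hypothetical pre-shared key between the dispatching and the target host.
# HMAC is a stand-in for the slide's digital signatures: it detects tampering
# but, unlike a public-key signature, does not identify which key holder signed.
SHARED_KEY = b"demo-key-not-for-production"


class ReconfigAgent:
    """A weakly mobile agent: only its data state travels, not its call stack."""

    def __init__(self, target_component, steps, done=0):
        self.target_component = target_component
        self.steps = list(steps)
        self.done = done  # number of steps already executed (the resumable state)

    def run_locally(self, log):
        # Runs at the execution-logic host instead of a long series of RPCs.
        while self.done < len(self.steps):
            log.append(f"{self.target_component}: {self.steps[self.done]}")
            self.done += 1


def pack(agent, key=SHARED_KEY):
    """Serialize the agent's state and attach an HMAC-SHA256 tag."""
    payload = json.dumps(vars(agent), sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def unpack(payload, tag, key=SHARED_KEY):
    """Verify the tag before re-creating the agent; refuse tampered state."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("agent state was modified in transit")
    return ReconfigAgent(**json.loads(payload))


if __name__ == "__main__":
    agent = ReconfigAgent("mail-filter", ["load new rule set", "restart worker pool"])
    payload, tag = pack(agent)       # at the decision-logic host
    arrived = unpack(payload, tag)   # at the execution-logic host
    log = []
    arrived.run_locally(log)
    print(log)
```

The agent resumes from its `done` counter rather than from a captured call stack, which makes this weak mobility. Simulated strong mobility [2, 4] would have to carry the execution state as well.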

Bibliography: Mobile agents
1. Design of the Ajanta System for Mobile Agent Programming. Anand R. Tripathi, Neeran M. Karnik, Tanvir Ahmed, Ram D. Singh, Arvind Prakash, Vineet Kakani, Manish K. Vora, Mukta Pathak. Journal of Systems and Software, May 2002.
2. How to Migrate Agents. Matthew Hohlfeld, Bennet Yee. Technical Report CS98-588, Computer Science and Engineering Department, University of California at San Diego, La Jolla, CA, June 1998.
3. Experiences and Future Challenges in Mobile Agent Programming. Anand R. Tripathi, Tanvir Ahmed, Neeran M. Karnik. Microprocessors and Microsystems, 2001.
4. Pickling Threads State in the Java System. S. Bouchenak, D. Hagimont. In Proc. of the Technology of Object-Oriented Languages and Systems (TOOLS), 2000.
5. Mobile Agents: Are They a Good Idea? Colin G. Harrison, David M. Chess, Aaron Kershenbaum. IBM Research Report, T.J. Watson Research Center, NY, 1995.
6. Programming Languages for Mobile Code. Tommy Thorn. ACM Computing Surveys, 29(3):213-239, 1997. Also Technical Report 1083, University of Rennes IRISA.
7. Design Issues in Mobile Agent Programming Systems. Neeran M. Karnik, Anand R. Tripathi. IEEE Concurrency, July-Sep 1998.

Bibliography: Event systems
8. Generic Support for Distributed Applications. Jean Bacon, Ken Moody, John Bates, Richard Hayton, Chaoying Ma, Andrew McNeil, Oliver Seidel, Mark Spiteri. IEEE Computer, pages 68-77, March 2000.
9. Host Groups: A Multicast Extension to the Internet Protocol. S. E. Deering, D. R. Cheriton. Network Working Group: RFC 0966.
10. State of the Art Review of Distributed Event Models. René Meier. Dept. of Computer Science, Trinity College Dublin, Ireland, March 2000. Technical report TCD-CS2000-16.
11. Achieving Expressiveness and Scalability in an Internet-Scale Event Notification Service. Antonio Carzaniga, David S. Rosenblum, Alexander L. Wolf. In Proceedings of the Nineteenth ACM Symposium on Principles of Distributed Computing (PODC 2000).

Bibliography: System adaptation
12. A Model for Designing Adaptable Software Components. George Heineman. In 22nd Annual International Computer Software and Applications Conference, pages 121-127, Vienna, Austria, August 1998.
13. Language and Compiler Support for Adaptive Distributed Applications. Vikram Adve, Vinh Vi Lam, Brian Ensink. ACM SIGPLAN Workshop on Optimization of Middleware and Distributed Systems (OM 2001), Snowbird, Utah, June 2001 (in conjunction with PLDI 2001).
14. Increasing the Confidence in Off-the-Shelf Components: A Software Connector-Based Approach. Marija Rakic, Nenad Medvidovic. Proceedings of SSR '01, 2001 Symposium on Software Reusability: Putting Software Reuse in Context.
15. A Cooperative Approach to Support Software Deployment Using the Software Dock. Richard S. Hall, Dennis Heimbigner, Alexander L. Wolf. International Conference on Software Engineering, May 1999.
16. The Illinois GRACE Project: Global Resource Adaptation through CoopEration. Sarita V. Adve, Albert F. Harris, Christopher J. Hughes, Douglas L. Jones, Robin H. Kravets, Klara Nahrstedt, Daniel Grobe Sachs, Ruchira Sasanka, Jayanth Srinivasan, Wanghong Yuan. In Proceedings of the Workshop on Self-Healing, Adaptive and self-MANaged Systems (SHAMAN), 2002.

Bibliography: Dynamic healing, Miscellaneous
17. Autonomic Computing. Paul Horn, IBM Research.
18. Software Rejuvenation: Analysis, Module and Applications. Yennun Huang, Chandra Kintala, Nick Kolettis, N. Dudley Fulton. Proceedings of the 25th International Symposium on Fault-Tolerant Computing (FTCS-25), Pasadena, CA, June 1995, pp. 381-390.
19. IBM Director Software Rejuvenation. White paper.

20. Recovery Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies. David Patterson, Aaron Brown, Pete Broadwell, George Candea, Mike Chen, James Cutler, Patricia Enriquez, Armando Fox, Emre Kiciman, Matthew Merzbacher, David Oppenheimer, Naveen Sastry, William Tetzlaff, Jonathan Traupmann, Noah Treuhaft. UC Berkeley Computer Science Technical Report UCB//CSD-02-1175, March 15, 2002.
21. Reducing Recovery Time in a Small Recursively Restartable System. George Candea, James Cutler, Armando Fox, Rushabh Doshi, Priyank Garg, Rakesh Gowda. Proceedings of the International Conference on Dependable Systems and Networks (DSN-2002), June 2002.
22. Rewind, Repair, Replay: Three R's to Dependability. Aaron B. Brown, David A. Patterson. To appear in 10th ACM SIGOPS European Workshop, Saint-Emilion, France, September 2002.
23. Dynamic Class Loading in the Java(TM) Virtual Machine. Sheng Liang, Gilad Bracha. Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'98).
