Computational Analysis of Microbial Communities to find ...

Computational Analysis of the Taxonomical Classification of Short 16S rRNA Sequences
Christel Chehoud
Mentor: Brian Haas

Overview
  • Human Microbiome Project
  • 16S rRNA
  • Reference and Test Sets
  • Classifiers
  • Accuracy of Classifications
  • Results

Human Microbiome Project (HMP)
  • Microorganism communities
  • Human development
  • Physiology
  • Immunity
  • Disease
  • Nutrition
  • Core Microbiome
  • http://nihroadmap.nih.gov/hmp/

16S rRNA
  • 16S ribosomal RNA
  • Large RNA component of the small subunit of the ribosome
  • Phylogenetic markers
  • Species identification
  • 1542 bp

Using 16S for Species Identification

  • Sequence -> Classifier -> Predicted Classification

Project Goal
  • New sequencing technology
  • Evaluate the accuracy of the classification of the 16S rRNA across different:
      • Classifiers
      • Regions of the sequence
      • Phylogeny

Reference Dataset: RDP Core Set
  • Trusted taxonomies
  • 6,621 sequences
  • Phylum: 27, Class: 43, Order: 97, Family: 258, Genus: 1352

GreenGenes: Full Collection of Sequences
  • Full collection used by GreenGenes
  • High phylogenetic diversity
  • 188,073 sequences

Comparison of Taxonomy Predictions by Method
  • Classified GreenGenes Core Set using:
      • RDP (Naïve Bayesian)
      • kmerRank
      • BLAST
  • All: 188,073 sequences
  • All Match: 135,269 sequences
      • Phylum: 27, Class: 43, Order: 96, Family: 257, Genus: 1335
  • None Match: 19588
  • [Venn diagram of BLAST, RDP, and kmerRank predictions. Counts shown, in extraction order: 32334, 19588, 4934, 135269, 15949]

CD-hit: Normalizing Genus Representation
  • 3% difference between genera
  • 21,179 sequences
  • Phylum: 27, Class: 43, Order: 96, Family: 235, Genus: 1241
  • Sequence counts: 188,073 -> 135,269 -> 21,179
  • Li, 2006

Sliding Window: Producing our Localized Regions

Sliding Window Approach
  • 300 bp window
  • 25 bp overlap
  • Sanger vs. 454-XLR = full-length vs. localized region
  • Van de Peer, 1996
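The windowing described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the project: the function name `sliding_windows` and the stand-in sequence are hypothetical, and it assumes that "overlap" means the number of bases shared by consecutive windows, so each window starts `window - overlap` bases after the previous one.

```python
def sliding_windows(sequence, window=300, overlap=25):
    """Yield (start, subsequence) pairs for fixed-length windows.

    Assumption: consecutive windows share `overlap` bases, so the start
    advances by `window - overlap`. Trailing bases that do not fill a
    whole window are dropped.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    step = window - overlap
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


# Hypothetical 1,542 bp sequence standing in for a full-length 16S read.
full_length = "ACGT" * 385 + "AC"
regions = list(sliding_windows(full_length))
print(len(regions), [start for start, _ in regions])
```

Each yielded region plays the role of a short localized read, while the untouched input plays the role of the full-length read. If the overlap is meant differently (for example, as the step size), only the `step` line needs to change.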

Overall Accuracy of the Three Different Classifiers
  • Average: BLASTN 0.843, kmerRank 0.830, RDP 0.831
  • Standard deviation: BLASTN 0.031, kmerRank 0.030, RDP 0.017
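The slides report only the resulting averages and standard deviations. As a rough sketch of how such figures could be produced, the example below assumes that accuracy is the fraction of sequences whose predicted genus matches the reference genus, computed once per window and then summarized across windows. The function names, the toy data, and the use of the population standard deviation are all assumptions made for illustration; the presentation does not state them.

```python
from statistics import mean, pstdev


def genus_accuracy(predicted, reference):
    """Fraction of reference sequence IDs whose predicted genus matches."""
    if not reference:
        raise ValueError("reference must not be empty")
    hits = sum(1 for seq_id, genus in reference.items()
               if predicted.get(seq_id) == genus)
    return hits / len(reference)


def summarize(per_window_accuracy):
    """Return (mean, population standard deviation) across windows."""
    return mean(per_window_accuracy), pstdev(per_window_accuracy)


# Hypothetical toy data: reference genera and one classifier's calls
# in two windows of the same four sequences.
reference = {"s1": "Bacillus", "s2": "Vibrio", "s3": "Listeria", "s4": "Yersinia"}
window_calls = [
    {"s1": "Bacillus", "s2": "Vibrio", "s3": "Listeria", "s4": "Yersinia"},
    {"s1": "Bacillus", "s2": "Vibrio", "s3": "Bacillus"},  # s4 left unclassified
]
accuracies = [genus_accuracy(calls, reference) for calls in window_calls]
avg, sd = summarize(accuracies)
print(accuracies, avg, sd)
```

In this sketch an unclassified sequence counts as a miss. Running the same summary once per classifier would give one average and one standard deviation each, in the shape of the figures listed above.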

Genus Prediction Accuracy (per Phylum)
  • Average: BLASTN 0.843, kmerRank 0.830, RDP 0.831
  • Standard deviation: BLASTN 0.107, kmerRank 0.153, RDP 0.142

[Figure: Finding the 16S Region Providing the Most Reliable Prediction Accuracy. Tick labels run from -0.05 to 0.06 on one axis and from 0.15 to 1.35 on the other.]

Clustering Phyla and Methods by Prediction Accuracy
  • Best method is phylum-dependent

  • Variation in accuracy impacted by depth of species coverage

Summary
  • Central region of 16S is the most accurate, on average
  • Of the methods examined, BLAST is most accurate across all 16S regions and all phyla, on average
  • RDP-bayes is least variable across short sequence regions
  • Best short sequence classification method is phylum-dependent

Acknowledgements
Genome Sequencing and Analysis Program
  • Brian Haas
  • Dirk Gevers
  • Michael Feldgarden
  • Doyle Ward
  • Chad Nusbaum
  • Bruce Birren
Administration
  • Shawna Young
  • Lucia Vielma
  • Maura Silverstein
