Nucleotide and Protein Sequence and Structure Databases

The different types of databases in bioinformatics

Organisation:
- Flat files
- Relational databases
- Object-oriented databases

Availability:
- Publicly available, no restriction
- Available, but with copyright
- Accessible, but not downloadable
- Academic, but not freely available
- Commercial

Curators:
- Large public institution (EMBL, NCBI)
- Quasi-academic institute (Swiss Institute of Bioinformatics, TIGR)
- Academic group or scientists
- Commercial company
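To make the flat-file versus relational distinction concrete, here is a minimal sketch. The record below is a simplified, illustrative imitation of a Swiss-Prot style flat file (two-letter line codes, `//` as terminator), not a real entry, and the table layout and function names are assumptions chosen for the example.

```python
import sqlite3

# Simplified, illustrative flat-file record; real entries carry many more line types.
FLAT_RECORD = """\
ID   TPIS_CHICK
AC   P00940;
DE   Triosephosphate isomerase
OS   Gallus gallus (Chicken)
//
"""

def parse_flat_record(text):
    """Collect the two-letter line codes of one record into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        code, value = line[:2], line[5:].strip()
        fields[code] = value.rstrip(";")
    return fields

def load_into_table(record):
    """Store the parsed record in a relational table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entry ("
        "accession TEXT PRIMARY KEY, identifier TEXT, "
        "description TEXT, organism TEXT)"
    )
    conn.execute(
        "INSERT INTO entry VALUES (?, ?, ?, ?)",
        (record["AC"], record["ID"], record["DE"], record["OS"]),
    )
    return conn

record = parse_flat_record(FLAT_RECORD)
conn = load_into_table(record)
row = conn.execute(
    "SELECT identifier, organism FROM entry WHERE accession = ?", ("P00940",)
).fetchone()
print(row)
```

In the flat file, every query means re-reading and re-parsing text. In the relational form, the same entry is reached through an indexed key.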
Identifiers and Accession Numbers

Identifier: a string of letters and digits that describes the sequence or structure.
- Example: TPIS_CHICK (triose phosphate isomerase from chicken, Gallus gallus) in Swiss-Prot
- The identifier can change (at the curator's discretion)

Accession number: a string of letters and digits that uniquely identifies an entry in its database.
- The accession number for TPIS_CHICK in Swiss-Prot is P00940
- An accession number should not change

Nucleotide sequence databases

EMBL, GenBank, and DDBJ are the three primary nucleotide sequence databases:
- EMBL: www.ebi.ac.uk/embl/
- GenBank: www.ncbi.nlm.nih.gov/Genbank/
- DDBJ: www.ddbj.nig.ac.jp
GenBank
- An annotated collection of all publicly available nucleotide and protein sequences
- Set up in 1979 at LANL (Los Alamos)
- Maintained since 1992 by NCBI (Bethesda)
- http://www.ncbi.nlm.nih.gov

EMBL Nucleotide Sequence Database
- An annotated collection of all publicly available nucleotide and protein sequences
- Created in 1980 at the European Molecular Biology Laboratory in Heidelberg
- Maintained since 1994 by EBI (Cambridge)
- http://www.ebi.ac.uk/embl.html
- http://www3.ebi.ac.uk/Services/DBStats/

DDBJ: DNA Data Bank of Japan
- An annotated collection of all publicly available nucleotide and protein sequences
- Started in 1984 at the National Institute of Genetics (NIG) in Mishima
- Still maintained at this institute by a team led by Takashi Gojobori
- http://www.ddbj.nig.ac.jp

Sequence submission
- Data come mainly from direct submissions by the authors
- Submissions are made through the Internet: web forms or email
- Sequences are exchanged between the three centers on a daily basis, so the sequence content of the banks is identical

Derived databases
- CUTG: codon usage tabulated from GenBank
  http://www.kazusa.or.jp/codon/
- Genetic Codes: deviations from the standard genetic code in various organisms and organelles
  http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
- TIGR Gene Indices: organism-specific databases of EST and gene sequences
  http://www.tigr.org/tdb/tgi.shtml
- UniGene: unified clusters of ESTs and full-length mRNA sequences
  http://www.ncbi.nlm.nih.gov/UniGene/
- ASAP: alternatively spliced isoforms
  http://www.bioinformatics.ucla.edu/ASAP
- Intronerator: introns and alternative splicing in C. elegans and C. briggsae
  http://www.cse.ucsc.edu/~kent/intronerator/

Sequence Retrieval Tools

Various tools retrieve sequences of interest from databases:
- Entrez at NCBI: http://www.ncbi.nlm.nih.gov/Entrez
- SRS for EMBL and other databases: http://srs.ebi.ac.uk
- Fetch in the GCG package
- Seqret in EMBOSS

Protein Databases
- General sequence databases
- Protein properties
- Protein localization and targeting
- Protein sequence motifs and active sites
- Protein domain databases; protein classification
- Databases of individual protein families
NCBI Protein database (http://www.ncbi.nlm.nih.gov/protein)

The NCBI Entrez Protein database collects sequences from Swiss-Prot, the Protein Information Resource, the Protein Research Foundation, the Protein Data Bank, and translations from annotated coding regions in the GenBank and RefSeq databases. Protein sequence records in Entrez have links to precomputed protein BLAST alignments, protein structures, conserved protein domains, nucleotide sequences, genomes, and genes.
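Entrez records can also be requested programmatically through NCBI's E-utilities. The sketch below only builds an EFetch URL and makes no network request. The helper name `build_efetch_url` is hypothetical, and it is an assumption that the Swiss-Prot accession P00940 resolves in the `protein` database. Check NCBI's usage guidelines before sending real requests.

```python
from urllib.parse import urlencode

# Public E-utilities endpoint for record retrieval (EFetch).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def build_efetch_url(accession, db="protein", rettype="fasta"):
    """Return an EFetch URL for one record; no request is sent here."""
    params = {
        "db": db,            # Entrez database to query
        "id": accession,     # accession number of the record
        "rettype": rettype,  # requested record format
        "retmode": "text",   # plain-text response
    }
    return f"{EFETCH_URL}?{urlencode(params)}"

url = build_efetch_url("P00940")
print(url)
```

Passing the accession number rather than the identifier keeps the request stable even if the curators rename the entry.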
Swiss-Prot

The Swiss-Prot Protein Knowledgebase is a curated protein sequence database established in 1986. It provides a high level of annotation (such as the description of protein function, domain structure, post-translational modifications, and variants), a minimal level of redundancy, and a high level of integration with other databases. It is now part of the UniProt Knowledgebase, a "one-stop shop" that allows easy access to all publicly available protein sequence annotation.

UniProt (http://www.ebi.uniprot.org/index.shtml)

The Swiss-Prot, TrEMBL, and PIR protein database activities have united to form the Universal Protein Resource (UniProt):
- UniProt Knowledgebase (UniProtKB): curated sequence information and annotations, linked to other databases
- UniProt Reference Clusters (UniRef): removes sequence redundancy by merging sequences that are 100%, 90% and 50% identical; no annotations; linked to Knowledgebase and UniParc records
- UniProt Archive (UniParc): history of sequences; no annotation; linked to source records

Trivia
- The shortest sequence is GWA_SEPOF (P83570): 2 amino acids, a neuropeptide from cuttlefish
- The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids; involved in the assembly and functioning of vertebrate striated muscles; defects cause myopathies
- http://www.expasy.org/sprot
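The entries quoted so far pair an identifier with an accession number (TPIS_CHICK and P00940, for example). The sketch below tells the two apart by shape. The accession pattern follows the format UniProt documents for its accession numbers, but treat it as an assumption to re-check against the current help pages. The entry-name pattern is a loose illustration only.

```python
import re

# UniProt accession numbers have 6 or 10 characters (pattern assumed from
# the UniProt documentation; verify before relying on it).
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

# Entry names look like MNEMONIC_SPECIES; this pattern is illustrative only.
ENTRY_NAME_RE = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")

def classify(token):
    """Label a token as an accession number, an entry name, or unknown."""
    if ACCESSION_RE.fullmatch(token):
        return "accession"
    if ENTRY_NAME_RE.fullmatch(token):
        return "entry name"
    return "unknown"

tokens = ["TPIS_CHICK", "P00940", "GWA_SEPOF", "P83570", "TITIN_MOUSE", "A2ASS6"]
labels = {token: classify(token) for token in tokens}
for token, label in labels.items():
    print(f"{token}: {label}")
```

A shape check like this says nothing about whether the entry exists. It only separates the stable key from the human-readable name.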
General protein sequence databases

Database | Source | Properties worth mentioning | URL
EXProt | Proteins with experimentally verified function | Non-redundant | http://www.cmbi.kun.nl/EXProt/
MIPS | Proteins from genome sequencing projects | Manually curated | http://mips.gsf.de/
NCBI Protein database | Multiple sources | BLAST results, structures | http://www.ncbi.nlm.nih.gov/entrez
PA-GOSUB | Protein sequences from model organisms | GO assignment and subcellular localization | http://www.cs.ualberta.ca/~bioinfo/PA/GOSUB/
PIR | Multiple sources | Annotated sequences; now merged with UniProt |
PRF | Sequences; source includes literature search | Includes sequences not found in EMBL, GenBank and Swiss-Prot; also includes synthetic proteins and peptides | http://www.prf.or.jp/en
RefSeq | Multiple sources | Combination of manual and automated methods | http://www.ncbi.nlm.nih.gov/RefSeq/
Swiss-Prot | Multiple sources | High level of annotation and minimal level of redundancy | http://www.expasy.org/sprot
Databases based on protein properties
- AAindex: amino acid indices and amino acid mutation matrices
- Cybase: cyclic proteins
- dbPTM: protein post-translational modification (PTM) information
- iProLINK: Integrated Protein Literature, INformation and Knowledge
- PFD: Protein Folding Database
- PINT: Protein-protein Interactions Thermodynamic Database
- PPD: Protein pKa Database
- ProTherm: thermodynamic database for proteins and mutants
- REFOLD: data related to refolding experiments
Protein localization and targeting
- DBSubLoc: Database of protein Subcellular Localization
- LOCATE: manually curated, immunofluorescence-based assay data
- MitoNuc: nuclear-encoded mitochondrial proteins in Metazoa
- NESbase: leucine-rich nuclear export signals
- NLSdb: nuclear localization signals
- NMPdb: Nuclear matrix associated proteins database
- NOPdb: Nucleolar Proteome Database
- NPD: Nuclear Protein Database; results from MS
- Nuclear Receptor Resource
- NUREBASE: nuclear hormone receptors
- NURSA: nuclear receptors
- OGRe: Organellar Genome Retrieval; mitochondrial genomes
- PSORTdb: protein subcellular localizations (ePSORT, cPSORT)
- Secreted Protein Database: human, mouse and rat
- THGS: Transmembrane Helices in Genome Sequences

Protein sequence motifs and active sites
- ASC: Active Sequence Collection
- Blocks
- CoC
- COMe: Co-Ordination of Metals etc.
- CoPS
- CSA: Catalytic Site Atlas
- eBLOCKS
- eF-site: Electrostatic surface of Functional site
- eMOTIF
- InterPro
- Metalloprotein Site Database
- O-GLYCBASE
- PDBSite
- PhosphoELM Base
- PRINTS
- PROMISE
- ProRule
- PROSITE
- ProTeus
- SitesBase

Protein domain databases; protein classification
- ADDA: Automatic Domain Decomposition Algorithm
- BAliBASE
- BIOZON
- CDD
- CluSTr: Clusters of Swiss-Prot and TrEMBL proteins
- COG: Clusters of Orthologous Groups of proteins
- FunShift
- FusionDB
- Hits
- HSSP
- InterDom
- InterPro: PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIR superfamily
- iProClass
- MulPSSM
- PALI
- PANDIT
- Pfam
- PIRSF
- ProDom: SP, TrEMBL
- ProtoMap
- ProtoNet
- SBASE
- SIMAP
- SMART: Simple Modular Architecture Research Tool
- SUPFAM
- TCDB
- TIGRFAMs: HMM, GO annotations, MSA
- ProtRepeatsDB

Databases of individual protein families
- AARSDB
- ABCdb
- ARAMEMNON
- BacTregulators (formerly AraC/XylS database)
- CSDBase: Cold Shock Domain database
- DCCP: Database of Copper-Chelating Proteins
- DExH/D Family Database
- DSD
- Endogenous GPCR List
- EROP-Moscow
- ESTHER
- FUNPEP
- GPCRDB
- gpDB: G-protein database
- Histone Database
- HIV RT and Protease Sequence Database
- Homeobox Page
- Homeodomain Resource
- InBase
- KinG: Kinases in Genomes
- Knottins
- LGICdb
- Lipase Engineering Database
- Lipid MAPS
- LOX-DB
- MEROPS
- Nuclear Receptor Resource
- NucleaRDB
- NUREBASE
- NURSA
- Olfactory Receptor Database
- Peptaibol
- PHYTOPROT
- PLANT-PIs
- PlantsP/PlantsT
- PLPMDB
- ProLysED: Prokaryotic Lysis Enzymes Database
- Prolysis
- Protein kinase resource
- REBASE
- Ribonuclease P Database
- RNRdb
- RPG: Ribosomal Protein Gene database
- RTKdb: Receptor Tyrosine Kinase database
- SDAP
- SENTRA
- SEVENS
- SRPDB
- TransportDB
- VKCDB: Voltage-gated K+ Channel Database
- Wnt Database

Protein Data Bank (PDB)

Important in solving real problems in molecular biology
- Established in 1971 at Brookhaven National Laboratory (BNL)
- Sole international repository of macromolecular structure data
- Moved to the Research Collaboratory for Structural Bioinformatics (RCSB)
- http://www.rcsb.org/

Effective use of PDB

Queries are of three types:
- PDB ID: as quoted in a paper
- Search Lite: one or more keywords
- Search Fields: a detailed query form

Query results:
- Structure Explorer: details of the structure
- Query Result Browser: for multiple structures
- PDB Viewer

PDB: example

  HEADER LYASE(OXO-ACID) 01-OCT-91 12CA 12CA 2
  COMPND CARBONIC ANHYDRASE /II (CARBONATE DEHYDRATASE) (/HCA II) 12CA 3
  COMPND 2 (E.C.4.2.1.1) MUTANT WITH VAL 121 REPLACED BY ALA (/V121A) 12CA 4
  SOURCE HUMAN (HOMO SAPIENS) RECOMBINANT PROTEIN 12CA 5
  AUTHOR S.K.NAIR,D.W.CHRISTIANSON 12CA 6
  REVDAT 1 15-OCT-92 12CA 0 12CA 7
  JRNL AUTH S.K.NAIR,T.L.CALDERONE,D.W.CHRISTIANSON,C.A.FIERKE 12CA 8
  JRNL TITL ALTERING THE MOUTH OF A HYDROPHOBIC POCKET. 12CA 9
  JRNL TITL 2 STRUCTURE AND KINETICS OF HUMAN CARBONIC ANHYDRASE 12CA 10
  JRNL TITL 3 /II$ MUTANTS AT RESIDUE VAL-121
  ...
  REMARK 3 PROGRAM PROLSQ 12CA 19
  REMARK 3 AUTHORS HENDRICKSON,KONNERT 12CA 20
  REMARK 3 R VALUE 0.170 12CA 21
  REMARK 3 RMSD BOND DISTANCES 0.011 ANGSTROMS 12CA 22
  REMARK 3 RMSD BOND ANGLES 1.3 DEGREES 12CA 23
  REMARK 4 12CA 24
  REMARK 4 N-TERMINAL RESIDUES SER 2, HIS 3, HIS 4 AND C-TERMINAL 12CA 25
  REMARK 4 RESIDUE LYS 260 WERE NOT LOCATED IN THE DENSITY MAPS AND, 12CA 26
  REMARK 4 THEREFORE, NO COORDINATES ARE INCLUDED FOR THESE RESIDUES. 12CA 27
  ...
  SHEET 3 S10 PHE 66 PHE 70 -1 O ASN 67 N LEU 60 12CA 68
  SHEET 4 S10 TYR 88 TRP 97 -1 O PHE 93 N VAL 68 12CA 69
  SHEET 5 S10 ALA 116 ASN 124 -1 O HIS 119 N HIS 94 12CA 70
  SHEET 6 S10 LEU 141 VAL 150 -1 O LEU 144 N LEU 120 12CA 71
  SHEET 7 S10 VAL 207 LEU 212 1 O ILE 210 N GLY 145 12CA 72
  SHEET 8 S10 TYR 191 GLY 196 -1 O TRP 192 N VAL 211 12CA 73
  SHEET 9 S10 LYS 257 ALA 258 -1 O LYS 257 N THR 193 12CA 74
  SHEET 10 S10 LYS 39 TYR 40 1 O LYS 39 N ALA 258 12CA 75
  TURN 1 T1 GLN 28 VAL 31 TYPE VIB (CIS-PRO 30) 12CA 76
  TURN 2 T2 GLY 81 LEU 84 TYPE II(PRIME) (GLY 82) 12CA 77
  TURN 3 T3 ALA 134 GLN 137 TYPE I (GLN 136) 12CA 78
  TURN 4 T4 GLN 137 GLY 140 TYPE I (ASP 139) 12CA 79
  TURN 5 T5 THR 200 LEU 203 TYPE VIA (CIS-PRO 202) 12CA 80
  TURN 6 T6 GLY 233 GLU 236 TYPE II (GLY 235) 12CA 81
  CRYST1 42.700 41.700 73.000 90.00 104.60 90.00 P 21 2 12CA 82
  ORIGX1 1.000000 0.000000 0.000000 0.00000 12CA 83
  ORIGX2 0.000000 1.000000 0.000000 0.00000
  ...
  ATOM 3 C TRP 5 6.786 -2.502 10.667 1.00 13.47 12CA 91
  ATOM 4 O TRP 5 6.422 -2.085 9.607 1.00 13.57 12CA 92
  ATOM 5 CB TRP 5 6.997 -0.917 12.645 1.00 13.34 12CA 93
  ATOM 6 CG TRP 5 5.784 -0.209 12.221 1.00 13.40 12CA 94
  ATOM 7 CD1 TRP 5 5.681 1.084 11.797 1.00 13.29 12CA 95
  ATOM 8 CD2 TRP 5 4.417 -0.667 12.221 1.00 13.34 12CA 96
  ATOM 9 NE1 TRP 5 4.388 1.418 11.515 1.00 13.30 12CA 97
  ATOM 10 CE2 TRP 5 3.588 0.375 11.797 1.00 13.35 12CA 98
  ATOM 11 CE3 TRP 5 3.837 -1.877 12.645 1.00 13.39 12CA 99
  ATOM 12 CZ2 TRP 5 2.216 0.208 11.656 1.00 13.39 12CA 100
  ATOM 13 CZ3 TRP 5 2.465 -2.043 12.504 1.00 13.33 12CA 101
  ATOM 14 CH2 TRP 5 1.654 -1.001 12.009 1.00 13.34 12CA 102
  ...

Databases related to Proteomics
- Contain information obtained by 2D-PAGE: master images of the gels and descriptions of identified proteins
- Examples: SWISS-2DPAGE, ECO2DBASE, Maize-2DPAGE, Sub2D, Cyano2DBase, etc.
- Format: composed of image and text files
- Most 2D-PAGE databases are federated and use SWISS-PROT as a master index
- Mass Spectrometry (MS) database

Database searching tips
- Look for links to Help or Examples
- Always check update dates
- Check the level of curation
- Try Boolean searches
- Be careful with UK/US spelling differences:
  leukaemia vs leukemia
  haemoglobin vs hemoglobin
  colour vs color

Homework
- Go to UniProtKB
- Search for Lys49 phospholipase A2 from Agkistrodon contortrix laticinctus
- Describe the protein in paragraph form, using functional information and characteristics from the site
- Include a small picture of the protein structure
- Limit: one page, font size 11, Arial, one-inch margins on all sides, short bond paper
- Sequence
- Source organism
- Function
- Variants
- History of research
- Scientists and institutes involved
- And many more!
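Looking back at the PDB example, the ATOM records carry the coordinates that structure viewers read. The sketch below parses two of the records exactly as they appear in the listing (with whitespace collapsed) and computes the distance between the two atoms. Real PDB files use fixed column positions, so splitting on whitespace is an assumption that holds only for lines shaped like these. The function and field names are hypothetical.

```python
import math

# Two ATOM records copied from the 12CA listing (whitespace-collapsed form).
RECORDS = [
    "ATOM 3 C TRP 5 6.786 -2.502 10.667 1.00 13.47 12CA 91",
    "ATOM 4 O TRP 5 6.422 -2.085 9.607 1.00 13.57 12CA 92",
]

def parse_atom(line):
    """Split one collapsed ATOM record into named fields."""
    parts = line.split()
    return {
        "serial": int(parts[1]),       # atom serial number
        "name": parts[2],              # atom name, e.g. C or O
        "res_name": parts[3],          # residue name, e.g. TRP
        "res_seq": int(parts[4]),      # residue sequence number
        "xyz": tuple(float(v) for v in parts[5:8]),  # coordinates in angstroms
        "occupancy": float(parts[8]),
        "b_factor": float(parts[9]),   # temperature factor
    }

atoms = [parse_atom(record) for record in RECORDS]
distance = math.dist(atoms[0]["xyz"], atoms[1]["xyz"])
print(f"{atoms[0]['name']}-{atoms[1]['name']} distance: {distance:.2f} A")
# prints "C-O distance: 1.20 A"
```

For production work, a fixed-column parser or an established structure library is the safer choice, since adjacent fields in real files can touch without a separating space.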