Summer course

Automatic annotation of biological sequences: ontologies and pipeline systems

Arthur Gruber, Instituto de Ciências Biomédicas, Universidade de São Paulo (AG-ICB-USP)

Sequence annotation

• Annotation is the process of adding information to a DNA sequence.
• The information usually has DNA coordinates.
• Features could be repeats, genes, promoters, protein domains, etc.
• Features can be linked to other databases, e.g. Pfam/PubMed.
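As a minimal sketch of the idea above, an annotation can be modelled as a feature type plus coordinates and optional links to other databases. The `Feature` class, the `overlapping` helper and the coordinates are hypothetical and only illustrate the concept; the PubMed identifier is the one shown in the NCBI header later in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One annotated region of a DNA sequence (hypothetical structure)."""
    kind: str            # e.g. "gene", "repeat", "promoter"
    start: int           # 1-based, inclusive
    end: int             # 1-based, inclusive
    strand: str = "+"
    xrefs: dict = field(default_factory=dict)  # database name -> identifier

    def length(self) -> int:
        return self.end - self.start + 1


def overlapping(features, start, end):
    """Return the features that overlap the interval [start, end]."""
    return [f for f in features if f.start <= end and f.end >= start]


# Made-up coordinates, for illustration only.
features = [
    Feature("gene", 100, 900, "+", {"PubMed": "12368867"}),
    Feature("repeat", 950, 1200),
]
hits = overlapping(features, 920, 1000)
```

Keeping the coordinates on every feature is what makes questions such as "what is annotated in this region?" a simple interval test.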


Public databases

• GenBank, EMBL and DDBJ
• All three databases update each other automatically


Feature table

• http://www.ncbi.nlm.nih.gov/projects/collab/FT/
• Format definition
• Covers DDBJ/EMBL/GenBank
• Defines all accepted annotation terms and their hierarchy


Annotation file

Contains:
• A header with:
  • Information about the sequence
  • Organism
  • Authors
  • References
  • Comments
• A feature table containing:
  • Sequence features and coordinates


Header (EMBL)

ID   PFMAL1P4 standard; DNA; INV; 66441 BP.
XX
AC   AL031747;
XX
SV   AL031747.8
XX
DT   24-SEP-1998 (Rel. 57, Created)
DT   27-APR-2000 (Rel. 63, Last updated, Version 13)
XX
DE   Plasmodium falciparum DNA from MAL1P4
XX
KW   HTG; rifin; telomere; var; var-like hypothetical protein.
XX
OS   Plasmodium falciparum (malaria parasite P. falciparum)
OC   Eukaryota; Alveolata; Apicomplexa; Haemosporida; Plasmodium.
XX
RN   [1]
RA   Oliver K., Bowman S., Churcher C., Harris B., Harris D., Lawson D.,
RA   Quail M., Rajandream M., Barrell B.;
RT   ;
RL   Submitted (24-SEP-1998) to the EMBL/GenBank/DDBJ databases.
RL   P.falciparum Genome Sequencing Consortium, The
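Each line of the EMBL header above starts with a two-letter line code (ID, AC, SV, OS, ...), and XX lines act as spacers. The following is a rough sketch of how such a header could be grouped by line code; `parse_embl_header` is a hypothetical helper, not part of any library, and it assumes one field per line with the code separated from the value by whitespace.

```python
from collections import defaultdict


def parse_embl_header(text):
    """Group EMBL header lines by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        code, _, value = line.partition(" ")
        value = value.strip()
        if code == "XX" or not value:  # skip spacer and empty lines
            continue
        fields[code].append(value)
    return dict(fields)


# A few lines taken from the header shown on the slide.
record = """\
ID   PFMAL1P4 standard; DNA; INV; 66441 BP.
XX
AC   AL031747;
XX
SV   AL031747.8
XX
OS   Plasmodium falciparum (malaria parasite P. falciparum)
OC   Eukaryota; Alveolata; Apicomplexa; Haemosporida; Plasmodium.
"""

header = parse_embl_header(record)
accession = header["AC"][0].rstrip(";")
```

Codes that repeat, such as DT or RA in the full record, simply accumulate as several entries in the same list.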


NCBI Header

LOCUS       PFMAL1P4 66442 bp DNA linear INV 02-DEC-2004
DEFINITION  Plasmodium falciparum DNA from MAL1P4, complete sequence.
ACCESSION   AL031747 AL844501
VERSION     AL031747.9 GI:23477012
KEYWORDS    HTG; rifin; telomere; var; var-like hypothetical protein.
SOURCE      Plasmodium falciparum 3D7
  ORGANISM  Plasmodium falciparum 3D7
            Eukaryota; Alveolata; Apicomplexa; Haemosporida; Plasmodium.
REFERENCE   1
  AUTHORS   Hall,N., Pain,A., Berriman,M., Churcher,C., Harris,B., Harris,D.,
  TITLE     Sequence of Plasmodium falciparum chromosomes 1, 3-9 and 13
  JOURNAL   Nature 419 (6906), 527-531 (2002)
  PUBMED    12368867
REFERENCE   2
  AUTHORS   Oliver,K., Pain,A., Berriman,M., Bowman,S., Churcher,C., Harris,B.,
            Harris,D., Lawson,D., Quail,M., Rajandream,M., Hall,N. and
            Barrell,B.
  TITLE     Direct Submission
  JOURNAL   Submitted (24-SEP-1998) P.falciparum Genome Sequencing Consortium,
            The Sanger Centre, Wellcome Trust Genome Campus, Hinxton,
            Cambridge CB10 1SA, UK
COMMENT     On Oct 2, 2002 this sequence version replaced gi:7670004.
            For more information about this sequence or the Malaria Project,
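The LOCUS line of the NCBI header packs several pieces of information into one line: entry name, length, molecule type, topology, division and date. As a hedged sketch, it could be split like this. `parse_locus` is a hypothetical helper that assumes exactly the eight whitespace-separated tokens seen in the slide; real records may omit or add fields, so this is not a general GenBank parser.

```python
def parse_locus(line):
    """Split a GenBank LOCUS line with the eight tokens shown on the slide."""
    parts = line.split()
    if len(parts) != 8 or parts[0] != "LOCUS" or parts[3] != "bp":
        raise ValueError(f"unexpected LOCUS layout: {line!r}")
    return {
        "name": parts[1],
        "length": int(parts[2]),   # sequence length in base pairs
        "molecule": parts[4],
        "topology": parts[5],
        "division": parts[6],
        "date": parts[7],
    }


locus = parse_locus("LOCUS PFMAL1P4 66442 bp DNA linear INV 02-DEC-2004")
```

Rejecting lines that do not match the assumed layout is a deliberate choice here: it is safer to fail loudly than to assign fields to the wrong keys.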

Feature

• A region of DNA that has been annotated with a key and qualifiers
• Key: the type of feature, e.g. CDS, intron, miscellaneous (misc_feature)
• Qualifier: a note or extra information about a feature, e.g. exon (key) with /gene="adh" (qualifier)
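To make the key/qualifier pairing concrete, here is a small sketch that writes the exon example as feature-table text. It assumes the GenBank flat-file layout (key starting in column 6, location and qualifiers in column 22); `format_feature` is a hypothetical helper and the location `1..200` is invented for illustration.

```python
def format_feature(key, location, qualifiers):
    """Render one feature as GenBank-style feature-table lines."""
    # Key starts in column 6 and is padded so the location starts in column 22.
    lines = [" " * 5 + f"{key:<16}" + location]
    for name, value in qualifiers.items():
        # Each qualifier goes on its own line, aligned with the location.
        lines.append(" " * 21 + f'/{name}="{value}"')
    return "\n".join(lines)


# exon is the key, /gene="adh" the qualifier; the location is made up.
text = format_feature("exon", "1..200", {"gene": "adh"})
print(text)
```

Every qualifier value is quoted in this sketch; the feature table definition also allows unquoted and bracketed values, which this helper does not cover.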


Feature keys

attenuator, C_region, CAAT_signal, CDS, conflict, D-loop, D_segment, enhancer, exon, GC_signal, gene, iDNA, intron, J_segment, LTR, mat_peptide, misc_binding

misc_difference, misc_feature, misc_recomb, misc_RNA, misc_signal, misc_structure, modified_base, mRNA, N_region, old_sequence, polyA_signal, polyA_site, precursor_RNA, prim_transcript

primer_bind, promoter, protein_bind, RBS, repeat_region, repeat_unit, rep_origin, rRNA, S_region, satellite, scRNA, sig_peptide, snRNA, snoRNA, source, stem_loop, STS, TATA_signal, terminator

transit_peptide, tRNA, unsure, V_region, V_segment, variation, 3'clip, 3'UTR, 5'clip, 5'UTR, -10_signal, -35_signal


Feature qualifier

Additional information about a feature

/note="text"
/allele="text"
/number=unquoted
/citation=[number]
/product="text"
/codon=(seq:"text",aa:)
/protein_id=""
/codon_start=
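The qualifier forms listed above differ in how the value is written: quoted text, an unquoted value, or a bracketed citation number. As an illustrative sketch only, the three simple forms could be read like this; `parse_qualifier` and the `QUALIFIER` pattern are hypothetical, and compound values such as `/codon=(...)` are returned unchanged rather than interpreted.

```python
import re

# /name or /name=value, with the value taken up to the end of the line.
QUALIFIER = re.compile(r"^/(?P<name>\w+)(?:=(?P<value>.*))?$")


def parse_qualifier(text):
    """Return (name, value) for one qualifier such as /note="text"."""
    match = QUALIFIER.match(text.strip())
    if match is None:
        raise ValueError(f"not a qualifier: {text!r}")
    name, value = match.group("name"), match.group("value")
    if value is None:
        return name, None                 # qualifier without a value
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return name, value[1:-1]          # quoted text
    if value.startswith("[") and value.endswith("]"):
        return name, int(value[1:-1])     # citation number, e.g. [1]
    return name, value                    # unquoted value, kept as a string


examples = ['/note="text"', "/number=2", "/citation=[1]", '/gene="adh"']
parsed = [parse_qualifier(q) for q in examples]
```

Unquoted values are kept as strings because the slide only says "unquoted", not which type each qualifier carries.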