Automatic annotation of biological sequences: ontologies and pipeline systems
Arthur Gruber
Instituto de Ciências Biomédicas, Universidade de São Paulo (AG-ICB-USP)
Sequence annotation
• Annotation is the process of adding information to a DNA sequence.
• This information is usually tied to DNA coordinates (positions along the sequence).
• Features can be repeats, genes, promoters, protein domains, etc.
• Features can be linked to other databases, e.g. Pfam or PubMed.
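The idea of a feature as information attached to DNA coordinates can be sketched as a small data structure. This is a minimal Python sketch, not a model of any particular file format: the class name Feature, its fields, and the (database, identifier) form used for cross-references are illustrative assumptions. Coordinates follow the 1-based, inclusive convention of the feature table.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A minimal, hypothetical model of one annotated sequence feature."""

    key: str                 # feature type, e.g. "gene", "promoter", "repeat_region"
    start: int               # first base, 1-based and inclusive
    end: int                 # last base, 1-based and inclusive
    strand: int = 1          # +1 = forward strand, -1 = reverse (complement) strand
    qualifiers: dict = field(default_factory=dict)  # extra information, e.g. {"gene": "adh"}
    xrefs: list = field(default_factory=list)       # links to other databases as (database, identifier) pairs

    def length(self) -> int:
        """Number of bases covered by the feature."""
        return self.end - self.start + 1

    def extract(self, seq: str) -> str:
        """Return the feature's bases, reverse-complemented on the minus strand."""
        sub = seq[self.start - 1:self.end]  # 1-based inclusive -> Python slice
        if self.strand == -1:
            sub = sub.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]
        return sub


# Usage: a toy gene on the reverse strand covering bases 1..3.
gene = Feature("gene", 1, 3, strand=-1, qualifiers={"gene": "adh"})
print(gene.length(), gene.extract("ATGCATGC"))  # 3 CAT
```

Storing the strand separately from the coordinates keeps start <= end for every feature; the reverse complement is applied only when bases are extracted.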
Public databases
• GenBank, EMBL and DDBJ
• The three databases exchange data and update each other automatically
Feature table
• http://www.ncbi.nlm.nih.gov/projects/collab/FT/
• Format definition covering DDBJ/EMBL/GenBank
• Defines all accepted annotation terms and their hierarchy
Annotation file
Contains:
• A header with:
  • Information about the sequence
  • Organism
  • Authors
  • References
  • Comments
• A feature table containing:
  • Sequence features and coordinates
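The header / feature table split can be sketched for a GenBank-style flat file. This is a minimal sketch assuming a single, well-formed record in which the header ends at the line starting with FEATURES, the feature table ends at ORIGIN, and the record closes with //; the function name split_genbank_record and the toy record are hypothetical. Full parsers (for example Biopython's SeqIO module) handle many more cases.

```python
def split_genbank_record(text):
    """Split one GenBank flat-file record into header, feature table and sequence.

    Assumes a single well-formed record: header lines, a line starting with
    "FEATURES", the feature table, a line starting with "ORIGIN", numbered
    sequence lines, and a closing "//".
    """
    sections = {"header": [], "features": [], "sequence": []}
    current = "header"
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            current = "features"      # feature table starts on the next line
            continue
        if line.startswith("ORIGIN"):
            current = "sequence"      # sequence block starts on the next line
            continue
        if line.startswith("//"):
            break                     # end of record
        sections[current].append(line)
    # Sequence lines look like "        1 atgaaacccg ggttttag": drop the leading
    # position number and the spaces, keep only the bases.
    bases = "".join(
        "".join(line.split()[1:]) for line in sections["sequence"] if line.strip()
    )
    return sections["header"], sections["features"], bases.upper()


# Usage with a hypothetical toy record.
record = "\n".join([
    "LOCUS       TOY                       18 bp    DNA     linear   INV 01-JAN-2000",
    "DEFINITION  Hypothetical toy record.",
    "FEATURES             Location/Qualifiers",
    "     gene            1..18",
    '                     /gene="adh"',
    "ORIGIN",
    "        1 atgaaacccg ggttttag",
    "//",
])
header, features, sequence = split_genbank_record(record)
print(len(header), len(features), sequence)  # 2 2 ATGAAACCCGGGTTTTAG
```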
Header (EMBL)
ID   PFMAL1P4 standard; DNA; INV; 66441 BP.
XX
AC   AL031747;
XX
SV   AL031747.8
XX
DT   24-SEP-1998 (Rel. 57, Created)
DT   27-APR-2000 (Rel. 63, Last updated, Version 13)
XX
DE   Plasmodium falciparum DNA from MAL1P4
XX
KW   HTG; rifin; telomere; var; var-like hypothetical protein.
XX
OS   Plasmodium falciparum (malaria parasite P. falciparum)
OC   Eukaryota; Alveolata; Apicomplexa; Haemosporida; Plasmodium.
XX
RN   [1]
RA   Oliver K., Bowman S., Churcher C., Harris B., Harris D., Lawson D.,
RA   Quail M., Rajandream M., Barrell B.;
RT   ;
RL   Submitted (24-SEP-1998) to the EMBL/GenBank/DDBJ databases.
RL   P.falciparum Genome Sequencing Consortium, The
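Each EMBL header line starts with a two-letter line code (ID, AC, DT, OS, RA, ...), with the data beginning in column 6, and XX lines act as spacers. A minimal sketch that groups header lines by code, assuming that column layout; the function name parse_embl_header is hypothetical. Repeated codes (the two DT lines, the multi-line RA author list) keep one entry per line, in order.

```python
from collections import defaultdict


def parse_embl_header(lines):
    """Group EMBL header lines by their two-letter line code.

    Columns 1-2 hold the code, columns 3-5 are blank and the data starts in
    column 6. "XX" spacer lines and blank lines are skipped.
    """
    fields = defaultdict(list)
    for line in lines:
        code = line[:2]
        if not code.strip() or code == "XX":
            continue
        fields[code].append(line[5:].rstrip())  # data from column 6 onward
    return dict(fields)


# Usage with lines taken from the header above.
header = parse_embl_header([
    "ID   PFMAL1P4 standard; DNA; INV; 66441 BP.",
    "XX",
    "AC   AL031747;",
    "DT   24-SEP-1998 (Rel. 57, Created)",
    "DT   27-APR-2000 (Rel. 63, Last updated, Version 13)",
])
print(header["AC"][0].rstrip(";"))  # AL031747
print(len(header["DT"]))            # 2
```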
Header (NCBI GenBank)
LOCUS       PFMAL1P4               66442 bp    DNA     linear   INV 02-DEC-2004
DEFINITION  Plasmodium falciparum DNA from MAL1P4, complete sequence.
ACCESSION   AL031747 AL844501
VERSION     AL031747.9  GI:23477012
KEYWORDS    HTG; rifin; telomere; var; var-like hypothetical protein.
SOURCE      Plasmodium falciparum 3D7
  ORGANISM  Plasmodium falciparum 3D7
            Eukaryota; Alveolata; Apicomplexa; Haemosporida; Plasmodium.
REFERENCE   1
  AUTHORS   Hall,N., Pain,A., Berriman,M., Churcher,C., Harris,B., Harris,D.,
  TITLE     Sequence of Plasmodium falciparum chromosomes 1, 3-9 and 13
  JOURNAL   Nature 419 (6906), 527-531 (2002)
   PUBMED   12368867
REFERENCE   2
  AUTHORS   Oliver,K., Pain,A., Berriman,M., Bowman,S., Churcher,C., Harris,B.,
            Harris,D., Lawson,D., Quail,M., Rajandream,M., Hall,N. and Barrell,B.
  TITLE     Direct Submission
  JOURNAL   Submitted (24-SEP-1998) P.falciparum Genome Sequencing Consortium,
            The Sanger Centre, Wellcome Trust Genome Campus, Hinxton,
            Cambridge CB10 1SA, UK
COMMENT     On Oct 2, 2002 this sequence version replaced gi:7670004.
            For more information about this sequence or the Malaria Project,
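In the GenBank header, keywords occupy the first 12 columns (top-level keywords such as LOCUS or REFERENCE start in column 1, sub-keywords such as ORGANISM or AUTHORS are indented), values start in column 13, and a line with 12 blank leading columns continues the previous value. A minimal sketch under that assumed layout; the function name parse_genbank_header is hypothetical, and it flattens sub-keywords (ORGANISM is not nested under SOURCE). It returns a list of pairs because keywords like REFERENCE and AUTHORS repeat.

```python
def parse_genbank_header(lines):
    """Collect (keyword, value) pairs from GenBank header lines.

    Multi-line values are joined with single spaces. Indentation of
    sub-keywords is stripped, so the SOURCE/ORGANISM hierarchy is flattened.
    """
    entries = []
    for line in lines:
        key, value = line[:12].strip(), line[12:].strip()
        if key:
            entries.append([key, value])        # new keyword line
        elif entries:
            # continuation line: extend the previous keyword's value
            entries[-1][1] = f"{entries[-1][1]} {value}".strip()
    return [tuple(entry) for entry in entries]


# Usage with lines taken from the header above.
header = parse_genbank_header([
    "SOURCE      Plasmodium falciparum 3D7",
    "  ORGANISM  Plasmodium falciparum 3D7",
    "            Eukaryota; Alveolata; Apicomplexa; Haemosporida; Plasmodium.",
    "REFERENCE   1",
    "   PUBMED   12368867",
])
for key, value in header:
    print(f"{key}: {value}")
```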
Feature
• A region of DNA annotated with a key and its qualifiers
• Keys: CDS, intron, misc_feature (miscellaneous), etc.
• Qualifiers: notes or extra information about a feature, e.g. exon (key) /gene="adh" (qualifier)
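A key and its qualifiers are written in a fixed-column layout in the GenBank feature table: columns 1-5 blank, the key in columns 6-20, the location from column 22, and each qualifier on its own line starting at column 22. A minimal sketch of that layout; the function names and the location 1..100 are hypothetical, and line wrapping at 80 columns is not handled. Text values are quoted with embedded double quotes doubled, numbers are left unquoted, and a value of None produces a bare flag qualifier.

```python
def format_qualifier(name, value=None):
    """Render one qualifier as /name, /name=number or /name="text"."""
    if value is None:               # flag qualifier without a value
        return f"/{name}"
    if isinstance(value, int):      # numeric values such as /number are unquoted
        return f"/{name}={value}"
    escaped = str(value).replace('"', '""')  # embedded quotes are doubled
    return f'/{name}="{escaped}"'


def format_feature(key, location, qualifiers=()):
    """Render a feature: key in columns 6-20, location and qualifiers from column 22."""
    lines = [" " * 5 + key.ljust(16) + location]
    lines += [" " * 21 + format_qualifier(name, value) for name, value in qualifiers]
    return "\n".join(lines)


# Usage: the exon / gene="adh" example, with a hypothetical location.
print(format_feature("exon", "1..100", [("gene", "adh")]))
```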
Feature keys
attenuator, C_region, CAAT_signal, CDS, conflict, D-loop, D_segment, enhancer, exon, GC_signal, gene, iDNA, intron, J_segment, LTR, mat_peptide,
misc_binding, misc_difference, misc_feature, misc_recomb, misc_RNA, misc_signal, misc_structure, modified_base, mRNA, N_region, old_sequence,
polyA_signal, polyA_site, precursor_RNA, prim_transcript, primer_bind, promoter, protein_bind, RBS, repeat_region, repeat_unit, rep_origin, rRNA,
S_region, satellite, scRNA, sig_peptide, snRNA, snoRNA, source, stem_loop, STS, TATA_signal, terminator, transit_peptide, tRNA, unsure,
V_region, V_segment, variation, 3'clip, 3'UTR, 5'clip, 5'UTR, -10_signal, -35_signal
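Because the feature table defines the accepted keys, an annotation pipeline can reject keys outside the list. A minimal sketch using the 62 keys listed above; the names FEATURE_KEYS and unknown_keys are hypothetical, and the set is only a snapshot of this slide, since later revisions of the Feature Table Definition add and retire keys. This sketch compares keys exactly as written, so "cds" is not treated as "CDS".

```python
# Keys as listed on the slide (a snapshot; check the current Feature Table Definition).
FEATURE_KEYS = frozenset("""
attenuator C_region CAAT_signal CDS conflict D-loop D_segment enhancer exon
GC_signal gene iDNA intron J_segment LTR mat_peptide misc_binding
misc_difference misc_feature misc_recomb misc_RNA misc_signal misc_structure
modified_base mRNA N_region old_sequence polyA_signal polyA_site
precursor_RNA prim_transcript primer_bind promoter protein_bind RBS
repeat_region repeat_unit rep_origin rRNA S_region satellite scRNA
sig_peptide snRNA snoRNA source stem_loop STS TATA_signal terminator
transit_peptide tRNA unsure V_region V_segment variation
3'clip 3'UTR 5'clip 5'UTR -10_signal -35_signal
""".split())


def unknown_keys(keys):
    """Return the keys that are not in the list, preserving input order."""
    return [key for key in keys if key not in FEATURE_KEYS]


print(unknown_keys(["gene", "CDS", "cds", "ncRNA"]))  # ['cds', 'ncRNA']
```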
Feature qualifiers
• Additional information about a feature, e.g.:
  /note="text"
  /allele="text"
  /number=unquoted
  /citation=[number]
  /product="text"
  /codon=(seq:"text",aa:)
  /protein_id=""
  /codon_start=
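Reading qualifiers back means handling the three value forms shown above: quoted text, unquoted values such as numbers or [number] citations, and, in addition, flag qualifiers with no value at all (for example /pseudo, which is not on this slide). A minimal sketch; the function name parse_qualifier is hypothetical, values that span several lines are not handled, and composite values such as /codon=(...) are returned as raw strings.

```python
def parse_qualifier(token):
    """Split one feature-table qualifier into (name, value).

    /name="text" -> quoted text, with doubled "" turned back into "
    /name=value  -> unquoted value kept as a string (e.g. /codon_start=1)
    /name        -> flag qualifier, value None
    """
    if not token.startswith("/"):
        raise ValueError(f"qualifier must start with '/': {token!r}")
    name, sep, value = token[1:].partition("=")
    if not sep:
        return name, None
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1].replace('""', '"')  # strip quotes, undo escaping
    return name, value


for token in ['/gene="adh"', "/codon_start=1", "/citation=[1]"]:
    print(parse_qualifier(token))
```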