Automatic annotation

Automatic annotation of the EST-based genes was similarity-based: each EST-based gene was assigned the function of its best homolog found in a sequence database search. The best homolog of each gene was determined by using NCBI BLASTP to search the following sets of annotated protein sequences downloaded from the UniProt database:

  1. Swiss-Prot Fungi
  2. Swiss-Prot All
  3. TrEMBL Fungi
  4. TrEMBL All

If a gene had a significant hit in the Swiss-Prot Fungi dataset, it was annotated with the function of its best hit there, and no further databases were searched. If no significant hit was found for a gene in the current database, the remaining databases were searched in the order listed above until a hit was found or all databases had been searched.
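The search order described above can be sketched in Python as follows. This is only an illustration, not the pipeline actually used: it assumes the best BLASTP hit per gene and per database has already been collected, and the Hit type, the annotate_gene function, the gene names, the descriptions, and the E-value cutoff of 1e-5 are all hypothetical (the significance threshold is not stated here).

```python
from typing import Dict, NamedTuple, Optional, Sequence, Tuple

# Databases in the priority order listed above.
DB_ORDER = ("Swiss-Prot Fungi", "Swiss-Prot All", "TrEMBL Fungi", "TrEMBL All")


class Hit(NamedTuple):
    """Best BLASTP hit of one gene in one database (hypothetical container)."""
    description: str
    evalue: float


def annotate_gene(
    gene_id: str,
    best_hits: Dict[str, Dict[str, Hit]],
    db_order: Sequence[str] = DB_ORDER,
    max_evalue: float = 1e-5,  # assumed cutoff, not taken from the text
) -> Optional[Tuple[str, str]]:
    """Return (database, description) from the first database with a
    significant best hit, or None if no database yields one."""
    for db in db_order:
        hit = best_hits.get(db, {}).get(gene_id)
        if hit is not None and hit.evalue <= max_evalue:
            return db, hit.description  # stop at the first significant hit
    return None


# Made-up example data: database -> gene -> best hit.
best_hits = {
    "Swiss-Prot Fungi": {"gene1": Hit("example description A", 1e-80)},
    "Swiss-Prot All": {"gene2": Hit("example description B", 0.5)},
    "TrEMBL Fungi": {"gene2": Hit("example description C", 1e-30)},
}
annotations = {g: annotate_gene(g, best_hits) for g in ("gene1", "gene2", "gene3")}
```

In this example gene1 is annotated from Swiss-Prot Fungi, gene2 falls through to TrEMBL Fungi because its Swiss-Prot All hit does not pass the assumed cutoff, and gene3 remains unannotated.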

Subsets of the protein sequence databases with better-defined and better-curated functional descriptions were given higher priority in the similarity-based annotation. This approach aimed to assign human-readable functions to genes and to avoid unclear descriptions, such as predicted or hypothetical genes, and other ill-defined remarks that carry little meaning for biologists.

MIPS functional category and pathway assignment

>> MIPS functional category

Functionally categorized protein sequences were downloaded from the Munich Information Center for Protein Sequences (MIPS). The best homolog of each EST-based gene among the MIPS-categorized proteins was then determined by using NCBI BLASTP, and the gene was assigned the functional category of that homolog.
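A minimal sketch of this best-homolog transfer is given below. It assumes that the BLASTP results are available in the standard 12-column tabular format (-outfmt 6) and that "best" means the highest bit score; neither is stated in the text. The function names, sequence identifiers, and category labels are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Tuple


def best_hits_from_tabular(handle: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Map each query to (subject, bit score) of its highest-scoring hit.

    Expects BLAST tabular output: query in column 1, subject in column 2,
    bit score in column 12.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def assign_categories(
    best: Dict[str, Tuple[str, float]], category_of: Dict[str, str]
) -> Dict[str, str]:
    """Give each gene the category of its best homolog, if one is known."""
    return {
        gene: category_of[subject]
        for gene, (subject, _) in best.items()
        if subject in category_of
    }


# Made-up BLASTP output and category table, for illustration only.
sample = io.StringIO(
    "gene1\tmipsP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "gene1\tmipsP2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-60\t250\n"
    "gene2\tmipsP3\t40.0\t80\t48\t0\t1\t80\t1\t80\t1e-10\t60\n"
)
category_of = {"mipsP1": "category Y", "mipsP2": "category X", "mipsP3": "category Z"}
categories = assign_categories(best_hits_from_tabular(sample), category_of)
```

Here gene1 has two candidate homologs and receives the category of mipsP2, the hit with the higher bit score.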

>> Pathway assignment

We used the KEGG Automatic Annotation Server (KAAS) to assign pathways to each EST-based gene. Putative protein sequences of the EST-based genes were submitted to KAAS, which returned the mapping between each gene and its corresponding entity in the KEGG pathways.

To help users explore differentially expressed genes in pathways, the number of ESTs supporting each pathway entity can be painted on the pathway diagrams (for an example see here). The set of numbers next to each entity consists of 5 values, corresponding to the 5 libraries [see Table 1] used in this project.
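The 5 values per entity could be derived along the following lines. This is a sketch under stated assumptions, not the code behind the diagrams: it assumes each EST is recorded with the gene it supports and the library it came from, and that the gene-to-entity mapping returned by KAAS is available as a dictionary. The library names, gene names, and entity names are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Placeholder names for the 5 libraries; the real ones are those in Table 1.
LIBRARIES = ("lib1", "lib2", "lib3", "lib4", "lib5")


def est_counts_per_entity(
    est_records: Iterable[Tuple[str, str]],
    gene_to_entity: Dict[str, str],
    libraries: Sequence[str] = LIBRARIES,
) -> Dict[str, List[int]]:
    """Count supporting ESTs per pathway entity, one value per library.

    est_records yields (gene_id, library) pairs, one pair per EST.
    ESTs of genes without a pathway entity are ignored.
    """
    column = {lib: i for i, lib in enumerate(libraries)}
    counts: Dict[str, List[int]] = defaultdict(lambda: [0] * len(libraries))
    for gene_id, library in est_records:
        entity = gene_to_entity.get(gene_id)
        if entity is None:
            continue
        counts[entity][column[library]] += 1
    return dict(counts)


# Made-up example: two genes map to the same entity, one gene is unmapped.
gene_to_entity = {"gene1": "entity_A", "gene2": "entity_A", "gene3": "entity_B"}
est_records = [
    ("gene1", "lib1"), ("gene1", "lib1"), ("gene2", "lib3"),
    ("gene3", "lib5"), ("gene4", "lib2"),
]
counts = est_counts_per_entity(est_records, gene_to_entity)
```

With this input, entity_A is labelled with the values 2, 0, 1, 0, 0 and entity_B with 0, 0, 0, 0, 1.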