Syntenic Blocks in ancestral species
The Genomicus browser displays, when possible, the predicted order of genes in ancestral species. The method used to predict this order is briefly described here, and in more detail in a poster presented at the Cold Spring Harbor meeting on Genome Informatics in October 2009. The method is described in full in a manuscript in preparation.
- A pairwise comparison between all available species is performed to identify pairwise syntenic blocks. Two consecutive genes A1 and B1 in species 1 belong to a syntenic block with their respective orthologs A2 and B2 in species 2 if A2 and B2 are also consecutive and in the same respective orientations as A1 and B1. This definition is applied strictly for any number of consecutive genes.
- All pairwise syntenic blocks are compared, and when two such blocks overlap without any inconsistency, they are merged into a larger block.
- Merged blocks represent the ancestral gene order in the common ancestor of the extant species that contributed pairwise syntenic blocks.
Because the definition of pairwise syntenic blocks is very strict, it is assumed that this order accurately reflects the order and orientation of genes in the last common ancestor. Merging pairwise syntenic blocks compensates for gene losses or duplications in terminal branches of the tree, which would otherwise break blocks under the above definition.
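The steps above can be illustrated with a minimal Python sketch. It is not the Genomicus implementation: it assumes that orthologs share the same identifier in both species, that each gene is written as a hypothetical `(identifier, strand)` pair with strand `+1` or `-1`, and that a block read from the opposite strand (reversed order, inverted orientations) is equivalent to the original. The function names `flip`, `canonical`, `pairwise_blocks` and `merge_if_consistent` are illustrative only, and the merge step handles just the simplest case, where the end of one block matches the start of another.

```python
def flip(block):
    """Read a block from the opposite strand: reverse order, invert strands."""
    return [(gene, -strand) for gene, strand in reversed(block)]


def canonical(left, right):
    """Strand-independent key for the adjacency of two consecutive genes."""
    return min((left, right), tuple(flip([left, right])))


def pairwise_blocks(genome1, genome2):
    """Return runs of consecutive genes of genome1 conserved in genome2.

    Each genome is a list of chromosomes; each chromosome is a list of
    (identifier, strand) tuples. Assumption: orthologs share an identifier.
    """
    conserved = {
        canonical(left, right)
        for chrom in genome2
        for left, right in zip(chrom, chrom[1:])
    }
    blocks = []
    for chrom in genome1:
        current = list(chrom[:1])
        for left, right in zip(chrom, chrom[1:]):
            if canonical(left, right) in conserved:
                current.append(right)      # adjacency conserved: extend the block
            else:
                if len(current) > 1:
                    blocks.append(current)
                current = [right]          # adjacency broken: start a new block
        if len(current) > 1:
            blocks.append(current)
    return blocks


def merge_if_consistent(x, y):
    """Append y to the right of x when a suffix of x equals a prefix of y.

    y is also tried on the opposite strand. Returns None when the two blocks
    do not overlap in this way (simplifying assumption of this sketch).
    """
    for candidate in (y, flip(y)):
        for k in range(min(len(x), len(candidate)), 0, -1):
            if x[-k:] == candidate[:k]:
                return x + candidate[k:]
    return None


species1 = [[("A", 1), ("B", 1), ("C", -1), ("D", 1)]]
species2 = [[("X", 1), ("C", 1), ("B", -1), ("A", -1), ("D", 1)]]
blocks = pairwise_blocks(species1, species2)
merged = merge_if_consistent(blocks[0], [("D", -1), ("C", 1)])
```

In this toy example the genes A, B and C form one block because species 2 carries them in the same relative order and orientation on the opposite strand, whereas D is excluded because its adjacency with C is not conserved. A block from another pairwise comparison that shares gene C then extends the merged block to include D.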
Conserved Non-coding Elements (CNEs)
CNEs were computed from multiple alignments of 46 vertebrate genomes projected on the human genome. These alignments were generated using multiz and other tools by the UCSC and Penn State Bioinformatics groups, and are available on the UCSC web site.
To identify CNEs, multiple alignments are scanned using a window of fixed size W. A column is called "straight" when all bases are strictly identical in the species considered, and windows in which more than D percent of columns are straight are selected. Next, each selected window is considered an "anchor" that is extended on either side until a position is reached where at least X consecutive columns are not straight. The extension then stops at the last straight column.
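The scan-and-extend procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used to build the Genomicus CNE sets: the alignment is given as equal-length strings (one per species), gaps and ambiguous bases never count as straight, non-straight columns at the edges of an anchor are trimmed, and overlapping extended anchors are fused into one element. The function names and the half-open `(start, end)` coordinates are hypothetical.

```python
def straight_columns(alignment):
    """Flag columns where every sequence carries the same base (A, C, G or T)."""
    flags = []
    for column in zip(*alignment):
        bases = {base.upper() for base in column}
        flags.append(len(bases) == 1 and bases <= set("ACGT"))
    return flags


def extend(straight, start, end, x):
    """Extend the anchor [start, end) until x consecutive non-straight columns."""
    left, right = start, end
    misses, i = 0, end
    while i < len(straight) and misses < x:      # extend to the right
        if straight[i]:
            misses, right = 0, i + 1
        else:
            misses += 1
        i += 1
    misses, i = 0, start - 1
    while i >= 0 and misses < x:                 # extend to the left
        if straight[i]:
            misses, left = 0, i
        else:
            misses += 1
        i -= 1
    # Assumption: trim so that the element starts and ends on a straight column.
    while not straight[left]:
        left += 1
    while not straight[right - 1]:
        right -= 1
    return left, right


def find_cnes(alignment, w, d, x):
    """Return merged (start, end) intervals of candidate conserved elements."""
    straight = straight_columns(alignment)
    intervals = set()
    for start in range(len(straight) - w + 1):
        window = straight[start:start + w]
        if 100.0 * sum(window) / w > d:          # more than D percent straight
            intervals.add(extend(straight, start, start + w, x))
    merged = []
    for left, right in sorted(intervals):
        if merged and left <= merged[-1][1]:     # assumption: fuse overlaps
            merged[-1] = (merged[-1][0], max(right, merged[-1][1]))
        else:
            merged.append((left, right))
    return merged


toy_alignment = ["AAAAAAAAAAAA", "CCAAAACACCCC"]
cnes = find_cnes(toy_alignment, w=4, d=70, x=2)
```

With the toy two-sequence alignment, columns 2 to 5 and column 7 are straight. The single mismatch at column 6 is shorter than X = 2, so the element spans columns 2 to 7 and stops before the run of mismatches that follows.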
The current CNE set was generated with three levels of conservation and using the following parameters:
Set | Species | W (window size, columns) | D (% straight columns) | X (consecutive non-straight columns) | Color |
---|---|---|---|---|---|
Set 1 | human + mouse + dog + cow | 20 | 80 | 2 | green |
Set 2 | human + mouse + dog + cow + chicken | 20 | 70 | 2 | red |
Set 3 | human + mouse + dog + cow + chicken + zebrafish | 20 | 60 | 2 | blue |
CNEs are excluded from regions overlapping protein-coding sequences in all of the species considered. By convention, intronic CNEs are displayed on the right-hand side of the gene that contains them, regardless of the transcription orientation of the gene. Intronic CNEs are shown as small vertical bars and intergenic CNEs as circles.