Tuesday, 8 January 2008

bioinformatics - Any tool to align whole genome sequence data to another genome and give exon regions a higher mark?

If you are not trying to assemble the reads but only to align each one to the genome, you can use exonerate. On a Unix/Linux platform, once it is installed, run something like:



exonerate -m genome2genome WGS.fasta genome.fasta > out.txt 


From the exonerate manual:



          genome2genome
This model is similar to the coding2coding model, except introns are
modelled on both sequences. (not working well yet)


What I would recommend, though, is to align against a reference cDNA dataset rather than the whole genome. In that case, use the cdna2genome model, with the cDNA set as the query and the WGS reads as the target:



exonerate -m cdna2genome genome_cdna.fasta WGS.fasta > out.txt 


From the exonerate manual:



          cdna2genome
This combines properties of the est2genome and coding2genome models, to
allow modeling of a whole cDNA where a central coding region can be
flanked by non-coding UTRs. When the CDS start and end is known it may
be specified using the --annotation option (see below) to permit only
the correct coding region to appear in the alignment.
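
If you do know the CDS coordinates, a run with --annotation might look like the sketch below. The annotation file layout (four whitespace-separated fields per line: id, strand, CDS start, CDS length) is how I remember it from the exonerate man page, so check `man exonerate` before relying on it. The file name cds_annotation.txt, the sequence ID my_gene.cdna and the coordinates are made up for illustration; the ID has to match a sequence in your own genome_cdna.fasta.

```shell
#!/bin/sh
# Hypothetical annotation file, one line per cDNA in genome_cdna.fasta.
# Assumed field order: <id> <strand> <cds_start> <cds_length>
# The ID and coordinates here are placeholders, not real data.
printf '%s\n' 'my_gene.cdna + 101 900' > cds_annotation.txt

# Only run the alignment if exonerate is installed and both inputs exist.
if command -v exonerate >/dev/null 2>&1 \
    && [ -f genome_cdna.fasta ] && [ -f WGS.fasta ]; then
    exonerate -m cdna2genome --annotation cds_annotation.txt \
        genome_cdna.fasta WGS.fasta > out.txt
fi
```

Without the annotation file, the same command minus --annotation still works; exonerate then has to work out the coding region itself.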
