Wednesday, 26 August 2015

bioinformatics - Using the IMGT/GENE-DB service to find RSS

I'm trying to get the data for the human and mouse 12 and 23 Recombination
Signal Sequences (RSS), to run a classification algorithm on it. I'm not a
biologist, so I apologise in advance for my misunderstandings and
confusion.

A version of the data is available here, but I thought I would try to
get it from www.imgt.org, if possible. There is also another slightly
different version available for the mouse here.

I'm trying to follow the instructions at IMGT-FAQ to obtain Recombination
Signal Sequences for the mouse.

Here is what I selected on the search page:

Identification:
Species: Mus musculus
GeneType: any
Functionality: functional
MolecularComponent: any
Clone name: <blank>

IMGT group: IGHV
IMGT subgroup: any
IMGT gene: <blank>


I'm not clear what "Locus", "Main locus", and "IMGT group" mean here
exactly. Specifically, what is the difference between "Locus" and
"Main locus"?

I think, but am not sure, that IGHV corresponds to the V genes of the
immunoglobulin heavy locus (IGH@), where "locus" denotes a cluster of
genes. In human, IGH@ is on chromosome 14; the mouse Igh locus, which is
what I searched, is on chromosome 12. Clarifications and corrections
appreciated.

I would have expected the IGH locus to correspond to "IMGT group" entries
like "IGHJ", "IGHV", etc., and the IGK locus to correspond to IMGT group
entries like "IGK", "IGKJ", "IGKV". However, no matter what I select for
"Locus", the available entries for "IMGT group" do not change.
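As I understand IMGT naming, a group name is the locus name followed by a gene-type letter, so IGHV would mean "V genes of the IGH locus". The following is a minimal Python sketch of the correspondence I expected. It assumes that naming convention. The locus list and the helper `split_group` are my own illustration and are not part of any IMGT tool.

```python
from typing import Optional, Tuple

# Assumed convention: IMGT group = locus name + one gene-type letter.
KNOWN_LOCI = ("IGH", "IGK", "IGL", "TRA", "TRB", "TRG", "TRD")
GENE_TYPES = ("V", "D", "J", "C")


def split_group(group: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return (locus, gene_type).

    gene_type is None when the name is a bare locus such as "IGK".
    """
    name = group.strip().upper()
    for locus in KNOWN_LOCI:
        if name.startswith(locus):
            suffix = name[len(locus):]
            if suffix == "":
                return locus, None
            if suffix in GENE_TYPES:
                return locus, suffix
    raise ValueError(f"unrecognised IMGT group name: {group!r}")


# The groups I expected to see under the IGH and IGK loci.
for g in ["IGHV", "IGHJ", "IGK", "IGKJ", "IGKV"]:
    print(g, "->", split_group(g))
```

If the naming works this way, filtering the "IMGT group" list by the chosen "Locus" would be a simple prefix match. That is why I expected the list to change.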

Running the search gives:

Number of resulting genes : 218
Number of resulting alleles : 350

As instructed, I scrolled to the bottom of the results page, selected
"Select all genes", clicked on "Choose label(s) for extraction", and
selected "V-RS".

This returned:

Number of results=98

The first few results were:

>X02459|IGHV1-4*02|Mus musculus_BALB/c|F|V-RS|395..432|38 nt|NR| | | | |38+0=38| | |
cacagtggtgcaaccacatcccgactgtgtcagaaacc

>X02064|IGHV1-54*02|Mus musculus|F|V-RS|295..332|38 nt|NR| | | | |38+0=38| | |
cacagtgttgcaaccacatcctgagtgtgtcagaaatc

>M34978|IGHV1-58*02|Mus musculus_A/J|P|V-RS|554..560|7 nt|NR| | | | |7+0=7|partial in 3'| |
cacagtg
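I need the lengths for a classification algorithm anyway. Below is a minimal Python sketch for tallying them from the FASTA output instead of counting by eye. It splits each header on "|". The field positions (accession, gene*allele, species, functionality, label, position, length) are inferred only from the three headers above, so they are an assumption. `parse_imgt_fasta` is a hypothetical name I made up.

```python
from collections import Counter


def parse_imgt_fasta(text: str):
    """Hypothetical parser: yield (header_fields, sequence) per FASTA record."""
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_lines)
            # Assumed field order, inferred from the headers shown above:
            # 0 accession, 1 gene*allele, 2 species, 3 functionality,
            # 4 label, 5 position, 6 declared length ("38 nt"), ...
            header = [f.strip() for f in line[1:].split("|")]
            seq_lines = []
        else:
            seq_lines.append(line)
    if header is not None:
        yield header, "".join(seq_lines)


example = """
>X02459|IGHV1-4*02|Mus musculus_BALB/c|F|V-RS|395..432|38 nt|NR| | | | |38+0=38| | |
cacagtggtgcaaccacatcccgactgtgtcagaaacc
>X02064|IGHV1-54*02|Mus musculus|F|V-RS|295..332|38 nt|NR| | | | |38+0=38| | |
cacagtgttgcaaccacatcctgagtgtgtcagaaatc
>M34978|IGHV1-58*02|Mus musculus_A/J|P|V-RS|554..560|7 nt|NR| | | | |7+0=7|partial in 3'| |
cacagtg
"""

lengths = Counter()
for fields, seq in parse_imgt_fasta(example):
    allele, functionality, declared = fields[1], fields[3], fields[6]
    lengths[len(seq)] += 1
    print(f"{allele:12} {functionality} declared={declared:6} actual={len(seq)}")
print(dict(lengths))  # {38: 2, 7: 1}
```

Running the same loop over the full 98-record download would give the complete length distribution.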

OK, now I'm confused. An RSS should be 28 or 39 nt long (a 7-nt heptamer,
a 12- or 23-nt spacer, and a 9-nt nonamer), but I counted lengths of 4, 7,
31, 38, and 39. Aren't these results supposed to contain the 12 and 23 RSS?
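The expected lengths follow from the RSS layout: 7 + 12 + 9 = 28 and 7 + 23 + 9 = 39. The following is a minimal Python sketch of the check I'm applying. It rests on three assumptions: each V-RS is given 5' to 3' starting with the heptamer (as the sequences above appear to be); the commonly cited consensus heptamer is CACAGTG; and the consensus nonamer is ACAAAAACC. `describe_v_rss` is a hypothetical helper. It only splits the sequence by position and does not search for the motifs.

```python
# Commonly cited consensus motifs; real RSSs often deviate from them.
HEPTAMER = "CACAGTG"
NONAMER = "ACAAAAACC"


def mismatches(a: str, b: str) -> int:
    """Count position-wise differences between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def describe_v_rss(seq: str) -> str:
    """Hypothetical helper: split a V-RS by position, assuming
    5' heptamer, then spacer, then 3' nonamer."""
    seq = seq.upper()
    min_len = len(HEPTAMER) + len(NONAMER)  # 16 nt, i.e. an empty spacer
    if len(seq) < min_len:
        return f"{len(seq)} nt: shorter than heptamer + nonamer (partial?)"
    heptamer = seq[:len(HEPTAMER)]
    nonamer = seq[-len(NONAMER):]
    spacer = len(seq) - min_len
    kind = {12: "12-RSS", 23: "23-RSS"}.get(spacer, "non-standard spacer")
    return (f"{len(seq)} nt: {kind} (spacer {spacer} nt), "
            f"heptamer mismatches={mismatches(heptamer, HEPTAMER)}, "
            f"nonamer mismatches={mismatches(nonamer, NONAMER)}")


for s in ["cacagtggtgcaaccacatcccgactgtgtcagaaacc", "cacagtg"]:
    print(describe_v_rss(s))
```

Under this assumed layout, the 38-nt entries would have a 22-nt spacer. The 7-nt entry is just a heptamer, which IMGT flags as "partial in 3'".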

So, I must be misunderstanding things here. Possibly many things. Any
explanations and clarifications are appreciated.
