Genetic Sequence Data Scale

mmarti5 · March 10, 2011, 1:11am

As part of the enhanced swine influenza surveillance program, the National Animal Health Laboratory Network may need to message actual base sequences from various Influenza A genes. Everything I read about genetic LOINC codes relates to tests for specific mutations, specific genetic patterns, etc. I’m not even sure what to look for as the scale. Is it nominal, since there is a finite (huge but finite) number of possible base sequences to choose from? Or is it narrative, because in database terms it is just a string of characters (drawn from only 4 characters, but still)?

P.S. I wasn’t even sure which forum area this belonged in.

Mike

pdbanning · March 10, 2011, 6:11pm

Hi Mike, this is a very good question. Since a sample answer is a “huge, but finite” string built from four single-character text abbreviations, we can exclude these scales: QN (numerics and titers only); ORDQN (reserved for micro susceptibilities); and ORD (ranked ordinal, where each possible answer is ranked relative to the others, e.g. positive/negative, 1+, 2+, 3+, 4+). That leaves NOM (non-ordinal nominal values such as yellow, clear, turbid, where no answer ranks itself amongst the others) and NAR (paragraph-style narrative text containing the patient answer).

The genetic sequence of an Influenza A organism fits the analogy “here’s a substance, report what is found.” The same analogy applies to human genetic testing, urine drug screens, and blood bank antibody screens. The NOM (nominal) scale best suits a list of possible answers such as different H1 strains, base sequences, individual drugs detected, or individual antibodies detected, and it is particularly well suited to the PRID (presence or identity) property. The list within one study can be short or long, but it is not constructed in sentences or paragraphs the way a narrative is.

Please write or call if this doesn’t help.

mmarti5 · March 14, 2011, 3:48pm

I know the basic difference between NAR and NOM, but I can’t tell whether any existing LOINC codes have the actual sequence “ACGTCGTAACTGGCTAC…ACGT” as the “answer.” I confused things by pointing out that the vocabulary of this “narrative” is limited to four characters.
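To make the “limited vocabulary” point concrete, here is a minimal Python sketch of a receiving-side check that a result value is a raw base sequence. It is hypothetical and not part of any LOINC or HL7 specification; the function names are made up, and accepting the IUPAC ambiguity codes (N, R, Y, etc.) in addition to A, C, G, T is an assumption about what real sequencer output contains.

```python
import re

# Assumption: results are DNA/cDNA strings. Besides A, C, G, T we optionally accept
# the IUPAC nucleotide ambiguity codes, so the practical alphabet can exceed four letters.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")
STRICT_DNA = set("ACGT")


def normalize_sequence(raw: str) -> str:
    """Uppercase the value and drop all whitespace (e.g. line wrapping in long sequences)."""
    return re.sub(r"\s+", "", raw).upper()


def is_base_sequence(raw: str, allow_ambiguity: bool = True) -> bool:
    """Return True if every character of the normalized value is an allowed base symbol."""
    seq = normalize_sequence(raw)
    if not seq:
        return False
    alphabet = IUPAC_DNA if allow_ambiguity else STRICT_DNA
    return set(seq) <= alphabet
```

For example, `is_base_sequence("ACGTCGTAACTGGCTAC")` is true, while a strain label such as `"H1N1"` fails because it contains digits. Whether the ambiguity codes should be allowed is a policy choice for the messaging guide, not something the check can decide.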

What I really need to know is how to request a code for

Influenza A Hemagglutinin cDNA genetic base sequence

Influenza A Hemagglutinin cDNA:PRID:Isolate:Pt:?:Amplification/Sequencing (I’m assuming PRID is right.)

with the result being a base sequence.

I’m not sure that 49532-5 Influenza virus A hemagglutinin cDNA [Identifier] in Isolate by Amplification/Sequencing is what I’m looking for, because I suspect its “answer” is a coded value for the strain.

How do I know for sure?

Thanks

rmerrick · March 15, 2011, 3:42pm

Mike and Pam,

This is an excellent discussion, and hopefully we can make a decision together.

The Public Health Laboratory Interoperability Project (PHLIP), a collaboration between CDC and APHL covering all use cases related to lab data exchange in public health laboratories, uses the code Mike mentions to identify the flu strain, so he is correct in that respect.

We are also currently working on properly coding pyrosequencing. Again, that is done using marker-specific targets, but we do want to report the nucleotide substitution as well as the resulting amino acid change.

We are also considering the NAR scale, simply because a new combination might come up, so “free text” might need to be allowed. Such a result is NOT coded but stored as a string in the database, even though, as Pam points out, it is not a whole paragraph.

However, for human genomics, 48013-7 Genomic reference sequence [Identifier] in Blood or Tissue by Molecular genetics method uses the NOM scale, and its description also notes: “For this ID use either the NCBI genomic nucleotide RefSeq IDs with their version number (see: NCBI.NLM.NIH.Gov/RefSeq) or use the LRG identifiers, without transcript (t or p) extensions – when they become available. (See- Report sponsored by GEN2PHEN at the European Bioinformatics Institute at Hinxton UK April 24-25, 2008).”
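The requirement that the RefSeq ID carry its version number is easy to check mechanically. The Python sketch below is a hypothetical validator, not an NCBI or LOINC tool: the regular expression is a deliberately simplified assumption (two-letter prefix, underscore, digits, mandatory “.version”), and the prefix set lists only common genomic RefSeq prefixes (NC_, NG_, NT_, NW_), since the description asks for genomic nucleotide IDs.

```python
import re
from typing import Optional, Tuple

# Simplified pattern (an assumption, not NCBI's full accession grammar):
# two uppercase letters, "_", digits, then a required ".version" suffix.
REFSEQ_VERSIONED = re.compile(r"(?P<prefix>[A-Z]{2})_(?P<number>\d+)\.(?P<version>\d+)")

# Common genomic nucleotide prefixes; transcript (NM_) or protein (NP_) IDs are rejected.
GENOMIC_PREFIXES = {"NC", "NG", "NT", "NW"}


def parse_genomic_refseq(value: str) -> Optional[Tuple[str, int]]:
    """Return (accession, version) for a versioned genomic RefSeq ID, else None."""
    m = REFSEQ_VERSIONED.fullmatch(value.strip())
    if m is None or m.group("prefix") not in GENOMIC_PREFIXES:
        return None
    accession = f"{m.group('prefix')}_{m.group('number')}"
    return accession, int(m.group("version"))
```

With a made-up ID, `parse_genomic_refseq("NG_000001.2")` returns `("NG_000001", 2)`, while the same ID without “.2” returns `None`, which is the case the description warns against. LRG identifiers would need a separate pattern.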

Now, this is the reference sequence, but if I understand Mike correctly, he wants to report the mutation, if any is observed.

Again, in human genomics I think 62356-1 Chromosome analysis result in ISCN expression in Blood or Tissue by Molecular genetics method, which uses the NOM scale, would be used for that; the chromosome analysis result is expressed using the International System for Human Cytogenetic Nomenclature (ISCN).

Do we have something similar for microbiology genetic testing?

mmarti5 · March 21, 2011, 2:46pm

Let me throw one more complication into this discussion and hope we’ll get some more input.

What if, instead of returning the sequence of bases, we return the GenBank key for the sequence? That is more clearly a NOM, but what would differentiate it from strain-typing by sequencing? 49532-5 (FLUAV HA cDNA Islt Amp/Seq) is, I think, intended for strain-typing, but its LOINC parts are pretty much what I would have chosen for sequencing reported as a GenBank key.
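The two reporting options look quite different on the wire: a GenBank key is a short identifier, while a raw result is a long base string. The hypothetical Python sketch below tells them apart. The accession pattern is a simplified assumption covering only the classic “1 letter + 5 digits” and “2 letters + 6 digits” forms with an optional “.version”; real GenBank accession formats are broader. The labels it returns are made up and do not correspond to LOINC scale codes.

```python
import re

# Simplified GenBank nucleotide accession (assumption): 1 letter + 5 digits
# or 2 letters + 6 digits, optionally followed by ".version".
GENBANK_ACCESSION = re.compile(r"[A-Z]\d{5}(?:\.\d+)?|[A-Z]{2}\d{6}(?:\.\d+)?")

# Raw base string: A, C, G, T plus IUPAC ambiguity codes (assumption).
BASE_SEQUENCE = re.compile(r"[ACGTRYSWKMBDHVN]+")


def classify_result(value: str) -> str:
    """Label a result value as 'genbank_accession', 'base_sequence', or 'other'."""
    v = value.strip().upper()
    if GENBANK_ACCESSION.fullmatch(v):
        return "genbank_accession"
    if BASE_SEQUENCE.fullmatch(v):
        return "base_sequence"
    return "other"
```

Because every accession form here contains digits and a base string never does, the two checks cannot both match, so their order only matters for readability. A strain label such as `"H1N1"` falls through to `"other"`, which is exactly the strain-typing answer that would need its own coding.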

mmarti5 · April 2, 2012, 8:33pm

A recent Nature article shows why we want to be able to order and result actual sequences for key parts of the influenza virus isolated from various species.

http://www.nature.com/news/flu-surveillance-lacking-1.10301

For an example of how these results would end up being used, see how the GenBank data can be queried through NCBI:

http://www.ncbi.nlm.nih.gov/genomes/FLU/

What makes our use case different from the past is that the sequencing is no longer done only by the NVSL national lab at Ames, but also by the NAHLN labs it “contracts out” to. So we need a way to order and result these “Please sequence that virus” or “Please sequence the HA gene of that virus” tests.

Mike