Answers to #NGSchool2018 registration quiz

Answers and explanations were prepared by G. Demidov on behalf of the #NGSchool team. Any further questions and discussion can be directed here.

 

Which of the following is an example of a missense mutation?

1.    UAC to UAG
2.    AAA to UAA
3.    UGC to UCC
4.    UAA to UGA

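Explanation: A missense mutation is a change in a codon that changes the encoded amino acid. Option 3 fits: UGC (cysteine) becomes UCC (serine). Options 1 and 2 turn a sense codon into a stop codon (nonsense mutations), and option 4 replaces one stop codon with another. You can check this using the table of the genetic code.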
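The check against the genetic code table can also be done programmatically. Below is a minimal sketch, assuming the standard genetic code and RNA codons as written in the question; the function names `translate` and `classify` are made up for this illustration and do not come from any library.

```python
# Standard genetic code, codons ordered by first, second, third base over U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(codon):
    """Return the one-letter amino acid for an RNA codon ('*' means stop)."""
    first, second, third = (BASES.index(base) for base in codon.upper())
    return AMINO_ACIDS[16 * first + 4 * second + third]


def classify(before, after):
    """Classify a single-codon change (hypothetical helper for this quiz)."""
    aa_before, aa_after = translate(before), translate(after)
    if aa_before == aa_after:
        return "silent"      # same amino acid, or stop replaced by stop
    if aa_after == "*":
        return "nonsense"    # sense codon turned into a stop codon
    if aa_before == "*":
        return "stop-loss"   # stop codon turned into a sense codon
    return "missense"        # one amino acid replaced by another


for before, after in [("UAC", "UAG"), ("AAA", "UAA"), ("UGC", "UCC"), ("UAA", "UGA")]:
    print(before, "->", after, ":", classify(before, after))
```

Only the UGC to UCC change is reported as missense; the other three options are either nonsense or leave a stop codon in place.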

 

Is it possible to have two different SNVs at the same locus in the same sample, for example in sequenced tumors? 

1.    Yes
2.    No

Explanation: Different somatic mutations may occur at the same position in different cells and, when we sequence a bulk of cells, we can see that some of our reads support one variant, some support another, and so on. It does not happen often, since two mutations hitting the same position is not a very likely event. However, it does happen: the genome is long and the overall number of cells is huge. It can also be a technical artifact caused by aligning short reads to an imperfect reference in a repetitive region.

 

You want to determine DNA 5mC (methylation status of cytosines) in a blood sample of a well described organism (say, mouse). You perform bisulfite sequencing of your sample. How many times do you have to align each resulting read to a reference genome(s) using three-letter alignment?

1.    2
2.    4
3.    8
4.    256

Explanation: The mapping scheme is described, e.g., in Fig. 1A here. Each read is converted in 2 different ways and aligned to 2 converted reference genomes, using 4 instances of a short-read aligner. The exact way reads are converted differs from aligner to aligner (the aligners, as specified in the question, are three-letter and not wild-card), but overall 4 alignments of each read are still required.
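The counting argument can be sketched in a few lines. This is only an illustration of the "2 read conversions times 2 reference conversions" bookkeeping, assuming the usual C-to-T and G-to-A in-silico conversions; the names `convert` and `alignment_jobs` are hypothetical, no real aligner is called, and actual tools differ in the details.

```python
from itertools import product

# The two in-silico three-letter conversions (assumed here: C->T and G->A).
CONVERSIONS = {"C_to_T": ("C", "T"), "G_to_A": ("G", "A")}


def convert(seq, source, target):
    """Replace every `source` base with `target`, producing a three-letter sequence."""
    return seq.upper().replace(source, target)


def alignment_jobs(read):
    """List every (read conversion, converted read, reference conversion) combination."""
    jobs = []
    for read_conv, ref_conv in product(CONVERSIONS, CONVERSIONS):
        source, target = CONVERSIONS[read_conv]
        jobs.append((read_conv, convert(read, source, target), ref_conv))
    return jobs


jobs = alignment_jobs("ACGTCG")  # toy read, not real data
for read_conv, converted_read, ref_conv in jobs:
    print(f"read {read_conv} ({converted_read}) vs reference {ref_conv}")
print("alignments per read:", len(jobs))
```

Each of the two converted versions of the read is paired with each of the two converted references, which gives the 4 alignments per read.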

 

You are analyzing cell-free DNA with ultra-deep sequencing and find that 3 out of 1000 reads covering a particular region of TP53 carry an SNV. What is it likely NOT to be? Mark the wrong answer:

1.    Germline heterozygous variant
2.    Somatic benign variant
3.    Somatic tumorigenic variant
4.    Sequencing error

Explanation: A germline heterozygous variant is present in half of the copies of chromosomal DNA on autosomes, and TP53 is located on chromosome 17, which is an autosome. So you would expect about half of the reads to support such a variant. The probability of only 3 out of 1000 reads supporting a germline heterozygous variant can be estimated with a binomial test where the number of trials is 1000, the probability of success is (roughly) 0.5 and the number of successes is 3. In other words, you tossed a fair coin 1000 times and got heads only 3 times. That is far less likely than any of the other options.
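To get a feeling for how unlikely this is, the binomial tail can be computed directly. This is a minimal sketch using only the Python standard library; `binom_cdf` is a helper written for this example, and the success probability of 0.5 is the rough assumption from the explanation above.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# 1000 reads, each supporting the variant with probability 0.5 (germline heterozygous),
# and we ask how likely it is to see 3 or fewer supporting reads.
probability = binom_cdf(3, 1000, 0.5)
print(f"P(at most 3 of 1000 reads support the variant) = {probability:.3e}")
```

The result is on the order of 1e-293, so a germline heterozygous variant is effectively ruled out by the read counts alone.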

 

Low accuracy of Nanopore sequencing is caused by:

1.    Complicated sample preparation that can introduce changes to the input material
2.    DNA/RNA modifications
3.    Denaturation of the protein pores
4.    The complicated relationship between DNA/RNA sequence and ion current signal.
5.    Global warming conspiracy 

Explanation: All these factors play a role in the accuracy of Nanopore sequencing (and voodoo magic too…). However, the most important one is option 4. You can find a detailed explanation (with pictures) here. The article is quite old and a lot has changed since, but the signal from Nanopore is still very noisy.

 

Among the sequencing platforms listed below, for which one do homopolymers cause the most problems?

1.    10x Genomics linked reads
2.    Nanopore
3.    PacBio
4.    Sanger

Explanation: To come up with this question, we asked several sequencing experts who have worked with different platforms. You can read a brief comparison of the platforms (which mentions the problem of non-random errors in homopolymers) here.

 

A young person comes to you, a medical geneticist, and asks which analysis he/she should undergo to find out whether he/she already has cancer. The patient has no symptoms and no idea which type of cancer he/she might have, but seems extremely scared by the prospect of having cancer. What will you do?

1.    Refuse to do any analysis given that his/her medical history does not contain anything associated with elevated cancer risks
2.    Send the patient for a cheap array-based analysis to assess the general risks 
3.    Sequence panel of cancer drivers since it is cheap and helps to find variants associated with elevated risk of cancer in most of the cases
4.    Perform whole genome NGS analysis of blood sample to detect not only coding, but also non-coding cancer risk mutations
5.    Prescribe homeopathy to the patient and his family to reduce anxiety level
6.    Analyze tumor marker levels in his/her blood or urine sample

Explanation: Options 2, 3 and 4 are not really useful because the person does not want to know whether (s)he has a predisposition to cancer; (s)he wants to find out whether (s)he has cancer NOW. Since (s)he has no idea which cancer it might be, it is not really possible to take a biopsy and investigate somatic mutations. Tumor marker analysis is also a low-accuracy tool when there is no specific cancer to test for. In general, the only right solution here is to send the person home (prescribing homeopathy is funny and technically could help to reduce the anxiety, but we take a principled position against pseudoscience, so this option was ideologically wrong). If you want to learn more, you can read, e.g., this web page: “Although tumor markers are extremely useful in determining whether a tumor is responding to treatment or assessing whether it has recurred, no tumor marker identified to date is sufficiently sensitive or specific to be used on its own to screen for cancer”.

 

You want to confirm the diagnosis for a patient with a preliminary diagnosis of “cystic fibrosis” (CF). Your budget is limited to $1,000 by the insurance company that finances the analysis, so you may choose only one of the following analyses. Which experiment will you perform?

1.    Nanopore low-coverage WGS because having information about the whole genome is always better and the preliminary diagnosis may not be accurate
2.    Illumina 100x sequencing of the CFTR gene since most of the cystic fibrosis cases were associated with single nucleotide variants (SNVs) in this gene
3.    Ultra high-density microarray (such as Affymetrix Whole Genome SNP array 6.0) since this disease may be caused by copy number variations (CNVs)
4.    Reduced representation bisulfite sequencing (RRBS), since methylation status is important in many diseases and RRBS is focused only on important regions

Explanation: CFTR is the gene responsible for cystic fibrosis. The disease is mostly caused by point mutations and short indels, and in less than 10% of cases by exon-level CNVs, so both types of variation have to be investigated. Any option except 2 will not allow you to do that: Nanopore has low accuracy, especially at low coverage; microarrays will not let you investigate SNVs; and RRBS is focused on CpG-rich regions, so the CFTR gene is not fully covered by it, and variant calling from bisulfite sequencing is not really accurate anyway.

 

You have a Nanopore sequencer and want to produce reads as long as possible. What will prevent you from achieving the goal? (Self-hybridization of DNA can be neglected)

1.    Presence of repeats in your DNA
2.    The fact that the quality of base calling degrades with read length
3.    Too narrow pipette tips 
4.    Presence of short homopolymers (2-10 bps) in DNA you sequence

Explanation: If your pipette tip is not wide enough, you will break your DNA into small pieces even before you start sequencing (e.g., here: “Whenever possible, DNA was handled with a wide-bore, low-bind pipette tip.”). The other aspects may play some role, but they are unavoidable and do not matter if you did not take option 3 into account from the beginning.