I have pasted the header from my CGI whole-genome data below, followed by two lines of data as an example. I am trying to evaluate the variant call quality scores in my data, so I have been reviewing three columns from the header: "varScoreVAF", "varScoreEAF", and "varQuality".
For "varQuality", I have rarely, if ever, found a value other than "VQHIGH", so that column does not seem particularly useful for evaluating the quality of a variant call. As the examples below show, "varScoreVAF" and "varScoreEAF" are often identical, but just as often they differ. The two examples also show how wide the range of values is: one SNP has scores of 91 and 13, and the other has scores of 1883 and 1883. My intuitive guess, as someone untrained, is that higher scores indicate greater confidence in the variant call, but I have no idea whether that is correct. If it is, what numerical threshold would be reasonable for deciding whether a given variant call is reliable?
Any answers or ideas greatly appreciated. Thanks!!
#COSMIC COSMIC v48
#DBSNP_BUILD dbSNP build 132
#GENOME_REFERENCE NCBI build 37
#GENERATED_AT 2013-Nov-07 19:21:12.293299
locus ploidy allele chromosome begin end varType reference alleleSeq varScoreVAF varScoreEAF varQuality hapLink xRef
5022567 2 2 chr5 226452 226453 snp G A 1883 1883 VQHIGH dbsnp.116:rs6555059
5023182 2 2 chr5 254860 254861 snp C G 91 13 VQHIGH dbsnp.98:rs2241600
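To make the question concrete, here is a minimal Python sketch of the kind of filter a threshold would imply. It rests on assumptions that may well be wrong: that higher scores mean more confidence (the very thing being asked), that the fields are whitespace-separated with non-empty "reference" and "alleleSeq" (true for these two SNP rows, possibly not for other variant types), and that a call should pass only if the lower of its two scores meets the cutoff. The names `parse_row`, `passes`, and `MIN_SCORE`, and the value 40, are placeholders for illustration, not recommended settings.

```python
from typing import NamedTuple


class VarCall(NamedTuple):
    chromosome: str
    begin: int
    end: int
    var_type: str
    score_vaf: int
    score_eaf: int
    quality: str


def parse_row(line: str) -> VarCall:
    # Columns follow the header order: locus, ploidy, allele, chromosome,
    # begin, end, varType, reference, alleleSeq, varScoreVAF, varScoreEAF,
    # varQuality, ... (assumes no empty field before varQuality).
    f = line.split()
    return VarCall(f[3], int(f[4]), int(f[5]), f[6],
                   int(f[9]), int(f[10]), f[11])


def passes(call: VarCall, min_score: int) -> bool:
    # Assumption: higher is better, and the weaker of the two scores decides.
    return min(call.score_vaf, call.score_eaf) >= min_score


rows = [
    "5022567 2 2 chr5 226452 226453 snp G A 1883 1883 VQHIGH dbsnp.116:rs6555059",
    "5023182 2 2 chr5 254860 254861 snp C G 91 13 VQHIGH dbsnp.98:rs2241600",
]
calls = [parse_row(r) for r in rows]

MIN_SCORE = 40  # hypothetical cutoff, chosen only to show the mechanics

for c in calls:
    print(c.chromosome, c.begin, c.score_vaf, c.score_eaf, passes(c, MIN_SCORE))
```

With this placeholder cutoff the first SNP would be kept and the second dropped, even though both are labelled "VQHIGH"; whether that is the right behaviour depends on what the two scores actually measure.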