Gene-gene interaction analysis method for rare variants from NGS high-throughput sequencing data
With the rapid advancement of array-based genotyping techniques, genome-wide association studies (GWAS) have successfully identified common genetic variants associated with common complex diseases. However, it has been shown that the genetic factors identified by GWAS explain only a small proportion of the genetic etiology of complex diseases. Part of this missing heritability may be explained by gene-gene interactions (epistasis) and by rare variants. Gene-gene interaction analysis for common variants has grown exponentially, both in methodological development and in practical application, and recent advances in high-throughput sequencing technologies have made rare variant analysis possible. However, little progress has been made in gene-gene interaction analysis for rare variants.
GxGrare is a new gene-gene interaction method for rare variants, built on the multifactor dimensionality reduction (MDR) framework. GxGrare consists of three steps: 1) collapsing the rare variants, 2) performing MDR analysis on the collapsed rare variants, and 3) detecting the top candidate interaction pairs. The first step collapses the rare variants according to biological characteristics such as allele frequency or functional region (MAF-based, functional region-based, or effect-based collapsing); it uses known biological information to recode the given genotypes into a more biologically meaningful categorical variable. For example, a gene with no exonic rare variants can be given a value close to 0, and 1 otherwise, since non-exonic variants have weak or no effect on gene function. The second step performs MDR analysis on the collapsed rare variants. The last step uses several evaluation measures, such as information gain (IG) and balanced accuracy (BA), to detect the top candidate interaction pairs. GxGrare can detect not only gene-gene interactions but also interactions within a single gene.
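The three steps can be illustrated with a minimal Python sketch. It is not GxGrare's implementation: the function names (`collapse`, `mdr_labels`, `balanced_accuracy`, `information_gain`), the weighted-burden collapsing rule with a threshold, and the definition of IG as the mutual information between the MDR high/low-risk label and the phenotype are all hypothetical choices made for illustration. Phenotypes follow the coding of this README's input file format (0: case, 1: control), and the toy data are invented.

```python
import math
from collections import Counter

# Phenotype coding as stated in this README's input format (0: case, 1: control).
CASE, CONTROL = 0, 1


def collapse(genotypes, weights, threshold=0.0):
    """Hypothetical weighted collapsing of one gene for one individual.

    genotypes: per-variant genotype codes (0/1/2) of the gene's rare variants.
    weights: per-variant scores in [0, 1] (e.g. MAF-, function- or effect-based).
    Returns 1 if the weighted burden exceeds `threshold`, otherwise 0.
    """
    burden = sum(w * g for w, g in zip(weights, genotypes))
    return 1 if burden > threshold else 0


def mdr_labels(x1, x2, y):
    """MDR step: a cell (x1, x2) is high-risk (1) when its case/control ratio
    exceeds the overall case/control ratio; each individual gets its cell's label."""
    cases = Counter((a, b) for a, b, p in zip(x1, x2, y) if p == CASE)
    controls = Counter((a, b) for a, b, p in zip(x1, x2, y) if p == CONTROL)
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    # Cross-multiplied ratio test, so cells with zero controls need no special case.
    high = {cell for cell in set(cases) | set(controls)
            if cases[cell] * n_ctrl > controls[cell] * n_case}
    return [1 if (a, b) in high else 0 for a, b in zip(x1, x2)]


def balanced_accuracy(pred, y):
    """BA = (sensitivity + specificity) / 2, predicting 'case' for high-risk labels."""
    n_case = sum(1 for t in y if t == CASE)
    n_ctrl = len(y) - n_case
    tp = sum(1 for p, t in zip(pred, y) if p == 1 and t == CASE)
    tn = sum(1 for p, t in zip(pred, y) if p == 0 and t == CONTROL)
    return 0.5 * (tp / n_case + tn / n_ctrl)


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def information_gain(x, y):
    """Assumed IG: H(Y) - H(Y | X), i.e. the mutual information of X and Y in bits."""
    h_cond = 0.0
    for value in set(x):
        subset = [t for v, t in zip(x, y) if v == value]
        h_cond += len(subset) / len(y) * entropy(subset)
    return entropy(y) - h_cond


# Toy data (hypothetical): 8 individuals, two genes with two rare variants each.
weights = [0.9, 0.6]
gene_a = [[0, 0], [0, 0], [1, 0], [0, 1], [0, 0], [0, 0], [2, 0], [0, 1]]
gene_b = [[0, 0], [1, 0], [0, 0], [0, 1], [0, 0], [0, 1], [0, 0], [1, 1]]
y = [CONTROL, CASE, CASE, CONTROL, CONTROL, CASE, CASE, CONTROL]

x1 = [collapse(g, weights) for g in gene_a]  # step 1: collapse gene A
x2 = [collapse(g, weights) for g in gene_b]  # step 1: collapse gene B
labels = mdr_labels(x1, x2, y)               # step 2: MDR high/low-risk labels
# step 3: evaluation measures for the (gene A, gene B) pair
print(labels, balanced_accuracy(labels, y), information_gain(labels, y))
# prints: [0, 1, 1, 0, 0, 1, 1, 0] 1.0 1.0
```

In this toy example each collapsed gene on its own carries no information about the phenotype (`information_gain(x1, y)` and `information_gain(x2, y)` are both 0), yet the MDR label built from the pair separates cases from controls perfectly. This is the kind of pure interaction that an MDR-based scan is meant to detect.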
Simulation | Weight | Effect model | Conditions | Data |
---|---|---|---|---|
simulation 1 | No weight | only interaction effect | unidirectional | sim1.tar.gz |
simulation 2 | No weight | Interaction + marginal effect | unidirectional | sim2.tar.gz |
simulation 3 | No weight | only interaction effect | bidirectional | sim3.tar.gz |
simulation 4 | MAF weight | only interaction effect | unidirectional | sim4.tar.gz |
simulation 5 | Conservation weight (0.5) | only interaction effect | unidirectional | sim5.tar.gz |
simulation 6 | Conservation weight (0.8) | only interaction effect | unidirectional | sim6.tar.gz |
simulation 7 | Conservation weight (1.0) | only interaction effect | unidirectional | sim7.tar.gz |
simulation 8 | Conservation weight (0.5) | Interaction + marginal effect | unidirectional | sim8.tar.gz |
simulation 9 | Conservation weight (7.5) | Interaction + marginal effect | unidirectional | sim9.tar.gz |
simulation 10 | Conservation weight (1.0) | Interaction + marginal effect | unidirectional | sim10.tar.gz |
Usage: Run GxGrare

    gxgrare --in [input file] --out [output file] --score [score file] --perm [permutation number]

Example:

    ./gxgrare --in example_genotype.txt --score example_score.txt --out example_result.csv --perm 1000

Parameters:

- --in : input genotype file (path + name)
- --score : score file (path + name)
- --out : result file (path + name)
- --perm : number of permutations

Input genotype file format (tab-delimited): the first column holds the phenotype class (0: case, 1: control), and each remaining column holds the genotypes of one SNP.

[example_genotype.txt]

    pheno SNP1 SNP2 SNP3 SNP4
    1 1 0 0 2
    1 0 2 0 0
    1 0 1 0 0
    0 0 0 1 0
    0 1 0 0 0

Score file format: the score file holds the effect score (0.0 to 1.0) of each SNP, one per line after a "score" header.

[example_score.txt]

    score
    0.2
    0.68
    0.23
    0.124

Output result format (comma-delimited):

[example_result.csv]

- MDRcol_MAF(IG) : permuted p-value of information gain (IG) using MAF-based collapsing
- MDRcol_MAF(BA) : permuted p-value of balanced accuracy (BA) using MAF-based collapsing
- MDRcol_func(IG) : permuted p-value of IG using functional region-based collapsing
- MDRcol_func(BA) : permuted p-value of BA using functional region-based collapsing
- MDRcol_effect(IG) : permuted p-value of IG using effect-based collapsing
- MDRcol_effect(BA) : permuted p-value of BA using effect-based collapsing
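The input formats and the permuted p-values in the output can be illustrated with a short Python sketch. It parses the two example files as laid out in this README and computes a permutation p-value by shuffling the phenotype column. The helper names (`read_genotypes`, `read_scores`, `permutation_p_value`), the toy statistic, and the (count + 1) / (n_perm + 1) convention are assumptions for illustration only; the README does not specify how GxGrare itself computes its permuted p-values.

```python
import io
import random

# The two example files from this README, rebuilt as tab-/newline-delimited text.
GENOTYPE_TXT = "\n".join("\t".join(row.split()) for row in [
    "pheno SNP1 SNP2 SNP3 SNP4",
    "1 1 0 0 2", "1 0 2 0 0", "1 0 1 0 0", "0 0 0 1 0", "0 1 0 0 0",
])
SCORE_TXT = "score\n0.2\n0.68\n0.23\n0.124\n"


def read_genotypes(handle):
    """Parse a genotype file: header 'pheno SNP1 ...', then one row per individual
    (phenotype class first, then one genotype code per SNP)."""
    snps = handle.readline().split()[1:]
    pheno, geno = [], []
    for line in handle:
        if line.strip():
            fields = [int(v) for v in line.split()]
            pheno.append(fields[0])
            geno.append(fields[1:])
    return snps, pheno, geno


def read_scores(handle):
    """Parse a score file: a 'score' header, then one value in [0.0, 1.0] per SNP."""
    handle.readline()  # skip the header line
    return [float(line) for line in handle if line.strip()]


def permutation_p_value(stat, pheno, n_perm, seed=0):
    """Fraction of phenotype shufflings whose statistic is >= the observed value,
    with the common (count + 1) / (n_perm + 1) correction (an assumption here)."""
    rng = random.Random(seed)
    observed = stat(pheno)
    shuffled = list(pheno)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


snps, pheno, geno = read_genotypes(io.StringIO(GENOTYPE_TXT))
scores = read_scores(io.StringIO(SCORE_TXT))
assert len(scores) == len(snps)  # one score per SNP


def case_carriers_snp1(labels):
    """Hypothetical toy statistic: number of cases (phenotype 0) with a nonzero
    genotype code at SNP1."""
    return sum(1 for p, g in zip(labels, geno) if p == 0 and g[0] > 0)


print(snps, pheno, scores)
# prints: ['SNP1', 'SNP2', 'SNP3', 'SNP4'] [1, 1, 1, 0, 0] [0.2, 0.68, 0.23, 0.124]
print(permutation_p_value(case_carriers_snp1, pheno, n_perm=1000))
```

Fixing the random seed makes the permuted p-value reproducible. Under the (count + 1) / (n_perm + 1) convention, a run with `--perm 1000` cannot report a p-value smaller than 1/1001.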