# Abstract

To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, usually methods as the Cockerham’s model are applied. Within this framework, interactions are understood as the part of the joined effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (as disease) is caused by Boolean combinations of genotypes of several QTLs, this Cockerham’s approach is often not capable to identify them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though with the logic regression approach a larger number of models has to be considered (requiring more stringent multiple testing correction) the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to a Cockerham’s approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with simulation study and real data analysis.

This work has been supported by the Research Training School on Statistical Modelling of the German Research Foundation (DFG). Katja Ickstadt has also been supported by the DFG within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis,” project C4. Holger Schwender has been supported by the Deutsche Forschungsgemeinschaft for grant SCHW 1508/3-1. Magdalena Malina was supported by the Budget of Ministry of Science and Higher Education, Republic of Poland as a research project No. N N201 412539, and by the Vienna Science and Technology Fund (WWTF) through project MA09-007a.

## Appendix A

### Proof, that the non-centrality parameter *λ*_{CF} is equal to the non-centrality parameter *λ*_{LR}

_{CF}

_{LR}

**Notation:** Consider a model

where ** X**=

*X*

_{n}_{×}

*is the design matrix of rank*

_{t}*t*and

**=(**

*ζ**ζ*

_{1},

*ζ*

_{2}, …,

*ζ*is a vector of model’s parameters. Let

_{t})^{T}**=**

*μ***,**

*Xζ**V*denote a subspace of

**matrix and**

*X**W*⊂

*V*denote a

*k*-dimensional subspace of

*k*<

*t*. By

*P*we mean an orthogonal projection of

_{V}y*y*on

*V*, and by

*P*the projection matrix. Let

_{V}*V*|

*W*=

*V*⌒

*W*

^{⊥}={

*v*∈

*V:v*⊥

*W*}. Consider testing the hypothesis

*H*

_{0}:

*μ*∈

*W*against

*H*

_{1}:

*μ*∉

*W*in the model (A.1). Then the test statistic

*F*for testing

*H*

_{0}can be expressed as

where

is the maximum likelihood estimate of*ζ*and is an unbiased estimator of σ

^{2}of the form

calculated in the model (A.1). The non-centrality parameter can be expressed as (see Theorem 7.4, Arnold, 1981)

The desired equality of noncentrality parameters can be proved in a more general case. Assume that the relationship between the dependent variable *Y*∈*R** ^{n}* and explanatory variables

*x*_{1},

*x*_{2}, …,

*x**, where*

_{t}

*x**=(*

_{j}*x*

_{1}

_{j}, x_{2}

*, …,*

_{j}*x*=1, 2, …,

_{nj})^{T}, j*t*, can be described by the following model

where *ε*=(*ε*_{1},…, *ε _{n})^{T}, ε*~

*N*(

**0**, σ

^{2}

**I**

_{n}_{×}

*),*

_{n}

*X**is a full rank*

_{c}*n*×(

*t*+1) -matrix of the form

*X**=(*

_{c}**1**,

*x*_{1},

*x*_{2}, …,

*x**) and*

_{t}

*ζ**is a (*

_{c}*t*+1) – dimensional vector of model’s parameters. Let

*R**spanned by the columns of*

^{n}

*X**.*

_{c}Now let us consider an *n*×2- matrix *X*_{n}_{×}_{2}=(**1**, *X*_{1}), *X*_{1}=(*X*_{11}, *X*_{21}, …, *X _{n}*

_{1})

*, such that*

^{T}**=**

*μ***∈**

*X*_{c}ζ_{c}*lin*{

**1**,

*X*

_{1}}, so as the model (3) can be also written in the form

where *ζ*=(*β*_{0}, *β*_{1})* ^{T}*. Let

*V*denote a subspace of

*R**spanned by the columns of*

^{n}**. Consider the testing problem**

*X*where *W* is a linear subspace of *R** ^{n}* spanned by

**1**. Then the non-centrality parameters of

*F*test for testing (A.5) in (A.3) and in (A.4) are equal.

Indeed, let *λ _{c}* denote the non-centrality parameter for testing problem (A.5) in the model (A.3) and let

*λ*denote the non-centrality parameter in testing problem (A.5) in the model (A.4). Then

and

Since

and*W*⊂

*V*, then by projection properties

and

Moreover, by the definition of *μ* we have that in (A.3)

and hence

Similarly in (A.4)*μ*=** Xζ**∈

*lin*[

**1**,

*X*

_{1}]=

*V*,

and hence *P _{V}μ*=

*μ*. Then clearly

*λ*=

*λ*.

_{c}## Appendix B

### Derivation of the formula for logic regression model under Cockerham’s coding

We start from the model (1), where *n* is the number of individuals selected from an intercross *F*2 population and *D _{j}* denotes the binary vector of the “dominant” values for all individuals at

*jth*marker. To rewrite the model in a Cockerham’s coding, let

*v*

_{1}=(1, 1, 1)

^{T}, v_{2}=(0, 1, 1)

^{T}, v_{3}=(0, 0, 1)

*and*

^{T}**v**=(

*v*

_{1},

*v*

_{2},

*v*

_{3}). Similarly, let

*w*

_{1}=(1, 1, 1)

^{T}, w_{2}=(–1, 0, 1)

^{T}, w_{3}=(–1/2, 1/2, –1/2)

*and*

^{T}**w**=(

*w*

_{1},

*w*

_{2},

*w*

_{3}). We may then define

*f*(**w**)=(*f*_{1}(**w**), *f*_{2}(**w**), *f*_{3}(**w**))=**v**,

where *f*_{1}(**w**)=*w*_{1},

Then ∀*j*∈{1, 2, …, *m*}

Then the considered model (1) can be rewritten as:

which is exactly equal to (5).

### Power to detect an individual interaction in the Cockerham’s model

Let

denote the design matrix for the true GLM model (5), i.e.,and let

be the corresponding vector of parametersAssume that we are performing an F-test for testing *H*_{0}:*η*(*k, l*)=0 against *H*_{1}:*η*(*k, l*)≠0 within a class of models

*k, l*∈{1, 2, …,

*m*} and

*k*<

*l*. Let be the design matrix for the considered model, and denote the corresponding hat-operator. Let be the all-one matrix, and be the identity (

*n*×

*n*)-matrix. By the Cochran’s theorem the distribution of the F-statistic can be calculated as (

*n*–2) times the ratio of two independent noncentral chi-square distributions given by

and

For each of the remaining three interaction terms *I _{k, l}*∈{

*U*} the corresponding

_{k}Z_{t}, Z_{k}U_{l}, Z_{k}Z_{l}*F*test for testing

*H*

_{0}:

*η*

_{(}

_{k, l}_{)}=0 against

*H*

_{1}:

*η*

_{(}

_{k, l}_{)}≠0 is performed within the class of models

respectively, where *k, l*∈{1, 2, …, *m*}, *k*<*l*. Then, in the above procedure the design matrix

Similarly as for *U _{k}U_{l}*, for each

*I*∈{

_{k, l}*U*} the power of the coresponding F-test was calculated empirically, based on 10,000 realizations of the ratios of two independent noncentral χ

_{k}Z_{l}, Z_{k}U_{l}, Z_{k}Z_{l}^{2}random variables. This power for each

*I*was compared with the power obtained in (4). It can be clearly seen that the interaction

_{k, l}*U*analyzed in Section (3.2), as compared with the other three possible interactions, is detected with the largest power for each value of

_{k}U_{l}*β*

_{1}(Figure 3). All the four powers as functions of

*β*

_{1}are substantially smaller than the power to detect a single interaction within the properly defined logic regression model. Obviously, the average power to detect a single interaction within the Cockerham’s genetic model, calculated as a mean of the four powers for each

*β*

_{1}is also substantially smaller than the power to detect a single interaction within the logic regression model (Figure 4).

### Figure 3

### Figure 4

## Appendix C: Two step testing procedures in the Cockerham’s model

### Classical approach to test for epistasis

Classical methods of localizing genes are aimed at locating genes with significant main effects and often do not allow the location of QTL if there are no main effects but there are epistatic interactions with other QTL. The common practice is to estimate epistatic effects only at those positions for which main effects have been found. Such an approach can be described by the two step testing procedure.

Assume that the model (1) holds. In the first step of the procedure one would test *m* hypotheses

*H*_{0,}* _{j}:α_{j}*=

*δ*=0

_{j}against

*H*_{1,}* _{j}:α_{j} or δ_{j} or both are different from* 0

within the class of models

*M _{j}:Y*=

*μ*+

_{j}*α*+

_{j}U_{j}*δ*+

_{j}Z_{j}*ε*

^{(}

^{j}^{)},

where *j*∈{1, 2, …*m*}. Adjusted significance level for a single test in the first step is then equal to

Let

denote the set of indices of genes for which the hypotheses*H*

_{0,}

*were rejected. In the second step of the procedure, for each pair of genes with indeces*

_{j}*k*and

*l*, respectively, such that and

*k*<

*l*, one tests

against

*H*_{1, (}*k, l*_{)}: not all *γ*_{(k, l)}’s are equal to 0

within the class

In order to control the FWER at the fixed level *α*, in the second step one needs to apply a significance level of

For the first step of the procedure one tests within the misspecified model, hence the test statistic for *H*_{0, }* _{j}* has the distribution of (

*n*–2) times the ratio of two independent noncentral

*χ*

^{2}distributions. The power

*P*of the first step test is calculated empirically based on 10,000 independent realizations of noncentral

_{I}*χ*

^{2}distributions defined by (B.1) and (B.2) respectively (Appendix B).

For the second step, the *F* statistic for testing *H*_{0, (}_{j, k}_{)} has the

For both steps of the procedure resulting powers as functions of *β*_{1} are compared with the power obtained by logic regression for *m*=200, *n*=1000, *α*=0.05 and *σ*=1.0 (Figure 5). The power for the first step exceeds very slightly the power for logic regression if the effect size *β*_{1} is very small, and is lower than the power for logic regression for larger values of *β*_{1}. The power for the second step of the procedure is substantially smaller than the power for logic regression at the whole range of *β*_{1}. Hence, the resulting power for the two step procedure will also be substantially smaller than the power to detect an interaction within the logic regression model. The relationship between these three powers remains the same for different values of *m* and *n* (Table 10).

### Figure 5

### Table 10

β_{1}=0.5 | β_{1}=1.0 | β_{1}=1.5 | |||
---|---|---|---|---|---|

m=50 | n=500 | P_{LR} | 0.756 | 1.000 | 1.000 |

P_{I} | 0.012 | 0.333 | 0.893 | ||

P_{II} | 0.000 | 0.007 | 0.110 | ||

m=100 | n=200 | P_{LR} | 0.051 | 0.960 | 0.999 |

P_{I} | 0.012 | 0.333 | 0.893 | ||

P_{II} | 0.000 | 0.007 | 0.110 | ||

n=500 | P_{LR} | 0.658 | 1.000 | 1.000 | |

P_{I} | 0.410 | 0.999 | 1.000 | ||

P_{II} | 0.002 | 0.166 | 0.848 | ||

n=1000 | P_{LR} | 0.996 | 1.000 | 1.000 | |

P_{I} | 0.843 | 1.000 | 1.000 | ||

P_{II} | 0.017 | 0.757 | 0.999 | ||

n=2000 | P_{LR} | 1.000 | 1.000 | 1.000 | |

P_{I} | 0.996 | 1.000 | 1.000 | ||

P_{II} | 0.159 | 0.999 | 1.000 | ||

m=200 | n=500 | P_{LR} | 0.555 | 1.000 | 1.000 |

P_{I} | 0.843 | 1.000 | 1.000 | ||

P_{II} | 0.017 | 0.757 | 0.999 |

*P _{I}* denotes the power of the first step of the procedure from Appendix C, calculated empirically based on 10,000 independent realizations of noncentral

*χ*

^{2}distributions defined by (B.1) and (B.2) respectively.

*P*denotes the power of the second step, calculated as in (C.1).

_{II}### References

Arnold, S. F. (1981): The theory of linear models and multivariate analysis, John Wiley & Sons: New York, pp. 79–82.Search in Google Scholar

Baierl, A., M. Bogdan, F. Frommlet and A. Futschik (2006): “On locating multiple interacting quantitative trait loci in intercross designs,” Genetics, 173, 1693–1703.Search in Google Scholar

Ball, R. D. (2001): “Bayesian methods for quantitative trait loci mapping based on model selection: Approximate analysis using the Bayesian information criterion,” Genetics, 159, 1351–1364.Search in Google Scholar

Bateson, W. and G. Mendel (1909): Mendel’s principles of heredity, Cambridge University Press: New York, G.P. Putnam’s Sons.Search in Google Scholar

Bogdan, M., F. Frommlet, P. Biecek, R. Cheng, J. K. Ghosh and R. Doerge (2008): “Extending the modified Bayesian information criterion (mbic) to dense markers and multiple interval mapping,” Biometrics, 64, 1162–1169, URL Search in Google Scholar

Boulesteix, A. L., A. L. Strobl, S. Weidinger and W. Wichmann, H. E. (2007): “Multiple testing for snp-snp interactions,” Statist. Appl. Gen. Mol. Biol., 6, 1544–6115.Search in Google Scholar

Breiman, L. (1996): “Bagging predictors,” Mach. Learn., 26, 123–140.Search in Google Scholar

Breiman, L. (2001): “Random forests,” Mach. Learn., 45, 5–32.Search in Google Scholar

Breiman, L., J. H. Friedman, R. A. Olshen and C. J. Stone (1984): Classification and regression trees, Belmont, CA: Wadsworth.Search in Google Scholar

Broman, K. W. and S. C. G. Wu, H. Sen (2003): “R/qtl: Qtl mapping in experimental crosses,” Bioinformatics, 19, 889–890.Search in Google Scholar

Carlborg, O. and C. S. Haley (2004): “Epistasis: too often neglected in complex trait studies?” Nat. Rev. Genet., 5, 618–625.Search in Google Scholar

Chen, C., H. Schwender, J. Keith, R. Nunkesser, K. Mengersen and P. Macrossan (2011): “Methods for identifying snp interactions: a review on variations of logic regression, random forest and bayesian logistic regression,” Comput. Biol. and Bioinf., 8, 1580–1591.Search in Google Scholar

Chen, Z. and J. Liu (2009): “Mixture generalized linear models for multiple interval mapping of quantitative trait loci in experimental crosses,” Biometrics, 65, 470–477.Search in Google Scholar

Clayton, D. G. (2009): “Prediction and interaction in complex disease genetics: Experience in type 1 diabetes,” PLoS Genet, 5, e1000540, URL .Search in Google Scholar

Cockerham, C. C. (1954): “An extension of the concept of partitioning hereditary variance for analysis of covariances among relatives when epistasis is present,” Genetics, 39, 859–882.Search in Google Scholar

Coffman, C., R. W. Doerge, K. Simonsen, K. Nichols, C. Duarte, R. Wolfinger, and L. M. McIntyre (2005): “An effective model selection strategy for detecting multiple qtl,” Genetics, 170, 1281–1297.Search in Google Scholar

Cordell, H. J. (2002): “Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans,” Hum. Mole. Genet., 11, 2463–2468, URL .Search in Google Scholar

Cordell, H. J. (2009): “Detecting gene-gene interactions that underlie human diseases,” Nat. Rev. Genet., 10, 392–404.Search in Google Scholar

Doerge, R. W. (2002): “Mapping and analysis of quantitative trait loci in experimental populations,” Nat. Rev. Gene., 43–52, URL .Search in Google Scholar

Dupuis, J. and D. Siegmund (1999): “Statistical methods for mapping quantitative trait loci from a dense set of markers,” Genetics, 151, 373–386.Search in Google Scholar

Erhardt, V., M. Bogdan and C. Czado (2010): “Locating multiple interacting quantitative trait loci with the zero-inated generalized Poisson regression,” Statist. Appl. Gen. Mol. Biol, 9, 1554–6115.Search in Google Scholar

Fisher, R. A. (1919): “The correlation between relatives on the supposition of Mendelian inheritance,” T. Roy. Soc. Edin., 52, 399–433.Search in Google Scholar

Fritsch, A. and K. Ickstadt (2007): “Comparing logic regression based methods for identifying snp interactions,” Berlin, Heidelberg: Springer, Lecture Notes in Computer Science, 4414, 90–103.Search in Google Scholar

Haley, C. and S. Knott (1992): “A simple regression method for mapping quantitative trait loci in line crosses using anking markers,” Heredity, 69, 315–324.Search in Google Scholar

Jansen, R. and P. Stam (1994): “High resolution of quantitative traits into multiple loci via interval mapping,” Genetics, 136, 1447–1455.Search in Google Scholar

Kao, C. and Z. Zeng (2002): “Modeling epistasis of quantitative trait loci using Cockerham′s model,” Genetics, 160, 1243–1261.Search in Google Scholar

Kao, C., Z. Zeng and R. Teasdale (1999): “Multiple interval mapping for quantitative trait loci,” Genetics, 152, 1203–1216.Search in Google Scholar

Kirkpatrick, S., C. D. Gelatt and M. Vecchi (1983): “Optimization by simulated annealing,” Science, 220, 671–680.Search in Google Scholar

Kooperberg, C. and I. Ruczinski (2005): “Identifying interacting snps using Monte Carlo logic regression,” Genet. Epidemiol., 28, 157–170.Search in Google Scholar

Lander, E. S. and D. Botstein (1989): “Mapping Mendelian factors underlying quantitative traits using rp linkage maps.” Genetics, 121, 185–199, URL .Search in Google Scholar

Li, W. and Z. Chen (2009): “Multiple interval mapping for quantitative trait loci with a spike in the trait distribution,” Genetics, 182, 337–342.Search in Google Scholar

Liu, J., Y. Liu, X. Liu and H. -W. Deng (2007): “Bayesian mapping of quantitative trait loci for multiple complex traits with the use of variance components,” Am. J. Hum. Genet., 81, 304–320.Search in Google Scholar

Lucek, P. R. and J. Ott (1997): “Neural network analysis of complex traits,” Genet. Epidemiol., 14, 1101–1106.Search in Google Scholar

Lyons, M. A., H. Wittenburg, R. Li, K. A. Walsh, M. R. Leonard, G. A. Churchill, M. C. Carey and B. Paigen (2003): “New quantitative trait loci that contribute to cholesterol gallstone formation detected in an intercross of cast/ei and 129s1/svimj inbred mice,” Physiol. Genomics, 14, 225–239.Search in Google Scholar

McIntyre, L., C. Coffman and R. Doerge (2001): “Detection and location of single binary trait loci in experimental populations,” Genet. Res., 78, 79–92.Search in Google Scholar

Ruczinski, I., C. Kooperberg and M. LeBlanc (2003): “Logic regression,” J. Comput. Graphical Statist., 12, 474–511.Search in Google Scholar

Ruczinski, I., C. Kooperberg and M. LeBlanc (2004): “Exploring interactions in high-dimensional genomic data: an overview of logic regression, with applications,” J. Multivariate Anal., 90, 178–195.Search in Google Scholar

Schwender, H., K. Bowers, M. D. Fallin and I. Ruczinski (2011): “Importance measures for epistatic interactions in case-parent trios,” Ann. Hum. Genet., 75, 122–132.Search in Google Scholar

Schwender, H. and K. Ickstadt (2008): “Identification of snp interactions using logic regression,” Biostatistics, 9, 187–198.Search in Google Scholar

Sen, S. and G. A. Churchill (2001): “A statistical framework for quantitative trait mapping,” Genetics, 159, 371–387.Search in Google Scholar

Xu, S. and W. R. Atchley (1996): “Mapping quantitative trait loci for complex binary diseases using line crosses,” Genetics, 143, 1417–1424.Search in Google Scholar

Yandell, B. S., T. Mehta, S. Banerjee, R. M. J. Y. Shriner, D. Venkataraman, W. W. Neely, H. Wu, R. von Smith and N. Yi (2007): “Qtl with Bayesian interval mapping in experimental crosses,” Bioinformatics, 23, 641–643.Search in Google Scholar

Yi, N. and S. Xu (2000): “Bayesian mapping of quantitative trait loci for complex binary traits,” Genetics, 155, 1391–1403.Search in Google Scholar

Zeng, Z. B. (1994): “Precision mapping of quantitative trait loci,” Genetics, 136, 1457–1468.Search in Google Scholar

Zhang, M., K. L. Montooth, M. T. Wells, A. G. Clark and D. Zhang (2005): “Mapping multiple quantitative trait loci by Bayesian classification,” Genetics, 169, 2305–2318, URL .Search in Google Scholar

**Published Online:**2014-01-07

**Published in Print:**2014-02-01

©2014 by Walter de Gruyter Berlin Boston