
Journal of Theoretical Biology ∎ (∎∎∎∎) ∎∎∎–∎∎∎

journal homepage: www.elsevier.com/locate/yjtbi

Two-part zero-inflated negative binomial regression model for quantitative trait loci mapping with count trait

Abbas Moghimbeigi *

Modeling of Noncommunicable Disease Research Center, Department of Biostatistics and Epidemiology, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran

HIGHLIGHTS

• I propose a two-part zero-inflated negative binomial model for count traits.
• The zero-inflated and negative binomial parts are both influenced by genetic covariates.
• The model is applied to QTL mapping of the number of cholesterol gallstones.
• Compared with other studies of the same data, new significant markers are detected that may be QTLs conferring resistance to gallstone formation.

ARTICLE INFO

Article history:
Received 15 August 2014
Received in revised form 6 January 2015
Accepted 16 February 2015

Keywords:
Extra zero
Negative binomial regression model
QTL mapping
Cholesterol gallstone

ABSTRACT

Poisson regression models provide a standard framework for quantitative trait locus (QTL) mapping of count traits. In practice, however, count traits are often over-dispersed relative to the Poisson distribution. In these situations, zero-inflated Poisson (ZIP), zero-inflated generalized Poisson (ZIGP) and zero-inflated negative binomial (ZINB) regression may be useful for QTL mapping of count traits. Genetic variables added to the negative binomial part of the model may also affect the extra zeros. In this study, to address this, I apply a two-part ZINB model. Parameters are estimated by the EM algorithm, with the Newton–Raphson method used in the M-step. The two-part ZINB model is applied to QTL mapping to detect associations between gallstone formation and marker genotypes.

© 2015 Published by Elsevier Ltd.

* Corresponding author. Tel.: +98 8138380090; fax: +98 8138380509.
E-mail address: moghimb@yahoo.com
http://dx.doi.org/10.1016/j.jtbi.2015.02.016
0022-5193/© 2015 Published by Elsevier Ltd.

1. Introduction

Interval mapping in experimental crosses dates back to the paper of Lander and Botstein (1989), and it is the most common basis for quantitative trait loci (QTL) mapping. QTL mapping is a powerful method for describing how each genomic region contributes to the architecture of a quantitative trait (Mackay, 2001). Regression methods have been used for QTL mapping (Haley and Knott, 1992; Haley et al., 1994; Zeng, 1993); they were extended and improved in composite interval mapping by Zeng (1994) and used in multiple-interval mapping by Jansen (1993) and Kao et al. (1999). Generally, a normal distribution is assumed for quantitative, continuous traits. Nevertheless, the phenotypes of some traits are measured as counts. Examples of count traits are the number of tumor lesions on chickens exposed to Marek's disease virus and the number of cholesterol gallstones formed in mice (Wittenburg et al., 2003).

Poisson regression models provide a standard framework for analyzing count phenotypes (Rebaï, 1997; Shepel et al., 1998). In many situations, however, count trait data are over-dispersed relative to the Poisson distribution. To account for this dispersion, the generalized estimating equation (GEE) approach has been applied to QTL mapping of count traits (Lange and Whittaker, 2001; Thomson, 2003). Later, an approach based on the generalized Poisson (GP) regression mixture model was developed to deal with over- or under-dispersion (Cui et al., 2006). Xu and Hu (2010) developed a generalized linear model (GLM) approach to interval mapping (IM) for discrete traits with nominal and ordinal distributions. They considered interval mapping, missing data and interactions thoroughly for GLMs and generalized linear mixed models (GLMMs), and recently Che and Xu (2012) extended the approach to joint mapping of multiple traits using GLMMs.

Extra zeros relative to the Poisson distribution are one of the reasons for over-dispersion in count data. A clear example of such a phenomenon was provided by Lyons et al. (2003).

Please cite this article as: Moghimbeigi, A., Two-part zero-inflated negative binomial regression model for quantitative trait loci
mapping with count trait. J. Theor. Biol. (2015), http://dx.doi.org/10.1016/j.jtbi.2015.02.016i

In that study, the number of gallstones was considered, and many of the mice did not develop any disease symptoms.

In many applications with zero counts, it is important to assess whether the zero-inflation assumption is appropriate. Several tests for ZIP have been developed in the literature (Van den Broek, 1995; Xiang et al., 2006; Moghimbeigi et al., 2009). In practice, the non-zero part of the count data is often over-dispersed, and other distributions such as the zero-inflated negative binomial (ZINB) may be more appropriate than the zero-inflated Poisson (ZIP). In this context, there are many tests for over-dispersion in ZIP regression models (Ridout et al., 2001; Xiang et al., 2007) and for extra zeros in a negative binomial (NB) regression model (Moghimbeigi, 2011).

Cui et al. (2006) illustrated the GP regression mixture model, and Cui and Yang (2009) then developed a zero-inflated generalized Poisson (ZIGP) regression mixture model for QTL mapping of count traits. More recently, Silva et al. (2011) proposed a ZIP model for QTL mapping. Extra zeros may arise because some locus genotypes confer resistance to the count trait. Previous studies have not modeled the zero-inflated part as a function of genetic variables in order to detect QTLs for resistance to the phenotype of interest. In this research, I illustrate the situation where the zero-inflated and NB parts are both influenced by genetic variables.

2. Methods

2.1. ZINB regression

Let the response variable $Y = (y_1, y_2, \ldots, y_n)$ denote the count trait in n subjects. The ZINB distribution of Y can be written as:

$$P(Y = 0) = \phi + (1-\phi)\left(\frac{r}{\lambda + r}\right)^{r}$$

$$P(Y = y_i) = (1-\phi)\,\frac{\Gamma(y_i + r)}{\Gamma(r)\,\Gamma(y_i + 1)}\left(\frac{r}{\lambda + r}\right)^{r}\left(1 - \frac{r}{\lambda + r}\right)^{y_i}, \qquad y_i = 1, 2, 3, \ldots$$

where $0 < \phi < 1$ is the probability of an extra zero response, $\lambda$ is the mean and $r^{-1}$ is called the dispersion parameter of the underlying NB distribution.

This distribution has:

$$E(Y) = (1-\phi)\lambda$$

$$\mathrm{Var}(Y) = (1-\phi)\left(1 + \frac{\lambda}{r} + \phi\lambda\right)\lambda$$

As $r^{-1} \to 0$, $\mathrm{Var}(Y) \to (1-\phi)(1+\phi\lambda)\lambda$ and the ZINB distribution converges to ZIP.

NB models for counts permit $\lambda$ to depend on explanatory variables (Lawless, 1987). The linear predictors $\xi_i$ and $\eta_i$ $(i = 1, 2, \ldots, n)$ are defined as:

$$\log\left(\frac{\phi_i}{1-\phi_i}\right) = \xi_i = X_{1i}^{T}\alpha$$

$$\log(\lambda_i) = \eta_i = X_{2i}^{T}\beta$$

where $X_{1i}$ and $X_{2i}$ are the covariate vectors of the ith subject in the logistic and NB parts, respectively, and are not necessarily the same. $\alpha$ and $\beta$ are the corresponding vectors of regression coefficients.

2.2. QTL mapping with ZINB

Consider an experimental F2 cross between two inbred lines that yields n randomly sampled individuals. We consider an interval mapping method for zero-inflated count traits. QTL mapping is based on the principle that genes and markers segregate through chromosome recombination (Xu et al., 1998). A QTL mapping study makes it possible to assess QTL effects and their locations on the genome. In simple terms, QTL analysis is based on detecting an association between the phenotype of interest and the genotype of markers. Linkage maps are therefore used to identify genome locations containing genes and QTLs associated with a phenotype. Genetic markers represent genetic differences between individual organisms or species. The greater the distance between markers, the greater the chance of recombination occurring during meiosis. The location of a genuine QTL linked to the trait could be anywhere on the chromosomes or genome and may not be observed, but it can be inferred through the observed molecular markers.

Interval QTL mapping (Lander and Botstein, 1989) rests on a mixture model in which each observation y is considered to have arisen from one of several components, which may be known or unknown. Supposing that y is a count trait, the mixture distribution of y with k QTL genotypes can be written as:

$$y \sim f(y; \lambda, r) = \pi_1 f_1(y; \lambda_1, r) + \cdots + \pi_k f_k(y; \lambda_k, r), \qquad y = 1, 2, 3, \ldots$$

where $f = (f_1, \ldots, f_k)$, $\pi = (\pi_1, \ldots, \pi_k)$ (with $\sum_k \pi_k = 1$) and $\lambda = (\lambda_1, \ldots, \lambda_k)$ are, respectively, the ZINB mass functions, proportions and parameters of the k QTL genotypes. $r$ is the dispersion parameter in the negative binomial part.

Suppose there is a putative segregating QTL, with alleles Q and q, that is linked to a zero-inflated count trait. Data for QTL mapping consist of a set of marker genotypes measured on chromosomes and the phenotypic trait of each individual. I develop an interval QTL mapping method for over-dispersed zero-inflated count traits in an F2 intercross population. If the genotype of the F2 intercross population at the putative QTL is qq, Qq or QQ, then j takes the values 0, 1, 2, respectively. $\pi_{ij}$ $(i = 1, 2, \ldots, n;\ j = 0, 1, 2)$ can be taken as the conditional probability of the unobserved QTL genotype given the observed flanking markers $M_i$ in the F2 intercross population. The mean and variance of the above mixture distribution can be computed as:

$$E(y) = E\left[E(y \mid \lambda, r)\right] = \sum_{i=1}^{n}\sum_{j=0}^{2} \pi_{ij}\left(1-\phi_{i/j}\right)\lambda_{i/j}$$

$$E(y^2) = E\left[E(y^2 \mid \lambda, r)\right] = \sum_{i=1}^{n}\sum_{j=0}^{2} \pi_{ij}\left(1-\phi_{i/j}\right)\lambda_{i/j}\left(1 + \lambda_{i/j} + \frac{\lambda_{i/j}}{r}\right)$$

$$\mathrm{Var}(y) = E\left[V(y \mid \lambda)\right] + V\left[E(y \mid \lambda)\right] = \sum_{i=1}^{n}\sum_{j=0}^{2} \pi_{ij}\left(1-\phi_{i/j}\right)\left(1 + \frac{\lambda_{i/j}}{r} + \phi_{i/j}\lambda_{i/j}\right)\lambda_{i/j} + E\left[y_j^2\right] - \left(E\left[y_j\right]\right)^2$$

Let $x = (1, x_1, x_2)$ where

$$x_1 = \begin{cases} +1 & \text{for } QQ \\ 0 & \text{for } Qq \\ -1 & \text{for } qq \end{cases} \qquad \text{and} \qquad x_2 = \begin{cases} 1 & \text{for } Qq \\ 0 & \text{for } qq \text{ or } QQ \end{cases}$$

Then the conditional mean for each QTL genotype $G_j$ can be written as:

$$\lambda_{i/j} = \exp\left(\eta_{i/j}\right) = \begin{cases} \lambda_2 = \exp(\mu + a) & \text{for } QQ \\ \lambda_1 = \exp(\mu + d) & \text{for } Qq \\ \lambda_0 = \exp(\mu - a) & \text{for } qq \end{cases}$$

where $\beta = (\mu, a, d)'$, in which $\mu$ is an overall genetic effect, $a$ is an additive genetic effect and $d$ is a dominant genetic effect in the NB part (Lynch and Walsh, 1998). Similarly, the zero-inflation part can be written as:

$$\exp\left(\xi_{i/j}\right) = \begin{cases} \exp(g_0 + g_1) & \text{for } QQ \\ \exp(g_0 + g_2) & \text{for } Qq \\ \exp(g_0 - g_1) & \text{for } qq \end{cases}$$

where $g = (g_0, g_1, g_2)'$ is the vector of zero-inflation coefficients.
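
As an illustration of Sections 2.1 and 2.2, the following is a minimal sketch of the ZINB mass function, its mean and variance, and the genotype-specific parameters obtained from the coding of $x_1$ and $x_2$. It is not the author's implementation; the function names (`zinb_pmf`, `zinb_mean`, `zinb_var`, `genotype_params`) are hypothetical and only the Python standard library is assumed.

```python
import math

def zinb_pmf(y, lam, r, phi):
    """ZINB probability of count y (NB mean lam, NB parameter r, extra-zero probability phi)."""
    t = r / (lam + r)
    # log of the NB mass function, computed with log-gamma for numerical stability
    log_nb = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
              + r * math.log(t) + y * math.log(1.0 - t))
    nb = math.exp(log_nb)
    return phi + (1.0 - phi) * nb if y == 0 else (1.0 - phi) * nb

def zinb_mean(lam, r, phi):
    """E(Y) = (1 - phi) * lam."""
    return (1.0 - phi) * lam

def zinb_var(lam, r, phi):
    """Var(Y) = (1 - phi) * (1 + lam / r + phi * lam) * lam."""
    return (1.0 - phi) * (1.0 + lam / r + phi * lam) * lam

def genotype_params(j, mu, a, d, g0, g1, g2):
    """Return (lam, phi) for QTL genotype j, coded 0 = qq, 1 = Qq, 2 = QQ."""
    x1 = (-1, 0, 1)[j]   # additive coding
    x2 = (0, 1, 0)[j]    # dominance coding
    lam = math.exp(mu + a * x1 + d * x2)                     # NB part, log link
    phi = 1.0 / (1.0 + math.exp(-(g0 + g1 * x1 + g2 * x2)))  # zero-inflation part, logit link
    return lam, phi
```

For example, `genotype_params(2, mu, a, d, g0, g1, g2)` returns $\lambda_2 = \exp(\mu + a)$ together with the extra-zero probability implied by $\xi = g_0 + g_1$ for genotype QQ.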


In the zero-inflated part, $g_0$ is an overall genetic effect, $g_1$ is an additive genetic effect and $g_2$ is a dominant genetic effect.

Finally, to test for the existence of a QTL at each location, genome-wide or chromosome-wide, the hypotheses are formulated as H0: $a = d = g_1 = g_2 = 0$ and H1: at least one of these parameters is not zero. The likelihood ratio (LR) test statistic is twice the difference between the log-likelihood of the full model (H1) and that of the reduced model (H0): $LR = -2(\log L_0 - \log L_1)$. $L_1$ and $L_0$ are evaluated at the maximum likelihood estimates (MLEs) of the parameters under H1 and H0, respectively.

2.3. Parameter estimation via EM algorithm

The log-likelihood for over-dispersed count data can be maximized by a stable numerical procedure such as the EM algorithm (Yau et al., 2003; Moghimbeigi et al., 2008). To ensure convergence and stability in the estimation of genetic effects, the log-likelihood can be written as (Wang et al., 2002):

$$\ell_c = \sum_{y_i = 0}\sum_{j=0}^{2} c_{i/j}\log\left[\frac{\exp\left(\xi_{i/j}\right) + t_{i/j}^{\,r}}{1 + \exp\left(\xi_{i/j}\right)}\right] + \sum_{y_i > 0}\sum_{j=0}^{2} c_{i/j}\left[\log\frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\,\Gamma(r)} + r\log t_{i/j} + y_i\log\left(1 - t_{i/j}\right) - \log\left(1 + \exp\left(\xi_{i/j}\right)\right)\right] + \sum_{i=1}^{n}\sum_{j=0}^{2} c_{i/j}\log \pi_{ij}$$

where $t_{i/j} = r/(r + \lambda_{i/j})$. The complete-data log-likelihood $\ell_c$ is constructed as $\ell_c = \ell_\xi + \ell_\eta$ with

$$\ell_\xi = \sum_{i=1}^{n}\sum_{j=0}^{2}\left[z_{i/j}\,\xi_{i/j} - \log\left(1 + \exp\left(\xi_{i/j}\right)\right)\right]$$

$$\ell_\eta = \sum_{i=1}^{n}\left(1 - z_{i/j}\right)\left[\log\frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\,\Gamma(r)} + r\log t_{i/j} + y_i\log\left(1 - t_{i/j}\right)\right] + \sum_{i=1}^{n}\sum_{j=0}^{2} c_{i/j}\log \pi_{ij}$$

where $z_{i/j}$ is a latent binary variable that indicates whether the response $y_i$ of the ith subject $(i = 1, \ldots, n)$ with genotype covariate value j $(j = 0, 1, 2)$ comes from the latent zero class $(z_{i/j} = 1)$ or the non-zero class $(z_{i/j} = 0)$. The extra zeros may arise because some locus genotypes confer resistance, that is, as a result of genotype effects. When the occurrence of extra zeros is treated as a missing latent variable, the complete-data log-likelihood $\ell_c$ decomposes into two orthogonal components, $\ell_\xi$ and $\ell_\eta$ (Wang et al., 2002; Yau et al., 2003). Parameter estimation can therefore be performed by maximizing these two functions separately, which provides a convenient method for estimating the parameters.

The E-step of the EM algorithm replaces $z_{i/j}$ by its conditional expectation $z_{i/j}^{(g)}$, where the superscript $(g)$ denotes the gth iteration, under the current estimates $\hat{g}^{(g)}$ and $\hat{\beta}^{(g)}$:

$$z_{i/j}^{(g)} = \begin{cases} \left[1 + \exp\left(-a_i^{T}\hat{g}^{(g)}\right)\left(\hat{t}_{i/j}^{\,(g)}\right)^{r}\right]^{-1} & \text{if } y_i = 0 \\ 0 & \text{if } y_i \geq 1 \end{cases}$$

where

$$\hat{t}_{i/j}^{\,(g)} = \frac{r}{r + \exp\left(x_{i/j}^{T}\hat{\beta}^{(g)}\right)}$$

With $z_{i/j}$ fixed at $z_{i/j}^{(g)}$, $\ell_\xi$ and $\ell_\eta$ are maximized separately (M-step) to obtain $\hat{g}^{(g+1)}$ and $\hat{\beta}^{(g+1)}$, in view of the orthogonal partition $\ell_c = \ell_\xi + \ell_\eta$.

In the M-step, the Newton–Raphson algorithm is applied to obtain the maximum likelihood estimates at each putative QTL position, every 1 or 2 cM. The details of the EM algorithm are given in Appendix A.

3. Simulation

Monte Carlo simulations were carried out to investigate the statistical behavior of the proposed method in practical situations. The simulations were designed to assess the performance of the two-part ZINB model for QTL detection in an F2 intercross design. The effect of sample size (n = 100, 200 and 500) on parameter estimation was evaluated under varying parameters in the zero-inflation part. Five equidistant markers were considered to keep the analysis relatively simple. QTL and marker genotypes were generated on one chromosome of length 80 cM. The markers were equally spaced at 16 cM, with the first marker at position 16 and the fifth marker at position 80 of the chromosome. The QTL was located at 48 cM. The model includes the dispersion parameter (k), the NB part parameters ($\mu$, a and d) and the zero-inflation part parameters ($g_0$, $g_1$ and $g_2$). The parameters were fixed, the ZINB distribution was used to generate the zero-inflated count trait, and each simulation was run with 100 replicates. All simulations are in accord with previous simulation studies (Piepho and Gauch, 2001; Cui and Yang, 2009). The LR test and a logarithm of the odds score of 3 (LOD3) as the threshold were used to determine significance in each simulation. The chromosome-wide threshold is the value such that 5% of all LR values are greater than this critical value, and the genome-wide threshold is calculated similarly. The null distributions of the LR were simulated assuming no genetic effects in either the NB or the zero-inflation part, i.e. $a = d = g_1 = g_2 = 0$.

The power of QTL detection in each simulation, by the LOD3 threshold and by the LR test, was calculated and is denoted p. Without loss of generality, the NB part parameters were fixed and two zero-inflation patterns were applied. In these two patterns, completely different parameters were considered for the zero-inflation part. The mean MLEs of the parameters were computed over the replicates.
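
The data-generating step of such a simulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's simulation code: it draws only the QTL genotype of each F2 individual (in the usual 1:2:1 ratio) and the ZINB trait, and leaves out the marker map and recombination. The function names `rpois` and `simulate_f2_zinb` are hypothetical, and the parameter values passed to them are for the caller to choose.

```python
import math
import random

def rpois(rng, mean):
    """Poisson draw by Knuth's multiplication method (adequate for moderate means)."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_f2_zinb(n, mu, a, d, g0, g1, g2, r, seed=0):
    """Return a list of (genotype, count) pairs for n F2 individuals."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # F2 genotypes qq, Qq, QQ (coded 0, 1, 2) segregate 1:2:1
        j = rng.choices((0, 1, 2), weights=(1, 2, 1))[0]
        x1, x2 = (-1, 0, 1)[j], (0, 1, 0)[j]
        lam = math.exp(mu + a * x1 + d * x2)
        phi = 1.0 / (1.0 + math.exp(-(g0 + g1 * x1 + g2 * x2)))
        if rng.random() < phi:
            y = 0  # extra (structural) zero
        else:
            # NB draw as a gamma-Poisson mixture with mean lam and shape r
            y = rpois(rng, rng.gammavariate(r, lam / r))
        data.append((j, y))
    return data
```

The gamma-Poisson mixture is one standard way to draw NB counts with mean $\lambda$ and variance $\lambda + \lambda^2/r$; any other NB sampler would serve equally well.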

Table 1
The mean MLEs (RMSEs) of the parameters and QTL testing power obtained from 100 simulation replicates with two ZI part patterns. True values: Dist = 48 cM; 1/k = 0.5; NB part: μ = 0.5, a = 0.5, d = 0.3.

ZI part pattern 1: g0 = −2.0, g1 = 1, g2 = 0.5

| n | Distance | 1/k | μ | a | d | g0 | g1 | g2 | P1 | P2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 46.80 (18.080) | 0.334 (0.173) | 1.981 (0.166) | 0.495 (0.195) | 0.315 (0.201) | −2.208 (1.125) | 0.804 (0.349) | 0.316 (0.367) | 99 | 100 |
| 200 | 47.88 (16.785) | 0.461 (0.075) | 1.993 (0.089) | 0.507 (0.089) | 0.309 (0.123) | −1.907 (0.481) | 0.936 (0.481) | 0.375 (0.570) | 100 | 100 |
| 500 | 47.54 (1.709) | 0.467 (0.048) | 2.005 (0.056) | 0.491 (0.056) | 0.307 (0.078) | −1.834 (0.214) | 0.820 (0.214) | 0.388 (0.272) | 100 | 100 |

ZI part pattern 2: g0 = 0.7, g1 = 0.4, g2 = 0.1

| n | Distance | 1/k | μ | a | d | g0 | g1 | g2 | P1 | P2 |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 47.2 (13.643) | 0.455 (0.123) | 1.667 (0.166) | 0.334 (0.173) | 0.459 (0.055) | 0.796 (0.413) | 0.502 (0.425) | −0.050 (0.680) | 92 | 98 |
| 200 | 48.84 (9.592) | 0.436 (0.160) | 1.957 (0.195) | 0.533 (0.200) | 0.358 (0.292) | 0.753 (0.303) | 0.447 (0.257) | 0.067 (0.415) | 100 | 100 |
| 500 | 47.96 (3.137) | 0.467 (0.095) | 1.984 (0.94) | 0.522 (0.101) | 0.318 (0.133) | 0.728 (0.168) | 0.418 (0.162) | 0.047 (0.234) | 100 | 100 |

P1: the LOD3 threshold power. P2: the simulated LR test power.
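
The summary quantities reported in Table 1 can be computed from the replicate results along the following lines. This is a hedged sketch with hypothetical helper names (`mean_and_rmse`, `power_percent`); it assumes that the RMSE is taken around the true parameter value and that power is the percentage of replicates whose LR statistic reaches the chosen threshold.

```python
import math

def mean_and_rmse(estimates, true_value):
    """Mean of the replicate estimates and their root mean square error around true_value."""
    n = len(estimates)
    mean = sum(estimates) / n
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return mean, rmse

def power_percent(lr_values, threshold):
    """Percentage of replicates whose LR statistic is at least the threshold."""
    hits = sum(1 for lr in lr_values if lr >= threshold)
    return 100.0 * hits / len(lr_values)
```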


These mean MLEs and their root mean square errors (RMSEs) are listed in Table 1. As expected, with increasing sample size the RMSEs of the QTL location estimate decrease from 18.080 to 1.709 and from 13.643 to 3.137 in the two zero-inflation patterns, respectively. The power of the LOD3 threshold and of the LR test also increases rapidly. When the sample size increases to 200, the power reaches 100 in both zero-inflation parameter patterns.

4. Application

Cholesterol gallstone formation is a common, genetically complex disorder that has affected people for centuries. To identify additional cholesterol gallstone susceptibility loci, a QTL analysis was performed using an intercross of the PERA/Ei and I/LnJ inbred strains of mice.

The ZINB model for QTL mapping is used to detect associations between the number of cholesterol gallstones and the genotype of markers. I use the dataset of Wittenburg et al. (2003). The data were obtained from an intercross population generated by crossing the gallstone-susceptible wild-derived inbred strain PERA/Ei (PERA) with a resistant inbred strain, I/LnJ (I). A total of 279 F2 mice were generated, with 107 genetic markers spanning a total length of 1382.3 cM and giving good coverage of the 19 mouse autosomal chromosomes. Cui and Yang (2009) considered this dataset and applied the ZIGP mixture model to map QTLs.

To detect associations between the number of cholesterol gallstones and the genotype of markers, it is essential to find the best-fitting distribution among those commonly used for count data. Some genotypes may confer resistance to cholesterol gallstone formation and thereby produce many zeros. In other words, many QTLs may affect the number of cholesterol gallstones formed, and many others may confer resistance to their formation. To select a model that is supported by the data, it is necessary to compare models after fitting them. Unlike in other fields (Van den Broek, 1995; Ridout et al., 2001; Xiang et al., 2006, 2007; Moghimbeigi et al., 2009), in the QTL mapping context there is no score test for comparing and selecting the optimal model for zero-inflated and over-dispersed count traits. In the data, the frequency distribution of the cholesterol gallstone counts shows a large proportion (about 57%) of zeros. Cui and Yang (2009) showed that the data are over-dispersed relative to the Poisson regression model. The AIC values of the Poisson, NB, ZIP, ZIGP and proposed ZINB distributions are used for model comparison. The genetic covariates were entered into the models when calculating the AIC. The ZIP and ZIGP models proposed by Cui and Yang (2009) were refitted (R code available at: http://www.stt.msu.edu/ cui/) and their AIC was calculated with genetic covariates in the non-zero part. For the proposed ZINB model, the AIC was likewise obtained with genetic covariates in the non-zero part and an intercept in the zero-inflation part. The AIC values are reported across the scanned chromosome positions (Fig. 1). Among the models, smaller AIC values indicate a better fit of the underlying model to the data. Fig. 1(a) shows that the AIC values of the ZIP model are smaller than those of the Poisson model and larger than those of the NB model. The figure suggests substantial zero-inflation and over-dispersion relative to the Poisson model in the cholesterol gallstone counts. For analyzing the data and mapping QTLs, the NB, ZIGP or ZINB model is therefore beneficial. Fig. 1(b) shows that, over almost the whole genome, the AIC of the proposed ZINB model (one parameter in the zero-inflation part) is less than the AIC of the NB and ZIGP regression models. It is clear that the AIC values of the NB and ZIGP models are nearly equal genome-wide. The AIC difference between the GP and ZIGP regression models in these data was also found to be small (Cui and Yang, 2009). However, the mostly larger AIC differences between the two-part ZINB regression model and the NB regression model confirm that the proposed model fits better and accounts for a large proportion of the zeros.

Fig. 2 shows the LR test values on the autosomal chromosomes of the mice. The dashed horizontal line indicates the 5% genome-wide significance level and the solid line indicates the 5% chromosome-wide significance level. The chromosome-wide threshold is the value such that 5% of all LR values exceed this critical value, and the genome-wide threshold is calculated in a similar way. Fig. 2 clearly shows that 17 QTLs on 10 chromosomes are significant.

[Figure 1: two panels plotting AIC differences (y-axis) against chromosomes 1 to 19 (x-axis). Panel (a): ZIP−NB (solid line) and Poisson−ZIP (dotted line). Panel (b): NB−ZIGP (solid line) and ZIGP−ZINB (dotted line).]
Fig. 1. Comparisons of AIC difference values calculated from (a) the ZIP, NB and Poisson models and (b) the NB, ZIGP and ZINB models across 19 chromosomes.

[Figure 2: one panel per chromosome (Ch.1 to Ch.19) plotting LR (y-axis) against testing position (x-axis).]
Fig. 2. The log-likelihood ratios (LR) between reduced and full ZINB models at testing positions across chromosomes and genome-wide. The chromosome-wide and genome-wide 5% simulated thresholds are shown as dashed and solid horizontal lines, respectively.
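
The threshold rule used for the LR profiles, a critical value exceeded by 5% of the null LR values, can be sketched as below. The function name `empirical_threshold` is hypothetical, and the null LR values are assumed to come from simulation or permutation under no genetic effects; this is an illustration rather than the author's procedure.

```python
def empirical_threshold(null_lr_values, alpha=0.05):
    """Smallest observed null LR value that at most a fraction alpha of the null values exceed."""
    values = sorted(null_lr_values)
    n = len(values)
    for v in values:
        exceed = sum(1 for x in values if x > v)
        if exceed <= alpha * n:
            return v
    return values[-1]

def is_significant(lr, threshold):
    """A testing position is declared significant when its LR exceeds the threshold."""
    return lr > threshold
```

A chromosome-wide threshold would use the null LR values from one chromosome, and a genome-wide threshold the null values pooled over all chromosomes.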


These QTLs are significant at the genome-wide significance level and are located on chromosomes 1, 3, 4, 6, 8, 10, 13, 15, 16 and 17. One further QTL is significant chromosome-wide (chromosome 11). Table 2 lists the estimated genetic effects, with asymptotic standard errors in parentheses. Fig. 2 displays all locations that are significant in both parts. Table 3 compares the QTLs detected by the ZIGP and two-part ZINB models. The last QTLs detected on chromosomes 4 and 10, and a QTL detected on chromosome 15, are consistent with the results of previously reported studies (Wittenburg et al., 2003; Cui and Yang, 2009). The second QTL detected on chromosome 8 is also detected by the ZIGP regression model. The QTLs on chromosomes 1, 3, 6, 13 and 17, the first QTL on chromosome 8, the second QTL on chromosome 4 and the second QTL on chromosome 16 have not been reported before. I considered a two-part ZINB model that contains an NB part and a zero-inflation part. It seems that many of the identified QTLs are significant in the zero-inflation part and may be resistance QTLs against gallstone formation.

5. Discussion

In many situations, count traits may contain excess zeros that arise because of genetic resistance at a locus. In this situation, two-component zero-inflated Poisson, NB or GP models, which include a part with variables that affect the extra zeros, are suitable. This paper has proposed a two-component ZINB model, with NB and zero-inflation parts, to model a QTL count phenotype that exhibits extra zeros and over-dispersion at the same time. The method can provide insight into the source of the excess zeros and the apparent over-dispersion in the count trait. I applied the method to QTL mapping of the cholesterol gallstone trait to identify loci in an intercross of the PERA/Ei and I/LnJ inbred strains of mice.

In the presence of extra zeros and over-dispersion relative to ZIP, the ZINB regression model enables researchers to draw sensible and valid conclusions by identifying significant additive or dominant genetic effects on the count and zero-inflated parts of the trait. For the ZINB regression model, parameter estimation is facilitated by the EM algorithm with the Newton–Raphson algorithm in the M-step.

Cui and Yang (2009) fitted the ZIGP model with one zero-inflation parameter and one over-dispersion parameter. Table 3 compares the ZIGP and the proposed two-part ZINB models. As can be seen, only one QTL, on chromosome 10, is detected at the genome-wide significance level by the ZIGP regression model, and it lies at a marker position (D10Mit102). In addition to this position, 16 QTLs are significant genome-wide in the current study.

Table 2
The estimated genetic effects (asymptotic standard errors) of the detected QTLs in the two parts of the ZINB model.

| Ch | Marker interval | NB part: β0 | β1 | β2 | 1/k | ZI part: α0 | α1 | α2 | LR/LOD |
|---|---|---|---|---|---|---|---|---|---|
| 1 | D1Mit296 | 0.170 (0.166) | −0.277 (0.166) | 0.455 (0.222) | 1.920 (0.313) | −2.00 (0.337) | 1.110 (0.337) | 0.313 (0.408) | 13.151/2.86 |
| 3 | D3Mit137–D3Mit104 | 1.471 (0.328) | 0.673 (0.325) | −0.454 (0.364) | 0.564 (0.215) | 0.548 (0.378) | 0.060 (0.378) | −1.729 (0.502) | 14.980/3.25 |
| 4 | D4Mit111–D4Mit152 | 0.983 (0.143) | −0.254 (0.141) | −0.797 (0.208) | 1.176 (0.237) | −0.557 (0.172) | −0.360 (0.172) | −0.754 (0.278) | 13.084/2.84 |
| 4 | D4Mit152–D4Mit31 | 0.973 (0.140) | −0.146 (0.138) | −0.901 (0.207) | 1.234 (0.243) | −0.829 (0.183) | −0.279 (0.183) | −0.322 (0.275) | 13.819/3.01 |
| 4 | D4Mit204 | 0.785 (0.144) | 0.381 (0.143) | −0.489 (0.208) | 1.323 (0.249) | −1.068 (0.200) | −0.070 (0.200) | 0.224 (0.270) | 15.565/3.38 |
| 6 | D6Mit31–D6Mit105 | 0.176 (0.151) | −0.620 (0.150) | 0.717 (0.220) | 1.320 (0.267) | −1.875 (0.299) | −0.836 (0.299) | 1.487 (0.348) | 15.964/3.47 |
| 8 | D8Mit292–D8Mit147 | 1.488 (0.139) | 0.939 (0.178) | −0.317 (0.177) | 0.298 (0.154) | 0.136 (0.303) | 0.623 (0.303) | −0.840 (0.452) | 19.444/4.22 |
| 8 | D8Mit147–D8Mit271 | 1.770 (0.188) | 0.696 (0.187) | −0.788 (0.257) | 0.376 (0.169) | 0.110 (0.312) | 0.354 (0.312) | −0.676 (0.443) | 16.276/3.53 |
| 10 | D10Mit22 | 0.964 (0.147) | 0.461 (0.145) | −0.632 (0.206) | 1.057 (0.229) | −0.610 (0.191) | 0.445 (0.191) | −0.125 (0.263) | 16.950/3.68 |
| 10 | D10Mit12 | 0.913 (0.139) | 0.424 (0.137) | −0.691 (0.204) | 1.094 (0.230) | −0.833 (0.190) | 0.248 (0.190) | 0.084 (0.264) | 20.383/4.43 |
| 10 | D10Mit102 | 0.681 (0.140) | 0.412 (0.140) | −0.467 (0.216) | 1.510 (0.274) | −1.443 (0.215) | −0.088 (0.215) | 0.496 (0.290) | 18.202/3.95 |
| 11 | D11Mit152–D11Mit4 | 0.630 (0.144) | −0.191 (0.143) | −0.122 (0.216) | 1.559 (0.271) | −1.159 (0.198) | 0.190 (0.198) | 0.219 (0.279) | 8.944/1.942* |
| 13 | D13Mit64–D13Mit11 | 0.692 (0.145) | −0.502 (0.149) | −0.296 (0.208) | 1.386 (0.259) | −1.165 (0.215) | −0.478 (0.215) | 0.128 (0.286) | 12.316/2.67 |
| 15 | D15Mit174–D15Mit184 | 0.728 (0.142) | −0.340 (0.142) | −0.562 (0.210) | 1.437 (0.264) | −1.571 (0.240) | −0.211 (0.240) | 0.613 (0.302) | 13.587/2.95 |
| 16 | D16Mit57 | 0.919 (0.148) | −0.105 (0.146) | −0.706 (0.217) | 1.230 (0.239) | −0.816 (0.190) | 0.517 (0.190) | 0.027 (0.269) | 17.754/3.86 |
| 16 | D16Mit42–D16Mit139 | 0.980 (0.208) | 0.216 (0.147) | −0.905 (0.146) | 1.323 (0.214) | −0.945 (0.208) | 0.917 (0.208) | 0.339 (0.300) | 19.884/4.32 |
| 17 | D17Mit171–D17Mit177 | 0.849 (0.138) | 0.021 (0.138) | −1.027 (0.216) | 1.591 (0.280) | −1.712 (0.238) | 0.057 (0.238) | 0.381 (0.319) | 12.146/2.64 |
| 17 | D17Mit177–D17Mit150 | 0.560 (0.138) | −0.315 (0.138) | −0.013 (0.213) | 1.424 (0.256) | −2.403 (0.416) | −1.118 (0.416) | 1.959 (0.450) | 13.051/2.83 |

Note: Significance is at the 5% level through 200 permutation tests.
* Significant chromosome-wide with the LR test.

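
The LR/LOD pairs in Table 2 are consistent with the standard conversion $\mathrm{LOD} = LR / (2\ln 10)$. A small sketch of that conversion follows; the function name `lr_to_lod` is hypothetical, and the conversion is inferred from the tabulated values rather than stated in the text.

```python
import math

def lr_to_lod(lr):
    """Convert a likelihood ratio statistic (natural-log scale) to a LOD score (base-10 scale)."""
    return lr / (2.0 * math.log(10.0))

# LR values taken from Table 2; the rounded LOD scores match the tabulated ones.
for lr in (13.151, 20.383, 8.944):
    print(lr, round(lr_to_lod(lr), 3))
```

On this scale, the LOD3 threshold used in the simulation corresponds to an LR of about 13.8.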
Table 3
The comparison of the QTLs detected by the ZIGP and two-part ZINB models.

| Ch | Marker interval | ZIGP (1) | ZIGP (2) | Two-part ZINB (1) | Two-part ZINB (2) |
|---|---|---|---|---|---|
| 1 | D1Mit296 | − | − | + | + |
| 3 | D3Mit137–D3Mit104 | − | − | + | + |
| 4 | D4Mit111–D4Mit152 | − | − | + | + |
| 4 | D4Mit152–D4Mit31 | − | − | + | + |
| 4 | D4Mit204 | + | − | + | + |
| 6 | D6Mit31–D6Mit105 | − | − | + | + |
| 8 | D8Mit292–D8Mit147 | − | − | + | + |
| 8 | D8Mit147–D8Mit271 | + | − | + | + |
| 10 | D10Mit148–Mit22 | + | − | − | − |
| 10 | D10Mit22 | − | − | + | + |
| 10 | D10Mit66–Mit12 | + | − | − | − |
| 10 | D10Mit12 | − | − | + | + |
| 10 | D10Mit102 | + | + | + | + |
| 11 | D11Mit152–D11Mit4 | − | − | + | − |
| 13 | D13Mit64–D13Mit11 | − | − | + | + |
| 15 | D15Mit174–D15Mit184 | + | − | + | + |
| 16 | D16Mit122 | + | − | − | − |
| 16 | D16Mit57 | − | − | + | + |
| 16 | D16Mit42–D16Mit139 | − | − | + | + |
| 17 | D17Mit171–D17Mit177 | − | − | + | + |
| 17 | D17Mit177–D17Mit150 | − | − | + | + |
| 19 | D19Mit32–Mit40 | + | − | − | − |

(1): chromosome-wide; (2): genome-wide. The plus (+) and minus (−) signs show significance and non-significance, respectively, with the LR test at the 5% level.


Potentially, these positions could affect the excess zeros or the cholesterol gallstone count trait. It is noteworthy that the identified QTLs may confer resistance against cholesterol gallstone formation; however, I found no evidence to support this idea. In addition, Table 3 shows that two QTLs on chromosome 10 and QTLs on chromosomes 16 and 19 are detected chromosome-wide by the ZIGP regression model but are not flagged by the two-part ZINB regression model. As Fig. 1(b) shows, the two-part ZINB regression model has the smaller AIC at most locations, and the ZIGP regression model has the smaller AIC at the others, in QTL mapping of these data. Joe and Zhu (2005) have shown that, in many situations, fitting the NB and GP distributions gives different results, including for their zero-inflated distributions, means and variances. In addition, Nikoloulopoulos and Karlis (2008) have stated that, for small means and over-dispersed data, modeling with the two distributions gives much the same results, while for larger means the GP distribution has heavier tails than the NB. Fig. 2 shows that the 5% simulated chromosome-wide and genome-wide thresholds are very close, whereas the difference was quite large in the study of Cui and Yang (2009), in which the ZIGP model was used.

QTL mapping with a two-component ZIGP mixture (zero-inflated and generalized Poisson parts), and comparison of its results with those of the current study, is an important area of future research. The dataset of Wittenburg et al. (2003) used in this study contains two large gallstone counts, 20 and 25. In this context, to deal with outlying trait counts, the robust expectation-solution approach (Hall and Shen, 2010) and the minimum Hellinger distance (MHD) approach for finite mixtures of Poisson regression models (Lu et al., 2003) are suggested areas for future studies.

Asymptotic standard errors of the parameters are obtained as the square roots of the diagonal elements of the inverse of the information matrix.

Acknowledgements

The author would like to thank the Jackson laboratory for providing the mouse data set (available at: http://www.qtlarchive.org) and an anonymous reviewer for their helpful and constructive comments

$$P\left(y_i \mid \lambda_{i/j}, \phi_{i/j}, r\right) = \left(1-\phi_{i/j}\right)\frac{\Gamma(y_i + r)}{\Gamma(r)\,\Gamma(y_i + 1)}\left(\frac{r}{\lambda_{i/j} + r}\right)^{r}\left(1 - \frac{r}{\lambda_{i/j} + r}\right)^{y_i}, \qquad y_i = 1, 2, 3, \ldots$$

Then, considering:

$$\ell = \sum_{i=1}^{n}\sum_{j=0}^{2} c_{i/j}\, I(y_i = 0)\log\left[\phi_{i/j} + \left(1-\phi_{i/j}\right)t_{i/j}^{\,r}\right] + \sum_{i=1}^{n}\sum_{j=0}^{2} c_{i/j}\, I(y_i > 0)\left[\log\left(1-\phi_{i/j}\right) + \log\frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\,\Gamma(r)} + r\log t_{i/j} + y_i\log\left(1 - t_{i/j}\right)\right] + \sum_{i=1}^{n}\sum_{j=0}^{2} c_{i/j}\log \pi_{ij}$$

Assume $\xi_{i/j} = \log\left[\phi_{i/j}/\left(1-\phi_{i/j}\right)\right]$, where $0 < \phi_{i/j} < 1$ is the probability of an extra zero response. Then

$$\ell = \sum_{i=1}^{n}\sum_{j=0}^{2} c_{i/j}\left[-\log\left(1 + \exp\left(\xi_{i/j}\right)\right) + I(y_i = 0)\log\left(\exp\left(\xi_{i/j}\right) + t_{i/j}^{\,r}\right)\right] + \sum_{i=1}^{n}\sum_{j=0}^{2} c_{i/j}\, I(y_i > 0)\left[\log\frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\,\Gamma(r)} + r\log t_{i/j} + y_i\log\left(1 - t_{i/j}\right)\right] + \sum_{i=1}^{n}\sum_{j=0}^{2} c_{i/j}\log \pi_{ij}$$

Since

$$f\left(c_{i/j} \mid y_i\right) = \frac{f\left(y_i, c_{i/j}\right)}{f(y_i)} = \frac{f\left(y_i \mid c_{i/j}\right) f\left(c_{i/j}\right)}{\sum_{s=0}^{2}\pi_{is}\, p_s\left(y_i \mid \lambda_{s/j}, \xi\right)}$$

in the E-step we calculate $\Pi_{i/j}$ at the tth iteration as:
37 an anonymous reviewer for their helpful and constructive comments Then in the E-step we calculate Π i=j at the tth iteration as: 103
38 that greatly contributed to improving the quality of the publication.  104
  π ij pj yi =λi=j ; ξi=j
39 Q5 This study was supported by Hamadan University of Medical Sciences. ðtÞ ci=j 105
Π ¼E ; π; λi=j ; ξ ¼ P 
40 yi 2 106
s ¼ 0 π ij ps yi =λs=j ; ξi=j
i=j
41 107
42 Appendix A 2 ðtÞ h
108
n X
X       i
43 ℓ¼ ∏  log 1 þ exp ξi=j þ I yi ¼ 0 log exp ξi=j þ t ri=j 109
44 Consider ci ¼ 0; 1 and 2 are for QTL genotype correspondence i ¼ 1j ¼ 1 i=j 110
"   #
45 qq, Qq and QQ. The EM algorithm for parameter estimation is X
n X
2 ðtÞ   Γ yi þ r     111
implemented as follows: þ ∏I yi 4 0 log   þ r log t i=j þ yi log 1  t i=j
46 Γ yi þ1 ΓðrÞ 112
1 ¼ 1j ¼ 1 i=j
47 2   113
f ðc i Þ ¼ Π
c
π iji=j where π ij ¼ p ci=j ¼ j X
n X
2 ðtÞ
48 j¼0 þ ∏ log π ij 114
49 " !#ci=j i ¼ 1j ¼ 1 i=j 115
 
50 y 2 yi 116
f i ¼ Π p ; ϕi=j ; r
51 ci j¼0 λi=j Then let zi denotes an unobserved binary variable indicating 117
52 ! 118
n   n yi whether yi to come from the latent class zero ðzi ¼ 1Þ or negative
53 f ðy; cÞ ¼ ∏ f yi ; ci ¼ ∏ f ; ϕi=j ; r f ðci Þ binomial part ðzi ¼ 0 Þ. Hence when zi ¼ 1, then t i ¼ 0 and when 119
54 i¼1 i¼1 λi=j 120
8 2 zi ¼ 0, then exp ξi=j ¼ 0. So, the complete data log-likelihood ℓ
55 ! 3ci=j 121
n < 2 y can be rewritten as:
56 ¼ ∏ ∏ 4pj i
; ϕi=j ; r π ij 5 g 122
2 ðtÞ    X 2 ðtÞ 
i ¼ 1:j ¼ 0 λi=j X
n X n X
57 123
!! ℓ¼ ∏  log 1 þ exp ξi=j þ ∏ zi ξi=j þ ð1  zi Þt r=ji
58 124
X n X 2
yi X n X 2   i ¼ 1j ¼ 1 i=j i ¼ 1j ¼ 1 i=j
59 ℓ¼ ci=j log pj ; ϕi=j ; r þ ci=j log π ij "   # 125
60 i ¼ 1j ¼ 1
λ i=j i ¼ 1j ¼ 1 X
n X
2 ðtÞ
Γ yi þ r     126
þ ∏ð1  zi Þ log   þ r log t i=j þ log 1  t i=j
61
i ¼ 1j ¼ 1 i=j
Γ yi þ 1 Γ ðr Þ 127
62 128
The ZINB distribution of count traits can be written as:
63 X
n X
2 ðtÞ 129
   r þ ∏ log π ij
64 p yi ¼ λ0i=j ; ϕi=j ; r ¼ ϕi=j þ 1  ϕi=j λi=jrþ r 130
i ¼ 1j ¼ 1 i=j
65 131
66 132
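As an illustration of the ZINB probabilities and the E-step weights given in this appendix, the following is a minimal Python sketch, not the author's implementation. The function names `zinb_pmf` and `e_step_weights` and all numeric values are hypothetical; the sketch assumes a positive NB mean $\lambda_{i/j}$ and takes $t_{i/j} = r/(\lambda_{i/j} + r)$.

```python
import math


def zinb_pmf(y, lam, phi, r):
    """ZINB probability of count y.

    lam: NB mean (assumed > 0), phi: extra-zero probability, r: NB scale parameter.
    """
    t = r / (lam + r)
    # Log of the NB probability: Gamma(y+r)/(Gamma(r) Gamma(y+1)) * t^r * (1-t)^y
    log_nb = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
              + r * math.log(t) + y * math.log1p(-t))
    nb = math.exp(log_nb)
    if y == 0:
        # A zero can come from the extra-zero class or from the NB part
        return phi + (1.0 - phi) * nb
    return (1.0 - phi) * nb


def e_step_weights(y, pi, lam, phi, r):
    """Posterior genotype probabilities for one individual.

    pi, lam, phi are length-3 sequences indexed by genotype j = 0, 1, 2
    (qq, Qq, QQ); the returned list plays the role of Pi_{i/j}^{(t)}.
    """
    joint = [pi[j] * zinb_pmf(y, lam[j], phi[j], r) for j in range(3)]
    total = sum(joint)
    return [w / total for w in joint]


if __name__ == "__main__":
    # Hypothetical values for one individual with an observed count of 3
    weights = e_step_weights(3, pi=[0.25, 0.5, 0.25],
                             lam=[0.5, 1.5, 4.0], phi=[0.6, 0.4, 0.2], r=1.5)
    print(weights)
```

In a full EM run these weights would be recomputed for every individual and putative QTL position at each iteration, using the current parameter estimates, before the M-step is carried out.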



Assume $\lambda_{i/j} = \exp\left(\eta_{i/j}\right)$ in $t_{i/j}$; then $\ell = \ell_{\xi} + \ell_{\eta}$ with

$$\ell_{\xi} = \sum_{i=1}^{n} \sum_{j=0}^{2} \Pi_{i/j}^{(t)} \left[ z_i\, \xi_{i/j} - \log\left(1 + \exp \xi_{i/j}\right) \right] + \sum_{i=1}^{n} \sum_{j=0}^{2} \Pi_{i/j}^{(t)}\, z_i \log \pi_{ij}$$

$$\ell_{\eta} = \sum_{i=1}^{n} \sum_{j=0}^{2} \Pi_{i/j}^{(t)} (1 - z_i) \left[ \log \frac{\Gamma(y_i + r)}{\Gamma(y_i + 1)\,\Gamma(r)} + r \log t_{i/j} + y_i \log\left(1 - t_{i/j}\right) \right] + \sum_{i=1}^{n} \sum_{j=0}^{2} \Pi_{i/j}^{(t)} (1 - z_i) \log \pi_{ij}$$

Therefore, parameter estimation can be performed by maximizing these two functions separately.

The M-step of the EM algorithm can be performed by the following two sets of Newton–Raphson equations for estimating $\boldsymbol{\beta} = (\mu, a, d)'$ and $\mathbf{g} = (g_0, g_1, g_2)'$:

$$\hat{\mathbf{g}} = \mathbf{g}_0 + \Im_{g}^{-1} \left[ \frac{\partial \ell_{\xi}}{\partial \mathbf{g}} \right], \qquad \hat{\boldsymbol{\beta}} = \boldsymbol{\beta}_0 + \Im_{\beta}^{-1} \left[ \frac{\partial \ell_{\eta}}{\partial \boldsymbol{\beta}} \right],$$

where $\mathbf{g}_0$ and $\boldsymbol{\beta}_0$ are the initial values of the parameters and are replaced by their updated estimates in each iteration. $\Im_{\beta}^{-1}$ and $\Im_{g}^{-1}$ are the inverses of the information matrices of the negative binomial and logistic parts, respectively. The scale parameter ($r$) in the NB part and its asymptotic standard error are computed in the manner of Yau et al. (2003).

References

Cui, Y., Kim, D.Y., Zhu, J., 2006. On the generalized Poisson regression mixture model for mapping quantitative trait loci with count data. Genetics 174, 2159–2172. http://dx.doi.org/10.1534/genetics.106.061960.
Cui, Y., Yang, W., 2009. Zero-inflated generalized Poisson regression mixture model for mapping quantitative trait loci underlying count trait with many zeros. J. Theor. Biol. 256, 276–285. http://dx.doi.org/10.1016/j.jtbi.2008.10.003, Epub 2008 Oct 15.
Che, X., Xu, S., 2012. Generalized linear mixed models for mapping multiple quantitative trait loci. Heredity 109 (1), 41–49. http://dx.doi.org/10.1038/hdy.2012.10, Epub 2012 Mar 14.
Haley, C.S., Knott, S.A., 1992. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69 (4), 315–324.
Haley, C.S., Knott, S.A., Elsen, J.M., 1994. Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics 136 (3), 1195–1207.
Hall, D.B., Shen, J., 2010. Robust estimation for zero-inflated Poisson regression. Scand. J. Stat. 37 (2), 237–252. http://dx.doi.org/10.1111/j.1467-9469.2009.00657.x.
Jansen, R.C., 1993. Interval mapping of multiple quantitative trait loci. Genetics 135, 205–211.
Joe, H., Zhu, R., 2005. Generalized Poisson distribution: the property of mixture of Poisson and comparison with negative binomial distribution. Biom. J. 47 (2), 219–229.
Kao, C.H., Zeng, Z.B., Teasdale, R.D., 1999. Multiple interval mapping for quantitative trait loci. Genetics 152, 1203–1216.
Lange, C., Whittaker, J.C., 2001. Mapping quantitative trait loci using generalized estimating equations. Genetics 159, 1325–1337.
Lawless, J.F., 1987. Negative binomial and mixed Poisson regression. Can. J. Stat. 15 (3), 209–225.
Lander, E.S., Botstein, D., 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121 (1), 185–199.
Lu, Z., Hui, Y.V., Lee, A.H., 2003. Minimum Hellinger distance estimation for finite mixtures of Poisson regression models and its applications. Biometrics 59 (4), 1016–1026.
Lynch, M., Walsh, B., 1998. Genetics and Analysis of Quantitative Traits. Sinauer Associates Inc, Sunderland.
Lyons, M.A., Wittenburg, H., Li, R., Walsh, K.A., Leonard, M.R., Korstanje, R., Churchill, G.A., Carey, M.C., Paigen, B., 2003. Lith6: a new QTL for cholesterol gallstones from an intercross of CAST/Ei and DBA/2J inbred mouse strains. J. Lipid Res. 44, 1763–1771.
Mackay, T.F.C., 2001. Quantitative trait loci in Drosophila. Nat. Rev. Genet. 2, 11–20.
Moghimbeigi, A., 2011. A score test for extra zeros in negative binomial mixed models. J. Stat. Comput. Simul. 81 (5), 635–644. http://dx.doi.org/10.1080/00949650903451777.
Moghimbeigi, A., Eshraghian, M.R., Mohammad, K., McArdle, B., 2008. Multilevel zero-inflated negative binomial regression modeling for over-dispersed count data with extra zeros. J. Appl. Stat. 35 (10), 1193–1202. http://dx.doi.org/10.1080/02664760802273203.
Moghimbeigi, A., Eshraghian, M.R., Mohammad, K., McArdle, A., 2009. A score test for zero-inflation in multilevel count data. Comput. Stat. Data Anal. 53 (4), 1239–1248. http://dx.doi.org/10.1016/j.csda.2008.10.041.
Nikoloulopoulos, A.K., Karlis, D., 2008. On modeling count data: a comparison of some well-known discrete distributions. J. Stat. Comput. Simul. 78 (3), 437–457.
Piepho, H.P., Gauch, H.G., 2001. Marker pair selection for mapping quantitative trait loci. Genetics 157, 433–444.
Rebaï, A., 1997. Comparison of methods for regression interval mapping in QTL analysis with non-normal traits. Genet. Res. 69 (1), 69–74.
Ridout, M., Hinde, J., Demétrio, C.G.B., 2001. A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics 57 (1), 219–223.
Shepel, L.A., Lan, H., Haag, J.D., Brasic, J.M., Gheen, M.E., Simon, J.S., Hoff, P., Newton, M.A., Gould, M.N., 1998. Genetic identification of multiple loci that control breast cancer susceptibility in the rat. Genetics 149 (1), 289–299.
Silva, F.F., Tunin, K.P., Rosa, G.J.M., da Silva, M.V.B., Azevedo, A.L.S., Verneque, R., da, S., Machado, M.A., Packer, I.U., 2011. Zero-inflated Poisson regression models for QTL mapping applied to tick resistance in a Gyr × Holstein F2 population. Genet. Mol. Biol. 34 (4), 575–581.
Thomson, P.C., 2003. A generalized estimating equations approach to quantitative trait locus detection of non-normal traits. Genet. Sel. Evol. 35, 257–280.
Van den Broek, J., 1995. A score test for zero-inflation in a Poisson distribution. Biometrics 51 (2), 738–743.
Wang, K., Yau, K.K.W., Lee, A.H., 2002. A zero-inflated Poisson mixed model to analyze diagnosis related groups with majority of same-day hospital stays. Comput. Methods Program. Biomed. 68 (3), 195–203.
Wittenburg, H., Lyons, M.A., Li, R., Churchill, G.A., Carey, M.C., Paigen, B., 2003. FXR and ABCG5/ABCG8 as determinants of cholesterol gallstone formation from quantitative trait locus mapping in mice. Gastroenterology 125 (3), 868–881. http://dx.doi.org/10.1016/S0016-5085(03)01053-9.
Xiang, L., Lee, A.H., Yau, K.K.W., McLachlan, G.J., 2006. A score test for zero-inflation in correlated count data. Stat. Med. 25, 1660–1671. http://dx.doi.org/10.1002/sim.2308.
Xiang, L., Lee, A.H., Yau, K.K.W., McLachlan, G.J., 2007. A score test for over-dispersion in zero-inflated Poisson mixed regression model. Stat. Med. 26 (7), 1608–1622. http://dx.doi.org/10.1002/sim.2616.
Xu, S., Hu, Zh., 2010. Generalized linear model for interval mapping of quantitative trait loci. Theor. Appl. Genet. 121 (1), 47–63. http://dx.doi.org/10.1007/s00122-010-1290-0.
Xu, S., Yonash, N., Vallejo, R.L., Cheng, H.H., 1998. Mapping quantitative trait loci for binary traits using a heterogeneous residual variance model: an application to Marek's disease susceptibility in chickens. Genetics 104 (2), 171–178.
Yau, K.K.W., Wang, K., Lee, A.H., 2003. Zero-inflated negative binomial mixed regression modeling of over-dispersed count data with extra zeros. Biom. J. 45 (4), 437–452. http://dx.doi.org/10.1002/bimj.200390024.
Zeng, Z.B., 1993. Theoretical basis for separation of multiple linked gene effects in mapping quantitative trait loci. Proc. Natl. Acad. Sci. U.S.A. 90 (1), 10972–10976.
Zeng, Z.B., 1994. Precision mapping of quantitative trait loci. Genetics 136, 1457–1468.
