
Multivariate Meta-analysis

Pantelis G. Bagos, PhD


Assistant Professor

Niki L. Dimou, MSc


PhD Candidate

Department of Computer Science and Biomedical Informatics
University of Central Greece, Lamia, Greece
Lamia 2012

Meta-analysis

Combining the estimates of several studies
The methodology dates back to Fisher
The term appeared for the first time in Psychology (Glass, 1976)
In its simplest form, it is a weighted average of the estimates
Improves the statistical power to detect weak effects

Glass GV. Primary, secondary, and meta-analysis of research. Educational Researcher, 1976; 5: 3-8

Nikolopoulos G, Tsantes A, Bagos PG, Travlou A, Vaiopoulos G. Integrin, alpha 2 gene C807T Polymorphism and Risk of Ischemic Stroke: a Meta-Analysis. 2007, Thrombosis Research; 119 (4): 501-510

Tsantes A, Nikolopoulos G, Bagos PG, Rapti E, Mantzios G, Kapsimali V, Travlou A. Association between the Plasminogen Activator Inhibitor-1 4G/5G Polymorphism and Venous Thrombosis: a Meta-Analysis. 2007, Thrombosis and Haemostasis; 97(6):907-13

Statistical models
Fixed effects models

Random effects models
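The formulas for these two models did not survive on the slide. As a minimal sketch, assuming the usual inverse-variance fixed-effects estimate and the DerSimonian and Laird moment estimator for the random-effects model, the calculations could look like this (function names and the example numbers are illustrative only):

```python
def fixed_effect(y, v):
    """Inverse-variance weighted average of estimates y with variances v."""
    w = [1.0 / vi for vi in v]
    est = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return est, 1.0 / sum(w)


def dersimonian_laird(y, v):
    """Random-effects estimate using the DerSimonian-Laird moment estimator of tau^2."""
    w = [1.0 / vi for vi in v]
    mu_fe, _ = fixed_effect(y, v)
    # Cochran's Q statistic measures the dispersion around the fixed-effects estimate
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    w_re = [1.0 / (vi + tau2) for vi in v]
    est = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return est, 1.0 / sum(w_re), tau2


# Hypothetical log-odds ratios and variances from three studies
estimates = [0.0, 0.5, 1.0]
variances = [0.04, 0.04, 0.04]
pooled, pooled_var, tau2 = dersimonian_laird(estimates, variances)
```

With a between-study variance of zero the two functions return the same pooled estimate, which is why the fixed-effects model is often described as a special case of the random-effects one.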

Multivariate meta-analysis

In many situations we need to model simultaneously two or more effect sizes from each study.
This may be due to their complementarity (e.g. sensitivity and specificity).
There may be multiple treatments, multiple outcomes, or multiple risk factors.
The joint analysis usually increases the power by borrowing strength from external studies.
The joint analysis takes into account the correlation of the estimates and thus allows the estimates to be compared after fitting the model.

Higgins JP, Whitehead A (1996) Borrowing strength from external trials in a meta-analysis. Stat Med 15(24): 2733-2749

The model

van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med. 2002;21(4):589-624
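The equations of this slide were not preserved. For orientation, the standard bivariate random-effects model described in the cited literature can be written as follows; the symbols (s for within-study standard errors, ρ_W for the within-study correlation, τ and ρ_B for the between-study components) are notation assumed here and may differ from the original slide.

```latex
\begin{aligned}
\begin{pmatrix} y_{i1} \\ y_{i2} \end{pmatrix} \Big|\,
\begin{pmatrix} \theta_{i1} \\ \theta_{i2} \end{pmatrix}
&\sim N\!\left(
\begin{pmatrix} \theta_{i1} \\ \theta_{i2} \end{pmatrix},\;
\mathbf{S}_i =
\begin{pmatrix}
s_{i1}^2 & \rho_{W i}\, s_{i1} s_{i2} \\
\rho_{W i}\, s_{i1} s_{i2} & s_{i2}^2
\end{pmatrix}
\right), \\[4pt]
\begin{pmatrix} \theta_{i1} \\ \theta_{i2} \end{pmatrix}
&\sim N\!\left(
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\;
\boldsymbol{\Sigma} =
\begin{pmatrix}
\tau_1^2 & \rho_B\, \tau_1 \tau_2 \\
\rho_B\, \tau_1 \tau_2 & \tau_2^2
\end{pmatrix}
\right), \\[4pt]
\text{so that marginally}\quad
\begin{pmatrix} y_{i1} \\ y_{i2} \end{pmatrix}
&\sim N\!\left(
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},\;
\mathbf{S}_i + \boldsymbol{\Sigma}
\right).
\end{aligned}
```

The within-study matrix S_i is treated as known for each study, while the means and the between-study matrix Σ are estimated.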

Estimation

Based on the multivariate normal distribution

ML, REML and method of moments (non-iterative)
Studies reporting a subset of the outcomes are treated as missing at random (MAR)

Berkey CS, Hoaglin DC, Antczak-Bouckoms A, Mosteller F, Colditz GA (1998) Meta-analysis of multiple outcomes by regression with random effects. Stat Med 17(22): 2537-2550
Jackson D, Riley R, White IR. Multivariate meta-analysis: Potential and promise. Stat Med. 2011 Jan 26.
Jackson D, White IR, Thompson SG. Extending DerSimonian and Laird's methodology to perform multivariate random effects meta-analyses. Stat Med 2010, 29:1282-1297

What about within-studies correlation?



Just like the variance, the covariance is assumed known and needs to be available.
In the majority of situations, however, unlike the variance, the covariance is not reported in publications and one needs access to individual data in order to calculate it.
Ignoring or approximating the correlation results in biased overall estimates of variance.

Riley RD, Abrams KR, Sutton AJ, Lambert PC, Thompson JR. Bivariate random-effects meta-analysis and the estimation of between-study correlation. BMC Med Res Methodol. 2007; 7:3.

An alternative model

It is different from all the above-mentioned models, in that it uses an estimate for the overall correlation (ρs). That is, rather than partitioning the overall correlation into within-study and between-study components, it uses a single parameter, ρs, to model directly the overall correlation. The additional variation beyond sampling error is indicated by ψ1², ψ2². However, these are not directly equivalent to τ1², τ2², the between-study variances in the general model, although in some circumstances they may be similar. The model is not hierarchical and it can also include studies that provide only one of the 2 endpoints under a missing at random assumption.
Riley RD, Thompson JR, Abrams KR (2008) An alternative model for bivariate random-effects meta-analysis when the within-study correlations are unknown. Biostatistics 9(1): 172-186
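To make the structure of the alternative model concrete, the sketch below builds the marginal covariance matrix of one study under that model, assuming the form given by Riley et al. (2008): each diagonal entry is the within-study variance plus the additional variation, and the off-diagonal entry uses the single overall correlation. The function and argument names are illustrative.

```python
import math


def riley_marginal_cov(s1_sq, s2_sq, psi1_sq, psi2_sq, rho_s):
    """Marginal 2x2 covariance of (y1, y2) for one study under the alternative model.

    s1_sq, s2_sq     : known within-study variances of the two estimates
    psi1_sq, psi2_sq : additional variation beyond sampling error
    rho_s            : overall correlation (within and between study combined)
    """
    v1 = s1_sq + psi1_sq
    v2 = s2_sq + psi2_sq
    cov = rho_s * math.sqrt(v1 * v2)
    return [[v1, cov], [cov, v2]]


# Hypothetical values for a single study
phi = riley_marginal_cov(0.04, 0.09, 0.05, 0.07, 0.5)
```

Note that no within-study correlation enters the calculation, which is what allows the model to be fitted when those correlations are unreported.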

The general case



Let Y, X1, and X2 denote three categorical random variables with two levels (i.e. 0,1), used to classify n individuals. Usually, in a retrospective case-control study Y denotes the case-control status and X1, X2 denote the two different exposures. In clinical trials (or in prospective observational studies), Y will denote the disease/non-disease outcome whereas X1, X2 may denote two different treatments (or exposures). Alternatively, Y may denote the single treatment whereas X1, X2 will denote the two alternative outcomes.

Bagos PG. On the covariance of two correlated log-Odds Ratios. Statistics in Medicine. 2012

The data

The data (cont.)

The odds ratios

The odds ratios

The main result

The main purpose of this work is to derive an estimate for the covariance and express it using solely the observed counts of the contingency tables (Table 1 and Table 2). I will show that the estimate of the covariance is given by:

It is interesting to note at this point that the covariance depends only on the observed counts nijk, nij+ and ni+k. If nij+ or ni+k becomes zero, a simple correction can be employed by adding a small constant c to the cell counts. Clearly, calculating the covariance requires knowledge of the full distribution of the counts in Table 1 (i.e. nijk). However, if we recall that nij+, ni+k and n+jk are the minimal sufficient statistics for obtaining the maximum likelihood (ML) estimates of the 2x2x2 contingency table in the case of no three-factor interaction, we realize that the covariance can theoretically be calculated in certain cases even when the nijk are not directly observed, provided that we assume no three-way interaction.

Bagos PG. On the covariance of two correlated log-Odds Ratios. Statistics in Medicine.
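The formula of the main result was not preserved in this copy of the slides. As a hedged illustration only, a first-order delta-method argument, assuming multinomial sampling within each level of Y, gives a covariance that is a signed sum of nijk / (nij+ ni+k) over the eight cells, with a plus sign when j = k and a minus sign otherwise. The sketch below implements that form; the published expression may be written differently, and the function name and example counts are hypothetical.

```python
def log_or_covariance(n):
    """Approximate covariance of two log-odds ratios computed on the same subjects.

    n[i][j][k] is the count with Y = i, X1 = j, X2 = k (each index 0 or 1).
    Assumes multinomial sampling within each level of Y (delta-method sketch).
    """
    cov = 0.0
    for i in (0, 1):
        row = [n[i][j][0] + n[i][j][1] for j in (0, 1)]  # n_ij+ : margins for X1
        col = [n[i][0][k] + n[i][1][k] for k in (0, 1)]  # n_i+k : margins for X2
        for j in (0, 1):
            for k in (0, 1):
                sign = 1.0 if j == k else -1.0
                cov += sign * n[i][j][k] / (row[j] * col[k])
    return cov


# Hypothetical 2x2x2 table: the exposures are independent among Y = 0,
# and positively associated among Y = 1
table = [
    [[5, 5], [15, 15]],
    [[8, 2], [2, 8]],
]
covariance = log_or_covariance(table)
```

When the two exposures are independent within both levels of Y, every term cancels and the covariance is zero, which is a useful sanity check on the sign convention.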

Special cases already known in the literature


Dose-response models
Genetic association studies
Observational studies that share the same group of controls
Clinical trials with multiple treatments that share a common placebo group
Mutually exclusive outcomes

Dose-response models and Genetic association studies


In such cases where we have two (or more) categories of exposure that are compared against the same baseline group (no exposure), we will have n1+0=n10+ and n00+= n0+0, and thus:

Berrington A, Cox DR. Generalized least squares for the synthesis of correlated information. Biostatistics 2003, 4(3):423-431.
Greenland S, Longnecker MP. Methods for trend estimation from summarized dose-response data, with applications to meta-analysis. Am J Epidemiol 1992, 135(11):1301-1309
Bagos PG. A unification of multivariate methods for meta-analysis of genetic association studies. Stat Appl Genet Mol Biol 2008, 7:Article31
Bagos PG. Meta-analysis of haplotype-association studies: comparison of methods and empirical evaluation of the literature. BMC Genet 2011, 12:8

Observational studies that share the same group of controls/ clinical trials with multiple treatments (common placebo group)

The latter has become very important recently, since it finds applications in the so-called multiple treatment comparison or network meta-analysis. In such a case we will have n01+ =n0+1 and n00+ =n0+0 and thus:

Gleser LJ, Olkin I. Stochastically dependent effect sizes. In: The Handbook of Research Synthesis. Edited by Cooper HM, Hedges LV. New York: Russell Sage Foundation; 1994: 339-355
Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med 2004, 23(20):3105-3124.

Mutually exclusive outcomes

In this particular situation (i.e. when we have death from cancer, death from other cause and no death at all), the odds-ratios are calculated against all other alternatives and not only against the alive category. Thus, using the notation of Table 2, we will have Y denoting the treatment and X1 and X2 denoting the mutually exclusive outcomes and it is easily understood that the two log-odds ratios will be negatively correlated. To reconstruct this scenario using the notation followed here, we have to resort to Table 1 and observe that n111=n011=0 by design (a person cannot die from both causes) and that the remaining counts are disjoint:

Trikalinos TA, Olkin I. A method for the meta-analysis of mutually exclusive binary outcomes. Stat Med 2008, 27(21):4279-4300

New applications
Combining matched and unmatched case-control studies
Moreno V, Martin ML, Bosch FX, de Sanjose S, Torres F, Munoz N. Combined analysis of matched and unmatched case-control studies: comparison of risk estimates from different studies. Am J Epidemiol 1996, 143 (3):293-300.

Combining population-based and family-based genetic association studies
Pfeiffer RM, Pee D, Landi MT. On combining family and case-control studies. Genet Epidemiol 2008, 32(7):638-646

Multipoint meta-analysis of genetic association studies
Meta-analysis of diagnostic tests


For all the above, only methods that use individual data were available

More information

Bagos PG. On the covariance of two correlated log-Odds Ratios. Statistics in Medicine. 2012
Bagos PG, Dimou NL, Liakopoulos TD, Nikolopoulos GK. Meta-Analysis of Family-Based and Case-Control Genetic Association Studies that Use the Same Cases. Statistical Applications in Genetics and Molecular Biology. 2011, 10(1):Article19
Bagos PG. Meta-analysis of haplotype-association studies: Comparison of methods and empirical evaluation of the literature, 2011, BMC Genetics, 12:8
Bagos PG, Liakopoulos TD. A multipoint method for meta-analysis of genetic association studies. 2010, Genetic Epidemiology, 34(7):702-15

http://www.compgen.org/publications/by-subject

Applications in Stata
Baseline risk
Diagnostic tests
Multiple outcomes with known within-studies correlation
Multiple outcomes with unknown within-studies correlation
Multiple treatments
Genetic association studies
Mendelian randomization

Requirements
A working version of Stata (www.stata.com)
GLLAMM for Stata (www.gllamm.org)
mvmeta v.2 for Stata (from within Stata type: net from http://www.mrc-bsu.cam.ac.uk/IW_Stata/)
The Stata do-files are available at: http://www.compgen.org/material/metaanalysis/multivariate

Baseline risk (1)


Example: BCG data

The meta-analysis concerns 13 trials on the efficacy of BCG vaccine against tuberculosis.

In each trial a vaccinated group is compared with a non-vaccinated control group. Some covariates are available that might explain the heterogeneity among studies: geographic latitude of the place where the study was done, year of publication, and method of treatment allocation (random, alternate or systematic). The main question behind the discussion on baseline risk is whether the baseline risk (risk in the non-vaccinated group) can be a source of heterogeneity. However, the log-odds ratio and the log-odds of the non-vaccinated group are correlated (regression to the mean).

Colditz GA, Brewer FB, Berkey CS, Wilson EM, Burdick E, Fineberg HV, Mosteller F. Efficacy of BCG vaccine in the prevention of tuberculosis. Journal of the American Medical Association 1994; 271:698-702.

Baseline risk (2)


The data look like this:
Where:
X_T: Vaccinated-Disease
n_T: Vaccinated-No disease
X_C: Not vaccinated-Disease
n_C: Not vaccinated-No disease
Lat: Latitude
alloc: Allocation (1: Random, 2: Alternate, 3: Systematic)

Baseline risk (3)

Figure 1. L'Abbé plot of observed log(odds) of the not-vaccinated trial arm versus the vaccinated trial arm. The size of the circle is an indication for the inverse of the variance of the log-odds ratio in that trial.

Baseline risk (4)


We use bcg.do and we perform a univariate random effects meta-analysis.

Baseline risk (5)


The results are identical using the mvmeta command:
. mvmeta b V, vars(b1 b2) ml

Baseline risk (6)

The conditional variance of the true log-odds in the vaccinated group (and therefore also of the log-odds ratio), given the true log-odds in the not-vaccinated group, is 1.4313709 − 1.7573268²/2.4073333 = 0.1485417, which is interpreted as the variance between treatment effects among trials with the same baseline risk.
The variance of the treatment effect, measured as the log-odds ratio and calculated from the estimated between-study covariance matrix, is (1.4313709 + 2.4073333 − 2*1.7573268) = 0.3240506.
So baseline risk, measured as the true log-odds in the not-vaccinated group, explains (0.3240506 − 0.1485417)/0.3240506 = 54 per cent of the heterogeneity in vaccination effect between the trials.
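The arithmetic on this slide can be checked with a few lines of code. The sketch below takes the three between-study (co)variance estimates quoted above and recomputes the conditional variance, the variance of the log-odds ratio and the proportion of heterogeneity explained; the function name is illustrative, and the assignment of 1.4313709 to the vaccinated arm is inferred from the quoted conditional variance.

```python
def heterogeneity_explained(var_treated, var_baseline, cov):
    """Share of the log-odds-ratio variance explained by baseline risk.

    var_treated  : between-study variance of the true log-odds, vaccinated arm
    var_baseline : between-study variance of the true log-odds, not-vaccinated arm
    cov          : between-study covariance of the two log-odds
    """
    # Variance of the treated log-odds among trials with the same baseline risk
    conditional_var = var_treated - cov ** 2 / var_baseline
    # Variance of the difference of the two log-odds, i.e. of the log-odds ratio
    var_log_or = var_treated + var_baseline - 2.0 * cov
    explained = (var_log_or - conditional_var) / var_log_or
    return conditional_var, var_log_or, explained


cond_var, var_lor, share = heterogeneity_explained(1.4313709, 2.4073333, 1.7573268)
```

The conditional variance is the usual bivariate normal result: the marginal variance minus the squared covariance divided by the variance of the conditioning variable.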

Baseline risk (7)


We can also use gllamm command assuming a bivariate normal distribution:
. gllamm logit grp1 grp2, nocons i(trial) nrf(2) eqs(grp1 grp2) s(wgt) constraint(1) from(A) long adapt

Baseline risk (8)


Or a binomial distribution:
. gllamm X grp1 grp2, nocons i(trial) nrf(2) eqs(grp1 grp2) family(binomial) denom(z) link(logit) adapt trace allc

Diagnostic tests (1)


Usually, in diagnostic tests there are two parameters of interest (sensitivity and specificity), which are modelled simultaneously.

Table 1. Classification of the findings of a diagnostic test according to the test result (positive or negative) and the disease status for study i (i=1,2,...,k).

The marginal model on which we base the inference is:

Diagnostic tests (2)


Example: RF for the diagnosis of Rheumatoid Arthritis

The method was applied in the data obtained from a meta-analysis that aimed to determine whether Rheumatoid Factor (RF) identifies patients with Rheumatoid Arthritis (RA). A total of 50 studies provided information concerning RF.

Nishimura K, Sugiyama D, Kogata Y, Tsuji G, Nakazawa T, Kawano S, et al. Meta-analysis: diagnostic accuracy of anti-cyclic citrullinated peptide antibody and rheumatoid factor for rheumatoid arthritis. Ann Intern Med. 2007 Jun 5;146(11):797-808.
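The contents of roc.do are not shown in the slides. As a sketch of how the inputs of a bivariate analysis are commonly prepared, assuming the usual logit transformation of sensitivity and specificity with their large-sample variances, each study's 2x2 table could be converted as follows. The 0.5 correction for zero cells is a conventional choice assumed here, and all names and counts are illustrative.

```python
import math


def logit_sens_spec(tp, fp, fn, tn):
    """Logit sensitivity and specificity with their approximate variances.

    tp, fn : diseased subjects with positive / negative test
    tn, fp : non-diseased subjects with negative / positive test
    """
    cells = [tp, fp, fn, tn]
    if any(c == 0 for c in cells):
        # Assumed continuity correction when a cell is empty
        tp, fp, fn, tn = (c + 0.5 for c in cells)
    b1 = math.log(tp / fn)            # logit(sensitivity)
    b2 = math.log(tn / fp)            # logit(specificity)
    v11 = 1.0 / tp + 1.0 / fn         # variance of logit(sensitivity)
    v22 = 1.0 / tn + 1.0 / fp         # variance of logit(specificity)
    return b1, b2, v11, v22


# Hypothetical study: 100 diseased and 100 non-diseased subjects
b1, b2, v11, v22 = logit_sens_spec(tp=80, fp=10, fn=20, tn=90)
```

Because sensitivity and specificity are estimated from different subjects within a study, their within-study covariance is zero, so any correlation in this setting arises between studies.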

Diagnostic tests (3)


Thus, after running roc.do we get the following output for RF:
. mvmeta b V, vars(b1 b2)

Diagnostic tests (4)


The construction of the ROC curves is also feasible for RF:

Diagnostic tests (5)


Or we can use the midas command:
. midas tp fp fn tn, res(sum)

Known within studies correlation (1)


Example: Periodontitis (PD-AL)

A recent meta-analysis by Antczak-Bouckoms et al. located 5 randomized controlled trials that compared a surgical procedure with a non-surgical procedure for the treatment of moderate periodontal disease. The two outcomes assessed on each patient were (pre- to post-treatment mm changes in) probing depth (PD) and attachment level (AL) which are modeled simultaneously in our example. The goal of treatment is to decrease probing depths and to increase attachment levels around the teeth.

Antczak-Bouckoms, A., Joshipura, K., Burdick, E. and Tulloch, J. F. C. Meta-analysis of surgical versus non-surgical method of treatment for periodontal disease, Journal of Clinical Periodontology, 20, 259-268 (1993).

Known within studies correlation (2)


The data look like this:

Where:
PD: Improvement in probing depth (surgical minus non-surgical values)
AL: Improvement in attachment level (surgical minus non-surgical values)
V11, V22, V12: The within-trial covariance matrix of the two outcomes (means) in trial i

Known within studies correlation (3)


Thus, after running periodontitis.do we get the following output:

Known within studies correlation (4)

Known within studies correlation (5)

Unknown within studies correlation (1)


Example: MYCN data

A systematic review in neuroblastoma sought to establish the prognostic importance of MYCN, a proto-oncogene. In 17 studies, a log-hazard ratio estimate for amplified versus non-amplified MYCN was available for both disease-free survival (Yi1) and overall survival (Yi2). However, no studies reported the within-study correlations, which are likely to be strongly positive due to the structural relationship between these endpoints. Further, there were 64 studies which provided data for only one of the 2 endpoints.

RILEY, R. D., HENEY, D., JONES, D. R., SUTTON, A. J., LAMBERT, P. C., ABRAMS, K. R., YOUNG, B., WAILOO, A. J. AND BURCHILL, S. A. (2004). A systematic review of molecular and biological tumor markers in neuroblastoma. Clinical Cancer Research 10, 4

Unknown within studies correlation (2)


The data look like this:
Where:
b1: log-hazard ratio for amplified versus non-amplified MYCN for disease-free survival
b2: log-hazard ratio for amplified versus non-amplified MYCN for overall survival
V11: variance of b1
V22: variance of b2
There is no V12: the within-study covariance is not available.

Unknown within studies correlation (3)


After running mycn.do we get the following output:

Multiple treatments (1)


Example: Cirrhosis and beta-blockers or sclerotherapy

26 clinical trials investigating the prevention of first bleeding in cirrhosis using beta-blockers or sclerotherapy.
Nine randomized clinical trials of beta-blockers and 19 trials of sclerotherapy were reviewed. Crude rates of bleeding and death in treated and control groups were recorded.

Pagliaro L, D'Amico G, Sørensen TI, Lebrec D, Burroughs AK, Morabito A, Tinè F, Politi F, Traina M. Prevention of first bleeding in cirrhosis. A meta-analysis of randomized trials of nonsurgical treatment. Ann Intern Med. 1992 Jul 1;117(1):59-70.

Multiple treatments (2)


The data look like this:
Where:
r1: number of bleedings in patients treated with beta-blockers
n1: total number of patients treated with beta-blockers
r2: number of bleedings in patients treated with sclerotherapy
n2: total number of patients treated with sclerotherapy
r0: number of bleedings in controls
n0: total number of controls
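The do-file bleeding.do is not reproduced in the slides. As a hedged sketch of the quantities such an analysis needs, the code below computes the log-odds ratio of each treatment against the common control group and the covariance induced by sharing that group, assuming the standard large-sample results (the covariance is the sum of the reciprocals of the two control cells). Function names and counts are hypothetical.

```python
import math


def log_or_vs_control(r, n, r0, n0):
    """Log-odds ratio of bleeding (treatment vs control) and its variance."""
    b = math.log((r * (n0 - r0)) / ((n - r) * r0))
    var = 1.0 / r + 1.0 / (n - r) + 1.0 / r0 + 1.0 / (n0 - r0)
    return b, var


def shared_control_cov(r0, n0):
    """Covariance of two log-odds ratios that use the same control group."""
    return 1.0 / r0 + 1.0 / (n0 - r0)


# Hypothetical three-arm trial: beta-blockers, sclerotherapy, control
b1, v11 = log_or_vs_control(r=10, n=50, r0=20, n0=50)
b2, v22 = log_or_vs_control(r=5, n=50, r0=20, n0=50)
v12 = shared_control_cov(r0=20, n0=50)
```

For a trial with only one active arm, only the corresponding estimate and variance exist and the other outcome is treated as missing.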

Multiple treatments (3)


After running bleeding.do we get the following output:
. mvmeta b V,vars( b1 b2 ) ml

Multiple treatments (4)


. mvmeta b V,vars( b1 b2) ml bscov(prop C)

. gllamm logit c1 c2 c3, nocons i(id) nrf(1) eqs(c) s(wgt) constraint(1) nip(8) adapt

. gllamm r c1 c2 c3, nocons fam(binom) i(id) link(logit) eqs(c) nrf(1) adapt denom(n)

Genetic association studies (1)


Table 2. A typical layout of the data used in a meta-analysis of genetic association studies involving a single bi-allelic locus with a continuous outcome.

The marginal model on which we base the inference is:

Genetic association studies (2)


Example: Association of AGT M235T polymorphisms with Essential Hypertension

A total of 7 studies addressed the association of AGT M235T with hypertension.
The two logORs derived from the mutant allele (TT vs. MM and MT vs. MM) were modeled simultaneously as a bivariate response.

Bagos PG. A unification of multivariate methods for meta-analysis of genetic association studies. 2008, Statistical Applications in Genetics and Molecular Biology, 7(1), Article 31

Genetic association studies (3)


The data look like this:

Where:
aa0: MM genotype for controls
ab0: MT genotype for controls
bb0: TT genotype for controls
aa1: MM genotype for cases
ab1: MT genotype for cases
bb1: TT genotype for cases

Genetic association studies (4)


Thus, after running genetics.do we get the following output:
. mvmeta b V,vars(b1 b2) ml

Genetic association studies (5)


It is a special case of the general model. It re-parameterizes y1, y2 using λ = y1/y2 and imposes a single between-studies variance τ². Thus, λ is treated as a fixed-effects parameter.

Minelli C, Thompson JR, Abrams KR, Thakkinstian A, Attia J (2005) The choice of a genetic model in the meta-analysis of molecular association studies. Int J Epidemiol 34(6): 1319-1328

Genetic association studies (6)


Under the genetic model-free approach proposed by Minelli et al. we get the following output (model_free.do):

Minelli C, Thompson JR, Abrams KR, Thakkinstian A, Attia J (2005) The choice of a genetic model in the meta-analysis of molecular association studies. Int J Epidemiol 34(6): 1319-1328

Mendelian randomisation (1)


Suppose that we have genotype, G, a continuous intermediate phenotype, IP, and a binary disease outcome D.

The gene is selected because of its influence on the phenotype, and we wish to establish the relationship between phenotype and disease.

The hypothesised causal pathway is:

G → IP → D

Minelli C, Thompson JR, Tobin MD, Abrams KR (2004) An integrated approach to the meta-analysis of genetic association studies using Mendelian randomization. Am J Epidemiol 160(5): 445-452
Thompson JR, Minelli C, Abrams KR, Tobin MD, Riley RD (2005) Meta-analysis of genetic studies using Mendelian randomization--a multivariate approach. Stat Med 24(14): 2241-2254

Mendelian randomisation (2)



Let the Log Odds Ratio of Disease given genotype be y1i and the mean difference in phenotype be y2i. The ith study produces two potential estimates y1i and y2i although in practice only one or other may be obtained or reported.

We assume that the within-study correlation of y1i and y2i is negligible. Thus we wish to estimate y1i, y2i and the parameters of between studies heterogeneity.
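The slides do not show how the two pooled estimates are combined. As an illustration only, assuming the usual ratio estimator of Mendelian randomization (genotype-disease log-odds ratio divided by genotype-phenotype mean difference) and a first-order delta-method variance with zero correlation between the two estimates, the phenotype-disease effect could be sketched as follows. The function name and the numbers are hypothetical.

```python
import math


def ratio_estimate(theta1, var1, theta2, var2):
    """Log-odds ratio of disease per unit of phenotype, with approximate variance.

    theta1, var1 : pooled genotype-disease log-odds ratio and its variance
    theta2, var2 : pooled genotype-phenotype mean difference and its variance
    Assumes the two pooled estimates are uncorrelated.
    """
    est = theta1 / theta2
    var = var1 / theta2 ** 2 + (theta1 ** 2) * var2 / theta2 ** 4
    return est, var


# Hypothetical pooled estimates
est, var = ratio_estimate(theta1=0.2, var1=0.01, theta2=2.0, var2=0.04)
odds_ratio_per_unit = math.exp(est)
```

The delta-method variance is only reliable when the genotype-phenotype difference is well away from zero relative to its standard error.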

Mendelian randomisation (3)


Model A

It is a standard multivariate model with W=0. It can be fitted using standard software

Model B

It is similar to model A except that is treated as a random-effects parameter (with variance 2) whereas the within studies correlation is zero (W=0). It needs specialised software

Mendelian randomisation (4)


Example: MTHFR, Homocysteine and CHD

Using the paper by Wald et al., a total of 64 genetic studies were identified (mthfr.dta): 31 evaluated only the genotype-disease association, 16 only the genotype-phenotype association, and 17 both.
Among the 17 studies evaluating both associations, 7 measured the mean difference in phenotype level with genotype in both cases and controls (2 reporting only combined means), but 4 studies measured homocysteine only in cases and 4 only in controls, while two reports were unclear.

Wald DS, Law M, Morris JK. Homocysteine and cardiovascular disease: evidence on causality from a meta-analysis. BMJ 2002;325:1202.

Mendelian randomisation (5)


The data look like this:
Where:
z: logOR for TT vs. CC genotypes
varz: variance of z
y: mean difference in serum homocysteine concentration (μmol/l) for TT vs. CC genotypes
vary: variance of y
wy: 1/vary
wz: 1/varz

Mendelian randomisation (6)


Thus, after running mthfr.do we get the following output:
. mvmeta b V, vars(b1 b2) ml

The results are identical with those derived after fitting Model A.

Mendelian randomisation (7)

Mendelian randomisation (8)


If we assume model B:

We obtain the following parameters estimates:

Mendelian randomisation (9)

Conclusions

Multivariate meta-analysis is an important tool in systematic reviews.
It can be applied in several settings:
  in a single 2x2 table with a special scope (baseline risk, diagnostic studies)
  in cases of several outcomes or risk factors (multiple outcomes)
  in the case of a single outcome or risk factor which is, however, a vector (genetic association)
The within-studies correlation is very important.
The multivariate analysis usually increases the power by borrowing strength from external studies.
The multivariate analysis takes into account the correlation of the estimates and thus allows the estimates to be compared after fitting the model.
Recent developments in statistical theory and software allow such models to be fitted easily.

Thank you
Questions?
