Joint Analysis of Independent Datasets: Application to Genetic Effects of Breast Cancer Survival

Igor Akushevich, Duke University
Arseniy Yashkin, Duke University
Bryce Durgin, Duke University
Julia Kravchenko, Duke University
Konstantin G. Arbeev, Duke University
Anatoliy I. Yashin, Duke University

No observational study can collect all of the data necessary to fully model a given pathological process in a diverse population. As a result, there exist complementary datasets that vary in size and level of detail and are not directly linkable (e.g., survey data, genetic data, administrative databases, disease registries) but that represent important aspects of the same pathological process. In this paper we develop an approach for the joint analysis of such data. We relate genetic markers to stage-specific survival after breast cancer diagnosis using (i) HRS-Medicare data, with hundreds of cases and extensive genetic measurements but no stage or other cancer characteristics, and (ii) SEER-Medicare data, with millions of cases and detailed cancer characteristics but no genetic measurements. Because both datasets are assumed to be generated by the same underlying model, the likelihood function for each dataset is expressed in terms of the same set of model parameters, so the two can be combined in a single joint likelihood. The approach is illustrated by simulation studies and an application to real data.
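As a rough illustration of the shared-parameter idea, the following is a minimal sketch and not the authors' model. It assumes, purely for illustration, an exponential survival model with right censoring, a binary marker and a binary stage with additive log-hazard effects, and independence between marker and stage. All names (`theta`, `b0`, `bg`, `bs`, `p_stage`, `p_carrier`, and the function names) are hypothetical. Each dataset contributes a likelihood in which its unobserved covariate is summed out, and both contributions depend on the same parameter set.

```python
import math

def record_loglik(t, event, rate):
    """Exponential survival: log density if event == 1, log survival if censored."""
    return event * math.log(rate) - rate * t

def hazard(theta, g, s):
    """Assumed log-linear hazard in genotype g and stage s (illustrative only)."""
    return math.exp(theta["b0"] + theta["bg"] * g + theta["bs"] * s)

def mixture_loglik(t, event, rates, weights):
    """Log of a weighted sum of likelihoods over the values of a missing covariate."""
    mix = sum(w * math.exp(record_loglik(t, event, r))
              for r, w in zip(rates, weights))
    return math.log(mix)

def loglik_genetic(theta, records):
    """Records are (t, event, g); stage is unobserved and summed out."""
    p = theta["p_stage"]  # assumed probability of late stage
    return sum(
        mixture_loglik(t, e, [hazard(theta, g, 0), hazard(theta, g, 1)], [1 - p, p])
        for t, e, g in records
    )

def loglik_registry(theta, records):
    """Records are (t, event, s); genotype is unobserved and summed out."""
    q = theta["p_carrier"]  # assumed carrier frequency
    return sum(
        mixture_loglik(t, e, [hazard(theta, 0, s), hazard(theta, 1, s)], [1 - q, q])
        for t, e, s in records
    )

def joint_loglik(theta, genetic_records, registry_records):
    """The datasets are independent, so their log-likelihoods add."""
    return loglik_genetic(theta, genetic_records) + loglik_registry(theta, registry_records)

# Toy inputs (made up, not real data): one record per dataset.
theta = {"b0": 0.0, "bg": 0.3, "bs": 0.8, "p_stage": 0.4, "p_carrier": 0.1}
genetic_records = [(1.0, 1, 1)]   # (time, event indicator, genotype)
registry_records = [(2.0, 0, 1)]  # (time, event indicator, stage)
value = joint_loglik(theta, genetic_records, registry_records)
```

In a sketch like this, the joint log-likelihood would be passed to a numerical optimizer. The large registry dataset would mainly inform the stage-related parameters, while the smaller genetic dataset would inform the marker effect given those parameters.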

See extended abstract

Presented in Session 3. Population, Development, & the Environment; Data & Methods; Applied Demography