How to Borrow Information Across Unlinked Data? A Relative Density Approach for Predicting Unobserved Population Distributions

Siwei Cheng, New York University (NYU)

One important development in the social sciences today is the growing availability and diversity of data, big and small. Social scientists increasingly rely on multiple datasets in their research. This paper proposes a new methodological approach for borrowing information across unlinked surveys to predict unobserved distributions. The approach uses the observed distributions in both the base and the reference data as anchors to adjust for differences across datasets. Its key advantage over prior approaches is that it relaxes the requirements of comparable representativeness and comparable measurement across datasets. Instead, it relies on a weaker condition: that the relative density between the observed and unobserved distributions is the same across datasets. The approach also includes a method for quantifying uncertainty and incorporating it into its output. We illustrate this relative density approach with simulations and two empirical applications.
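The central condition, that the relative density of unobserved versus observed values carries over from the reference data to the base data, can be illustrated with a minimal sketch. It assumes one particular operationalization that the abstract does not spell out: the relative density is taken on the percentile scale. Under that assumption, each unobserved value in the reference data is converted to its rank within the reference data's observed distribution, and those ranks are then mapped through the base data's observed quantile function. The function names (`relative_ranks`, `predict_unobserved`, `bootstrap_median_interval`) are hypothetical. The bootstrap used for uncertainty is a generic stand-in, not the paper's method.

```python
import numpy as np


def relative_ranks(comparison, reference):
    """Empirical CDF of `reference` evaluated at each value of `comparison`.

    These ranks (relative data on the percentile scale) encode where the
    comparison group sits within the reference group's distribution.
    """
    ref_sorted = np.sort(np.asarray(reference, dtype=float))
    comp = np.asarray(comparison, dtype=float)
    return np.searchsorted(ref_sorted, comp, side="right") / ref_sorted.size


def predict_unobserved(base_observed, ref_observed, ref_unobserved):
    """Hypothetical helper: predict draws from the base data's unobserved distribution.

    Assumption (illustrative only): the relative distribution of unobserved
    vs. observed values is the same in the base and reference data.
    """
    # Step 1: locate reference-data unobserved values within reference-data observed values.
    r = relative_ranks(ref_unobserved, ref_observed)
    # Step 2: map those ranks through the base data's observed (empirical) quantile function.
    return np.quantile(np.asarray(base_observed, dtype=float), r)


def bootstrap_median_interval(base_observed, ref_observed, ref_unobserved,
                              n_boot=200, seed=0):
    """Hypothetical uncertainty sketch: percentile bootstrap interval for the
    median of the predicted distribution, resampling each dataset independently."""
    rng = np.random.default_rng(seed)
    arrays = [np.asarray(a, dtype=float)
              for a in (base_observed, ref_observed, ref_unobserved)]
    medians = []
    for _ in range(n_boot):
        resampled = [rng.choice(a, size=a.size, replace=True) for a in arrays]
        medians.append(np.median(predict_unobserved(*resampled)))
    lo, hi = np.percentile(medians, [2.5, 97.5])
    return lo, hi
```

For example, `predict_unobserved([10, 20, 30, 40, 50], [1, 2, 3, 4], [2, 4])` places the two reference values at ranks 0.5 and 1.0 and returns `[30.0, 50.0]` on the base data's scale. Only ranks cross between the two datasets. A monotone rescaling of the reference data (such as different units or a log transform) therefore leaves the prediction unchanged, which illustrates one way an approach of this kind can avoid requiring comparable measurement.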


Presented in Session 100: Using Big Data in Population Research