

Using Machine Learning to Uncover Hidden Heterogeneities in Survey Data (Scientific Reports)

Publication Topics

California Health Interview Survey; 2013 California Health Interview Survey (CHIS 2013); 2014 California Health Interview Survey (CHIS 2014)

Publication Type

CHIS Journal Article

Author 1

Christina M. Ramirez

Author 2

et al.


Survey responses in public health surveys are heterogeneous. The quality of a respondent's answers depends on many factors, including cognitive ability, interview context, and whether the interview is conducted in person or self-administered. A largely unexplored issue is how the language used in a public health survey interview is associated with the survey response. The authors introduce a machine learning approach, Fuzzy Forests, which they use for model selection. They use the 2013 California Health Interview Survey (CHIS) as the training sample and the 2014 CHIS as the test sample.

The authors find that non-English-language survey responses differ substantially from English responses in reported health outcomes.

Heterogeneity among the Asian languages suggests that caution should be used when interpreting results that compare across these languages. The Fuzzy Forests model trained on 2013 data also correctly predicted 86% of good health outcomes when the 2014 data were used as the test set.
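
The train-on-one-year, test-on-the-next design described above can be sketched in a few lines. This is only an illustration under stated assumptions: the records are made-up toy values (not CHIS data), the group labels "A" and "B" are hypothetical, and a simple majority-vote rule stands in for Fuzzy Forests, which is not implemented here. The metric shown, the share of actual good-health cases that the model also predicts as good health, is one plausible reading of the "86% of good health outcomes" figure.

```python
from collections import Counter, defaultdict

# Toy, made-up records: (survey_year, interview_group, good_health).
# These are NOT CHIS data; groups "A" and "B" are hypothetical.
records = [
    (2013, "A", 1), (2013, "A", 1), (2013, "A", 0),
    (2013, "B", 0), (2013, "B", 0), (2013, "B", 1),
    (2014, "A", 1), (2014, "A", 1), (2014, "A", 0),
    (2014, "B", 1), (2014, "B", 0),
]


def split_by_year(rows, train_year, test_year):
    """Use one survey year for training and another for testing."""
    train = [r for r in rows if r[0] == train_year]
    test = [r for r in rows if r[0] == test_year]
    return train, test


def fit_majority_by_group(train):
    """Stand-in model (not Fuzzy Forests): most common outcome per group."""
    counts = defaultdict(Counter)
    for _, group, label in train:
        counts[group][label] += 1
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}


def share_of_positives_recovered(y_true, y_pred, positive=1):
    """Fraction of truly positive cases that were also predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_positives:
        raise ValueError("y_true contains no positive cases")
    hits = sum(p == positive for p in preds_for_positives)
    return hits / len(preds_for_positives)


train, test = split_by_year(records, train_year=2013, test_year=2014)
model = fit_majority_by_group(train)
y_true = [label for _, _, label in test]
y_pred = [model[group] for _, group, _ in test]
rate = share_of_positives_recovered(y_true, y_pred)
print(f"{rate:.2f}")  # prints 0.67 for this toy data
```

Keeping the test year entirely out of model fitting is what makes the reported rate an out-of-sample figure rather than a measure of fit to the training data.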


Article 1

Journal Article: Using Machine Learning to Uncover Hidden Heterogeneities in Survey Data

Related Link 1

California Health Interview Survey (CHIS)

Version: 3.0
Created at 11/10/2019 5:48 PM by celeste
Last modified at 11/10/2019 6:00 PM by celeste