Imputation Methods for Increasing Racial/Ethnic Data Disaggregation
Published: 10/08/2021

Surveys offer excellent first-person, self-reported data, but survey responses are sometimes incomplete. Imputation is a technique for replacing missing data with substitute values in a way that preserves the integrity of the full data set. On October 8, 2021, the National Network of Health Surveys hosted a workshop on Imputation Strategies for Racial/Ethnic Data Disaggregation. The presenter discusses measuring health inequities when administrative or self-reported race/ethnicity data is incomplete, including indirect estimation, the Bayesian Improved Surname Geocoding (BISG) method, and the Medicare Bayesian Improved Surname Geocoding (MBISG) method.

Marc Elliott, PhD, Senior Principal Researcher and Distinguished Chair in Statistics at RAND Corporation

About the National Network of Health Surveys' Advancing Health Equity Through Data Disaggregation Workshop Series 
Disaggregated race/ethnicity data is needed to expose gaps in health equity, inform policies and programs, and close those gaps. The National Network of Health Surveys, part of the UCLA Center for Health Policy Research, offers a series of workshops designed to improve the disaggregation of race and ethnicity measures in health data sources. Our goal is to boost the number of subpopulation categories made available to key constituencies working to improve health equity. This is especially important for representing communities that are often “hidden” in large health data sets.

Download the presentation slides.

Topics and Timestamps
Measuring Health Inequities when Administrative or Self-Reported Race/Ethnicity Data is Incomplete: Marc N. Elliott, PhD (3:48)

Race/Ethnicity Data Are Essential for Monitoring Quality, Coverage, Cost, and Access (6:02)

  • Datasets are not always perfect. How do you proceed when self-reported data doesn’t meet the “gold standard”?

Indirect Estimation of Race/Ethnicity Provides Valid Group-Level Estimates (9:55)

  • The power of indirect estimation for understanding race/ethnicity at the group level

The Bayesian Improved Surname Geocoding (BISG) and Medicare Bayesian Improved Surname Geocoding (MBISG) Methods (12:22) 

  • Discussion covers the components of the algorithm and the racial/ethnic categories for which probabilities are generated

BISG Generates Probabilities, not Classifications (18:52) 

Indirect Estimation Can Fill a Gap as Data Accumulates (21:04) 

  • Blending and smooth incorporation of indirect estimates into datasets  

Understanding the Algorithm – BISG (22:01)  

Surnames Can Be Linked to Race and Ethnicity Probabilities (22:01) 

  • Breakdown of the race and ethnicity probabilities assigned to the 10 most common surnames in the 2010 US Census

Enrollee Residential Addresses Can also be Linked to Race/Ethnicity Information (23:31) 

  • Highlights an illustration demonstrating how racial/ethnic probabilities can be drawn from addresses and census block groups


Combining Sources of Information (23:49)

  • Use of Bayes’ Theorem to generate “posterior” probabilities across mutually exclusive racial/ethnic categories
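
The combination step can be sketched in a few lines. This is a minimal illustration of a Bayes' theorem update in the BISG style, assuming the surname supplies a prior probability for each group and the census block group supplies, for each group, the share of that group's population living there. The function name `bisg_posterior`, the group labels, and all numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the presentation or from Census data.

```python
def bisg_posterior(surname_prior, geo_given_group):
    """Combine a surname-based prior with geographic likelihoods via Bayes' theorem.

    surname_prior:   P(group | surname), one entry per group, summing to 1.
    geo_given_group: P(block group | group), the share of each group's
                     population that lives in the person's block group.
    Returns P(group | surname, block group), normalized to sum to 1.
    """
    if surname_prior.keys() != geo_given_group.keys():
        raise ValueError("both inputs must cover the same groups")
    # Numerator of Bayes' theorem for each group: prior * likelihood.
    unnormalized = {g: surname_prior[g] * geo_given_group[g] for g in surname_prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no group has nonzero probability for these inputs")
    # Dividing by the total makes the posterior probabilities sum to 1.
    return {g: value / total for g, value in unnormalized.items()}


# Hypothetical inputs (illustrative only, not real surname or Census figures).
surname_prior = {"group_a": 0.5, "group_b": 0.3, "group_c": 0.2}
geo_given_group = {"group_a": 0.001, "group_b": 0.004, "group_c": 0.001}

posterior = bisg_posterior(surname_prior, geo_given_group)
for group, prob in posterior.items():
    print(f"{group}: {prob:.3f}")
```

In this made-up case the surname alone favors `group_a`, but the address evidence shifts most of the probability to `group_b`. The output is a full set of probabilities rather than a single assigned category.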

MBISG – Developed to Help CMS Measure Racial/Ethnic Equity in HEDIS Data (25:28) 

  • Discussion includes visualization of misclassifications and overview of the MBISG methods

  • Data and Methods of MBISG (29:41) 
    • Quick overview of performance validation
  • MBISG C-statistics for SSA Race Variable and MBISG 2.1 (30:40)  

Concordance of BISG 1.0, Modified MBISG, MBIFSG 2.1 with Self-Report (31:49) 

  • Understanding when to use each model

Applications (42:05)

  • Examples of circumstances where these methods have previously been used

  • More detailed breakdowns of individual applications of the methods can be found at the following times:
    • Application 1: National and Contract-Specific Stratified Reporting by Race/Ethnicity (45:36)
    • Application 2: A Health Equity Summary Score to Incentivize Excellent Care to At-Risk Groups (45:50)
    • Application 3: Measuring Racial/Ethnic Differences in Voluntary Disenrollment from Medicare Plans (46:33)
    • Application 4: Imputing Race and Ethnicity for Survey Respondents Who Did Not Report Race and Ethnicity (47:42)
    • Application 5: Estimating Enrollment by Race and Ethnicity in Marketplace Plans (49:11)
    • Application 6: Racial and Ethnic Inequities in Behavioral Health Quality Measures in Medicare Health Plans (49:20)
    • Application 7: Detect and Correct Bias in Clinical Decision Algorithms or P4P Schemes (50:36)
    • Application 8: Combine with GIS Mapping: Provider A & B Diabetic Patients (53:32)


When Race and Ethnicity Probabilities Can be Used Directly (43:43)

  • Breakdown of options and examples for direct application
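
One simple way to picture direct use of probabilities is a probability-weighted group mean, where each person contributes to every group in proportion to their estimated probability instead of being classified into one group. The sketch below is only an illustration of that general idea and is not necessarily the approach described in the presentation; the function name `weighted_group_means`, the record layout, and the numbers are all hypothetical.

```python
def weighted_group_means(records, groups):
    """Estimate a group-level mean outcome using probabilities as weights.

    records: list of dicts with "probs" (group -> probability) and "outcome".
    groups:  the group labels to report on.
    Returns group -> weighted mean outcome (None if a group has zero weight).
    """
    means = {}
    for group in groups:
        weight_sum = sum(r["probs"][group] for r in records)
        if weight_sum == 0:
            means[group] = None  # no information about this group
            continue
        weighted_total = sum(r["probs"][group] * r["outcome"] for r in records)
        means[group] = weighted_total / weight_sum
    return means


# Hypothetical records: outcome is 1 if a quality measure was met, else 0.
records = [
    {"probs": {"group_a": 0.8, "group_b": 0.2}, "outcome": 1},
    {"probs": {"group_a": 0.2, "group_b": 0.8}, "outcome": 0},
]

estimates = weighted_group_means(records, ["group_a", "group_b"])
for group, mean in estimates.items():
    print(f"{group}: {mean:.2f}")
```

No individual is ever labeled with a single race or ethnicity here; only the group-level estimate is produced, which matches the emphasis on group-level rather than individual-level use.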

Concerns (53:46) 

MBISG Is Primarily Used to ENHANCE Limited Self-Reported Data (54:03)

Geographic Stratification as an Alternative to MBISG 2.1 (55:00) 

  • Why MBISG 2.1 is more useful

An Analogous Use of Indirect Estimations by the U.S. Census Bureau (56:07) 

  • How indirect estimations address disparities

Using the (M)BISG (58:09) 

  • Links to resources for accessing the BISG website and other tools

Future Directions (58:44) 

  • How (M)BISG will be utilized moving forward