

DataBasics
Hispanic Life Expectancy: How Statisticians' Data Issues Were Solved

By Scott Gilkeson
October 15, 2010

Recently, the Centers for Disease Control and Prevention released, for the first time, estimates of life expectancy that include Hispanics as a distinct demographic.

It may come as a surprise that we are only now getting official figures on Hispanic life expectancy. The federal government has been collecting statistics for Hispanics since 1972, and Hispanics are the country's largest minority, surpassing non-Hispanic blacks in 2006. But there are several issues that contributed to the delay, and that make calculating such statistics for Hispanics problematic.

Usually, when you see statistics for Hispanics, they come from a survey where respondents are asked to identify their ethnicity, officially termed "Hispanic or Latino." Hispanic ethnicity is self-reported - that is, you are Hispanic if you say you are. The same is true for race, by the way, which is a separate concept. Hispanics can be of any race.

See related article and visualization: New CDC Data Show Hispanics Outlive Other Americans

Death Certificates, Funeral Directors, Unreliable Data

Life expectancy is derived from life tables, which are generally based on both vital statistics records (death certificates) and census data. It wasn't until 1989 that a checkbox for Hispanic origin was added to the standard death certificate, and it was another eight years before all states were using the new standard certificate.
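To make the link between a life table and life expectancy concrete, here is a minimal sketch in Python. It is not the NCHS procedure: the function name is hypothetical, and it assumes that people who die in a given year of age live half of that year on average.

```python
def life_expectancy_at_birth(qx):
    """Return life expectancy at birth from single-year death probabilities.

    qx[i] is the probability that someone alive at exact age i dies before
    reaching age i + 1. The last entry should be 1.0 so the table closes.
    """
    alive = 1.0          # share of the birth cohort still alive at age x
    person_years = 0.0   # total years lived by the cohort so far
    for q in qx:
        deaths = alive * q
        # Assumption: those who die during the year live half of it on average.
        person_years += alive - 0.5 * deaths
        alive -= deaths
    return person_years


# Toy table: nobody dies at ages 0 or 1, everyone dies at age 2.
print(life_expectancy_at_birth([0.0, 0.0, 1.0]))  # 2.5
```

A real life table has an entry for every age up to 100 and beyond, but the bookkeeping is the same: follow a cohort forward and add up the years it lives.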

There are other issues, as well. Clearly, no one can self-report race or ethnicity on a death certificate. Ideally, a close relative would do so, but there is not always someone available, and it is often not a good time to have relatives filling out forms. Instead, funeral directors or other officials will make a determination, based on their own perception. Research has shown that misclassification of Hispanic origin results in an underestimate of about 5 percent for Hispanic deaths.

In addition, there is a known problem with misstatement of age at the oldest ages, in both vital statistics and census data. Age exaggeration is most pronounced in the black population and some Latin American populations 80 years old and older, who were born when births may not have been officially recorded.

Medicare data have been used in the estimation of life tables since 1997, but Medicare data are unreliable for the Hispanic population. Racial and ethnic data in Medicare records come from Social Security Administration information, which added an option for Hispanic ethnicity in 1980. But it is difficult to separate out information about pre- and post-1980 Medicare enrollees. One study that linked Current Population Survey and Medicare records found that only eight percent of people who self-identified as Hispanic on the CPS were classified as Hispanic in the Medicare database.

New Approach to Life Tables

To calculate life tables by Hispanic origin, the National Center for Health Statistics used death counts from death certificates, census population estimates, and death and population counts for Medicare beneficiaries aged 66 to 100. There is an excellent paper available on the NCHS website (PDF), which I will summarize here.

Each category (non-Hispanic white, non-Hispanic black, and Hispanic) is adjusted proportionately for the small number of death certificates with unknown age. Then age-specific ratios derived from the National Longitudinal Mortality Study are applied to adjust for racial and ethnic misclassification. The NLMS links decennial census and Current Population Survey data to vital statistics mortality records, and the classification ratio is derived by comparing age- and sex-specific CPS race and ethnicity counts to death certificate counts for the period 1990 to 1998.
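The two adjustments described above can be sketched in a few lines of Python. The function names and all the numbers are hypothetical, and the sketch assumes the classification ratio is expressed as the self-reported count divided by the death certificate count, so that multiplying by it scales under-reported deaths upward.

```python
def distribute_unknown_age(deaths_by_age, unknown_age_deaths):
    """Spread deaths of unknown age across age groups in proportion to known deaths."""
    known_total = sum(deaths_by_age.values())
    factor = (known_total + unknown_age_deaths) / known_total
    return {age: count * factor for age, count in deaths_by_age.items()}


def correct_misclassification(deaths_by_age, classification_ratios):
    """Multiply each age group's deaths by its classification ratio."""
    return {age: count * classification_ratios[age]
            for age, count in deaths_by_age.items()}


# Hypothetical counts for one population group.
deaths = {"65-69": 100.0, "70-74": 300.0}
deaths = distribute_unknown_age(deaths, unknown_age_deaths=4.0)

# Hypothetical ratios; 1.05 corresponds to an undercount of about 5 percent.
ratios = {"65-69": 1.05, "70-74": 1.05}
adjusted = correct_misclassification(deaths, ratios)
print(adjusted)
```

The order follows the description above: unknown ages are allocated first, and the misclassification ratios are applied to the result.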

To adjust for anomalies, including those associated with misreporting age at death, the adjusted death counts and midyear population are grouped into five-year age groupings and smoothed using a mathematical process known as Beers' ordinary minimized fifth difference formula.
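The grouping step can be sketched as follows. This covers only the aggregation into five-year groups; the Beers interpolation that turns those totals back into smooth single-year values relies on published coefficient tables that are not reproduced here. The function name and the counts are hypothetical.

```python
def group_into_five_year_bins(counts_by_single_age):
    """Sum single-year counts into five-year groups keyed by the group's first age."""
    grouped = {}
    for age, count in counts_by_single_age.items():
        start = (age // 5) * 5   # 0-4 -> 0, 5-9 -> 5, 10-14 -> 10, ...
        grouped[start] = grouped.get(start, 0) + count
    return grouped


# Hypothetical single-year death counts showing "heaping" at age 80.
single_year = {78: 40, 79: 38, 80: 70, 81: 35, 82: 36, 83: 34, 84: 33}
print(group_into_five_year_bins(single_year))  # {75: 78, 80: 208}
```

Grouping dampens spikes at round-numbered ages, because a count misreported as 80 instead of 79 or 81 often stays within the same or a neighboring five-year total.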

The probability of dying at a particular age is calculated using different techniques for different ages. For infants (less than one year old), linked birth and infant death records for 2005 and 2006 are used. These linked records report race and ethnicity reliably because the mother's race and ethnicity from the birth certificate are available. For ages one to 99, the smoothed adjusted death counts and midyear population numbers are used. For ages 100 and over, a predictive model is used.
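As a rough illustration of how counts become probabilities, the sketch below uses one common textbook conversion. It assumes deaths are spread evenly over the year, so that half of the people who died were still part of the midyear population. The function names and numbers are hypothetical, and the NCHS report should be consulted for the exact formulas it uses.

```python
def infant_death_probability(infant_deaths, live_births):
    """Probability of dying before age one, from linked birth and death records."""
    return infant_deaths / live_births


def death_probability(deaths, midyear_population):
    """Probability of dying within the year of age, from deaths and midyear population.

    Assumption: deaths occur evenly through the year, so the population at
    the start of the year is roughly the midyear population plus half the deaths.
    """
    return deaths / (midyear_population + 0.5 * deaths)


# Hypothetical counts: 10 deaths among a midyear population of 995.
print(death_probability(10, 995))            # 0.01
print(infant_death_probability(6, 1000))     # 0.006
```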

Improving Estimates and the 'Hispanic Paradox'

To improve the estimates for non-Hispanic whites and blacks ages 66 to 100, Medicare records are applied using a formula that gives increasing weight to the Medicare data with increasing age.
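The idea of giving Medicare data more weight at older ages can be sketched with a simple blend. The linear weight used here is purely an assumption for illustration; the article does not give the actual NCHS formula, and the function names and probabilities are hypothetical.

```python
def medicare_weight(age, first_age=66, last_age=100):
    """Hypothetical weight for Medicare data: 0 at age 66, rising linearly to 1 at 100."""
    if age <= first_age:
        return 0.0
    if age >= last_age:
        return 1.0
    return (age - first_age) / (last_age - first_age)


def blended_death_probability(age, q_vital, q_medicare):
    """Weighted average of the vital-statistics and Medicare death probabilities."""
    w = medicare_weight(age)
    return (1.0 - w) * q_vital + w * q_medicare


# At age 83 the hypothetical weight is 0.5, so the two sources count equally.
print(blended_death_probability(83, q_vital=0.10, q_medicare=0.08))
```

Any weighting scheme of this kind leans on vital statistics where they are more trustworthy and shifts toward Medicare records at the oldest ages, where age misstatement is the larger concern.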

For Hispanics, several adjustment methods were tested and the Brass relational logit model, which predicts the mortality of one population relative to another, was chosen. The mortality pattern of the non-Hispanic white population is used as the "standard" and is fit to Hispanic data in the age interval 45 to 80 to estimate probabilities of death for ages 76 to 100.
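In its conventional form, the Brass relational model transforms the share of a cohort surviving to each age, l(x), with the logit Y(x) = 0.5 ln((1 - l(x)) / l(x)), and assumes the logits of two populations are linearly related: Y(x) = alpha + beta * Y_standard(x). The sketch below fits alpha and beta by ordinary least squares and then predicts survivorship from the standard. The function names and survivorship values are hypothetical, and the article does not say which exact variant or fitting method NCHS used.

```python
import math


def brass_logit(lx):
    """Brass logit of the proportion surviving to an age (0 < lx < 1)."""
    return 0.5 * math.log((1.0 - lx) / lx)


def inverse_brass_logit(y):
    """Convert a Brass logit back to a proportion surviving."""
    return 1.0 / (1.0 + math.exp(2.0 * y))


def fit_brass(standard_lx, observed_lx):
    """Least-squares fit of observed logits on standard logits; returns (alpha, beta)."""
    xs = [brass_logit(v) for v in standard_lx]
    ys = [brass_logit(v) for v in observed_lx]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def predict_lx(alpha, beta, standard_lx):
    """Predict survivorship from the standard population's survivorship."""
    return [inverse_brass_logit(alpha + beta * brass_logit(v)) for v in standard_lx]


# Hypothetical survivorship in the fitting ages for the standard and observed groups.
standard_fit = [0.95, 0.90, 0.80, 0.65]
observed_fit = [0.96, 0.92, 0.84, 0.71]
alpha, beta = fit_brass(standard_fit, observed_fit)

# Apply the fitted relationship to hypothetical standard values at older ages.
print(predict_lx(alpha, beta, [0.45, 0.25, 0.10]))
```

The appeal of this approach is that only two parameters are estimated from the ages where the data are considered reliable, and the shape of the curve at the oldest ages is borrowed from the standard population.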

Estimated life expectancy for the Hispanic population turns out to be 80.6 years, which is 2.5 years longer than for non-Hispanic whites and 7.7 years longer than for the non-Hispanic black population. This is consistent with numerous studies that have identified a "Hispanic paradox": Hispanics have a lower death rate and lower infant mortality than the non-Hispanic white population despite the Hispanic population's lower socio-economic status.

The NCHS report points out that some of this discrepancy could be because the data estimation methods are not error free. But studies designed to see if estimation error could account for all of this advantage have shown that this is unlikely. There are a number of differences that could be factors, but no clear cause for the effect.

Scott Gilkeson, chief data officer of State of the USA, is a user experience and data visualization expert with a 20-year history of innovative information delivery.

© 2010 State of the USA.