DUPRI, with the support of SSRI, facilitates access to InfoUSA data from 2006 to 2022. InfoUSA is a research company that compiles proprietary market data and resells it for market research and strategy. It provides longitudinal data on each household in the United States. Each household is assigned a unique family ID, which allows researchers to track the household's movements across the years covered by the data. The data set contains information about household characteristics (length of residence, number of children, ethnicity, marital status), household location (address, latitude and longitude, census block group), and household socioeconomic standing (estimated income, estimated home value, estimated wealth). It covers every ZIP code in the United States.
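Because the family ID stays constant across years, a household's moves can be found by ordering its records by year and flagging every year in which its location differs from the previous observation. The sketch below illustrates that idea. It is not DUPRI's code. It assumes a hypothetical row layout of (family ID, year, census block group), and the block group labels are placeholders. The actual variable names are documented in DUPRI's codebook.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row layout: (family_id, year, census_block_group).
# The real column names are listed in the DUPRI codebook.
Row = Tuple[str, int, str]
Move = Tuple[int, str, str]  # (year of move, previous location, new location)


def detect_moves(rows: Iterable[Row]) -> Dict[str, List[Move]]:
    """For each family ID, list every year in which the household's location
    differs from its location in the previous year it was observed."""
    history: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for family_id, year, location in rows:
        history[family_id].append((year, location))

    moves: Dict[str, List[Move]] = {}
    for family_id, observations in history.items():
        observations.sort()  # chronological order by year
        changes: List[Move] = []
        # Compare each observation with the one immediately before it.
        for (_, prev_loc), (year, loc) in zip(observations, observations[1:]):
            if loc != prev_loc:
                changes.append((year, prev_loc, loc))
        moves[family_id] = changes
    return moves


if __name__ == "__main__":
    # Toy rows with placeholder block group labels, for illustration only.
    rows = [
        ("F1", 2006, "BG-A"),
        ("F1", 2007, "BG-A"),
        ("F1", 2008, "BG-B"),
        ("F2", 2006, "BG-C"),
    ]
    print(detect_moves(rows))
    # {'F1': [(2008, 'BG-A', 'BG-B')], 'F2': []}
```

Comparing against the previous *observed* year, rather than the previous calendar year, means a household missing from some years is still compared with its most recent known location.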
DUPRI has created a simple codebook that describes the variables available in the data set.
Individuals who are interested in accessing InfoUSA data should contact Mark Yacoub. Because of the sensitive nature of the data, users are required to sign a pledge of confidentiality. After signing the pledge, users will be given access to a restricted folder that contains the data and additional documentation. Instructions for accessing the folder and getting started with the data are available here.
In addition to the codebook, DUPRI, in collaboration with colleagues in the Department of Civil and Environmental Engineering and the Department of Computer Science, provides researchers with the ability to automate the subsetting of InfoUSA data by geography and year. Through a series of Python scripts, researchers can submit either a list of desired ZIP codes or a shapefile of a desired region, along with the relevant years of data. These inputs are processed on the Duke Compute Cluster, and researchers receive merged geographic data for each requested year. We are also developing a web front end where researchers can submit their requests and receive subsetted and merged data. Documentation for the scripts is available here. Please contact Mark Yacoub if you are interested in using this service.
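The subsetting described above comes down to two filters: keep the requested years, then keep records whose ZIP code is on the list or whose coordinates fall inside the requested region. Matching records are then grouped by year. The sketch below illustrates that logic in plain Python. It is not the DUPRI scripts, whose interface is described in their own documentation. The `Household` fields and the `subset_households` function are hypothetical. In place of reading a real shapefile, the region is assumed to be a single polygon given as a list of (longitude, latitude) vertices and tested with the standard ray-casting point-in-polygon method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Hypothetical record layout; real field names are in the DUPRI codebook.
@dataclass(frozen=True)
class Household:
    family_id: str
    year: int
    zip_code: str
    lat: float
    lon: float

Polygon = Sequence[Tuple[float, float]]  # (lon, lat) vertices, ring not closed


def point_in_polygon(lon: float, lat: float, poly: Polygon) -> bool:
    """Ray casting: cast a ray east from the point and count edge crossings.
    An odd count means the point is inside the polygon."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def subset_households(
    records: Iterable[Household],
    years: Iterable[int],
    zip_codes: Optional[Iterable[str]] = None,
    region: Optional[Polygon] = None,
) -> Dict[int, List[Household]]:
    """Return {year: matching households} for every requested year,
    filtering either by a ZIP code list or by a polygon (exactly one)."""
    if (zip_codes is None) == (region is None):
        raise ValueError("pass exactly one of zip_codes or region")
    wanted_years = set(years)
    # Pad ZIPs to 5 digits so codes stored as numbers ("2134") still match.
    wanted_zips = {z.zfill(5) for z in zip_codes} if zip_codes is not None else set()

    by_year: Dict[int, List[Household]] = defaultdict(list)
    for h in records:
        if h.year not in wanted_years:
            continue
        if region is None:
            keep = h.zip_code.zfill(5) in wanted_zips
        else:
            keep = point_in_polygon(h.lon, h.lat, region)
        if keep:
            by_year[h.year].append(h)
    # Include requested years with no matches as empty lists.
    return {y: by_year[y] for y in sorted(wanted_years)}


if __name__ == "__main__":
    # Toy records for illustration only.
    records = [
        Household("A1", 2010, "27701", 35.99, -78.90),
        Household("A1", 2011, "27705", 36.02, -78.94),
        Household("B2", 2010, "10001", 40.75, -73.99),
    ]
    by_zip = subset_households(records, years=[2010, 2011], zip_codes=["27701", "27705"])
    print({y: [h.family_id for h in hs] for y, hs in by_zip.items()})
    # {2010: ['A1'], 2011: ['A1']}
```

A production version working from real shapefiles would more likely read the boundary with a GIS library and use a spatial join, which also handles multipart regions and holes that this single-polygon test ignores.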
Illustrative Activities
- An analysis of the North Carolina portion of the InfoUSA data from 2006 to 2018 can be found here. It examines trends in migration, age, race, and home ownership over that period.
- Christopher Timmins and Peter Christensen have a working paper titled "The Damages and Distortions from Discrimination in the Rental Housing Market" that uses InfoUSA's Residential Historical Database to examine discrimination in the rental housing market. They find that discrimination imposes costs equivalent to 4.7% of annual income for renters of color, and that search behavior results in greater welfare costs for African Americans as their incomes rise. Renters of color must make substantial investments in additional search to mitigate the costs of these constraints.
- Christopher Timmins, Peter Christensen, and Ignacio Sarmiento-Barbieri have a working paper titled "Racial Discrimination and Housing Outcomes in the United States Rental Market" that makes use of InfoUSA's consumer database paired with a large-scale experiment. They find that renters of color continue to face discriminatory constraints in the majority of U.S. cities. Stronger discriminatory constraints on renters of color (particularly African Americans) are also associated with higher levels of residential segregation and larger gaps in intergenerational income mobility.