Module 3: Assessing Data Credibility
Beware of Bias
Be aware that some data sources may be biased, such as:
- Organizations reporting on themselves
- Data that is generated by interest groups
- Data that is self-reported, where there may be room for embellishment or an incentive to report inaccurately (e.g., individuals reporting their own salary data)
Review the data sets you are using and make sure that they make sense. Review how the data was collected and how terms are defined. Some knowledge of, and research on, the topic will help. Compare new sets of data against past years: a data series that shows drastic changes should be investigated and understood before it is presented. Sometimes it is not the quality of the data that needs to be considered but how it is presented.
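One way to put the comparison against past years into practice is sketched below in Python. The function name flag_drastic_changes, the 50% threshold, and the sample counts are illustrative assumptions, not values from any real data set; a sensible threshold depends on the topic.

```python
def flag_drastic_changes(series, threshold=0.5):
    """Return (year, previous, current, relative_change) for each large jump.

    `series` maps a year to a value. `threshold` is the relative change
    (0.5 means 50%) above which a year is flagged for investigation.
    """
    flagged = []
    years = sorted(series)
    # Walk through consecutive pairs of years.
    for previous_year, year in zip(years, years[1:]):
        previous, current = series[previous_year], series[year]
        if previous == 0:
            continue  # relative change is undefined when the baseline is zero
        change = (current - previous) / abs(previous)
        if abs(change) > threshold:
            flagged.append((year, previous, current, change))
    return flagged


# Hypothetical yearly counts, invented for illustration only.
counts = {2018: 1000, 2019: 1040, 2020: 2300, 2021: 2250}
flagged = flag_drastic_changes(counts)

for year, previous, current, change in flagged:
    print(f"{year}: {previous} -> {current} ({change:+.0%}), investigate before presenting")
```

A flagged year is not necessarily an error. It is a prompt to find out what happened, such as a change in how the data was collected or how a term was defined.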
Examples
Data may not be biased, exactly, but may be socially constructed. For instance, here is a map showing racial change in Hartford, Connecticut from 1900 to 2018 [1]. Over time, definitions of race have changed, and new terminology has emerged and become commonplace. In developing illustrations to visualize this data, you would want to be careful to acknowledge these changes. The explanation at the bottom of the map helps to convey this as accurately as possible. There is not necessarily one correct way to display this data. When developing the visualization, clearly explain your choices and limitations.
How to Recognize Bad Data
As much as possible, try to recognize bad data. The following could be red flags:
- Empty/blank cells: Ask whether the respondents did not provide this information or whether the data set is simply incomplete.
- Data that doesn’t make sense: For instance, dates should be in a date format. Canadian postal codes should be written as Letter/Number/Letter Space Number/Letter/Number (e.g., K1A 0B1).
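The red flags above can be checked automatically before a closer manual review. The following is a minimal sketch in Python, assuming the data has been loaded as a list of rows with hypothetical columns named "date" and "postal_code", and assuming dates are expected in YYYY-MM-DD form. These names and the sample rows are illustrative only.

```python
import re
from datetime import datetime

# Letter/Number/Letter Space Number/Letter/Number, as described above.
POSTAL_CODE = re.compile(r"[A-Za-z][0-9][A-Za-z] [0-9][A-Za-z][0-9]")


def is_valid_date(text, fmt="%Y-%m-%d"):
    """Return True if `text` parses as a date in the assumed format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def find_red_flags(rows):
    """Return (row_number, column, problem) for each suspicious cell."""
    flags = []
    for number, row in enumerate(rows, start=1):
        for column, value in row.items():
            text = "" if value is None else str(value).strip()
            if text == "":
                flags.append((number, column, "blank"))
            elif column == "date" and not is_valid_date(text):
                flags.append((number, column, "not a date"))
            elif column == "postal_code" and not POSTAL_CODE.fullmatch(text):
                flags.append((number, column, "bad postal code"))
    return flags


# Hypothetical rows, invented for illustration only.
rows = [
    {"date": "2022-02-14", "postal_code": "K1A 0B1"},
    {"date": "14/02/2022", "postal_code": "K1A0B1"},
    {"date": "", "postal_code": "M5V 3L9"},
]
red_flags = find_red_flags(rows)

for number, column, problem in red_flags:
    print(f"row {number}, {column}: {problem}")
```

A check like this only finds cells that break a format. It cannot tell you why a cell is blank, so the question of whether respondents skipped the item or the data set is incomplete still has to be answered from the source notes.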
Many open data sets come with source notes. Take the time to review the notes to understand how the data was collected and what it does (and doesn’t) represent.
[1] Steven Manson, Jonathan Schroeder, David Van Riper, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 14.0 [Database]. Minneapolis, MN: IPUMS. 2019. DOI: http://doi.org/10.18128/D050.V14.0. Retrieved February 14, 2022 from https://ontheline.github.io/otl-racial-change/index-caption.html