MODULE 3

Data Provenance Check

Data is never neutral. Every dataset embeds the power dynamics of who collected it, who was excluded, and whether consent was meaningful. This module audits your data sources.

🚨 Central question: Can the system that creates the problem ethically study it?
Audit of data sources: Carceral policy research
Bureau of Justice Statistics (BJS) National Prisoner Statistics
Annual state prison population counts, 1978–2022
Who collected this data?
The U.S. Department of Justice—the same federal entity that funds police, prosecutors, and prisons. The carceral system studying itself.
🔴 Critical concern: The institution benefiting from mass incarceration controls the metrics. No external oversight. Categories designed for system efficiency, not justice.
What power dynamics exist?
Prison administrators report counts to the agency that funds them. This creates an incentive to undercount violence, overcount "rehabilitation" metrics, and frame incarceration as necessary.
⚠️ Power imbalance: Incarcerated people have no say in how they're counted, categorized, or represented. Their stories become statistics controlled by their captors.
Who was excluded from this dataset?
People in jails (not prisons), immigration detention, juvenile facilities, and psychiatric institutions, as well as those who died in custody and were removed from counts.
🔴 Missing ~700,000 people: Jails alone hold about 630,000 people on any given day. Your "incarceration rate" leaves out 30% or more of the people under actual carceral control.
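The arithmetic behind that undercount can be sketched in a few lines. This is a minimal illustration that uses only the approximate single-day figures quoted in this module (about 1.9 million people incarcerated in total, about 630,000 of them in jails). The function names are hypothetical, and the numbers are not pulled from BJS files.

```python
# Approximate single-day figures quoted in this module (assumptions, not BJS queries).
TOTAL_INCARCERATED = 1_900_000  # "~1.9 million" currently incarcerated
JAIL_POPULATION = 630_000       # people in jails, excluded from prison counts


def undercount_share(total: int, excluded: int) -> float:
    """Share of the full population that goes missing when `excluded` is left out."""
    if total <= 0 or not 0 <= excluded < total:
        raise ValueError("expected 0 <= excluded < total")
    return excluded / total


def understatement_ratio(total: int, excluded: int) -> float:
    """How many times larger the full population is than the counted population."""
    if total <= 0 or not 0 <= excluded < total:
        raise ValueError("expected 0 <= excluded < total")
    counted = total - excluded
    return total / counted


share = undercount_share(TOTAL_INCARCERATED, JAIL_POPULATION)
ratio = understatement_ratio(TOTAL_INCARCERATED, JAIL_POPULATION)
print(f"Missing share of carceral population: {share:.0%}")
print(f"Full population is {ratio:.2f}x the prison-only count")
```

Reporting both numbers is a deliberate choice: the missing share describes who is absent, while the ratio shows how far a prison-only rate would need to be scaled up. The sketch covers jails only; immigration detention, juvenile facilities, and psychiatric institutions would widen the gap further.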
Was consent meaningful?
No. This is administrative data. Incarcerated people cannot refuse to be counted. They have no control over how data is used.
🔴 Consent impossible: When the state has total control over your body, "voluntary participation" doesn't exist. Every use of this data is extraction without consent.
FDIC National Survey of Unbanked and Underbanked Households
Biennial household banking access survey, 2009–2021
Who collected this data?
Federal Deposit Insurance Corporation (FDIC), the agency that regulates banks and insures their deposits. Data serves financial institutions' interests first.
⚠️ Regulatory capture risk: FDIC exists to protect banks, not consumers. Survey categories reflect industry concerns (account ownership), not justice concerns (exclusion causes).
What power dynamics exist?
Questions ask "Do you have a bank account?" not "Has a bank ever denied you?" Frames lack of banking as individual choice, not structural barrier.
🔴 Deficit framing embedded: Survey language blames individuals ("unbanked") rather than banks ("exclusionary"). The question design assumes banking is accessible and people opt out.
Who was excluded from this dataset?
Currently incarcerated people (1.9M), homeless individuals without addresses, undocumented immigrants who avoid government surveys, and people in psychiatric facilities.
🔴 Invisibility by design: The MOST financially excluded are absent. The survey only reaches people with stable housing and phone access. Your "unbanked rate" is a floor, not a ceiling.
Was consent meaningful?
Technically voluntary, but respondents don't know how data will be used. FDIC shares findings with banks. No compensation for participants.
⚠️ Secondary use problem: People consent to "research" not knowing their responses become market intelligence for the institutions excluding them.
Who is invisible across ALL your datasets?
🚫 Currently incarcerated: ~1.9 million people completely absent from FDIC survey, miscounted in Census
🚫 Jail populations: ~630,000 people (30% of carceral population) excluded from BJS prison counts
🚫 Homeless individuals: 580,000+ people invisible to household surveys, undercounted in Census
🚫 Undocumented immigrants: ~11 million people who avoid government data collection
🚫 Dead from incarceration: People who died during or shortly after custody, removed from counts
The provenance problem
Every dataset you're using was collected BY institutions of power ABOUT people they control, exclude, or surveil. The data you trust was designed to serve the system, not the people. Your analysis captures the aftermath of incarceration but is blind to active incarceration. The most excluded are invisible.
Required reflections before using this data:
You must answer these questions explicitly in your ethics documentation. If you cannot answer them satisfactorily, reconsider whether your study is ethical to conduct.
Given that carceral systems collected data about the people they incarcerate, how can you use it without reinforcing their power?
Your data undercounts the most excluded by 2+ million people. How does this absence shape your conclusions?
Can you justify using data collected without consent from people who had no choice but to participate?
How will you ensure your findings aren't weaponized by the same systems that generated the data?
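One way to keep these four reflections from being skipped is to record them in a structured form alongside each dataset. The sketch below is illustrative only: the `EthicsDocumentation` class, its field names, and the reflection keys are hypothetical and not part of any existing tool. It checks only whether an answer was written, not whether the answer is satisfactory, which remains a human judgment.

```python
from dataclasses import dataclass, field

# The four required reflections from this module, keyed by hypothetical short names.
REQUIRED_REFLECTIONS = {
    "reinforcing_power": "How can you use carceral data without reinforcing carceral power?",
    "shaped_by_absence": "How does the absence of 2+ million people shape your conclusions?",
    "no_consent": "Can you justify using data collected without consent?",
    "weaponization": "How will you keep findings from being weaponized?",
}


@dataclass
class EthicsDocumentation:
    """Written answers to the required reflections for one dataset."""
    dataset: str
    answers: dict[str, str] = field(default_factory=dict)

    def unanswered(self) -> list[str]:
        """Return the keys of reflections that have no non-blank answer."""
        return [
            key for key in REQUIRED_REFLECTIONS
            if not self.answers.get(key, "").strip()
        ]

    def is_complete(self) -> bool:
        """True only when every required reflection has some written answer."""
        return not self.unanswered()


doc = EthicsDocumentation(
    dataset="BJS National Prisoner Statistics",
    answers={"no_consent": "Draft: aggregate counts only; limits stated in the report."},
)
for key in doc.unanswered():
    print(f"Still unanswered: {REQUIRED_REFLECTIONS[key]}")
```

An analysis script could refuse to run while `is_complete()` is false, though a complete record is a minimum condition rather than evidence that the study is ethical to conduct.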