What the structure ignores, the Social Determinants of Data reveals.
This glossary defines key terms within the Social Determinants of Data (SDOD) framework, spanning the data conditions, mechanisms, systems, and practices that shape how data is produced and interpreted.
Framework Concepts
- Social Determinants of Data (SDOD): A conceptual framework that recognizes data quality, completeness, and representation as outcomes shaped by systemic, institutional, and social forces—not merely technical issues.
- Data Equity: The principle that health equity requires equity in the data itself—ensuring that all groups are accurately and consistently represented in the evidence used for decision-making.
Methodologies
- Decision Point Analysis: A systematic methodology within SDOD for identifying and analyzing specific institutional moments where choices are made that create, maintain, or obscure data visibility or invisibility. Decision Point Analysis traces four interconnected dimensions: (1) What was decided? — What institutional choice was made about data collection, definition, access, or documentation? (2) Who decided and why? — What pressures, incentives, policies, or priorities drove the decision? (3) Which SDOD mechanisms operated at this decision point? — Was it Institutional Optionality? Funding gaps? Technological constraint? Power asymmetries? Political priorities? (4) What were the consequences? — Who became visible or invisible as a result? What became measurable or unmeasurable? What policy decisions were made (or prevented) based on this choice?
- Why it matters: Decision Point Analysis reveals that data invisibility is not accidental—it is the cumulative result of specific choices made by specific institutions at specific moments in time. By analyzing decision points, researchers can identify leverage points for change (if X drove the choice, changing X could alter the decision) and trace power through the system (whose voice was heard? whose was absent?).
- Real-world example: NC DHHS changed its COVID-19 dashboard from allowing Excel downloads to PDF-only downloads. Decision Point Analysis asks: When was this decision made? Who made it? Was it cost-saving? User feedback? Resistance to independent analysis? Which SDOD mechanisms operated here (Technological Design? Institutional Optionality? Political priorities)? What became harder to measure as a result (independent researchers’ ability to analyze data; accountability to external scrutiny)? What leverage points exist to reverse it (policy requirement for accessible formats, funding for interoperable systems, community pressure)?
- The Data Equity Triad: A framework for analyzing and implementing the three mutually interdependent components required to achieve data equity in health systems. The Triad consists of: (1) Redesigned Technology/Infrastructure — System integration, EHR workflows, automated requirements, and technological scaffolding that makes equitable data collection possible; (2) Trained Staff/Capacity Building — Education on why certain data matters, how to collect and document with cultural responsiveness, change management, and institutional cultural shift; (3) Policy/Governance/Requirement — Data collection made mandatory (not optional), adequately funded, monitored for quality and equity outcomes, and subject to institutional accountability.
- Why it matters: All three components must be present simultaneously. Remove one—or even weaken one—and data equity becomes impossible or unstable. The Triad reveals why many data equity initiatives fail: institutions often attempt only one or two components without the third, creating unsustainable burden (e.g., trained staff with no technology support, or mandated data collection with no staff training).
- Real-world example: Duke Health integrated ethnicity and language data collection into its general EHR intake process 20 years ago. Why it worked: Duke redesigned its system-wide intake workflows to require ethnicity documentation (technology), trained staff on its importance and on cultural responsiveness (capacity), and made it institutional policy (governance). The result is visible in its COVID-19 data: in 2020, Durham County, where Duke Health is headquartered, had only 4 COVID-19 deaths with missing ethnicity data, compared to 26.78% missing ethnicity in Robeson County. This wasn’t because Robeson County’s population is different; it was because Robeson County’s health systems lacked one or more components of the Triad. Duke’s success demonstrates that when all three are present, data equity is achievable and sustainable over decades.
- Anti-Dashboard: A data visualization and analytical tool that inverts the logic of traditional dashboards by making visible what data systems are not measuring, rather than what they are. Instead of displaying aggregated metrics and trends, the Anti-Dashboard systematically documents data missingness—including percentages of missing/unknown fields by category, geographic variation in completeness, temporal trends, intersectional gaps, and the policy/clinical consequences of incomplete data.
- Purpose: The Anti-Dashboard operationalizes SDOD analysis by:
- Making institutional choices about data collection visible and measurable
- Documenting whose realities are rendered invisible and at what cost
- Identifying decision points where institutional change could improve data equity
- Creating records of data voids even when official tracking systems are suppressed or dismantled
- Shifting focus from “what can we do with the data we have” to “why are we not measuring this, and what would change if we did?”
- Application: Anti-Dashboards can be developed for any data system where missingness is patterned and consequential—maternal health, occupational health, mental health outcomes, disability documentation, immigration-related data, and more. They are tools for institutional analysis, community advocacy, and research.
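The Anti-Dashboard entry above describes reporting the percentage of missing or unknown fields by category and geography. A minimal sketch of that calculation follows, assuming records arrive as simple dictionaries. The record layout, the county labels, the `MISSING` markers, and the `missingness_report` helper are all hypothetical and chosen only for illustration; they are not part of the SDOD framework itself.

```python
from collections import defaultdict

# Assumption: these values all mean "not recorded" in the hypothetical source system.
MISSING = {None, "", "Unknown"}

# Hypothetical records for illustration only (not real data).
records = [
    {"county": "A", "ethnicity": "Hispanic", "language": "Spanish"},
    {"county": "A", "ethnicity": None, "language": "English"},
    {"county": "B", "ethnicity": "Unknown", "language": ""},
    {"county": "B", "ethnicity": None, "language": "English"},
    {"county": "B", "ethnicity": "Non-Hispanic", "language": "English"},
]


def missingness_report(rows, group_key, fields):
    """Return {group: {field: percent of rows where the field is missing}}."""
    totals = defaultdict(int)
    missing = defaultdict(lambda: defaultdict(int))
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        for field in fields:
            if row.get(field) in MISSING:
                missing[group][field] += 1
    return {
        group: {
            field: round(100 * missing[group][field] / totals[group], 1)
            for field in fields
        }
        for group in totals
    }


report = missingness_report(records, "county", ["ethnicity", "language"])
for county, by_field in sorted(report.items()):
    print(county, by_field)
```

The output is organized around what is absent rather than what is present, which is the inversion the Anti-Dashboard describes. A fuller version would add time periods and intersecting categories as further grouping keys.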
Data Conditions / Outcomes
- Data Quality: The degree to which data is accurate, complete, consistent, and representative of the population it is intended to describe. In SDOD, data quality is understood as an outcome of the systems that produce data.
- Data Disparities: Instances where certain communities (often marginalized or minority populations) are systematically undercounted, misrepresented, or rendered invisible in datasets due to structural inequalities.
- Data Ghost/Statistical Invisibility: A phenomenon where a population exists in reality but is absent from data records, often leading to misdirected resources and ignored health needs.
- Boundary Data Ghosts: The most extreme cases: populations at the furthest edges of institutional systems, with minimal or no meaningful data documentation. Unclaimed decedents exemplify boundary data ghost status.
- Missingness: The state of data being absent or “unknown.” In the SDOD framework, missingness is often non-random and produced by structural, institutional, and procedural conditions. There are three types of missingness: Random, Systematic, and Structural. These three types often coexist in the same dataset, and misidentifying structural missingness as random is itself a political act — it frames an equity problem as a technical inconvenience and forecloses the interventions that would actually address it.
- Random Missingness: Occurs when data is absent due to chance — an isolated human error, a one-time technical glitch, or an unpredictable circumstance. It has no pattern and does not consistently affect any particular group.
- Systematic Missingness: Occurs when data is absent due to consistent, repeatable failures in a process — patterns of omission that recur predictably across time, geography, or actors, but without being rooted in broader social inequality.
- Structural Missingness: Occurs when data is absent as a consequence of deeper social, institutional, and political inequities — when the conditions that make certain communities vulnerable to health disparities are the same conditions that make them less likely to be counted. This is the type of missingness that SDOD is specifically concerned with. It is not a glitch or even a flawed workflow — it is the predictable output of systems that were never designed with certain communities in mind.
- Data Visibility: The extent to which individuals or groups are represented and identifiable within a dataset. Limited visibility can result from missingness, misclassification, or exclusion from data collection processes.
- Categorical Invisibility: The outcome of categorical neglect: a state in which a population exists but cannot be studied, analyzed, or included in decision-making because it has not been identified within any categorical framework. It results from institutional choices, not accidents.
- Protective Nondisclosure: A rational response by individuals or communities to decline disclosing sensitive identity or health information to institutions perceived as unsafe or harmful. Emerges from legitimate fear of surveillance, discrimination, custody loss, immigration enforcement, or other institutional consequences. While protective—a survival strategy—it creates data invisibility. In SDOD, protective nondisclosure is understood not as patient error but as a consequence of power asymmetries and institutional mistrust. Protective Nondisclosure is about the subject—the patient, the community member—having reasons not to disclose information about themselves to institutions.
- Definitional Instability: The condition in which the boundaries of a data category are drawn, redrawn, or inconsistently applied across institutions or time periods, such that who or what gets counted changes not because the underlying phenomenon changed, but because of institutional choices about how to define it. Definitional instability operates both horizontally — when different agencies apply different definitions simultaneously — and vertically — when a single institution sets or narrows definitional boundaries in ways that systematically exclude a known population from the count. In either case, the result is data that reflects institutional choices rather than social reality, making meaningful comparison across time or place unreliable and rendering affected populations statistically invisible.
- Perverse Incentives: The condition in which the institutional or professional incentive structure discourages accurate data collection or disclosure because honesty about outcomes creates risk — legal, professional, reputational, or social — for the person or institution responsible for recording. When disclosure of accurate information harms the discloser, systematic undercounting or misclassification becomes a rational institutional response rather than an aberration. The result is data that reflects the self-protective choices of those with recording power rather than the actual experiences of the population being measured. Perverse Incentives operates differently from Protective Nondisclosure. Perverse Incentives is about the recorder—the physician, the institution, the registrar—having reasons not to document accurately.
Mechanisms
- Denominator Exclusion: A statistical practice in which individuals with “missing” demographic data are excluded from the total count (the denominator) when calculating percentages, leading to skewed results that hide the true impact on a community.
- Misclassification: The incorrect categorization of an individual’s demographic or clinical characteristics in a dataset, often due to assumptions, system limitations, or lack of information.
- Categorical Neglect: The institutional choice to not identify, measure, track, or prioritize a particular demographic category in data systems. Often manifests as making data collection optional, not funding infrastructure, or failing to require documentation. Categorical neglect is reversible through policy and funding decisions.
- Cascading Invisibility: A pattern within SDOD wherein institutional choices to not collect, track, or measure data at one layer create structural conditions that enable and compound invisibility at subsequent layers. Each layer of invisibility makes the next layer easier to maintain, resulting in comprehensive erasure rather than isolated data gaps.
- Example: When ethnicity is not collected in patient records (Layer 1), language access disparities cannot be measured for that population (Layer 2), and the impact on clinical outcomes cannot be analyzed (Layer 3), rendering both the population and the determinants of their health invisible. Cascading invisibility demonstrates how institutional indifference compounds into categorical erasure.
- Funding-Linked Visibility: The phenomenon by which populations, conditions, or activities become countable in data systems as a byproduct of being attached to a financial transaction. When a funding stream is created for a service or population, it generates administrative records that function as data — making that population visible to policymakers, researchers, and planners. Visibility is therefore not a neutral outcome of need, but a consequence of the decision to fund.
- Fiscal Invisibility: The structural absence of data that results from a population or activity falling outside any funded category. Because financial transactions are among the most reliable generators of administrative data, populations for whom no payment mechanism exists generate no data trail by default. Fiscal Invisibility reframes data gaps not as neutral omissions but as the predictable consequence of defunding or non-funding decisions.
- Evidential Gatekeeping: A self-perpetuating mechanism in which the absence of data is used to justify withholding the funding that would generate that data. Decision-makers require evidence of need before allocating resources for data collection, while the evidence of need cannot be produced without funded data collection. The result is a closed loop that systematically maintains the invisibility of underfunded populations and entrenches existing gaps as permanent.
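The arithmetic behind Denominator Exclusion, defined earlier in this section, can be made concrete with a small worked example. The counts below are hypothetical and chosen only to make the effect easy to see; `share` is an illustrative helper, not an established method.

```python
def share(count, denominator):
    """Percentage of the denominator represented by count, rounded to one decimal."""
    return round(100 * count / denominator, 1)


# Hypothetical counts for illustration only (not real data).
total_deaths = 200
hispanic_recorded = 30
non_hispanic_recorded = 120
ethnicity_missing = 50

known = hispanic_recorded + non_hispanic_recorded  # 150 records with ethnicity

# Denominator exclusion: records with missing ethnicity are silently dropped.
share_excluding = share(hispanic_recorded, known)

# Alternative: keep every record in the denominator and report missingness itself.
share_including = share(hispanic_recorded, total_deaths)
missing_share = share(ethnicity_missing, total_deaths)

# Bounds on the true share among all deaths: none vs. all of the
# missing records belong to the group.
lower_bound = share(hispanic_recorded, total_deaths)
upper_bound = share(hispanic_recorded + ethnicity_missing, total_deaths)

print(share_excluding, share_including, missing_share, lower_bound, upper_bound)
```

With these numbers, dropping the missing records yields a single figure of 20%, while the records actually support only a range of 15% to 40%, with a quarter of all records undocumented. Reporting the missing share alongside the bounds keeps that uncertainty visible instead of hiding it in the denominator.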
Systems & Infrastructure
- Data Governance: The policies, standards, and practices that manage how data is collected, used, and shared. SDOD-informed governance explicitly accounts for the human, institutional, and social context in which data is produced.
- Data Linkage: A technical process used to improve accuracy by connecting different databases (e.g., linking death records with tribal registries or healthcare records) to correct racial or ethnic misclassifications.
- Data Sovereignty: The right of a community—particularly Indigenous groups—to maintain control over the collection, ownership, and application of data about their people.
- Data Production: The process through which data is generated, recorded, processed, and made usable. In the SDOD framework, data production is understood as a social and institutional process—not a neutral or purely technical one.
- Data Pipeline: The sequence of stages through which data moves—from collection and entry to verification and output. In SDOD, each stage of the data pipeline is shaped by structural and institutional forces.
- Structural Culture: The embedded beliefs and practices of an institution.
- Structurally Responsive (Institutional Culture): The enabler. The institution responds to marginalized populations’ needs.
- Structurally Unresponsive (Institutional Culture): The obstacle. The institution fails to respond to marginalized populations’ needs.
- Data Blindness: A condition in which systems are designed in ways that make them unable to see certain data voids or populations. This is not a matter of individual oversight; the system itself lacks mechanisms to recognize what it is not measuring.
- Structural Indifference: A condition in which an institution is structured in ways that demonstrate it does not care enough about certain populations to design responsively for their needs. Indifference operates through structure, not through individual malice.
- Institutional Disregard: An active choice, embedded in institutional design and priorities, not to regard or prioritize certain populations’ needs or data. The institution’s structure reflects a judgment that these populations and their needs do not matter.
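The Data Linkage entry in this section describes connecting databases to correct racial or ethnic misclassification. The sketch below shows the simplest form of that idea, an exact match on a shared identifier. It assumes such an identifier exists, which real-world linkage often cannot rely on; practical linkage typically matches on combinations of fields such as name and date of birth. The field names, the registry, and the `link_and_correct` helper are hypothetical.

```python
# Hypothetical death records for illustration only (not real data).
death_records = [
    {"person_id": "p1", "race": "White"},
    {"person_id": "p2", "race": "Unknown"},
    {"person_id": "p3", "race": "Black"},
]

# Hypothetical registry keyed by the same identifier. In practice, access to a
# registry like this would be governed by the community that owns it.
registry = {"p1": "AI/AN", "p2": "AI/AN"}


def link_and_correct(records, registry):
    """Return copies of records with race corrected from the registry where linked.

    The original value and the source of the final value are kept so that
    every correction stays auditable.
    """
    corrected = []
    for rec in records:
        linked_race = registry.get(rec["person_id"])
        new_rec = dict(rec)
        if linked_race is not None and linked_race != rec["race"]:
            new_rec["race_original"] = rec["race"]
            new_rec["race"] = linked_race
            new_rec["race_source"] = "registry"
        else:
            new_rec["race_source"] = "death_record"
        corrected.append(new_rec)
    return corrected


corrected = link_and_correct(death_records, registry)
n_changed = sum(1 for rec in corrected if rec["race_source"] == "registry")
print(n_changed, "of", len(corrected), "records corrected")
```

Keeping `race_original` alongside the corrected value is a deliberate choice: the count of changed records is itself a measure of misclassification in the source system.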
Practices & Human Factors
- Culturally Responsive Training: Education for frontline data workers (like funeral directors or intake staff) that emphasizes the importance of accurate demographic data and teaches how to ask for identity information respectfully and responsively.