When AI is Fluent in Data but Illiterate in Context
Beyond the Algorithm
What happens when AI reads African data through the wrong frame and no one in the room knows enough to notice.
The output was clean. Structured. Confident. The generative AI tool had processed survey responses from 191 respondents and returned a set of neatly labelled themes. One of them appeared repeatedly across the data: “Misinformation Resistance.”
I stared at it for a long time.
The survey was about perceptions of Fourth Industrial Revolution technologies (AI, IoT, blockchain) in a specific Congolese context. I had collected the data, and I understood the political and historical texture of the community being studied. So what the AI tool (ChatGPT) had labelled “Misinformation Resistance” was not that. Not even close.
What the responses actually reflected was something more specific, more historically grounded, and entirely rational: a deep, politically informed distrust of institutions. A community whose relationship with governance (colonial administration, post-independence instability, extractive foreign intervention, cycles of conflict) gave them every reason to be skeptical of new technologies promising transformation. This was a coherent epistemic posture developed over generations of having good reasons not to trust. The AI tool had taken a political trust phenomenon and filed it under cognitive bias. It had done this cleanly, confidently, and without any visible indication that something had gone wrong.
That gap between what the model produced and what the data actually meant was only visible to me because I knew the context. Which raises a question that I have not been able to stop thinking about: what happens in all the cases where no one in the room does?
This Is Not a Technical Error
That is the first and most important thing to understand. The model did not malfunction. It processed the data correctly, applied a coherent framework, and returned a result that was internally consistent. The problem was not in the computation. It was in the interpretive frame the computation was built on.
Most of the datasets used to train large language models originate from what researchers call WEIRD populations: Western, Educated, Industrialized, Rich, and Democratic. This is not a new critique. Henrich et al. named it in 2010, but its implications for AI-assisted research in Africa remain under-examined and underappreciated.
When a model is trained predominantly on data from contexts where institutional legitimacy is the baseline, it is more likely to inherit the prior that skepticism toward institutions is a deviation. An anomaly. Something to be explained as resistance, susceptibility, or bias. In contexts where institutional distrust is historically rational, where it reflects lived experience of systems that failed, extracted, or actively harmed, that prior actively misrepresents the communities being studied.
What happened in my dataset was a textbook case of interpretive drift: the model re-coded a social and political phenomenon into a psychological category that fundamentally changed its meaning. The responses went in as evidence of grounded political skepticism. They came out as evidence of susceptibility to misinformation. Same words. Different world.
What the Data Actually Said
The survey asked respondents two questions relevant to this analysis. The first: are you confident that 4IR technologies can benefit individuals, businesses, and Congolese society? The second: what do you think the negative impacts of adopting 4IR technologies in the DRC would be?
Across 191 respondents spanning all confidence levels in the first question, from “très confiant(e)” (very confident) to “pas confiant(e)” (not confident), the responses to the second question were striking in their consistency. One concern appeared with overwhelming frequency: job loss. “Perte d’emploi.” “Le chômage.” “Réduction de la main d’œuvre.” It appeared in responses from people who were confident the technology would benefit them. It appeared in responses from people who were not. It was not a fringe concern or an edge case. It was the dominant signal in the data.
Alongside it, three other patterns emerged clearly:
Infrastructure and access skepticism: responses that named the reality that limited connectivity, unequal digital penetration, and geographic fragmentation would mean the technology’s benefits would be concentrated in a few cities, leaving the majority of the population as bystanders.
Dependency and dignity concerns: a culturally specific anxiety about what happens to human agency, effort, and self-reliance when machines take over functions people currently perform.
And regulatory distrust: responses that conditioned any optimism on institutions the respondents had little reason to trust, noting explicitly that the governance framework did not yet exist, that authorities had a history of negligence, that without regulation the technology would serve the powerful rather than the population.
The prompt used to instruct the model was deliberately open and methodologically standard:
Prompt: “identify, analyse and organise patterns of meaning (themes) in the data.”
No frame was suggested. No categories were proposed. No interpretive direction was given. The model was handed raw responses and asked to find what was there.
For these two specific questions (on confidence in 4IR benefits and perceived negative impacts) it returned “Misinformation Resistance” as the dominant theme. A label that does not appear in the data, that cannot be traced to any significant cluster of responses, and whose only possible anchor in the entire set of answers to these two questions is a single one-word entry from one respondent out of 191. One neutral respondent submitted a single word: “Misinformation.” The response itself was ambiguous: it may have been naming misinformation as a concern about how the technology could be misused, not expressing any form of resistance to information. But even read generously, one ambiguous word from one respondent does not constitute a theme. These were precisely the questions most loaded with political, structural, and historical meaning, and they were the ones the model read most poorly.
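This particular failure mode, a theme label that cannot be traced back to the responses that supposedly support it, is mechanically checkable before a label hardens into a finding. Below is a minimal sketch of such a provenance gate; the function name, data shape, and 5% support threshold are all illustrative assumptions, not an established standard:

```python
# Illustrative provenance gate for AI-generated themes: a theme is
# accepted only if it can be traced to a minimum share of respondents.

def validate_themes(themes, n_respondents, min_support=0.05):
    """Split AI-proposed themes into accepted and flagged lists.

    themes: list of dicts like
        {"label": str, "supporting_ids": set of respondent IDs}
    min_support: minimum fraction of respondents a theme must anchor to.
    """
    accepted, flagged = [], []
    for theme in themes:
        support = len(theme["supporting_ids"]) / n_respondents
        if support >= min_support:
            accepted.append(theme)
        else:
            # One ambiguous word from one respondent is not a theme.
            flagged.append({**theme, "support": support})
    return accepted, flagged

# Hypothetical example mirroring the dataset described above:
themes = [
    {"label": "Job loss", "supporting_ids": set(range(120))},
    {"label": "Misinformation Resistance", "supporting_ids": {57}},
]
accepted, flagged = validate_themes(themes, n_respondents=191)
```

A real implementation would first require the AI tool to cite which responses support each label; the point of the gate is simply that a theme anchored in one response out of 191 should never pass silently.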
What it had done, I understood immediately, was reach for the nearest available category in its interpretive framework for responses that expressed skepticism, hesitation, and conditional trust. In contexts where that framework was built, skepticism toward transformative technology tends to be associated with information deficits, people who resist because they have been misled, or because they lack accurate data, or because they are susceptible to false narratives. The model applied that frame to a dataset where skepticism meant something entirely different: a rational, structurally grounded, historically informed reading of who typically benefits when new systems arrive in the DRC. And who typically does not.
Three Ways This Goes Wrong
The misinformation misclassification is one instance of a broader structural problem. In my work and in the conversations I have had with MERL practitioners across the continent, three patterns emerge consistently.
The first is proxy fragility. AI models making inferences about economic behaviour, creditworthiness, or household wellbeing often rely on proxies developed in formal economy contexts: employment history, bank account ownership, documented income. In African contexts with vibrant informal economies, these proxies fail quietly. The model does not flag an error. It simply produces a result that misses the majority of economic activity in the community being studied, without any indication that it has done so. An estimated 83–85% of employment in Sub-Saharan Africa lies within the informal sector, making it the central pillar of the region’s economies: informal workers constitute the overwhelming majority of economic activity.
The second is interpretive drift, as described above. The model does not just measure, it categorises. And the categories it reaches for are the ones most represented in its training data. Political phenomena become psychological ones. Collective responses become individual pathologies. Historically grounded behaviours become deviations from an unmarked norm that was never African to begin with.
The third is data absence bias, perhaps the most insidious of the three. When data is sparse or absent for a particular region or community, the model has two possible interpretations: there is nothing happening here, or there is no infrastructure to document what is happening. Models trained on globally uneven data distributions tend toward the first interpretation. Sparse data regions are read as low activity zones rather than as evidence of systemic underreporting. The absence of data becomes a fact about the place, rather than a fact about who has historically had the resources to produce data about it.
In each case, the model is not lying. It is generalising from the evidence it has. The problem is that the evidence it has was never representative of African realities and the model has no mechanism for knowing what it does not know.
One Country, Fifty-Four Countries
I want to be precise about the scale of this problem, because it is easy to read a single case study and treat it as an edge case.
My dataset was 191 respondents in one country, one city, one research context. The interpretive gap was visible to me because I had the contextual knowledge to see it. But consider what happens when the same tools are applied across the continent, across 54 countries with different colonial histories, different languages, different political trajectories, different relationships between communities and institutions, different informal economies, different trust structures.
Errors in local context do not add up linearly across that diversity. They compound. A misclassification that misrepresents political distrust in one context combines with a proxy fragility that misses informal economic activity in another, which combines with a data absence bias that renders entire communities statistically invisible in a third. The result is more than a collection of small errors. It is a systematically distorted picture of African realities produced at speed, at scale, with the appearance of rigour. And then that picture feeds into evidence bases. Which feeds into policy decisions. Which feeds into programme design. Which affects the people the research or M&E was supposed to serve. That is the direction of travel if we do not intervene deliberately.
Before tracing what this means at scale, it is worth being precise about what kind of problem this actually is because that precision matters for who takes responsibility for fixing it.
The model did not invent data. The responses were real. The respondents were real. The concerns they expressed were genuine, coherent, and internally consistent. What the model did was apply an interpretive frame that transformed valid evidence into a misrepresentation. The data were accurate. The interpretive frame was imported.
That distinction matters because it means this is not primarily an AI problem. It is also a Monitoring, Evaluation, Research and Learning (MERL) problem. AI is the mechanism through which the error occurred, but the error itself is an evidence integrity failure. And evidence integrity is not the responsibility of the tool. It is the responsibility of the practitioner who deploys it, interprets its outputs, and decides what enters the evidence base.
MERL exists precisely to ensure that evidence reflects contextual reality before it informs decisions. When an AI tool alters the interpretation of valid evidence, recoding a political trust phenomenon as a cognitive bias, taking what was grounded skepticism and filing it under susceptibility, and that alteration goes undetected, the MERL function has failed. Not because the tool malfunctioned. Because the interpretive frame was never interrogated.
This is why the DRC misclassification is not an edge case or a cautionary tale about AI limitations. It is a test of whether MERL practice is equipped to govern the tools it is increasingly adopting. The question it poses is not “can AI be trusted?” but: what does responsible evidence practice look like when AI is part of the analytical chain? And that question does not belong only to MERL practitioners in African contexts. It belongs to anyone using AI-assisted analysis to generate evidence about communities whose realities were not the baseline when the models were built.
In the MERL context, misclassification moves like this: a misclassification becomes a coded theme. A coded theme becomes a finding. A finding enters a report. A report informs a programme design or a policy recommendation. By the time anyone might trace the error back to its source (if they ever do) it has already shaped an intervention, allocated a budget, or defined a problem in a way that serves the model’s interpretive framework rather than the community’s actual reality.
In the case of the DRC dataset, the implications are specific and serious. If “Misinformation Resistance” had been accepted as a valid finding, the logical programmatic response would have been some form of information campaign, digital literacy, fact-checking infrastructure, communications strategy. Resources directed at correcting a cognitive problem that did not exist. Meanwhile the actual concerns (structural unemployment anxiety, infrastructure exclusion, regulatory absence, and institutional distrust rooted in historical experience) would have gone unaddressed. The intervention would have been designed for the wrong problem. Confidently. With data to back it up.
But this is where the African MERL case becomes a window into something much larger. The conditions that produced this misclassification are not unique to the DRC, to Africa, or to development research. They are present wherever AI tools trained predominantly on WEIRD-population data are applied to communities that were not the baseline. That is a vast and growing category.
A public health researcher in rural Indonesia using AI to code community responses to a vaccine rollout. A criminal justice analyst in the United States using AI to classify patterns in policing data from predominantly Black neighbourhoods. A market researcher in Brazil using AI to theme consumer feedback from informal economy participants. An education policy team in India using AI to analyse survey data from first-generation learners. In each of these contexts, the gap between what the model assumes and what the data means could produce the same category error that appeared in my dataset, cleanly, confidently, and invisibly.
The scale of AI adoption in research and policy analysis is accelerating faster than the conversation about interpretive validity. Tools are being adopted because they are fast, affordable, and produce outputs that look rigorous. The appearance of rigour is doing significant work here because outputs that look structured and evidence-based carry institutional authority, regardless of whether the interpretive frame that produced them was appropriate for the context.
This is the implication that extends beyond any single dataset, any single continent, any single discipline. We are building evidence infrastructure at speed, and we are not asking, systematically enough, what the tools assume when they read human realities they were never designed to understand.
The Expertise the Conversation Is Missing
The misclassification in this dataset was caught because someone in the room had the contextual knowledge to recognise it. That is the central variable. Not the tool used, not the prompt written, not the methodology followed. The knowledge. Specifically: historically grounded, politically informed, community-specific knowledge of how a particular population thinks, communicates, and has learned to relate to institutions over time.
That kind of knowledge is not exotic. It exists across sectors, across disciplines, across every context where AI is now being deployed to interpret human realities. But it is consistently undervalued in the AI adoption conversation, which tends to centre technical capability, speed, and scale, and to treat contextual expertise as a nice-to-have rather than a non-negotiable safeguard.
In MERL, in public health, in criminal justice, in education policy, in market research, the question of who is in the room when AI outputs are reviewed is not administrative. It is epistemological. The difference between a finding that reflects reality and one that systematically misrepresents it may have nothing to do with the model and everything to do with whether the person reviewing its output knows enough about the community being studied to see what the model cannot.
We do not have a shortage of AI tools. We have a shortage of people with the depth of contextual knowledge required to govern them responsibly. That is the expertise gap that actually matters.
The Conversation That Needs to Happen
In February 2026, I presented this analysis at an online session hosted by the AI in Africa Working Group of the Natural Language Processing Community of Practice, part of the MERL Tech Initiative, a space where the tension between contextual knowledge and AI-assisted research could be examined with the depth it deserves. The session, “Turning Principles into Actions: Made in Africa AI in MERL,” brought together practitioners grappling with these questions from across the continent and the broader development sector. You can find the session recording and presentation slides in this shared folder. Feel free to watch the full session but if time is short, my presentation starts at the 16-minute mark.
What was even more interesting to me in that conversation was how many practitioners might have encountered similar moments: an AI output that felt wrong, a result that did not match the reality they knew, but no framework for naming what had happened or for acting on it institutionally. This kind of misclassification is a felt experience without a vocabulary.
That vocabulary matters because the solution to this problem is not to stop using AI tools in African MERL contexts. The tools are fast, scalable, and genuinely useful when applied appropriately. The solution is:
to use them with epistemic honesty,
to treat their outputs as preliminary hypotheses to be tested, not conclusions to be reported,
to build mandatory human validation loops that are substantive rather than perfunctory,
to develop context annotation practices that flag culturally specific interpretations before they harden into findings.
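The validation-loop idea above can be made concrete. What follows is a minimal sketch, not an established tool or standard: every AI-proposed theme is held as a hypothesis until a reviewer with contextual knowledge either confirms it or re-codes it, and a written context note is mandatory either way. All class and field names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ThemeHypothesis:
    """An AI-proposed theme, held as a hypothesis until a reviewer
    with contextual knowledge signs off (illustrative sketch)."""
    ai_label: str
    status: str = "hypothesis"   # hypothesis -> confirmed | recoded
    final_label: str = ""
    context_note: str = ""       # written rationale, mandatory on review
    reviewer: str = ""

    def confirm(self, reviewer, context_note):
        self._review(reviewer, context_note, self.ai_label, "confirmed")

    def recode(self, reviewer, new_label, context_note):
        self._review(reviewer, context_note, new_label, "recoded")

    def _review(self, reviewer, context_note, final_label, status):
        if not context_note.strip():
            # A sign-off without a written rationale is perfunctory review.
            raise ValueError("context note is mandatory")
        self.reviewer = reviewer
        self.context_note = context_note
        self.final_label = final_label
        self.status = status

# The DRC case, replayed through the loop:
theme = ThemeHypothesis(ai_label="Misinformation Resistance")
theme.recode(
    reviewer="contextual analyst",
    new_label="Institutional distrust (historically grounded)",
    context_note="Skepticism reflects political history, not an information deficit.",
)
```

The design choice that matters is the mandatory context note: it is what turns a rubber-stamp review into the context annotation practice described above, leaving a trace of why a label was kept or changed.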
And beyond individual practice, there is a collective infrastructure problem. If misclassifications are happening across the continent (and they are) we need shared documentation of where they occur, what form they take, and what the pattern of failure reveals about the models we are using. Edge cases that happen repeatedly are not edge cases. They are systemic features. And the only way to see that pattern is to build the infrastructure to aggregate it.
I wrote about my study’s findings earlier (see link below), but this specific insight, about what the misclassification revealed structurally, only became fully visible when I had to articulate it to a room of practitioners asking the same questions. Sometimes the analysis clarifies itself in the telling.
What the Model Cannot Know
AI tools will be used to research, evaluate, and make decisions about African communities whether we engage with the problem or not. The choice is not between AI and no AI. It is between AI applied with contextual literacy and AI applied without it. Contextual literacy is not a nice-to-have. It is the difference between research that represents communities accurately and research that systematically misrepresents them while producing outputs that look like evidence.
The model that coded political distrust as misinformation resistance was not malicious. It was doing exactly what it was built to do. It was applying the interpretive frameworks embedded in its training data, frameworks built largely from contexts where those frameworks made sense. The problem is that we were not in that context. And the model had no way of knowing the difference.
Thank you for reading!