Calls to decolonise knowledge have transformed the discursive landscape of African higher education over the past decade. Movements such as #RhodesMustFall and #FeesMustFall demanded structural change to institutional cultures and curricula, and advocated for a deeper epistemological challenge, insisting that the production and adoption of knowledge in African universities be refigured around African intellectual traditions, researchers, and institutions. Yet the gap between discursive commitment and measurable practice remains poorly understood. How can one assess, empirically and at scale, whether these demands have altered the actual conduct of research — the granular, everyday decisions about which scholars to cite, which frameworks to invoke, and whose intellectual labour to acknowledge?
This paper presents the methodological foundations and early findings of a collaborative project — developed under the auspices of the SA-UK Chair in the Digital Humanities at the University of the Witwatersrand and the Helsinki Institute for Social Sciences and Humanities — designed to answer this question. The project uses Electronic Theses and Dissertations (ETDs) as its primary corpus: a systematically archived, metadata-rich, and computationally tractable record of knowledge production at the postgraduate level. Drawing on PhD theses submitted at Wits University between 2021 and 2025, the project develops a methodology for identifying and quantifying the presence of African-authored and Africa-based scholarship within doctoral citation practices, disaggregated by faculty and discipline.
The paper engages with three challenges that sit at the heart of this workshop’s agenda. First, it confronts a fundamental methodological problem: how to define “African knowledge” in ways that are both computationally operable and epistemologically defensible. The project tests one principal proxy, institutional affiliation, but also explores alternative routes to capturing African knowledge production, including diasporic scholarship, and weighs each against its technical tractability when applied to large, unstructured bibliographic datasets. This requires the integration of tools such as OpenAlex, Scopus, and ORCID, and raises broader questions about how existing digital scholarly infrastructures encode, and often obscure, African academic labour.
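To make the institutional-affiliation proxy concrete, the following is a minimal sketch of how such a metric might be computed, not the project's actual pipeline. It assumes that each extracted citation has already been resolved (for example through a bibliographic database) to the country codes of its authors' institutions. The `Citation` record, the `share_by_faculty` function, the sample data, and the deliberately incomplete country list are all hypothetical and introduced only for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Assumption: an illustrative, incomplete subset of ISO 3166-1 alpha-2
# codes for African countries. A real study would need the full list.
AFRICAN_COUNTRY_CODES = frozenset(
    {"ZA", "NG", "KE", "GH", "EG", "UG", "TZ", "ET", "SN", "ZW"}
)


@dataclass(frozen=True)
class Citation:
    """Hypothetical record for one cited work in one thesis."""
    faculty: str                       # faculty of the citing thesis
    author_countries: tuple[str, ...]  # empty if affiliation is unresolved


def is_africa_based(citation: Citation) -> bool:
    """Proxy: at least one author is affiliated with an African institution."""
    return any(code in AFRICAN_COUNTRY_CODES for code in citation.author_countries)


def share_by_faculty(citations: list[Citation]) -> dict[str, float]:
    """Share of resolved citations per faculty that meet the proxy.

    Citations with no resolved affiliation are excluded from the
    denominator rather than counted as non-African.
    """
    resolved: dict[str, int] = defaultdict(int)
    african: dict[str, int] = defaultdict(int)
    for citation in citations:
        if not citation.author_countries:
            continue  # unresolved: cannot be classified either way
        resolved[citation.faculty] += 1
        if is_africa_based(citation):
            african[citation.faculty] += 1
    return {faculty: african[faculty] / n for faculty, n in resolved.items()}


# Invented sample data, for illustration only.
sample = [
    Citation("Humanities", ("ZA",)),
    Citation("Humanities", ("GB",)),
    Citation("Humanities", ("US", "KE")),
    Citation("Humanities", ()),  # affiliation could not be resolved
    Citation("Science", ("US",)),
    Citation("Science", ("NG",)),
]
shares = share_by_faculty(sample)
```

Excluding unresolved citations from the denominator is one design choice among several. Because database coverage of African institutions may itself be uneven, the proportion of unresolved citations per faculty would need to be reported alongside any such share.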
Second, the paper addresses the challenge of building research infrastructure under African conditions. The ETD corpus at Wits presents a case study in the practical obstacles facing African Digital Humanities (DH) research: heterogeneous document formats, inconsistent metadata schemas, and the absence of standardised citation extraction practices calibrated for African institutional contexts. The project’s iterative, pilot-based design, which moves from data collection through methodology development to scalable metrics, offers a model for capacity-building that prioritises replicability and South-South transferability over dependence on high-end computational resources.
Third, and most critically, the project offers an opportunity to reflect on the encounter between computational method and decolonial inquiry. The use of AI-assisted citation analysis is not treated as a neutral technical exercise but as a methodological intervention in an ongoing political debate. Digitisation practices, bibliometric databases, and citation indices are themselves the products of particular choices about what counts as scholarship, which institutions belong to the archive, and whose research warrants discoverability. The paper reflects on how these infrastructural conditions shape what can be measured, and what cannot — and argues that any credible computational approach to African knowledge must account for the political economy of the tools it deploys.