Building NLP/HTR Infrastructure for Ajami Manuscripts

Presented by

  • Oreen Yousuf
    Uppsala University

Ajami refers to African languages written in the Arabic script. The use of Ajami began in the 10th century and spread to various sociopolitical states across Africa. Existing handwritten text recognition (HTR) and optical character recognition (OCR) systems perform poorly on Ajami manuscripts, and text-based NLP models perform poorly when analyzing digital Ajami text. This is mainly due to a lack of research on, and inclusion of, Ajami manuscripts in NLP and the Digital Humanities as a whole. I will present my work on building HTR/OCR and NLP infrastructure for Ajami manuscripts, covering the historical and linguistic background of Ajami, data curation, and the tasks currently in progress.

Supported by

  • Point Sud
  • STIAS (Stellenbosch Institute for Advanced Study)
  • Deutsche Forschungsgemeinschaft (DFG)
  • Goethe University Frankfurt
  • University of Bayreuth / Africa Multiple
  • King's College London
  • SADiLaR

© 2026 Frédérick Madore, Vincent Hiribarren, Emmanuel Ngue Um, Menno van Zaanen. All rights reserved.