Bakhtawar Abdalla (2310738)

Central Kurdish Named Entity Recognition (CKNER)

Project Abstract

The Central Kurdish (Sorani) dialect has a significant shortage of NLP resources and tools, particularly for Named Entity Recognition (NER). This critical shortage holds back the development of language technologies that could serve the millions of people who speak it, so advancing NER for this under-represented language is urgently needed. The main objective of this research project is to fill that gap by implementing an NER model built specifically for the Sorani Kurdish dialect, titled CKNER (Central Kurdish Named Entity Recognition). Because no specialized resource currently exists for this low-resource language, the project will create its own corpus, an annotated dataset, and an NER model that can accurately identify and classify named entities in Sorani texts. The work involves constructing a new Central Kurdish (Sorani) text corpus gathered from various sources. Native speakers will annotate all named entities, such as person names, locations, and organizations. Once the annotated data is ready, a modelling approach, such as statistical conditional random fields (CRF), another machine learning method, or another suitable technique, will be adapted to the linguistic characteristics of the Sorani dialect and implemented. Through an iterative process of model training and evaluation, the final goal is a Sorani Kurdish-oriented NER system that is accurate and performs well.

Keywords: Natural Language Processing (NLP), Named Entity Recognition (NER), Low-resource languages
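
As an illustration of the annotation and modelling steps described in the abstract, the following is a minimal sketch and not the project's actual implementation. It assumes a BIO tagging scheme over the three entity types named above (PER, LOC, ORG), uses English placeholder tokens in place of real Sorani text, and builds simple per-token feature dictionaries of the kind a CRF tagger could consume. The function names, the feature set, and the example sentence are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical annotated sentence. English placeholder tokens stand in for
# Sorani text; the BIO scheme and the tag names are assumptions.
TOKENS = ["Azad", "studies", "at", "Salahaddin", "University", "in", "Erbil"]
TAGS = ["B-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC"]


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build an illustrative feature dict for the token at position i."""
    word = tokens[i]
    return {
        "word": word,
        # Prefixes and suffixes are used here because Sorani script has no
        # capitalisation to signal proper nouns (assumed feature choice).
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "is_digit": word.isdigit(),
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse BIO tags into (entity text, entity type) pairs."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity still open.
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continue the entity that is currently open.
            current.append(token)
        else:
            # "O" or an inconsistent "I-" tag: close the open entity, if any.
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans


features = [token_features(TOKENS, i) for i in range(len(TOKENS))]
entities = bio_to_spans(TOKENS, TAGS)
```

Here `features` is the per-sentence input a sequence model would be trained on, paired with `TAGS` as the target labels, while `entities` recovers the annotated spans for inspection or evaluation.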

 

Conference Details

 

Session: Presentation Stream 1 at Presentation Slot 3

Location: GH049 on Tuesday 7th, 13:30 – 17:00

Markers: Chen Hu (GTA), Simon Robinson

Course: MSc Advanced Computer Science, Masters PG

Future Plans: I’m looking for work