Saraiki language characters dataset (SLCD)

Bibliographic Details
Title: Saraiki language characters dataset (SLCD)
Authors: Muhammad Ahmad Khan, Khalil Khan, Abdulrahman Aloraini, Rehan Ullah Khan
Source: Data in Brief, Vol 54, Pp 110473 (2024)
Publisher Information: Elsevier, 2024.
Publication Year: 2024
Collection: LCC:Computer applications to medicine. Medical informatics
LCC:Science (General)
Subject Terms: Natural language processing, Text recognition, Machine learning, Optical character recognition, Computer applications to medicine. Medical informatics, R858-859.7, Science (General), Q1-390
More Details: About 26 million people worldwide use the Saraiki language [1]. Saraiki is widely spoken in the southern parts of Punjab and Sindh. Dera Ghazi Khan, where over 90% of the population speaks Saraiki, is one of the most important Saraiki cultural hubs. Calligraphers write the language in a sophisticated script. Despite the vast body of Optical Character Recognition (OCR) research dedicated to other languages, the Saraiki language still lacks a fully functional OCR system [2,3]. This work presents a genuine dataset of Saraiki handwritten characters, consisting of 50,000 scanned images, and makes it publicly accessible for scientific research. All of the images contain handwritten text contributed by teachers and students from Pak-Austria Fachhochschule for Applied Sciences and Technology, Pakistan. Around 1000 people, roughly half men and half women, contributed to writing this text.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2352-3409
Relation: http://www.sciencedirect.com/science/article/pii/S2352340924004426; https://doaj.org/toc/2352-3409
DOI: 10.1016/j.dib.2024.110473
Access URL: https://doaj.org/article/957d2e56afb04722b7df3c75b2c83152
Accession Number: edsdoj.957d2e56afb04722b7df3c75b2c83152
Database: Directory of Open Access Journals