Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention

Bibliographic Details
Main Authors: Ali, Wazir (Author), Kumar, Jay (Author), Tumrani, Saifullah (Author), Nour, Redhwan (Author), Noor, Adeeb (Author), Xu, Zenglin (Author)
Format: Article (Journal)
Language: English
Published: 2025
In: IEEE access
Year: 2025, Volume: 13, Pages: 183133-183142
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3507382
Online Access: Publisher, free of charge, full text: https://doi.org/10.1109/ACCESS.2024.3507382
Publisher, free of charge, full text: https://ieeexplore.ieee.org/document/10769409/authors
Author Notes: Wazir Ali, Jay Kumar, Saifullah Tumrani, Redhwan Nour, Adeeb Noor, and Zenglin Xu (Senior Member, IEEE)

MARC

LEADER 00000naa a2200000 c 4500
001 1927457734
003 DE-627
005 20250604130053.0
007 cr uuu---uuuuu
008 250604s2025 xx |||||o 00| ||eng c
024 7 |a 10.1109/ACCESS.2024.3507382  |2 doi 
035 |a (DE-627)1927457734 
035 |a (DE-599)KXP1927457734 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 32  |2 sdnb 
100 1 |a Ali, Wazir  |e VerfasserIn  |0 (DE-588)1367748224  |0 (DE-627)1927458455  |4 aut 
245 1 0 |a Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention  |c Wazir Ali, Jay Kumar, Saifullah Tumrani, Redhwan Nour, Adeeb Noor, and Zenglin Xu (Senior Member, IEEE) 
264 1 |c 2025 
300 |b Illustrationen 
300 |a 10 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Online veröffentlicht: 27. November 2024, Artikelversion: 13. Dezember 2024 
500 |a Gesehen am 04.06.2025 
520 |a Sindhi word segmentation is a challenging task due to space omission and insertion issues. The Sindhi language itself adds to this complexity. It’s cursive and consists of characters with inherent joining and non-joining properties, independent of word boundaries. Existing Sindhi word segmentation methods rely on designing and combining hand-crafted features. However, these methods have limitations, such as difficulty handling out-of-vocabulary words, limited robustness for other languages, and inefficiency with large amounts of noisy or raw text. Neural network-based models, in contrast, can automatically capture word boundary information without requiring prior knowledge. In this paper, we propose a Subword-Guided Neural Word Segmenter (SGNWS) that addresses word segmentation as a sequence labeling task. The SGNWS model incorporates subword representation learning through a bidirectional long short-term memory encoder, position-aware self-attention, and a conditional random field. Our empirical results demonstrate that the SGNWS model achieves state-of-the-art performance in Sindhi word segmentation on six datasets. 
650 4 |a Attention mechanism 
650 4 |a Computer science 
650 4 |a Context modeling 
650 4 |a Labeling 
650 4 |a Long short term memory 
650 4 |a long short-term memory 
650 4 |a neural network 
650 4 |a Noise measurement 
650 4 |a Recurrent neural networks 
650 4 |a representation learning 
650 4 |a Representation learning 
650 4 |a Robustness 
650 4 |a Tagging 
650 4 |a White spaces 
650 4 |a word segmentation 
700 1 |a Kumar, Jay  |e VerfasserIn  |4 aut 
700 1 |a Tumrani, Saifullah  |e VerfasserIn  |0 (DE-588)1367748585  |0 (DE-627)192745879X  |4 aut 
700 1 |a Nour, Redhwan  |e VerfasserIn  |4 aut 
700 1 |a Noor, Adeeb  |e VerfasserIn  |4 aut 
700 1 |a Xu, Zenglin  |e VerfasserIn  |4 aut 
773 0 8 |i Enthalten in  |a Institute of Electrical and Electronics Engineers  |t IEEE access  |d New York, NY : IEEE, 2013  |g 13(2025), Seite 183133-183142  |h Online-Ressource  |w (DE-627)728440385  |w (DE-600)2687964-5  |w (DE-576)373180713  |x 2169-3536  |7 nnas 
773 1 8 |g volume:13  |g year:2025  |g pages:183133-183142  |g extent:10  |a Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention 
856 4 0 |u https://doi.org/10.1109/ACCESS.2024.3507382  |x Verlag  |x Resolving-System  |z kostenfrei  |3 Volltext 
856 4 0 |u https://ieeexplore.ieee.org/document/10769409/authors  |x Verlag  |z kostenfrei  |3 Volltext 
951 |a AR 
992 |a 20250604 
993 |a Article 
994 |a 2025 
998 |g 1367748585  |a Tumrani, Saifullah  |m 1367748585:Tumrani, Saifullah  |d 700000  |d 716000  |e 700000PT1367748585  |e 716000PT1367748585  |k 0/700000/  |k 1/700000/716000/  |p 3 
999 |a KXP-PPN1927457734  |e 4731007747 
BIB |a Y 
SER |a journal 
JSO |a {"name":{"displayForm":["Wazir Ali, Jay Kumar, Saifullah Tumrani, Redhwan Nour, Adeeb Noor, and Zenglin Xu (Senior Member, IEEE)"]},"origin":[{"dateIssuedDisp":"2025","dateIssuedKey":"2025"}],"id":{"eki":["1927457734"],"doi":["10.1109/ACCESS.2024.3507382"]},"physDesc":[{"noteIll":"Illustrationen","extent":"10 S."}],"relHost":[{"origin":[{"publisher":"IEEE","dateIssuedKey":"2013","dateIssuedDisp":"2013-","publisherPlace":"New York, NY"}],"id":{"issn":["2169-3536"],"zdb":["2687964-5"],"eki":["728440385"]},"name":{"displayForm":["Institute of Electrical and Electronics Engineers"]},"physDesc":[{"extent":"Online-Ressource"}],"title":[{"title_sort":"IEEE access","title":"IEEE access","subtitle":"practical research, open solutions"}],"pubHistory":["1.2013 -"],"part":{"extent":"10","text":"13(2025), Seite 183133-183142","volume":"13","pages":"183133-183142","year":"2025"},"titleAlt":[{"title":"Access"}],"disp":"Institute of Electrical and Electronics EngineersIEEE access","note":["Gesehen am 24.10.12"],"type":{"media":"Online-Ressource","bibl":"periodical"},"language":["eng"],"corporate":[{"role":"aut","display":"Institute of Electrical and Electronics Engineers","roleDisplay":"VerfasserIn"}],"recId":"728440385"}],"person":[{"given":"Wazir","family":"Ali","role":"aut","display":"Ali, Wazir","roleDisplay":"VerfasserIn"},{"role":"aut","display":"Kumar, Jay","roleDisplay":"VerfasserIn","given":"Jay","family":"Kumar"},{"roleDisplay":"VerfasserIn","display":"Tumrani, Saifullah","role":"aut","family":"Tumrani","given":"Saifullah"},{"role":"aut","roleDisplay":"VerfasserIn","display":"Nour, Redhwan","given":"Redhwan","family":"Nour"},{"display":"Noor, Adeeb","roleDisplay":"VerfasserIn","role":"aut","family":"Noor","given":"Adeeb"},{"role":"aut","display":"Xu, Zenglin","roleDisplay":"VerfasserIn","given":"Zenglin","family":"Xu"}],"title":[{"title":"Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention","title_sort":"Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention"}],"note":["Online veröffentlicht: 27. November 2024, Artikelversion: 13. Dezember 2024","Gesehen am 04.06.2025"],"type":{"bibl":"article-journal","media":"Online-Ressource"},"recId":"1927457734","language":["eng"]} 
SRT |a ALIWAZIRKUENHANCINGS2025
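The abstract (field 520) frames Sindhi word segmentation as a sequence labeling task. The record does not say which tag scheme SGNWS uses. The helpers below are a minimal sketch of the general framing under one assumption: a four-tag B/I/E/S scheme, where B is the beginning of a word, I is inside a word, E is the end of a word, and S is a single-character word. They convert a segmentation into one label per character and back again. The function names `words_to_tags` and `tags_to_words` are hypothetical. Latin strings stand in for Sindhi script only for readability. In a model like the one described (a BiLSTM encoder with position-aware self-attention and a CRF layer), the network would predict the tag sequence. This sketch covers only how the labels are encoded and decoded.

```python
from typing import List

# Assumed tag scheme (not confirmed by the record):
# B = begin, I = inside, E = end, S = single-character word.


def words_to_tags(words: List[str]) -> List[str]:
    """Turn a gold segmentation (list of words) into one tag per character."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # empty tokens carry no characters, so they get no tags
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["I"] * (len(word) - 2))
            tags.append("E")
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from a character sequence and its predicted tags."""
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            # A new word starts before the previous one was closed:
            # flush the unfinished span instead of merging it.
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a final span with no closing E
        words.append(current)
    return words


if __name__ == "__main__":
    gold = ["word", "a", "segment"]
    tags = words_to_tags(gold)
    print(" ".join(tags))
    print(tags_to_words("wordasegment", tags))
```

Take a space-omitted input such as `wordasegment`. To recover the three words, the tagger only has to emit `B I I E S B I I I I I E`. Flushing on `B` and `S` keeps decoding well-defined when predicted tags are ill-formed. A CRF output layer is typically used to discourage such invalid transitions in the first place.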