Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention
Sindhi word segmentation is a challenging task due to space omission and insertion issues. The Sindhi language itself adds to this complexity. It’s cursive and consists of characters with inherent joining and non-joining properties, independent of word boundaries. Existing Sindhi word segmentation m...
| Main Authors: | Wazir Ali, Jay Kumar, Saifullah Tumrani, Redhwan Nour, Adeeb Noor, Zenglin Xu |
|---|---|
| Format: | Article (Journal) |
| Language: | English |
| Published: | 2025 |
| In: | IEEE access, Year: 2025, Volume: 13, Pages: 183133-183142 |
| ISSN: | 2169-3536 |
| DOI: | 10.1109/ACCESS.2024.3507382 |
| Online Access: | Publisher, free access, full text: https://doi.org/10.1109/ACCESS.2024.3507382 ; Publisher, free access, full text: https://ieeexplore.ieee.org/document/10769409/authors |
| Author Notes: | Wazir Ali, Jay Kumar, Saifullah Tumrani, Redhwan Nour, Adeeb Noor, and Zenglin Xu (Senior Member, IEEE) |
MARC
| LEADER | 00000naa a2200000 c 4500 | ||
|---|---|---|---|
| 001 | 1927457734 | ||
| 003 | DE-627 | ||
| 005 | 20250604130053.0 | ||
| 007 | cr uuu---uuuuu | ||
| 008 | 250604s2025 xx |||||o 00| ||eng c | ||
| 024 | 7 | |a 10.1109/ACCESS.2024.3507382 |2 doi | |
| 035 | |a (DE-627)1927457734 | ||
| 035 | |a (DE-599)KXP1927457734 | ||
| 040 | |a DE-627 |b ger |c DE-627 |e rda | ||
| 041 | |a eng | ||
| 084 | |a 32 |2 sdnb | ||
| 100 | 1 | |a Ali, Wazir |e VerfasserIn |0 (DE-588)1367748224 |0 (DE-627)1927458455 |4 aut | |
| 245 | 1 | 0 | |a Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention |c Wazir Ali, Jay Kumar, Saifullah Tumrani, Redhwan Nour, Adeeb Noor, and Zenglin Xu (Senior Member, IEEE) |
| 264 | 1 | |c 2025 | |
| 300 | |b Illustrationen | ||
| 300 | |a 10 | ||
| 336 | |a Text |b txt |2 rdacontent | ||
| 337 | |a Computermedien |b c |2 rdamedia | ||
| 338 | |a Online-Ressource |b cr |2 rdacarrier | ||
| 500 | |a Online veröffentlicht: 27. November 2024, Artikelversion: 13. Dezember 2024 | ||
| 500 | |a Gesehen am 04.06.2025 | ||
| 520 | |a Sindhi word segmentation is a challenging task due to space omission and insertion issues. The Sindhi language itself adds to this complexity. It’s cursive and consists of characters with inherent joining and non-joining properties, independent of word boundaries. Existing Sindhi word segmentation methods rely on designing and combining hand-crafted features. However, these methods have limitations, such as difficulty handling out-of-vocabulary words, limited robustness for other languages, and inefficiency with large amounts of noisy or raw text. Neural network-based models, in contrast, can automatically capture word boundary information without requiring prior knowledge. In this paper, we propose a Subword-Guided Neural Word Segmenter (SGNWS) that addresses word segmentation as a sequence labeling task. The SGNWS model incorporates subword representation learning through a bidirectional long short-term memory encoder, position-aware self-attention, and a conditional random field. Our empirical results demonstrate that the SGNWS model achieves state-of-the-art performance in Sindhi word segmentation on six datasets. | ||
| 650 | 4 | |a Attention mechanism | |
| 650 | 4 | |a Computer science | |
| 650 | 4 | |a Context modeling | |
| 650 | 4 | |a Labeling | |
| 650 | 4 | |a Long short term memory | |
| 650 | 4 | |a long short-term memory | |
| 650 | 4 | |a neural network | |
| 650 | 4 | |a Noise measurement | |
| 650 | 4 | |a Recurrent neural networks | |
| 650 | 4 | |a representation learning | |
| 650 | 4 | |a Representation learning | |
| 650 | 4 | |a Robustness | |
| 650 | 4 | |a Tagging | |
| 650 | 4 | |a White spaces | |
| 650 | 4 | |a word segmentation | |
| 700 | 1 | |a Kumar, Jay |e VerfasserIn |4 aut | |
| 700 | 1 | |a Tumrani, Saifullah |e VerfasserIn |0 (DE-588)1367748585 |0 (DE-627)192745879X |4 aut | |
| 700 | 1 | |a Nour, Redhwan |e VerfasserIn |4 aut | |
| 700 | 1 | |a Noor, Adeeb |e VerfasserIn |4 aut | |
| 700 | 1 | |a Xu, Zenglin |e VerfasserIn |4 aut | |
| 773 | 0 | 8 | |i Enthalten in |a Institute of Electrical and Electronics Engineers |t IEEE access |d New York, NY : IEEE, 2013 |g 13(2025), Seite 183133-183142 |h Online-Ressource |w (DE-627)728440385 |w (DE-600)2687964-5 |w (DE-576)373180713 |x 2169-3536 |7 nnas |
| 773 | 1 | 8 | |g volume:13 |g year:2025 |g pages:183133-183142 |g extent:10 |a Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention |
| 856 | 4 | 0 | |u https://doi.org/10.1109/ACCESS.2024.3507382 |x Verlag |x Resolving-System |z kostenfrei |3 Volltext |
| 856 | 4 | 0 | |u https://ieeexplore.ieee.org/document/10769409/authors |x Verlag |z kostenfrei |3 Volltext |
| 951 | |a AR | ||
| 992 | |a 20250604 | ||
| 993 | |a Article | ||
| 994 | |a 2025 | ||
| 998 | |g 1367748585 |a Tumrani, Saifullah |m 1367748585:Tumrani, Saifullah |d 700000 |d 716000 |e 700000PT1367748585 |e 716000PT1367748585 |k 0/700000/ |k 1/700000/716000/ |p 3 | ||
| 999 | |a KXP-PPN1927457734 |e 4731007747 | ||
| BIB | |a Y | ||
| SER | |a journal | ||
| JSO | |a {"name":{"displayForm":["Wazir Ali, Jay Kumar, Saifullah Tumrani, Redhwan Nour, Adeeb Noor, and Zenglin Xu (Senior Member, IEEE)"]},"origin":[{"dateIssuedDisp":"2025","dateIssuedKey":"2025"}],"id":{"eki":["1927457734"],"doi":["10.1109/ACCESS.2024.3507382"]},"physDesc":[{"noteIll":"Illustrationen","extent":"10 S."}],"relHost":[{"origin":[{"publisher":"IEEE","dateIssuedKey":"2013","dateIssuedDisp":"2013-","publisherPlace":"New York, NY"}],"id":{"issn":["2169-3536"],"zdb":["2687964-5"],"eki":["728440385"]},"name":{"displayForm":["Institute of Electrical and Electronics Engineers"]},"physDesc":[{"extent":"Online-Ressource"}],"title":[{"title_sort":"IEEE access","title":"IEEE access","subtitle":"practical research, open solutions"}],"pubHistory":["1.2013 -"],"part":{"extent":"10","text":"13(2025), Seite 183133-183142","volume":"13","pages":"183133-183142","year":"2025"},"titleAlt":[{"title":"Access"}],"disp":"Institute of Electrical and Electronics EngineersIEEE access","note":["Gesehen am 24.10.12"],"type":{"media":"Online-Ressource","bibl":"periodical"},"language":["eng"],"corporate":[{"role":"aut","display":"Institute of Electrical and Electronics Engineers","roleDisplay":"VerfasserIn"}],"recId":"728440385"}],"person":[{"given":"Wazir","family":"Ali","role":"aut","display":"Ali, Wazir","roleDisplay":"VerfasserIn"},{"role":"aut","display":"Kumar, Jay","roleDisplay":"VerfasserIn","given":"Jay","family":"Kumar"},{"roleDisplay":"VerfasserIn","display":"Tumrani, Saifullah","role":"aut","family":"Tumrani","given":"Saifullah"},{"role":"aut","roleDisplay":"VerfasserIn","display":"Nour, Redhwan","given":"Redhwan","family":"Nour"},{"display":"Noor, Adeeb","roleDisplay":"VerfasserIn","role":"aut","family":"Noor","given":"Adeeb"},{"role":"aut","display":"Xu, Zenglin","roleDisplay":"VerfasserIn","given":"Zenglin","family":"Xu"}],"title":[{"title":"Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention","title_sort":"Enhancing Sindhi word segmentation using subword representation learning and position-aware self-attention"}],"note":["Online veröffentlicht: 27. November 2024, Artikelversion: 13. Dezember 2024","Gesehen am 04.06.2025"],"type":{"bibl":"article-journal","media":"Online-Ressource"},"recId":"1927457734","language":["eng"]} | ||
| SRT | |a ALIWAZIRKUENHANCINGS2025 | ||
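
The abstract in field 520 frames word segmentation as a sequence labeling task. As a rough illustration of what that framing can look like, the sketch below converts a list of already segmented words into per-character boundary tags and back. The B/I/E/S tag set and the function names `words_to_tags` and `tags_to_words` are assumptions made for this example; the record does not state which tagging scheme the SGNWS model uses, and the model itself (BiLSTM encoder, position-aware self-attention, CRF) is not reproduced here. Latin placeholder strings stand in for Sindhi text.

```python
from typing import List, Tuple


def words_to_tags(words: List[str]) -> Tuple[List[str], List[str]]:
    """Flatten segmented words into characters with B/I/E/S boundary tags.

    Assumed scheme: B = begin, I = inside, E = end, S = single-character word.
    """
    chars: List[str] = []
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # skip empty tokens rather than emit a tag for them
        chars.extend(word)
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return chars, tags


def tags_to_words(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild words from characters and their boundary tags."""
    words: List[str] = []
    current: List[str] = []
    for ch, tag in zip(chars, tags):
        current.append(ch)
        if tag in ("E", "S"):  # a word closes on E or S
            words.append("".join(current))
            current = []
    if current:  # tolerate a tag sequence that ends mid-word
        words.append("".join(current))
    return words


if __name__ == "__main__":
    chars, tags = words_to_tags(["word", "a", "seg"])
    print(list(zip(chars, tags)))
    print(tags_to_words(chars, tags))
```

Under this framing, a sequence labeler would predict one tag per character, and a decoding step like `tags_to_words` would recover the word boundaries regardless of where spaces were omitted or inserted in the raw text.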