Crosslinguistic semantic textual similarity of Buddhist Chinese and Classical Tibetan

Bibliographic Details
Main Authors:	Felbur, Rafal (Author), Meelen, Marieke (Author), Vierthaler, Paul (Author)
Format:	Article (Journal)
Language:	English
Published:	04 October 2022
In:	Journal of open humanities data, Year: 2022, Volume: 8, Pages: 1-14
ISSN:	2059-481X
DOI:	10.5334/johd.86
Online Access:	Publisher, free of charge, full text: https://doi.org/10.5334/johd.86; Publisher, free of charge, full text: https://openhumanitiesdata.metajnl.com/articles/10.5334/johd.86
Author Notes:	Rafal Felbur, Marieke Meelen, Paul Vierthaler

Description
Summary:	In this paper we present the first-ever procedure for identifying highly similar sequences of text in Chinese and Tibetan translations of Buddhist sūtra literature. We initially propose this procedure as an aid to scholars engaged in the philological study of Buddhist documents. We create a cross-lingual embedding space by taking the cosine similarity of average sequence vectors in order to produce unsupervised similar cross-linguistic parallel alignments at word, sentence, and even paragraph level. Initial results show that our method lays a solid foundation for the future development of a fully-fledged Information Retrieval tool for these (and potentially other) low-resource historical languages.
Item Description:	Viewed on 9 December 2025
Physical Description:	Online Resource
