X-SRL dataset and mBERT word aligner

Bibliographic details
Main author:	Daza, Angel (Author)
Document type:	Database, research data
Language:	English
Published:	Heidelberg Universität 2021-02-17
DOI:	10.11588/data/HVXXIJ
Subjects:	Research data; Database
Online access:	Publisher, license required, full text: https://doi.org/10.11588/data/HVXXIJ Publisher, free of charge, full text: https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/HVXXIJ
Statement of responsibility:	Angel Daza

Description
Summary:	This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source annotations (for example, labeled English sentences) to the target side (for example, a German translation of the sentence) by transferring each label to the best-aligned target word. The newly labeled data can be used to train different multilingual state-of-the-art models to improve performance, especially for lower-resource languages.
Note:	Kind of data: Program source code. Accessed 2021-02-18
Note:	Online resource
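
Illustration:	The summary describes transferring a label from each source word to its best-aligned target word. The following is a minimal sketch of that general idea, not the dataset's actual code. It assumes that one embedding vector per word is already available; the three-dimensional vectors, the example sentence pair, the role labels, and the function names `cosine`, `align_words`, and `project_labels` are all hypothetical stand-ins for real multilingual BERT embeddings and the author's implementation. Alignment here is a plain argmax over cosine similarity, which may differ from the method used in the published code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_words(src_vecs, tgt_vecs):
    """For each source word, return the index of the most similar target word."""
    return [
        max(range(len(tgt_vecs)), key=lambda j: cosine(s, tgt_vecs[j]))
        for s in src_vecs
    ]


def project_labels(src_labels, alignment, n_tgt, empty="O"):
    """Copy each non-empty source label to its aligned target position."""
    tgt_labels = [empty] * n_tgt
    for i, label in enumerate(src_labels):
        if label != empty:
            tgt_labels[alignment[i]] = label
    return tgt_labels


# Toy data (assumption): hand-made vectors standing in for mBERT embeddings.
src_words = ["She", "bought", "books"]
tgt_words = ["Sie", "hat", "Bücher", "gekauft"]
src_labels = ["A0", "V", "A1"]  # hypothetical role labels on the English side

src_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.1], [0.0, 0.1, 1.0]]
tgt_vecs = [[0.9, 0.1, 0.0], [0.3, 0.3, 0.3], [0.0, 0.2, 0.9], [0.1, 0.9, 0.0]]

alignment = align_words(src_vecs, tgt_vecs)
tgt_labels = project_labels(src_labels, alignment, len(tgt_words))

for word, label in zip(tgt_words, tgt_labels):
    print(word, label)
```

In this toy case the auxiliary "hat" has no close counterpart on the source side, so it keeps the empty label. With real embeddings, several source words can select the same target word; a practical aligner would need a rule for such conflicts, which this sketch does not handle (a later label simply overwrites an earlier one).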