X-SRL dataset and mBERT word aligner
| Main author: | Angel Daza |
|---|---|
| Document type: | Database, research data |
| Language: | English |
| Published: | Heidelberg: Universität, 2021-02-17 |
| DOI: | 10.11588/data/HVXXIJ |
| Keywords: | |
| Online access: | https://doi.org/10.11588/data/HVXXIJ (publisher, full text, license required); https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/HVXXIJ (publisher, full text, free of charge) |
| Statement of responsibility: | Angel Daza |
| Summary: | This code provides a method for automatically aligning words in parallel sentences using pre-trained multilingual BERT (mBERT) embeddings. The alignments can be used to transfer source-side annotations (for example, labeled English sentences) to the target side (for example, a German translation of the sentence) by assigning each label to the best-aligned target word. The newly labeled data can then be used to train multilingual state-of-the-art models and improve their performance, especially for lower-resource languages. |
|---|---|
| Description: | Kind of data: Program source code. Viewed on 2021-02-18. |
| Description: | Online resource |
| DOI: | 10.11588/data/HVXXIJ |
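
The label transfer described in the summary can be illustrated with a minimal sketch. It is not the dataset's actual code: it assumes that one embedding vector per word has already been computed (for instance with mBERT), and it uses greedy cosine-similarity matching as a stand-in for "best-aligned". The names `cosine`, `align_words`, and `project_labels`, the toy vectors, and the `"O"` empty label are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_words(src_vecs: List[Vector], tgt_vecs: List[Vector]) -> List[int]:
    """For each source word, return the index of the most similar target word.

    Assumption: greedy argmax over cosine similarity; the published code
    may use a different alignment criterion.
    """
    if not tgt_vecs:
        raise ValueError("target sentence has no word vectors")
    return [
        max(range(len(tgt_vecs)), key=lambda j: cosine(s, tgt_vecs[j]))
        for s in src_vecs
    ]


def project_labels(
    src_labels: List[str], alignment: List[int], tgt_len: int, empty: str = "O"
) -> List[str]:
    """Copy every non-empty source label onto its aligned target word."""
    tgt_labels = [empty] * tgt_len
    for i, label in enumerate(src_labels):
        if label != empty:
            tgt_labels[alignment[i]] = label
    return tgt_labels


# Toy vectors standing in for word embeddings (hypothetical values).
src_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tgt_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.1, 0.9, 0.0]]
src_labels = ["A0", "V", "A1"]

alignment = align_words(src_vecs, tgt_vecs)
tgt_labels = project_labels(src_labels, alignment, len(tgt_vecs))
print(alignment, tgt_labels)
```

In this sketch, two source words that map to the same target word would overwrite each other's labels, and unaligned target words keep the empty label; a real pipeline would need an explicit policy for both cases.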