X-SRL dataset and mBERT word aligner
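This code contains a method to automatically align words from parallel sentences using pre-trained multilingual BERT embeddings. It can be used to transfer source annotations (for example, labeled English sentences) to the target side (for example, a German translation of the sentence) by transferring each label to the best-aligned target word. The newly labeled data can be used to train different multilingual state-of-the-art models to improve performance, especially for lower-resource languages.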
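The label-transfer idea in the description can be illustrated with a minimal sketch. It is not the author's implementation: the small hand-written vectors stand in for multilingual BERT embeddings, the function names `cosine`, `align`, and `project_labels` are hypothetical, and choosing the target word with the highest cosine similarity is an assumption about what "best-aligned" means.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def align(src_vecs, tgt_vecs):
    """For each source word, return the index of the most similar target word."""
    return [
        max(range(len(tgt_vecs)), key=lambda j: cosine(s, tgt_vecs[j]))
        for s in src_vecs
    ]

def project_labels(src_labels, alignment, tgt_len, empty="O"):
    """Copy every non-empty source label onto its aligned target position."""
    tgt_labels = [empty] * tgt_len
    for i, label in enumerate(src_labels):
        if label != empty:
            tgt_labels[alignment[i]] = label
    return tgt_labels

# Toy data (assumption): in practice these vectors would come from mBERT.
src_words = ["She", "sleeps"]
tgt_words = ["Sie", "schläft"]
src_vecs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3]]
tgt_vecs = [[0.8, 0.2, 0.1], [0.0, 0.9, 0.2]]
src_labels = ["A0", "V"]  # illustrative role labels on the English side

alignment = align(src_vecs, tgt_vecs)
tgt_labels = project_labels(src_labels, alignment, len(tgt_words))
print(list(zip(tgt_words, tgt_labels)))
```

A one-directional argmax like this can map several source words to the same target word, so a real aligner would likely need an additional rule to resolve such conflicts.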
| Main author: | Daza, Angel |
|---|---|
| Document type: | Database, research data |
| Language: | English |
| Published: | Heidelberg: Universität, 2021-02-17 |
| DOI: | 10.11588/data/HVXXIJ |
| Online access: | Publisher, license required, full text: https://doi.org/10.11588/data/HVXXIJ; Publisher, free of charge, full text: https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/HVXXIJ |
| Statement of responsibility: | Angel Daza |
MARC
| LEADER | 00000nmi a2200000 c 4500 | ||
|---|---|---|---|
| 001 | 1748601830 | ||
| 003 | DE-627 | ||
| 005 | 20210218092806.0 | ||
| 006 | su| d|o |0 |0 | ||
| 007 | cr uuu---uuuuu | ||
| 008 | 210218c20219999xx |o | eng c | ||
| 024 | 7 | |a 10.11588/data/HVXXIJ |2 doi | |
| 035 | |a (DE-627)1748601830 | ||
| 035 | |a (DE-599)KXP1748601830 | ||
| 040 | |a DE-627 |b ger |c DE-627 |e rda | ||
| 041 | |a eng | ||
| 084 | |a 28 |2 sdnb | ||
| 100 | 1 | |a Daza, Angel |d 1989- |e VerfasserIn |0 (DE-588)1203323360 |0 (DE-627)1688152938 |4 aut | |
| 245 | 1 | 0 | |a X-SRL dataset and mBERT word aligner |c Angel Daza |
| 264 | 1 | |a Heidelberg |b Universität |c 2021-02-17 | |
| 300 | |a 1 Online-Ressource (2 Files) | ||
| 336 | |a Text |b txt |2 rdacontent | ||
| 336 | |a Computerdaten |b cod |2 rdacontent | ||
| 337 | |a Computermedien |b c |2 rdamedia | ||
| 338 | |a Online-Ressource |b cr |2 rdacarrier | ||
| 500 | |a Kind of data: Program source code | ||
| 500 | |a Gesehen am 18.02.2021 | ||
| 520 | |a This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source annotations (for example labeled English sentences) into the target side (for example a German translation of the sentence) by transferring the label into the best-aligned target word. This newly labeled data can be used to train different multilingual SOTA models to improve performance, especially for the lower-resource languages. | ||
| 655 | 7 | |a Forschungsdaten |0 (DE-588)1098579690 |0 (DE-627)857755366 |0 (DE-576)469182156 |2 gnd-content | |
| 655 | 7 | |a Datenbank |0 (DE-588)4011119-2 |0 (DE-627)106354256 |0 (DE-576)208891943 |2 gnd-content | |
| 787 | 0 | 8 | |i Forschungsdaten zu |a Daza, Angel, 1989 - |t X-SRL |d 2020 |w (DE-627)1748602551 |
| 856 | 4 | 0 | |u https://doi.org/10.11588/data/HVXXIJ |x Verlag |x Resolving-System |z lizenzpflichtig |3 Volltext |
| 856 | 4 | 0 | |u https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/HVXXIJ |x Verlag |z kostenfrei |3 Volltext |
| 951 | |a BO | ||
| 992 | |a 20210218 | ||
| 993 | |a ResearchData | ||
| 994 | |a 2021 | ||
| 998 | |g 1203323360 |a Daza, Angel |m 1203323360:Daza, Angel |d 90000 |e 90000PD1203323360 |k 0/90000/ |p 1 |x j |y j | ||
| 999 | |a KXP-PPN1748601830 |e 3858525421 | ||
| BIB | |a Y | ||
| JSO | |a {"language":["eng"],"recId":"1748601830","physDesc":[{"extent":"1 Online-Ressource (2 Files)"}],"note":["Kind of data: Program source code","Gesehen am 18.02.2021"],"type":{"bibl":"dataset","media":"Online-Ressource"},"person":[{"role":"aut","display":"Daza, Angel","roleDisplay":"VerfasserIn","given":"Angel","family":"Daza"}],"name":{"displayForm":["Angel Daza"]},"id":{"eki":["1748601830"],"doi":["10.11588/data/HVXXIJ"]},"origin":[{"dateIssuedDisp":"2021-02-17","publisher":"Universität","dateIssuedKey":"2021","publisherPlace":"Heidelberg"}],"title":[{"title":"X-SRL dataset and mBERT word aligner","title_sort":"X-SRL dataset and mBERT word aligner"}]} | ||
| SRT | |a DAZAANGELXSRLDATASE2021 | ||