X-SRL dataset and mBERT word aligner

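This code contains a method to automatically align words from parallel sentences using pre-trained multilingual BERT (mBERT) embeddings. The alignment can be used to transfer source annotations (for example, labeled English sentences) to the target side (for example, a German translation of the sentence) by transferring each label to the best-aligned target word. The newly labeled data can be used to train different multilingual state-of-the-art models to improve performance, especially for the lower-resource languages.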
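The label-transfer idea in the description can be illustrated with a minimal sketch. This is not the dataset's own code: the function names `cosine` and `project_labels`, the use of cosine similarity with a simple best-match rule, the toy 2-dimensional vectors (stand-ins for real mBERT embeddings), and the labels `A0`, `V`, `A1` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def project_labels(src_vecs, src_labels, tgt_vecs, empty="_"):
    """Copy each non-empty source label onto the most similar target token."""
    tgt_labels = [empty] * len(tgt_vecs)
    for vec, label in zip(src_vecs, src_labels):
        if label == empty:
            continue  # unlabeled source tokens transfer nothing
        # Index of the target token whose vector is closest to this source token.
        best = max(range(len(tgt_vecs)), key=lambda j: cosine(vec, tgt_vecs[j]))
        tgt_labels[best] = label
    return tgt_labels

# Toy vectors standing in for mBERT embeddings (values invented for illustration).
# Source (English): "She reads books"   Target (German): "Bücher liest sie"
src_vecs = [[1.0, 0.1], [0.1, 1.0], [0.7, 0.7]]
src_labels = ["A0", "V", "A1"]
tgt_vecs = [[0.6, 0.8], [0.2, 0.9], [0.9, 0.2]]

tgt_labels = project_labels(src_vecs, src_labels, tgt_vecs)
print(tgt_labels)
```

In this sketch a later source label silently overwrites an earlier one if both pick the same target token; a real aligner would need an explicit policy for such collisions and for subword tokens.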

Bibliographic details
Main author: Daza, Angel (author)
Document type: Database, research data
Language: English
Published: Heidelberg: Universität, 2021-02-17
DOI: 10.11588/data/HVXXIJ
Online access: Publisher, license required, full text: https://doi.org/10.11588/data/HVXXIJ
Publisher, free of charge, full text: https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/HVXXIJ
Statement of responsibility: Angel Daza

MARC

LEADER 00000nmi a2200000 c 4500
001 1748601830
003 DE-627
005 20210218092806.0
006 su| d|o |0 |0
007 cr uuu---uuuuu
008 210218c20219999xx |o | eng c
024 7 |a 10.11588/data/HVXXIJ  |2 doi 
035 |a (DE-627)1748601830 
035 |a (DE-599)KXP1748601830 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 28  |2 sdnb 
100 1 |a Daza, Angel  |d 1989-  |e VerfasserIn  |0 (DE-588)1203323360  |0 (DE-627)1688152938  |4 aut 
245 1 0 |a X-SRL dataset and mBERT word aligner  |c Angel Daza 
264 1 |a Heidelberg  |b Universität  |c 2021-02-17 
300 |a 1 Online-Ressource (2 Files) 
336 |a Text  |b txt  |2 rdacontent 
336 |a Computerdaten  |b cod  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Kind of data: Program source code 
500 |a Gesehen am 18.02.2021 
520 |a This code contains a method to automatically align words from parallel sentences by using multilingual BERT pre-trained embeddings. This can be used to transfer source annotations (for example labeled English sentences) into the target side (for example a German translation of the sentence) by transferring the label into the best-aligned target word. This newly labeled data can be used to train different multilingual SOTA models to improve performance, especially for the lower-resource languages. 
655 7 |a Forschungsdaten  |0 (DE-588)1098579690  |0 (DE-627)857755366  |0 (DE-576)469182156  |2 gnd-content 
655 7 |a Datenbank  |0 (DE-588)4011119-2  |0 (DE-627)106354256  |0 (DE-576)208891943  |2 gnd-content 
787 0 8 |i Forschungsdaten zu  |a Daza, Angel, 1989 -   |t X-SRL  |d 2020  |w (DE-627)1748602551 
856 4 0 |u https://doi.org/10.11588/data/HVXXIJ  |x Verlag  |x Resolving-System  |z lizenzpflichtig  |3 Volltext 
856 4 0 |u https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/HVXXIJ  |x Verlag  |z kostenfrei  |3 Volltext 
951 |a BO 
992 |a 20210218 
993 |a ResearchData 
994 |a 2021 
998 |g 1203323360  |a Daza, Angel  |m 1203323360:Daza, Angel  |d 90000  |e 90000PD1203323360  |k 0/90000/  |p 1  |x j  |y j 
999 |a KXP-PPN1748601830  |e 3858525421 
BIB |a Y 
JSO |a {"language":["eng"],"recId":"1748601830","physDesc":[{"extent":"1 Online-Ressource (2 Files)"}],"note":["Kind of data: Program source code","Gesehen am 18.02.2021"],"type":{"bibl":"dataset","media":"Online-Ressource"},"person":[{"role":"aut","display":"Daza, Angel","roleDisplay":"VerfasserIn","given":"Angel","family":"Daza"}],"name":{"displayForm":["Angel Daza"]},"id":{"eki":["1748601830"],"doi":["10.11588/data/HVXXIJ"]},"origin":[{"dateIssuedDisp":"2021-02-17","publisher":"Universität","dateIssuedKey":"2021","publisherPlace":"Heidelberg"}],"title":[{"title":"X-SRL dataset and mBERT word aligner","title_sort":"X-SRL dataset and mBERT word aligner"}]} 
SRT |a DAZAANGELXSRLDATASE2021