What do we need to know about an unknown word when parsing German

We propose a new type of subword embedding designed to provide more information about unknown compounds, a major source for OOV words in German. We present an extrinsic evaluation where we use the compound embeddings as input to a neural dependency parser and compare the results to the ones obtained...


Bibliographic Details
Main Authors: Do, Bich-Ngoc (Author), Rehbein, Ines (Author), Frank, Anette (Author)
Format: Chapter/Article Conference Paper
Language: English
Published: September 2017
In: First Workshop on Subword and Character Level Models in NLP - proceedings of the workshop
Year: 2017, Pages: 117-123
DOI: 10.18653/v1/W17-4117
Online Access: Publisher, license required, full text: https://doi.org/10.18653/v1/W17-4117
Publisher, license required, full text: https://aclanthology.org/W17-4117
Author Notes: Bich-Ngoc Do, Ines Rehbein, Anette Frank

MARC

LEADER 00000naa a2200000 c 4500
001 1870904451
003 DE-627
005 20231122191843.0
007 cr uuu---uuuuu
008 231122s2017 xx |||||o 00| ||eng c
024 7 |a 10.18653/v1/W17-4117  |2 doi 
035 |a (DE-627)1870904451 
035 |a (DE-599)KXP1870904451 
035 |a (OCoLC)1410406972 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 28  |2 sdnb 
100 1 |a Do, Bich-Ngoc  |d 1989-  |e VerfasserIn  |0 (DE-588)1208354051  |0 (DE-627)1694619745  |4 aut 
245 1 0 |a What do we need to know about an unknown word when parsing German  |c Bich-Ngoc Do, Ines Rehbein, Anette Frank 
264 1 |c September 2017 
300 |a 7 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 22.11.2023 
520 |a We propose a new type of subword embedding designed to provide more information about unknown compounds, a major source for OOV words in German. We present an extrinsic evaluation where we use the compound embeddings as input to a neural dependency parser and compare the results to the ones obtained with other types of embeddings. Our evaluation shows that adding compound embeddings yields a significant improvement of 2% LAS over using word embeddings when no POS information is available. When adding POS embeddings to the input, however, the effect levels out. This suggests that it is not the missing information about the semantics of the unknown words that causes problems for parsing German, but the lack of morphological information for unknown words. To augment our evaluation, we also test the new embeddings in a language modelling task that requires both syntactic and semantic information. 
700 1 |a Rehbein, Ines  |e VerfasserIn  |0 (DE-588)1207353833  |0 (DE-627)1693632373  |4 aut 
700 1 |a Frank, Anette  |e VerfasserIn  |0 (DE-588)1020288108  |0 (DE-627)691172161  |0 (DE-576)36005689X  |4 aut 
773 0 8 |i Enthalten in  |a Workshop on Subword and Character Level Models in NLP (1. : 2017 : Kopenhagen)  |t First Workshop on Subword and Character Level Models in NLP - proceedings of the workshop  |d Stroudsburg, PA : Association for Computational Linguistics (ACL), 2017  |g (2017), Seite 117-123  |h 1 Online-Ressource (xii, 169 Seiten, 4,14 MB)  |w (DE-627)1001283872  |z 9781945626913  |7 nnam 
773 1 8 |g year:2017  |g pages:117-123  |g extent:7  |a What do we need to know about an unknown word when parsing German 
787 0 8 |i Forschungsdaten  |a Do, Bich-Ngoc, 1989 -   |t Head selection parsers and LSTM labelers  |d Heidelberg : Universität, 2023  |h 1 Online-Ressource (19 Files)  |w (DE-627)1870904370 
856 4 0 |u https://doi.org/10.18653/v1/W17-4117  |x Verlag  |x Resolving-System  |z lizenzpflichtig  |3 Volltext 
856 4 0 |u https://aclanthology.org/W17-4117  |x Verlag  |z lizenzpflichtig  |3 Volltext 
951 |a AR 
992 |a 20231122 
993 |a ConferencePaper 
994 |a 2017 
998 |g 1020288108  |a Frank, Anette  |m 1020288108:Frank, Anette  |d 90000  |d 90500  |e 90000PF1020288108  |e 90500PF1020288108  |k 0/90000/  |k 1/90000/90500/  |p 3  |y j 
998 |g 1207353833  |a Rehbein, Ines  |m 1207353833:Rehbein, Ines  |d 90000  |e 90000PR1207353833  |k 0/90000/  |p 2 
998 |g 1208354051  |a Do, Bich-Ngoc  |m 1208354051:Do, Bich-Ngoc  |d 90000  |d 90500  |e 90000PD1208354051  |e 90500PD1208354051  |k 0/90000/  |k 1/90000/90500/  |p 1  |x j 
999 |a KXP-PPN1870904451  |e 4415876641 
BIB |a Y 
JSO |a {"note":["Gesehen am 22.11.2023"],"type":{"media":"Online-Ressource","bibl":"chapter"},"recId":"1870904451","language":["eng"],"title":[{"title":"What do we need to know about an unknown word when parsing German","title_sort":"What do we need to know about an unknown word when parsing German"}],"person":[{"given":"Bich-Ngoc","family":"Do","role":"aut","display":"Do, Bich-Ngoc","roleDisplay":"VerfasserIn"},{"family":"Rehbein","given":"Ines","display":"Rehbein, Ines","roleDisplay":"VerfasserIn","role":"aut"},{"role":"aut","display":"Frank, Anette","roleDisplay":"VerfasserIn","given":"Anette","family":"Frank"}],"physDesc":[{"extent":"7 S."}],"relHost":[{"part":{"extent":"7","text":"(2017), Seite 117-123","pages":"117-123","year":"2017"},"titleAlt":[{"title":"Proceedings of the First Workshop on Subword and Character Level Models in NLP"}],"recId":"1001283872","language":["eng"],"corporate":[{"display":"Workshop on Subword and Character Level Models in NLP (1., 2017, Kopenhagen)","roleDisplay":"VerfasserIn","role":"aut"},{"role":"isb","roleDisplay":"Herausgebendes Organ","display":"Association for Computational Linguistics"}],"type":{"bibl":"book","media":"Online-Ressource"},"disp":"Workshop on Subword and Character Level Models in NLP (1. : 2017 : Kopenhagen)First Workshop on Subword and Character Level Models in NLP - proceedings of the workshop","note":["\"Editors: Manaal Faruqui, Hinrich Schuetze, Isabel Trancoso, Yadollah Yaghoobzadeh\" - Startseite der Ressource","Literaturangaben"],"title":[{"title_sort":"First Workshop on Subword and Character Level Models in NLP - proceedings of the workshop","title":"First Workshop on Subword and Character Level Models in NLP - proceedings of the workshop","subtitle":"September 7, 2017, Copenhagen, Denmark : EMNLP 2017"}],"person":[{"role":"edt","roleDisplay":"HerausgeberIn","display":"Faruqui, Manaal","given":"Manaal","family":"Faruqui"},{"role":"edt","roleDisplay":"HerausgeberIn","display":"Schuetze, Hinrich","given":"Hinrich","family":"Schuetze"},{"family":"Trancoso","given":"Isabel","display":"Trancoso, Isabel","roleDisplay":"HerausgeberIn","role":"edt"},{"given":"Yadollah","family":"Yaghoobzadeh","role":"edt","display":"Yaghoobzadeh, Yadollah","roleDisplay":"HerausgeberIn"}],"physDesc":[{"noteIll":"Illustrationen","extent":"1 Online-Ressource (xii, 169 Seiten, 4,14 MB)"}],"id":{"isbn":["9781945626913"],"eki":["1001283872"]},"origin":[{"dateIssuedDisp":"[2017]","publisher":"Association for Computational Linguistics (ACL)","dateIssuedKey":"2017","publisherPlace":"Stroudsburg, PA"}]}],"origin":[{"dateIssuedKey":"2017","dateIssuedDisp":"September 2017"}],"id":{"doi":["10.18653/v1/W17-4117"],"eki":["1870904451"]},"name":{"displayForm":["Bich-Ngoc Do, Ines Rehbein, Anette Frank"]}} 
SRT |a DOBICHNGOCWHATDOWENE2017