A harmonised testsuite for POS tagging of German social media data
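We present a testsuite for POS tagging German web data. Our testsuite provides the original raw text as well as the gold tokenisations and is annotated for parts-of-speech. The testsuite includes a new dataset for German tweets, with a current size of 3,940 tokens. To increase the size of the data, we harmonised the annotations in already existing web corpora, based on the Stuttgart-Tübingen Tag Set. The current version of the corpus has an overall size of 48,344 tokens of web data, around half of it from Twitter. We also present experiments, showing how different experimental setups (training set size, additional out-of-domain training data, self-training) influence the accuracy of the taggers. All resources and models will be made publicly available to the research community.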
| Main authors: | Ines Rehbein, Josef Ruppenhofer, Victor Zimmermann |
|---|---|
| Document type: | Chapter/article, conference paper |
| Language: | German |
| Published: | 29 September 2018 |
| In: | The 27th International Conference on Computational Linguistics - proceedings of the conference. Year: 2018, Pages: 18-28 |
| Online access: | Publisher, licence required, full text: https://ids-pub.bsz-bw.de/frontdoor/index/index/year/2018/docId/7931 |
| Statement of responsibility: | Ines Rehbein, Josef Ruppenhofer, Victor Zimmermann |
MARC
| LEADER | 00000caa a2200000 c 4500 | ||
|---|---|---|---|
| 001 | 1697045855 | ||
| 003 | DE-627 | ||
| 005 | 20220818065238.0 | ||
| 007 | cr uuu---uuuuu | ||
| 008 | 200504s2018 xx |||||o 00| ||ger c | ||
| 035 | |a (DE-627)1697045855 | ||
| 035 | |a (DE-599)KXP1697045855 | ||
| 035 | |a (OCoLC)1341318179 | ||
| 040 | |a DE-627 |b ger |c DE-627 |e rda | ||
| 041 | |a ger | ||
| 084 | |a 28 |2 sdnb | ||
| 100 | 1 | |a Rehbein, Ines |e VerfasserIn |0 (DE-588)1207353833 |0 (DE-627)1693632373 |4 aut | |
| 245 | 1 | 2 | |a A harmonised testsuite for POS tagging of German social media data |c Ines Rehbein, Josef Ruppenhofer, Victor Zimmermann |
| 264 | 1 | |c 29.09.2018 | |
| 300 | |a 11 | ||
| 336 | |a Text |b txt |2 rdacontent | ||
| 337 | |a Computermedien |b c |2 rdamedia | ||
| 338 | |a Online-Ressource |b cr |2 rdacarrier | ||
| 500 | |a Gesehen am 04.05.2020 | ||
| 520 | |a We present a testsuite for POS tagging German web data. Our testsuite provides the original raw text as well as the gold tokenisations and is annotated for parts-of-speech. The testsuite includes a new dataset for German tweets, with a current size of 3,940 tokens. To increase the size of the data, we harmonised the annotations in already existing web corpora, based on the Stuttgart-Tübingen Tag Set. The current version of the corpus has an overall size of 48,344 tokens of web data, around half of it from Twitter. We also present experiments, showing how different experimental setups (training set size, additional out-of-domain training data, self-training) influence the accuracy of the taggers. All resources and models will be made publicly available to the research community. | ||
| 700 | 1 | |a Ruppenhofer, Josef |d 1971- |e VerfasserIn |0 (DE-588)132071037 |0 (DE-627)517466287 |0 (DE-576)298927993 |4 aut | |
| 700 | 1 | |a Zimmermann, Victor |e VerfasserIn |0 (DE-588)1209510340 |0 (DE-627)1697044492 |4 aut | |
| 773 | 0 | 8 | |i Enthalten in |a International Conference on Computational Linguistics (27. : 2018 : Santa Fe, NM) |t The 27th International Conference on Computational Linguistics - proceedings of the conference |d [Stroudsburg, PA] : Association for Computational Linguistics, 2018 |g (2018), Seite 18-28 |h 1 Online-Ressource (lxxi, 3927 Seiten) |w (DE-627)1678566128 |z 9781948087506 |7 nnam |
| 773 | 1 | 8 | |g year:2018 |g pages:18-28 |g extent:11 |a A harmonised testsuite for POS tagging of German social media data |
| 787 | 0 | 8 | |i Forschungsdaten |a Rehbein, Ines |t A harmonised testsuite for social media POS tagging (DE) |d Heidelberg : Universität, 2020 |h 1 Online-Ressource (1 File) |w (DE-627)1697044018 |
| 787 | 0 | 8 | |i Forschungsdaten |a Rehbein, Ines |t Pre-trained POS tagging models for German social media |d Heidelberg : Universität, 2020 |h 1 Online-Ressource (4 Files) |w (DE-627)1697046681 |
| 856 | 4 | 0 | |u https://ids-pub.bsz-bw.de/frontdoor/index/index/year/2018/docId/7931 |x Verlag |z lizenzpflichtig |3 Volltext |
| 951 | |a AR | ||
| 992 | |a 20200504 | ||
| 993 | |a ConferencePaper | ||
| 994 | |a 2018 | ||
| 998 | |g 1209510340 |a Zimmermann, Victor |m 1209510340:Zimmermann, Victor |p 3 |y j | ||
| 998 | |g 1207353833 |a Rehbein, Ines |m 1207353833:Rehbein, Ines |p 1 |x j | ||
| 999 | |a KXP-PPN1697045855 |e 3656856702 | ||
| BIB | |a Y | ||
| JSO | |a {"relHost":[{"id":{"eki":["1678566128"],"isbn":["9781948087506"]},"origin":[{"publisher":"Association for Computational Linguistics","dateIssuedKey":"2018","dateIssuedDisp":"2018","publisherPlace":"[Stroudsburg, PA]"}],"physDesc":[{"extent":"1 Online-Ressource (lxxi, 3927 Seiten)","noteIll":"Illustrationen"}],"title":[{"subtitle":"August 20-26, 2018, Santa Fe, New Mexico, USA : COLING 2018","title":"The 27th International Conference on Computational Linguistics - proceedings of the conference","title_sort":"27th International Conference on Computational Linguistics - proceedings of the conference"}],"person":[{"role":"edt","roleDisplay":"HerausgeberIn","display":"Bender, Emily M.","given":"Emily M.","family":"Bender"}],"titleAlt":[{"title":"Proceedings of the 27th International Conference on Computational Linguistics"}],"part":{"extent":"11","text":"(2018), Seite 18-28","pages":"18-28","year":"2018"},"recId":"1678566128","language":["eng"],"corporate":[{"role":"aut","display":"International Conference on Computational Linguistics (27., 2018, Santa Fe, NM)","roleDisplay":"VerfasserIn"},{"display":"Association for Computational Linguistics","roleDisplay":"Herausgebendes Organ","role":"isb"},{"display":"International Committee on Computational Linguistics","role":"oth"}],"note":["\"Emily M. Bender, Leon Derczynski, Pierre Isabelle (Editors)\" - Startseite der Ressource","Literaturangaben","\"On behalf of the International Committee on Computational Linguistics, I am thrilled to welcome you all to the 27th International Conference on Computational Linguistics (COLING 2018) here in Santa Fe, New-Mexico.\" - Seite iii","\"Emily M. Bender, Leon Derczynski, Pierre Isabelle (editors)\" - Startseite der Ressource"],"disp":"International Conference on Computational Linguistics (27. 
: 2018 : Santa Fe, NM)The 27th International Conference on Computational Linguistics - proceedings of the conference","type":{"media":"Online-Ressource","bibl":"book"}}],"physDesc":[{"extent":"11 S."}],"id":{"eki":["1697045855"]},"origin":[{"dateIssuedDisp":"29.09.2018","dateIssuedKey":"2018"}],"name":{"displayForm":["Ines Rehbein, Josef Ruppenhofer, Victor Zimmermann"]},"recId":"1697045855","language":["ger"],"type":{"bibl":"chapter","media":"Online-Ressource"},"note":["Gesehen am 04.05.2020"],"title":[{"title_sort":"harmonised testsuite for POS tagging of German social media data","title":"A harmonised testsuite for POS tagging of German social media data"}],"person":[{"role":"aut","display":"Rehbein, Ines","roleDisplay":"VerfasserIn","given":"Ines","family":"Rehbein"},{"given":"Josef","family":"Ruppenhofer","role":"aut","display":"Ruppenhofer, Josef","roleDisplay":"VerfasserIn"},{"given":"Victor","family":"Zimmermann","role":"aut","display":"Zimmermann, Victor","roleDisplay":"VerfasserIn"}]} | ||
| SRT | |a REHBEININEHARMONISED2909 | ||