Word embeddings for entity-annotated texts

Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizen...

Bibliographic Details
Main Authors: Almasian, Satya (Author), Spitz, Andreas (Author), Gertz, Michael (Author)
Format: Article (Journal) Chapter/Article
Language: English
Published: 6 Feb 2019
In: Arxiv

Online Access: Publisher, full text: http://arxiv.org/abs/1902.02078
Author Notes: Satya Almasian, Andreas Spitz, and Michael Gertz

MARC

LEADER 00000caa a2200000 c 4500
001 1587741687
003 DE-627
005 20220815104433.0
007 cr uuu---uuuuu
008 190218s2019 xx |||||o 00| ||eng c
035 |a (DE-627)1587741687 
035 |a (DE-576)517741687 
035 |a (DE-599)BSZ517741687 
035 |a (OCoLC)1341038517 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 28  |2 sdnb 
100 1 |a Almasian, Satya  |e VerfasserIn  |0 (DE-588)1178432440  |0 (DE-627)1049356586  |0 (DE-576)517741652  |4 aut 
245 1 0 |a Word embeddings for entity-annotated texts  |c Satya Almasian, Andreas Spitz, and Michael Gertz 
264 1 |c 6 Feb 2019 
300 |a 15 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 19.02.2019 
520 |a Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity-annotated corpora. Not only are the resulting entity embeddings less useful than expected, but one also finds that the performance of the non-entity word embeddings degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely the training of state-of-the-art embeddings on raw text and annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance. 
650 4 |a Computer Science - Computation and Language 
700 1 |a Spitz, Andreas  |d 1981-  |e VerfasserIn  |0 (DE-588)1071830643  |0 (DE-627)826370527  |0 (DE-576)43334105X  |4 aut 
700 1 |a Gertz, Michael  |e VerfasserIn  |0 (DE-588)1038076579  |0 (DE-627)756636973  |0 (DE-576)392095645  |4 aut 
773 0 8 |i Enthalten in  |t Arxiv  |d Ithaca, NY : Cornell University, 1991  |g (2019) Artikel-Nummer 1902.02078, 15 Seiten  |h Online-Ressource  |w (DE-627)509006531  |w (DE-600)2225896-6  |w (DE-576)28130436X  |7 nnas  |a Word embeddings for entity-annotated texts 
773 1 8 |g year:2019  |g extent:15  |a Word embeddings for entity-annotated texts 
856 4 0 |u http://arxiv.org/abs/1902.02078  |x Verlag  |3 Volltext 
951 |a AR 
992 |a 20190218 
993 |a Article 
994 |a 2019 
998 |g 1038076579  |a Gertz, Michael  |m 1038076579:Gertz, Michael  |d 110000  |d 110300  |e 110000PG1038076579  |e 110300PG1038076579  |k 0/110000/  |k 1/110000/110300/  |p 3  |y j 
998 |g 1071830643  |a Spitz, Andreas  |m 1071830643:Spitz, Andreas  |d 110000  |d 110300  |e 110000PS1071830643  |e 110300PS1071830643  |k 0/110000/  |k 1/110000/110300/  |p 2 
998 |g 1178432440  |a Almasian, Satya  |m 1178432440:Almasian, Satya  |d 110000  |d 110300  |e 110000PA1178432440  |e 110300PA1178432440  |k 0/110000/  |k 1/110000/110300/  |p 1  |x j 
999 |a KXP-PPN1587741687  |e 3056123097 
BIB |a Y 
JSO |a {"name":{"displayForm":["Satya Almasian, Andreas Spitz, and Michael Gertz"]},"id":{"eki":["1587741687"]},"physDesc":[{"extent":"15 S."}],"recId":"1587741687","person":[{"display":"Almasian, Satya","given":"Satya","role":"aut","family":"Almasian"},{"display":"Spitz, Andreas","family":"Spitz","given":"Andreas","role":"aut"},{"family":"Gertz","role":"aut","given":"Michael","display":"Gertz, Michael"}],"relHost":[{"id":{"eki":["509006531"],"zdb":["2225896-6"]},"part":{"extent":"15","text":"(2019) Artikel-Nummer 1902.02078, 15 Seiten","year":"2019"},"pubHistory":["1991 -"],"titleAlt":[{"title":"Arxiv.org"},{"title":"Arxiv.org e-print archive"},{"title":"Arxiv e-print archive"},{"title":"De.arxiv.org"}],"language":["eng"],"type":{"media":"Online-Ressource","bibl":"edited-book"},"note":["Gesehen am 28.05.2024"],"title":[{"title":"Arxiv","title_sort":"Arxiv"}],"origin":[{"dateIssuedKey":"1991","dateIssuedDisp":"1991-","publisherPlace":"Ithaca, NY ; [Erscheinungsort nicht ermittelbar]","publisher":"Cornell University ; Arxiv.org"}],"disp":"Word embeddings for entity-annotated textsArxiv","physDesc":[{"extent":"Online-Ressource"}],"recId":"509006531"}],"origin":[{"dateIssuedDisp":"6 Feb 2019","dateIssuedKey":"2019"}],"note":["Gesehen am 19.02.2019"],"type":{"media":"Online-Ressource","bibl":"chapter"},"language":["eng"],"title":[{"title_sort":"Word embeddings for entity-annotated texts","title":"Word embeddings for entity-annotated texts"}]} 
SRT |a ALMASIANSAWORDEMBEDD6201