De-Identification of German medical admission notes

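Medical texts are a vast resource for medical and computational research. In contrast to newswire or Wikipedia texts, medical texts need to be de-identified before they can be made accessible to a wider NLP research community. We created a prototype for German medical text de-identification and named entity recognition using a three-step approach. First, we used well-known rule-based models based on regular expressions and gazetteers; second, we used a spelling variant detector based on Levenshtein distance, exploiting the fact that the medical texts contain semi-structured headers including sensitive personal data; and third, we trained a named entity recognition model on out-of-domain data to add statistical capabilities to our prototype. Compared with a baseline based on regular expressions and gazetteers, we improved the F2-score for de-identification from 78% to 85%. Our prototype is a first step for further research on German medical text de-identification and shows that spelling variant detection and statistical models trained on out-of-domain data can improve de-identification performance significantly.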
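As a rough illustration of the first two steps in the abstract (rule-based matching with regular expressions and gazetteers, then a Levenshtein-distance spelling-variant detector seeded from the semi-structured header) and of the F2 metric used to report results, here is a minimal Python sketch. It is not the authors' implementation: the `Name:` header format, the date pattern, the two-entry gazetteer, the helpers `header_names`, `find_phi` and `f_beta`, the edit-distance threshold of 1, the minimum token length of 4 and all counts are hypothetical choices for the example. The third, statistical NER step is omitted. F-beta combines precision P and recall R as (1 + beta^2) * P * R / (beta^2 * P + R); with beta = 2, recall weighs more than precision, a common choice in de-identification, where a missed identifier is usually considered costlier than an over-redacted token.

```python
import re

# Hypothetical rule-based components (step 1); not taken from the paper.
DATE_RE = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{2,4}\b")  # German dates such as 12.03.2017
GAZETTEER = {"Heidelberg": "LOCATION", "Mannheim": "LOCATION"}  # toy gazetteer
HEADER_NAME_RE = re.compile(r"^Name:\s*(.+)$", re.MULTILINE)  # assumed header line format


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def header_names(text: str) -> set[str]:
    """Collect name tokens from the (hypothetical) 'Name:' header line."""
    names = set()
    for m in HEADER_NAME_RE.finditer(text):
        names.update(tok for tok in re.split(r"[,\s]+", m.group(1)) if tok)
    return names


def find_phi(text: str, max_dist: int = 1, min_len: int = 4) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, label) spans of suspected personal data."""
    hits = [(m.start(), m.end(), "DATE") for m in DATE_RE.finditer(text)]
    names = [n.lower() for n in header_names(text)]
    for m in re.finditer(r"\w+", text):
        tok = m.group()
        if tok in GAZETTEER:                   # step 1: exact gazetteer lookup
            hits.append((m.start(), m.end(), GAZETTEER[tok]))
        elif len(tok) >= min_len and any(      # step 2: spelling variants of header names
            levenshtein(tok.lower(), n) <= max_dist for n in names
        ):
            hits.append((m.start(), m.end(), "NAME"))
    return sorted(hits)


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


note = (
    "Name: Mustermann, Erika\n"
    "Aufnahme am 12.03.2017 in Heidelberg.\n"    # "Admitted on 12.03.2017 in Heidelberg."
    "Frau Musterman klagt über Brustschmerzen."  # misspelled surname in the body
)
for start, end, label in find_phi(note):
    print(label, note[start:end])

# Made-up counts: two systems with the same F1 (0.72) but opposite error profiles.
high_recall = f_beta(tp=90, fp=60, fn=10)     # P = 0.6, R = 0.9
high_precision = f_beta(tp=90, fp=10, fn=60)  # P = 0.9, R = 0.6
print(f"F2 high-recall:    {high_recall:.3f}")     # 0.818
print(f"F2 high-precision: {high_precision:.3f}")  # 0.643
```

The first loop flags Mustermann, Erika, 12.03.2017, Heidelberg and the misspelled Musterman; the last one is caught only because it lies within edit distance 1 of a header name, the kind of variant an exact gazetteer lookup would miss. The two F2 values show how the metric favours the high-recall system even though both have the same F1.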

Bibliographic details
Main authors: Richter-Pechanski, Phillip (author), Riezler, Stefan (author), Dieterich, Christoph (author)
Document type: Chapter/Article, Conference paper
Language: English
Published: [2018]
In: German medical data sciences
Year: 2018, Volume: 253, Pages: 165-169
DOI: 10.3233/978-1-61499-896-9-165
Online access: Resolving system: https://doi.org/10.3233/978-1-61499-896-9-165 (full text)
Statement of responsibility: Phillip Richter-Pechanski, Stefan Riezler and Christoph Dieterich

MARC

LEADER 00000caa a2200000 c 4500
001 1689724153
003 DE-627
005 20250114090620.0
007 cr uuu---uuuuu
008 200210s2018 xx |||||o 00| ||eng c
024 7 |a 10.3233/978-1-61499-896-9-165  |2 doi 
035 |a (DE-627)1689724153 
035 |a (DE-599)KXP1689724153 
035 |a (OCoLC)1341304218 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 33  |2 sdnb 
100 1 |a Richter-Pechanski, Phillip  |e VerfasserIn  |0 (DE-588)1204395470  |0 (DE-627)1689724056  |4 aut 
245 1 0 |a De-Identification of German medical admission notes  |c Phillip Richter-Pechanski, Stefan Riezler and Christoph Dieterich 
264 1 |c [2018] 
300 |a 5 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 10.02.2020 
520 |a Medical texts are a vast resource for medical and computational research. In contrast to newswire or Wikipedia texts, medical texts need to be de-identified before they can be made accessible to a wider NLP research community. We created a prototype for German medical text de-identification and named entity recognition using a three-step approach. First, we used well-known rule-based models based on regular expressions and gazetteers; second, we used a spelling variant detector based on Levenshtein distance, exploiting the fact that the medical texts contain semi-structured headers including sensitive personal data; and third, we trained a named entity recognition model on out-of-domain data to add statistical capabilities to our prototype. Compared with a baseline based on regular expressions and gazetteers, we improved the F2-score for de-identification from 78% to 85%. Our prototype is a first step for further research on German medical text de-identification and shows that spelling variant detection and statistical models trained on out-of-domain data can improve de-identification performance significantly. 
650 4 |a anonymization 
650 4 |a Data Anonymization 
650 4 |a De-identification 
650 4 |a Electronic Health Records 
650 4 |a Germany 
650 4 |a medical admission notes 
650 4 |a named entity recognition 
650 4 |a Natural Language Processing 
650 4 |a Patient Admission 
650 4 |a personal health information 
700 1 |a Riezler, Stefan  |e VerfasserIn  |0 (DE-588)1033925454  |0 (DE-627)743677528  |0 (DE-576)381607615  |4 aut 
700 1 |a Dieterich, Christoph  |d 1975-  |e VerfasserIn  |0 (DE-588)130054844  |0 (DE-627)494359269  |0 (DE-576)297972448  |4 aut 
773 0 8 |i Enthalten in  |a Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (63. : 2018 : Osnabrück)  |t German medical data sciences  |d Amsterdam [u.a.] : IOS Press [u.a.], 2018  |g 253(2018), Seite 165-169  |h 1 Online-Ressource  |w (DE-627)1655133330  |w (DE-576)517787385  |z 9781614998969  |7 nnam 
773 1 8 |g volume:253  |g year:2018  |g pages:165-169  |g extent:5  |a De-Identification of German medical admission notes 
856 4 0 |u https://doi.org/10.3233/978-1-61499-896-9-165  |x Resolving-System 
951 |a AR 
992 |a 20200210 
993 |a ConferencePaper 
994 |a 2018 
998 |g 130054844  |a Dieterich, Christoph  |m 130054844:Dieterich, Christoph  |d 910000  |d 910100  |e 910000PD130054844  |e 910100PD130054844  |k 0/910000/  |k 1/910000/910100/  |p 3  |y j 
998 |g 1033925454  |a Riezler, Stefan  |m 1033925454:Riezler, Stefan  |d 90000  |d 90500  |e 90000PR1033925454  |e 90500PR1033925454  |k 0/90000/  |k 1/90000/90500/  |p 2 
998 |g 1204395470  |a Richter-Pechanski, Phillip  |m 1204395470:Richter-Pechanski, Phillip  |p 1  |x j 
999 |a KXP-PPN1689724153  |e 3591798568 
BIB |a Y 
JSO |a {"id":{"eki":["1689724153"],"doi":["10.3233/978-1-61499-896-9-165"]},"name":{"displayForm":["Phillip Richter-Pechanski, Stefan Riezler and Christoph Dieterich"]},"title":[{"title_sort":"De-Identification of German medical admission notes","title":"De-Identification of German medical admission notes"}],"note":["Gesehen am 10.02.2020"],"language":["eng"],"type":{"bibl":"chapter","media":"Online-Ressource"},"person":[{"display":"Richter-Pechanski, Phillip","family":"Richter-Pechanski","given":"Phillip","role":"aut"},{"display":"Riezler, Stefan","family":"Riezler","role":"aut","given":"Stefan"},{"given":"Christoph","role":"aut","family":"Dieterich","display":"Dieterich, Christoph"}],"relHost":[{"corporate":[{"role":"aut","display":"Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (63., 2018, Osnabrück)"}],"name":{"displayForm":["edited by Ursula Hübner, Ulrich Sax, Hans-Ulrich Prokosch, Bernhard Breil, Harald Binder, Antonia Zapf, Brigitte Strahwald, Tim Beißbarth, Niels Grabe, Anke Schöler"]},"id":{"isbn":["9781614998969"],"eki":["1655133330"]},"part":{"extent":"5","text":"253(2018), Seite 165-169","volume":"253","year":"2018","pages":"165-169"},"physDesc":[{"extent":"1 Online-Ressource"}],"relMultPart":[{"origin":[{"dateIssuedDisp":"1991-","dateIssuedKey":"1991","publisherPlace":"Amsterdam [u.a.]","publisher":"IOS Press [u.a.]"}],"pubHistory":["1.1991 -"],"type":{"bibl":"serial","media":"Online-Ressource"},"language":["eng"],"id":{"zdb":["2708884-4"],"issn":["1879-8365"],"eki":["739899465"]},"title":[{"title_sort":"Studies in health technology and informatics","title":"Studies in health technology and informatics"}],"part":{"number_sort":["253"],"number":["volume 253"]},"physDesc":[{"extent":"Online-Ressource"}],"dispAlt":"Studies in health technology and informatics","recId":"739899465","disp":"Studies in health technology and informatics"}],"recId":"1655133330","disp":"Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (63. : 2018 : Osnabrück)German medical data sciences","origin":[{"publisher":"IOS Press [u.a.]","publisherPlace":"Amsterdam [u.a.]","dateIssuedDisp":"2018","dateIssuedKey":"2018"}],"person":[{"role":"edt","given":"Ursula","family":"Hübner","display":"Hübner, Ursula"}],"language":["eng"],"type":{"media":"Online-Ressource","bibl":"book"},"note":["Gesehen am 19.02.2019"],"title":[{"title_sort":"German medical data sciences","title":"German medical data sciences","subtitle":"a learning healthcare system : proceedings of the 63rd annual meeting of the German Association of Medical Informatics, Biometry and Epidemiology (gmds e.V.) 2018 in Osnabrück, Germany - GMDS 2018"}]}],"origin":[{"dateIssuedDisp":"[2018]","dateIssuedKey":"2018"}],"recId":"1689724153","physDesc":[{"extent":"5 S."}]} 
SRT |a RICHTERPECDEIDENTIFI2018