De-Identification of German medical admission notes
| Main Authors: | Phillip Richter-Pechanski, Stefan Riezler, Christoph Dieterich |
|---|---|
| Format: | Chapter/Article Conference Paper |
| Language: | English |
| Published: | [2018] |
| In: | German medical data sciences, Year: 2018, Volume: 253, Pages: 165-169 |
| DOI: | 10.3233/978-1-61499-896-9-165 |
| Online Access: | Resolving-System: https://doi.org/10.3233/978-1-61499-896-9-165 |
| Author Notes: | Phillip Richter-Pechanski, Stefan Riezler and Christoph Dieterich |
| Summary: | Medical texts are a vast resource for medical and computational research. In contrast to newswire or Wikipedia texts, medical texts need to be de-identified before they can be made accessible to a wider NLP research community. We created a prototype for German medical text de-identification and named entity recognition using a three-step approach. First, we used well-known rule-based models based on regular expressions and gazetteers. Second, we used a spelling variant detector based on Levenshtein distance, exploiting the fact that the medical texts contain semi-structured headers that include sensitive personal data. Third, we trained a named entity recognition model on out-of-domain data to add statistical capabilities to our prototype. Relative to a baseline based on regular expressions and gazetteers, we improved the F2-score for de-identification from 78% to 85%. Our prototype is a first step for further research on German medical text de-identification and shows that spelling variant detection and statistical models trained on out-of-domain data can improve de-identification performance significantly. |
| Item Description: | Viewed on 10.02.2020 |
| Physical Description: | Online Resource |
| ISBN: | 9781614998969 |
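The first two steps described in the summary (rule-based matching, then Levenshtein-based spelling variant detection against names taken from a note's header) can be illustrated with a minimal sketch. This is not the authors' implementation: the date pattern, the `<DATE>` and `<NAME>` placeholders, the `header_names` argument, the minimum token length, and the distance threshold of 1 are all assumptions chosen for illustration.

```python
import re

# Assumed pattern for German-style dates such as 03.05.2017 (illustrative only).
DATE_RE = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{2,4}\b")


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b), # substitution
            ))
        previous = current
    return previous[-1]


def deidentify(text: str, header_names: list[str], max_distance: int = 1) -> str:
    """Mask dates by rule and mask tokens that are near-matches of header names.

    header_names is a hypothetical input: names already extracted from the
    semi-structured header of the note.
    """
    text = DATE_RE.sub("<DATE>", text)

    def mask(match: re.Match) -> str:
        token = match.group(0)
        # Skip very short tokens to avoid masking common words by accident.
        if len(token) < 4:
            return token
        for name in header_names:
            if levenshtein(token.lower(), name.lower()) <= max_distance:
                return "<NAME>"
        return token

    return re.sub(r"\w+", mask, text)


note = "Herr Mustermann wurde am 03.05.2017 aufgenommen. Hr. Mustermnn klagt über Schmerzen."
print(deidentify(note, ["Mustermann"]))
```

In this sketch the misspelled "Mustermnn" is masked because it lies within one edit of the header name, which is the kind of variant an exact gazetteer lookup would miss. The third step, a statistical named entity recognition model trained on out-of-domain data, is not shown here.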