Validation of semantic analyses of unstructured medical data for research purposes = Validierung von semantischen Analysen von unstrukturierten medizinischen Daten für Forschungszwecke

Background: Secondary data often contain unstructured free text. The aim of this study was to validate a text mining system for extracting unstructured medical data for research purposes. Methods: From a radiology department, 1,000 of 7,102 CT findings were randomly selected. These were manua...

Bibliographic Details
Main Authors: Pokora, Roman (Author) , Le Cornet, Lucian (Author) , Daumke, Philipp (Author) , Mildenberger, Peter (Author) , Zeeb, Hajo (Author) , Blettner, Maria (Author)
Format: Article (Journal)
Language: English, German
Published: 2020
In: Das Gesundheitswesen
Year: 2019, Volume: 82, Issue: S 02, Pages: S158-S164
ISSN: 1439-4421
DOI: 10.1055/a-1007-8540
Online Access: Publisher, license required, full text: https://doi.org/10.1055/a-1007-8540
Publisher, license required, full text: http://www.thieme-connect.de/DOI/DOI?10.1055/a-1007-8540
Author Notes: Roman Michael Pokora, Lucian Le Cornet, Philipp Daumke, Peter Mildenberger, Hajo Zeeb, Maria Blettner

MARC

LEADER 00000caa a2200000 c 4500
001 1698161204
003 DE-627
005 20241224003334.0
007 cr uuu---uuuuu
008 200513r20202019xx |||||o 00| ||eng c
024 7 |a 10.1055/a-1007-8540  |2 doi 
035 |a (DE-627)1698161204 
035 |a (DE-599)KXP1698161204 
035 |a (OCoLC)1341323734 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng  |a ger 
084 |a 33  |2 sdnb 
100 1 |a Pokora, Roman  |d 1980-  |e VerfasserIn  |0 (DE-588)1150319844  |0 (DE-627)1010601008  |0 (DE-576)497018438  |4 aut 
245 1 0 |a Validation of semantic analyses of unstructured medical data for research purposes  |b  = Validierung von semantischen Analysen von unstrukturierten medizinischen Daten für Forschungszwecke  |c Roman Michael Pokora, Lucian Le Cornet, Philipp Daumke, Peter Mildenberger, Hajo Zeeb, Maria Blettner 
246 3 1 |a Validierung von semantischen Analysen von unstrukturierten medizinischen Daten für Forschungszwecke 
264 1 |c 2020 
300 |a 7 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Published online: 2019-10-09 
500 |a Gesehen am 13.05.2020 
520 |a Background In secondary data there are often unstructured free texts. The aim of this study was to validate a text mining system to extract unstructured medical data for research purposes. Methods From a radiological department, 1,000 out of 7,102 CT findings were randomly selected. These were manually divided into defined groups by 2 physicians. For automated tagging and reporting, the text analysis software Averbis Extraction Platform (AEP) was used. Special features of the system are a morphological analysis for the decomposition of compound words as well as the recognition of noun phrases, abbreviations and negated statements. Based on the extracted standardized keywords, findings reports were assigned to the given findings groups using machine learning methods. To assess the reliability and validity of the automated process, the automated and two independent manual mappings were compared for matches in multiple runs. Results Manual classification was too time-consuming. In the case of automated keywording, the classification according to ICD-10 turned out to be unsuitable for our data. It also showed that the keyword search does not deliver reliable results. Computer-aided text mining and machine learning resulted in reliable results. The inter-rater reliability of the two manual classifications, as well as the machine and manual classification was very high. Both manual classifications were consistent in 93% of all findings. The kappa coefficient is 0.89 [95% confidence interval (CI) 0.87-0.92]. The automatic classification agreed with the independent, second manual classification in 86% of all findings (Kappa coefficient 0.79 [95% CI 0.75-0.81]). Discussion The classification of the software AEP was very good. In our study, however, it followed a systematic pattern. Most misclassifications were found in findings that indicate an increased risk of cancer. 
The free-text structure of the findings raises concerns about the feasibility of a purely automated analysis. The combination of human intellect and intelligent, adaptive software appears most suitable for mining unstructured but important textual information for research. 
534 |c 2019 
700 1 |a Le Cornet, Lucian  |e VerfasserIn  |0 (DE-588)1210096501  |0 (DE-627)1698161700  |4 aut 
700 1 |a Daumke, Philipp  |e VerfasserIn  |4 aut 
700 1 |a Mildenberger, Peter  |e VerfasserIn  |4 aut 
700 1 |8 1\p  |a Zeeb, Hajo  |d 1963-  |e VerfasserIn  |0 (DE-588)1144397618  |0 (DE-627)1004724160  |0 (DE-576)166925349  |4 aut 
700 1 |a Blettner, Maria  |e VerfasserIn  |4 aut 
773 0 8 |i Enthalten in  |t Das Gesundheitswesen  |d Stuttgart [u.a.] : Thieme, 2000  |g 82(2020), S 02, Seite S158-S164  |h Online-Ressource  |w (DE-627)333809513  |w (DE-600)2056559-8  |w (DE-576)281687455  |x 1439-4421  |7 nnas  |a Validation of semantic analyses of unstructured medical data for research purposes = Validierung von semantischen Analysen von unstrukturierten medizinischen Daten für Forschungszwecke 
773 1 8 |g volume:82  |g year:2020  |g number:S 02  |g pages:S158-S164  |g extent:7  |a Validation of semantic analyses of unstructured medical data for research purposes = Validierung von semantischen Analysen von unstrukturierten medizinischen Daten für Forschungszwecke 
856 4 0 |u https://doi.org/10.1055/a-1007-8540  |x Verlag  |x Resolving-System  |z lizenzpflichtig  |3 Volltext 
856 4 0 |u http://www.thieme-connect.de/DOI/DOI?10.1055/a-1007-8540  |x Verlag  |z lizenzpflichtig  |3 Volltext 
883 |8 1\p  |a cgwrk  |d 20241001  |q DE-101  |u https://d-nb.info/provenance/plan#cgwrk 
951 |a AR 
992 |a 20200513 
993 |a Article 
994 |a 2020 
998 |g 1210096501  |a Le Cornet, Lucian  |m 1210096501:Le Cornet, Lucian  |d 910000  |e 910000PL1210096501  |k 0/910000/  |p 2 
999 |a KXP-PPN1698161204  |e 3666552706 
BIB |a Y 
SER |a journal 
JSO |a {"titleTranslated":[{"translated":"Validierung von semantischen Analysen von unstrukturierten medizinischen Daten für Forschungszwecke"}],"title":[{"subtitle":" = Validierung von semantischen Analysen von unstrukturierten medizinischen Daten für Forschungszwecke","title":"Validation of semantic analyses of unstructured medical data for research purposes","title_sort":"Validation of semantic analyses of unstructured medical data for research purposes"}],"origin":[{"dateIssuedKey":"2020","dateIssuedDisp":"2020"}],"name":{"displayForm":["Roman Michael Pokora, Lucian Le Cornet, Philipp Daumke, Peter Mildenberger, Hajo Zeeb, Maria Blettner"]},"language":["eng","ger"],"person":[{"given":"Roman","roleDisplay":"VerfasserIn","role":"aut","display":"Pokora, Roman","family":"Pokora"},{"display":"Le Cornet, Lucian","family":"Le Cornet","role":"aut","given":"Lucian","roleDisplay":"VerfasserIn"},{"roleDisplay":"VerfasserIn","given":"Philipp","role":"aut","display":"Daumke, Philipp","family":"Daumke"},{"roleDisplay":"VerfasserIn","given":"Peter","role":"aut","display":"Mildenberger, Peter","family":"Mildenberger"},{"roleDisplay":"VerfasserIn","given":"Hajo","role":"aut","display":"Zeeb, Hajo","family":"Zeeb"},{"given":"Maria","roleDisplay":"VerfasserIn","role":"aut","family":"Blettner","display":"Blettner, Maria"}],"physDesc":[{"extent":"7 S."}],"type":{"bibl":"article-journal","media":"Online-Ressource"},"relHost":[{"recId":"333809513","language":["ger"],"part":{"extent":"7","year":"2020","volume":"82","pages":"S158-S164","text":"82(2020), S 02, Seite S158-S164","issue":"S 02"},"id":{"doi":["10.1055/s-00000022"],"issn":["1439-4421"],"eki":["333809513"],"zdb":["2056559-8"]},"physDesc":[{"extent":"Online-Ressource"}],"title":[{"subtitle":"Sozialmedizin, Gesundheits-System-Forschung, public health, öffentlicher Gesundheitsdienst, medizinischer Dienst","title":"Das Gesundheitswesen","title_sort":"Gesundheitswesen"}],"pubHistory":["Nachgewiesen 62.2000 -"],"disp":"Validation 
of semantic analyses of unstructured medical data for research purposes = Validierung von semantischen Analysen von unstrukturierten medizinischen Daten für ForschungszweckeDas Gesundheitswesen","type":{"bibl":"periodical","media":"Online-Ressource"},"origin":[{"dateIssuedDisp":"2000-","publisher":"Thieme","publisherPlace":"Stuttgart [u.a.]","dateIssuedKey":"2000"}]}],"recId":"1698161204","note":["Published online: 2019-10-09","Gesehen am 13.05.2020"],"id":{"eki":["1698161204"],"doi":["10.1055/a-1007-8540"]}} 
SRT |a POKORAROMAVALIDATION2020