Privacy-preserving large language models for structured medical information retrieval

Most clinical information is encoded as free text, not accessible for quantitative analysis. This study presents an open-source pipeline using the local large language model (LLM) “Llama 2” to extract quantitative information from clinical text and evaluates its performance in identifying features of decompensated liver cirrhosis.

Bibliographic Details
Main Authors: Wiest, Isabella (Author), Ferber, Dyke (Author), Zhu, Jiefu (Author), van Treeck, Marko (Author), Meyer, Sonja K. (Author), Juglan, Radhika (Author), Carrero, Zunamys I. (Author), Paech, Daniel (Author), Kleesiek, Jens Philipp (Author), Ebert, Matthias (Author), Truhn, Daniel (Author), Kather, Jakob Nikolas (Author)
Format: Article (Journal)
Language: English
Published: 20 September 2024
In: npj digital medicine
Year: 2024, Volume: 7, Issue: 1, Pages: 1-9
ISSN: 2398-6352
DOI: 10.1038/s41746-024-01233-2
Online Access: Publisher, free of charge, full text: https://doi.org/10.1038/s41746-024-01233-2
Publisher, free of charge, full text: https://www.nature.com/articles/s41746-024-01233-2
Author Notes: Isabella Catharina Wiest, Dyke Ferber, Jiefu Zhu, Marko van Treeck, Sonja K. Meyer, Radhika Juglan, Zunamys I. Carrero, Daniel Paech, Jens Kleesiek, Matthias P. Ebert, Daniel Truhn & Jakob Nikolas Kather

MARC

LEADER 00000caa a2200000 c 4500
001 1906325219
003 DE-627
005 20241205181858.0
007 cr uuu---uuuuu
008 241021s2024 xx |||||o 00| ||eng c
024 7 |a 10.1038/s41746-024-01233-2  |2 doi 
035 |a (DE-627)1906325219 
035 |a (DE-599)KXP1906325219 
035 |a (OCoLC)1475316209 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 33  |2 sdnb 
100 1 |a Wiest, Isabella  |d 1992-  |e VerfasserIn  |0 (DE-588)1198882956  |0 (DE-627)168103638X  |4 aut 
245 1 0 |a Privacy-preserving large language models for structured medical information retrieval  |c Isabella Catharina Wiest, Dyke Ferber, Jiefu Zhu, Marko van Treeck, Sonja K. Meyer, Radhika Juglan, Zunamys I. Carrero, Daniel Paech, Jens Kleesiek, Matthias P. Ebert, Daniel Truhn & Jakob Nikolas Kather 
264 1 |c 20 September 2024 
300 |b Illustrationen 
300 |a 9 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 21.10.2024 
520 |a Most clinical information is encoded as free text, not accessible for quantitative analysis. This study presents an open-source pipeline using the local large language model (LLM) “Llama 2” to extract quantitative information from clinical text and evaluates its performance in identifying features of decompensated liver cirrhosis. The LLM identified five key clinical features in a zero- and one-shot manner from 500 patient medical histories in the MIMIC IV dataset. We compared LLMs of three sizes and various prompt engineering approaches, with predictions compared against ground truth from three blinded medical experts. Our pipeline achieved high accuracy, detecting liver cirrhosis with 100% sensitivity and 96% specificity. High sensitivities and specificities were also yielded for detecting ascites (95%, 95%), confusion (76%, 94%), abdominal pain (84%, 97%), and shortness of breath (87%, 97%) using the 70 billion parameter model, which outperformed smaller versions. Our study successfully demonstrates the capability of locally deployed LLMs to extract clinical information from free text with low hardware requirements. 
650 4 |a Digestive signs and symptoms 
650 4 |a Health care 
650 4 |a Liver diseases 
700 1 |a Ferber, Dyke  |e VerfasserIn  |0 (DE-588)1171467079  |0 (DE-627)1040545629  |0 (DE-576)513746056  |4 aut 
700 1 |a Zhu, Jiefu  |e VerfasserIn  |4 aut 
700 1 |a van Treeck, Marko  |e VerfasserIn  |4 aut 
700 1 |a Meyer, Sonja K.  |e VerfasserIn  |4 aut 
700 1 |a Juglan, Radhika  |e VerfasserIn  |4 aut 
700 1 |a Carrero, Zunamys I.  |e VerfasserIn  |4 aut 
700 1 |a Paech, Daniel  |d 1986-  |e VerfasserIn  |0 (DE-588)1080278214  |0 (DE-627)844124893  |0 (DE-576)453464742  |4 aut 
700 1 |a Kleesiek, Jens Philipp  |d 1977-  |e VerfasserIn  |0 (DE-588)132998076  |0 (DE-627)530080745  |0 (DE-576)299554465  |4 aut 
700 1 |a Ebert, Matthias  |d 1968-  |e VerfasserIn  |0 (DE-588)1030133522  |0 (DE-627)734827083  |0 (DE-576)377938432  |4 aut 
700 1 |a Truhn, Daniel  |e VerfasserIn  |0 (DE-588)1047348306  |0 (DE-627)778145913  |0 (DE-576)400927314  |4 aut 
700 1 |a Kather, Jakob Nikolas  |d 1989-  |e VerfasserIn  |0 (DE-588)1064064914  |0 (DE-627)812897587  |0 (DE-576)423589091  |4 aut 
773 0 8 |i Enthalten in  |t npj digital medicine  |d [Basingstoke] : Macmillan Publishers Limited, 2016  |g 7(2024), 1, Seite 1-9  |h Online-Ressource  |w (DE-627)1016587104  |w (DE-600)2925182-5  |w (DE-576)501513582  |x 2398-6352  |7 nnas  |a Privacy-preserving large language models for structured medical information retrieval 
773 1 8 |g volume:7  |g year:2024  |g number:1  |g pages:1-9  |g extent:9  |a Privacy-preserving large language models for structured medical information retrieval 
856 4 0 |u https://doi.org/10.1038/s41746-024-01233-2  |x Verlag  |x Resolving-System  |z kostenfrei  |3 Volltext 
856 4 0 |u https://www.nature.com/articles/s41746-024-01233-2  |x Verlag  |z kostenfrei  |3 Volltext 
951 |a AR 
992 |a 20241021 
993 |a Article 
994 |a 2024 
998 |g 1064064914  |a Kather, Jakob Nikolas  |m 1064064914:Kather, Jakob Nikolas  |d 910000  |d 910100  |e 910000PK1064064914  |e 910100PK1064064914  |k 0/910000/  |k 1/910000/910100/  |p 12  |y j 
998 |g 1030133522  |a Ebert, Matthias  |m 1030133522:Ebert, Matthias  |d 60000  |d 61100  |e 60000PE1030133522  |e 61100PE1030133522  |k 0/60000/  |k 1/60000/61100/  |p 10 
998 |g 132998076  |a Kleesiek, Jens Philipp  |m 132998076:Kleesiek, Jens Philipp  |d 50000  |e 50000PK132998076  |k 0/50000/  |p 9 
998 |g 1080278214  |a Paech, Daniel  |m 1080278214:Paech, Daniel  |d 50000  |e 50000PP1080278214  |k 0/50000/  |p 8 
998 |g 1171467079  |a Ferber, Dyke  |m 1171467079:Ferber, Dyke  |d 910000  |d 910100  |e 910000PF1171467079  |e 910100PF1171467079  |k 0/910000/  |k 1/910000/910100/  |p 2 
998 |g 1198882956  |a Wiest, Isabella  |m 1198882956:Wiest, Isabella  |d 60000  |d 61100  |e 60000PW1198882956  |e 61100PW1198882956  |k 0/60000/  |k 1/60000/61100/  |p 1  |x j 
999 |a KXP-PPN1906325219  |e 4600486730 
BIB |a Y 
SER |a journal 
JSO |a {"id":{"doi":["10.1038/s41746-024-01233-2"],"eki":["1906325219"]},"name":{"displayForm":["Isabella Catharina Wiest, Dyke Ferber, Jiefu Zhu, Marko van Treeck, Sonja K. Meyer, Radhika Juglan, Zunamys I. Carrero, Daniel Paech, Jens Kleesiek, Matthias P. Ebert, Daniel Truhn & Jakob Nikolas Kather"]},"title":[{"title_sort":"Privacy-preserving large language models for structured medical information retrieval","title":"Privacy-preserving large language models for structured medical information retrieval"}],"note":["Gesehen am 21.10.2024"],"type":{"media":"Online-Ressource","bibl":"article-journal"},"language":["eng"],"relHost":[{"physDesc":[{"extent":"Online-Ressource"}],"recId":"1016587104","disp":"Privacy-preserving large language models for structured medical information retrievalnpj digital medicine","origin":[{"dateIssuedDisp":"[2016]-","publisherPlace":"[Basingstoke]","publisher":"Macmillan Publishers Limited"}],"pubHistory":["2016-"],"language":["eng"],"type":{"media":"Online-Ressource","bibl":"periodical"},"note":["Gesehen am 06. September 2019"],"id":{"zdb":["2925182-5"],"eki":["1016587104"],"issn":["2398-6352"]},"title":[{"title":"npj digital medicine","title_sort":"npj digital medicine"}],"part":{"year":"2024","volume":"7","pages":"1-9","text":"7(2024), 1, Seite 1-9","extent":"9","issue":"1"}}],"person":[{"family":"Wiest","given":"Isabella","role":"aut","display":"Wiest, Isabella"},{"family":"Ferber","given":"Dyke","role":"aut","display":"Ferber, Dyke"},{"role":"aut","given":"Jiefu","family":"Zhu","display":"Zhu, Jiefu"},{"family":"van Treeck","given":"Marko","role":"aut","display":"van Treeck, Marko"},{"display":"Meyer, Sonja K.","family":"Meyer","given":"Sonja K.","role":"aut"},{"display":"Juglan, Radhika","given":"Radhika","role":"aut","family":"Juglan"},{"display":"Carrero, Zunamys I.","family":"Carrero","role":"aut","given":"Zunamys I."},{"given":"Daniel","role":"aut","family":"Paech","display":"Paech, Daniel"},{"family":"Kleesiek","given":"Jens Philipp","role":"aut","display":"Kleesiek, Jens Philipp"},{"display":"Ebert, Matthias","given":"Matthias","role":"aut","family":"Ebert"},{"display":"Truhn, Daniel","family":"Truhn","role":"aut","given":"Daniel"},{"display":"Kather, Jakob Nikolas","family":"Kather","given":"Jakob Nikolas","role":"aut"}],"origin":[{"dateIssuedDisp":"20 September 2024","dateIssuedKey":"2024"}],"recId":"1906325219","physDesc":[{"noteIll":"Illustrationen","extent":"9 S."}]} 
SRT |a WIESTISABEPRIVACYPRE2020