Source code and data for the PhD Thesis "On-premise medical information extraction from German doctor’s letters under clinical constraints"

Bibliographic Details
Main Author: Richter-Pechanski, Phillip (Author)
Format: Database Research Data
Language:English
Published: Heidelberg Universität 2026-04-21
DOI:10.11588/DATA/USQLMB
Online Access:Publisher, license required, full text: https://doi.org/10.11588/DATA/USQLMB
Publisher, license required, full text: https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/DATA/USQLMB
Author Notes:Phillip Richter-Pechanski
Description
Summary:Dataset overview
This dataset contains source code and annotation guidelines used in the PhD thesis “On-Premise Medical Information Extraction from German Doctor’s Letters under Clinical Constraints”.

Repository structure
The dataset is split into five repositories:
- Source code for Chapter 2.6: De-identification of German doctor’s letters
- Source code for Chapter 5: Clinical Section Classification using Pretrained Language Models and Prompting
- Source code for Chapter 6: Medication Information Extraction using Local Large Language Models
- Source code for Chapter 7: Clinical Application: Medication Trends and Polypharmacy
- Annotation guidelines for Chapters 2.6, 4, 5, and 7

CARDIO:DE
The main dataset used for experiments in Chapters 5, 6, and 7 is CARDIO:DE: https://doi.org/10.11588/DATA/AFYQDY

Additional datasets (not included here)
Other datasets used include:
- n2c2 2018 Track 2 (used in Chapter 6): https://doi.org/10.1093/jamia/ocz166

Notes on additional data and model availability
Doctor’s letters from the cardiology domain used in Chapters 2, 5, 6, and 7 (except for CARDIO:DE) and all further-pretrained and fine-tuned models cannot be distributed due to data protection regulations.
Item Description:Viewed on 27.04.2026
Physical Description:Online Resource