Detection of schizophrenia spectrum disorder and major depression disorder using automated speech analysis

Bibliographic Details
Main Authors: Hiß, Inka C. (Author), Krajewski, Jarek (Author), Canzler, Ulrich (Author), Leonhardt, Steffen (Author), Weiss, Christoph (Author), Clemens, Benjamin (Author), Habel, Ute (Author)
Format: Article (Journal)
Language: English
Published: Oct.-Dec. 2025
In: IEEE transactions on affective computing
Year: 2025, Volume: 16, Issue: 4, Pages: 2988-2999
ISSN: 1949-3045
DOI: 10.1109/TAFFC.2025.3564531
Online Access: Publisher, license required, full text: https://doi.org/10.1109/TAFFC.2025.3564531
Publisher, license required, full text: https://ieeexplore.ieee.org/document/11002678
Author Notes: Inka C. Hiß, Jarek Krajewski, Ulrich Canzler, Steffen Leonhardt, Christoph Weiss, Benjamin Clemens, and Ute Habel
Description
Summary: Objective biomarkers for differential diagnosis in psychiatry are still scarce. Voice atypicalities characterize two prominent, often co-occurring psychiatric disorders: schizophrenia-spectrum disorders (SSD) and major depressive disorders (MDD). Given that voice recordings can be easily obtained, advanced speech analysis might facilitate the development of diagnostic biomarkers for SSD and MDD. Speech was recorded from a transdiagnostic sample comprising 47 SSD patients, 62 MDD patients, and 41 healthy controls (HC), during three different tasks: a semi-structured interview, a reading task and an empathy task. We evaluated the discriminative power of standardized speech parameters and compared the performance of the three tasks. The extended Geneva Acoustic Minimalistic Parameter Set (eGeMAPS) was extracted using openSMILE and fed into random forest (RF) algorithms with 10-fold cross-validation. Model performances were evaluated using accuracy, F1-score, precision, and recall. Importance of specific predictors was assessed using Gini importance. In this three-class problem, a simple 1-minute video task reached best results with 57% accuracy. The acoustic parameters revealed distinct vocal profiles associated with each disorder. Considering the chance probability of 33%, our results show that automated speech analysis could predict diagnostic classes with good to high accuracy.
Item Description: Published online: 12 May 2025
Viewed on 8 January 2026
Physical Description: Online Resource