Semantic search helper: a tool based on the use of embeddings in multi-item questionnaires as a harmonization opportunity for merging large datasets : a feasibility study

Background: Recent advances in natural language processing (NLP), particularly in language processing methods, have opened new avenues in semantic data analysis. A promising application of NLP is data harmonization in questionnaire-based cohort studies, where it can be used as an additional method, sp...

Bibliographic details
Main authors: Gottfried, Karl (author), Janson, Karina (author), Holz, Nathalie E. (author), Reis, Olaf (author), Kornhuber, Johannes (author), Eichler, Anna (author), Banaschewski, Tobias (author), Nees, Frauke (author)
Document type: Article (Journal)
Language: English
Published: 20 January 2025
In: European psychiatry
Year: 2025, Volume: 68, Issue: 1, Pages: 1-12
ISSN: 1778-3585
DOI: 10.1192/j.eurpsy.2024.1808
Online access: Publisher, license required, full text: https://doi.org/10.1192/j.eurpsy.2024.1808
Publisher, license required, full text: http://www.cambridge.org/core/journals/european-psychiatry/article/semantic-search-helper-a-tool-based-on-the-use-of-embeddings-in-multiitem-questionnaires-as-a-harmonization-opportunity-for-merging-large-datasets-a-feasibility-study/CE6873B9599B525B95E09ED8D4CAF48B
Statement of responsibility: Karl Gottfried, Karina Janson, Nathalie E. Holz, Olaf Reis, Johannes Kornhuber, Anna Eichler, Tobias Banaschewski, Frauke Nees and IMAC-Mind Consortium

MARC

LEADER 00000caa a2200000 c 4500
001 1929770383
003 DE-627
005 20251001171913.0
007 cr uuu---uuuuu
008 250704s2025 xx |||||o 00| ||eng c
024 7 |a 10.1192/j.eurpsy.2024.1808  |2 doi 
035 |a (DE-627)1929770383 
035 |a (DE-599)KXP1929770383 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 28  |2 sdnb 
100 1 |a Gottfried, Karl  |e VerfasserIn  |0 (DE-588)1372584692  |0 (DE-627)1931894124  |4 aut 
245 1 0 |a Semantic search helper  |b a tool based on the use of embeddings in multi-item questionnaires as a harmonization opportunity for merging large datasets : a feasibility study  |c Karl Gottfried, Karina Janson, Nathalie E. Holz, Olaf Reis, Johannes Kornhuber, Anna Eichler, Tobias Banaschewski, Frauke Nees and IMAC-Mind Consortium 
264 1 |c 20 January 2025 
300 |b Illustrationen, Diagramme 
300 |a 12 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Gesehen am 28.07.2025 
520 |a Background: Recent advances in natural language processing (NLP), particularly in language processing methods, have opened new avenues in semantic data analysis. A promising application of NLP is data harmonization in questionnaire-based cohort studies, where it can be used as an additional method, specifically when only different instruments are available for one construct as well as for the evaluation of potentially new construct-constellations. The present article therefore explores embedding models’ potential to detect opportunities for semantic harmonization. Methods: Using models like SBERT and OpenAI’s ADA, we developed a prototype application (“Semantic Search Helper”) to facilitate the harmonization process of detecting semantically similar items within extensive health-related datasets. The approach’s feasibility and applicability were evaluated through a use case analysis involving data from four large cohort studies with heterogeneous data obtained with a different set of instruments for common constructs. Results: With the prototype, we effectively identified potential harmonization pairs, which significantly reduced manual evaluation efforts. Expert ratings of semantic similarity candidates showed high agreement with model-generated pairs, confirming the validity of our approach. Conclusions: This study demonstrates the potential of embeddings in matching semantic similarity as a promising add-on tool to assist harmonization processes of multiplex data sets and instruments but with similar content, within and across studies. 
650 4 |a big data 
650 4 |a harmonization 
650 4 |a natural language processing 
650 4 |a questionnaires 
650 4 |a semantic 
700 1 |a Janson, Karina  |d 1995-  |e VerfasserIn  |0 (DE-588)1344628796  |0 (DE-627)1905482493  |4 aut 
700 1 |a Holz, Nathalie E.  |d 1985-  |e VerfasserIn  |0 (DE-588)1093240776  |0 (DE-627)853204470  |0 (DE-576)462656926  |4 aut 
700 1 |a Reis, Olaf  |d 1963-  |e VerfasserIn  |0 (DE-588)136255760  |0 (DE-627)577822969  |0 (DE-576)171718593  |4 aut 
700 1 |a Kornhuber, Johannes  |e VerfasserIn  |4 aut 
700 1 |a Eichler, Anna  |e VerfasserIn  |4 aut 
700 1 |a Banaschewski, Tobias  |d 1961-  |e VerfasserIn  |0 (DE-588)115856110  |0 (DE-627)507227301  |0 (DE-576)178364703  |4 aut 
700 1 |a Nees, Frauke  |e VerfasserIn  |0 (DE-588)140286527  |0 (DE-627)703627244  |0 (DE-576)325389764  |4 aut 
773 0 8 |i Enthalten in  |t European psychiatry  |d Cambridge : Cambridge University Press, 1991  |g 68(2025), 1, Artikel-ID e8, Seite 1-12  |h Online-Ressource  |w (DE-627)320445070  |w (DE-600)2005377-0  |w (DE-576)10684606X  |x 1778-3585  |7 nnas  |a Semantic search helper a tool based on the use of embeddings in multi-item questionnaires as a harmonization opportunity for merging large datasets : a feasibility study 
773 1 8 |g volume:68  |g year:2025  |g number:1  |g elocationid:e8  |g pages:1-12  |g extent:12  |a Semantic search helper a tool based on the use of embeddings in multi-item questionnaires as a harmonization opportunity for merging large datasets : a feasibility study 
856 4 0 |u https://doi.org/10.1192/j.eurpsy.2024.1808  |x Verlag  |x Resolving-System  |z lizenzpflichtig  |3 Volltext 
856 4 0 |u http://www.cambridge.org/core/journals/european-psychiatry/article/semantic-search-helper-a-tool-based-on-the-use-of-embeddings-in-multiitem-questionnaires-as-a-harmonization-opportunity-for-merging-large-datasets-a-feasibility-study/CE6873B9599B525B95E09ED8D4CAF48B  |x Verlag  |z lizenzpflichtig  |3 Volltext 
951 |a AR 
992 |a 20250728 
993 |a Article 
994 |a 2025 
998 |g 115856110  |a Banaschewski, Tobias  |m 115856110:Banaschewski, Tobias  |d 60000  |e 60000PB115856110  |k 0/60000/  |p 7 
998 |g 1093240776  |a Holz, Nathalie E.  |m 1093240776:Holz, Nathalie E.  |d 60000  |e 60000PH1093240776  |k 0/60000/  |p 3 
998 |g 1344628796  |a Janson, Karina  |m 1344628796:Janson, Karina  |d 60000  |e 60000PJ1344628796  |k 0/60000/  |p 2 
998 |g 1372584692  |a Gottfried, Karl  |m 1372584692:Gottfried, Karl  |d 60000  |e 60000PG1372584692  |k 0/60000/  |p 1  |x j 
999 |a KXP-PPN1929770383  |e 4750099155 
BIB |a Y 
SER |a journal 
JSO |a {"note":["Gesehen am 28.07.2025"],"origin":[{"dateIssuedDisp":"20 January 2025","dateIssuedKey":"2025"}],"title":[{"title":"Semantic search helper","title_sort":"Semantic search helper","subtitle":"a tool based on the use of embeddings in multi-item questionnaires as a harmonization opportunity for merging large datasets : a feasibility study"}],"type":{"media":"Online-Ressource","bibl":"article-journal"},"recId":"1929770383","relHost":[{"part":{"issue":"1","text":"68(2025), 1, Artikel-ID e8, Seite 1-12","year":"2025","pages":"1-12","volume":"68","extent":"12"},"language":["eng"],"type":{"media":"Online-Ressource","bibl":"periodical"},"disp":"Semantic search helper a tool based on the use of embeddings in multi-item questionnaires as a harmonization opportunity for merging large datasets : a feasibility studyEuropean psychiatry","title":[{"title":"European psychiatry","subtitle":"the official journal of the Association of European Psychiatrists (AEP)","title_sort":"European psychiatry"}],"pubHistory":["Nachgewiesen 6.1991 -"],"origin":[{"publisherPlace":"Cambridge ; Amsterdam","dateIssuedDisp":"1991-","dateIssuedKey":"1991","publisher":"Cambridge University Press ; Elsevier Science"}],"note":["Gesehen am 20.05.2020","Fortsetzung der Druck-Ausgabe"],"id":{"eki":["320445070"],"zdb":["2005377-0"],"issn":["1778-3585"]},"physDesc":[{"extent":"Online-Ressource"}],"recId":"320445070"}],"name":{"displayForm":["Karl Gottfried, Karina Janson, Nathalie E. Holz, Olaf Reis, Johannes Kornhuber, Anna Eichler, Tobias Banaschewski, Frauke Nees and IMAC-Mind Consortium"]},"physDesc":[{"noteIll":"Illustrationen, Diagramme","extent":"12 S."}],"id":{"eki":["1929770383"],"doi":["10.1192/j.eurpsy.2024.1808"]},"person":[{"display":"Gottfried, Karl","family":"Gottfried","role":"aut","given":"Karl"},{"role":"aut","given":"Karina","display":"Janson, Karina","family":"Janson"},{"display":"Holz, Nathalie E.","family":"Holz","role":"aut","given":"Nathalie E."},{"role":"aut","given":"Olaf","family":"Reis","display":"Reis, Olaf"},{"role":"aut","given":"Johannes","family":"Kornhuber","display":"Kornhuber, Johannes"},{"family":"Eichler","display":"Eichler, Anna","role":"aut","given":"Anna"},{"family":"Banaschewski","display":"Banaschewski, Tobias","role":"aut","given":"Tobias"},{"family":"Nees","display":"Nees, Frauke","given":"Frauke","role":"aut"}],"language":["eng"]} 
SRT |a GOTTFRIEDKSEMANTICSE2020