Lifting the curse from high-dimensional data: automated projection pursuit clustering for a variety of biological data modalities

Unsupervised clustering is a powerful machine-learning technique widely used to analyze high-dimensional biological data. It plays a crucial role in uncovering patterns, structures, and inherent relationships within complex datasets without relying on predefined labels. In the context of biology, hi...

Bibliographic details
Main authors: Simpson, Claire (author), Tabatsky, Evgeniy (author), Rahil, Zainab (author), Eddins, Devon J (author), Tkachev, Sasha (author), Georgescauld, Florian (author), Papalegis, Derek (author), Culka, Martin (author), Levy, Tyler (author), Gregoretti, Ivan (author), Meehan, Connor (author), Schiller, Chiara (author), Bestak, Kresimir (author), Schapiro, Denis (author), Chernyshev, Andrei (author), Walther, Guenther (author), Ghosn, Eliver E B (author), Orlova, Darya (author)
Document type: Article (Journal)
Language: English
Published: 2025
In: GigaScience
Year: 2025, Volume: 14, Pages: 1-20
ISSN: 2047-217X
DOI: 10.1093/gigascience/giaf052
Online access: Publisher, free of charge, full text: https://doi.org/10.1093/gigascience/giaf052
Statement of responsibility: Claire Simpson, Evgeniy Tabatsky, Zainab Rahil, Devon J. Eddins, Sasha Tkachev, Florian Georgescauld, Derek Papalegis, Martin Culka, Tyler Levy, Ivan Gregoretti, Connor Meehan, Chiara Schiller, Kresimir Bestak, Denis Schapiro, Andrei Chernyshev, Guenther Walther, Eliver E.B. Ghosn, and Darya Orlova

MARC

LEADER 00000naa a2200000 c 4500
001 1936547872
003 DE-627
005 20250923114640.0
007 cr uuu---uuuuu
008 250923s2025 xx |||||o 00| ||eng c
024 7 |a 10.1093/gigascience/giaf052  |2 doi 
035 |a (DE-627)1936547872 
035 |a (DE-599)KXP1936547872 
040 |a DE-627  |b ger  |c DE-627  |e rda 
041 |a eng 
084 |a 28  |2 sdnb 
100 1 |a Simpson, Claire  |e VerfasserIn  |0 (DE-588)1377216632  |0 (DE-627)1936549697  |4 aut 
245 1 0 |a Lifting the curse from high-dimensional data  |b automated projection pursuit clustering for a variety of biological data modalities  |c Claire Simpson, Evgeniy Tabatsky, Zainab Rahil, Devon J. Eddins, Sasha Tkachev, Florian Georgescauld, Derek Papalegis, Martin Culka, Tyler Levy, Ivan Gregoretti, Connor Meehan, Chiara Schiller, Kresimir Bestak, Denis Schapiro, Andrei Chernyshev, Guenther Walther, Eliver E.B. Ghosn, and Darya Orlova 
264 1 |c 2025 
300 |b Illustrationen 
300 |a 20 
336 |a Text  |b txt  |2 rdacontent 
337 |a Computermedien  |b c  |2 rdamedia 
338 |a Online-Ressource  |b cr  |2 rdacarrier 
500 |a Veröffentlicht: 29. Mai 2025 
500 |a Gesehen am 23.09.2025 
520 |a Unsupervised clustering is a powerful machine-learning technique widely used to analyze high-dimensional biological data. It plays a crucial role in uncovering patterns, structures, and inherent relationships within complex datasets without relying on predefined labels. In the context of biology, high-dimensional data may include transcriptomics, proteomics, and a variety of single-cell omics data. Most existing clustering algorithms operate directly in the high-dimensional space, and their performance may be negatively affected by the phenomenon known as the curse of dimensionality. Here, we show an alternative clustering approach that alleviates the curse by sequentially projecting high-dimensional data into a low-dimensional representation. We validated the effectiveness of our approach, named automated projection pursuit (APP), across various biological data modalities, including flow and mass cytometry data, scRNA-seq, multiplex imaging data, and T-cell receptor repertoire data. APP efficiently recapitulated experimentally validated cell-type definitions and revealed new biologically meaningful patterns. 
700 1 |a Tabatsky, Evgeniy  |e VerfasserIn  |4 aut 
700 1 |a Rahil, Zainab  |e VerfasserIn  |4 aut 
700 1 |a Eddins, Devon J  |e VerfasserIn  |4 aut 
700 1 |a Tkachev, Sasha  |e VerfasserIn  |4 aut 
700 1 |a Georgescauld, Florian  |e VerfasserIn  |4 aut 
700 1 |a Papalegis, Derek  |e VerfasserIn  |4 aut 
700 1 |a Culka, Martin  |e VerfasserIn  |4 aut 
700 1 |a Levy, Tyler  |e VerfasserIn  |4 aut 
700 1 |a Gregoretti, Ivan  |e VerfasserIn  |4 aut 
700 1 |a Meehan, Connor  |e VerfasserIn  |4 aut 
700 1 |a Schiller, Chiara  |e VerfasserIn  |0 (DE-588)1377216861  |0 (DE-627)193655030X  |4 aut 
700 1 |a Bestak, Kresimir  |e VerfasserIn  |0 (DE-588)1372834605  |0 (DE-627)1932182837  |4 aut 
700 1 |a Schapiro, Denis  |d 1986-  |e VerfasserIn  |0 (DE-588)1165616769  |0 (DE-627)1029684693  |0 (DE-576)51047666X  |4 aut 
700 1 |a Chernyshev, Andrei  |e VerfasserIn  |4 aut 
700 1 |a Walther, Guenther  |e VerfasserIn  |4 aut 
700 1 |a Ghosn, Eliver E B  |e VerfasserIn  |4 aut 
700 1 |a Orlova, Darya  |e VerfasserIn  |4 aut 
773 0 8 |i Enthalten in  |t GigaScience  |d Oxford : Oxford University Press, 2012  |g 14(2025), Artikel-ID giaf052, Seite 1-20  |h Online-Ressource  |w (DE-627)739900358  |w (DE-600)2708999-X  |w (DE-576)380472600  |x 2047-217X  |7 nnas  |a Lifting the curse from high-dimensional data automated projection pursuit clustering for a variety of biological data modalities 
773 1 8 |g volume:14  |g year:2025  |g elocationid:giaf052  |g pages:1-20  |g extent:20  |a Lifting the curse from high-dimensional data automated projection pursuit clustering for a variety of biological data modalities 
856 4 0 |u https://doi.org/10.1093/gigascience/giaf052  |x Verlag  |x Resolving-System  |z kostenfrei  |3 Volltext 
951 |a AR 
992 |a 20250923 
993 |a Article 
994 |a 2025 
998 |g 1165616769  |a Schapiro, Denis  |m 1165616769:Schapiro, Denis  |d 910000  |d 912900  |e 910000PS1165616769  |e 912900PS1165616769  |k 0/910000/  |k 1/910000/912900/  |p 14 
998 |g 1372834605  |a Bestak, Kresimir  |m 1372834605:Bestak, Kresimir  |d 910000  |d 912900  |e 910000PB1372834605  |e 912900PB1372834605  |k 0/910000/  |k 1/910000/912900/  |p 13 
998 |g 1377216861  |a Schiller, Chiara  |m 1377216861:Schiller, Chiara  |d 910000  |d 912900  |e 910000PS1377216861  |e 912900PS1377216861  |k 0/910000/  |k 1/910000/912900/  |p 12 
999 |a KXP-PPN1936547872  |e 4775288903 
BIB |a Y 
SER |a journal 
JSO |a {"title":[{"title_sort":"Lifting the curse from high-dimensional data","title":"Lifting the curse from high-dimensional data","subtitle":"automated projection pursuit clustering for a variety of biological data modalities"}],"person":[{"role":"aut","roleDisplay":"VerfasserIn","display":"Simpson, Claire","given":"Claire","family":"Simpson"},{"given":"Evgeniy","family":"Tabatsky","role":"aut","roleDisplay":"VerfasserIn","display":"Tabatsky, Evgeniy"},{"given":"Zainab","family":"Rahil","role":"aut","display":"Rahil, Zainab","roleDisplay":"VerfasserIn"},{"roleDisplay":"VerfasserIn","display":"Eddins, Devon J","role":"aut","family":"Eddins","given":"Devon J"},{"roleDisplay":"VerfasserIn","display":"Tkachev, Sasha","role":"aut","family":"Tkachev","given":"Sasha"},{"family":"Georgescauld","given":"Florian","display":"Georgescauld, Florian","roleDisplay":"VerfasserIn","role":"aut"},{"role":"aut","display":"Papalegis, Derek","roleDisplay":"VerfasserIn","given":"Derek","family":"Papalegis"},{"given":"Martin","family":"Culka","role":"aut","display":"Culka, Martin","roleDisplay":"VerfasserIn"},{"role":"aut","roleDisplay":"VerfasserIn","display":"Levy, Tyler","given":"Tyler","family":"Levy"},{"given":"Ivan","family":"Gregoretti","role":"aut","display":"Gregoretti, Ivan","roleDisplay":"VerfasserIn"},{"given":"Connor","family":"Meehan","role":"aut","roleDisplay":"VerfasserIn","display":"Meehan, Connor"},{"role":"aut","display":"Schiller, Chiara","roleDisplay":"VerfasserIn","given":"Chiara","family":"Schiller"},{"role":"aut","roleDisplay":"VerfasserIn","display":"Bestak, Kresimir","given":"Kresimir","family":"Bestak"},{"roleDisplay":"VerfasserIn","display":"Schapiro, Denis","role":"aut","family":"Schapiro","given":"Denis"},{"role":"aut","display":"Chernyshev, Andrei","roleDisplay":"VerfasserIn","given":"Andrei","family":"Chernyshev"},{"display":"Walther, Guenther","roleDisplay":"VerfasserIn","role":"aut","family":"Walther","given":"Guenther"},{"display":"Ghosn, Eliver E B","roleDisplay":"VerfasserIn","role":"aut","family":"Ghosn","given":"Eliver E B"},{"given":"Darya","family":"Orlova","role":"aut","roleDisplay":"VerfasserIn","display":"Orlova, Darya"}],"language":["eng"],"recId":"1936547872","type":{"bibl":"article-journal","media":"Online-Ressource"},"note":["Veröffentlicht: 29. Mai 2025","Gesehen am 23.09.2025"],"id":{"doi":["10.1093/gigascience/giaf052"],"eki":["1936547872"]},"origin":[{"dateIssuedKey":"2025","dateIssuedDisp":"2025"}],"name":{"displayForm":["Claire Simpson, Evgeniy Tabatsky, Zainab Rahil, Devon J. Eddins, Sasha Tkachev, Florian Georgescauld, Derek Papalegis, Martin Culka, Tyler Levy, Ivan Gregoretti, Connor Meehan, Chiara Schiller, Kresimir Bestak, Denis Schapiro, Andrei Chernyshev, Guenther Walther, Eliver E.B. Ghosn, and Darya Orlova"]},"relHost":[{"origin":[{"dateIssuedKey":"2012","publisher":"Oxford University Press ; Biomed Central","dateIssuedDisp":"2012-","publisherPlace":"Oxford ; London"}],"id":{"zdb":["2708999-X"],"eki":["739900358"],"issn":["2047-217X"]},"physDesc":[{"extent":"Online-Ressource"}],"title":[{"title_sort":"GigaScience","title":"GigaScience"}],"pubHistory":["1.2012 -"],"part":{"pages":"1-20","year":"2025","extent":"20","volume":"14","text":"14(2025), Artikel-ID giaf052, Seite 1-20"},"type":{"bibl":"periodical","media":"Online-Ressource"},"note":["Gesehen am 25.10.2018"],"disp":"Lifting the curse from high-dimensional data automated projection pursuit clustering for a variety of biological data modalitiesGigaScience","language":["eng"],"recId":"739900358"}],"physDesc":[{"noteIll":"Illustrationen","extent":"20 S."}]} 
SRT |a SIMPSONCLALIFTINGTHE2025