Matrix of orthogonalized atomic orbital coefficients representation for radicals and ions

Bibliographic Details
Main Authors: Llenga, Stiv (Author), Gryn’ova, Ganna (Author)
Format: Article (Journal)
Language: English
Published: 2023
In: The journal of chemical physics
Year: 2023, Volume: 158, Issue: 21, Pages: 1-14
ISSN: 1089-7690
DOI: 10.1063/5.0151122
Online Access: Publisher, license required, full text: https://doi.org/10.1063/5.0151122
Author Notes: Stiv Llenga and Ganna Gryn’ova
Description
Summary: Chemical (molecular, quantum) machine learning relies on representing molecules in unique and informative ways. Here, we present the matrix of orthogonalized atomic orbital coefficients (MAOC) as a quantum-inspired molecular and atomic representation containing both structural (composition and geometry) and electronic (charge and spin multiplicity) information. MAOC is based on a cost-effective localization scheme that represents localized orbitals via a predefined set of atomic orbitals. The latter can be constructed from such small atom-centered basis sets as pcseg-0 and STO-3G in conjunction with a guess (non-optimized) electronic configuration of the molecule. Importantly, MAOC is suitable for representing monatomic, molecular, and periodic systems and can distinguish compounds with identical compositions and geometries but distinct charges and spin multiplicities. Using principal component analysis, we constructed a more compact but equally powerful version of MAOC, PCX-MAOC. To test the performance of full and reduced MAOC and several other representations (CM, SOAP, SLATM, and SPAHM), we used a kernel ridge regression machine learning model to predict frontier molecular orbital energy levels and ground state single-point energies for chemically diverse neutral and charged, closed- and open-shell molecules from an extended QM7b dataset, as well as two new datasets, N-HPC-1 (N-heteropolycycles) and REDOX (nitroxyl and phenoxyl radicals, carbonyl, and cyano compounds). MAOC affords accuracy that is either similar or superior to that of other representations for a range of chemical properties and systems.
Item Description: Published online: 2 June 2023
Accessed on 21 July 2023
Physical Description: Online Resource