Deep learning in high dimension: neural network expression rates for analytic functions in L²(Rᵈ, γ_d)


Full description

Bibliographic details
Main authors: Schwab, Christoph (author); Zech, Jakob (author)
Document type: Article (Journal)
Language: English
Published: March 3, 2023
In: SIAM/ASA Journal on Uncertainty Quantification
Year: 2023, Volume: 11, Issue: 1, Pages: 199-234
ISSN:2166-2525
DOI:10.1137/21M1462738
Online access: publisher, license required, full text: https://doi.org/10.1137/21M1462738
Publisher, license required, full text: https://epubs.siam.org/doi/10.1137/21M1462738
Author statement: Christoph Schwab and Jakob Zech
Description
Abstract: Multigrid algorithms accelerate iterative methods by running them on a hierarchy of similar, graph-like structures. We introduce and demonstrate a new method for training neural networks that uses multilevel methods. Using an objective function derived from a graph-distance metric, we perform orthogonally constrained optimization to find optimal prolongation and restriction maps between graphs. We compare and contrast several methods for performing this numerical optimization, and we additionally present new theoretical results on upper bounds for this type of objective function. Once computed, these optimal maps between graphs form the core of multiscale artificial neural network (MsANN) training, a new procedure we present that simultaneously trains a hierarchy of neural network models of varying spatial resolution. Parameter information is passed between members of this hierarchy according to standard coarsening and refinement schedules from the multiscale modeling literature. In our machine learning experiments, these models learn faster than training at the fine scale alone, reaching a comparable level of error with an order of magnitude fewer weight updates.
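The multiscale idea described in the abstract can be illustrated with a minimal Python sketch. Everything below is a hypothetical illustration under simplifying assumptions, not the authors' implementation: the prolongation map P (orthogonally constrained, here obtained from a QR factorization), the restriction R = Pᵀ, the stand-in "training" step, and the single coarsening/refinement cycle are all assumed names and choices made for exposition only.

# Hypothetical sketch of multiscale parameter transfer between a coarse and a
# fine single-layer model, in the spirit of the MsANN procedure described in
# the abstract. Names and the schedule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_fine, n_coarse = 8, 4          # layer widths at the two resolutions

# Orthogonally constrained prolongation map P (n_fine x n_coarse, P^T P = I),
# obtained here simply by orthogonalizing a random matrix.
P, _ = np.linalg.qr(rng.standard_normal((n_fine, n_coarse)))
R = P.T                          # restriction taken as the transpose of P

# Weight matrices of the two models (square layers for simplicity).
W_fine = rng.standard_normal((n_fine, n_fine))
W_coarse = R @ W_fine @ P        # coarsen: project fine-scale weights down

def toy_train(W, steps):
    # Stand-in for a few optimization steps (here: shrink weights toward zero).
    return W * (0.9 ** steps)

# A tiny coarsening/refinement schedule: train the coarse model, lift the
# coarse-level correction via P, then continue training at the fine scale.
W_coarse = toy_train(W_coarse, steps=5)
W_fine = W_fine + P @ (W_coarse - R @ W_fine @ P) @ R
W_fine = toy_train(W_fine, steps=2)

print("fine-scale weight norm after one cycle:", np.linalg.norm(W_fine))

In this toy schedule the coarse model absorbs most of the optimization steps, and the fine model only receives the prolonged correction plus a short refinement phase, which mirrors the abstract's claim of comparable error with far fewer fine-scale weight updates.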
Description: Viewed on June 29, 2023
In the title, the numeral 2 and the letter d in the expression "Rd" are superscripted.
Description: Online resource