
1. Questions

    • Problem 1.02.

      [Song Mei] Quantify the convergence rate (in terms of the Wasserstein distance, or KL-divergence) for the two-layer mean-field limit diffusion equation \[ \partial_{t}\rho=\nabla\cdot(\rho\nabla\Psi(\theta,\rho))+\kappa\Delta\rho \] to its steady-state solution as $t\to\infty$. If $\Psi$ does not depend on $\rho$, this is a Fokker-Planck equation, and the convergence rate is governed by the log-Sobolev constant of the optimal measure (a sketch of this baseline calculation is given after the remark below). So the question is to understand what happens when $\Psi$ depends on $\rho$.
        1. Remark. Even when $\Psi$ does not depend on $\rho$ there are still some issues when the confining potential $\Psi$ is not convex: it is not clear that one should be able to conclude that the log-Sobolev constant exists. With the interaction term, I understand there are works by J. Tugaut (and perhaps others) that handle toy versions of the problem.
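
        For reference, here is the standard entropy-dissipation calculation in the baseline case where $\Psi=\Psi(\theta)$ does not depend on $\rho$ (a sketch only; conventions for the log-Sobolev constant vary). The stationary solution is $\rho_{\infty}\propto e^{-\Psi/\kappa}$, the equation can be rewritten as $\partial_{t}\rho=\kappa\nabla\cdot\big(\rho\nabla\log\tfrac{\rho}{\rho_{\infty}}\big)$, and therefore \[ \frac{d}{dt}\,\mathrm{KL}(\rho_{t}\,\|\,\rho_{\infty})=-\kappa\int\rho_{t}\left|\nabla\log\frac{\rho_{t}}{\rho_{\infty}}\right|^{2}d\theta\;\leq\;-2\kappa\lambda_{\mathrm{LS}}\,\mathrm{KL}(\rho_{t}\,\|\,\rho_{\infty}), \] where $\lambda_{\mathrm{LS}}$ is the log-Sobolev constant of $\rho_{\infty}$, so that $\mathrm{KL}(\rho_{t}\,\|\,\rho_{\infty})\leq e^{-2\kappa\lambda_{\mathrm{LS}}t}\,\mathrm{KL}(\rho_{0}\,\|\,\rho_{\infty})$. The question above asks what replaces this argument when $\Psi$ depends on $\rho$.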
    • Problem 1.04.

      [Song Mei] How can we extend the mean-field model from two layers to three or more layers?
    • Problem 1.06.

      [Song Mei] What can we say about truncations of the neural tangent hierarchy? See https://arxiv.org/abs/1909.11304.
    • Problem 1.08.

      [Lexing Ying] Is there a mean-field limit for ResNet as the depth of the network goes to infinity, with an appropriate scaling?
    • Problem 1.1.

      [Weinan E] How do we quantify the intrinsic difficulty of a classification problem (besides the size of the margin of the set of interest)?
    • Problem 1.12.

      [Grant Rotskoff] Is there a comparison of NTK and neural networks using Monte Carlo bounds? Are there regimes in which one is generically better than the other?
    • Problem 1.14.

      Is there a phase transition between lazy learning (NTK regime, $\alpha=1$) and feature-based regimes ($\alpha=n^{-1/2}$)? Can we characterize at which value of $\alpha$ we switch between these two regimes? Is there a sharp transition or a smooth transition? And what happens at that transition? (A small numerical sketch of the two regimes is given below.)
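
      A hypothetical numerical sketch of the two regimes (not part of the problem statement, and assuming PyTorch): train a two-layer network $f(x)=\alpha\sum_{i=1}^{n}a_{i}\,\mathrm{ReLU}(\langle w_{i},x\rangle)$ under the two output scalings of $\alpha$ above and compare how far the first-layer weights move during training, as a crude proxy for lazy versus feature-learning behavior. The teacher function, width, step count, the $1/(\alpha^{2}n)$ learning-rate normalization, and the movement metric are all arbitrary illustrative choices.

        # Rough experiment, not a definitive diagnostic: compare relative first-layer
        # weight movement for alpha = 1 (the NTK scaling above) vs alpha = n**-0.5.
        import torch

        torch.manual_seed(0)
        d, n, num_samples = 10, 1024, 256
        X = torch.randn(num_samples, d) / d ** 0.5
        y = torch.tanh(X[:, 0] - X[:, 1]).unsqueeze(1)      # arbitrary smooth teacher

        def train(alpha, steps=3000):
            W = torch.randn(n, d, requires_grad=True)       # first layer (the "features")
            a = torch.randn(n, 1, requires_grad=True)       # output layer
            W0 = W.detach().clone()
            lr = 0.5 / (alpha ** 2 * n)                     # crude step-size normalization
            opt = torch.optim.SGD([W, a], lr=lr)
            for _ in range(steps):
                pred = alpha * torch.relu(X @ W.t()) @ a
                loss = (pred - y).pow(2).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()
            move = (W.detach() - W0).norm() / W0.norm()     # relative feature movement
            return loss.item(), move.item()

        for alpha in [1.0, n ** -0.5]:
            loss, move = train(alpha)
            print(f"alpha={alpha:.4f}  loss={loss:.4f}  relative W movement={move:.4f}")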
    • Problem 1.16.

      [Bin Dong] When we use deep neural networks we are trying to move away from the overparameterization scenario. Pulling this off requires very smart architecture design, but we do not really understand it. Can we use mathematical analysis to design better architectures?
    • Problem 1.18.

      [Xiuyuan Cheng] How can we extend the theory of fully-connected neural networks (i.e. the problems in section 1) to convolutional neural networks? Can the geometry of the data (e.g. the spectral structure of images) help with this?
    • Problem 1.2.

      [Lin Lin] How does the generalization error for a convolutional neural network depend on the input dimension (e.g. the number of pixels in the image)? (Here the channel width is fixed.)
    • Problem 1.22.

      [Xiuyuan Cheng] How do the data distribution, geometry, and group invariances impinge on the questions raised above? The work of Stephane Mallat may be relevant.
    • Problem 1.24.

      [Lin Lin] Suppose that $f(x_{1},\ldots,x_{n})=f(x_{\sigma(1)},\ldots,x_{\sigma(n)})$ for all $\sigma\in S_{n}$ (the permutation group on $n$ elements). Can we use this group action in a similar way to the translation group action? How can we define a norm that helps us control the generalization error? This will require some understanding of a three-layer network. An example: \[ f(x_{1},\ldots,x_{n})=\varphi\left(\sum_{i=1}^{n}g(x_{i})\right). \] How can we analyze the error in learning this function? (A minimal sketch of an architecture of this form is given below.)
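
      A hypothetical sketch (not part of the problem statement, and assuming PyTorch) of a permutation-invariant architecture of exactly the form in the example, $f(x_{1},\ldots,x_{n})=\varphi\left(\sum_{i=1}^{n}g(x_{i})\right)$, with $g$ and $\varphi$ small multilayer perceptrons; the widths and dimensions are arbitrary illustrative choices.

        import torch
        import torch.nn as nn

        class SymmetricNet(nn.Module):
            """f(x_1, ..., x_n) = phi(sum_i g(x_i)): invariant under permutations of the x_i."""

            def __init__(self, in_dim=3, feat_dim=64, hidden=64):
                super().__init__()
                # g is applied to each x_i separately, with shared weights
                self.g = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, feat_dim))
                # phi acts on the sum-pooled (hence permutation-invariant) feature
                self.phi = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

            def forward(self, x):                       # x: (batch, n, in_dim)
                return self.phi(self.g(x).sum(dim=1))   # sum over the n inputs

        # Permuting the inputs leaves the output unchanged (up to floating-point error).
        net = SymmetricNet()
        x = torch.randn(5, 8, 3)                        # 5 sets of n = 8 points in R^3
        perm = torch.randperm(8)
        print(torch.allclose(net(x), net(x[:, perm, :]), atol=1e-5))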
    • Problem 1.26.

      [Lin Lin] Same question as the last, but with $f$ antisymmetric (alternating). This means that \[ f(x_{\sigma(1)},\ldots,x_{\sigma(n)})=(-1)^{\sigma}f(x_{1},\ldots,x_{n}) \] for any permutation $\sigma$. How do we design an architecture for such functions? How can we analyze such architectures? (A minimal determinant-based sketch is given below.)
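
      A hypothetical sketch (not part of the problem statement, and assuming PyTorch) of one standard antisymmetric ansatz: a Slater-type determinant $f(x_{1},\ldots,x_{n})=\det\left[\varphi_{j}(x_{i})\right]_{i,j=1}^{n}$ built from $n$ learned single-particle functions $\varphi_{j}$ (here one shared network with $n$ outputs). Swapping two of the $x_{i}$ swaps two rows of the matrix and therefore flips the sign of $f$; this is the basic building block of Slater-determinant-type wavefunction ansätze. The width and dimensions are arbitrary illustrative choices.

        import torch
        import torch.nn as nn

        class AntisymmetricNet(nn.Module):
            """f(x_1, ..., x_n) = det[phi_j(x_i)]: antisymmetric under permutations of the x_i."""

            def __init__(self, n=4, in_dim=3, hidden=64):
                super().__init__()
                # Maps each particle x_i to the row (phi_1(x_i), ..., phi_n(x_i)).
                self.orbitals = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(), nn.Linear(hidden, n))

            def forward(self, x):                 # x: (batch, n, in_dim)
                mat = self.orbitals(x)            # (batch, n, n), rows indexed by particles
                return torch.linalg.det(mat)      # sign flips when two rows are exchanged

        # Exchanging two particles flips the sign of the output.
        net = AntisymmetricNet(n=4)
        x = torch.randn(2, 4, 3)
        x_swapped = x.clone()
        x_swapped[:, [0, 1]] = x[:, [1, 0]]
        print(torch.allclose(net(x), -net(x_swapped), atol=1e-5))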
    • Problem 1.28.

      Can we use mathematical analysis to design better optimization/training schemes?
    • Problem 1.3.

      How can we design architectures and optimization schemes that train well in practice?
    • Problem 1.32.

      Do solutions of elliptic/parabolic PDEs live in Barron spaces? What about solutions to Burgers' equation? (One common definition of the Barron norm is recalled below.)
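
      For orientation, one common definition, following E, Ma, and Wu (conventions differ slightly across papers): $f$ belongs to the Barron space if it admits a representation \[ f(x)=\int a\,\sigma(\langle w,x\rangle+b)\,\rho(da,dw,db) \] for some probability measure $\rho$, with $\sigma$ the ReLU activation, and the Barron norm is (roughly) $\|f\|_{\mathcal{B}}=\inf_{\rho}\mathbb{E}_{\rho}\left[|a|\left(\|w\|_{1}+|b|\right)\right]$, the infimum taken over all such representations. The problem asks whether solutions of the PDEs above have finite norm of this kind.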
    • Problem 1.34.

      How does uncertainty (noise in the labels, or inexact solutions) affect these results?
    • Problem 1.36.

      How can we use deep learning to solve PDEs in the viscosity sense (e.g. Hamilton-Jacobi-Bellman and Isaacs equations), where solutions are not classical and a mechanism is needed to select the viscosity solution? Should we learn the flux or directly approximate the solution?

      For a first-order equation we can add a viscosity term, \begin{align*} H(x,u,\nabla u)+\epsilon \Delta u = 0, \end{align*} but how can we deal with higher-order equations, \begin{align*} H(x,u,\nabla u,\nabla^2 u)=0, \end{align*} for which one cannot add artificial viscosity in the same way? (A collocation-loss sketch of the first-order case is given below.)
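
      A hypothetical sketch (not part of the problem statement, and assuming PyTorch) of the vanishing-viscosity idea in residual/collocation (PINN-style) form, for the illustrative one-dimensional eikonal problem $|u_{x}|=1$ on $(-1,1)$ with $u(\pm 1)=0$, whose viscosity solution is the distance function $1-|x|$. Matching the form $H+\epsilon\Delta u=0$ above, the sketch takes $H=1-|u_{x}|$, so the regularized equation is the standard vanishing-viscosity approximation $|u_{x}|-1-\epsilon u_{xx}=0$ up to sign; which solution is selected depends on the sign convention for $H$. The network, sampling, loss weights, and annealing schedule for $\epsilon$ are arbitrary choices.

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)
        x_bdry = torch.tensor([[-1.0], [1.0]])           # u(-1) = u(1) = 0

        def residual(x, eps):
            """Residual of 1 - |u_x| + eps * u_xx  (equivalently |u_x| - 1 - eps * u_xx = 0)."""
            x = x.requires_grad_(True)
            u = net(x)
            u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
            u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
            return 1.0 - u_x.abs() + eps * u_xx

        # Anneal the artificial viscosity: large eps gives a smooth, easy problem;
        # small eps approaches the first-order equation and, one hopes, its viscosity solution.
        for eps in [0.5, 0.1, 0.02]:
            for _ in range(2000):
                x = 2.0 * torch.rand(256, 1) - 1.0       # interior collocation points
                loss = residual(x, eps).pow(2).mean() + 10.0 * net(x_bdry).pow(2).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()

        # Compare against the expected viscosity solution u(x) = 1 - |x|.
        x_test = torch.linspace(-1.0, 1.0, 5).reshape(-1, 1)
        print(torch.cat([x_test, net(x_test).detach(), 1.0 - x_test.abs()], dim=1))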
    • Problem 1.38.

      Can we solve hyperbolic conservation laws with neural networks? How can we preserve the symmetries of the physics, and enforce invariance with respect to those symmetries?
    • Problem 1.4.

      How can we design and train neural networks to be adversarially robust? How should we select training data to improve robustness?

Cite this as: AimPL: Deep learning and partial differential equations, available at http://aimpl.org/deeppde.