
1. Questions

    • Problem 1.02.

      [Song Mei] Quantify the convergence rate (in terms of the Wasserstein distance, or KL-divergence) for the two-layer mean-field limit diffusion equation \[ \partial_{t}\rho=\nabla\cdot(\rho\nabla\Psi(\theta,\rho))+\kappa\Delta\rho \] to its steady-state solution as $t\to\infty$. If $\Psi$ does not depend on $\rho$, this is a Fokker-Planck equation, and the convergence rate is governed by the log-Sobolev constant of the optimal measure (a sketch of this baseline calculation is given after the remark below). So the question is to understand what happens when $\Psi$ depends on $\rho$.
        1. Remark. Even when $\Psi$ does not depend on $\rho$ there are still some issues when the confining potential $\Psi$ is not convex: it is not clear that one should be able to conclude that the log-Sobolev constant exists. With the interaction term, I understand there are works by J. Tugaut (and perhaps others) that handle toy versions of the problem.
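
        For reference, here is the standard entropy-dissipation calculation in the baseline case where $\Psi=\Psi(\theta)$ does not depend on $\rho$ (a sketch only; conventions for the log-Sobolev constant vary). The stationary solution is $\rho_{\infty}\propto e^{-\Psi/\kappa}$, the equation can be rewritten as $\partial_{t}\rho=\kappa\nabla\cdot\big(\rho\nabla\log\tfrac{\rho}{\rho_{\infty}}\big)$, and therefore \[ \frac{d}{dt}\,\mathrm{KL}(\rho_{t}\,\|\,\rho_{\infty})=-\kappa\int\rho_{t}\left|\nabla\log\frac{\rho_{t}}{\rho_{\infty}}\right|^{2}d\theta\;\leq\;-2\kappa\lambda_{\mathrm{LS}}\,\mathrm{KL}(\rho_{t}\,\|\,\rho_{\infty}), \] where $\lambda_{\mathrm{LS}}$ is the log-Sobolev constant of $\rho_{\infty}$, so that $\mathrm{KL}(\rho_{t}\,\|\,\rho_{\infty})\leq e^{-2\kappa\lambda_{\mathrm{LS}}t}\,\mathrm{KL}(\rho_{0}\,\|\,\rho_{\infty})$. The question above asks what replaces this argument when $\Psi$ depends on $\rho$.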
    • Problem 1.04.

      [Song Mei] How can we extend the mean-field model from two layers to three or more layers?
    • Problem 1.06.

      [Song Mei] What can we say about truncations of the neural tangent hierarchy? See https://arxiv.org/abs/1909.11304.
    • Problem 1.08.

      [Lexing Ying] Is there a mean-field limit for ResNet as the depth of the network goes to infinity, with an appropriate scaling?
    • Problem 1.1.

      [Weinan E] How do we quantify the intrinsic difficulty of a classification problem (besides the size of the margin of the set of interest)?
    • Problem 1.12.

      [Grant Rotskoff] Is there a comparison of NTK and neural networks using Monte Carlo bounds? Are there regimes in which one is generically better than the other?
    • Problem 1.14.

      Is there a phase transition between lazy learning (NTK regime, $\alpha=1$) and feature-based regimes ($\alpha=n^{-1/2}$)? Can we characterize at which value of $\alpha$ we switch between these two regimes? Is there a sharp transition or a smooth transition? And what happens at that transition? (A small numerical sketch of the two regimes is given below.)
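
      A hypothetical numerical sketch of the two regimes (not part of the problem statement, and assuming PyTorch): train a two-layer network $f(x)=\alpha\sum_{i=1}^{n}a_{i}\,\mathrm{ReLU}(\langle w_{i},x\rangle)$ under the two output scalings of $\alpha$ above and compare how far the first-layer weights move during training, as a crude proxy for lazy versus feature-learning behavior. The teacher function, width, step count, the $1/(\alpha^{2}n)$ learning-rate normalization, and the movement metric are all arbitrary illustrative choices.

        # Rough experiment, not a definitive diagnostic: compare relative first-layer
        # weight movement for alpha = 1 (the NTK scaling above) vs alpha = n**-0.5.
        import torch

        torch.manual_seed(0)
        d, n, num_samples = 10, 1024, 256
        X = torch.randn(num_samples, d) / d ** 0.5
        y = torch.tanh(X[:, 0] - X[:, 1]).unsqueeze(1)      # arbitrary smooth teacher

        def train(alpha, steps=3000):
            W = torch.randn(n, d, requires_grad=True)       # first layer (the "features")
            a = torch.randn(n, 1, requires_grad=True)       # output layer
            W0 = W.detach().clone()
            lr = 0.5 / (alpha ** 2 * n)                     # crude step-size normalization
            opt = torch.optim.SGD([W, a], lr=lr)
            for _ in range(steps):
                pred = alpha * torch.relu(X @ W.t()) @ a
                loss = (pred - y).pow(2).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()
            move = (W.detach() - W0).norm() / W0.norm()     # relative feature movement
            return loss.item(), move.item()

        for alpha in [1.0, n ** -0.5]:
            loss, move = train(alpha)
            print(f"alpha={alpha:.4f}  loss={loss:.4f}  relative W movement={move:.4f}")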
    • Problem 1.16.

      [Bin Dong] When we use deep neural networks we are trying to move away from the overparameterization scenario. Pulling this off requires very smart architecture design, but we do not really understand it. Can we use mathematical analysis to design better architectures?
    • Problem 1.18.

      [Xiuyuan Cheng] How can we extend the theory of fully-connected neural networks (i.e. the problems in section 1) to convolutional neural networks? Can the geometry of the data (e.g. the spectral structure of images) help with this?
    • Problem 1.2.

      [Lin Lin] How does the generalization error for a convolutional neural network depend on the input dimension (e.g. the number of pixels in the image)? (Here the channel width is fixed.)
    • Problem 1.22.

      [Xiuyuan Cheng] How do the data distribution, geometry, and group invariances impinge on the questions raised above? The work of Stephane Mallat may be relevant.
    • Problem 1.24.

      [Lin Lin] Suppose that $f(x_{1},\ldots,x_{n})=f(x_{\sigma(1)},\ldots,x_{\sigma(n)})$ for all $\sigma\in S_{n}$ (the permutation group on $n$ elements). Can we use this group action in a similar way to the translation group action? How can we define a norm that helps us control the generalization error? This will require some understanding of a three-layer network. An example: \[ f(x_{1},\ldots,x_{n})=\varphi\left(\sum_{i=1}^{n}g(x_{i})\right). \] How can we analyze the error in learning this function? (A minimal sketch of an architecture of this form is given below.)
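
      A hypothetical sketch (not part of the problem statement, and assuming PyTorch) of a permutation-invariant architecture of exactly the form in the example, $f(x_{1},\ldots,x_{n})=\varphi\left(\sum_{i=1}^{n}g(x_{i})\right)$, with $g$ and $\varphi$ small multilayer perceptrons; the widths and dimensions are arbitrary illustrative choices.

        import torch
        import torch.nn as nn

        class SymmetricNet(nn.Module):
            """f(x_1, ..., x_n) = phi(sum_i g(x_i)): invariant under permutations of the x_i."""

            def __init__(self, in_dim=3, feat_dim=64, hidden=64):
                super().__init__()
                # g is applied to each x_i separately, with shared weights
                self.g = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, feat_dim))
                # phi acts on the sum-pooled (hence permutation-invariant) feature
                self.phi = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

            def forward(self, x):                       # x: (batch, n, in_dim)
                return self.phi(self.g(x).sum(dim=1))   # sum over the n inputs

        # Permuting the inputs leaves the output unchanged (up to floating-point error).
        net = SymmetricNet()
        x = torch.randn(5, 8, 3)                        # 5 sets of n = 8 points in R^3
        perm = torch.randperm(8)
        print(torch.allclose(net(x), net(x[:, perm, :]), atol=1e-5))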
    • Problem 1.26.

      [Lin Lin] Same question as the last, but with $f$ antisymmetric (alternating). This means that \[ f(x_{\sigma(1)},\ldots,x_{\sigma(n)})=(-1)^{\sigma}f(x_{1},\ldots,x_{n}) \] for any permutation $\sigma$. How do we design an architecture for such functions? How can we analyze such architectures? (A minimal determinant-based sketch is given below.)
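
      A hypothetical sketch (not part of the problem statement, and assuming PyTorch) of one standard antisymmetric ansatz: a Slater-type determinant $f(x_{1},\ldots,x_{n})=\det\left[\varphi_{j}(x_{i})\right]_{i,j=1}^{n}$ built from $n$ learned single-particle functions $\varphi_{j}$ (here one shared network with $n$ outputs). Swapping two of the $x_{i}$ swaps two rows of the matrix and therefore flips the sign of $f$; this is the basic building block of Slater-determinant-type wavefunction ansätze. The width and dimensions are arbitrary illustrative choices.

        import torch
        import torch.nn as nn

        class AntisymmetricNet(nn.Module):
            """f(x_1, ..., x_n) = det[phi_j(x_i)]: antisymmetric under permutations of the x_i."""

            def __init__(self, n=4, in_dim=3, hidden=64):
                super().__init__()
                # Maps each particle x_i to the row (phi_1(x_i), ..., phi_n(x_i)).
                self.orbitals = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(), nn.Linear(hidden, n))

            def forward(self, x):                 # x: (batch, n, in_dim)
                mat = self.orbitals(x)            # (batch, n, n), rows indexed by particles
                return torch.linalg.det(mat)      # sign flips when two rows are exchanged

        # Exchanging two particles flips the sign of the output.
        net = AntisymmetricNet(n=4)
        x = torch.randn(2, 4, 3)
        x_swapped = x.clone()
        x_swapped[:, [0, 1]] = x[:, [1, 0]]
        print(torch.allclose(net(x), -net(x_swapped), atol=1e-5))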
    • Problem 1.28.

      Can we use mathematical analysis to design better optimization/training schemes?
    • Problem 1.3.

      How can we design architectures and optimization schemes that train well in practice?
    • Problem 1.32.

      Do solutions of elliptic/parabolic PDEs live in Barron spaces? What about solutions to Burgers' equation? (One common definition of the Barron norm is recalled below.)
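
      For orientation, one common definition, following E, Ma, and Wu (conventions differ slightly across papers): $f$ belongs to the Barron space if it admits a representation \[ f(x)=\int a\,\sigma(\langle w,x\rangle+b)\,\rho(da,dw,db) \] for some probability measure $\rho$, with $\sigma$ the ReLU activation, and the Barron norm is (roughly) $\|f\|_{\mathcal{B}}=\inf_{\rho}\mathbb{E}_{\rho}\left[|a|\left(\|w\|_{1}+|b|\right)\right]$, the infimum taken over all such representations. The problem asks whether solutions of the PDEs above have finite norm of this kind.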
    • Problem 1.34.

      How does uncertainty (noise in the labels, or inexact solutions) affect these results?
    • Problem 1.36.

      How can we use deep learning to solve PDEs in the viscosity sense (e.g. Hamilton-Jacobi-Bellman and Isaacs equations), where solutions are not classical and a mechanism is needed to select the viscosity solution? Should we learn the flux or directly approximate the solution?

      For a first-order equation we can add a viscosity term, \begin{align*} H(x,u,\nabla u)+\epsilon \Delta u = 0, \end{align*} but how can we deal with higher-order equations, \begin{align*} H(x,u,\nabla u,\nabla^2 u)=0, \end{align*} for which one cannot add artificial viscosity in the same way? (A collocation-loss sketch of the first-order case is given below.)
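
      A hypothetical sketch (not part of the problem statement, and assuming PyTorch) of the vanishing-viscosity idea in residual/collocation (PINN-style) form, for the illustrative one-dimensional eikonal problem $|u_{x}|=1$ on $(-1,1)$ with $u(\pm 1)=0$, whose viscosity solution is the distance function $1-|x|$. Matching the form $H+\epsilon\Delta u=0$ above, the sketch takes $H=1-|u_{x}|$, so the regularized equation is the standard vanishing-viscosity approximation $|u_{x}|-1-\epsilon u_{xx}=0$ up to sign; which solution is selected depends on the sign convention for $H$. The network, sampling, loss weights, and annealing schedule for $\epsilon$ are arbitrary choices.

        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)
        x_bdry = torch.tensor([[-1.0], [1.0]])           # u(-1) = u(1) = 0

        def residual(x, eps):
            """Residual of 1 - |u_x| + eps * u_xx  (equivalently |u_x| - 1 - eps * u_xx = 0)."""
            x = x.requires_grad_(True)
            u = net(x)
            u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
            u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
            return 1.0 - u_x.abs() + eps * u_xx

        # Anneal the artificial viscosity: large eps gives a smooth, easy problem;
        # small eps approaches the first-order equation and, one hopes, its viscosity solution.
        for eps in [0.5, 0.1, 0.02]:
            for _ in range(2000):
                x = 2.0 * torch.rand(256, 1) - 1.0       # interior collocation points
                loss = residual(x, eps).pow(2).mean() + 10.0 * net(x_bdry).pow(2).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()

        # Compare against the expected viscosity solution u(x) = 1 - |x|.
        x_test = torch.linspace(-1.0, 1.0, 5).reshape(-1, 1)
        print(torch.cat([x_test, net(x_test).detach(), 1.0 - x_test.abs()], dim=1))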
    • Problem 1.38.

      Can we solve hyperbolic conservation laws with neural networks? How can we preserve the symmetries of the physics, and enforce invariance with respect to those symmetries?
    • Problem 1.4.

      How can we design and train neural networks to be adversarially robust? How should we select training data to improve robustness?

Cite this as: AimPL: Deep learning and partial differential equations, available at http://aimpl.org/deeppde.