Stein's method and applications in high-dimensional statistics

Add remark

Problem 1.05.

[Persi Diaconis] Let $\mu$ be a probability density on $\mathbb{R}$, with positive density near 0. Let $X_1, X_2, \ldots, \overset{i.i.d}\sim \mu$, and define the harmonic mean \[H_n = \frac{n}{\frac{1}{X_1} + \cdots + \frac{1}{X_n}}.\] Prove, using Stein’s method, that $H_n$ converges to a Cauchy distribution as $n \rightarrow \infty$, and find its rate.

Add remark

Problem 1.1.

[Persi Diaconis] Let $\pi$ be a random permutation in $S_n$. For each $1 \le i \le n - 1$, define the random variables \begin{align*} X_i &= \begin{cases} 1 &\mbox{if } \pi(i + 1) < \pi(i) \\ -1 &\mbox{otherwise} \end{cases} \\ S_i &= \sum_{j = 1}^i X_i \end{align*} Show that \[\frac{\#\{j: S_j > 0\}}{n} \rightarrow \text{Beta}(1/2, 1/2),\] and compute the rate of convergence.

Add remark

Problem 1.15.

[Aaditya Ramdas] Let $X_1, X_2, \ldots$ be i.i.d. symmetric Cauchy random variables, and let \begin{align*} S_t &= \sum_{i = 1}^t X_i \\ V_t &= \sum_{i = 1}^t X_i^2 \end{align*} Using Stein’s method, prove that for all $m, x \ge 0$, \[\mathbb{P}\left( \exists t \in \mathbb{N}: \frac{S_t}{V_t + m} > x \right) \le e^{-2mx^2}.\]

Remark. [Aaditya Ramdas] This inequality can be found as Example 4 in https://arxiv.org/pdf/1808.03204.pdf. There is nothing special about Cauchy, it applies to any symmetric increments. The Cauchy instance just drives home the point that under symmetry, no moments are needed (and hence S_t is not a martingale) to derive a self-normalized concentration inequality as stated.

Add remark

Problem 1.2.

[Persi Diaconis] Let $X \sim MVN(\theta, I_d)$, where $I_d$ denotes the $d$-dimensional identity matrix. Consider the problem of estimating $\theta$ using estimators of the form $X + h(X)$, where $h$ is in some class $\mathcal{H}$ of functions. By SURE, \[\hat{r}_h = d + 2\nabla \cdot h(X) + \|h(X)\|^2\] is an unbiased estimator of the risk associated with $h$. Let \begin{align*} h^* &= \text{arg min} (h \in \mathcal{H}: \hat{r}_h) \\ \hat{\theta}^* &= X + h^*(X). \end{align*} Find an unbiased estimator $r^{**}$ of the risk $\mathbb{E}(\|\hat{\theta}^* - \theta\|^2)$ of $\hat{\theta}^*$.

Add remark

Problem 1.25.

[Jon Wellner] Let $t_r$ denote the $t$ distribution with $r$ degrees of freedom. As $r \rightarrow \infty$, we know that $t_r \overset{d}\rightarrow Z$, where $Z$ is distributed as $N(0, 1)$.

[(a)] Show that there exists a constant $c > 0$ for which \[d_{TV}(t_r, Z) \le c/r.\]
[(b)] Find the best such constant.
[(c)] Does there exist a constant $c'$ for which \[d_{TV}(t_r, Z) \ge c'/r?\]

Add remark

Problem 1.3.

[Jon Wellner] A random variable $X$ with distribution $f$ is called $s$-concave if, for every $\lambda \in [0, 1]$ and all $x$ and $y$ such that $f(x), f(y) > 0$, \[f((1 - \lambda)x + \lambda y) > ((1 - \lambda) f(x)^s + \lambda f(y)^s)^{1/s}.\] Suppose $Y$ is a random variable with characteristic function \[\mathbb{E}(e^{itY}) = e^{-|t|^{\alpha}}.\] Show that $Y$ is $s$-concave for some $s = s(\alpha)$. Further, if $Y$ has density $f$, show that the curvature of $f$ at its mode is negative.

Add remark

Problem 1.35.

[Han Gan] Consider a Markov queueing system where there are $N$ servers who serve at rate 1, and new customers arrive into the system at a constant rate of $\lambda$. When a new customer arrives, they choose to join the queue behind the server with the shortest existing length. If there is more than one queue with the shortest length, simply choose one uniformly at random. Let us assume that $\lambda < N$ so we have a stationary distribution for this chain. Denote $Q = (q_1, q_2, \ldots)$ as the stationary distribution of this process, where $q_i$ is the number of queues with at least $i$ people in the queue. This model is known as the Join the shortest queue (JSQ) model.

As a simplification to this process, suppose that when a new customer arrives, and all of the server queues have 2 people, then this customer leaves straight away. Let $Q_2$ denote the corresponding stationary distribution for this modified process. This is a commonly used approximation for the full JSQ model. Can we use Stein’s method to explicitly bound the difference $|\mathbb{E} f(Q) - \mathbb{E} f(Q_2)|$ for suitable $f$?

Consider another simplification of the full JSQ process, where an new customer joins the shortest of $k$ randomly chosen queues. Let $Q_3$ denote the stationary distribution of this process. What can we say about $|\mathbb{E} f(Q) - \mathbb{E}f(Q_3)|$?

Add remark

Problem 1.4.

[Larry Goldstein] Two approaches to generalize Stein’s method to non-Gaussian distributions are the following:

[(i)] Stein coefficients: for a random variable $Y$, the Stein coefficient $T(Y)$ satisfies \[\mathbb{E}(Y f(Y)) = \mathbb{E}(T f'(Y)).\]
[(ii)] Zero-biasing: for $Y$ of mean 0 and variance $\sigma^2$, compute the $Y^*$ that satisfies \[\mathbb{E}(Yf(Y)) = \sigma^2 \mathbb{E} f'(Y^*).\]

Let $Y_1, \ldots, Y_d \overset{i.i.d.}\sim Y$, and let $\overline{Y} = (Y_1, \ldots, Y_d)$. Suppose that $\mathbb{E} Y_i = 0$, and let $X = Y + \theta$. Here we will assume $Y$ is a known distribution while $\theta$ is unknown.

Consider estimators $\hat{\theta}$ of $\Theta$ of the form $\hat{\theta} = X + h(X)$. Recall that SURE for Gaussian random variables gives the unbiased estimate of risk \[\hat{r}_h = d + 2\nabla \cdot h(X) + \|h(X)\|^2.\] Calculate the bias \[e_d := \mathbb{E}(\hat{r}_h - \mathbb{E} \|\hat{\theta} - \theta\|^2)\] using Stein coefficients or zero-biasing. Does $\frac{e_d}{d}$ converge to 0 as $d \rightarrow \infty$? Can we apply this to wavelet shrinkage?

Add remark

Problem 1.45.

Let $X_1, X_2, \ldots$ be i.i.d. random variables with mean 0 and variance 1, and let $S_n = \sum_{i = 1}^n X_i$.

For $b > 0$, let $T_b$ be a sequence of random times. The following theorem is due to Renyi (1961):

Theorem.

If $\frac{T_b}{n_b} \overset{p}\rightarrow 1$ as $b \rightarrow \infty$, where $n_b \rightarrow \infty$ is deterministic, then \[\frac{S_{T_b}}{\sqrt{n_b}} \rightarrow N(0, 1).\] Can we get a bound on $d\left( \frac{S_{T_b}}{n_b}, Z\right)$?

Some remarks: if $T_b$ is independent of $X_1, X_2, \ldots$, then Doebler has some results. Also, see “Randomly Stopped Sums” by Allan Gut

Add remark

Problem 1.5.

[Gesine Reinert] Let $(X_i, Y_i)_{i = 1}^n$ be i.i.d. pairs of random variables, where $X_i \in \mathbb{R}^p$ and $Y_i \in \mathbb{R}^q$.

[1.] Find $a \in \mathbb{R}^p$ and $b \in \mathbb{R}^1$ such that \[\text{corr}\left( a^TX, b^T Y \right)\] is maximized.
[2.] Pick $\hat{a}, \hat{b}$ such that $\sum \hat{a}_i = \sum \hat{b}_i = 1$ so that \[\hat{c} := \text{corr}\left( \{\hat{a}^TX_i, \hat{b}^TY_i\}_{i = 1}^n \right)\] is maximized.

It is shown in Bartlett(1939) and Hotelling(1936) that if $X_i$ is independent of $Y_i$, and both have finite fourth moments, then \[A_n := -\log(1 - \hat{c})\left( n - \frac{1}{2}(p + q + 3) \right) \overset{d}\rightarrow \chi^2_{p + q - 2}\] as $n \rightarrow \infty$. Can we get bounds on $d(A_n, \chi^2_{p + q - 2})$?

Stein's method and applications in high-dimensional statistics

1. Problems

Problem 1.05.

Problem 1.1.

Problem 1.15.

Problem 1.2.

Problem 1.25.

Problem 1.3.

Problem 1.35.

Problem 1.4.

Problem 1.45.

Theorem.

Problem 1.5.