
1. Problems

• Problem 1.05.

  [Persi Diaconis] Let \mu be a probability distribution on \mathbb{R} with positive density near 0. Let X_1, X_2, \ldots \overset{i.i.d.}\sim \mu, and define the harmonic mean H_n = \frac{n}{\frac{1}{X_1} + \cdots + \frac{1}{X_n}}.
  Prove, using Stein’s method, that H_n converges to a Cauchy distribution as n \rightarrow \infty, and find the rate of convergence.
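
  A minimal Monte Carlo sketch of the claimed limit, assuming for concreteness \mu = \text{Uniform}(-1, 1) (which has positive density near 0); the fitted-Cauchy comparison, sample sizes, and seed are illustrative choices, not part of the problem:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      n, reps = 2_000, 2_000

      # X_i ~ Uniform(-1, 1): positive density at 0 makes 1/X_i heavy-tailed,
      # and the reciprocal of a centered Cauchy limit is again Cauchy.
      X = rng.uniform(-1.0, 1.0, size=(reps, n))
      H = n / (1.0 / X).sum(axis=1)            # one harmonic mean per replication

      loc, scale = stats.cauchy.fit(H)
      print("fitted Cauchy loc/scale:", loc, scale)
      print("KS distance to fitted Cauchy:",
            stats.kstest(H, "cauchy", args=(loc, scale)).statistic)
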
• Problem 1.1.

  [Persi Diaconis] Let \pi be a random permutation in S_n. For each 1 \le i \le n - 1, define the random variables \begin{align*} X_i &= \begin{cases} 1 &\mbox{if } \pi(i + 1) < \pi(i) \\ -1 &\mbox{otherwise} \end{cases} \\ S_i &= \sum_{j = 1}^i X_j \end{align*}
  Show that \frac{\#\{j: S_j > 0\}}{n} \overset{d}\rightarrow \text{Beta}(1/2, 1/2),
  and compute the rate of convergence.
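
  A quick simulation of the descent walk, useful for sanity-checking the Beta(1/2, 1/2) (arcsine) limit before chasing rates; the sizes and seed are arbitrary:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      n, reps = 2_000, 2_000
      frac = np.empty(reps)

      for r in range(reps):
          pi = rng.permutation(n)
          X = np.where(pi[1:] < pi[:-1], 1, -1)   # X_i = +1 on descents, -1 otherwise
          S = np.cumsum(X)                        # S_i = X_1 + ... + X_i
          frac[r] = (S > 0).sum() / n             # fraction of positive partial sums

      print("KS distance to Beta(1/2, 1/2):",
            stats.kstest(frac, "beta", args=(0.5, 0.5)).statistic)
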
• Problem 1.15.

  [Aaditya Ramdas] Let X_1, X_2, \ldots be i.i.d. symmetric Cauchy random variables, and let \begin{align*} S_t &= \sum_{i = 1}^t X_i \\ V_t &= \sum_{i = 1}^t X_i^2 \end{align*}
  Using Stein’s method, prove that for all m, x \ge 0, \mathbb{P}\left( \exists t \in \mathbb{N}: \frac{S_t}{V_t + m} > x \right) \le e^{-2mx^2}.

  1. Remark. [Aaditya Ramdas] This inequality can be found as Example 4 in https://arxiv.org/pdf/1808.03204.pdf. There is nothing special about the Cauchy: the inequality applies to any symmetric increments. The Cauchy instance just drives home the point that under symmetry no moments are needed (here S_t is not even a martingale, since the increments have no mean) to derive a self-normalized concentration inequality as stated.
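
  A finite-horizon check of the stated inequality (a simulation can only see t \le T, so it lower-bounds the left-hand side); the horizon and the values of m and x are arbitrary illustrative choices:

      import numpy as np

      rng = np.random.default_rng(0)
      T, reps, m, x = 1_000, 5_000, 4.0, 0.25

      # Symmetric Cauchy increments: no moments exist, yet the bound should hold.
      X = rng.standard_cauchy(size=(reps, T))
      S = np.cumsum(X, axis=1)
      V = np.cumsum(X**2, axis=1)

      crossed = ((S / (V + m)) > x).any(axis=1)
      print("empirical P(exists t <= T):", crossed.mean())
      print("stated bound exp(-2mx^2):  ", np.exp(-2 * m * x**2))
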
• Problem 1.2.

  [Persi Diaconis] Let X \sim MVN(\theta, I_d), where I_d denotes the d-dimensional identity matrix. Consider the problem of estimating \theta using estimators of the form X + h(X), where h is in some class \mathcal{H} of functions. By SURE, \hat{r}_h = d + 2\nabla \cdot h(X) + \|h(X)\|^2
  is an unbiased estimator of the risk associated with h. Let \begin{align*} h^* &= \underset{h \in \mathcal{H}}{\operatorname{arg\,min}} \, \hat{r}_h \\ \hat{\theta}^* &= X + h^*(X). \end{align*}
  Find an unbiased estimator r^{**} of the risk \mathbb{E}(\|\hat{\theta}^* - \theta\|^2) of \hat{\theta}^*.
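
  For a fixed h, the identity \mathbb{E}\hat{r}_h = \mathbb{E}\|X + h(X) - \theta\|^2 is easy to check numerically. A sketch with coordinatewise soft thresholding, h(x) = -\mathrm{sign}(x)\min(|x|, \lambda) (the choice of h, \lambda, \theta, and sample sizes is illustrative):

      import numpy as np

      rng = np.random.default_rng(0)
      d, reps, lam = 50, 20_000, 1.0
      theta = rng.normal(0, 2, size=d)              # arbitrary fixed mean vector

      X = theta + rng.normal(size=(reps, d))        # X ~ MVN(theta, I_d)
      h = -np.sign(X) * np.minimum(np.abs(X), lam)  # soft thresholding as X + h(X)
      div_h = -(np.abs(X) < lam).sum(axis=1)        # divergence of h (a.e. derivative)
      sure = d + 2 * div_h + (h**2).sum(axis=1)

      risk = ((X + h - theta)**2).sum(axis=1)
      print("mean SURE:", sure.mean())              # the two should agree up to MC error
      print("mean risk:", risk.mean())

  The difficulty in the problem is that h^* depends on X through the minimization, so the pointwise identity above no longer gives unbiasedness for \hat{r}_{h^*}.
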
• Problem 1.25.

  [Jon Wellner] Let t_r denote the t distribution with r degrees of freedom. As r \rightarrow \infty, we know that t_r \overset{d}\rightarrow Z, where Z is distributed as N(0, 1).
  • (a) Show that there exists a constant c > 0 for which d_{TV}(t_r, Z) \le c/r.
  • (b) Find the best such constant.
  • (c) Does there exist a constant c' > 0 for which d_{TV}(t_r, Z) \ge c'/r?
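
  For intuition on (a) and (b), d_{TV}(t_r, Z) = \frac{1}{2}\int |f_{t_r} - \varphi| can be computed numerically, and watching r \cdot d_{TV}(t_r, Z) stabilize suggests the constant. A sketch (the grid of r values and integration settings are arbitrary):

      import numpy as np
      from scipy import stats
      from scipy.integrate import quad

      def dtv(r):
          # total variation distance = (1/2) * L1 distance between the densities
          f = lambda x: abs(stats.t.pdf(x, df=r) - stats.norm.pdf(x))
          return 0.5 * quad(f, -np.inf, np.inf, limit=200)[0]

      for r in [5, 10, 20, 40, 80, 160]:
          d = dtv(r)
          print(r, d, r * d)    # r * d_TV should approach the best constant c
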
• Problem 1.3.

  [Jon Wellner] A random variable X with density f is called s-concave if, for every \lambda \in [0, 1] and all x and y such that f(x), f(y) > 0, f((1 - \lambda)x + \lambda y) \ge ((1 - \lambda) f(x)^s + \lambda f(y)^s)^{1/s}.
  Suppose Y is a random variable with characteristic function \mathbb{E}(e^{itY}) = e^{-|t|^{\alpha}}.
  Show that Y is s-concave for some s = s(\alpha). Further, if Y has density f, show that the curvature of f at its mode is negative.
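
  The curvature claim can be checked directly from Fourier inversion: f''(0) = -\frac{1}{\pi}\int_0^\infty t^2 e^{-t^\alpha}\,dt = -\frac{\Gamma(3/\alpha)}{\alpha\pi} < 0, using that the mode is 0 (symmetric stable densities are unimodal). A numerical sketch for one \alpha (the value 1.5 and the step size are arbitrary):

      import numpy as np
      from scipy.integrate import quad
      from scipy.special import gamma

      def f(x, alpha):
          # Fourier inversion of the characteristic function exp(-|t|^alpha)
          return quad(lambda t: np.cos(t * x) * np.exp(-t**alpha), 0, np.inf)[0] / np.pi

      alpha, h = 1.5, 0.05
      curv = 2 * (f(h, alpha) - f(0, alpha)) / h**2   # symmetric 2nd difference at 0
      print("numerical f''(0):", curv)
      print("exact -Gamma(3/alpha)/(alpha*pi):", -gamma(3 / alpha) / (alpha * np.pi))
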
• Problem 1.35.

  [Han Gan] Consider a Markov queueing system in which there are N servers, each serving at rate 1, and new customers arrive at a constant rate \lambda. When a new customer arrives, they join the queue behind the server whose queue is currently shortest; if more than one queue has the shortest length, one is chosen uniformly at random. Assume \lambda < N, so the chain has a stationary distribution. Denote by Q = (q_1, q_2, \ldots) the stationary distribution of this process, where q_i is the number of queues with at least i people. This model is known as the join-the-shortest-queue (JSQ) model.

  As a simplification of this process, suppose that when a new customer arrives and all of the server queues have 2 people, the customer leaves straight away. Let Q_2 denote the corresponding stationary distribution for this modified process. This is a commonly used approximation for the full JSQ model. Can we use Stein’s method to explicitly bound the difference |\mathbb{E} f(Q) - \mathbb{E} f(Q_2)| for suitable f? (A simulation sketch of the two chains follows below.)

  Consider another simplification of the full JSQ process, where a new customer joins the shortest of k randomly chosen queues. Let Q_3 denote the stationary distribution of this process. What can we say about |\mathbb{E} f(Q) - \mathbb{E} f(Q_3)|?
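
  A discrete-event (Gillespie-style) simulation of the full and capped chains makes the comparison concrete; everything here (N, \lambda, run length, exponential service times, reporting the means E[q_i] rather than the full law) is an illustrative choice:

      import numpy as np

      rng = np.random.default_rng(0)

      def jsq_means(N=10, lam=8.0, cap=None, steps=200_000, max_len=6):
          """Time-averaged E[q_i] = E #{queues with >= i customers}; cap=2 gives Q_2."""
          q = np.zeros(N, dtype=int)                # individual queue lengths
          qbar, total = np.zeros(max_len), 0.0
          for _ in range(steps):
              busy = (q > 0).sum()                  # each busy server works at rate 1
              rate = lam + busy
              dt = rng.exponential(1.0 / rate)
              for i in range(max_len):
                  qbar[i] += dt * (q >= i + 1).sum()
              total += dt
              if rng.random() < lam / rate:         # arrival joins a shortest queue
                  j = rng.choice(np.flatnonzero(q == q.min()))
                  if cap is None or q[j] < cap:     # shortest queue full => all full => leave
                      q[j] += 1
              else:                                 # departure from a random busy server
                  j = rng.choice(np.flatnonzero(q > 0))
                  q[j] -= 1
          return qbar / total

      print("full JSQ   E[q_i]:", jsq_means())
      print("capped Q_2 E[q_i]:", jsq_means(cap=2))

  The printout gives plug-in values of |\mathbb{E} f(Q) - \mathbb{E} f(Q_2)| for the coordinate functions f(Q) = q_i; the k-choice variant (Q_3) changes only the arrival line, sampling k queues and joining the shortest of those.
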
• Problem 1.4.

  [Larry Goldstein] Two approaches to generalizing Stein’s method to non-Gaussian distributions are the following:
  • (i) Stein coefficients: for a random variable Y, the Stein coefficient T = T(Y) satisfies \mathbb{E}(Y f(Y)) = \mathbb{E}(T f'(Y)).
  • (ii) Zero-biasing: for Y of mean 0 and variance \sigma^2, the zero-bias variable Y^* satisfies \mathbb{E}(Yf(Y)) = \sigma^2 \mathbb{E} f'(Y^*).

  Let Y_1, \ldots, Y_d \overset{i.i.d.}\sim Y with \mathbb{E} Y_i = 0, let \overline{Y} = (Y_1, \ldots, Y_d), and let X = \overline{Y} + \theta. Here we assume the distribution of Y is known while \theta is unknown.

  Consider estimators \hat{\theta} of \theta of the form \hat{\theta} = X + h(X). Recall that SURE for Gaussian random variables gives the unbiased estimate of risk \hat{r}_h = d + 2\nabla \cdot h(X) + \|h(X)\|^2.
  Calculate the bias e_d := \mathbb{E}(\hat{r}_h) - \mathbb{E} \|\hat{\theta} - \theta\|^2
  using Stein coefficients or zero-biasing (a Monte Carlo sketch of one concrete case follows below). Does \frac{e_d}{d} converge to 0 as d \rightarrow \infty? Can we apply this to wavelet shrinkage?
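
  A Monte Carlo sketch of e_d for one non-Gaussian choice, taking Y uniform on [-\sqrt{3}, \sqrt{3}] (mean 0, variance 1) and soft thresholding for h; these choices, and all sizes, are illustrative:

      import numpy as np

      rng = np.random.default_rng(0)
      d, reps, lam = 100, 50_000, 1.0
      theta = rng.normal(0, 2, size=d)                  # arbitrary fixed unknown mean

      Y = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(reps, d))   # mean 0, variance 1
      X = theta + Y
      h = -np.sign(X) * np.minimum(np.abs(X), lam)      # soft thresholding as X + h(X)
      sure = d - 2 * (np.abs(X) < lam).sum(axis=1) + (h**2).sum(axis=1)
      risk = ((X + h - theta)**2).sum(axis=1)

      e_d = sure.mean() - risk.mean()   # bias of the Gaussian SURE formula here
      print("e_d:", e_d, "  e_d / d:", e_d / d)
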
• Problem 1.45.

  Let X_1, X_2, \ldots be i.i.d. random variables with mean 0 and variance 1, and let S_n = \sum_{i = 1}^n X_i.

  For each b > 0, let T_b be a random time. The following theorem is due to Rényi (1961):
  Theorem.
  If \frac{T_b}{n_b} \overset{p}\rightarrow 1 as b \rightarrow \infty, where n_b \rightarrow \infty is deterministic, then \frac{S_{T_b}}{\sqrt{n_b}} \overset{d}\rightarrow N(0, 1).
  Can we get a bound on d\left( \frac{S_{T_b}}{\sqrt{n_b}}, Z\right)?

  Some remarks: if T_b is independent of X_1, X_2, \ldots, then Döbler has some results. Also, see “Randomly Stopped Sums” by Allan Gut.
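
  A simulation of the independent-T_b case of Rényi’s theorem, with Rademacher increments and T_b uniform on [0.9 n_b, 1.1 n_b]; all of these choices are illustrative:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      n_b, reps = 10_000, 2_000

      # T_b / n_b -> 1 in probability, with T_b independent of the increments
      T = rng.integers(int(0.9 * n_b), int(1.1 * n_b) + 1, size=reps)
      vals = np.empty(reps)
      for r in range(reps):
          X = rng.choice([-1.0, 1.0], size=T[r])   # i.i.d., mean 0, variance 1
          vals[r] = X.sum() / np.sqrt(n_b)         # note: normalized by n_b, not T_b

      print("KS distance to N(0,1):", stats.kstest(vals, "norm").statistic)
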
• Problem 1.5.

  [Gesine Reinert] Let (X_i, Y_i)_{i = 1}^n be i.i.d. pairs of random variables, where X_i \in \mathbb{R}^p and Y_i \in \mathbb{R}^q.
  • 1. Find a \in \mathbb{R}^p and b \in \mathbb{R}^q such that \text{corr}\left( a^TX, b^T Y \right) is maximized.
  • 2. Pick \hat{a}, \hat{b} with \sum \hat{a}_i = \sum \hat{b}_i = 1 such that \hat{c} := \text{corr}\left( \{\hat{a}^TX_i, \hat{b}^TY_i\}_{i = 1}^n \right) is maximized.

  It is shown in Bartlett (1939) and Hotelling (1936) that if X_i is independent of Y_i, and both have finite fourth moments, then A_n := -\log(1 - \hat{c})\left( n - \frac{1}{2}(p + q + 3) \right) \overset{d}\rightarrow \chi^2_{p + q - 2}
  as n \rightarrow \infty. Can we get bounds on d(A_n, \chi^2_{p + q - 2})?
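
  As a baseline for such bounds, the classical form of this statistic is easy to simulate: Bartlett’s approximation for Wilks’ \Lambda in the test of independence, which uses all sample canonical correlations, has no sum constraints, and has limit \chi^2_{pq}. The sketch below uses that classical statistic, not the constrained \hat{c} above, and all sizes are illustrative:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      n, p, q, reps = 500, 3, 2, 2_000
      stat = np.empty(reps)

      for r in range(reps):
          X = rng.normal(size=(n, p))               # X independent of Y
          Y = rng.normal(size=(n, q))
          Xc, Yc = X - X.mean(0), Y - Y.mean(0)
          # sample canonical correlations = singular values of the whitened
          # cross-covariance matrix
          Wx = np.linalg.inv(np.linalg.cholesky(Xc.T @ Xc))
          Wy = np.linalg.inv(np.linalg.cholesky(Yc.T @ Yc))
          r_can = np.linalg.svd(Wx @ (Xc.T @ Yc) @ Wy.T, compute_uv=False)
          stat[r] = -(n - 0.5 * (p + q + 3)) * np.log(1 - r_can**2).sum()

      print("KS distance to chi2_{pq}:",
            stats.kstest(stat, "chi2", args=(p * q,)).statistic)
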

                                                  Cite this as: AimPL: Stein's method and applications in high-dimensional statistics, available at http://aimpl.org/steinhd.