
## 4. Miscellaneous

• #### Problem 4.1.

[Abram Kagan] Let $a_1, a_2, a_3, \ldots$ be a sequence of real numbers (a signal) transmitted in the presence of additive noise, so that $a_1 + \epsilon_1, a_2 + \epsilon_2, a_3 + \epsilon_3, \ldots$ is received (both are infinite sequences). We observe $X_1, X_2, X_3, \ldots$, which is either pure noise $\epsilon_1, \epsilon_2, \epsilon_3, \ldots$ or the transmitted signal $a_1 + \epsilon_1, a_2 + \epsilon_2, a_3 + \epsilon_3, \ldots$. To decide which of the two is received: \begin{equation} \begin{split} (\epsilon_1, \epsilon_2, \epsilon_3, \ldots) &\sim P_0 \text{ on } \mathbb{R}^{\infty},\\ (a_1 + \epsilon_1, a_2 + \epsilon_2, a_3 + \epsilon_3, \ldots) &\sim P_1 \text{ on } \mathbb{R}^{\infty}, \end{split} \end{equation} we would like to know whether the measures $P_0$ and $P_1$ generated by the two sequences are mutually singular. It is known that if the signal is “strong enough”, meaning \begin{align*} \sum_{i=1}^\infty a_i^2 = \infty, \end{align*} then detection is possible. Larry Shepp also noticed that if the $\epsilon_i$ are i.i.d. with a pdf whose Fisher information $I(\epsilon_i)$ is finite, then the “strong enough” property above is also necessary.

The question is the following. Suppose the signal is weak, i.e. \begin{equation*} \sum_{i=1}^\infty a_i^2 < \infty, \end{equation*} but \begin{align*} \sum_{i=1}^\infty |a_i|^{\lambda} = \infty \end{align*} for some $\lambda \in (0,2)$. If $\epsilon_i \sim F$, what is the infimum of those $\lambda$ for which every signal with $\sum_{i=1}^\infty |a_i|^{\lambda} = \infty$ is detectable?

In the case of noise uniform on $[-1, +1]$, the problem has been studied, and it is known that the smallest $\lambda$ for which detection is possible is $\lambda = 1$.

If instead $\epsilon_i \sim U\{-1, +1\}$, i.e. $P(\epsilon_i = \pm 1) = \frac{1}{2}$, it seems that any signal with $\sum_{i=1}^\infty |a_i|^{\lambda} = \infty$ for some $\lambda > 0$ is detectable, so that in this case the infimum equals $0$.
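As a concrete illustration of the “strong enough” dichotomy, consider the case of i.i.d. standard Gaussian noise (where the dichotomy is Kakutani's theorem). The likelihood-ratio test on the first $N$ coordinates has Bayes error $\Phi(-\sqrt{A_N}/2)$ with $A_N = \sum_{i=1}^N a_i^2$, so the error vanishes exactly when $\sum a_i^2 = \infty$. A minimal numerical sketch (the two example signals $a_i = i^{-1/2}$ and $a_i = i^{-1}$ are our choices, not part of the problem statement):

```python
import math

def bayes_error(A):
    # Bayes error of the likelihood-ratio test between N(0, I_N) and N(a, I_N);
    # the test statistic is Gaussian, giving error Phi(-sqrt(A)/2), A = sum a_i^2
    return 0.5 * (1 + math.erf(-math.sqrt(A) / (2 * math.sqrt(2))))

errs_div, errs_conv = [], []
for N in (10, 1000, 100000):
    A_div = sum(1 / i for i in range(1, N + 1))      # a_i = i^{-1/2}: sum a_i^2 diverges
    A_conv = sum(1 / i**2 for i in range(1, N + 1))  # a_i = i^{-1}:   sum a_i^2 converges
    errs_div.append(bayes_error(A_div))
    errs_conv.append(bayes_error(A_conv))

print(errs_div)   # tends to 0 as N grows
print(errs_conv)  # stays bounded away from 0
```

The divergent-energy signal is eventually detected with vanishing error, while the convergent one leaves a fixed residual error no matter how many coordinates are observed.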
• #### Problem 4.2.

[Thomas Courtade] Let $X,Y \in \{0,1\}$ have joint distribution \begin{equation*} p_{X,Y}(x,y) = \begin{bmatrix} \frac{1}{2}(1- \alpha) & \frac{1}{2}\alpha \\ \frac{1}{2}\alpha & \frac{1}{2}(1- \alpha) \\ \end{bmatrix}. \end{equation*} Then $I(X;Y) = 1 - h(\alpha)$ for $0 \leq \alpha \leq 1/2$, where $h(\alpha) \triangleq -\alpha \log_2 \alpha - (1 - \alpha) \log_2 (1 - \alpha)$. Note that $I(X;Y) \triangleq D(p_{X,Y} || p_X p_Y)$.

The conjecture, often referred to as the “Courtade-Kumar Conjecture” or the “Most Informative Function Conjecture”, is the following. Let $(X_i, Y_i)$, $1 \leq i \leq n$, be i.i.d. with $(X_i, Y_i) \overset{d}{=} (X,Y)$, and write $X_1^n = (X_1, \ldots, X_n)$. Then \begin{equation} I(f(X_1^n); Y_1^n) \leq 1 - h(\alpha) \end{equation} for any function $f:\{0,1\}^n \rightarrow \{0,1\}$. We note that there is prior work on special cases.
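For small $n$ the conjecture can be checked by brute force. The sketch below (our own, not part of the problem statement) enumerates all $2^{2^n}$ Boolean functions for $n = 3$, computes $I(f(X_1^n); Y_1^n)$ for a BSC with crossover probability $\alpha$, and compares against $1 - h(\alpha)$; a dictator function $f(x) = x_1$ attains the bound.

```python
import itertools, math

def h(p):
    # binary entropy in bits
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_info(f, n, alpha):
    # I(f(X); Y) in bits for X ~ Unif{0,1}^n and Y = X sent through a BSC(alpha)
    cube = list(itertools.product((0, 1), repeat=n))
    joint = {}
    for x in cube:
        for y in cube:
            d = sum(xi != yi for xi, yi in zip(x, y))          # Hamming distance
            p = (0.5 ** n) * alpha ** d * (1 - alpha) ** (n - d)
            joint[(f[x], y)] = joint.get((f[x], y), 0.0) + p
    pb = {b: sum(v for (bb, _), v in joint.items() if bb == b) for b in (0, 1)}
    py = {y: sum(v for (_, yy), v in joint.items() if yy == y) for y in cube}
    return sum(v * math.log2(v / (pb[b] * py[y])) for (b, y), v in joint.items() if v > 0)

n, alpha = 3, 0.11
cube = list(itertools.product((0, 1), repeat=n))
best = max(mutual_info(dict(zip(cube, vals)), n, alpha)
           for vals in itertools.product((0, 1), repeat=2 ** n))
print(best, 1 - h(alpha))  # for n = 3 the dictator value 1 - h(alpha) is the maximum
```

Since a dictator ignores all but one coordinate, $I(X_1; Y_1^n) = I(X_1; Y_1) = 1 - h(\alpha)$, which the exhaustive search confirms is not exceeded for $n = 3$.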
• #### Problem 4.3.

[Peter Harremoës] Let $X, Y \in \mathbb{R}$ be random variables with $I(X; Y) \leq \alpha$ and $I(X+Y; X-Y) \leq \beta$. The question is: how close are $X$ and $Y$ to Gaussian? That is, we would like to find functions $f_1(\alpha, \beta)$ and $f_2(\alpha, \beta)$ such that \begin{equation} \begin{split} D(X) &\leq f_1(\alpha, \beta), \\ D(Y) &\leq f_2(\alpha, \beta), \end{split} \end{equation} where $D(\cdot)$ denotes the relative entropy to the mean- and variance-matched Gaussian. We know $f_i(0,0) = 0$ and $f_i(0, \beta) \rightarrow 0$ as $\beta \rightarrow 0$ for $i=1,2$. It is not known what happens in the case where $(\alpha, \beta)$ is bounded away from $(0,0)$.
• #### Problem 4.4.

[Tomasz Tkocz] Let $X_1, X_2, \ldots, X_n$ be i.i.d. uniform on $[-1, +1]$. For constants $a_1, a_2, \ldots, a_n$ with $\sum_{i=1}^n a_i^2 = 1$, is $\max_{a_1, \ldots, a_n} h\left(\sum_{i=1}^n a_i X_i\right)$ achieved when $a_1 = a_2 = \ldots = a_n$?
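A quick numerical check for $n = 2$ (our own sketch, not part of the problem statement): the density of $a_1 X_1 + a_2 X_2$ is an explicit trapezoid, and integrating $-f \log f$ on a grid shows that the equal-coefficient choice gives the larger entropy.

```python
import math

def entropy_two_uniforms(a, b, n=20000):
    # differential entropy (nats) of aX1 + bX2, X_i i.i.d. Unif[-1, 1],
    # using the closed-form trapezoidal density of the sum (midpoint rule)
    a, b = max(a, b), min(a, b)
    dx = 2 * (a + b) / n
    total = 0.0
    for i in range(n):
        x = -(a + b) + (i + 0.5) * dx
        if abs(x) <= a - b:
            f = 1 / (2 * a)                      # flat top of the trapezoid
        else:
            f = (a + b - abs(x)) / (4 * a * b)   # linear shoulders
        if f > 0:
            total -= f * math.log(f) * dx
    return total

equal = entropy_two_uniforms(1 / math.sqrt(2), 1 / math.sqrt(2))
skew = entropy_two_uniforms(0.9, math.sqrt(1 - 0.81))
print(equal, skew)  # the equal-coefficient entropy is larger
```

The equal case is the triangular density with entropy $\frac{1}{2} + \ln\sqrt{2}$ nats, which the grid estimate reproduces.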
• #### Problem 4.5.

[Tomasz Tkocz] For $X$ and $Y$ i.i.d. log-concave random variables, where is $\max_{\lambda \in [0,1]} h(\sqrt{\lambda} X + \sqrt{1 - \lambda} Y)$ achieved? We expect that the maximum is achieved at $\lambda = 1/2$.
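For the special case of $X, Y$ i.i.d. uniform on $[-1,1]$ (which are log-concave), the entropy is available in closed form: with $a$ the larger and $b$ the smaller of $\sqrt{\lambda}, \sqrt{1-\lambda}$, a short computation with the trapezoidal density of the sum gives $h = \ln(2a) + b/(2a)$ nats (this closed form is our own derivation, not from the problem statement). Scanning $\lambda$:

```python
import math

def h_mix(lam):
    # entropy (nats) of sqrt(lam) X + sqrt(1 - lam) Y, with X, Y i.i.d. Unif[-1, 1];
    # closed form ln(2a) + b/(2a) for the trapezoidal density, where a >= b
    a = math.sqrt(max(lam, 1 - lam))
    b = math.sqrt(min(lam, 1 - lam))
    return math.log(2 * a) + b / (2 * a)

lams = [i / 100 for i in range(1, 100)]
best = max(lams, key=h_mix)
print(best, h_mix(best))  # maximum at lam = 1/2, as expected
```

On this grid the maximum sits at $\lambda = 1/2$, consistent with the expectation above.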

It is known that this is not true without log-concavity. A counterexample is given by $X,Y$ with density $x^2 \phi(x)$, where $\phi(\cdot)$ is the standard Gaussian density; this density is bi-modal. Another counterexample exists when $X$ and $Y$ are uniform on $[a,b] \cup [-b, -a]$. For certain choices of $a$ and $b$, the maximum is achieved at $\lambda$ arbitrarily close to $0$.

We finally note that the Shepp-Olkin conjecture is relevant here. For parameters $\underline{p} \triangleq (p_1, p_2, \ldots, p_n)$, define the random variable $X(\underline{p}) = Y_1 + Y_2 + \ldots + Y_n$, where the $Y_i$ are independent $\text{Bern}(p_i)$. Then $H(X(\underline{p}))$ is concave in $\underline{p}$. The open question is whether $H(X(\underline{p}))$ increases in $\underline{p}$ for $\underline{p} \in [0, 1/2)^n$.
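Both statements are easy to probe numerically (a sketch with parameter vectors chosen by us): compute the Poisson-binomial pmf by dynamic programming, then compare $H$ at the midpoint of two parameter vectors with the average of the endpoint entropies.

```python
import math

def pmf(ps):
    # Poisson-binomial pmf of Y_1 + ... + Y_n, Y_i ~ Bern(p_i), via dynamic programming
    dist = [1.0]
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += (1 - p) * q      # Y_i = 0
            new[k + 1] += p * q        # Y_i = 1
        dist = new
    return dist

def H(ps):
    # Shannon entropy (bits) of the Poisson-binomial distribution
    return -sum(q * math.log2(q) for q in pmf(ps) if q > 0)

p0 = [0.10, 0.20, 0.30, 0.15]
p1 = [0.40, 0.25, 0.45, 0.35]
mid = [(x + y) / 2 for x, y in zip(p0, p1)]
print(H(mid), (H(p0) + H(p1)) / 2)  # concavity: the midpoint entropy dominates
print(H(p0), H(p1))                  # monotonicity on [0, 1/2)^n: H(p1) >= H(p0)
```

Here $p_1$ dominates $p_0$ coordinatewise within $[0, 1/2)^n$, so the second line is one numerical instance of the open monotonicity question.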
• #### Problem 4.6.

[Piotr Nayar] Let $X$ and $Y$ be i.i.d. real random variables with $Var(X) = Var(Y) = 1$. The question is, for $Z \sim \mathcal{N}(0,1)$ independent of $X,Y$, is it true that \begin{equation} h(X + Y) \leq h(X + Z), \end{equation} and \begin{equation} I(X+Y) \geq I(X+Z)? \end{equation} Here $I(\cdot)$ denotes the Fisher information.
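A sanity check of the first inequality for $X, Y$ uniform (our own sketch): $X + Y$ has a triangular density, while $X + Z$ is a uniform smoothed by a Gaussian, whose density is expressible through the Gaussian cdf; numerical integration then compares the two entropies.

```python
import math

def entropy(f, lo, hi, n=50000):
    # midpoint-rule estimate of the differential entropy (nats) of density f
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        v = f(lo + (i + 0.5) * dx)
        if v > 0:
            total -= v * math.log(v) * dx
    return total

s = math.sqrt(3)  # X, Y ~ Unif[-s, s] have variance 1

def f_xy(x):
    # triangular density of X + Y
    return max(0.0, (2 * s - abs(x)) / (4 * s * s))

def Phi(x):
    # standard normal cdf
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def f_xz(x):
    # density of X + Z with Z ~ N(0, 1): a Gaussian-smoothed uniform
    return (Phi(x + s) - Phi(x - s)) / (2 * s)

h_xy = entropy(f_xy, -2 * s, 2 * s)
h_xz = entropy(f_xz, -2 * s - 8, 2 * s + 8)
print(h_xy, h_xz)
```

In this instance $h(X+Y) < h(X+Z)$, consistent with the conjectured inequality: Gaussian smoothing brings the sum closer to the maximum-entropy distribution than self-convolution does.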
• #### Problem 4.7.

[Mokshay Madiman] We know that the following sub-modularity property of the entropy is true: if $X, Y,$ and $Z$ are independent, real-valued random variables, then

\begin{equation} h(X+Y+Z) + h(Z) \leq h(X+Z) + h(Y+Z). \end{equation}

We consider a conjecture that quantifies the gap (RHS $-$ LHS) of the above inequality. The conjecture is the following: if $X$, $Y$, and $Z$ are i.i.d. with a given finite variance, is the gap minimized when they are Gaussian?
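One numerical data point consistent with the conjecture (our own sketch): for i.i.d. $\mathcal{N}(0,1)$ variables the gap is exactly $\frac{1}{2}\log\frac{4}{3}$, while for i.i.d. uniforms with unit variance (whose two- and three-fold convolutions are the triangular and rescaled Irwin-Hall densities) the gap comes out larger.

```python
import math

def entropy(f, lo, hi, n=50000):
    # midpoint-rule estimate of the differential entropy (nats) of density f
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        v = f(lo + (i + 0.5) * dx)
        if v > 0:
            total -= v * math.log(v) * dx
    return total

s = math.sqrt(3)  # Unif[-s, s] has variance 1

def f2(x):
    # triangular density of the sum of two such uniforms
    return max(0.0, (2 * s - abs(x)) / (4 * s * s))

def f3(x):
    # density of the sum of three: a rescaled Irwin-Hall(3) density
    t = x / (2 * s) + 1.5
    if t < 0 or t > 3:
        g = 0.0
    elif t <= 1:
        g = t * t / 2
    elif t <= 2:
        g = (-2 * t * t + 6 * t - 3) / 2
    else:
        g = (3 - t) ** 2 / 2
    return g / (2 * s)

# gap = h(X+Z) + h(Y+Z) - h(X+Y+Z) - h(Z), all three i.i.d. uniform
gap_unif = 2 * entropy(f2, -2 * s, 2 * s) - entropy(f3, -3 * s, 3 * s) - math.log(2 * s)
gap_gauss = 0.5 * math.log(4 / 3)  # closed form for i.i.d. N(0, 1)
print(gap_unif, gap_gauss)
```

The uniform gap exceeds the Gaussian one, as the conjecture predicts for this example.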

This would have an implication for the entropic CLT: if $S_n$ is the normalized sum that appears in the CLT, it would imply $2 D(S_2) \leq D(S_1) + D(S_3)$, suggesting that the sequence $D(S_n)$ in the entropic CLT is not just decreasing (as shown by ABBN) but also convex. Here $D(\cdot) = D(\cdot \,||\, q)$ denotes the relative entropy to the mean- and variance-matched Gaussian $q$.

Sergey Bobkov suggested the following follow-up question. If $X \in \mathbb{R}$ and $Y \in \mathbb{R}$ are independent random variables with $Var(X) = Var(Y) = 1$, then $D(X+Y) \leq \frac{1}{2} D(X) + \frac{1}{2} D(Y)$. Moreover, $D(X+Y) = 0$ implies $D(X) = D(Y) = 0$; this is a special case of Cramér's theorem (1937). The question is, how can Cramér's theorem be proved using information-theoretic techniques?
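For i.i.d. uniforms both sides of Bobkov's inequality are available in closed form (our own sketch; $D(\cdot)$ is the scale-invariant relative entropy to the matched Gaussian, so no normalization of $X+Y$ is needed):

```python
import math

s = math.sqrt(3)  # Unif[-s, s] has variance 1

def h_gauss(var):
    # differential entropy (nats) of a Gaussian with the given variance
    return 0.5 * math.log(2 * math.pi * math.e * var)

D_X = h_gauss(1) - math.log(2 * s)            # D(X) for a variance-1 uniform
D_sum = h_gauss(2) - (0.5 + math.log(2 * s))  # D(X+Y); X+Y has a triangular density
print(D_X, D_sum)  # D(X+Y) <= (D(X) + D(Y)) / 2 = D(X)
```

The triangular density of $X+Y$ is markedly closer to Gaussian than the uniform itself, so $D(X+Y)$ is well below the average $\frac{1}{2}D(X) + \frac{1}{2}D(Y) = D(X)$.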
• #### Problem 4.8.

[Andrew Barron] Let $X_1, X_2, \ldots, X_n$ be i.i.d. with mean $0$ and variance $1$, let $S_n = \frac{X_1+X_2 + \ldots + X_n}{\sqrt{n}}$ have density $f_n$, and let $\phi$ be the standard normal density. If $X_i$ has finite Poincaré constant $R$ (and therefore finite moments of all orders), then \begin{align*} J(f_n || \phi) \leq \frac{2R}{n + (2R -1)} J(f_1 || \phi), \\ D(f_n || \phi) \leq \frac{2R}{n + (2R -1)} D(f_1 || \phi). \end{align*}

The open question asks whether bounds like those above can be established assuming only a finite fourth moment of $X_i$, without resorting to the Poincaré constant.

Cite this as: AimPL: Entropy power inequalities, available at http://aimpl.org/entropypower.