$\newcommand{\Cat}{{\rm Cat}}$ $\newcommand{\A}{\mathcal A}$ $\newcommand{\freestar}{ \framebox[7pt]{\star} }$

## 1. New models and algorithms

1. ### Adding loops to SBM

#### Problem 1.1.

[Emmanuel Abbe] Real-world networks are likely to have many small loops (e.g. friends introduce their friends to each other) but this feature is not captured by the SBM. One proposed model to capture this is as follows. Fix parameters $t \in [0,1]$ and $\tau > 0$. Let $\mathcal{D}_{+}$ and $\mathcal{D}_{-}$ be two (high-dimensional) Gaussian distributions. For each vertex $u$, randomly choose a community assignment $x_u \in \{+,-\}$ and sample $y_u \sim \mathcal{D}_{x_u}$. For each pair of vertices $u,v$, decide whether to include edge $(u,v)$ as follows. With probability $t$, decide based on the usual SBM. With probability $1-t$, add the edge iff $\|y_u - y_v\| \le \tau$. More generally, $\tau_{uv}$ could depend on the relative community status $x_u x_v$.
• ### Triangle closure

#### Problem 1.2.

[Cris Moore] Let $G \sim G_{n,p}$ with $p = \Theta(1)/n$. For each pair of vertices at distance 2, add an edge between them with probability $q$. Call the resulting graph $H$. Is it possible to recover $G$ from $H$? If $G$ is instead drawn from the SBM, is it possible to recover the communities from $H$? Is the threshold the same or different?
• ### Graph localization

#### Problem 1.3.

[Andrea Montanari] The following graph localization problem is a non-mean field model, which means that many standard tools from statistical physics do not apply. Let $R \subseteq \mathbb{R}^d$ and let $x_1, \ldots, x_n$ be drawn i.i.d. from the uniform distribution on $R$. For each $i,j$, observe $$Y_{ij} = \left\{ \begin{array}{cl}\|x_i - x_j\|^2 + z_{ij} & \text{if } \|x_i - x_j\| \le \rho \\ * & \text{otherwise} \end{array} \right.$$ where $z_{ij}$ is noise. The goal is to recover the values $\|x_i - x_j\|^2$.
• ### Iterative plurality

#### Problem 1.4.

[Luca Trevisan] Let $G \sim \mathrm{SBM}$. Consider the following algorithm. Initialize each vertex to a random color from $[k]$. Repeatedly update each node’s color to the plurality of its neighbors. Are there any conditions under which this is guaranteed to converge to a partition that is correlated with the true communities?
• ### Projective sampling scheme for sparse graphs

#### Problem 1.5.

[Victor Veitch] For each $n$, let $\mathcal{D}_n$ be a distribution over graphs of $n$ vertices. A projective sampling scheme is as follows. For any $k < n$ we have a procedure to, given a sample from $\mathcal{D}_n$, choose $k$ vertices so that the induced subgraph is distributed according to $\mathcal{D}_k$. We would like interesting examples of such sampling schemes for sparse graphs. More generally, we may not require the parameter $n$ to be the number of vertices. One example is taking a sub-box for random geometric graphs. Graphons use the sampling scheme of selecting $k$ vertices uniformly at random.
• ### Node vs. edge information

#### Problem 1.6.

[Elchanan Mossel] Some real-world graphs (e.g. facebook) have information both on the nodes and on the edges. Can you quantify how much information is contained in the nodes versus in the edges? An example model is as follows. Take a metric space $(X,d)$. Nodes are sampled from $X$, and each $x \in X$ has features $f(x)$. The probability of edge $(x,y)$ depends on $d(x,y)$.
• ### Can local changes shift the threshold?

#### Problem 1.7.

[Nikhil Srivastava] Let $G \sim \mathrm{SBM}$. Now let $G$ be changed by local rules. For what types of rules does this change the threshold?

Cite this as: AimPL: Connecting communities via the block model, available at http://aimpl.org/blockmodel.