Our contributions:
This talk:
Question Does this persistence diagram show any structure?
Question Which of these persistence diagrams show structure?
Question Are these persistence diagrams different from those persistence diagrams?
Bubenik
Persistence Landscapes
Each persistence diagram converts to a function \(\mathbb{R}\times\mathbb{N}\to\mathbb{R}\).
Pointwise, the central limit theorem holds. Families of persistence diagrams can be analyzed and averaged using classical statistical methods and pointwise asymptotically normal distributions.
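As a minimal sketch of the landscape construction (assuming diagrams are given as numpy arrays of (birth, death) pairs; the grid and toy diagrams below are illustrative, not from the source):

```python
import numpy as np

def landscape(diagram, k, ts):
    """k-th persistence landscape function evaluated on the grid ts.

    diagram: array of (birth, death) pairs; k: 1-based level.
    lambda_k(t) is the k-th largest value of max(0, min(t - b, d - t)).
    """
    b = diagram[:, 0][:, None]   # births, as a column vector
    d = diagram[:, 1][:, None]   # deaths
    tents = np.maximum(0.0, np.minimum(ts[None, :] - b, d - ts[None, :]))
    tents.sort(axis=0)           # ascending per grid point
    if k > tents.shape[0]:
        return np.zeros_like(ts)
    return tents[-k, :]          # k-th largest tent value per grid point

# Pointwise averaging: the mean landscape of a family of diagrams,
# which is what the pointwise central limit theorem applies to.
ts = np.linspace(0.0, 2.0, 201)
diagrams = [np.array([[0.0, 1.0], [0.2, 0.8]]),
            np.array([[0.1, 1.2]])]
mean_landscape = np.mean([landscape(dg, 1, ts) for dg in diagrams], axis=0)
```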
Fasy, Lecci, Rinaldo, Wasserman, Balakrishnan, Singh
Stability and subsampling
Confidence regions for persistence diagrams can be generated by:
Fasy, Lecci, Rinaldo, Wasserman, Balakrishnan, Singh
Stability and distance to measure
Confidence regions for persistence diagrams can be generated by:
Robinson and Turner
Two-sample testing
To compare point clouds \(X_1,\dots,X_n\) to point clouds \(Y_1,\dots,Y_m\)
Cericola, Johnson, Kiers, Krock, Purdy, Torrence
Multiple sample (ANOVA style) testing
Extend the Robinson-Turner cost function to measure differences between three or more groups of persistence diagrams
Cericola, Johnson, Kiers, Krock, Purdy, Torrence
Multiple sample (ANOVA style) testing
Extend the Robinson-Turner cost function to measure differences between three or more groups of persistence diagrams
If such differences do exist, we then propose using a number of post-hoc (i.e. two-sample) permutation tests to identify the specific pairwise differences.
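A label-permutation test along these lines can be sketched as follows. This is a minimal sketch, not the actual Robinson-Turner implementation: the within-group cost used here is a stand-in for their cost function, plain numbers stand in for persistence diagrams, and absolute difference stands in for the bottleneck distance.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def within_group_cost(samples, labels, dist):
    """Sum of average pairwise distances within each label group
    (a stand-in for the Robinson-Turner joint cost function)."""
    total = 0.0
    for g in np.unique(labels):
        grp = [s for s, l in zip(samples, labels) if l == g]
        pairs = list(itertools.combinations(grp, 2))
        if pairs:
            total += sum(dist(a, b) for a, b in pairs) / len(pairs)
    return total

def permutation_test(samples, labels, dist, n_perm=999):
    """Permute group labels; a small p-value suggests the groups differ."""
    labels = np.asarray(labels)
    observed = within_group_cost(samples, labels, dist)
    null = [within_group_cost(samples, rng.permutation(labels), dist)
            for _ in range(n_perm)]
    # distinct groups give a SMALL within-group cost
    return (1 + sum(c <= observed for c in null)) / (n_perm + 1)

# toy data: two clearly separated groups of three "diagrams" each
xs = [0.0, 0.1, 0.05, 2.0, 2.1, 1.95]
labels = [0, 0, 0, 1, 1, 1]
p = permutation_test(xs, labels, lambda a, b: abs(a - b))
```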
|  | Null Accepted | Null Rejected |
|---|---|---|
| Null True | \(U\) | \(V\) |
| Null False | \(T\) | \(S\) |
\(U\) and \(S\) count correct behaviors.
\(V\) counts false discoveries (false positives, Type I errors)
\(T\) counts missed discoveries (false negatives, Type II errors)
\(\alpha = V/N\) – probability (rate) of false discoveries
\(\beta = T/N\) – probability of missed discoveries
Suppose \(k\) different post-hoc tests are performed. We should expect on average \(\alpha\cdot k\) of these to come out significant. This rate of positive tests can become dramatically large.
To achieve the chosen level \(\alpha\) overall, the multiple tests need to be modified so that the discovery rate is not inflated.
Bonferroni To achieve overall level \(\alpha\) across \(k\) tests, replace the threshold for each test by \(\alpha/k\) (or replace each p-value \(p\) with \(p\cdot k\)).
Bonferroni is known to be conservative – because multiple tests can be simultaneously significant.
Other methods (Holm, Hochberg) exist that rank the p-values and test the \(m\)-th smallest against the threshold \(\alpha/(k+1-m)\) as \(m\) varies.
In all of these, \(\alpha\) is replaced by \(\alpha/k\) (with variations).
In permutation- and simulation testing contexts, this lowered threshold can dramatically increase computational load.
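The adjustments above can be sketched as follows (a minimal sketch; the p-values in the comments are illustrative, not from the source):

```python
import numpy as np

def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests k."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * len(p), 1.0)

def holm(pvals):
    """Holm step-down: the m-th smallest p-value (m = 1, ..., k) is
    multiplied by (k + 1 - m); a running maximum keeps the adjusted
    p-values monotone in the original ranking."""
    p = np.asarray(pvals, dtype=float)
    k = len(p)
    adj = np.empty(k)
    running = 0.0
    for m, idx in enumerate(np.argsort(p)):   # m = 0, ..., k-1
        running = max(running, p[idx] * (k - m))
        adj[idx] = min(running, 1.0)
    return adj
```

Holm is uniformly at least as powerful as Bonferroni while still controlling the family-wise error rate, which is why it is usually preferred when only the corrected decisions are needed.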
Basic task: choose between two competing hypotheses \(H_0\) and \(H_1\).
Basic method: pick a statistic \(T\) and define a critical region \(S\). Reject \(H_0\) (accept \(H_1\)) when \(T\in S\), and accept \(H_0\) when \(T\not\in S\).
Level of the test is chosen by picking \(S\) so that
\(\mathbb{P}\)(reject \(H_0\) erroneously) \(=\alpha\).
Commonly: \(S = \{T : T > T_0\}\).
Acyclicity: trivial \(H_*\)
Rejecting acyclicity = this persistence diagram demonstrates structure
Fasy et al: check if confidence regions intersect the diagonal
Since the confidence regions are produced from the stability theorem, they are not tight.
Alternative plan, extensible to other applications: use a null model for point clouds.
We propose as a null model for acyclicity:
uniform point clouds
Given: potentially acyclic point cloud \(X\)
Option:
Requires \(N(N-1)/2\) bottleneck distance computations
Given: potentially acyclic point cloud \(X\)
Alternative:
Pick a test statistic that measures acyclicity of a single point cloud.
Attractive statistic: bottleneck norm – bottleneck distance to the empty diagram (i.e. half the length of the longest bar, under the usual \(L^\infty\) matching convention)
Requires no bottleneck distance computations.
Because bottleneck distance is a metric, \[ \|X\|_{\mathfrak{B}} + \|Y\|_{\mathfrak{B}} \geq d_{\mathfrak{B}}(X,Y) \]
In our work, we have not addressed any questions of stability for the bottleneck norm.
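A minimal sketch of the resulting simulation test, assuming diagrams are given as numpy arrays of (birth, death) pairs; the uniform draws below are a stand-in for norms computed from actual null-model point clouds:

```python
import numpy as np

def bottleneck_norm(diagram):
    """Bottleneck distance from a diagram to the empty diagram:
    every point is matched to the diagonal, so under the usual
    L-infinity convention the cost is half the longest bar."""
    if len(diagram) == 0:
        return 0.0
    births, deaths = diagram[:, 0], diagram[:, 1]
    return float(np.max(deaths - births) / 2.0)

def simulation_p_value(observed_norm, null_norms):
    """Monte Carlo p-value: rank of the observed norm among the norms
    of diagrams simulated from the null model (uniform point clouds)."""
    null_norms = np.asarray(null_norms)
    return (1 + np.sum(null_norms >= observed_norm)) / (len(null_norms) + 1)

# toy illustration: made-up norms for 99 null-model draws
rng = np.random.default_rng(1)
null_norms = rng.uniform(0.0, 0.3, size=99)    # stand-in for simulation
obs = bottleneck_norm(np.array([[0.1, 1.4]]))  # one long bar
p = simulation_p_value(obs, null_norms)        # small p: reject acyclicity
```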
Motivating Question: Given a Mapper cover, is the cover good in the sense of the Nerve Lemma?
If using persistence hypothesis testing, each simplex in the Mapper complex requires a separate test. For rejection of a good cover it is enough to see one test reject the null model.
This is the exact setup for Family-Wise Error Rate correction (ie Bonferroni, Holm, Hochberg, …):
Reduce \(\mathbb{P}\)(rejecting any one hypothesis in error) to \(\alpha\).
Idea: standardize the test statistic for each hypothesis, then treat each “column” of null-model draws as a separate batch simulation.
Then take column maxima, and treat as a simulation test: evaluate the rank of \(\max_i x_i\) among all the \(\max_i y_i^j\).
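The column-maximum step can be sketched in numpy as follows (a minimal sketch; `null_draws` is a hypothetical \(k \times n\) matrix of null-model draws, one row per hypothesis and one column per batch simulation, with statistics already standardized):

```python
import numpy as np

def max_statistic_p(observed, null_draws):
    """Family-wise test via the maximum statistic.

    observed: shape (k,), one standardized statistic per hypothesis.
    null_draws: shape (k, n_sims); column j is one batch simulation,
    holding a null-model draw for each of the k hypotheses.
    """
    col_max = null_draws.max(axis=0)   # column maxima: max_i y_i^j
    obs_max = observed.max()           # max_i x_i
    # rank of the observed maximum among the simulated maxima
    return (1 + np.sum(col_max >= obs_max)) / (null_draws.shape[1] + 1)
```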
Is it actually the case that the sampling distribution of the standardized statistics are equal across different densities and shapes of uniformly sampled points?
Sufficient: check that sampling distributions of the un-standardized statistics are equal up to a linear transformation.
The resulting test procedure is:
given point clouds \(X_1,\dots,X_m\),
Suppose we are performing \(k\) hypothesis tests.
The Family-Wise Error Rate is the probability that even one single false discovery is made among these tests. Controlling the FWER is often too harsh a requirement: we know that false discoveries will happen among repeated tests.
More desirable: bounding how many of the discoveries are false.
Write \(q_{FDR} = \mathbb{E}[V/R]\), where \(V\) counts false discoveries as above and \(R = V + S\) counts all discoveries.
Since \[ \frac{V}{R} = \frac{V/N}{R/N} = \frac{\% V}{\% R} \] we can estimate \(q_{FDR}\) by estimating the rate of discoveries and the rate of false discoveries separately.
We can use the test statistics of the null model draws to estimate \(\% V\), and the test statistics from the observations to estimate \(\% R\).
Given a family \(X_1,\dots,X_k\) of point clouds, and a null model \(\mathcal{M}\):
\(\min_{c_i} \hat q_{FDR}(c_i)\) is the smallest achievable FDR.
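The estimate can be sketched as follows. This is a minimal sketch, assuming a one-sided test statistic where large values reject, with the candidate cutoffs \(c_i\) taken to be the observed statistics themselves; the toy data are illustrative.

```python
import numpy as np

def fdr_estimate(observed, null_stats, cutoff):
    """q_hat(c) = %V / %R: the fraction of null-model statistics at or
    above the cutoff, over the fraction of observed statistics at or
    above it. Returns inf if nothing is rejected."""
    pct_v = np.mean(np.asarray(null_stats) >= cutoff)
    pct_r = np.mean(np.asarray(observed) >= cutoff)
    return pct_v / pct_r if pct_r > 0 else np.inf

def best_cutoff(observed, null_stats, q):
    """Smallest cutoff (largest rejection set) with estimated FDR <= q."""
    ok = [c for c in sorted(observed)
          if fdr_estimate(observed, null_stats, c) <= q]
    return min(ok) if ok else None
```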
The same approach can be used to control the FDR in Robinson-Turner style two-sample testing.
Here, the permuted costs form the basis to estimate \(\%V\), and the observed costs estimate \(\%R\).
To write out the process in detail we will need fairly many indices:
\(\min_{x_k} \hat q_{FDR}(x_k)\) is the smallest achievable FDR.
The FDR control methods described here work by construction: they use null model simulation (permutation) to estimate \(V\), and observe an estimated \(R\) directly from data.
Based on simulated uniformly distributed point clouds in the plane, with different point counts and different bounding boxes, we estimate the achieved level of the multiple testing vs null model method.
To estimate power, we use points on unit circles in the plane with varying degrees of Gaussian noise, parametrized by the standard deviation \(\sigma\).
| Attempted Level | 0.01 | 0.05 | 0.10 |
|---|---|---|---|
| Estimated Level (\(\|\cdot\|_{\mathfrak{B}}\)) | 0.01 | 0.07 | 0.12 |
| Estimated Level (\(\log\|\cdot\|_{\mathfrak{B}}\) ) | 0.01 | 0.07 | 0.12 |
| Estimated Power \(\sigma=0.1\), \(\|\cdot\|_{\mathfrak{B}}\) | 0.94 | 0.95 | 0.96 |
| Estimated Power \(\sigma=0.1\), \(\log\|\cdot\|_{\mathfrak{B}}\) | 0.94 | 0.95 | 0.96 |
| Estimated Power \(\sigma=0.25\), \(\|\cdot\|_{\mathfrak{B}}\) | 0.51 | 0.64 | 0.72 |
| Estimated Power \(\sigma=0.25\), \(\log\|\cdot\|_{\mathfrak{B}}\) | 0.51 | 0.64 | 0.72 |
| Attempted Level | 0.01 | 0.05 | 0.10 |
|---|---|---|---|
| Estimated Level (\(\|\cdot\|_{\mathfrak{B}}\)) | 0.04 | 0.11 | 0.17 |
| Estimated Level (\(\log\|\cdot\|_{\mathfrak{B}}\) ) | 0.03 | 0.07 | 0.13 |
| Estimated Power \(\sigma=0.1\), \(\|\cdot\|_{\mathfrak{B}}\) | 0.89 | 0.95 | 0.96 |
| Estimated Power \(\sigma=0.1\), \(\log\|\cdot\|_{\mathfrak{B}}\) | 0.80 | 0.83 | 0.87 |
| Estimated Power \(\sigma=0.25\), \(\|\cdot\|_{\mathfrak{B}}\) | 0.36 | 0.52 | 0.57 |
| Estimated Power \(\sigma=0.25\), \(\log\|\cdot\|_{\mathfrak{B}}\) | 0.36 | 0.41 | 0.51 |
We have:
https://arxiv.org/abs/1812.06491 M Vejdemo-Johansson and S Mukherjee: Multiple testing with persistent homology
http://www.math.csi.cuny.edu/~mvj/talks/union-22/Union-22.html Talk slides