Extremal couplings

This post is inspired by an assignment question I had to answer for STATS 310A – a probability course at Stanford for first year students in the statistics PhD program. In the question we had to derive a few results about couplings. I found myself thinking and talking about the question long after submitting the assignment and decided to put my thoughts on paper. I would like to thank our lecturer Prof. Diaconis for answering my questions and pointing me in the right direction.

What are couplings?

Given two distribution functions \(F\) and \(G\) on \(\mathbb{R}\), a coupling of \(F\) and \(G\) is a distribution function \(H\) on \(\mathbb{R}^2\) such that the marginals of \(H\) are \(F\) and \(G\). Couplings can be used to give probabilistic proofs of analytic statements about \(F\) and \(G\) (see here). Couplings are also studied in their own right in the theory of optimal transport.

We can think of \(F\) and \(G\) as being the cumulative distribution functions of some random variables \(X\) and \(Y\). A coupling \(H\) of \(F\) and \(G\) thus corresponds to a random vector \((\widetilde{X},\widetilde{Y})\) where \(\widetilde{X}\) has the same distribution as \(X\), \(\widetilde{Y}\) has the same distribution as \(Y\) and \((\widetilde{X},\widetilde{Y}) \sim H\).

The independent coupling

For two given distribution functions \(F\) and \(G\) there exist many possible couplings. For example, we could take \(H = H_I\) where \(H_I(x,y) = F(x)G(y)\). This coupling corresponds to a random vector \((\widetilde{X}_I,\widetilde{Y}_I)\) where \(\widetilde{X}_I\) and \(\widetilde{Y}_I\) are independent and (as is required for all couplings) \(\widetilde{X}_I \stackrel{\text{dist}}{=} X\), \(\widetilde{Y}_I \stackrel{\text{dist}}{=} Y\).

In some sense the coupling \(H_I\) is in the “middle” of all couplings. This is because \(\widetilde{X}_I\) and \(\widetilde{Y}_I\) are independent and so \(\widetilde{X}_I\) doesn’t carry any information about \(\widetilde{Y}_I\). As the title of the post suggests, there are couplings where this isn’t the case and \(\widetilde{X}\) carries “as much information as possible” about \(\widetilde{Y}\).

The two extremal couplings

Define two functions \(H_L, H_U :\mathbb{R}^2 \to [0,1]\) by

\(H_U(x,y) = \min\{F(x), G(y)\}\) and \(H_L(x,y) = \max\{F(x)+G(y) - 1, 0\}\).

With some work, one can show that \(H_L\) and \(H_U\) are distribution functions on \(\mathbb{R}^2\) and that they have the correct marginals. In this post I would like to talk about how to construct random vectors \((\widetilde{X}_U, \widetilde{Y}_U) \sim H_U\) and \((\widetilde{X}_L, \widetilde{Y}_L) \sim H_L\).

Let \(F^{-1}\) and \(G^{-1}\) be the quantile functions of \(F\) and \(G\). That is,

\(F^{-1}(c) = \inf\{ x \in \mathbb{R} : F(x) \ge c\}\) and \(G^{-1}(c) = \inf\{ x \in \mathbb{R} : G(x) \ge c\}\).
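
To make the definition concrete, here is a minimal sketch of the quantile function for a discrete distribution. The particular distribution (values 0, 1, 3 with probabilities 1/2, 1/4, 1/4) and the helper name `make_cdf_and_quantile` are my own illustrative choices. The sketch uses exact fractions so that the key property used later, \(F^{-1}(c) \le x\) if and only if \(c \le F(x)\), can be checked without rounding issues.

```python
from bisect import bisect_left, bisect_right
from fractions import Fraction
from itertools import accumulate


def make_cdf_and_quantile(values, probs):
    """Return (F, F_inv) for a discrete distribution.

    values: support points sorted in increasing order.
    probs:  matching probabilities (Fractions summing to 1).
    """
    cum = list(accumulate(probs))  # cum[k] = P(X <= values[k])

    def cdf(x):
        # F(x) = P(X <= x): total mass of support points that are <= x.
        k = bisect_right(values, x)
        return cum[k - 1] if k > 0 else Fraction(0)

    def quantile(c):
        # F^{-1}(c) = inf{x : F(x) >= c}, valid for 0 < c <= 1.
        return values[bisect_left(cum, c)]

    return cdf, quantile


# Hypothetical example distribution: P(X=0)=1/2, P(X=1)=1/4, P(X=3)=1/4.
cdf, quantile = make_cdf_and_quantile(
    [0, 1, 3], [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
)

# Check the equivalence F^{-1}(c) <= x  <=>  c <= F(x) on a small grid.
for c in [Fraction(1, 4), Fraction(1, 2), Fraction(3, 5), Fraction(1)]:
    for x in [-1, 0, Fraction(1, 2), 1, 2, 3]:
        assert (quantile(c) <= x) == (c <= cdf(x))
```

Note that the infimum picks the smallest point at which the distribution function reaches level \(c\), which is why `bisect_left` is used on the cumulative probabilities.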

Now let \(V\) be a random variable that is uniformly distributed on \([0,1]\) and define

\(\widetilde{X}_U = F^{-1}(V)\) and \(\widetilde{Y}_U = G^{-1}(V)\).

Since \(F^{-1}(V) \le x\) if and only if \(V \le F(x)\), we have \(\widetilde{X}_U \stackrel{\text{dist}}{=} X\) and likewise \(\widetilde{Y}_U \stackrel{\text{dist}}{=} Y\). Furthermore \(\widetilde{X}_U \le x, \widetilde{Y}_U \le y\) occurs if and only if \(V \le F(x), V \le G(y)\) which is equivalent to \(V \le \min\{F(x),G(y)\}\). Thus

\(\mathbb{P}(\widetilde{X}_U \le x, \widetilde{Y}_U \le y) = \mathbb{P}(V \le \min\{F(x),G(y)\})= \min\{F(x),G(y)\}.\)

Thus \((\widetilde{X}_U,\widetilde{Y}_U)\) is distributed according to \(H_U\). We see that under the coupling \(H_U\), \(\widetilde{X}_U\) and \(\widetilde{Y}_U\) are closely related as they are both increasing functions of a common random variable \(V\).
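
The construction above is easy to simulate. The following is a rough sketch, assuming (purely for illustration) that \(F\) is the Exponential(1) distribution function and \(G\) is the Uniform(0, 2) distribution function. It draws a common uniform \(V\), applies both quantile functions, and compares the empirical joint distribution function at one point with \(\min\{F(x),G(y)\}\). The sample size, seed and evaluation point are arbitrary.

```python
import math
import random


def F(x):
    """Distribution function of Exponential(1) (an assumed example)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def F_inv(v):
    """Quantile function of Exponential(1), for 0 <= v < 1."""
    return -math.log(1.0 - v)


def G(y):
    """Distribution function of Uniform(0, 2) (an assumed example)."""
    return min(max(y / 2.0, 0.0), 1.0)


def G_inv(v):
    """Quantile function of Uniform(0, 2)."""
    return 2.0 * v


def sample_upper(n, seed=0):
    """Draw n pairs (F_inv(V), G_inv(V)) driven by one common uniform V."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        v = rng.random()  # V ~ Uniform[0, 1)
        pairs.append((F_inv(v), G_inv(v)))
    return pairs


def empirical_joint_cdf(pairs, x, y):
    """Fraction of sampled pairs (a, b) with a <= x and b <= y."""
    return sum(1 for a, b in pairs if a <= x and b <= y) / len(pairs)


upper_pairs = sample_upper(100_000)
x, y = 1.0, 0.8
print("empirical:", empirical_joint_cdf(upper_pairs, x, y))
print("min{F(x), G(y)}:", min(F(x), G(y)))
```

The two printed numbers should agree up to Monte Carlo error.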

We can follow a similar construction for \(H_L\). Define

\(\widetilde{X}_L = F^{-1}(V)\) and \(\widetilde{Y}_L = G^{-1}(1-V)\).

Thus \(\widetilde{X}_L\) and \(\widetilde{Y}_L\) are again functions of a common random variable \(V\) but \(\widetilde{X}_L\) is an increasing function of \(V\) and \(\widetilde{Y}_L\) is a decreasing function of \(V\). Note that \(1-V\) is also uniformly distributed on \([0,1]\). Thus \(\widetilde{X}_L \stackrel{\text{dist}}{=} X\) and \(\widetilde{Y}_L \stackrel{\text{dist}}{=} Y\).

Now \(\widetilde{X}_L \le x, \widetilde{Y}_L \le y\) occurs if and only if \(V \le F(x)\) and \(1-V \le G(y)\) which occurs if and only if \(1-G(y) \le V \le F(x)\). If \(1-G(y) \le F(x)\), then \(F(x)+G(y)-1 \ge 0\) and \(\mathbb{P}(1-G(y) \le V \le F(x)) =F(x)+G(y)-1\). On the other hand, if \(1 - G(y) > F(x)\), then \(F(x)+G(y)-1< 0\) and \(\mathbb{P}(1-G(y) \le V \le F(x))=0\). Thus

\(\mathbb{P}(\widetilde{X}_L \le x, \widetilde{Y}_L \le y) = \mathbb{P}(1-G(y) \le V \le F(x)) = \max\{F(x)+G(y)-1,0\}\),

and so \((\widetilde{X}_L,\widetilde{Y}_L)\) is distributed according to \(H_L\).
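
The same kind of simulation works for the lower coupling. As a sketch, again assuming for illustration that \(F\) is Exponential(1) and \(G\) is Uniform(0, 2), we feed \(V\) into one quantile function and \(1-V\) into the other, then compare the empirical joint distribution function with \(\max\{F(x)+G(y)-1,0\}\) at two arbitrary points, one for each case of the maximum.

```python
import math
import random


def F(x):
    """Distribution function of Exponential(1) (an assumed example)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def F_inv(v):
    """Quantile function of Exponential(1), for 0 <= v < 1."""
    return -math.log(1.0 - v)


def G(y):
    """Distribution function of Uniform(0, 2) (an assumed example)."""
    return min(max(y / 2.0, 0.0), 1.0)


def G_inv(v):
    """Quantile function of Uniform(0, 2)."""
    return 2.0 * v


def H_L(x, y):
    """The lower coupling evaluated at (x, y)."""
    return max(F(x) + G(y) - 1.0, 0.0)


def sample_lower(n, seed=0):
    """Draw n pairs (F_inv(V), G_inv(1 - V)) driven by one common uniform V."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        v = rng.random()  # V ~ Uniform[0, 1), so 1 - V lies in (0, 1]
        pairs.append((F_inv(v), G_inv(1.0 - v)))
    return pairs


def empirical_joint_cdf(pairs, x, y):
    """Fraction of sampled pairs (a, b) with a <= x and b <= y."""
    return sum(1 for a, b in pairs if a <= x and b <= y) / len(pairs)


lower_pairs = sample_lower(100_000)
# First point has F(x) + G(y) - 1 > 0, second has F(x) + G(y) - 1 < 0.
for x, y in [(1.0, 1.5), (0.5, 0.8)]:
    print((x, y), empirical_joint_cdf(lower_pairs, x, y), H_L(x, y))
```

At the second point the event \(\{\widetilde{X}_L \le x, \widetilde{Y}_L \le y\}\) is impossible, so the empirical frequency there should be essentially zero.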

What makes \(H_U\) and \(H_L\) extreme?

Now that we know that \(H_U\) and \(H_L\) are indeed couplings, it is natural to ask what makes them “extreme”. What we would like to say is that \(\widetilde{Y}_U\) is an increasing function of \(\widetilde{X}_U\) and \(\widetilde{Y}_L\) is a decreasing function of \(\widetilde{X}_L\). Unfortunately this isn’t always the case as can be seen by taking \(X\) to be constant and \(Y\) to be continuous.

However the intuition that \(\widetilde{Y}_U\) is increasing in \(\widetilde{X}_U\) and \(\widetilde{Y}_L\) is decreasing in \(\widetilde{X}_L\) is close to correct. Given a coupling \((\widetilde{X},\widetilde{Y}) \sim H\), we can look at the quantity

\(C(x,y) = \mathbb{P}(\widetilde{Y} \le y | \widetilde{X} \le x) -\mathbb{P}(\widetilde{Y} \le y) = \frac{H(x,y)}{F(x)}-G(y)\),

which is defined whenever \(F(x) > 0\). This quantity tells us something about how \(\widetilde{Y}\) changes with \(\widetilde{X}\). For instance, if small values of \(\widetilde{X}\) tend to occur together with small values of \(\widetilde{Y}\) (positive dependence), then we would expect \(C(x,y)\) to be positive, and if small values of \(\widetilde{X}\) tend to occur together with large values of \(\widetilde{Y}\) (negative dependence), then we would expect \(C(x,y)\) to be negative.

For the independent coupling \((\widetilde{X}_I,\widetilde{Y}_I) \sim H_I\), the quantity \(C(x,y)\) is constantly \(0\). It turns out that, for every \(x\) and \(y\), the quantity \(C(x,y)\) is maximised by the coupling \((\widetilde{X}_U, \widetilde{Y}_U) \sim H_U\) and minimised by \((\widetilde{X}_L,\widetilde{Y}_L) \sim H_L\), and it is in this sense that they are extremal. This final claim is the two-dimensional version of the Fréchet-Hoeffding Theorem and checking it is a good exercise.
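
Without spoiling the exercise, the claim can at least be sanity-checked numerically. The sketch below takes a small joint probability table that I made up for illustration, computes its joint distribution function \(H\) and marginals \(F\) and \(G\) exactly, and checks that \(H_L \le H \le H_U\) at a point. Dividing by \(F(x)\) and subtracting \(G(y)\) then gives the corresponding bounds on \(C(x,y)\). The function names are hypothetical.

```python
from fractions import Fraction as Fr

# A made-up joint pmf on {0, 1, 2} x {0, 1}; the entries sum to 1.
joint = {
    (0, 0): Fr(1, 10), (0, 1): Fr(2, 10),
    (1, 0): Fr(3, 10), (1, 1): Fr(1, 10),
    (2, 0): Fr(1, 10), (2, 1): Fr(2, 10),
}


def H(x, y):
    """Joint distribution function of the coupling."""
    return sum((p for (a, b), p in joint.items() if a <= x and b <= y), Fr(0))


def F(x):
    """Marginal distribution function of the first coordinate."""
    return sum((p for (a, _), p in joint.items() if a <= x), Fr(0))


def G(y):
    """Marginal distribution function of the second coordinate."""
    return sum((p for (_, b), p in joint.items() if b <= y), Fr(0))


def bounds_hold(x, y):
    """Check H_L(x, y) <= H(x, y) <= H_U(x, y)."""
    lower = max(F(x) + G(y) - 1, Fr(0))
    upper = min(F(x), G(y))
    return lower <= H(x, y) <= upper


def c_bounds(x, y):
    """Return (C under H_L, C under H, C under H_U); requires F(x) > 0."""
    lower = max(F(x) + G(y) - 1, Fr(0))
    upper = min(F(x), G(y))
    return (lower / F(x) - G(y), H(x, y) / F(x) - G(y), upper / F(x) - G(y))


# The bounds should hold at every point, not just on the support.
assert all(bounds_hold(x, y) for x in range(-1, 4) for y in range(-1, 3))
print(c_bounds(1, 0))
```

A check like this on one table is of course not a proof, but it is a quick way to catch a mistake in a proposed argument.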
