The Neyman-Pearson lemma is a foundational and important result in the theory of hypothesis testing. When the proof was presented in class it seemed magical and I had no idea where the ideas came from. I even drew a face like this 😲 next to the usual \(\square\) in my book when the proof was finished. Later in class we learnt the method of undetermined multipliers and suddenly I saw where the Neyman-Pearson lemma came from.
In this blog post, I’ll first give some background and set up notation for the Neyman-Pearson lemma. Then I’ll talk about the method of undetermined multipliers and show how it can be used to derive and prove the Neyman-Pearson lemma. Finally, I’ll write about why I still think the Neyman-Pearson lemma is magical despite the demystified proof.
Background
In the set up of the Neyman-Pearson lemma we have data \(x\) which is a realisation of some random variable \(X\). We wish to conclude something about the distribution of \(X\) from our data \(x\) by doing a hypothesis test.
In the Neyman-Pearson lemma we have simple hypotheses. That is, our data either comes from the distribution \(\mathbb{P}_0\) or from the distribution \(\mathbb{P}_1\). Thus, our null hypothesis is \(H_0 : X \sim \mathbb{P}_0\) and our alternative hypothesis is \(H_1 : X \sim \mathbb{P}_1\).
A test of \(H_0\) against \(H_1\) is a function \(\phi\) that takes in data \(x\) and returns a number \(\phi(x) \in [0,1]\). The value \(\phi(x)\) is the probability under the test \(\phi\) of rejecting \(H_0\) given the observed data \(x\). That is, if \(\phi(x) = 1\), we always reject \(H_0\) and if \(\phi(x)=0\) we never reject \(H_0\). For in-between values \(\phi(x) = \gamma \in (0,1)\), we reject \(H_0\) with probability \(\gamma\).
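To make this concrete, here is a minimal sketch in Python (the function names are my own) of how a randomised test would be carried out in practice: compute \(\phi(x)\) for the observed data and then reject \(H_0\) with that probability.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_test(phi, x):
    """Carry out the (possibly randomised) test phi on the observed data x.

    phi maps data to a rejection probability in [0, 1];
    returns True if H0 is rejected and False otherwise.
    """
    # Reject H0 with probability phi(x); when phi(x) is 0 or 1 this is deterministic.
    return rng.uniform() < phi(x)

# A non-randomised test that rejects exactly when the observation exceeds 2.
reject = run_test(lambda x: 1.0 if x > 2 else 0.0, x=1.3)
print(reject)  # False, since phi(1.3) = 0
```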
An ideal test would have two desirable properties. We would like a test that rejects \(H_0\) with a low probability when \(H_0\) is true but we would also like the test to reject \(H_0\) with a high probability when \(H_1\) is true. To state this more formally, let \(\mathbb{E}_0[\phi(X)]\) and \(\mathbb{E}_1[\phi(X)]\) be the expectation of \(\phi(X)\) under \(H_0\) and \(H_1\) respectively. The quantity \(\mathbb{E}_0[\phi(X)]\) is the probability we reject \(H_0\) when \(H_0\) is true. Likewise, the quantity \(\mathbb{E}_1[\phi(X)]\) is the probability we reject \(H_0\) when \(H_1\) is true. An optimal test would be one that minimises \(\mathbb{E}_0[\phi(X)]\) and maximises \(\mathbb{E}_1[\phi(X)]\).
Unfortunately the goals of minimising \(\mathbb{E}_0[\phi(X)]\) and maximising \(\mathbb{E}_1[\phi(X)]\) are at odds with one another. This is because we want \(\phi\) to be small in order to minimise \(\mathbb{E}_0[\phi(X)]\) and we want \(\phi\) to be large to maximise \(\mathbb{E}_1[\phi(X)]\). In nearly all cases we have to trade off between these two goals and there is no single test that simultaneously achieves both.
To work around this, a standard approach is to focus on maximising \(\mathbb{E}_1[\phi(X)]\) while requiring that \(\mathbb{E}_0[\phi(X)]\) remains below some threshold. The quantity \(\mathbb{E}_1[\phi(X)]\) is called the power of the test \(\phi\). If \(\alpha\) is a number between \(0\) and \(1\), we will say that \(\phi\) has level-\(\alpha\) if \(\mathbb{E}_0[\phi(X)] \le \alpha\). A test \(\phi\) is said to be most powerful at level-\(\alpha\) if
- The test \(\phi\) is level-\(\alpha\).
- For all level-\(\alpha\) tests \(\phi'\), the test \(\phi\) is at least as powerful as \(\phi'\). That is,
\(\mathbb{E}_1[\phi'(X)] \le \mathbb{E}_1[\phi(X)]\).
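As a concrete illustration (a toy example of mine), suppose \(\mathbb{P}_0 = N(0,1)\), \(\mathbb{P}_1 = N(2,1)\) and \(\phi\) rejects exactly when \(x > 1.645\). The sketch below computes the level \(\mathbb{E}_0[\phi(X)]\) and the power \(\mathbb{E}_1[\phi(X)]\) of this test.

```python
from scipy.stats import norm

c = 1.645  # the test: phi(x) = 1 if x > c, and 0 otherwise

# Level: probability of rejecting H0 when X ~ N(0, 1), i.e. when H0 is true.
level = norm.sf(c, loc=0, scale=1)   # about 0.05

# Power: probability of rejecting H0 when X ~ N(2, 1), i.e. when H1 is true.
power = norm.sf(c, loc=2, scale=1)   # about 0.64

print(f"level = {level:.3f}, power = {power:.3f}")
```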
Thus we can see that finding a most powerful level-\(\alpha\) test is a constrained optimisation problem. We wish to maximise the quantity
\(\mathbb{E}_1[\phi(X)]\)
subject to the constraint
\(\mathbb{E}_0[\phi(X)] \le \alpha\)
With this in mind, we turn to the method of undetermined multipliers.
The method of undetermined multipliers
The method of undetermined multipliers (also called the method of Lagrange multipliers) is a very general optimisation tool. Suppose that we have a set \(U\) and two functions \(f,g : U \to \mathbb{R}\) and we wish to maximise \(f(u)\) subject to the constraint \(g(u) \le 0\).
In the context of hypothesis testing, the set \(U\) is the set of all tests \(\phi\). The objective function \(f\) is given by \(f(\phi) = \mathbb{E}_1[\phi(X)]\). That is, \(f(\phi)\) is the power of the test \(\phi\). The constraint function \(g\) is given by \(g(\phi)=\mathbb{E}_0[\phi(X)]-\alpha\) so that \(g(\phi) \le 0\) if and only if \(\phi\) has level-\(\alpha\).
The method of undetermined multipliers allows us to reduce this constrained optimisation problem to a (hopefully easier) unconstrained optimisation problem. More specifically we have the following result:
Proposition: Suppose that \(u^* \in U\) is such that:
- \(g(u^*) = 0\),
- There exists \(k \ge 0\), such that \(u^*\) maximises \(f(u)-kg(u)\) over all \(u \in U\).
Then \(u^*\) maximises \(f(u)\) under the constraint \(g(u) \le 0\).
Proof: Suppose that \(u^*\) satisfies the above two dot points. We need to show that for all \(u \in U\), if \(g(u) \le 0\), then \(f(u) \le f(u^*)\). By assumption we know that \(g(u^*)=0\) and \(u^*\) maximises \(f(u)-kg(u)\). Thus, for all \(u \in U\),
\(f(u^*)=f(u^*)-kg(u^*) \ge f(u)-kg(u)\).
Now suppose \(g(u) \le 0\). Then, \(kg(u) \le 0\) and so \(f(u)-kg(u)\ge f(u)\) and so \(f(u^*) \ge f(u)\) as needed. \(\square\)
The constant \(k\) is the undetermined multiplier. The way to use the method of undetermined multipliers is to find values \(u_k^*\) that maximise \(h_k(u) = f(u)-kg(u)\) for each \(k \ge 0\). The multiplier \(k\) is then varied so that the constraint \(g(u^*_k) = 0\) is satisfied.
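Here is a small numerical sketch of this recipe (a toy problem of my own, not related to testing): maximise \(f(u) = \sqrt{u}\) over \(u \in [0,1]\) subject to \(g(u) = u - 0.25 \le 0\). For each \(k\) we maximise \(h_k(u) = f(u) - kg(u)\) on a grid and then adjust \(k\) until the constraint is active; \(k = 1\) gives \(g(u^*_k) = 0\) and so recovers the constrained maximiser \(u^* = 0.25\).

```python
import numpy as np

u = np.linspace(0, 1, 100_001)   # grid over the set U = [0, 1]
f = np.sqrt(u)                   # objective to maximise
g = u - 0.25                     # constraint: g(u) <= 0, i.e. u <= 0.25

def unconstrained_maximiser(k):
    """Maximise h_k(u) = f(u) - k * g(u) over the grid."""
    return u[np.argmax(f - k * g)]

# Vary the multiplier k until the constraint is active, i.e. g(u_k*) = 0.
for k in [0.5, 0.8, 1.0, 1.5]:
    u_star = unconstrained_maximiser(k)
    print(f"k = {k:3}: maximiser = {u_star:.3f}, g = {u_star - 0.25:+.3f}")

# k = 1 gives u* = 0.25, which satisfies g(u*) = 0, so by the proposition
# u* = 0.25 maximises f subject to the constraint.
```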
Proving the Neyman-Pearson lemma
Now let’s use the method of undetermined multipliers to find most powerful tests. Recall the set \(U\) which we are optimising over is the set of all tests \(\phi\). Recall also that we wish to optimise \(f(\phi) = \mathbb{E}_1[\phi(X)]\) subject to the constraint \(g(\phi) = \mathbb{E}_0[\phi(X)] - \alpha \le 0\). The method of undetermined multipliers says that we should consider maximising the function
\(h_k(\phi) = \mathbb{E}_1[\phi(X)] - k\mathbb{E}_0[\phi(X)]\),
where \(k \ge 0\). Suppose that both \(\mathbb{P}_0\) and \(\mathbb{P}_1\) have densities1 \(p_0\) and \(p_1\) with respect to some measure \(\mu\). We can write the above expectations as integrals. That is,
\(\mathbb{E}_0[\phi(X)] = \int \phi(x)p_0(x)\mu(dx)\) and \(\mathbb{E}_1[\phi(X)] = \int \phi(x)p_1(x)\mu(dx)\).
Thus the function \(h_k\) is equal to
\(h_k(\phi) = \int \phi(x)p_1(x)\mu(dx)- k\int \phi(x)p_0(x)\mu(dx)=\int \phi(x)(p_1(x)-kp_0(x))\mu(dx)\).
We now wish to maximise the function \(h_k(\phi)\). Recall that \(\phi\) is a function that takes values in \([0,1]\). Thus, the integral
\(\int \phi(x)(p_1(x)-kp_0(x))\mu(dx)\),
is maximised if and only if \(\phi(x)=1\) when \(p_1(x)-kp_0(x) > 0\) and \(\phi(x)=0\) when \(p_1(x)-kp_0(x) < 0\). Note that \(p_1(x)-kp_0(x) > 0\) if and only if \(\frac{p_1(x)}{p_0(x)} > k\). Thus for each \(k \ge 0\), a test \(\phi_k\) maximises \(h_k(\phi)\) if and only if
\(\phi_k(x) = \begin{cases} 1 & \text{if } \frac{p_1(x)}{p_0(x)} > k,\\ 0 &\text{if } \frac{p_1(x)}{p_0(x)} < k. \end{cases}\)
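As a quick sanity check (a toy example of my own, not from the post), we can take \(\mu\) to be counting measure on a five-point sample space, fix \(k\), and compare \(h_k\) for the test above against a large batch of randomly generated tests \(\phi\) with values in \([0,1]\); the likelihood-ratio test should never be beaten.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two densities on the sample space {0, 1, 2, 3, 4} (mu = counting measure).
p0 = np.array([0.4, 0.3, 0.2, 0.05, 0.05])
p1 = np.array([0.1, 0.1, 0.2, 0.3, 0.3])

k = 2.0

def h(phi):
    """h_k(phi) = E_1[phi(X)] - k * E_0[phi(X)]."""
    return np.sum(phi * p1) - k * np.sum(phi * p0)

# The test from the display above: phi_k = 1 where p1/p0 > k, 0 where p1/p0 < k.
phi_k = (p1 / p0 > k).astype(float)

# No test phi with values in [0, 1] should achieve a larger h_k.
random_tests = rng.uniform(size=(10_000, 5))
assert all(h(phi) <= h(phi_k) + 1e-12 for phi in random_tests)
print(h(phi_k))  # 0.4 for these densities and k = 2
```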
The method of undetermined multipliers says that if we can find \(k\) so that \(\phi_k\) has this form and also satisfies \(g(\phi_k) = 0\), then \(\phi_k\) is a most powerful test. Recall that \(g(\phi_k)=0\) is equivalent to \(\mathbb{E}_0[\phi_k(X)]=\alpha\). Summarising the above argument, we arrive at the Neyman-Pearson lemma.
Neyman-Pearson Lemma2: Suppose that \(\phi\) is a test such that
- \(\mathbb{E}_0[\phi(X)] = \alpha\), and
- For some \(k \ge 0\), \(\phi(x) = \begin{cases} 1 & \text{if } \frac{p_1(x)}{p_0(x)} > k,\\ 0 & \text{if } \frac{p_1(x)}{p_0(x)}< k.\end{cases}\)
then \(\phi\) is most powerful at level-\(\alpha\) for testing \(H_0 : X \sim \mathbb{P}_0\) against \(H_1 : X \sim \mathbb{P}_1\).
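To see the lemma in action, here is a short sketch (my own worked example, not part of the lemma itself) for testing \(H_0 : X \sim N(0,1)\) against \(H_1 : X \sim N(1,1)\). Here the likelihood ratio is \(p_1(x)/p_0(x) = e^{x - 1/2}\), which is increasing in \(x\), so the condition \(p_1(x)/p_0(x) > k\) is the same as \(x > c\) for some cutoff \(c\); choosing \(c\) to make the level exactly \(\alpha\) gives the most powerful level-\(\alpha\) test.

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05

# The likelihood ratio exp(x - 1/2) exceeds k exactly when x exceeds
# c = log(k) + 1/2, so the Neyman-Pearson test rejects when x > c.
# Choose c so that the level constraint E_0[phi(X)] = alpha holds exactly.
c = norm.ppf(1 - alpha, loc=0, scale=1)   # about 1.645
k = float(np.exp(c - 0.5))                # the corresponding multiplier, about 3.14

level = norm.sf(c, loc=0, scale=1)        # equals alpha by construction
power = norm.sf(c, loc=1, scale=1)        # power of the most powerful test, about 0.26

print(f"cutoff c = {c:.3f}, multiplier k = {k:.3f}")
print(f"level = {level:.3f}, power = {power:.3f}")
```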
The magic of Neyman-Pearson
By learning about undetermined multipliers I’ve been able to better understand the proof of the Neyman-Pearson lemma. I now view it as a clever solution to a constrained optimisation problem rather than something that comes out of nowhere.
There is, however, a different aspect of Neyman-Pearson that continues to surprise me. This aspect is the usefulness of the lemma. At first glance the Neyman-Pearson lemma seems to be a very specialised result because it is about simple hypothesis testing. In reality most interesting hypothesis tests have composite nulls or composite alternatives or both. It turns out that Neyman-Pearson is still incredibly useful for composite testing. Through ideas like monotone likelihood ratios, least favourable distributions and unbiasedness, the Neyman-Pearson lemma or similar ideas can be used to find optimal tests in a variety of settings.
Thus I must admit that the title of this blog post is a little inaccurate and deceptive. I do believe that, given the tools of undetermined multipliers and the set up of simple hypothesis testing, one is naturally led to the Neyman-Pearson lemma. However, I don’t believe that many could have realised how useful and interesting simple hypothesis testing would be.
Footnotes
- The assumption that \(\mathbb{P}_0\) and \(\mathbb{P}_1\) have densities with respect to a common measure \(\mu\) is not a restrictive assumption since one can always take \(\mu = \mathbb{P}_0+\mathbb{P}_1\) and then apply the Radon-Nikodym theorem. However, there is often a more natural choice of \(\mu\) such as Lebesgue measure on \(\mathbb{R}^d\) or the counting measure on \(\mathbb{N}\).
- What I call the Neyman-Pearson lemma is really only a third of the Neyman-Pearson lemma. There are two other parts. One that guarantees the existence of a most powerful test and one that is a partial converse to the statement in this post.