Institut für Informatik
Freie Universität Berlin
E-Mail: mulzer@inf.fu-berlin.de

Five Proofs of Chernoff’s Bound with Applications¹

¹ Supported in part by DFG Grants MU 3501/1 and MU 3501/2 and ERC StG 757609.

Wolfgang Mulzer

Abstract

We discuss five ways of proving Chernoff’s bound and show how they lead to different extensions of the basic bound.

1 Introduction

Chernoff’s bound gives an estimate on the probability that a sum of independent Binomial random variables deviates from its expectation [14]. It has many variants and extensions that are known under various names such as Bernstein’s inequality or Hoeffding’s bound [4, 14]. Chernoff’s bound is one of the most basic and versatile tools in the life of a theoretical computer scientist, with a seemingly endless number of applications. Almost every contemporary textbook on algorithms or complexity theory contains a statement and a proof of the bound [2, 12, 16, 8], and there are several texts that discuss its various applications in great detail (e.g., the textbooks by Alon and Spencer [1], Dubhashi and Panconesi [10], Mitzenmacher and Upfal [19], Motwani and Raghavan [21], or the articles by Chung and Lu [6], Hagerup and Rüb [13], or McDiarmid [17]).

In the present survey, we will see five different ways of proving the basic Chernoff bound. The different techniques used in these proofs allow various generalizations and extensions, some of which we will also discuss.

2 The Basic Bound

We begin with a statement of the basic Chernoff bound. For this, we first need a notion from information theory [9]. Let $P=(p_{1},\dots,p_{m})$ and $Q=(q_{1},\dots,q_{m})$ be two probability distributions on $m$ elements, i.e., $p_{i},q_{i}\in\mathbb{R}$ with $p_{i},q_{i}\geq 0$ , for $i=1,\dots,m$ , and $\sum_{i=1}^{m}p_{i}=\sum_{i=1}^{m}q_{i}=1$ . The Kullback-Leibler divergence or relative entropy of $P$ and $Q$ is defined as

D_{\textup{KL}}(P\|Q):=\sum_{i=1}^{m}p_{i}\ln\frac{p_{i}}{q_{i}}.

If $m=2$ , i.e., if $P=(p,1-p)$ and $Q=(q,1-q)$ , we write $D_{\textup{KL}}(p\|q)$ for $D_{\textup{KL}}((p,1-p)\|(q,1-q))$ . The Kullback-Leibler divergence measures the distance between the distributions $P$ and $Q$ : it represents the expected loss of efficiency if we encode an $m$ -letter alphabet with distribution $P$ with a code that is optimal for distribution $Q$ . Now, the basic Chernoff bound is as follows:

Theorem 2.1.

Let $n\in\mathbb{N}$ , $p\in[0,1]$ , and let $X_{1},\dots,X_{n}$ be $n$ independent random variables with $X_{i}\in\{0,1\}$ and $\Pr[X_{i}=1]=p$ , for $i=1,\dots,n$ . Set $X:=\sum_{i=1}^{n}X_{i}$ . Then, for any $t\in[0,1-p]$ , we have

\Pr[X\geq(p+t)n]\leq e^{-D_{\textup{KL}}(p+t\|p)n}.
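To get a feeling for Theorem 2.1, the following minimal Python sketch compares the exact binomial tail with the bound for one parameter choice. The function names and the parameters $n=100$, $p=0.3$, $k=40$ are illustrative assumptions and do not come from the text.

```python
import math

def kl_binary(a, b):
    """D_KL(a || b) for the two-point distributions (a, 1-a) and (b, 1-b), in nats."""
    total = 0.0
    for x, y in ((a, b), (1.0 - a, 1.0 - b)):
        if x > 0.0:  # convention: 0 * ln(0 / y) = 0
            total += x * math.log(x / y)
    return total

def binomial_upper_tail(n, p, k):
    """Exact Pr[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k, n + 1))

def chernoff_bound(n, p, t):
    """Right-hand side of Theorem 2.1."""
    return math.exp(-kl_binary(p + t, p) * n)

# Illustrative parameters (assumed): threshold k = (p + t) n with k = 40.
n, p, k = 100, 0.3, 40
t = k / n - p
exact = binomial_upper_tail(n, p, k)
bound = chernoff_bound(n, p, t)
print(f"exact tail = {exact:.6f}, Chernoff bound = {bound:.6f}")
```

The exact tail is computed by direct summation, which is only practical for moderate $n$; the bound itself needs a single evaluation of the divergence.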

3 Five Proofs for Theorem 2.1

We will now see five different ways of proving Theorem 2.1.

3.1 The Moment Method

The usual textbook proof of Theorem 2.1 uses the exponential function $\exp$ and Markov’s inequality. It is called the moment method, because $\exp$ simultaneously encodes all moments $X,X^{2},X^{3},\dots$ of $X$ . This trick is often attributed to Bernstein [4]. It is very general and can be used to obtain several variants of Theorem 2.1, perhaps most prominently, the Azuma-Hoeffding inequality for martingales with bounded differences [14, 3].

The proof goes as follows. Let $\lambda>0$ be a parameter to be determined later. We have

\Pr[X\geq(p+t)n]=\Pr[\lambda X\geq\lambda(p+t)n]=\Pr\bigl{[}e^{\lambda X}\geq e^{\lambda(p+t)n}\bigr{]}.

From Markov’s inequality, we obtain

\Pr\bigl{[}e^{\lambda X}\geq e^{\lambda(p+t)n}\bigr{]}\leq\frac{\mathbf{E}[e^{\lambda X}]}{e^{\lambda(p+t)n}}.

Now, the independence of the $X_{i}$ yields

\mathbf{E}[e^{\lambda X}]=\mathbf{E}\Bigl{[}e^{\lambda\sum_{i=1}^{n}X_{i}}\Bigr{]}=\mathbf{E}\Biggl{[}\prod_{i=1}^{n}e^{\lambda X_{i}}\Biggr{]}=\prod_{i=1}^{n}\mathbf{E}\Bigl{[}e^{\lambda X_{i}}\Bigr{]}=\bigl{(}pe^{\lambda}+1-p\bigr{)}^{n}.

Thus,

\Pr[X\geq(p+t)n]\leq\Bigl{(}\frac{pe^{\lambda}+1-p}{e^{\lambda(p+t)}}\Bigr{)}^{n},

(1)

for every $\lambda>0$ . Optimizing for $\lambda$ using calculus, we get that the right hand side is minimized if

e^{\lambda}=\frac{(1-p)(p+t)}{p(1-p-t)}.

Plugging this into (1), we get

\Pr[X\geq(p+t)n]\leq\Biggl{[}\Bigl{(}\frac{p}{p+t}\Bigr{)}^{p+t}\Bigl{(}\frac{1-p}{1-p-t}\Bigr{)}^{1-p-t}\Biggr{]}^{n}=e^{-D_{\textup{KL}}(p+t\|p)n},

as desired.
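The optimization step can be checked numerically. The sketch below evaluates the per-trial factor of (1) at the stated minimizer and on a coarse grid of other values of $\lambda$; the helper names, the grid, and the parameters $p=0.3$, $t=0.1$ are assumptions made for illustration.

```python
import math

def moment_bound_base(p, t, lam):
    """Per-trial factor (p e^lam + 1 - p) / e^{lam (p + t)} from inequality (1)."""
    return (p * math.exp(lam) + 1.0 - p) / math.exp(lam * (p + t))

def optimal_lambda(p, t):
    """Minimizer stated in the text: e^lam = (1-p)(p+t) / (p (1-p-t))."""
    return math.log((1.0 - p) * (p + t) / (p * (1.0 - p - t)))

def kl_binary(a, b):
    """Binary Kullback-Leibler divergence for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

p, t = 0.3, 0.1  # illustrative values with 0 < p < p + t < 1
lam_star = optimal_lambda(p, t)
best = moment_bound_base(p, t, lam_star)
# Coarse grid over lambda in (0, 5) to confirm that no grid point does better.
grid_min = min(moment_bound_base(p, t, 0.01 * j) for j in range(1, 500))
target = math.exp(-kl_binary(p + t, p))
print(lam_star, best, grid_min, target)
```

Since the bound is raised to the $n$-th power, it suffices to compare the per-trial factors.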

3.2 Chvátal’s Method

The following proof of Theorem 2.1 is due to Chvátal [7]. As we will see below, it can be generalized to give tail bounds for the hypergeometric distribution. Let $B(n,p)$ be the random variable that gives the number of heads in $n$ independent Bernoulli trials with success probability $p$ . Then,

\Pr[B(n,p)=l]=\binom{n}{l}p^{l}(1-p)^{n-l},

for $l=0,\dots,n$ . Thus, for any $\tau\geq 1$ and $k\geq pn$ , we get

\Pr[B(n,p)\geq k]=\sum_{i=k}^{n}\binom{n}{i}p^{i}(1-p)^{n-i}\\ \leq\sum_{i=k}^{n}\binom{n}{i}p^{i}(1-p)^{n-i}\underbrace{\tau^{i-k}}_{\geq 1}+\underbrace{\sum_{i=0}^{k-1}\binom{n}{i}p^{i}(1-p)^{n-i}\tau^{i-k}}_{\geq 0}=\sum_{i=0}^{n}\binom{n}{i}p^{i}(1-p)^{n-i}\tau^{i-k}.

Using the Binomial theorem, we obtain

\Pr[B(n,p)\geq k]\leq\sum_{i=0}^{n}\binom{n}{i}p^{i}(1-p)^{n-i}\tau^{i-k}=\tau^{-k}\sum_{i=0}^{n}\binom{n}{i}(p\tau)^{i}(1-p)^{n-i}=\frac{(p\tau+1-p)^{n}}{\tau^{k}}.

If we write $k=(p+t)n$ and $\tau=e^{\lambda}$ , we get

\Pr[B(n,p)\geq(p+t)n]\leq\Bigl{(}\frac{pe^{\lambda}+1-p}{e^{\lambda(p+t)}}\Bigr{)}^{n}.

This is the same as (1), so we can complete the proof of Theorem 2.1 as in Section 3.1.
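Chvátal's inequality holds for every $\tau\geq 1$, which is easy to confirm on a small instance. The following sketch is only a sanity check under assumed parameters ($n=50$, $p=0.2$, $k=15$); the function names are hypothetical.

```python
import math

def binomial_upper_tail(n, p, k):
    """Exact Pr[B(n, p) >= k]."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k, n + 1))

def chvatal_bound(n, p, k, tau):
    """(p tau + 1 - p)^n / tau^k, the bound obtained from the Binomial theorem."""
    return (p * tau + 1.0 - p) ** n / tau ** k

n, p, k = 50, 0.2, 15  # illustrative values with k >= p n
exact = binomial_upper_tail(n, p, k)
bounds = {tau: chvatal_bound(n, p, k, tau) for tau in (1.0, 1.5, 2.0, 3.0)}
for tau, value in bounds.items():
    print(f"tau = {tau}: bound = {value:.6f} (exact tail = {exact:.6f})")
```

For $\tau=1$ the bound is trivial; larger values of $\tau$ trade a larger numerator against a larger denominator, which is exactly the trade-off optimized in Section 3.1.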

3.3 The Impagliazzo-Kabanets Method

The third proof is due to Impagliazzo and Kabanets [15], and it leads to a constructive version of the bound. Let $\lambda\in[0,1]$ be a parameter to be chosen later. Let $I\subseteq\{1,\dots,n\}$ be a random index set obtained by including each element $i\in\{1,\dots,n\}$ with probability $\lambda$ . We estimate $\mathbf{E}\bigl{[}\prod_{i\in I}X_{i}\bigr{]}$ in two different ways, where the expectation is over the random choice of $X_{1},\dots,X_{n}$ and $I$ .

On the one hand, using the law of total expectation and independence, we have

\mathbf{E}\Bigl{[}\prod_{i\in I}X_{i}\Bigr{]}=\sum_{S\subseteq\{1,\dots,n\}}\Pr[I=S]\cdot\mathbf{E}\Bigl{[}\prod_{i\in S}X_{i}\Bigr{]}=\sum_{S\subseteq\{1,\dots,n\}}\Pr[I=S]\cdot\prod_{i\in S}\Pr[X_{i}=1]\\ =\sum_{S\subseteq\{1,\dots,n\}}\lambda^{|S|}(1-\lambda)^{n-|S|}\cdot p^{|S|}=(\lambda p+1-\lambda)^{n}.

(2)

On the other hand, by the law of total expectation,

\mathbf{E}\Bigl{[}\prod_{i\in I}X_{i}\Bigr{]}\geq\mathbf{E}\Bigl{[}\prod_{i\in I}X_{i}\mid X\geq(p+t)n\Bigr{]}\Pr[X\geq(p+t)n].

Now, fix $X_{1},\dots,X_{n}$ with $X\geq(p+t)n$ . For the fixed choice of $X_{1}=x_{1},\dots,X_{n}=x_{n}$ , the expectation $\mathbf{E}\bigl{[}\prod_{i\in I}x_{i}\bigr{]}$ is exactly the probability that $I$ avoids all the $n-X$ indices $i$ where $x_{i}=0$ . Thus, the conditional expectation is

\mathbf{E}\Bigl{[}\prod_{i\in I}X_{i}\mid X\geq(p+t)n\Bigr{]}=\mathbf{E}\Bigl{[}(1-\lambda)^{n-X}\mid X\geq(p+t)n\Bigr{]}\geq(1-\lambda)^{(1-p-t)n},

because $1-\lambda\leq 1$ and $n-X\leq(1-p-t)n$ whenever $X\geq(p+t)n$ . Hence,

\mathbf{E}\Bigl{[}\prod_{i\in I}X_{i}\Bigr{]}\geq(1-\lambda)^{(1-p-t)n}\Pr[X\geq(p+t)n].

Combining with (2),

\Pr[X\geq(p+t)n]\leq\left(\frac{\lambda p+1-\lambda}{(1-\lambda)^{(1-p-t)}}\right)^{n}.

(3)

Using calculus, we get that the right hand side is minimized for $\lambda=t/\bigl{(}(1-p)(p+t)\bigr{)}$ (note that $\lambda\leq 1$ for $t\leq 1-p$ ). Plugging this into (3),

\Pr[X\geq(p+t)n]\leq\Biggl{[}\Bigl{(}\frac{p}{p+t}\Bigr{)}^{p+t}\Bigl{(}\frac{1-p}{1-p-t}\Bigr{)}^{1-p-t}\Biggr{]}^{n}=e^{-D_{\textup{KL}}(p+t\|p)n},

as desired.
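Both ingredients of this proof can be verified on a small instance: identity (2) by brute-force enumeration of all index sets, and the value of bound (3) at $\lambda=t/((1-p)(p+t))$. The sketch below assumes $n=8$, $p=0.3$, $t=0.1$, and the helper names are illustrative.

```python
import math
from itertools import combinations

def subset_sum(n, p, lam):
    """Sum over all S of Pr[I = S] * p^{|S|}, the middle expression in (2)."""
    total = 0.0
    for size in range(n + 1):
        for _ in combinations(range(n), size):  # one term per subset of this size
            total += lam**size * (1.0 - lam)**(n - size) * p**size
    return total

def ik_bound_base(p, t, lam):
    """Per-trial factor of bound (3)."""
    return (lam * p + 1.0 - lam) / (1.0 - lam) ** (1.0 - p - t)

def kl_binary(a, b):
    """Binary Kullback-Leibler divergence for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

n, p, t = 8, 0.3, 0.1  # small illustrative instance
lam_test = 0.37        # arbitrary value in [0, 1] for checking identity (2)
closed_form = (lam_test * p + 1.0 - lam_test) ** n
lam_star = t / ((1.0 - p) * (p + t))
print(subset_sum(n, p, lam_test), closed_form)
print(ik_bound_base(p, t, lam_star), math.exp(-kl_binary(p + t, p)))
```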

3.4 The Encoding Argument

The next proof stems from discussions with Luc Devroye, Gábor Lugosi, and Pat Morin, and it is inspired by an encoding argument [20]. A similar argument can also be derived from Xinjia Chen’s likelihood ratio method [5]. Let $\{0,1\}^{n}$ be the set of all bit strings of length $n$ , and let $w:\{0,1\}^{n}\rightarrow[0,1]$ be a weight function. We call $w$ valid if $\sum_{x\in\{0,1\}^{n}}w(x)\leq 1$ . The following lemma says that for any probability distribution $p_{x}$ on $\{0,1\}^{n}$ , a valid weight function is unlikely to be substantially larger than $p_{x}$ .

Lemma 3.1.

Let $\mathcal{D}$ be a probability distribution on $\{0,1\}^{n}$ that assigns to each $x\in\{0,1\}^{n}$ a probability $p_{x}$ , and let $w$ be a valid weight function. For any $s\geq 1$ , we have

\Pr_{x\sim\mathcal{D}}\left[w(x)\geq sp_{x}\right]\leq 1/s.

Proof.

Let $Z_{s}=\{x\in\{0,1\}^{n}\mid w(x)\geq sp_{x}\}$ . We have

\Pr_{x\sim\mathcal{D}}\left[w(x)\geq sp_{x}\right]=\sum_{\begin{subarray}{c}x\in Z_{s}\\ p_{x}>0\end{subarray}}p_{x}\leq\sum_{\begin{subarray}{c}x\in Z_{s}\\ p_{x}>0\end{subarray}}p_{x}\frac{w(x)}{sp_{x}}\leq(1/s)\sum_{x\in Z_{s}}w(x)\leq 1/s,

since $w(x)/sp_{x}\geq 1$ for $x\in Z_{s}$ , $p_{x}>0$ , and since $w$ is valid. ∎

We now show that Lemma 3.1 implies Theorem 2.1. For this, we interpret the sequence $X_{1},\dots,X_{n}$ as a bit string of length $n$ . This induces a probability distribution $\mathcal{D}$ that assigns to each $x\in\{0,1\}^{n}$ the probability $p_{x}=p^{k_{x}}(1-p)^{n-k_{x}}$ , where $k_{x}$ denotes the number of $1$ -bits in $x$ . We define a weight function $w:\{0,1\}^{n}\rightarrow[0,1]$ by $w(x)=(p+t)^{k_{x}}(1-p-t)^{n-k_{x}}$ , for $x\in\{0,1\}^{n}$ . Then $w$ is valid, since $w(x)$ is the probability that $x$ is generated by setting each bit to $1$ independently with probability $p+t$ . For $x\in\{0,1\}^{n}$ , we have

\frac{w(x)}{p_{x}}=\left(\frac{p+t}{p}\right)^{k_{x}}\left(\frac{1-p-t}{1-p}\right)^{n-k_{x}}.

Since $((p+t)/p)((1-p)/(1-p-t))\geq 1$ , it follows that $w(x)/p_{x}$ is an increasing function of $k_{x}$ . Hence, if $k_{x}\geq(p+t)n$ , we have

\frac{w(x)}{p_{x}}\geq\left[\left(\frac{p+t}{p}\right)^{p+t}\left(\frac{1-p-t}{1-p}\right)^{1-p-t}\right]^{n}=e^{D_{\textup{KL}}(p+t\|p)n}.

We now apply Lemma 3.1 to $\mathcal{D}$ and $w$ to get

\Pr[X\geq(p+t)n]=\Pr_{x\sim\mathcal{D}}[k_{x}\geq(p+t)n]\leq\Pr_{x\sim\mathcal{D}}\left[w(x)\geq p_{x}e^{D_{\textup{KL}}(p+t\|p)n}\right]\leq e^{-D_{\textup{KL}}(p+t\|p)n},

as claimed in Theorem 2.1.
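For short bit strings, Lemma 3.1 and the resulting tail bound can be checked by exhaustive enumeration. The following sketch assumes $n=10$, $p=0.3$, $t=0.2$ (so that $(p+t)n=5$); all names are hypothetical.

```python
import math
from itertools import product

def bernoulli_string_prob(x, q):
    """Probability of bit string x when each bit is 1 independently with probability q."""
    k = sum(x)
    return q**k * (1.0 - q)**(len(x) - k)

n, p, t = 10, 0.3, 0.2  # illustrative parameters
strings = list(product((0, 1), repeat=n))
p_x = {x: bernoulli_string_prob(x, p) for x in strings}    # distribution D
w = {x: bernoulli_string_prob(x, p + t) for x in strings}  # weight function

def lemma_lhs(s):
    """Pr_{x ~ D}[w(x) >= s * p_x], computed by enumeration."""
    return sum(p_x[x] for x in strings if w[x] >= s * p_x[x])

k_min = 5  # equals (p + t) * n for the parameters above
tail = sum(p_x[x] for x in strings if sum(x) >= k_min)
divergence = (p + t) * math.log((p + t) / p) + (1 - p - t) * math.log((1 - p - t) / (1 - p))
bound = math.exp(-divergence * n)
print(tail, bound, [lemma_lhs(s) for s in (1, 2, 5, 20)])
```

The enumeration touches all $2^{n}$ strings, so this is a check for small $n$ only.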

See the survey [20] for a more thorough discussion of how this proof is related to coding theory.

3.5 A Proof via Differential Privacy

The fifth proof of Chernoff’s bound is due to Steinke and Ullman [22], and it uses methods from the theory of differential privacy [11]. Unlike the previous four proofs, it seems to lead to a slightly weaker version of the bound. Let $m$ be a parameter to be determined later. The main idea is to bound the expectation of the maximum of $m-1$ independent copies of $X$ .

Lemma 3.2.

Let $m\in\mathbb{N}$ and $m\leq e^{n}$ . Let $X^{(1)},\dots,X^{(m-1)}$ be $m-1$ independent copies of $X$ , and set $X^{(m)}=\mathbf{E}[X]$ . Then,

\mathbf{E}\big{[}\max\{X^{(1)},\dots,X^{(m)}\}\big{]}\leq pn+5\sqrt{n\ln m}.

We will give a proof of Lemma 3.2 below. First, however, we will see how we can use Lemma 3.2 to derive the following weaker version of Theorem 2.1.²

² In the published version of this paper, the proof of Theorem 3.3 is based on an incorrect application of Markov’s inequality. We have changed Lemma 3.2 so that $X^{(m)}$ is fixed to $\mathbf{E}[X]$ . This ensures that Markov’s inequality is applied to a nonnegative random variable. We thank Natalia Shenkman for pointing this out to us.

Theorem 3.3.

\Pr[X\geq(p+t)n]\leq e^{1-\frac{1}{64}t^{2}n}.

Proof.

We may assume that $t\geq 8/\sqrt{n}$ , since otherwise the right-hand side is at least $1$ and the theorem holds trivially. Set $\alpha=\Pr[X\geq(p+t)n]$ . Let $X^{(1)},\dots,X^{(m-1)}$ be $m-1$ independent copies of $X$ and let $X^{(m)}=\mathbf{E}[X]$ . Then,

\Pr\big{[}\max\{X^{(1)},\dots,X^{(m)}\}\geq(p+t)n\big{]}=1-(1-\alpha)^{m-1}\geq 1-e^{-\alpha(m-1)}.

(4)

On the other hand, Markov’s inequality gives

\Pr\big{[}\max\{X^{(1)},\dots,X^{(m)}\}\geq(p+t)n\big{]}=\Pr\big{[}\max\{X^{(1)},\dots,X^{(m)}\}-pn\geq tn\big{]}\\ \leq\frac{\mathbf{E}\big{[}\max\{X^{(1)},\dots,X^{(m)}\}-pn\big{]}}{tn}\leq\frac{5\sqrt{\ln m}}{t\sqrt{n}},

by Lemma 3.2. Thus, setting $m=\exp\Big{(}\big{(}\frac{e-1}{5e}\big{)}^{2}t^{2}n\Big{)}$ , and combining with (4), we get

\frac{e-1}{e}\geq 1-e^{-\alpha(m-1)}\Leftrightarrow\alpha\leq\frac{1}{\exp\Big{(}\big{(}\frac{e-1}{5e}\big{)}^{2}t^{2}n\Big{)}-1}\leq\frac{1}{\exp\big{(}\frac{t^{2}n}{64}\big{)}-1},

since $\big{(}\frac{e-1}{5e}\big{)}^{2}\geq\frac{1}{64}$ . Now the theorem follows from

\frac{\exp\big{(}\frac{t^{2}n}{64}\big{)}}{\exp\big{(}\frac{t^{2}n}{64}\big{)}-1}\leq\frac{e}{e-1}\leq e,

which holds as $t\geq 8/\sqrt{n}$ , as $x\mapsto x/(x-1)$ is decreasing for $x>1$ , and as $e\geq 2$ . ∎
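To see how much weaker Theorem 3.3 is, the following sketch checks the constant used in the proof and compares the two bounds on a small grid. The grid values and function names are assumptions chosen for illustration.

```python
import math

def kl_binary(a, b):
    """Binary Kullback-Leibler divergence for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

def bound_theorem_2_1(n, p, t):
    """exp(-D_KL(p + t || p) n)."""
    return math.exp(-kl_binary(p + t, p) * n)

def bound_theorem_3_3(n, t):
    """exp(1 - t^2 n / 64)."""
    return math.exp(1.0 - t * t * n / 64.0)

# Constant from the choice of m in the proof of Theorem 3.3.
constant = ((math.e - 1.0) / (5.0 * math.e)) ** 2

# Illustrative grid; every pair satisfies p + t < 1.
grid = [(n, p, t) for n in (10, 100, 1000) for p in (0.1, 0.5) for t in (0.05, 0.2, 0.4)]
for n, p, t in grid:
    print(n, p, t, bound_theorem_2_1(n, p, t), bound_theorem_3_3(n, t))
```

On this grid the bound of Theorem 2.1 is never larger than the bound of Theorem 3.3, in line with the remark that the fifth proof yields a weaker statement.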

It remains to prove Lemma 3.2. For this, we use an idea from differential privacy. Let $A\in[0,1]^{m\times n}$ , $A=(a_{ij})$ , be an $(m\times n)$ -matrix with entries from $[0,1]$ . For a given parameter $\gamma>1$ , we define a random variable $S_{\gamma}(A)$ with values in $\{1,\dots,m\}$ as follows: for $i=1,\dots,m$ , let $b_{i}=\sum_{j=1,\dots,n}a_{ij}$ be the sum of the entries in the $i$ -th row of $A$ . Set

C_{\gamma}(A)=\sum_{i=1}^{m}\gamma^{b_{i}}.

Then, for $i=1,\dots,m$ , we define

\Pr[S_{\gamma}(A)=i]=\frac{\gamma^{b_{i}}}{C_{\gamma}(A)}.

The random variable $S_{\gamma}(A)$ is called a stable selector for $A$ (see the work by McSherry and Talwar [18] for more background). The next lemma states two interesting properties for $S_{\gamma}(A)$ . For a matrix $A\in[0,1]^{m\times n}$ , a vector $\vec{c}\in[0,1]^{m}$ , and a number $j\in\{1,\dots,n\}$ we denote by $(A_{-j},\vec{c})$ the matrix obtained from $A$ by replacing the $j$ -th column of $A$ with $\vec{c}$ .

Lemma 3.4.

Let $A\in[0,1]^{m\times n}$ be an $m\times n$ matrix with entries in $[0,1]$ . We have

•

Stability: For every vector $\vec{c}\in[0,1]^{m}$ , every $j\in\{1,\dots,n\}$ , and every $i\in\{1,\dots,m\}$ ,

\gamma^{-2}\Pr[S_{\gamma}(A_{-j},\vec{c})=i]\leq\Pr[S_{\gamma}(A)=i]\leq\gamma^{2}\Pr[S_{\gamma}(A_{-j},\vec{c})=i].

•

Accuracy: Let $b_{i}$ be the sum of the $i$ -th row of $A$ . Then,

\mathbf{E}_{i\sim S_{\gamma}(A)}[b_{i}]\leq\max_{i=1}^{m}b_{i}\leq\mathbf{E}_{i\sim S_{\gamma}(A)}[b_{i}]+\log_{\gamma}m.

Proof.

Stability: for $k\in\{1,\dots,m\}$ , let $b_{k}$ be the sum of the $k$ -th row of $A$ , and let $\widetilde{b}_{k}$ be the sum of the $k$ -th row of $(A_{-j},\vec{c})$ . Since $A$ and $(A_{-j},\vec{c})$ differ in one column, and since the entries are from $[0,1]$ , we have $\widetilde{b}_{k}-1\leq b_{k}\leq\widetilde{b}_{k}+1$ . Hence,

\gamma^{-1}C_{\gamma}(A_{-j},\vec{c})\leq C_{\gamma}(A)\leq\gamma C_{\gamma}(A_{-j},\vec{c})

and

\gamma^{-2}\Pr[S_{\gamma}(A_{-j},\vec{c})=i]\leq\Pr[S_{\gamma}(A)=i]\leq\gamma^{2}\Pr[S_{\gamma}(A_{-j},\vec{c})=i],

as claimed.

Accuracy: The inequality $\mathbf{E}_{i\sim S_{\gamma}(A)}[b_{i}]\leq\max_{i=1}^{m}b_{i}$ is obvious. For the second inequality, we observe that by definition,

b_{i}=\log_{\gamma}(C_{\gamma}(A)\Pr[S_{\gamma}(A)=i]).

Thus,

\begin{aligned}\mathbf{E}_{i\sim S_{\gamma}(A)}[b_{i}]&=\sum_{i=1}^{m}\Pr[S_{\gamma}(A)=i]\log_{\gamma}(C_{\gamma}(A)\Pr[S_{\gamma}(A)=i])\\ &=\sum_{i=1}^{m}\Pr[S_{\gamma}(A)=i]\log_{\gamma}C_{\gamma}(A)-\sum_{i=1}^{m}\Pr[S_{\gamma}(A)=i]\log_{\gamma}\frac{1}{\Pr[S_{\gamma}(A)=i]}\\ &\geq\sum_{i=1}^{m}\Pr[S_{\gamma}(A)=i]\log_{\gamma}\gamma^{\max_{i=1}^{m}b_{i}}-\log_{\gamma}m\\ &=\max_{i=1}^{m}b_{i}-\log_{\gamma}m,\end{aligned}

since $C_{\gamma}(A)=\sum_{i=1}^{m}\gamma^{b_{i}}\geq\gamma^{\max_{i=1}^{m}b_{i}}$ and since $x\mapsto-\log_{\gamma}(x)$ is a convex function. ∎
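The stable selector is simple to compute explicitly. The sketch below builds $S_{\gamma}(A)$ for a random matrix and checks both claims of Lemma 3.4 for one replaced column; the matrix size, the seed, the value $\gamma=1.5$, and all function names are assumptions made for illustration.

```python
import math
import random

def row_sums(A):
    """Row sums b_1, ..., b_m of the matrix A (given as a list of rows)."""
    return [sum(row) for row in A]

def selector_probs(A, gamma):
    """Distribution of S_gamma(A): Pr[S = i] = gamma^{b_i} / C_gamma(A)."""
    weights = [gamma**b for b in row_sums(A)]
    normalizer = sum(weights)  # this is C_gamma(A)
    return [wt / normalizer for wt in weights]

def replace_column(A, j, c):
    """The matrix (A_{-j}, c): column j of A replaced by the vector c."""
    return [row[:j] + [c[i]] + row[j + 1:] for i, row in enumerate(A)]

rng = random.Random(0)      # fixed seed, only for reproducibility
m, n, gamma = 6, 8, 1.5     # illustrative sizes and parameter
A = [[rng.random() for _ in range(n)] for _ in range(m)]
c = [rng.random() for _ in range(m)]

probs = selector_probs(A, gamma)
probs_swapped = selector_probs(replace_column(A, 3, c), gamma)
b = row_sums(A)
expected_b = sum(pr * bi for pr, bi in zip(probs, b))
slack = math.log(m, gamma)  # log_gamma m from the accuracy claim
print(expected_b, max(b), expected_b + slack)
```

Larger $\gamma$ concentrates the selector on the largest row sum (better accuracy) at the price of a weaker stability factor $\gamma^{2}$.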

Lemma 3.4 shows that $S_{\gamma}(A)$ constitutes a reasonable mechanism of estimating the maximum row sum of $A$ without revealing too much information about any single column of $A$ . We can now use Lemma 3.4 to bound the expectation of the maximum of $m-1$ independent copies of $X$ and $\mathbf{E}[X]$ .

Lemma 3.5.

Let $m\in\mathbb{N}$ . Let $X^{(1)},\dots,X^{(m-1)}$ be $m-1$ independent copies of $X$ , and set $X^{(m)}=\mathbf{E}[X]$ . Then, for any $\gamma>1$ , we have

\mathbf{E}\big{[}\max\{X^{(1)},\dots,X^{(m)}\}\big{]}\leq\gamma^{2}pn+\log_{\gamma}m.

Proof.

Let $X_{1}^{(1)},\dots,X_{1}^{(m-1)}$ be $m-1$ independent copies of $X_{1}$ , and let $X_{1}^{(m)}=\mathbf{E}[X_{1}]$ ; let $X_{2}^{(1)},\dots,X_{2}^{(m-1)}$ be $m-1$ independent copies of $X_{2}$ and let $X_{2}^{(m)}=\mathbf{E}[X_{2}]$ ; and so on. We consider the random $m\times n$ matrix $M\in[0,1]^{m\times n}$ whose entry in row $i$ and column $j$ is $X_{j}^{(i)}$ (the last row is constant with entries $p$ ). Then, we can write $X^{(i)}=\sum_{j=1}^{n}X_{j}^{(i)}$ , for $i=1,\dots,m$ . By the accuracy claim in Lemma 3.4,

\mathbf{E}_{M}\big{[}\max\{X^{(1)},\dots,X^{(m)}\}\big{]}\leq\mathbf{E}_{M,i\sim S_{\gamma}(M)}\big{[}X^{(i)}\big{]}+\log_{\gamma}m

(5)

Now we bound $\mathbf{E}_{M,i\sim S_{\gamma}(M)}\big{[}X^{(i)}\big{]}$ . We unwrap the expectation for $i\sim S_{\gamma}(M)$ and get

\mathbf{E}_{M,i\sim S_{\gamma}(M)}[X^{(i)}]=\mathbf{E}_{M}\Big{[}\sum_{i=1}^{m}\Pr[S_{\gamma}(M)=i]X^{(i)}\Big{]}

Let $\widetilde{M}$ be an independent copy of $M$ . Denote the entry in the $i$ -th row and $j$ -th column of $\widetilde{M}$ by $\widetilde{X}_{j}^{(i)}$ , and set $\widetilde{X}^{(i)}=\sum_{j=1}^{n}\widetilde{X}_{j}^{(i)}$ , for $i=1,\dots,m$ . By the stability claim in Lemma 3.4, for every $j\in\{1,\dots,n\}$ ,

\mathbf{E}_{M}\Big{[}\sum_{i=1}^{m}\Pr\big{[}S_{\gamma}(M)=i\big{]}X^{(i)}\Big{]}\leq\gamma^{2}\mathbf{E}_{M,\widetilde{M}}\Big{[}\sum_{i=1}^{m}\Pr\big{[}S_{\gamma}(M_{-j},\widetilde{M}_{j})=i\big{]}X^{(i)}\Big{]}.

Since the random variables $X_{j}^{(i)}$ , $\widetilde{X}_{j}^{(i)}$ , $1\leq i\leq m$ , $1\leq j\leq n$ , are independent, the pairs $\big{(}(M_{-j},\widetilde{M}_{j}),X_{j}^{(i)}\big{)}$ and $\big{(}M,\widetilde{X}_{j}^{(i)}\big{)}$ have the same distribution. Therefore, we can write

\begin{aligned}\mathbf{E}_{M}\Big{[}\sum_{i=1}^{m}\Pr\big{[}S_{\gamma}(M)=i\big{]}X^{(i)}\Big{]}&=\mathbf{E}_{M}\Big{[}\sum_{i=1}^{m}\sum_{j=1}^{n}\Pr\big{[}S_{\gamma}(M)=i\big{]}X_{j}^{(i)}\Big{]}\\ &\leq\gamma^{2}\mathbf{E}_{M,\widetilde{M}}\Big{[}\sum_{j=1}^{n}\sum_{i=1}^{m}\Pr\big{[}S_{\gamma}(M_{-j},\widetilde{M}_{j})=i\big{]}X_{j}^{(i)}\Big{]}\\ &=\gamma^{2}\mathbf{E}_{M,\widetilde{M}}\Big{[}\sum_{j=1}^{n}\sum_{i=1}^{m}\Pr\big{[}S_{\gamma}(M)=i\big{]}\widetilde{X}_{j}^{(i)}\Big{]}\\ &=\gamma^{2}\mathbf{E}_{M}\Big{[}\sum_{i=1}^{m}\Pr\big{[}S_{\gamma}(M)=i\big{]}\mathbf{E}_{\widetilde{M}}\big{[}\widetilde{X}^{(i)}\big{]}\Big{]}\\ &=\gamma^{2}\mathbf{E}_{M}\Big{[}\sum_{i=1}^{m}\Pr\big{[}S_{\gamma}(M)=i\big{]}pn\Big{]}=\gamma^{2}pn.\end{aligned}

We can conclude the lemma by plugging this bound into (5). ∎

To obtain Lemma 3.2, we set $\gamma=1+\frac{\sqrt{\ln m}}{\sqrt{n}}$ . Now, Lemma 3.5 gives

\begin{aligned}\mathbf{E}\big{[}\max\{X^{(1)},\dots,X^{(m)}\}\big{]}&\leq\left(1+\frac{\sqrt{\ln m}}{\sqrt{n}}\right)^{2}pn+\frac{\ln m}{\ln\left(1+\frac{\sqrt{\ln m}}{\sqrt{n}}\right)}\\ &\leq\left(1+\frac{3\sqrt{\ln m}}{\sqrt{n}}\right)pn+\frac{\ln m}{\frac{\sqrt{\ln m}}{2\sqrt{n}}},\end{aligned}

since $\frac{\sqrt{\ln m}}{\sqrt{n}}\leq 1$ by our assumption $m\leq e^{n}$ and $\ln(1+x)\geq x/2$ , for $x\in[0,1]$ . Hence, using $pn\leq n$ ,

\mathbf{E}\big{[}\max\{X^{(1)},\dots,X^{(m)}\}\big{]}\leq pn+5\sqrt{n\ln m},

as desired.
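The passage from Lemma 3.5 to Lemma 3.2 rests on two elementary inequalities on $[0,1]$, namely $(1+x)^{2}\leq 1+3x$ and $\ln(1+x)\geq x/2$. The sketch below checks them on a grid and compares the two bounds for a few parameter choices with $2\leq m\leq e^{n}$; the grids and names are illustrative assumptions.

```python
import math

def lemma_3_5_bound(n, p, m, gamma):
    """gamma^2 p n + log_gamma m."""
    return gamma**2 * p * n + math.log(m) / math.log(gamma)

def lemma_3_2_bound(n, p, m):
    """p n + 5 sqrt(n ln m)."""
    return p * n + 5.0 * math.sqrt(n * math.log(m))

def gamma_choice(n, m):
    """The choice gamma = 1 + sqrt(ln m) / sqrt(n) from the text (needs m >= 2)."""
    return 1.0 + math.sqrt(math.log(m)) / math.sqrt(n)

xs = [j / 1000.0 for j in range(0, 1001)]  # grid on [0, 1]
square_ok = all((1.0 + x) ** 2 <= 1.0 + 3.0 * x + 1e-12 for x in xs)
log_ok = all(math.log(1.0 + x) >= x / 2.0 - 1e-12 for x in xs)

# Illustrative parameter grid; every (n, m) pair satisfies 2 <= m <= e^n.
cases = [(n, p, m) for n in (10, 100, 1000) for p in (0.1, 0.5, 0.9, 1.0) for m in (2, 10, 1000)]
gaps = [lemma_3_2_bound(n, p, m) - lemma_3_5_bound(n, p, m, gamma_choice(n, m)) for n, p, m in cases]
print(square_ok, log_ok, min(gaps))
```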

4 Useful Consequences

We now show several useful consequences of Theorem 2.1. These results can be derived directly from Theorem 2.1, and therefore they also hold for variants of the theorem with slightly different assumptions.

4.1 The Lower Tail

First, we show that an analogous bound holds for the lower tail probability $\Pr[X\leq(p-t)n]$ .

Corollary 4.1.

Let $X_{1},\dots,X_{n}$ be independent random variables with $X_{i}\in\{0,1\}$ and $\Pr[X_{i}=1]=p$ , for $i=1,\dots,n$ . Set $X:=\sum_{i=1}^{n}X_{i}$ . Then, for any $t\in[0,p]$ , we have

\Pr[X\leq(p-t)n]\leq e^{-D_{\textup{KL}}(p-t\|p)n}.

Proof.

\displaystyle\Pr[X\leq(p-t)n]=\Pr[n-X\geq n-(p-t)n]=\Pr[X^{\prime}\geq(1-p+t)n],

where $X^{\prime}=\sum_{i=1}^{n}X_{i}^{\prime}$ with independent random variables $X_{i}^{\prime}\in\{0,1\}$ such that $\Pr[X_{i}^{\prime}=1]=1-p$ . The result follows from $D_{\textup{KL}}(1-p+t\|1-p)=D_{\textup{KL}}(p-t\|p)$ . ∎
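The symmetry of the binary divergence used in this proof, and the resulting lower tail bound, can be checked directly. The parameters $n=100$, $p=0.3$, $k=20$ and the function names in the sketch below are assumptions for illustration.

```python
import math

def kl_binary(a, b):
    """Binary Kullback-Leibler divergence for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

def binomial_lower_tail(n, p, k):
    """Exact Pr[X <= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(0, k + 1))

n, p, k = 100, 0.3, 20  # illustrative values; threshold (p - t) n = k
t = p - k / n
sym_left = kl_binary(1.0 - p + t, 1.0 - p)  # divergence for the complemented variables
sym_right = kl_binary(p - t, p)             # divergence in Corollary 4.1
exact = binomial_lower_tail(n, p, k)
bound = math.exp(-sym_right * n)
print(sym_left, sym_right, exact, bound)
```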

4.2 Multiplicative Version

Next, we derive a multiplicative variant of Theorem 2.1. This well-known version of the bound can be found in the classic text by Motwani and Raghavan [21].

Corollary 4.2.

Let $X_{1},\dots,X_{n}$ be independent random variables with $X_{i}\in\{0,1\}$ and $\Pr[X_{i}=1]=p$ , for $i=1,\dots,n$ . Set $X:=\sum_{i=1}^{n}X_{i}$ and $\mu=pn$ . Then, for any $\delta\geq 0$ , we have

\begin{aligned}\Pr[X\geq(1+\delta)\mu]&\leq\left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu},\text{ and}\\ \Pr[X\leq(1-\delta)\mu]&\leq\left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.\end{aligned}

Proof.

Setting $t=\delta\mu/n$ in Theorem 2.1 yields

\begin{aligned}\Pr[X\geq(1+\delta)\mu]&\leq\exp\left(-n\left[p(1+\delta)\ln(1+\delta)+p\left(\frac{1-p}{p}-\delta\right)\ln\left(1-\delta\frac{p}{1-p}\right)\right]\right)\\ &=\left(\frac{(1-\delta p/(1-p))^{\delta-(1-p)/p}}{(1+\delta)^{1+\delta}}\right)^{\mu}\\ &\leq\left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu},\end{aligned}

where the last step uses $-(1-u)\ln(1-u)\leq u$ , for $u\in[0,1)$ , applied with $u=\delta p/(1-p)$ and multiplied by $(1-p)/p$ .

Setting $t=\delta\mu/n$ in Corollary 4.1 yields

\begin{aligned}\Pr[X\leq(1-\delta)\mu]&\leq\exp\left(-n\left[p(1-\delta)\ln(1-\delta)+p\left(\frac{1-p}{p}+\delta\right)\ln\left(1+\delta\frac{p}{1-p}\right)\right]\right)\\ &=\left(\frac{(1+\delta p/(1-p))^{-\delta-(1-p)/p}}{(1-\delta)^{1-\delta}}\right)^{\mu}\\ &\leq\left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu},\end{aligned}

where the last step uses $(1+u)\ln(1+u)\geq u$ , for $u\geq 0$ , applied with $u=\delta p/(1-p)$ and multiplied by $(1-p)/p$ .

∎
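The multiplicative bounds are relaxations of the divergence bounds, and the loss can be inspected numerically. The sketch below compares both forms for assumed parameters $n=200$, $p=0.1$ and a few values of $\delta$; the function names are hypothetical.

```python
import math

def kl_binary(a, b):
    """Binary Kullback-Leibler divergence for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

def upper_multiplicative(mu, delta):
    """(e^delta / (1 + delta)^(1 + delta))^mu."""
    return (math.exp(delta) / (1.0 + delta) ** (1.0 + delta)) ** mu

def lower_multiplicative(mu, delta):
    """(e^(-delta) / (1 - delta)^(1 - delta))^mu, for 0 <= delta < 1."""
    return (math.exp(-delta) / (1.0 - delta) ** (1.0 - delta)) ** mu

n, p = 200, 0.1  # illustrative values
mu = p * n
upper_pairs = [(math.exp(-kl_binary(p * (1.0 + d), p) * n), upper_multiplicative(mu, d))
               for d in (0.1, 0.5, 1.0, 2.0)]
lower_pairs = [(math.exp(-kl_binary(p * (1.0 - d), p) * n), lower_multiplicative(mu, d))
               for d in (0.1, 0.5, 0.9)]
for kl_form, mult_form in upper_pairs + lower_pairs:
    print(f"divergence form = {kl_form:.3e}, multiplicative form = {mult_form:.3e}")
```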

4.3 Useful Variants

The next few corollaries give some handy variants of the bound that are often more manageable in practice. First, we give a simple bound for the multiplicative lower tail.

Corollary 4.3.

\Pr[X\leq(1-\delta)\mu]\leq e^{-\delta^{2}\mu/2}.

Proof.

By Corollary 4.2

\Pr[X\leq(1-\delta)\mu]\leq\left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.

Using the power series expansion of $\ln(1-\delta)$ , we get

(1-\delta)\ln(1-\delta)=-(1-\delta)\sum_{i=1}^{\infty}\frac{\delta^{i}}{i}=-\delta+\sum_{i=2}^{\infty}\frac{\delta^{i}}{(i-1)i}\geq-\delta+\delta^{2}/2.

Thus,

\Pr[X\leq(1-\delta)\mu]\leq e^{[-\delta+\delta-\delta^{2}/2]\mu}=e^{-\delta^{2}\mu/2},

as claimed. ∎

An only slightly more complicated bound can be found for the multiplicative upper tail.

Corollary 4.4.

\Pr[X\geq(1+\delta)\mu]\leq e^{-\min\{\delta^{2},\delta\}\mu/4}.

Proof.

We may assume that $(1+\delta)p\leq 1$ . Then, Theorem 2.1 gives

\Pr[X\geq(1+\delta)pn]\leq e^{-D_{\textup{KL}}((1+\delta)p\|p)n}.

Define $f(\delta):=D_{\textup{KL}}((1+\delta)p\|p)$ . Then,

f^{\prime}(\delta)=p\ln(1+\delta)-p\ln(1-\delta p/(1-p))

and

f^{\prime\prime}(\delta)=\frac{p}{(1+\delta)(1-p-\delta p)}\geq\frac{p}{1+\delta}.

By Taylor’s theorem, we have

f(\delta)=f(0)+\delta f^{\prime}(0)+\frac{\delta^{2}}{2}f^{\prime\prime}(\xi),

for some $\xi\in[0,\delta]$ . Since $f(0)=f^{\prime}(0)=0$ , it follows that

f(\delta)=\frac{\delta^{2}}{2}f^{\prime\prime}(\xi)\geq\frac{\delta^{2}p}{2(1+\xi)}\geq\frac{\delta^{2}p}{2(1+\delta)}.

For $\delta\geq 1$ , we have $\delta/(1+\delta)\geq 1/2$ ; for $\delta<1$ , we have $1/(\delta+1)\geq 1/2$ . This gives, for all $\delta\geq 0$ ,

f(\delta)\geq\min\{\delta^{2},\delta\}p/4,

and the claim follows. ∎
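The chain of inequalities for $f(\delta)$ can be checked on a grid. The following sketch evaluates $f$, the Taylor lower bound $\delta^{2}p/(2(1+\delta))$, and the final bound $\min\{\delta^{2},\delta\}p/4$; the grid of values for $p$ and $\delta$ is an assumption, restricted to $(1+\delta)p<1$.

```python
import math

def kl_binary(a, b):
    """Binary Kullback-Leibler divergence for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

def f(delta, p):
    """f(delta) = D_KL((1 + delta) p || p) from the proof of Corollary 4.4."""
    return kl_binary((1.0 + delta) * p, p)

def taylor_lower_bound(delta, p):
    """delta^2 p / (2 (1 + delta))."""
    return delta**2 * p / (2.0 * (1.0 + delta))

def final_lower_bound(delta, p):
    """min{delta^2, delta} p / 4."""
    return min(delta**2, delta) * p / 4.0

# Illustrative grid, keeping only pairs with (1 + delta) p < 1.
grid = [(d, p) for p in (0.05, 0.2, 0.4) for d in (0.1, 0.5, 1.0, 1.4, 3.0, 10.0)
        if (1.0 + d) * p < 1.0]
rows = [(d, p, f(d, p), taylor_lower_bound(d, p), final_lower_bound(d, p)) for d, p in grid]
for row in rows:
    print(row)
```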

The following corollary combines the two bounds. This variant can be found, e.g., in the book by Arora and Barak [2].

Corollary 4.5.

\Pr[|X-\mu|\geq\delta\mu]\leq 2e^{-\min\{\delta^{2},\delta\}\mu/4}.

Proof.

Combine Corollaries 4.3 and 4.4. ∎

The following corollary, which appears, e.g., in the book by Motwani and Raghavan [21], is also sometimes useful.

Corollary 4.6.

Let $X_{1},\dots,X_{n}$ be independent random variables with $X_{i}\in\{0,1\}$ and $\Pr[X_{i}=1]=p$ , for $i=1,\dots,n$ . Set $X:=\sum_{i=1}^{n}X_{i}$ and $\mu=pn$ . For $t\geq 2e\mu$ , we have

\Pr[X\geq t]\leq 2^{-t}.

Proof.

By Corollary 4.2

\Pr[X\geq(1+\delta)\mu]\leq\left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}\leq\left(\frac{e}{1+\delta}\right)^{(1+\delta)\mu}.

For $\delta\geq 2e-1$ , the denominator in the right hand side is at least $2e$ , and the claim follows. ∎
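Corollary 4.6 is convenient for thresholds far above the mean. The sketch below checks it against the exact binomial tail for every integer threshold $t\geq 2e\mu$; the parameters $n=50$, $p=0.05$ and the names are illustrative assumptions.

```python
import math

def binomial_upper_tail(n, p, k):
    """Exact Pr[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k, n + 1))

n, p = 50, 0.05  # illustrative values
mu = n * p
t_min = math.ceil(2.0 * math.e * mu)  # smallest integer t with t >= 2 e mu
# Each entry: (threshold t, exact tail Pr[X >= t], bound 2^(-t)).
checks = [(t, binomial_upper_tail(n, p, t), 2.0 ** (-t)) for t in range(t_min, n + 1)]
for t, exact, bound in checks[:5]:
    print(f"t = {t}: exact = {exact:.3e}, bound = {bound:.3e}")
```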

5 Generalizations

We mention a few generalizations of the proof techniques from Section 3. Since the consequences from Section 4 are based on simple algebraic manipulation of the bounds, the same consequences also hold for the generalized settings.

5.1 Hoeffding Extension

The moment method (Section 3.1) yields many generalizations of Theorem 2.1. The following result is known as Hoeffding’s extension [14]. It shows that the $X_{i}$ can actually be chosen to be continuous with varying expectations.

Theorem 5.1.

Let $X_{1},\dots,X_{n}$ be independent random variables with $X_{i}\in[0,1]$ and $\mathbf{E}[X_{i}]=p_{i}$ . Set $X:=\sum_{i=1}^{n}X_{i}$ and $p:=(1/n)\sum_{i=1}^{n}p_{i}$ . Then, for any $t\in[0,1-p]$ , we have

\Pr[X\geq(p+t)n]\leq e^{-D_{\textup{KL}}(p+t\|p)n}.

Proof.

Let $\lambda>0$ be a parameter to be determined later. As before, Markov’s inequality yields

\Pr\bigl{[}e^{\lambda X}\geq e^{\lambda(p+t)n}\bigr{]}\leq\frac{\mathbf{E}[e^{\lambda X}]}{e^{\lambda(p+t)n}}.

Using independence, we get

\mathbf{E}[e^{\lambda X}]=\mathbf{E}\Bigl{[}e^{\lambda\sum_{i=1}^{n}X_{i}}\Bigr{]}=\prod_{i=1}^{n}\mathbf{E}\Bigl{[}e^{\lambda X_{i}}\Bigr{]}.

(6)

Now we need to estimate $\mathbf{E}\bigl{[}e^{\lambda X_{i}}\bigr{]}$ . The function $z\mapsto e^{\lambda z}$ is convex, so $e^{\lambda z}\leq(1-z)e^{0\cdot\lambda}+ze^{1\cdot\lambda}$ for $z\in[0,1]$ . Hence,

\mathbf{E}\bigl{[}e^{\lambda X_{i}}\bigr{]}\leq\mathbf{E}[1-X_{i}+X_{i}e^{\lambda}]=1-p_{i}+p_{i}e^{\lambda}.

Going back to (6),

\mathbf{E}[e^{\lambda X}]\leq\prod_{i=1}^{n}(1-p_{i}+p_{i}e^{\lambda}).

Using the arithmetic-geometric mean inequality $\prod_{i=1}^{n}x_{i}\leq\bigl{(}(1/n)\sum_{i=1}^{n}x_{i}\bigr{)}^{n}$ , for $x_{i}\geq 0$ , this is

\mathbf{E}[e^{\lambda X}]\leq(1-p+pe^{\lambda})^{n}.

From here we continue as in Section 3.1. ∎
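Theorem 5.1 can be illustrated with independent Bernoulli variables that have different success probabilities, for which the exact distribution of $X$ is computable by dynamic programming. The sketch below assumes a specific list of probabilities $p_{i}$ with average $0.3$; the function names, the value of $\lambda$, and the threshold are hypothetical choices.

```python
import math

def poisson_binomial_pmf(ps):
    """Exact distribution of a sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]  # distribution of the empty sum
    for q in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - q)  # current variable is 0
            nxt[k + 1] += mass * q      # current variable is 1
        pmf = nxt
    return pmf

def kl_binary(a, b):
    """Binary Kullback-Leibler divergence for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

ps = [0.1, 0.2, 0.3, 0.4, 0.5] * 8  # illustrative expectations p_i
n = len(ps)
p = sum(ps) / n                     # average expectation
k = 20                              # threshold (p + t) n
t = k / n - p
tail = sum(poisson_binomial_pmf(ps)[k:])
bound = math.exp(-kl_binary(p + t, p) * n)

# Arithmetic-geometric mean step for one (arbitrary) value of lambda.
lam = 0.7
product_side = math.prod(1.0 - q + q * math.exp(lam) for q in ps)
mean_side = (1.0 - p + p * math.exp(lam)) ** n
print(tail, bound, product_side, mean_side)
```

Bernoulli variables are only a special case of variables with values in $[0,1]$; the convexity step in the proof is what extends the argument to the general case.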

5.2 Hypergeometric Distribution

Chvátal’s proof [7] from Section 3.2 generalizes to the hypergeometric distribution. We emphasize once again that this means that all the corollaries from Section 4 also apply to this case.

Theorem 5.2.

Suppose we have an urn with $N$ balls, $P$ of which are red. We randomly draw $n$ balls from the urn without replacement. Let $H(N,P,n)$ denote the number of red balls in the sample. Set $p:=P/N$ . Then, for any $t\in[0,1-p]$ , we have

\Pr\big{[}H(N,P,n)\geq(p+t)n\big{]}\leq e^{-D_{\textup{KL}}(p+t\|p)n}.

Proof.

It is well known that

\Pr[H(N,P,n)=l]=\binom{P}{l}\binom{N-P}{n-l}\binom{N}{n}^{-1},

for $l=0,\dots,n$ .

Claim 5.3.

For every $j\in\{0,\dots,n\}$ , we have

\binom{N}{n}^{-1}\sum_{i=j}^{n}\binom{P}{i}\binom{N-P}{n-i}\binom{i}{j}\leq\binom{n}{j}p^{j}.

Proof.

Consider the following random experiment: take a random permutation of the $N$ balls in the urn. Let $S$ be the sequence of the first $n$ elements in the permutation. Let $X$ be the number of $j$ -subsets of $S$ that contain only red balls. We compute $\mathbf{E}[X]$ in two different ways. On the one hand,

\mathbf{E}[X]=\sum_{i=j}^{n}\Pr[\text{$S$ contains $i$ red balls}]\binom{i}{j}=\sum_{i=j}^{n}\binom{N}{n}^{-1}\binom{P}{i}\binom{N-P}{n-i}\binom{i}{j}.

(7)

On the other hand, let $I\subseteq\{1,\dots,n\}$ with $|I|=j$ . Then the probability that all the balls in the positions indexed by $I$ are red is

\frac{P}{N}\cdot\frac{P-1}{N-1}\cdot\cdots\cdot\frac{P-j+1}{N-j+1}\leq\left(\frac{P}{N}\right)^{j}=p^{j}.

Thus, by linearity of expectation $\mathbf{E}[X]\leq\binom{n}{j}p^{j}$ . Together with (7), the claim follows. ∎

Claim 5.4.

For every $\tau\geq 1$ , we have

\binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}\tau^{i}\leq(1+(\tau-1)p)^{n}.

Proof.

Using Claim 5.3 and the Binomial theorem (twice),

\begin{aligned}\binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}\tau^{i}&=\binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}(1+(\tau-1))^{i}\\ &=\binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}\sum_{j=0}^{i}\binom{i}{j}(\tau-1)^{j}\\ &=\binom{N}{n}^{-1}\sum_{j=0}^{n}(\tau-1)^{j}\sum_{i=j}^{n}\binom{P}{i}\binom{N-P}{n-i}\binom{i}{j}\\ &\leq\sum_{j=0}^{n}\binom{n}{j}((\tau-1)p)^{j}=(1+(\tau-1)p)^{n},\end{aligned}

as claimed. ∎

Thus, for any $\tau\geq 1$ and $k\geq pn$ , we get as before

\Pr[H(N,P,n)\geq k]=\binom{N}{n}^{-1}\sum_{i=k}^{n}\binom{P}{i}\binom{N-P}{n-i}\\ \leq\binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}\tau^{i-k}\leq\frac{(p\tau+1-p)^{n}}{\tau^{k}},

by Claim 5.4. From here the proof proceeds as in Section 3.2. ∎
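The hypergeometric bound and Claim 5.4 can both be checked with exact integer binomial coefficients. The sketch below assumes an urn with $N=100$ balls, $P=30$ of them red, a sample of size $n=20$, and the threshold $k=10$; the function names are illustrative.

```python
import math

def kl_binary(a, b):
    """Binary Kullback-Leibler divergence for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

def hypergeom_upper_tail(N, P, n, k):
    """Exact Pr[H(N, P, n) >= k]; math.comb returns 0 when the lower index is too large."""
    favorable = sum(math.comb(P, i) * math.comb(N - P, n - i) for i in range(k, n + 1))
    return favorable / math.comb(N, n)

def claim_5_4_lhs(N, P, n, tau):
    """Left-hand side of Claim 5.4, the expectation of tau^H."""
    total = sum(math.comb(P, i) * math.comb(N - P, n - i) * tau**i for i in range(n + 1))
    return total / math.comb(N, n)

N, P, n, k = 100, 30, 20, 10  # illustrative urn and threshold
p = P / N
t = k / n - p
tail = hypergeom_upper_tail(N, P, n, k)
bound = math.exp(-kl_binary(p + t, p) * n)
# Pairs (left-hand side, right-hand side) of Claim 5.4 for several tau >= 1.
claim_pairs = [(claim_5_4_lhs(N, P, n, tau), (1.0 + (tau - 1.0) * p) ** n) for tau in (1.0, 1.5, 2.0, 4.0)]
print(tail, bound, claim_pairs)
```

For $\tau=1$ both sides of Claim 5.4 equal $1$, which doubles as a check that the probabilities sum to one.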

5.3 Negative Correlations

The proof by Impagliazzo and Kabanets [15] from Section 3.3 can be used to relax the independence assumption. It now suffices that the random variables are negatively correlated.

Theorem 5.5.

Let $X_{1},\dots,X_{n}$ be random variables with $X_{i}\in\{0,1\}$ . Suppose there exist $p_{i}\in[0,1]$ , $i=1,\dots,n$ , such that for every index set $I\subseteq\{1,\dots,n\}$ , we have $\mathbf{E}\big{[}\prod_{i\in I}X_{i}\big{]}\leq\prod_{i\in I}p_{i}$ . Set $X:=\sum_{i=1}^{n}X_{i}$ and $p:=(1/n)\sum_{i=1}^{n}p_{i}$ . Then, for any $t\in[0,1-p]$ , we have

\Pr[X\geq(p+t)n]\leq e^{-D_{\textup{KL}}(p+t\|p)n}.

Proof.

Let $\lambda\in[0,1]$ be a parameter to be chosen later. Let $I\subseteq\{1,\dots,n\}$ be a random index set obtained by including each element $i\in\{1,\dots,n\}$ with probability $\lambda$ . As before, we estimate the expectation $\mathbf{E}\bigl{[}\prod_{i\in I}X_{i}\bigr{]}$ in two different ways, where the expectation is over the random choice of $X_{1},\dots,X_{n}$ and $I$ . Similarly to before,

\mathbf{E}\Bigl{[}\prod_{i\in I}X_{i}\Bigr{]}=\sum_{S\subseteq\{1,\dots,n\}}\Pr[I=S]\cdot\mathbf{E}\Bigl{[}\prod_{i\in S}X_{i}\Bigr{]}\leq\sum_{S\subseteq\{1,\dots,n\}}\lambda^{|S|}(1-\lambda)^{n-|S|}\cdot\Big{(}\prod_{i\in S}p_{i}\Big{)}\\ =\sum_{S\subseteq\{1,\dots,n\}}\Big{(}\prod_{i\in S}\lambda p_{i}\Big{)}\Big{(}\prod_{i\in\{1,\dots,n\}\setminus S}(1-\lambda)\Big{)}=\prod_{i=1}^{n}(1-\lambda+p_{i}\lambda)\leq(1-\lambda+p\lambda)^{n},

(8)

by the arithmetic-geometric mean inequality. The proof of the lower bound remains unchanged and yields

\mathbf{E}\Bigl{[}\prod_{i\in I}X_{i}\Bigr{]}\geq(1-\lambda)^{(1-p-t)n}\Pr[X\geq(p+t)n],

as before. Combining with (8) and optimizing for $\lambda$ finishes the proof, see Section 3.3. ∎

Acknowledgments.

This survey is based on lecture notes for a class on advanced algorithms at Freie Universität Berlin. I would like to thank all the students who took this class for their interest and participation. I would also like to thank Nabil Mustafa and Jonathan Ullman for valuable comments that improved this survey.

References

[1] N. Alon and J. Spencer. The Probabilistic Method. Wiley-Interscience, 2016.
[2] S. Arora and B. Barak. Computational Complexity – A Modern Approach. Cambridge University Press, 2009.
[3] K. Azuma. Weighted sums of certain dependent random variables. Tôhoku Math. J. (2), 19:357–367, 1967.
[4] S. N. Bernstein. Sobranie Sochinenii [Collected Works]. Nauka, Moscow, 1964.
[5] X. Chen. A likelihood ratio approach for probabilistic inequalities. arXiv:1308.4123, 2013.
[6] F. R. K. Chung and L. Lu. Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79–127, 2006.
[7] V. Chvátal. The tail of the hypergeometric distribution. Discrete Mathematics, 25(3):285–287, 1979.
[8] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 3rd edition, 2009.
[9] T. M. Cover and J. A. Thomas. Elements of information theory. Wiley-Interscience, 2nd edition, 2006.
[10] D. P. Dubhashi and A. Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009.
[11] C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014.
[12] O. Goldreich. Computational complexity – a conceptual perspective. Cambridge University Press, 2008.
[13] T. Hagerup and C. Rüb. A guided tour of Chernoff bounds. Inform. Process. Lett., 33(6):305–308, 1990.
[14] W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13–30, 1963.
[15] R. Impagliazzo and V. Kabanets. Constructive proofs of concentration bounds. In Proc. 13th Int. Conf. Approx. (APPROX) and 14th Int. Conf. Rand. Comb. Opt. (RANDOM), pages 617–631, 2010.
[16] J. M. Kleinberg and É. Tardos. Algorithm design. Addison-Wesley, 2006.
[17] C. McDiarmid. Concentration. In Probabilistic methods for algorithmic discrete mathematics, volume 16 of Algorithms Combin., pages 195–248. Springer-Verlag, 1998.
[18] F. McSherry and K. Talwar. Mechanism design via differential privacy. In Proc. 48th Annu. IEEE Symp. Found. Comput. Sci. (FOCS), pages 94–103, 2007.
[19] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, 2nd edition, 2017.
[20] P. Morin, W. Mulzer, and T. Reddad. Encoding arguments. ACM Comput. Surv., 50(3):46:1–46:36, 2017.
[21] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
[22] T. Steinke and J. Ullman. Subgaussian tail bounds via stability arguments. arXiv:1701.03493, 2017.
