
Institut für Informatik
Freie Universität Berlin
E-Mail: mulzer@inf.fu-berlin.de

Five Proofs of Chernoff’s Bound with Applications

Footnote 1: Supported in part by DFG Grants MU 3501/1 and MU 3501/2 and ERC StG 757609.

Wolfgang Mulzer
Abstract

We discuss five ways of proving Chernoff’s bound and show how they lead to different extensions of the basic bound.

1 Introduction

Chernoff’s bound gives an estimate on the probability that a sum of independent Binomial random variables deviates from its expectation [14]. It has many variants and extensions that are known under various names such as Bernstein’s inequality or Hoeffding’s bound [4, 14]. Chernoff’s bound is one of the most basic and versatile tools in the life of a theoretical computer scientist, with a seemingly endless number of applications. Almost every contemporary textbook on algorithms or complexity theory contains a statement and a proof of the bound [2, 12, 16, 8], and there are several texts that discuss its various applications in great detail (e.g., the textbooks by Alon and Spencer [1], Dubhashi and Panconesi [10], Mitzenmacher and Upfal [19], Motwani and Raghavan [21], or the articles by Chung and Lu [6], Hagerup and Rüb [13], or McDiarmid [17]).

In the present survey, we will see five different ways of proving the basic Chernoff bound. The different techniques used in these proofs allow various generalizations and extensions, some of which we will also discuss.

2 The Basic Bound

We begin with a statement of the basic Chernoff bound. For this, we first need a notion from information theory [9]. Let $P=(p_{1},\dots,p_{m})$ and $Q=(q_{1},\dots,q_{m})$ be two probability distributions on $m$ elements, i.e., $p_{i},q_{i}\in\mathbb{R}$ with $p_{i},q_{i}\geq 0$, for $i=1,\dots,m$, and $\sum_{i=1}^{m}p_{i}=\sum_{i=1}^{m}q_{i}=1$. The Kullback-Leibler divergence or relative entropy of $P$ and $Q$ is defined as

\[D_{\textup{KL}}(P\|Q):=\sum_{i=1}^{m}p_{i}\ln\frac{p_{i}}{q_{i}}.\]

If $m=2$, i.e., if $P=(p,1-p)$ and $Q=(q,1-q)$, we write $D_{\textup{KL}}(p\|q)$ for $D_{\textup{KL}}((p,1-p)\|(q,1-q))$. The Kullback-Leibler divergence measures the distance between the distributions $P$ and $Q$: it represents the expected loss of efficiency if we encode an $m$-letter alphabet with distribution $P$ with a code that is optimal for distribution $Q$. Now, the basic Chernoff bound is as follows:

Theorem 2.1.

Let $n\in\mathbb{N}$, $p\in[0,1]$, and let $X_{1},\dots,X_{n}$ be $n$ independent random variables with $X_{i}\in\{0,1\}$ and $\Pr[X_{i}=1]=p$, for $i=1,\dots,n$. Set $X:=\sum_{i=1}^{n}X_{i}$. Then, for any $t\in[0,1-p]$, we have

\[\Pr[X\geq(p+t)n]\leq e^{-D_{\textup{KL}}(p+t\|p)n}.\]
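As a numerical sanity check of Theorem 2.1, the following minimal sketch compares the exact binomial tail with the bound. The function names are illustrative and not part of the paper; the threshold is passed as an integer $k=(p+t)n$ to avoid rounding issues, and $0<p<1$ is assumed.

```python
import math

def kl_bernoulli(a, b):
    """D_KL(a || b) for the two-point distributions (a, 1-a) and (b, 1-b).

    Uses the convention 0 * ln 0 = 0 and assumes 0 < b < 1.
    """
    total = 0.0
    for x, y in ((a, b), (1.0 - a, 1.0 - b)):
        if x > 0.0:
            total += x * math.log(x / y)
    return total

def binomial_upper_tail(n, p, k):
    """Exact Pr[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))

def chernoff_bound(n, p, k):
    """Right-hand side of Theorem 2.1 with (p + t) n = k."""
    return math.exp(-kl_bernoulli(k / n, p) * n)

n, p = 100, 0.3
for k in (30, 40, 50, 60):
    exact = binomial_upper_tail(n, p, k)
    bound = chernoff_bound(n, p, k)
    print(f"k={k}: exact tail {exact:.3e}, Chernoff bound {bound:.3e}")
```

For $k=pn$ (that is, $t=0$) the bound equals 1, and it decays exponentially in $n$ as $k/n$ moves away from $p$.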

3 Five Proofs for Theorem 2.1

We will now see five different ways of proving Theorem 2.1.

3.1 The Moment Method

The usual textbook proof of Theorem 2.1 uses the exponential function $\exp$ and Markov’s inequality. It is called the moment method, because $\exp$ simultaneously encodes all moments $X,X^{2},X^{3},\dots$ of $X$. This trick is often attributed to Bernstein [4]. It is very general and can be used to obtain several variants of Theorem 2.1, perhaps most prominently, the Azuma-Hoeffding inequality for martingales with bounded differences [14, 3].

The proof goes as follows. Let $\lambda>0$ be a parameter to be determined later. We have

\[\Pr[X\geq(p+t)n]=\Pr[\lambda X\geq\lambda(p+t)n]=\Pr\bigl[e^{\lambda X}\geq e^{\lambda(p+t)n}\bigr].\]

From Markov’s inequality, we obtain

\[\Pr\bigl[e^{\lambda X}\geq e^{\lambda(p+t)n}\bigr]\leq\frac{\mathbf{E}[e^{\lambda X}]}{e^{\lambda(p+t)n}}.\]

Now, the independence of the $X_{i}$ yields

\[\mathbf{E}[e^{\lambda X}]=\mathbf{E}\Bigl[e^{\lambda\sum_{i=1}^{n}X_{i}}\Bigr]=\mathbf{E}\Biggl[\prod_{i=1}^{n}e^{\lambda X_{i}}\Biggr]=\prod_{i=1}^{n}\mathbf{E}\Bigl[e^{\lambda X_{i}}\Bigr]=\bigl(pe^{\lambda}+1-p\bigr)^{n}.\]

Thus,

\[\Pr[X\geq(p+t)n]\leq\Bigl(\frac{pe^{\lambda}+1-p}{e^{\lambda(p+t)}}\Bigr)^{n}, \tag{1}\]

for every $\lambda>0$. Optimizing for $\lambda$ using calculus, we get that the right-hand side is minimized if

\[e^{\lambda}=\frac{(1-p)(p+t)}{p(1-p-t)}.\]

Plugging this into (1), we get

\[\Pr[X\geq(p+t)n]\leq\Biggl[\Bigl(\frac{p}{p+t}\Bigr)^{p+t}\Bigl(\frac{1-p}{1-p-t}\Bigr)^{1-p-t}\Biggr]^{n}=e^{-D_{\textup{KL}}(p+t\|p)n},\]

as desired.
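The optimization step can be checked numerically. The sketch below is illustrative only (function names are assumptions, and it requires $0<p$, $t>0$, $p+t<1$): it evaluates the per-trial factor of (1), plugs in the stated optimal $\lambda$, and compares the result with $e^{-D_{\textup{KL}}(p+t\|p)}$ and with a coarse grid search over $\lambda$.

```python
import math

def mgf_bound(lam, p, t):
    """Per-trial factor of (1): (p e^lam + 1 - p) / e^{lam (p + t)}."""
    return (p * math.exp(lam) + 1.0 - p) / math.exp(lam * (p + t))

def optimal_lambda(p, t):
    """The minimizer stated in the text: e^lam = (1-p)(p+t) / (p(1-p-t))."""
    return math.log((1.0 - p) * (p + t) / (p * (1.0 - p - t)))

def kl_bernoulli(a, b):
    """D_KL(a || b) for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

p, t = 0.3, 0.2
lam_star = optimal_lambda(p, t)
at_optimum = mgf_bound(lam_star, p, t)
# Coarse grid over lambda in (0, 5) as an independent check of minimality.
best_on_grid = min(mgf_bound(0.01 * j, p, t) for j in range(1, 500))
print(lam_star, at_optimum, math.exp(-kl_bernoulli(p + t, p)), best_on_grid)
```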

3.2 Chvátal’s Method

The following proof of Theorem 2.1 is due to Chvátal [7]. As we will see below, it can be generalized to give tail bounds for the hypergeometric distribution. Let $B(n,p)$ be the random variable that gives the number of heads in $n$ independent Bernoulli trials with success probability $p$. Then,

\[\Pr[B(n,p)=l]=\binom{n}{l}p^{l}(1-p)^{n-l},\]

for $l=0,\dots,n$. Thus, for any $\tau\geq 1$ and $k\geq pn$, we get

\[\begin{aligned}
\Pr[B(n,p)\geq k]&=\sum_{i=k}^{n}\binom{n}{i}p^{i}(1-p)^{n-i}\\
&\leq\sum_{i=k}^{n}\binom{n}{i}p^{i}(1-p)^{n-i}\underbrace{\tau^{i-k}}_{\geq 1}+\underbrace{\sum_{i=0}^{k-1}\binom{n}{i}p^{i}(1-p)^{n-i}\tau^{i-k}}_{\geq 0}\\
&=\sum_{i=0}^{n}\binom{n}{i}p^{i}(1-p)^{n-i}\tau^{i-k}.
\end{aligned}\]

Using the Binomial theorem, we obtain

\[\Pr[B(n,p)\geq k]\leq\sum_{i=0}^{n}\binom{n}{i}p^{i}(1-p)^{n-i}\tau^{i-k}=\tau^{-k}\sum_{i=0}^{n}\binom{n}{i}(p\tau)^{i}(1-p)^{n-i}=\frac{(p\tau+1-p)^{n}}{\tau^{k}}.\]

If we write $k=(p+t)n$ and $\tau=e^{\lambda}$, we get

\[\Pr[B(n,p)\geq(p+t)n]\leq\Bigl(\frac{pe^{\lambda}+1-p}{e^{\lambda(p+t)}}\Bigr)^{n}.\]

This is the same as (1), so we can complete the proof of Theorem 2.1 as in Section 3.1.
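The two steps of Chvátal’s argument (padding the tail sum with the weights $\tau^{i-k}$, then collapsing it with the Binomial theorem) can be verified directly for small parameters. This is a minimal sketch with illustrative function names and arbitrarily chosen test values.

```python
import math

def binomial_upper_tail(n, p, k):
    """Exact Pr[B(n, p) >= k]."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def weighted_sum(n, p, k, tau):
    """sum_{i=0}^{n} C(n,i) p^i (1-p)^(n-i) tau^(i-k), which dominates the tail for tau >= 1."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) * tau ** (i - k)
        for i in range(n + 1)
    )

def closed_form(n, p, k, tau):
    """(p tau + 1 - p)^n / tau^k, obtained from the Binomial theorem."""
    return (p * tau + 1 - p) ** n / tau**k

n, p, k, tau = 30, 0.25, 12, 1.8
tail = binomial_upper_tail(n, p, k)
padded = weighted_sum(n, p, k, tau)
print(tail, padded, closed_form(n, p, k, tau))
```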

3.3 The Impagliazzo-Kabanets Method

The third proof is due to Impagliazzo and Kabanets [15], and it leads to a constructive version of the bound. Let $\lambda\in[0,1]$ be a parameter to be chosen later. Let $I\subseteq\{1,\dots,n\}$ be a random index set obtained by including each element $i\in\{1,\dots,n\}$ with probability $\lambda$. We estimate $\mathbf{E}\bigl[\prod_{i\in I}X_{i}\bigr]$ in two different ways, where the expectation is over the random choice of $X_{1},\dots,X_{n}$ and $I$.

On the one hand, using the law of total expectation and independence, we have

\[\begin{aligned}
\mathbf{E}\Bigl[\prod_{i\in I}X_{i}\Bigr]&=\sum_{S\subseteq\{1,\dots,n\}}\Pr[I=S]\cdot\mathbf{E}\Bigl[\prod_{i\in S}X_{i}\Bigr]=\sum_{S\subseteq\{1,\dots,n\}}\Pr[I=S]\cdot\prod_{i\in S}\Pr[X_{i}=1]\\
&=\sum_{S\subseteq\{1,\dots,n\}}\lambda^{|S|}(1-\lambda)^{n-|S|}\cdot p^{|S|}=(\lambda p+1-\lambda)^{n}.
\end{aligned}\tag{2}\]

On the other hand, by the law of total expectation,

\[\mathbf{E}\Bigl[\prod_{i\in I}X_{i}\Bigr]\geq\mathbf{E}\Bigl[\prod_{i\in I}X_{i}\mid X\geq(p+t)n\Bigr]\Pr[X\geq(p+t)n].\]

Now, fix $X_{1},\dots,X_{n}$ with $X\geq(p+t)n$. For the fixed choice of $X_{1}=x_{1},\dots,X_{n}=x_{n}$, the expectation $\mathbf{E}\bigl[\prod_{i\in I}x_{i}\bigr]$ is exactly the probability that $I$ avoids all the $n-X$ indices $i$ where $x_{i}=0$. Thus, the conditional expectation is

\[\mathbf{E}\Bigl[\prod_{i\in I}X_{i}\mid X\geq(p+t)n\Bigr]=\mathbf{E}\Bigl[(1-\lambda)^{n-X}\mid X\geq(p+t)n\Bigr]\geq(1-\lambda)^{(1-p-t)n},\]

so

\[\mathbf{E}\Bigl[\prod_{i\in I}X_{i}\Bigr]\geq(1-\lambda)^{(1-p-t)n}\Pr[X\geq(p+t)n].\]

Combining with (2),

\[\Pr[X\geq(p+t)n]\leq\left(\frac{\lambda p+1-\lambda}{(1-\lambda)^{1-p-t}}\right)^{n}. \tag{3}\]

Using calculus, we get that the right-hand side is minimized for $\lambda=\frac{t}{(1-p)(p+t)}$ (note that $\lambda\leq 1$ for $t\leq 1-p$). Plugging this into (3),

\[\Pr[X\geq(p+t)n]\leq\Biggl[\Bigl(\frac{p}{p+t}\Bigr)^{p+t}\Bigl(\frac{1-p}{1-p-t}\Bigr)^{1-p-t}\Biggr]^{n}=e^{-D_{\textup{KL}}(p+t\|p)n},\]

as desired.
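Identity (2) and the choice of $\lambda$ can be checked by brute force for small $n$. The sketch below is an illustration under stated assumptions (names are hypothetical; $n$ must be small because it enumerates every outcome and every index set; $0<p$, $t>0$, $p+t<1$).

```python
import itertools
import math

def expected_product(n, p, lam):
    """E[prod_{i in I} X_i], enumerating all outcomes x and all index sets I."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=n):
        pr_x = math.prod(p if bit else 1 - p for bit in x)
        for mask in itertools.product((0, 1), repeat=n):
            pr_mask = math.prod(lam if m else 1 - lam for m in mask)
            # The product over I is 1 exactly when every selected index has x_i = 1.
            value = int(all(bit == 1 for bit, m in zip(x, mask) if m))
            total += pr_x * pr_mask * value
    return total

def ik_bound(lam, p, t):
    """Per-trial factor of (3): (lam p + 1 - lam) / (1 - lam)^(1 - p - t)."""
    return (lam * p + 1 - lam) / (1 - lam) ** (1 - p - t)

def optimal_lambda_ik(p, t):
    """The minimizer stated in the text: t / ((1 - p)(p + t))."""
    return t / ((1 - p) * (p + t))

def kl_bernoulli(a, b):
    """D_KL(a || b) for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

p, t = 0.3, 0.2
print(expected_product(4, p, 0.6), (0.6 * p + 1 - 0.6) ** 4)
print(ik_bound(optimal_lambda_ik(p, t), p, t), math.exp(-kl_bernoulli(p + t, p)))
```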

3.4 The Encoding Argument

The next proof stems from discussions with Luc Devroye, Gábor Lugosi, and Pat Morin, and it is inspired by an encoding argument [20]. A similar argument can also be derived from Xinjia Chen’s likelihood ratio method [5]. Let $\{0,1\}^{n}$ be the set of all bit strings of length $n$, and let $w:\{0,1\}^{n}\rightarrow[0,1]$ be a weight function. We call $w$ valid if $\sum_{x\in\{0,1\}^{n}}w(x)\leq 1$. The following lemma says that for any probability distribution $p_{x}$ on $\{0,1\}^{n}$, a valid weight function is unlikely to be substantially larger than $p_{x}$.

Lemma 3.1.

Let $\mathcal{D}$ be a probability distribution on $\{0,1\}^{n}$ that assigns to each $x\in\{0,1\}^{n}$ a probability $p_{x}$, and let $w$ be a valid weight function. For any $s\geq 1$, we have

\[\Pr_{x\sim\mathcal{D}}\left[w(x)\geq sp_{x}\right]\leq 1/s.\]
Proof.

Let $Z_{s}=\{x\in\{0,1\}^{n}\mid w(x)\geq sp_{x}\}$. We have

\[\Pr_{x\sim\mathcal{D}}\left[w(x)\geq sp_{x}\right]=\sum_{\substack{x\in Z_{s}\\ p_{x}>0}}p_{x}\leq\sum_{\substack{x\in Z_{s}\\ p_{x}>0}}p_{x}\frac{w(x)}{sp_{x}}\leq(1/s)\sum_{x\in Z_{s}}w(x)\leq 1/s,\]

since $w(x)/(sp_{x})\geq 1$ for $x\in Z_{s}$, $p_{x}>0$, and since $w$ is valid. ∎

We now show that Lemma 3.1 implies Theorem 2.1. For this, we interpret the sequence $X_{1},\dots,X_{n}$ as a bit string of length $n$. This induces a probability distribution $\mathcal{D}$ that assigns to each $x\in\{0,1\}^{n}$ the probability $p_{x}=p^{k_{x}}(1-p)^{n-k_{x}}$, where $k_{x}$ denotes the number of $1$-bits in $x$. We define a weight function $w:\{0,1\}^{n}\rightarrow[0,1]$ by $w(x)=(p+t)^{k_{x}}(1-p-t)^{n-k_{x}}$, for $x\in\{0,1\}^{n}$. Then $w$ is valid, since $w(x)$ is the probability that $x$ is generated by setting each bit to $1$ independently with probability $p+t$. For $x\in\{0,1\}^{n}$, we have

\[\frac{w(x)}{p_{x}}=\left(\frac{p+t}{p}\right)^{k_{x}}\left(\frac{1-p-t}{1-p}\right)^{n-k_{x}}.\]

Since $((p+t)/p)((1-p)/(1-p-t))\geq 1$, it follows that $w(x)/p_{x}$ is an increasing function of $k_{x}$. Hence, if $k_{x}\geq(p+t)n$, we have

\[\frac{w(x)}{p_{x}}\geq\left[\left(\frac{p+t}{p}\right)^{p+t}\left(\frac{1-p-t}{1-p}\right)^{1-p-t}\right]^{n}=e^{D_{\textup{KL}}(p+t\|p)n}.\]

We now apply Lemma 3.1 to $\mathcal{D}$ and $w$ to get

\[\Pr[X\geq(p+t)n]=\Pr_{x\sim\mathcal{D}}[k_{x}\geq(p+t)n]\leq\Pr_{x\sim\mathcal{D}}\left[w(x)\geq p_{x}e^{D_{\textup{KL}}(p+t\|p)n}\right]\leq e^{-D_{\textup{KL}}(p+t\|p)n},\]

as claimed in Theorem 2.1.
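For small $n$, the encoding argument can be replayed by enumerating all bit strings. The following sketch is illustrative (names and parameter values are assumptions): it checks that the tilted product measure is a valid weight function, evaluates the left-hand side of Lemma 3.1 for a sample value $s=2$, and compares the exact tail with $e^{-D_{\textup{KL}}(p+t\|p)n}$, where $p+t=k/n$ for an integer threshold $k$.

```python
import itertools
import math

def product_measure(x, q):
    """Probability of bit string x when each bit is 1 independently with probability q."""
    ones = sum(x)
    return q**ones * (1.0 - q) ** (len(x) - ones)

def lemma_lhs(strings, p, q, s):
    """Pr_{x ~ D}[w(x) >= s p_x] with p_x = product_measure(x, p), w(x) = product_measure(x, q)."""
    return sum(
        product_measure(x, p)
        for x in strings
        if product_measure(x, q) >= s * product_measure(x, p)
    )

n, p, k = 10, 0.3, 5
q = k / n  # plays the role of p + t
strings = list(itertools.product((0, 1), repeat=n))

total_weight = sum(product_measure(x, q) for x in strings)  # equals 1, so w is valid
kl = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))
tail = sum(product_measure(x, p) for x in strings if sum(x) >= k)
print(total_weight, lemma_lhs(strings, p, q, 2.0), tail, math.exp(-kl * n))
```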

See the survey [20] for a more thorough discussion of how this proof is related to coding theory.

3.5 A Proof via Differential Privacy

The fifth proof of Chernoff’s bound is due to Steinke and Ullman [22], and it uses methods from the theory of differential privacy [11]. Unlike the previous four proofs, it seems to lead to a slightly weaker version of the bound. Let $m$ be a parameter to be determined later. The main idea is to bound the expectation of the maximum of $m-1$ independent copies of $X$.

Lemma 3.2.

Let $m\in\mathbb{N}$ and $m\leq e^{n}$. Let $X^{(1)},\dots,X^{(m-1)}$ be $m-1$ independent copies of $X$, and set $X^{(m)}=\mathbf{E}[X]$. Then,

\[\mathbf{E}\big[\max\{X^{(1)},\dots,X^{(m)}\}\big]\leq pn+5\sqrt{n\ln m}.\]

We will give a proof of Lemma 3.2 below. First, however, we will see how we can use Lemma 3.2 to derive the following weaker version of Theorem 2.1.

Footnote 2: In the published version of this paper, the proof of Theorem 3.3 is based on an incorrect application of Markov’s inequality. We have changed Lemma 3.2 so that $X^{(m)}$ is fixed to $\mathbf{E}[X]$. This ensures that Markov’s inequality is applied to a nonnegative random variable. We thank Natalia Shenkman for pointing this out to us.

Theorem 3.3.

Let $n\in\mathbb{N}$, $p\in[0,1]$, and let $X_{1},\dots,X_{n}$ be $n$ independent random variables with $X_{i}\in\{0,1\}$ and $\Pr[X_{i}=1]=p$, for $i=1,\dots,n$. Set $X:=\sum_{i=1}^{n}X_{i}$. Then, for any $t\in[0,1-p]$, we have

\[\Pr[X\geq(p+t)n]\leq e^{1-\frac{1}{64}t^{2}n}.\]
Proof.

We may assume that $t\geq 8/\sqrt{n}$, since otherwise the theorem holds trivially. Set $\alpha=\Pr[X\geq(p+t)n]$. Let $X^{(1)},\dots,X^{(m-1)}$ be $m-1$ independent copies of $X$ and let $X^{(m)}=\mathbf{E}[X]$. Then,

\[\Pr\big[\max\{X^{(1)},\dots,X^{(m)}\}\geq(p+t)n\big]=1-(1-\alpha)^{m-1}\geq 1-e^{-\alpha(m-1)}. \tag{4}\]

On the other hand, Markov’s inequality gives

\[\begin{aligned}
\Pr\big[\max\{X^{(1)},\dots,X^{(m)}\}\geq(p+t)n\big]&=\Pr\big[\max\{X^{(1)},\dots,X^{(m)}\}-pn\geq tn\big]\\
&\leq\frac{\mathbf{E}\big[\max\{X^{(1)},\dots,X^{(m)}\}-pn\big]}{tn}\leq\frac{5\sqrt{\ln m}}{t\sqrt{n}},
\end{aligned}\]

by Lemma 3.2. Thus, setting $m=\exp\Big(\big(\frac{e-1}{5e}\big)^{2}t^{2}n\Big)$, and combining with (4), we get

\[\frac{e-1}{e}\geq 1-e^{-\alpha(m-1)}\Leftrightarrow\alpha\leq\frac{1}{\exp\Big(\big(\frac{e-1}{5e}\big)^{2}t^{2}n\Big)-1}\leq\frac{1}{\exp\big(\frac{t^{2}n}{64}\big)-1},\]

since $\big(\frac{e-1}{5e}\big)^{2}\geq\frac{1}{64}$. Now the theorem follows from

\[\frac{\exp\big(\frac{t^{2}n}{64}\big)}{\exp\big(\frac{t^{2}n}{64}\big)-1}\leq\frac{e}{e-1}\leq e,\]

which holds as $t\geq 8/\sqrt{n}$, as $x\mapsto x/(x-1)$ is decreasing for $x>1$, and as $e\geq 2$. ∎
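To see how much weaker Theorem 3.3 is than Theorem 2.1, one can evaluate both right-hand sides. This is a small illustrative sketch (function names are assumptions; it requires $0<p$ and $p+t<1$). It also confirms the constant $\big(\frac{e-1}{5e}\big)^{2}\geq\frac{1}{64}$ used in the proof.

```python
import math

def kl_bernoulli(a, b):
    """D_KL(a || b) for 0 < a, b < 1."""
    return a * math.log(a / b) + (1.0 - a) * math.log((1.0 - a) / (1.0 - b))

def strong_bound(n, p, t):
    """Right-hand side of Theorem 2.1."""
    return math.exp(-kl_bernoulli(p + t, p) * n)

def weak_bound(n, t):
    """Right-hand side of Theorem 3.3; it is at least 1 (trivial) whenever t <= 8 / sqrt(n)."""
    return math.exp(1.0 - t * t * n / 64.0)

constant = ((math.e - 1.0) / (5.0 * math.e)) ** 2
print(constant, 1.0 / 64.0)
for n, p, t in ((1000, 0.3, 0.3), (5000, 0.5, 0.2), (200, 0.1, 0.6)):
    print(n, p, t, strong_bound(n, p, t), weak_bound(n, t))
```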

It remains to prove Lemma 3.2. For this, we use an idea from differential privacy. Let $A\in[0,1]^{m\times n}$, $A=(a_{ij})$, be an $(m\times n)$-matrix with entries from $[0,1]$. For a given parameter $\gamma>1$, we define a random variable $S_{\gamma}(A)$ with values in $\{1,\dots,m\}$ as follows: for $i=1,\dots,m$, let $b_{i}=\sum_{j=1}^{n}a_{ij}$ be the sum of the entries in the $i$-th row of $A$. Set

\[C_{\gamma}(A)=\sum_{i=1}^{m}\gamma^{b_{i}}.\]

Then, for $i=1,\dots,m$, we define

\[\Pr[S_{\gamma}(A)=i]=\frac{\gamma^{b_{i}}}{C_{\gamma}(A)}.\]

The random variable $S_{\gamma}(A)$ is called a stable selector for $A$ (see the work by McSherry and Talwar [18] for more background). The next lemma states two interesting properties for $S_{\gamma}(A)$. For a matrix $A\in[0,1]^{m\times n}$, a vector $\vec{c}\in[0,1]^{m}$, and a number $j\in\{1,\dots,n\}$ we denote by $(A_{-j},\vec{c})$ the matrix obtained from $A$ by replacing the $j$-th column of $A$ with $\vec{c}$.

Lemma 3.4.

Let $A\in[0,1]^{m\times n}$ be an $m\times n$ matrix with entries in $[0,1]$. We have

  • Stability: For every vector $\vec{c}\in[0,1]^{m}$ and every $i\in\{1,\dots,m\}$,

    \[\gamma^{-2}\Pr[S_{\gamma}(A_{-j},\vec{c})=i]\leq\Pr[S_{\gamma}(A)=i]\leq\gamma^{2}\Pr[S_{\gamma}(A_{-j},\vec{c})=i].\]
  • Accuracy: Let $b_{i}$ be the sum of the $i$-th row of $A$. Then,

    \[\mathbf{E}_{i\sim S_{\gamma}(A)}[b_{i}]\leq\max_{i=1}^{m}b_{i}\leq\mathbf{E}_{i\sim S_{\gamma}(A)}[b_{i}]+\log_{\gamma}m.\]
Proof.

Stability: for $k\in\{1,\dots,m\}$, let $b_{k}$ be the sum of the $k$-th row of $A$, and let $\widetilde{b}_{k}$ be the sum of the $k$-th row of $(A_{-j},\vec{c})$. Since $A$ and $(A_{-j},\vec{c})$ differ in one column, and since the entries are from $[0,1]$, we have $\widetilde{b}_{k}-1\leq b_{k}\leq\widetilde{b}_{k}+1$. Hence,

\[\gamma^{-1}C_{\gamma}(A_{-j},\vec{c})\leq C_{\gamma}(A)\leq\gamma C_{\gamma}(A_{-j},\vec{c})\]

and

\[\gamma^{-2}\Pr[S_{\gamma}(A_{-j},\vec{c})=i]\leq\Pr[S_{\gamma}(A)=i]\leq\gamma^{2}\Pr[S_{\gamma}(A_{-j},\vec{c})=i],\]

as claimed.

Accuracy: The inequality $\mathbf{E}_{i\sim S_{\gamma}(A)}[b_{i}]\leq\max_{i=1}^{m}b_{i}$ is obvious. For the second inequality, we observe that by definition,

\[b_{i}=\log_{\gamma}(C_{\gamma}(A)\Pr[S_{\gamma}(A)=i]).\]

Thus,

\[\begin{aligned}
\mathbf{E}_{i\sim S_{\gamma}(A)}[b_{i}]&=\sum_{i=1}^{m}\Pr[S_{\gamma}(A)=i]\log_{\gamma}(C_{\gamma}(A)\Pr[S_{\gamma}(A)=i])\\
&=\sum_{i=1}^{m}\Pr[S_{\gamma}(A)=i]\log_{\gamma}C_{\gamma}(A)-\sum_{i=1}^{m}\Pr[S_{\gamma}(A)=i]\log_{\gamma}\frac{1}{\Pr[S_{\gamma}(A)=i]}\\
&\geq\sum_{i=1}^{m}\Pr[S_{\gamma}(A)=i]\log_{\gamma}\gamma^{\max_{i=1}^{m}b_{i}}-\log_{\gamma}m\\
&=\max_{i=1}^{m}b_{i}-\log_{\gamma}m,
\end{aligned}\]

since $C_{\gamma}(A)=\sum_{i=1}^{m}\gamma^{b_{i}}\geq\gamma^{\max_{i=1}^{m}b_{i}}$ and since $x\mapsto-\log_{\gamma}(x)$ is a convex function. ∎
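The stable selector and both properties of Lemma 3.4 are easy to check on a concrete matrix. The sketch below is illustrative: the helper names, the example matrix, and the replacement column are assumptions chosen for the demonstration, and rows and columns are indexed from 0.

```python
import math

def row_sums(A):
    """The row sums b_i of the matrix A (given as a list of rows)."""
    return [sum(row) for row in A]

def selector_distribution(A, gamma):
    """Pr[S_gamma(A) = i] = gamma^{b_i} / C_gamma(A) for each row i."""
    weights = [gamma**b for b in row_sums(A)]
    normalizer = sum(weights)  # this is C_gamma(A)
    return [w / normalizer for w in weights]

def replace_column(A, j, c):
    """The matrix (A_{-j}, c): column j of A replaced by the vector c."""
    return [row[:j] + [c[i]] + row[j + 1:] for i, row in enumerate(A)]

A = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 0]]
gamma = 1.5
probs = selector_distribution(A, gamma)
expected_row_sum = sum(pr * b for pr, b in zip(probs, row_sums(A)))

B = replace_column(A, 1, [1, 1, 0])
probs_B = selector_distribution(B, gamma)

print(probs, probs_B)
print(expected_row_sum, max(row_sums(A)), expected_row_sum + math.log(len(A), gamma))
```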

Lemma 3.4 shows that $S_{\gamma}(A)$ constitutes a reasonable mechanism of estimating the maximum row sum of $A$ without revealing too much information about any single column of $A$. We can now use Lemma 3.4 to bound the expectation of the maximum of $m-1$ independent copies of $X$ and $\mathbf{E}[X]$.

Lemma 3.5.

Let $m\in\mathbb{N}$. Let $X^{(1)},\dots,X^{(m-1)}$ be $m-1$ independent copies of $X$, and set $X^{(m)}=\mathbf{E}[X]$. Then, for any $\gamma>1$, we have

\[\mathbf{E}\big[\max\{X^{(1)},\dots,X^{(m)}\}\big]\leq\gamma^{2}pn+\log_{\gamma}m.\]
Proof.

Let $X_{1}^{(1)},\dots,X_{1}^{(m-1)}$ be $m-1$ independent copies of $X_{1}$, and let $X_{1}^{(m)}=\mathbf{E}[X_{1}]$; let $X_{2}^{(1)},\dots,X_{2}^{(m-1)}$ be $m-1$ independent copies of $X_{2}$ and let $X_{2}^{(m)}=\mathbf{E}[X_{2}]$; and so on. We consider the random $m\times n$ matrix $M\in[0,1]^{m\times n}$ whose entry in row $i$ and column $j$ is $X_{j}^{(i)}$ (the entries of the last row equal $p$, so they lie in $[0,1]$ rather than $\{0,1\}$). Then, we can write $X^{(i)}=\sum_{j=1}^{n}X_{j}^{(i)}$, for $i=1,\dots,m$. By the accuracy claim in Lemma 3.4,

\[\mathbf{E}_{M}\big[\max\{X^{(1)},\dots,X^{(m)}\}\big]\leq\mathbf{E}_{M,i\sim S_{\gamma}(M)}\big[X^{(i)}\big]+\log_{\gamma}m. \tag{5}\]

Now we bound $\mathbf{E}_{M,i\sim S_{\gamma}(M)}\big[X^{(i)}\big]$. We unwrap the expectation for $i\sim S_{\gamma}(M)$ and get

\[\mathbf{E}_{M,i\sim S_{\gamma}(M)}[X^{(i)}]=\mathbf{E}_{M}\Big[\sum_{i=1}^{m}\Pr[S_{\gamma}(M)=i]X^{(i)}\Big].\]

Let $\widetilde{M}$ be an independent copy of $M$. Denote the entry in the $i$-th row and $j$-th column of $\widetilde{M}$ by $\widetilde{X}_{j}^{(i)}$, and set $\widetilde{X}^{(i)}=\sum_{j=1}^{n}\widetilde{X}_{j}^{(i)}$, for $i=1,\dots,m$. By the stability claim in Lemma 3.4, for every $j\in\{1,\dots,n\}$,

\[\mathbf{E}_{M}\Big[\sum_{i=1}^{m}\Pr\big[S_{\gamma}(M)=i\big]X^{(i)}\Big]\leq\gamma^{2}\mathbf{E}_{M,\widetilde{M}}\Big[\sum_{i=1}^{m}\Pr\big[S_{\gamma}(M_{-j},\widetilde{M}_{j})=i\big]X^{(i)}\Big].\]

Since the random variables $X_{j}^{(i)}$, $\widetilde{X}_{j}^{(i)}$, $1\leq i\leq m$, $1\leq j\leq n$, are independent, the pairs $\big((M_{-j},\widetilde{M}_{j}),X_{j}^{(i)}\big)$ and $\big(M,\widetilde{X}_{j}^{(i)}\big)$ have the same distribution. Therefore, we can write

\[\begin{aligned}
\mathbf{E}_{M}\Big[\sum_{i=1}^{m}\Pr\big[S_{\gamma}(M)=i\big]X^{(i)}\Big]&=\mathbf{E}_{M}\Big[\sum_{i=1}^{m}\sum_{j=1}^{n}\Pr\big[S_{\gamma}(M)=i\big]X_{j}^{(i)}\Big]\\
&\leq\gamma^{2}\mathbf{E}_{M,\widetilde{M}}\Big[\sum_{j=1}^{n}\sum_{i=1}^{m}\Pr\big[S_{\gamma}(M_{-j},\widetilde{M}_{j})=i\big]X_{j}^{(i)}\Big]\\
&=\gamma^{2}\mathbf{E}_{M,\widetilde{M}}\Big[\sum_{j=1}^{n}\sum_{i=1}^{m}\Pr\big[S_{\gamma}(M)=i\big]\widetilde{X}_{j}^{(i)}\Big]\\
&=\gamma^{2}\mathbf{E}_{M}\Big[\sum_{i=1}^{m}\Pr\big[S_{\gamma}(M)=i\big]\mathbf{E}_{\widetilde{M}}\big[\widetilde{X}^{(i)}\big]\Big]\\
&=\gamma^{2}\mathbf{E}_{M}\Big[\sum_{i=1}^{m}\Pr\big[S_{\gamma}(M)=i\big]pn\Big]=\gamma^{2}pn.
\end{aligned}\]

We can conclude the lemma by plugging this bound into (5). ∎

To obtain Lemma 3.2, we set $\gamma=1+\frac{\sqrt{\ln m}}{\sqrt{n}}$. Now, Lemma 3.5 gives

\[\begin{aligned}
\mathbf{E}\big[\max\{X^{(1)},\dots,X^{(m)}\}\big]&\leq\left(1+\frac{\sqrt{\ln m}}{\sqrt{n}}\right)^{2}pn+\frac{\ln m}{\ln\left(1+\frac{\sqrt{\ln m}}{\sqrt{n}}\right)}\\
&\leq\left(1+\frac{3\sqrt{\ln m}}{\sqrt{n}}\right)pn+\frac{\ln m}{\frac{\sqrt{\ln m}}{2\sqrt{n}}},
\end{aligned}\]

since $\frac{\sqrt{\ln m}}{\sqrt{n}}\leq 1$ by our assumption $m\leq e^{n}$ and $\ln(1+x)\geq x/2$, for $x\in[0,1]$. Hence, using $pn\leq n$,

\[\mathbf{E}\big[\max\{X^{(1)},\dots,X^{(m)}\}\big]\leq pn+5\sqrt{n\ln m},\]

as desired.
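The chain of bounds can be compared against the exact value of $\mathbf{E}[\max\{X^{(1)},\dots,X^{(m)}\}]$, which is computable because the maximum of $m-1$ independent copies of $X$ has distribution function $F(k)^{m-1}$, where $F$ is the binomial distribution function. The sketch below is illustrative; the function names and the parameters $n=400$, $p=0.4$, $m=50$ are assumptions.

```python
import math

def expected_max_with_mean(n, p, m):
    """Exact E[max{X^(1), ..., X^(m-1), E[X]}] for independent X^(i) ~ Binomial(n, p)."""
    pmf = [math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]
    cdf, acc = [], 0.0
    for mass in pmf:
        acc += mass
        cdf.append(min(acc, 1.0))
    cdf[-1] = 1.0  # guard against rounding in the final partial sum
    mean, total, prev = p * n, 0.0, 0.0
    for k in range(n + 1):
        cur = cdf[k] ** (m - 1)  # Pr[max of the m-1 copies <= k]
        total += (cur - prev) * max(k, mean)
        prev = cur
    return total

def lemma_3_5_bound(n, p, m, gamma):
    """gamma^2 p n + log_gamma(m)."""
    return gamma**2 * p * n + math.log(m) / math.log(gamma)

def lemma_3_2_bound(n, p, m):
    """p n + 5 sqrt(n ln m), valid for m <= e^n."""
    return p * n + 5.0 * math.sqrt(n * math.log(m))

n, p, m = 400, 0.4, 50
gamma = 1.0 + math.sqrt(math.log(m) / n)
exact = expected_max_with_mean(n, p, m)
via_lemma_3_5 = lemma_3_5_bound(n, p, m, gamma)
via_lemma_3_2 = lemma_3_2_bound(n, p, m)
print(exact, via_lemma_3_5, via_lemma_3_2)
```

The constants in Lemma 3.2 are not optimized, so a visible gap between the exact value and the bound is to be expected.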

4 Useful Consequences

We now show several useful consequences of Theorem 2.1. These results can be derived directly from Theorem 2.1, and therefore they also hold for variants of the theorem with slightly different assumptions.

4.1 The Lower Tail

First, we show that an analogous bound holds for the lower tail probability $\Pr[X\leq(p-t)n]$.

Corollary 4.1.

Let $X_{1},\dots,X_{n}$ be independent random variables with $X_{i}\in\{0,1\}$ and $\Pr[X_{i}=1]=p$, for $i=1,\dots,n$. Set $X:=\sum_{i=1}^{n}X_{i}$. Then, for any $t\in[0,p]$, we have

\[\Pr[X\leq(p-t)n]\leq e^{-D_{\textup{KL}}(p-t\|p)n}.\]
Proof.
\[\Pr[X\leq(p-t)n]=\Pr[n-X\geq n-(p-t)n]=\Pr[X^{\prime}\geq(1-p+t)n],\]

where $X^{\prime}=\sum_{i=1}^{n}X_{i}^{\prime}$ with independent random variables $X_{i}^{\prime}\in\{0,1\}$ such that $\Pr[X_{i}^{\prime}=1]=1-p$. The result follows from $D_{\textup{KL}}(1-p+t\|1-p)=D_{\textup{KL}}(p-t\|p)$. ∎

4.2 Multiplicative Version

Next, we derive a multiplicative variant of Theorem 2.1. This well-known version of the bound can be found in the classic text by Motwani and Raghavan [21].

Corollary 4.2.

Let $X_{1},\dots,X_{n}$ be independent random variables with $X_{i}\in\{0,1\}$ and $\Pr[X_{i}=1]=p$, for $i=1,\dots,n$. Set $X:=\sum_{i=1}^{n}X_{i}$ and $\mu=pn$. Then, for any $\delta\geq 0$, we have

\[\begin{aligned}
\Pr[X\geq(1+\delta)\mu]&\leq\left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu},\text{ and}\\
\Pr[X\leq(1-\delta)\mu]&\leq\left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.
\end{aligned}\]
Proof.

Setting $t=\delta\mu/n=\delta p$ in Theorem 2.1 yields

\begin{align*}
\Pr[X\geq(1+\delta)\mu] &\leq\exp\left(-n\left[p(1+\delta)\ln(1+\delta)+p\left(\frac{1-p}{p}-\delta\right)\ln\left(1-\delta\frac{p}{1-p}\right)\right]\right)\\
&=\left(\frac{(1-\delta p/(1-p))^{\delta-(1-p)/p}}{(1+\delta)^{1+\delta}}\right)^{\mu}
\leq\left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.
\end{align*}

For the last step, we may assume $\delta>0$ and $(1+\delta)p<1$, as the remaining cases are trivial or follow by continuity. Write $x:=\delta p/(1-p)\in(0,1)$, so that $\delta-(1-p)/p=-\delta(1-x)/x$. The function $g(x)=x+(1-x)\ln(1-x)$ satisfies $g(0)=0$ and $g'(x)=-\ln(1-x)\geq 0$, so $(1-x)\ln(1-x)\geq -x$ and hence $(1-x)^{-\delta(1-x)/x}\leq e^{\delta}$.

Setting $t=\delta\mu/n=\delta p$ in Corollary 4.1 yields

\begin{align*}
\Pr[X\leq(1-\delta)\mu] &\leq\exp\left(-n\left[p(1-\delta)\ln(1-\delta)+p\left(\frac{1-p}{p}+\delta\right)\ln\left(1+\delta\frac{p}{1-p}\right)\right]\right)\\
&=\left(\frac{(1+\delta p/(1-p))^{-\delta-(1-p)/p}}{(1-\delta)^{1-\delta}}\right)^{\mu}
\leq\left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.
\end{align*}

For the last step, assume $\delta>0$ (the case $\delta=0$ is trivial) and write $x:=\delta p/(1-p)>0$, so that $-\delta-(1-p)/p=-\delta(1+x)/x$. The function $h(x)=(1+x)\ln(1+x)-x$ satisfies $h(0)=0$ and $h'(x)=\ln(1+x)\geq 0$, so $(1+x)\ln(1+x)\geq x$ and hence $(1+x)^{-\delta(1+x)/x}\leq e^{-\delta}$. ∎
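The two multiplicative bounds of Corollary 4.2 can be compared with exact binomial tails. The sketch below is only an illustration: the function names and the parameters $n=200$, $p=0.125$ are assumptions chosen so that $\mu=25$ is exactly representable, and the bounds are evaluated through logarithms for numerical stability.

```python
import math


def pmf(n: int, p: float, i: int) -> float:
    """Pr[X = i] for X ~ Binomial(n, p)."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)


def upper_tail_bound(mu: float, delta: float) -> float:
    """(e^delta / (1+delta)^(1+delta))^mu, evaluated via logarithms."""
    return math.exp(mu * (delta - (1 + delta) * math.log1p(delta)))


def lower_tail_bound(mu: float, delta: float) -> float:
    """(e^-delta / (1-delta)^(1-delta))^mu, for 0 <= delta < 1."""
    return math.exp(mu * (-delta - (1 - delta) * math.log1p(-delta)))


n, p = 200, 0.125  # illustrative parameters (assumptions); mu = 25
mu = n * p
results = []
for delta in (0.25, 0.5, 0.75):
    hi = math.ceil((1 + delta) * mu)   # X >= (1+delta) mu  <=>  X >= hi
    lo = math.floor((1 - delta) * mu)  # X <= (1-delta) mu  <=>  X <= lo
    upper_exact = sum(pmf(n, p, i) for i in range(hi, n + 1))
    lower_exact = sum(pmf(n, p, i) for i in range(lo + 1))
    results.append((delta, upper_exact, upper_tail_bound(mu, delta),
                    lower_exact, lower_tail_bound(mu, delta)))
    print(results[-1])
```

Each printed tuple lists $\delta$, the exact upper tail, its bound, the exact lower tail, and its bound.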

4.3 Useful Variants

The next few corollaries give some handy variants of the bound that are often more manageable in practice. First, we give a simple bound for the multiplicative lower tail.

Corollary 4.3.

Let $X_1,\dots,X_n$ be independent random variables with $X_i\in\{0,1\}$ and $\Pr[X_i=1]=p$, for $i=1,\dots,n$. Set $X:=\sum_{i=1}^{n}X_i$ and $\mu=pn$. Then, for any $\delta\in(0,1)$, we have

\[\Pr[X\leq(1-\delta)\mu]\leq e^{-\delta^{2}\mu/2}.\]
Proof.

By Corollary 4.2,

\[\Pr[X\leq(1-\delta)\mu]\leq\left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.\]

Using the power series expansion of $\ln(1-\delta)$, we get

\[(1-\delta)\ln(1-\delta)=-(1-\delta)\sum_{i=1}^{\infty}\frac{\delta^{i}}{i}=-\delta+\sum_{i=2}^{\infty}\frac{\delta^{i}}{(i-1)i}\geq-\delta+\delta^{2}/2.\]

Thus,

\[\Pr[X\leq(1-\delta)\mu]\leq e^{[-\delta+\delta-\delta^{2}/2]\mu}=e^{-\delta^{2}\mu/2},\]

as claimed. ∎
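A typical use of Corollary 4.3 is to ask how many trials suffice to push the lower-tail bound below a target failure probability $\varepsilon$: solving $e^{-\delta^{2}pn/2}\leq\varepsilon$ gives $n\geq 2\ln(1/\varepsilon)/(\delta^{2}p)$. The following sketch illustrates this calculation; the function names and the sample values $p=0.5$, $\delta=0.1$, $\varepsilon=0.01$ are assumptions made for the example.

```python
import math


def lower_tail_bound(mu: float, delta: float) -> float:
    """The bound exp(-delta^2 * mu / 2) of Corollary 4.3, for 0 < delta < 1."""
    return math.exp(-delta**2 * mu / 2)


def trials_needed(p: float, delta: float, eps: float) -> int:
    """Smallest n for which the Corollary 4.3 bound with mu = p*n is at most eps."""
    return math.ceil(2 * math.log(1 / eps) / (delta**2 * p))


# Illustrative values (assumptions): fair coin, 10% relative deviation, 1% failure.
n = trials_needed(p=0.5, delta=0.1, eps=0.01)
print(n, lower_tail_bound(0.5 * n, 0.1))
```

Note that this only guarantees that the bound is at most $\varepsilon$; the true tail probability is typically much smaller.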

A similar, only slightly more complicated bound holds for the multiplicative upper tail.

Corollary 4.4.

Let $X_1,\dots,X_n$ be independent random variables with $X_i\in\{0,1\}$ and $\Pr[X_i=1]=p$, for $i=1,\dots,n$. Set $X:=\sum_{i=1}^{n}X_i$ and $\mu=pn$. Then, for any $\delta\geq 0$, we have

\[\Pr[X\geq(1+\delta)\mu]\leq e^{-\min\{\delta^{2},\delta\}\mu/4}.\]
Proof.

We may assume that $(1+\delta)p\leq 1$, since otherwise the probability is $0$. Then, Theorem 2.1 gives

\[\Pr[X\geq(1+\delta)pn]\leq e^{-D_{\textup{KL}}((1+\delta)p\|p)n}.\]

Define $f(\delta):=D_{\textup{KL}}((1+\delta)p\|p)$. Then,

\[f'(\delta)=p\ln(1+\delta)-p\ln(1-\delta p/(1-p))\]

and

\[f''(\delta)=\frac{p}{(1+\delta)(1-p-\delta p)}\geq\frac{p}{1+\delta}.\]

By Taylor’s theorem, we have

\[f(\delta)=f(0)+\delta f'(0)+\frac{\delta^{2}}{2}f''(\xi),\]

for some $\xi\in[0,\delta]$. Since $f(0)=f'(0)=0$, it follows that

\[f(\delta)=\frac{\delta^{2}}{2}f''(\xi)\geq\frac{\delta^{2}p}{2(1+\xi)}\geq\frac{\delta^{2}p}{2(1+\delta)}.\]

For $\delta\geq 1$, we have $\delta/(1+\delta)\geq 1/2$; for $\delta<1$, we have $1/(\delta+1)\geq 1/2$. This gives, for all $\delta\geq 0$,

\[f(\delta)\geq\min\{\delta^{2},\delta\}p/4,\]

and the claim follows. ∎
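The chain of inequalities $f(\delta)\geq\delta^{2}p/(2(1+\delta))\geq\min\{\delta^{2},\delta\}p/4$ from the proof can be checked numerically on a grid. The sketch below does this under the assumption $(1+\delta)p<1$; the grid, the tolerance, and the function names are illustrative choices, not part of the text.

```python
import math


def f(delta: float, p: float) -> float:
    """f(delta) = D_KL((1+delta)p || p), assuming (1+delta)p < 1."""
    a = (1 + delta) * p
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))


def taylor_bound(delta: float, p: float) -> float:
    """The intermediate lower bound delta^2 p / (2(1+delta))."""
    return delta**2 * p / (2 * (1 + delta))


def simple_bound(delta: float, p: float) -> float:
    """The final lower bound min{delta^2, delta} p / 4."""
    return min(delta**2, delta) * p / 4


# Grid of illustrative test points (assumptions), restricted to (1+delta)p < 1.
grid = [(d / 10, p) for p in (0.01, 0.1, 0.3) for d in range(0, 60)
        if (1 + d / 10) * p < 1]
# A small tolerance absorbs floating-point rounding at points of equality.
chain_holds = all(
    f(d, p) + 1e-12 >= taylor_bound(d, p) >= simple_bound(d, p) - 1e-12
    for d, p in grid
)
print(chain_holds)
```

At $\delta=1$ the last two expressions coincide, which is why the minimum switches from $\delta^{2}$ to $\delta$ there.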

The following corollary combines the two bounds. This variant can be found, e.g., in the book by Arora and Barak [2].

Corollary 4.5.

Let $X_1,\dots,X_n$ be independent random variables with $X_i\in\{0,1\}$ and $\Pr[X_i=1]=p$, for $i=1,\dots,n$. Set $X:=\sum_{i=1}^{n}X_i$ and $\mu=pn$. Then, for any $\delta>0$, we have

\[\Pr[|X-\mu|\geq\delta\mu]\leq 2e^{-\min\{\delta^{2},\delta\}\mu/4}.\]
Proof.

Combine Corollaries 4.3 and 4.4. ∎

The following corollary, which appears, e.g., in the book by Motwani and Raghavan [21], is also sometimes useful.

Corollary 4.6.

Let $X_1,\dots,X_n$ be independent random variables with $X_i\in\{0,1\}$ and $\Pr[X_i=1]=p$, for $i=1,\dots,n$. Set $X:=\sum_{i=1}^{n}X_i$ and $\mu=pn$. For $t\geq 2e\mu$, we have

\[\Pr[X\geq t]\leq 2^{-t}.\]
Proof.

By Corollary 4.2,

\[\Pr[X\geq(1+\delta)\mu]\leq\left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}\leq\left(\frac{e}{1+\delta}\right)^{(1+\delta)\mu}.\]

For $\delta\geq 2e-1$, the denominator $1+\delta$ on the right-hand side is at least $2e$, so the bound is at most $2^{-(1+\delta)\mu}$. Setting $t=(1+\delta)\mu$, the claim follows. ∎
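To see Corollary 4.6 in action, the following sketch compares the exact upper tail of a binomial distribution with $2^{-t}$ for all integers $t\geq 2e\mu$. The parameters $n=50$, $p=0.05$ and the helper name `upper_tail` are assumptions made for illustration.

```python
import math


def upper_tail(n: int, p: float, k: int) -> float:
    """Exact Pr[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n, p = 50, 0.05  # illustrative parameters (assumptions); mu = 2.5
mu = n * p
t_min = math.ceil(2 * math.e * mu)  # smallest integer t with t >= 2e * mu
checks = [(t, upper_tail(n, p, t), 2.0 ** (-t)) for t in range(t_min, n + 1)]
print(t_min, all(tail <= bound for _, tail, bound in checks))
```

The bound is convenient because it does not depend on $\mu$ once $t$ is large enough.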

5 Generalizations

We mention a few generalizations of the proof techniques from Section 3. Since the consequences from Section 4 are based on simple algebraic manipulation of the bounds, the same consequences also hold for the generalized settings.

5.1 Hoeffding Extension

The moment method (Section 3.1) yields many generalizations of Theorem 2.1. The following result is known as Hoeffding’s extension [14]. It shows that the $X_i$ can actually be chosen to be continuous with varying expectations.

Theorem 5.1.

Let $X_1,\dots,X_n$ be independent random variables with $X_i\in[0,1]$ and $\mathbf{E}[X_i]=p_i$. Set $X:=\sum_{i=1}^{n}X_i$ and $p:=(1/n)\sum_{i=1}^{n}p_i$. Then, for any $t\in[0,1-p]$, we have

\[\Pr[X\geq(p+t)n]\leq e^{-D_{\textup{KL}}(p+t\|p)n}.\]
Proof.

Let $\lambda>0$ be a parameter to be determined later. As before, Markov’s inequality yields

\[\Pr\bigl[e^{\lambda X}\geq e^{\lambda(p+t)n}\bigr]\leq\frac{\mathbf{E}[e^{\lambda X}]}{e^{\lambda(p+t)n}}.\]

Using independence, we get

\[\mathbf{E}[e^{\lambda X}]=\mathbf{E}\Bigl[e^{\lambda\sum_{i=1}^{n}X_i}\Bigr]=\prod_{i=1}^{n}\mathbf{E}\Bigl[e^{\lambda X_i}\Bigr]. \tag{6}\]

Now we need to estimate $\mathbf{E}\bigl[e^{\lambda X_i}\bigr]$. The function $z\mapsto e^{\lambda z}$ is convex, so $e^{\lambda z}\leq(1-z)e^{0\cdot\lambda}+ze^{1\cdot\lambda}$ for $z\in[0,1]$. Hence,

\[\mathbf{E}\bigl[e^{\lambda X_i}\bigr]\leq\mathbf{E}[1-X_i+X_ie^{\lambda}]=1-p_i+p_ie^{\lambda}.\]

Going back to (6),

\[\mathbf{E}[e^{\lambda X}]\leq\prod_{i=1}^{n}(1-p_i+p_ie^{\lambda}).\]

Using the arithmetic-geometric mean inequality $\prod_{i=1}^{n}x_i\leq\bigl((1/n)\sum_{i=1}^{n}x_i\bigr)^{n}$, for $x_i\geq 0$, this gives

\[\mathbf{E}[e^{\lambda X}]\leq(1-p+pe^{\lambda})^{n}.\]

From here we continue as in Section 3.1. ∎
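The two estimates that drive this proof, the chord bound from convexity and the arithmetic-geometric mean step, are easy to check numerically. The sketch below is a minimal illustration; the expectations in `ps`, the values of $\lambda$, and all function names are assumptions, not taken from the text.

```python
import math
from typing import Sequence


def chord(z: float, lam: float) -> float:
    """Chord of z -> exp(lam * z) between z = 0 and z = 1."""
    return (1 - z) + z * math.exp(lam)


def product_bound(ps: Sequence[float], lam: float) -> float:
    """prod_i (1 - p_i + p_i e^lam): the bound on E[e^{lam X}] before AM-GM."""
    return math.prod(1 - q + q * math.exp(lam) for q in ps)


def averaged_bound(ps: Sequence[float], lam: float) -> float:
    """(1 - p + p e^lam)^n with p the mean of the p_i: the bound after AM-GM."""
    p = sum(ps) / len(ps)
    return (1 - p + p * math.exp(lam)) ** len(ps)


ps = [0.1, 0.5, 0.9, 0.3]  # illustrative expectations p_i (assumption)
for lam in (0.1, 1.0, 3.0):
    # Convexity: exp(lam * z) lies below the chord on [0, 1].
    chord_ok = all(math.exp(lam * k / 20) <= chord(k / 20, lam) + 1e-12
                   for k in range(21))
    print(lam, chord_ok, product_bound(ps, lam), averaged_bound(ps, lam))
```

When all $p_i$ are equal, the two bounds coincide, which recovers the setting of Theorem 2.1.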

5.2 Hypergeometric Distribution

Chvátal’s proof [7] from Section 3.2 generalizes to the hypergeometric distribution. We emphasize once again that this means that all the corollaries from Section 4 also apply to this case.

Theorem 5.2.

Suppose we have an urn with $N$ balls, $P$ of which are red. We randomly draw $n$ balls from the urn without replacement. Let $H(N,P,n)$ denote the number of red balls in the sample. Set $p:=P/N$. Then, for any $t\in[0,1-p]$, we have

\[\Pr\big[H(N,P,n)\geq(p+t)n\big]\leq e^{-D_{\textup{KL}}(p+t\|p)n}.\]
Proof.

It is well known that

\[\Pr[H(N,P,n)=l]=\binom{P}{l}\binom{N-P}{n-l}\binom{N}{n}^{-1},\]

for $l=0,\dots,n$.

Claim 5.3.

For every $j\in\{0,\dots,n\}$, we have

\[\binom{N}{n}^{-1}\sum_{i=j}^{n}\binom{P}{i}\binom{N-P}{n-i}\binom{i}{j}\leq\binom{n}{j}p^{j}.\]
Proof.

Consider the following random experiment: take a random permutation of the $N$ balls in the urn. Let $S$ be the sequence of the first $n$ elements in the permutation. Let $X$ be the number of $j$-subsets of $S$ that contain only red balls. We compute $\mathbf{E}[X]$ in two different ways. On the one hand,

\[\mathbf{E}[X]=\sum_{i=j}^{n}\Pr[\text{$S$ contains $i$ red balls}]\binom{i}{j}=\sum_{i=j}^{n}\binom{N}{n}^{-1}\binom{P}{i}\binom{N-P}{n-i}\binom{i}{j}. \tag{7}\]

On the other hand, let $I\subseteq\{1,\dots,n\}$ with $|I|=j$. Then the probability that all the balls in the positions indexed by $I$ are red is

\[\frac{P}{N}\cdot\frac{P-1}{N-1}\cdot\cdots\cdot\frac{P-j+1}{N-j+1}\leq\left(\frac{P}{N}\right)^{j}=p^{j}.\]

Thus, by linearity of expectation, $\mathbf{E}[X]\leq\binom{n}{j}p^{j}$. Together with (7), the claim follows. ∎
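The double-counting argument can be verified exactly with rational arithmetic. In the sketch below, `claim_5_3_lhs` evaluates the sum in (7), `all_red_probability` evaluates the product $\frac{P}{N}\cdots\frac{P-j+1}{N-j+1}$, and the urn sizes $N=20$, $P=7$, $n=8$ are illustrative assumptions; all function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def claim_5_3_lhs(N: int, P: int, n: int, j: int) -> Fraction:
    """Left-hand side of Claim 5.3, i.e. E[X] for the number X of red j-subsets."""
    total = sum(comb(P, i) * comb(N - P, n - i) * comb(i, j) for i in range(j, n + 1))
    return Fraction(total, comb(N, n))


def claim_5_3_rhs(N: int, P: int, n: int, j: int) -> Fraction:
    """Right-hand side of Claim 5.3: binom(n, j) * p^j with p = P/N."""
    return comb(n, j) * Fraction(P, N) ** j


def all_red_probability(N: int, P: int, j: int) -> Fraction:
    """Probability that j fixed positions of a random permutation are all red."""
    prob = Fraction(1)
    for r in range(j):
        prob *= Fraction(P - r, N - r)
    return prob


N, P, n = 20, 7, 8  # illustrative urn (assumption)
rows = [(j, claim_5_3_lhs(N, P, n, j),
         comb(n, j) * all_red_probability(N, P, j),
         claim_5_3_rhs(N, P, n, j)) for j in range(n + 1)]
for row in rows:
    print(row)
```

In each row the second and third entries agree (linearity of expectation), and both are at most the fourth entry.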

Claim 5.4.

For every $\tau\geq 1$, we have

\[\binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}\tau^{i}\leq(1+(\tau-1)p)^{n}.\]
Proof.

Using Claim 5.3 and the Binomial theorem (twice),

\begin{align*}
\binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}\tau^{i} &=\binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}(1+(\tau-1))^{i}\\
&=\binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}\sum_{j=0}^{i}\binom{i}{j}(\tau-1)^{j}\\
&=\binom{N}{n}^{-1}\sum_{j=0}^{n}(\tau-1)^{j}\sum_{i=j}^{n}\binom{P}{i}\binom{N-P}{n-i}\binom{i}{j}\\
&\leq\sum_{j=0}^{n}\binom{n}{j}((\tau-1)p)^{j}=(1+(\tau-1)p)^{n},
\end{align*}

as claimed. ∎
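Claim 5.4 says that the probability generating function of the hypergeometric distribution is dominated, for $\tau\geq 1$, by that of the binomial distribution with the same $n$ and $p$. The sketch below checks this exactly for a small urn; the function names, the urn sizes, and the chosen values of $\tau$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb


def hypergeometric_pgf(N: int, P: int, n: int, tau: Fraction) -> Fraction:
    """E[tau^H] for H = H(N, P, n): the left-hand side of Claim 5.4."""
    total = sum(comb(P, i) * comb(N - P, n - i) * tau**i for i in range(n + 1))
    return total / comb(N, n)


def binomial_pgf(N: int, P: int, n: int, tau: Fraction) -> Fraction:
    """(1 + (tau - 1) p)^n with p = P/N: the right-hand side of Claim 5.4."""
    return (1 + (tau - 1) * Fraction(P, N)) ** n


N, P, n = 20, 7, 8  # illustrative urn (assumption)
taus = [Fraction(1), Fraction(3, 2), Fraction(2), Fraction(5)]
pairs = [(tau, hypergeometric_pgf(N, P, n, tau), binomial_pgf(N, P, n, tau))
         for tau in taus]
for tau, lhs, rhs in pairs:
    print(tau, float(lhs), float(rhs))
```

For $\tau=1$ both sides equal $1$, since the hypergeometric probabilities sum to one.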

Thus, for any $\tau\geq 1$ and $k\geq pn$, we get as before

\begin{align*}
\Pr[H(N,P,n)\geq k] &=\binom{N}{n}^{-1}\sum_{i=k}^{n}\binom{P}{i}\binom{N-P}{n-i}\\
&\leq\binom{N}{n}^{-1}\sum_{i=0}^{n}\binom{P}{i}\binom{N-P}{n-i}\tau^{i-k}\leq\frac{(p\tau+1-p)^{n}}{\tau^{k}},
\end{align*}

by Claim 5.4. From here the proof proceeds as in Section 3.2. ∎
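As with the binomial case, Theorem 5.2 can be checked against exact hypergeometric tails. The following sketch does so for every integer threshold $k\geq pn$, writing $k=(p+t)n$; the urn sizes $N=50$, $P=20$, $n=10$ and the helper names are assumptions for illustration.

```python
import math


def kl(a: float, b: float) -> float:
    """Binary relative entropy D_KL(a || b) in nats, for 0 <= a <= 1 and 0 < b < 1."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / b)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - b))
    return total


def hypergeometric_upper_tail(N: int, P: int, n: int, k: int) -> float:
    """Exact Pr[H(N, P, n) >= k] for sampling without replacement."""
    total = sum(math.comb(P, i) * math.comb(N - P, n - i) for i in range(k, n + 1))
    return total / math.comb(N, n)


N, P, n = 50, 20, 10  # illustrative urn (assumption); p = 0.4
p = P / N
k_min = (P * n + N - 1) // N  # ceil(p * n), computed in integer arithmetic
table = [(k, hypergeometric_upper_tail(N, P, n, k), math.exp(-n * kl(k / n, p)))
         for k in range(k_min, n + 1)]
for k, tail, bound in table:
    print(k, tail, bound)
```

At $k=pn$ the bound is trivially $1$; it becomes informative as $k$ moves away from the mean.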

5.3 Negative Correlations

The proof by Impagliazzo and Kabanets [15] from Section 3.3 can be used to relax the independence assumption. It now suffices that the random variables are negatively correlated.

Theorem 5.5.

Let $X_1,\dots,X_n$ be random variables with $X_i\in\{0,1\}$. Suppose there exist $p_i\in[0,1]$, $i=1,\dots,n$, such that for every index set $I\subseteq\{1,\dots,n\}$, we have $\mathbf{E}\big[\prod_{i\in I}X_i\big]\leq\prod_{i\in I}p_i$. Set $X:=\sum_{i=1}^{n}X_i$ and $p:=(1/n)\sum_{i=1}^{n}p_i$. Then, for any $t\in[0,1-p]$, we have

\[\Pr[X\geq(p+t)n]\leq e^{-D_{\textup{KL}}(p+t\|p)n}.\]
Proof.

Let $\lambda\in[0,1]$ be a parameter to be chosen later. Let $I\subseteq\{1,\dots,n\}$ be a random index set obtained by including each element $i\in\{1,\dots,n\}$ independently with probability $\lambda$. As before, we estimate the expectation $\mathbf{E}\bigl[\prod_{i\in I}X_i\bigr]$ in two different ways, where the expectation is over the random choice of $X_1,\dots,X_n$ and $I$. Similarly to before,

\begin{align*}
\mathbf{E}\Bigl[\prod_{i\in I}X_i\Bigr] &=\sum_{S\subseteq\{1,\dots,n\}}\Pr[I=S]\cdot\mathbf{E}\Bigl[\prod_{i\in S}X_i\Bigr]\leq\sum_{S\subseteq\{1,\dots,n\}}\lambda^{|S|}(1-\lambda)^{n-|S|}\cdot\Big(\prod_{i\in S}p_i\Big)\\
&=\sum_{S\subseteq\{1,\dots,n\}}\Big(\prod_{i\in S}\lambda p_i\Big)\Big(\prod_{i\in\{1,\dots,n\}\setminus S}(1-\lambda)\Big)=\prod_{i=1}^{n}(1-\lambda+p_i\lambda)\leq(1-\lambda+p\lambda)^{n}, \tag{8}
\end{align*}

by the arithmetic-geometric mean inequality. The proof of the lower bound remains unchanged and yields

\[\mathbf{E}\Bigl[\prod_{i\in I}X_i\Bigr]\geq(1-\lambda)^{(1-p-t)n}\Pr[X\geq(p+t)n],\]

as before. Combining with (8) and optimizing for $\lambda$ finishes the proof; see Section 3.3. ∎
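The algebraic heart of (8) is that the sum over all subsets $S$ factors into a product over the indices, after which the arithmetic-geometric mean inequality applies. The brute-force sketch below confirms both steps exactly for a small instance; the values of the $p_i$ and of $\lambda$, as well as the function names, are assumptions chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def subset_sum(ps, lam):
    """Sum over all S of lam^|S| (1-lam)^(n-|S|) prod_{i in S} p_i, by enumeration."""
    n = len(ps)
    total = Fraction(0)
    for size in range(n + 1):
        for S in combinations(range(n), size):
            total += lam**size * (1 - lam) ** (n - size) * prod(ps[i] for i in S)
    return total


def product_form(ps, lam):
    """prod_i (1 - lam + p_i lam): the factored form of the subset sum."""
    return prod(1 - lam + q * lam for q in ps)


def averaged_form(ps, lam):
    """(1 - lam + p lam)^n with p the mean of the p_i (after AM-GM)."""
    p = sum(ps) / len(ps)
    return (1 - lam + p * lam) ** len(ps)


# Illustrative values (assumptions), kept rational so the identity is exact.
ps = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10), Fraction(3, 10)]
lam = Fraction(1, 3)
print(subset_sum(ps, lam), product_form(ps, lam), averaged_form(ps, lam))
```

The first two printed values are equal, and the third is at least as large.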

Acknowledgments.

This survey is based on lecture notes for a class on advanced algorithms at Freie Universität Berlin. I would like to thank all the students who took this class for their interest and participation. I would also like to thank Nabil Mustafa and Jonathan Ullman for valuable comments that improved this survey.

References

  • [1] N. Alon and J. Spencer. The Probabilistic Method. Wiley-Interscience, 2016.
  • [2] S. Arora and B. Barak. Computational Complexity – A Modern Approach. Cambridge University Press, 2009.
  • [3] K. Azuma. Weighted sums of certain dependent random variables. Tôhoku Math. J. (2), 19:357–367, 1967.
  • [4] S. N. Bernstein. Sobranie Sochinenii [Collected Works]. Nauka, Moscow, 1964.
  • [5] X. Chen. A likelihood ratio approach for probabilistic inequalities. arXiv:1308.4123, 2013.
  • [6] F. R. K. Chung and L. Lu. Concentration inequalities and martingale inequalities: A survey. Internet Mathematics, 3(1):79–127, 2006.
  • [7] V. Chvátal. The tail of the hypergeometric distribution. Discrete Mathematics, 25(3):285–287, 1979.
  • [8] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 3rd edition, 2009.
  • [9] T. M. Cover and J. A. Thomas. Elements of information theory. Wiley-Interscience, 2nd edition, 2006.
  • [10] D. P. Dubhashi and A. Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, 2009.
  • [11] C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014.
  • [12] O. Goldreich. Computational complexity – a conceptual perspective. Cambridge University Press, 2008.
  • [13] T. Hagerup and C. Rüb. A guided tour of Chernoff bounds. Inform. Process. Lett., 33(6):305–308, 1990.
  • [14] W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc., 58:13–30, 1963.
  • [15] R. Impagliazzo and V. Kabanets. Constructive proofs of concentration bounds. In Proc. 13th Int. Conf. Approx. (APPROX) and 14th Int. Conf. Rand. Comb. Opt. (RANDOM), pages 617–631, 2010.
  • [16] J. M. Kleinberg and É. Tardos. Algorithm design. Addison-Wesley, 2006.
  • [17] C. McDiarmid. Concentration. In Probabilistic methods for algorithmic discrete mathematics, volume 16 of Algorithms Combin., pages 195–248. Springer-Verlag, 1998.
  • [18] F. McSherry and K. Talwar. Mechanism design via differential privacy. In Proc. 48th Annu. IEEE Symp. Found. Comput. Sci. (FOCS), pages 94–103, 2007.
  • [19] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, 2nd edition, 2017.
  • [20] P. Morin, W. Mulzer, and T. Reddad. Encoding arguments. ACM Comput. Surv., 50(3):46:1–46:36, 2017.
  • [21] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
  • [22] T. Steinke and J. Ullman. Subgaussian tail bounds via stability arguments. arXiv:1701.03493, 2017.