Statistika pro informatiku
prof. RNDr. Roman Kotecký, DrSc., Dr. Rudolf Blažek, PhD
Department of Theoretical Computer Science, FIT
Czech Technical University in Prague (ČVUT)

MI-SPI, ZS 2011/12, Lecture 4

European Social Fund. Prague & EU: We invest in your future.


Recapitulation

Random variable: a function $X : \Omega \to \mathbb{R}$.

Probability function: $p_X(x) = P(X = x)$.

Cumulative distribution function: $F_X(x) = P(X \le x) = \sum_{x_k : x_k \le x} p_X(x_k)$.

Expectation: $E(X) = \sum_{x : p_X(x) > 0} x\, p_X(x)$.

Linearity: for $a, b \in \mathbb{R}$, $E(aX + bY) = aE(X) + bE(Y)$.

Expectation of an indicator: $E(I_A) = P(A)$.

$k$-th moment: $m_k = E(X^k)$.

$k$-th central moment: $\sigma_k = E((X - m_1)^k)$.

Variance: $\mathrm{var}(X) = \sigma_2 = E((X - E(X))^2) = E(X^2) - (E(X))^2$.

Bernoulli random variable: $p_X(0) = 1 - p$ and $p_X(1) = p$.

Binomial random variable: $p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \dots, n$.

Geometric random variable: $p_X(k) = (1-p)^{k-1} p$, $k = 1, 2, \dots$

Poisson random variable: $p_X(k) = \frac{\lambda^k}{k!}\, e^{-\lambda}$, $k = 0, 1, 2, \dots$

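These definitions translate directly into code. A minimal sketch (plain Python; the helper names `expectation` and `variance` are mine, not from the lecture) computes $E(X)$ and $\mathrm{var}(X)$ from a probability function given as a dictionary:

```python
def expectation(pmf):
    """E(X) = sum over x of x * p_X(x), for a pmf given as {x: p_X(x)}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """var(X) = E(X^2) - (E(X))^2."""
    ex = expectation(pmf)
    ex2 = sum(x**2 * p for x, p in pmf.items())
    return ex2 - ex**2

# Bernoulli(p) as a pmf: {0: 1-p, 1: p}; expect E = p, var = p(1-p)
p = 0.3
bern = {0: 1 - p, 1: p}
print(expectation(bern), variance(bern))  # 0.3 0.21
```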

Expectations and variances of standard random variables

Constant random variable

Example: $X(\omega) = c$ for each $\omega \in \Omega$; i.e. $p_X(x) = 1$ for $x = c$, otherwise $0$.

Expectation and variance:
$$E(X) = \sum_{x_k} x_k\, p_X(x_k) = c\, P(X = c) = c,$$
$$\mathrm{var}(X) = E(X - E(X))^2 = E(c - c)^2 = 0.$$

For computations we use:

$E(c) = c$ ... the barycenter of a constant $c$ is the constant $c$ itself,

$\mathrm{var}(c) = 0$ ... "the width of a graph with a single value $c$" is $0$.


Bernoulli random variable

$X(\text{head}) = 1$ and $X(\text{tail}) = 0$. The occurrence of "head" plays the role of "success".

Example: Bernoulli random variable:
$$p_X(1) = p \in (0,1) \quad \text{(head, success)}, \qquad p_X(0) = 1 - p = q \quad \text{(tail, failure)}.$$

Expectation and variance:
$$E(X) = \sum_{x_k} x_k\, p_X(x_k) = 1 \cdot p + 0 \cdot q = p,$$
$$E(X^2) = \sum_{x_k} x_k^2\, p_X(x_k) = 1^2 \cdot p + 0^2 \cdot q = p,$$
$$\mathrm{var}(X) = E(X^2) - E(X)^2 = p - p^2 = p(1-p) = pq.$$

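A quick empirical cross-check of these two formulas (a sketch using numpy; the value of $p$ and the sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
x = rng.random(10**6) < p  # one million Bernoulli(p) draws as booleans
print(x.mean())            # ~ p = 0.3
print(x.var())             # ~ p(1-p) = 0.21
```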

Binomial random variable

Number of successes during $n$ identical and independent repetitions of a Bernoulli experiment (with $P(\text{success}) = p$).

Example: Binomial random variable $X \sim \mathrm{Bin}(n, p)$:
$$p_X(k) = P(X = k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, \dots, n,$$
$$E(X) = \sum_{x_k} x_k\, p_X(x_k) = \sum_{k=0}^{n} k \binom{n}{k} p^k q^{n-k}.$$

The sum on the right-hand side of $E(X)$ resembles the binomial theorem $\sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} = (x + y)^n$, up to the factor "$k$" in "$k\, p^k$".


Example (continuation)

Differentiating this sum with respect to $x$ and multiplying by $x$ we get the needed expression:
$$x \frac{d}{dx} \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} = x \sum_{k=0}^{n} \binom{n}{k} k\, x^{k-1} y^{n-k} = \sum_{k=0}^{n} \binom{n}{k} k\, x^k y^{n-k}.$$

Carrying out the derivative with respect to $x$ and then multiplying by $x$, we get
$$\frac{d}{dx} \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} = \frac{d}{dx} (x + y)^n,$$
$$\sum_{k=0}^{n} \binom{n}{k} k\, x^{k-1} y^{n-k} = n (x + y)^{n-1},$$
$$\sum_{k=0}^{n} \binom{n}{k} k\, x^k y^{n-k} = x\, n (x + y)^{n-1}.$$
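The resulting identity $\sum_k \binom{n}{k} k\, x^k y^{n-k} = x\, n (x+y)^{n-1}$ can be verified symbolically, e.g. with sympy (my choice of tool, not the lecture's; a sketch for one concrete $n$):

```python
import sympy as sp

x, y = sp.symbols('x y')
n = 6  # a small concrete n for the check

lhs = sum(sp.binomial(n, k) * k * x**k * y**(n - k) for k in range(n + 1))
rhs = x * n * (x + y)**(n - 1)
print(sp.expand(lhs - rhs))  # 0, confirming the identity for n = 6
```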

Example (continuation)

Substituting $x = p$ and $y = q$ we get (recall, $q = 1 - p \Longrightarrow p + q = 1$)
$$E(X) = \sum_{k=0}^{n} \binom{n}{k} k\, p^k q^{n-k} = p\, n (p + q)^{n-1} = np.$$

Similarly, the second derivative of the generating function $(x + y)^n$ yields
$$\frac{d^2}{dx^2} \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} = \frac{d^2}{dx^2} (x + y)^n,$$
$$\sum_{k=0}^{n} \binom{n}{k} k(k-1)\, x^{k-2} y^{n-k} = n(n-1)(x + y)^{n-2},$$
$$\sum_{k=0}^{n} \binom{n}{k} k(k-1)\, x^k y^{n-k} = x^2\, n(n-1)(x + y)^{n-2}.$$

Example (continuation)

Thus (again $x = p$ and $y = q$ with $p + q = 1$)
$$E(X(X-1)) = \sum_{k=0}^{n} \binom{n}{k} k(k-1)\, p^k q^{n-k} = p^2 n(n-1)(p+q)^{n-2} = p^2 n(n-1).$$

Hence (recall that $E(X) = np$)
$$E(X(X-1)) = p^2 n(n-1),$$
$$E(X^2) - E(X) = n^2 p^2 - n p^2,$$
$$E(X^2) = n^2 p^2 - n p^2 + E(X) = n^2 p^2 - n p^2 + np = (np)^2 + np(1-p).$$

Finally,
$$\mathrm{var}(X) = E(X^2) - E(X)^2 = np(1-p) = npq.$$

For $X \sim \mathrm{Bin}(n, p)$ thus
$$E(X) = np \quad \text{and} \quad \mathrm{var}(X) = np(1-p) = npq.$$
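A direct numerical check of the closed forms $E(X) = np$ and $\mathrm{var}(X) = npq$ against the pmf sums (a plain Python sketch; `math.comb` requires Python 3.8+):

```python
from math import comb

n, p = 10, 0.3
q = 1 - p
pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

ex  = sum(k * pmf[k] for k in range(n + 1))
ex2 = sum(k**2 * pmf[k] for k in range(n + 1))
print(ex, n * p)               # 3.0  3.0
print(ex2 - ex**2, n * p * q)  # 2.1  2.1
```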

Geometric random variable

$X$ = the order of the experiment in which the first "success" occurs during identical and independent repetitions of Bernoulli experiments (with $P(\text{success}) = p \in (0,1)$).

Example: Geometric random variable:
$$p_X(k) = (1-p)^{k-1} p = q^{k-1} p, \quad k = 1, 2, \dots,$$
$$E(X) = \sum_{x_k} x_k\, p_X(x_k) = \sum_{k=1}^{\infty} k\, q^{k-1} p = p \sum_{k=1}^{\infty} k\, q^{k-1}.$$

Similarly as in the case of the binomial distribution, the sum on the right-hand side of $E(X)$ resembles the derivative of $\sum_{k=0}^{\infty} x^k = \frac{1}{1-x}$ for $|x| < 1$:
$$\frac{d}{dx} \sum_{k=0}^{\infty} x^k = \frac{d}{dx} \frac{1}{1-x} \quad \text{or} \quad \sum_{k=1}^{\infty} k\, x^{k-1} = \frac{1}{(1-x)^2}.$$

Example (continuation)

Using thus $\sum_{k=1}^{\infty} k\, x^{k-1} = \frac{1}{(1-x)^2}$ (with $x = q$ and $1 - q = p$):
$$E(X) = \sum_{k=1}^{\infty} k\, q^{k-1} p = p \sum_{k=1}^{\infty} k\, q^{k-1} = \frac{p}{(1-q)^2} = \frac{1}{p}.$$

Multiplying $\sum_{k=1}^{\infty} k\, x^{k-1} = \frac{1}{(1-x)^2}$ by $x$ we get $\sum_{k=0}^{\infty} k\, x^k = \frac{x}{(1-x)^2}$. The derivative with respect to $x$ then yields
$$\frac{d}{dx} \sum_{k=0}^{\infty} k\, x^k = \frac{d}{dx}\, \frac{x}{(1-x)^2}, \quad \text{hence} \quad \sum_{k=0}^{\infty} k^2\, x^{k-1} = \frac{1+x}{(1-x)^3},$$

implying
$$E(X^2) = \sum_{k=1}^{\infty} k^2\, q^{k-1} p = p \sum_{k=1}^{\infty} k^2\, q^{k-1} = p \cdot \frac{1+q}{(1-q)^3} = \frac{2-p}{p^2}.$$

Example (continuation)

Finally thus
$$\mathrm{var}(X) = E(X^2) - E(X)^2 = \frac{2-p}{p^2} - \frac{1}{p^2} = \frac{1-p}{p^2} = \frac{q}{p^2}.$$

Summarizing, for a geometric random variable with $P(\text{success}) = p \in (0,1)$ we have
$$E(X) = \frac{1}{p}, \qquad \mathrm{var}(X) = \frac{1-p}{p^2} = \frac{q}{p^2}.$$
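These results can be sanity-checked by truncating the infinite sums (a sketch; the cutoff $K = 1000$ is an arbitrary choice of mine that makes the neglected tail negligible for this $p$):

```python
p = 0.3
q = 1 - p
K = 1000  # truncation point for the infinite sums

ex  = sum(k * q**(k - 1) * p for k in range(1, K + 1))
ex2 = sum(k**2 * q**(k - 1) * p for k in range(1, K + 1))
print(ex, 1 / p)              # ~3.3333, matching E(X) = 1/p
print(ex2 - ex**2, q / p**2)  # ~7.7778, matching var(X) = q/p^2
```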

Poisson random variable

The Poisson probability distribution is often used to model the number of random events during a given time period. For example, $X$ = "number of attempts for a connection with a server within 15 seconds".

Example: Poisson random variable with parameter $\lambda > 0$:
$$p_X(k) = P(X = k) = \frac{\lambda^k}{k!}\, e^{-\lambda}, \quad k = 0, 1, 2, \dots,$$
$$E(X) = \sum_k k\, \frac{\lambda^k}{k!}\, e^{-\lambda} = \lambda e^{-\lambda} \sum_{k \ge 1} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} \sum_{m \ge 0} \frac{\lambda^m}{m!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$

Example (continuation)
$$E(X^2) = \sum_{k \ge 1} k^2\, \frac{\lambda^k}{k!}\, e^{-\lambda} = \lambda e^{-\lambda} \sum_{k \ge 1} k\, \frac{\lambda^{k-1}}{(k-1)!}$$
$$= \lambda e^{-\lambda} \left( \sum_{k \ge 1} (k-1) \frac{\lambda^{k-1}}{(k-1)!} + \sum_{k \ge 1} \frac{\lambda^{k-1}}{(k-1)!} \right)$$
$$= \lambda e^{-\lambda} \left( \lambda \sum_{k \ge 2} \frac{\lambda^{k-2}}{(k-2)!} + e^{\lambda} \right) = \lambda e^{-\lambda} \left( \lambda e^{\lambda} + e^{\lambda} \right) = \lambda^2 + \lambda,$$

and thus $\mathrm{var}(X) = E(X^2) - (E(X))^2 = (\lambda^2 + \lambda) - \lambda^2 = \lambda$.

For $X \sim \mathrm{Poisson}(\lambda)$ thus
$$E(X) = \mathrm{var}(X) = \lambda.$$
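The identity $E(X) = \mathrm{var}(X) = \lambda$ can be confirmed with scipy.stats (a sketch; `poisson.stats` returns the requested moments, here mean and variance):

```python
from scipy.stats import poisson

lam = 4.2
mean, var = poisson.stats(lam, moments='mv')
print(mean, var)  # 4.2 4.2
```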

Independence of random variables

Independence revisited

Definition: $X$ and $Y$ are independent if $\{X = x\}$ and $\{Y = y\}$ are independent for each $x$ and $y$.

Lemma: If $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$.

Proof. Writing $A_x = \{X = x\}$ and $B_y = \{Y = y\}$, we have $XY = \sum_{x,y} x y\, I_{A_x} I_{B_y}$, and independence implies
$$E(XY) = \sum_{x,y} x y\, P(A_x \cap B_y) = \sum_{x,y} x y\, P(A_x) P(B_y) = \Big( \sum_x x\, P(A_x) \Big) \Big( \sum_y y\, P(B_y) \Big) = E(X) E(Y).$$
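For finite distributions the lemma is easy to check directly (a sketch; the product joint pmf below is an illustrative choice of mine, not from the lecture):

```python
# Independent X ~ Bernoulli(0.3) and Y in {1, 2} with P(Y=1) = 0.6:
# the joint pmf is the product of the marginals.
px = {0: 0.7, 1: 0.3}
py = {1: 0.6, 2: 0.4}
joint = {(x, y): px[x] * py[y] for x in px for y in py}

e_xy = sum(x * y * p for (x, y), p in joint.items())
e_x  = sum(x * p for x, p in px.items())
e_y  = sum(y * p for y, p in py.items())
print(e_xy, e_x * e_y)  # 0.42 0.42, i.e. E(XY) = E(X)E(Y)
```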

Noncorrelated random variables

Definition: Random variables $X$ and $Y$ are noncorrelated if $E(XY) = E(X)E(Y)$.

Example: Independence implies noncorrelation, but not the other way around. Assume that $X, Y \in \{-1, 0, 1\}$ are such that $P(X = i, Y = j) = 1/4$ if $(i,j) \in \{(0,1), (0,-1), (1,0), (-1,0)\}$ (and it equals $0$ in the remaining cases). Then $X$ and $Y$ have the same probability distribution, $P(X = 0) = 1/2$ and $P(X = 1) = P(X = -1) = 1/4$, and thus
$$E(XY) = 0 = E(X)E(Y) \quad \text{since } E(X) = E(Y) = 0,$$
while
$$P(X = 0, Y = 1) = 1/4 \ne P(X = 0)\, P(Y = 1) = 1/8.$$
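The counterexample can be verified mechanically (a plain Python sketch; the joint pmf is exactly the one from the example):

```python
joint = {(0, 1): 0.25, (0, -1): 0.25, (1, 0): 0.25, (-1, 0): 0.25}

e_xy = sum(x * y * p for (x, y), p in joint.items())    # 0.0
e_x  = sum(x * p for (x, _), p in joint.items())        # 0.0
e_y  = sum(y * p for (_, y), p in joint.items())        # 0.0
print(e_xy == e_x * e_y)                                # True: noncorrelated

p_x0 = sum(p for (x, _), p in joint.items() if x == 0)  # P(X=0)   = 0.5
p_y1 = sum(p for (_, y), p in joint.items() if y == 1)  # P(Y=1)   = 0.25
print(joint[(0, 1)], p_x0 * p_y1)  # 0.25 vs 0.125: not independent
```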

Properties of the variance

Theorem: For any random variables $X$ and $Y$ and arbitrary $a \in \mathbb{R}$, we have:

a) $\mathrm{var}(aX) = a^2\, \mathrm{var}(X)$.

b) $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$ if $X$ and $Y$ are noncorrelated.

Proof.
a) $\mathrm{var}(aX) = E((aX)^2) - (E(aX))^2 = a^2 \big( E(X^2) - (E(X))^2 \big)$.

b) It suffices to use
$$E((X + Y)^2) = E(X^2 + 2XY + Y^2) = E(X^2) + 2E(XY) + E(Y^2) = E(X^2) + 2E(X)E(Y) + E(Y^2)$$
and
$$\big( E(X + Y) \big)^2 = (E(X) + E(Y))^2 = E(X)^2 + 2E(X)E(Y) + E(Y)^2.$$
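A quick simulation check of both properties (a numpy sketch; the independent samples below are in particular noncorrelated, so b) applies):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(4.2, 10**6).astype(float)       # var(X) ~ 4.2
y = rng.binomial(10, 0.3, 10**6).astype(float)  # independent of x, var(Y) ~ 2.1

a = 3.0
print((a * x).var(), a**2 * x.var())  # both ~ 37.8: var(aX) = a^2 var(X)
print((x + y).var(), x.var() + y.var())  # both ~ 6.3: variances add
```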

Uncorrelated random variables in computations

Example (Computation of the expectation of a binomial random variable, revisited)

Consider $X = \sum_{i=1}^{n} X_i$ with independent Bernoulli variables $X_i$ with parameter $p$. Then $X$ has a binomial distribution,
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.$$

Thus
$$E(X) = \sum_i E(X_i) = np,$$
and, since $\big( \sum_i X_i \big)^2 = \sum_{i,j} X_i X_j$,
$$E(X^2) = \sum_i E(X_i^2) + \sum_{i \ne j} E(X_i) E(X_j) = np + n(n-1) p^2,$$
yielding
$$\mathrm{var}(X) = E(X^2) - (E(X))^2 = np + n(n-1)p^2 - (np)^2 = np(1-p).$$
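The sum-of-indicators view also suggests a simulation (a numpy sketch; the parameters and sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 0.3
xi = rng.random((10**5, n)) < p  # rows of n independent Bernoulli(p) draws
x = xi.sum(axis=1)               # each row sums to one Bin(n, p) sample
print(x.mean(), n * p)           # ~3.0
print(x.var(), n * p * (1 - p))  # ~2.1
```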

Example (Reliability)

Consider a "network" with nodes 1, 2, 3, 4 and edges (1,2), (1,3), (2,3), (3,4):

    1 ------- 3 ------- 4
     \       /
      \     /
        2

Assume that each connection $e = (i,j)$, $i, j \in \{1, 2, 3, 4\}$, is working with probability $p_e$ (and not working with probability $1 - p_e$).

What is the reliability
$$R(1,4) = \text{the probability that a path from the node 1 to the node 4 is open?}$$

Let $X_e$ be the indicator function that edge $e$ is working and $\chi$ the indicator function yielding $1$ if an open path $\pi$ from 1 to 4 exists and $0$ otherwise. In general, with the product over all paths $\pi$ from 1 to 4,
$$\chi = 1 - \prod_{\pi} I_{\pi\ \text{not working}} = 1 - \prod_{\pi} \Big( 1 - \prod_{e \in \pi} X_e \Big).$$

Example (continuation)

In our case,
$$\chi = 1 - (1 - X_{1,3} X_{3,4})(1 - X_{1,2} X_{2,3} X_{3,4}) = X_{3,4}(X_{1,3} + X_{1,2} X_{2,3}) - X_{1,3} X_{1,2} X_{2,3} X_{3,4}^2,$$
and thus (using $X_{3,4}^2 = X_{3,4}$ for an indicator, and independence of the edges)
$$R(1,4) = P(\chi = 1) = E(\chi) = p_{3,4}(p_{1,3} + p_{1,2}\, p_{2,3}) - p_{1,3}\, p_{1,2}\, p_{2,3}\, p_{3,4}.$$

In particular, if the reliability of each edge connection is 90% (i.e., $p_e = 0.9$), we have
$$R(1,4) = 0.9^2 + 0.9^3 - 0.9^4 \approx 0.88.$$
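Both the closed form and a Monte Carlo estimate are easy to compute (a numpy sketch; the edge set and $p_e = 0.9$ are taken from the example):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 0.9
N = 10**6

# Independent 0/1 indicators for the edges (1,3), (3,4), (1,2), (2,3)
x13, x34, x12, x23 = (rng.random((4, N)) < p).astype(int)

# chi = 1 iff some path 1 -> 4 is open: 1-3-4 or 1-2-3-4
chi = 1 - (1 - x13 * x34) * (1 - x12 * x23 * x34)
print(chi.mean())          # Monte Carlo estimate, ~0.8829
print(p**2 + p**3 - p**4)  # exact value: 0.8829
```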

Probability on uncountable spaces

Examples:

Infinite sequences of zeros and ones:
$$\Omega = \{0,1\}^{\mathbb{N}} = \{\omega : \omega = (\omega_1, \omega_2, \dots),\ \omega_1, \omega_2, \dots \in \{0,1\}\}.$$
Needed to analyse, e.g., the following claim: a fair coin is tossed repeatedly; show that, with probability one, a head turns up eventually.

Darts: a continuous set of outcomes $T \subset \mathbb{R}^2$. Here $\Omega = T \cup \{*\}$, where $\{*\}$ is a one-point set representing the result "dart missed the target".

Defining probability on an uncountable $\Omega$, one should be careful: one should consider only a subset $\mathcal{F} \subset \mathcal{P}(\Omega)$.

Definition: Probability is a function $P : \mathcal{F} \to [0,1]$ such that

(N) $P(\Omega) = 1$,

(A) if $A_1, A_2, \dots \in \mathcal{F}$ are pairwise disjoint ($A_i \cap A_j = \emptyset$ for $i \ne j$), then
$$P\big( \cup_{\ell \ge 1} A_\ell \big) = \sum_{\ell \ge 1} P(A_\ell).$$

(N) ... normalisation, (A) ... $\sigma$-additivity.

Remark: Here, we tacitly assume that $A_1, A_2, \dots \in \mathcal{F}$ implies $\cup_{\ell \ge 1} A_\ell \in \mathcal{F}$.

Example: For a countable $\Omega$, by choosing $p : \Omega \to [0,1]$ such that $\sum_{\omega \in \Omega} p(\omega) = 1$, the probability $P$ is given by taking $P(A) = \sum_{\omega \in A} p(\omega)$ for any $A \in \mathcal{P}(\Omega)$.
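For a countable $\Omega$ this recipe is directly implementable (a sketch; the choice $\Omega = \{1, 2, \dots\}$ with weights $p(\omega) = 2^{-\omega}$ is mine, for illustration):

```python
def p(omega):
    """Weights p(omega) = 2**(-omega) on Omega = {1, 2, ...}; they sum to 1."""
    return 2.0**(-omega)

def P(A):
    """P(A) = sum of p(omega) over omega in A, for a finite event A."""
    return sum(p(w) for w in A)

print(P({1, 2, 3}))          # 0.875
print(P(set(range(1, 40))))  # ~1.0, approaching P(Omega) = 1
```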

The power set $\mathcal{P}(\Omega)$ might be too large

Why not $\mathcal{P}(\Omega)$ instead of $\mathcal{F}$ in the definition of a probability distribution?

Theorem (Vitali, 1905): Let $\Omega = \{0,1\}^{\mathbb{N}}$. Then there is no function $P : \mathcal{P}(\Omega) \to [0,1]$ satisfying the conditions (N), (A), and

(I) for all $A \subset \Omega$ and $n \ge 1$ it holds that $P(T_n A) = P(A)$.

Here,
$$T_n : \omega = (\omega_1, \omega_2, \dots) \mapsto (\omega_1, \dots, \omega_{n-1}, \widehat{\omega_n}, \omega_{n+1}, \dots),$$
where $\widehat{0} = 1$, $\widehat{1} = 0$, and $T_n(A) = \{T_n(\omega) : \omega \in A\}$.

Main idea of the proof: Define an equivalence relation on $\Omega$: $\omega \sim \omega'$ iff they differ only in finitely many coordinates.

Proof (continuation).

Take $A$ containing one $\omega$ from each equivalence class (axiom of choice). Consider $\mathcal{S} = \{S \subset \mathbb{N} : S \text{ finite}\}$ and $T_S$ defined by $T_S = T_{n_1} \circ \cdots \circ T_{n_k}$ for $S = \{n_1, \dots, n_k\}$. Then:

$\Omega = \cup_{S \in \mathcal{S}} T_S(A)$;

$T_S(A)$ and $T_{S'}(A)$ are disjoint for $S \ne S'$.

This implies (note that $\mathcal{S}$ is countable, so (A) applies)
$$1 = P(\Omega) = \sum_{S \in \mathcal{S}} P(T_S(A)) = \sum_{S \in \mathcal{S}} P(A),$$
which is a contradiction (an infinite sum of the same number is either $0$ or $\infty$).