My notes on simple Causal Probability

Basic Rules of Probability

\(P(A), P(B), …\) represent the probabilities of occurrence of events \(A, B, …\)

The probability of any event \(X\) can be calculated as the ratio of the number of chances favorable to event \(X\) to the total number of chances:

i.e.; \(P(X) = \frac{m_X}{n}\)

where \(n\) is the total number of chances & \(m_X\) is the number of chances favorable to event \(X\)

The probability of any random event lies between \(0\) & \(1\)

i.e.;  \(0 \leq P(X) \leq 1\)

Event \(X\) is called practically sure if its probability is not exactly \(1\) but very close to it

i.e.; \(P(X) \approx 1\)

Event \(X\) is called practically impossible if its probability is not exactly \(0\) but very close to it

i.e.; \(P(X) \approx 0\)

Probability summation rule

The Probability that one of two (or more) mutually exclusive events occurs is equal to the sum of the Probabilities of these events

i.e.; \(P(X \text{ or } Y) = P(X) + P(Y)\)

If events \(X_1, X_2, …, X_n\) are mutually exclusive and exhaustive, the sum of their Probabilities is equal to \(1\)

i.e.; \(P(X_1) + P(X_2) + … + P(X_n) = 1\)

From the above statement it follows that the sum of the Probabilities of an event \(X\) and its opposite (non-occurrence), i.e.; \(!X\), is equal to \(1\)

i.e.; \(P(X) + P(!X) =1\)

Probability multiplication rule

The Probability of the combination of two events (their simultaneous occurrence) is equal to the Probability of one of them multiplied by the conditional probability of the other, given that the first event has occurred.

i.e.; \(P(A and B) = P(A, B) = P(A) P(B | A)\)

where \(P(B | A)\) is called the conditional probability of event \(B\), calculated under the condition that event \(A\) has occurred.
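
As a quick sanity check on these rules, here is a minimal Python sketch; the die and urn numbers are my own illustrations:

```python
# A minimal sketch of the basic rules, using a fair six-sided die
# and an urn; all numbers here are illustrative.

# P(X) = m_X / n
n = 6                # total number of chances (die faces)
p_even = 3 / n       # faces 2, 4, 6
p_one = 1 / n        # face 1

# Summation rule: "rolling a 1" and "rolling an even face" are
# mutually exclusive, so their probabilities add.
p_one_or_even = p_one + p_even          # 4/6

# Complement rule: P(X) + P(!X) = 1.
p_not_even = 1 - p_even                 # 0.5

# Multiplication rule: draw two balls without replacement from an urn
# with 2 red and 3 blue balls; P(red1 and red2) = P(red1) P(red2 | red1).
p_red1 = 2 / 5
p_red2_given_red1 = 1 / 4               # one red left among four balls
p_both_red = p_red1 * p_red2_given_red1 # 0.1

print(p_one_or_even, p_not_even, p_both_red)
```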

Event sets used in examples section

Event Set 1

\(C\): Event of the occurrence of a specific type of Cancer

\(P(C)\): Probability of occurrence of event C.

\(P(!C) = 1 - P(C)\)

\((T = +)\): Event that a test is positive for C. (Sometimes we will represent it with just a \(+\))

\(P(T = -) = 1 - P(T = +)\) OR \(P(-) = 1 - P(+)\)

Event Set 2

\(S\): Event that the weather is Sunny.

\(R\): Event that the person gets a raise.

\(H\): Event that the person is happy.

 

Bayes’ Theorem


 

\(P(C | +) = \frac{P(C) P(+ | C)}{P(+)} = \frac{P(C) P(+ | C)}{P(C) P(+ |C) + P(!C) P(+ | !C)}\)

where:

\(P(C | +)\): is Posterior

\(P(+ | C)\): is Likelihood

\(P(C)\): is Prior

\(P(+)\): is Marginal Likelihood

Bayes’ Theorem gives us a framework to modify our beliefs in light of new evidence. Bayesian statistics gives us a solid mathematical means of incorporating our prior beliefs and evidence to produce new posterior beliefs.
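
As a small illustration, the whole update fits in a few lines of Python; this is a minimal sketch, and the function and argument names are mine rather than anything standard:

```python
# A minimal sketch of Bayes' Theorem as written above; names are mine.

def posterior(prior, likelihood, likelihood_complement):
    """Return P(C | +) given P(C), P(+ | C) and P(+ | !C).

    The denominator is the marginal likelihood:
    P(+) = P(C) P(+ | C) + P(!C) P(+ | !C).
    """
    marginal = prior * likelihood + (1 - prior) * likelihood_complement
    return prior * likelihood / marginal

# The numbers from the cancer example further below:
print(posterior(0.01, 0.9, 0.2))   # ~0.0435
```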

(NOTE: I will expand upon this in some future blog post).

 

Causal Graph

[Figure: a Bayes network with edges \(A \to C\), \(B \to C\), \(C \to D\), \(C \to E\)]

Causal Graphs (also called Causal Bayesian Networks) are directed acyclic graphs used to encode assumptions about the probabilistic relationships among the variables they represent.

For example, in the above graph the joint Probability represented by the Bayes’ Network is given by:

\(P(A, B, C, D, E) = P(A) P(B) P(C | A, B) P(D | C) P(E | C)\)

This example requires 10 values to represent the network (a short sketch computing the joint from these values follows the list). These are:

  1. \(P(A)\)
  2. \(P(B)\)
  3. \(P(C | A, B)\)
  4. \(P(C | !A, B)\)
  5. \(P(C | A, !B)\)
  6. \(P(C | !A, !B)\)
  7. \(P(D | C)\)
  8. \(P(D | !C)\)
  9. \(P(E | C)\)
  10. \(P(E | !C)\)
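
To make the factorization concrete, here is a minimal Python sketch that stores these 10 values and multiplies them out. The CPT numbers are made up purely for illustration; only the structure comes from the graph above:

```python
# A minimal sketch of the factorization; the CPT numbers are made up.
from itertools import product

p_a = 0.3                                          # P(A)
p_b = 0.6                                          # P(B)
p_c = {(True, True): 0.9, (False, True): 0.5,
       (True, False): 0.4, (False, False): 0.1}    # P(C | A, B)
p_d = {True: 0.8, False: 0.2}                      # P(D | C)
p_e = {True: 0.7, False: 0.3}                      # P(E | C)

def joint(a, b, c, d, e):
    """P(A, B, C, D, E) = P(A) P(B) P(C | A, B) P(D | C) P(E | C)."""
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    pc = p_c[(a, b)] if c else 1 - p_c[(a, b)]
    pd = p_d[c] if d else 1 - p_d[c]
    pe = p_e[c] if e else 1 - p_e[c]
    return pa * pb * pc * pd * pe

# Sanity check: the joint over all 32 assignments sums to 1.
total = sum(joint(*v) for v in product([True, False], repeat=5))
assert abs(total - 1.0) < 1e-9
```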

 

D-Separation

D-separation is a criterion for deciding, from a given causal graph, whether a variable (or set of variables) \(X\) is independent of another variable (or set of variables) \(Y\), given a third variable (or set of variables) \(Z\). The criterion is about paths and reachability: \(X\) and \(Y\) are d-separated by \(Z\) when every path between them is blocked.

So when we think in terms of types of D-Separation, we need to break each path down further into Active or Inactive triplets.

(NOTE: In the listed types, node(s) are shaded in gray to show involvement.)

Type 1 (causal chain)

Active Triplets

[Figure: causal chain \(A \to B \to C\) with \(B\) shaded]

Given that \(B\) is involved, \(A\) & \(C\) are independent.

i.e.; \(P(C | B, A) = P(C | B)\)

Inactive Triplets

[Figure: causal chain \(A \to B \to C\) with \(B\) unshaded]

Given that \(B\) is not involved, \(A\) & \(C\) are dependent.

i.e.; \(P(C | A) \neq P(C)\)

 

Type 2 (common cause)

Active Triplets (Conditional Independence)

[Figure: common cause \(A \to B\), \(A \to C\) with \(A\) shaded]

Given that \(A\) is involved, \(B\) & \(C\) become independent (Conditional Independence).

i.e.; \(P(C | A, B) = P(C | A) \)

Remember, Conditional Independence does not guarantee absolute independence; expanding over \(A\):

i.e.; \(P(C | B) = P(C | A) P(A | B) + P(C | !A) P(!A | B)\), which need not equal \(P(C)\)
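Here is a minimal numeric sketch of that point (numbers made up), for a common-cause model \(A \to B\), \(A \to C\): \(B\) and \(C\) are conditionally independent given \(A\) by construction, yet \(P(C | B)\) comes out different from \(P(C)\):

```python
# A minimal sketch (numbers made up) of a common-cause model A -> B, A -> C:
# B and C are conditionally independent given A by construction, yet
# P(C | B) differs from P(C).

p_a = 0.5
p_b_given = {True: 0.9, False: 0.1}    # P(B | A), P(B | !A)
p_c_given = {True: 0.8, False: 0.2}    # P(C | A), P(C | !A)

# P(A | B) via Bayes' rule.
p_b = p_a * p_b_given[True] + (1 - p_a) * p_b_given[False]
p_a_given_b = p_a * p_b_given[True] / p_b

# P(C | B) = P(C | A) P(A | B) + P(C | !A) P(!A | B)
p_c_given_b = (p_c_given[True] * p_a_given_b
               + p_c_given[False] * (1 - p_a_given_b))

# Unconditional P(C).
p_c = p_a * p_c_given[True] + (1 - p_a) * p_c_given[False]

print(p_c_given_b, p_c)   # 0.74 vs 0.5 -> not absolutely independent
```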

Inactive Triplets

[Figure: common cause \(A \to B\), \(A \to C\) with \(A\) unshaded]

Given that \(A\) is not involved, \(B\) & \(C\) become dependent.

i.e.; \(P(C | B) \neq P(C)\)

 

Type 3 (common effect OR v-structure)

Active Triplets

[Figure: v-structure \(A \to C \leftarrow B\) with \(C\) shaded]

Given that \(C\) is involved, \(A\) & \(B\) become dependent (lose their absolute independence).

i.e.; \(P(A | B, C) \neq P(A | C)\)

This structure is also used for the kind of probabilistic reasoning called the explaining-away effect.

Inactive Triplets (Absolute Independence)

[Figure: v-structure \(A \to C \leftarrow B\) with \(C\) unshaded]

Given that \(C\) is not involved, \(A\) & \(B\) are independent (Absolute Independence).

i.e.; \(P(A | B) = P(A)\)

 

Type 4 (common effect on descendant OR v-structure with a bottom tail)

Type 4 is derived directly from Type 3.

Active Triplets

[Figure: v-structure \(A \to C \leftarrow B\) with descendant \(D\) below \(C\) (\(C \to D\)), \(D\) shaded]

Given that \(C\)’s descendant \(D\) is involved, \(A\) & \(B\) become dependent (lose their absolute independence).

i.e.; \(P(A | B, D) \neq P(A | D)\)

 

Examples

Cancer example


Given

  • \(P(C) = 0.01\)
  • \(P(+ | C) = 0.9\)
  • \(P(+ | !C) = 0.2\)

Directly Inferred

  • \(P(!C) = 1 - P(C) = 1 - 0.01 = 0.99\)
  • \(P(- | C) = 1 - P(+ | C) = 1 - 0.9 = 0.1\)
  • \(P(- | !C) = 1 - P(+ | !C) = 1 - 0.2 = 0.8\)

 

Computations for Cancer Example

  • \(P(+, C) = P(C) P(+ | C) = 0.01 * 0.9 = 0.009\)

 

 

  • \(P(+, !C) = P(!C) P(+ | !C) = 0.99 * 0.2 = 0.198\)

 

 

  • \(P(+) = P(+ | C) P(C) \hspace{1mm}+\hspace{1mm} P(+ | !C) P(!C)\)

\(\Rightarrow 0.9 * 0.01 \hspace{1mm}+\hspace{1mm} 0.2 * 0.99 = 0.207\)

 

 

  • \(P(C | +) = \frac{P(C) P(+ | C)}{P(+)}\)

\(\Longrightarrow\) resolution: (Bayes’ Rule)

 

\(\Rightarrow \frac{P(C) P(+ | C)}{P(C) P(+ | C) \hspace{1mm}+\hspace{1mm} P(!C) P(+ | !C)}\)

 

\(\Rightarrow \frac{0.01 * 0.9 }{ 0.01 * 0.9 \hspace{1mm}+\hspace{1mm} 0.99 * 0.2} = 0.04347826087\)
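
The same computations can be re-derived in a short Python sketch, using only the given numbers above:

```python
# A minimal sketch re-deriving the cancer-example numbers above.

p_c = 0.01
p_pos_given_c = 0.9
p_pos_given_not_c = 0.2

p_pos_and_c = p_c * p_pos_given_c                   # P(+, C)  = 0.009
p_pos_and_not_c = (1 - p_c) * p_pos_given_not_c     # P(+, !C) = 0.198
p_pos = p_pos_and_c + p_pos_and_not_c               # P(+)     = 0.207
p_c_given_pos = p_pos_and_c / p_pos                 # P(C | +) ~ 0.0435

print(p_pos_and_c, p_pos_and_not_c, p_pos, p_c_given_pos)
```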

 

 

Two-test Cancer example

[Figure: network \(C \to T_1\), \(C \to T_2\) for the two-test cancer example]

Given

  • \(P(C) = 0.01\)     \(\Longrightarrow P(!C) = 0.99\)
  • \(P(+ | C) = 0.9\)      \(\Longrightarrow P(- | C) = 0.1\)
  • \(P(+ | !C) = 0.2\)      \(\Longrightarrow P(- | !C) = 0.8\)

 

Computations for the two-test cancer example

  • \(P(C | T1 = +, T2 = +) = P(C | +_1, +_2)\)

 

\(\Rightarrow \frac{P(+_1, +_2 | C) P(C)}{P(+_1, +_2 | C) P(C) \hspace{1mm}+\hspace{1mm} P(+_1, +_2 | !C) P(!C)}\)

 

\(\Rightarrow \frac{P(+_1 |+_2, C) P( +_2 | C) P(C)}{P(+_1 |+_2, C) P(+_2 | C) P(C) \hspace{1mm}+\hspace{1mm} P(+_1 |+_2, !C) P(+_2 | !C) P(!C)}\)

 

\(\Rightarrow \frac{P(+_1 | C) P(+_2 | C) P(C)}{P(+_1 | C) P(+_2 | C) P(C) \hspace{1mm}+\hspace{1mm} P(+_1 | !C) P(+_2 | !C) P(!C)}\)

\(\Longrightarrow\) resolution: by the chain rule, \(P(+_1, +_2 | C) = P(+_1 | +_2, C) P(+_2 | C)\); then \(P(+_1 | +_2, C) = P(+_1 | C)\) and \(P(+_1 | +_2, !C) = P(+_1 | !C)\) (since \(C\) is the common parent of both \(+_1\) and \(+_2\), they are conditionally independent given \(C\))

 

\(\Rightarrow \frac{0.9 * 0.9 * 0.01}{0.9 * 0.9 * 0.01 \hspace{1mm}+\hspace{1mm} 0.2 * 0.2 * 0.99} = 0.1698113208\)

 

 

  • \(P(C | T1 = +, T2 = -) = P(C | +, -)\)

 

\(\Rightarrow \frac{P(+, - | C) P(C)}{P(+, - | C) P(C) \hspace{1mm}+\hspace{1mm} P(+, - | !C) P(!C)}\)

 

\(\Rightarrow \frac{P(+ | -, C) P(- | C) P(C)}{P(+ | -, C) P(- | C) P(C) \hspace{1mm}+\hspace{1mm} P(+ | -, !C) P(- | !C) P(!C)}\)

 

\(\Rightarrow \frac{P(+ | C) P(- | C) P(C)}{P(+ | C) P(- | C) P(C) \hspace{1mm}+\hspace{1mm} P(+ | !C) P(- | !C) P(!C)}\)

\(\Longrightarrow\) resolution: \(P(+ | -, C) = P(+ | C)\) and \(P(+ | -, !C) = P(+ | !C)\) (since \(C\) is the common parent of both test results, they are conditionally independent given \(C\))

 

\(\Rightarrow \frac{0.9 * 0.1 * 0.01}{0.9 * 0.1 * 0.01 \hspace{1mm}+\hspace{1mm} 0.2 * 0.8 * 0.99} = 0.00564971\)

 

 

  • \(P(T2 = + | T1 = +) = P(+_2 | +_1)\)

 

\(\Rightarrow P(+_2 | +_1, C) P(C | +_1) \hspace{1mm}+\hspace{1mm} P(+_2 | +_1, !C) P(!C | +_1)\)

 

\(\Rightarrow P(+_2 | C) P(C | +_1) \hspace{1mm}+\hspace{1mm} P(+_2 | !C) P(!C | +_1)\)

\(\Longrightarrow\) resolution: \(P(+_2 | +_1, C) = P(+_2 | C)\) (since \(+_1\) and \(+_2\) are conditionally independent given \(C\))

 

\(\Rightarrow P(+_2 | C) \frac{P(C) P(+_1 | C)}{P(C) P(+_1 | C) \hspace{1mm}+\hspace{1mm} P(!C) P(+_1 | !C)} \hspace{1mm}+\hspace{1mm} P(+_2 | !C) \frac{P(!C) P(+_1 | !C)}{P(C) P(+_1 | C) \hspace{1mm}+\hspace{1mm} P(!C) P(+_1 | !C)}\)

 

\(\Rightarrow 0.9 * 0.04347826087 \hspace{1mm}+\hspace{1mm} 0.2 * 0.95652173913 = 0.23043478260\)
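
All three two-test quantities can be re-derived in a short Python sketch that leans on the conditional independence of the tests given \(C\):

```python
# A minimal sketch re-deriving the two-test numbers above.

p_c, p_nc = 0.01, 0.99
p_pos_c, p_pos_nc = 0.9, 0.2      # P(+ | C), P(+ | !C)
p_neg_c, p_neg_nc = 0.1, 0.8      # P(- | C), P(- | !C)

# P(C | +1, +2): tests are conditionally independent given C.
num = p_pos_c * p_pos_c * p_c
p_c_pp = num / (num + p_pos_nc * p_pos_nc * p_nc)        # ~0.1698

# P(C | +, -)
num = p_pos_c * p_neg_c * p_c
p_c_pm = num / (num + p_pos_nc * p_neg_nc * p_nc)        # ~0.0056

# P(+2 | +1) = P(+2 | C) P(C | +1) + P(+2 | !C) P(!C | +1)
p_c_p = (p_pos_c * p_c) / (p_pos_c * p_c + p_pos_nc * p_nc)
p_p2_p1 = p_pos_c * p_c_p + p_pos_nc * (1 - p_c_p)       # ~0.2304

print(p_c_pp, p_c_pm, p_p2_p1)
```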

 

Example for Converged Conditionals on a Single node

[Figure: network \(S \to H \leftarrow R\) for the happiness example]

Given

  • \(P(S) = 0.7\)
  • \(P(R) = 0.01\)
  • \(P(H | S, R) = 1\)
  • \(P(H | !S, R) = 0.9\)
  • \(P(H | S, !R) = 0.7\)
  • \(P(H | !S, !R) = 0.1\)

 

Computations for converged conditionals on a single node

  • \(P(R | S) = P(R) = 0.01\)

\(\Longrightarrow\) resolution: (Absolute Independence)

 

 

  • \(P(R | H, S) = \frac{P(R) P(H, S | R)}{P(H,S)}\)

 

\(\Rightarrow \frac{P(R) P(H | S, R) P(S | R)}{P(H | S) P(S)}\)

 

\(\Rightarrow \frac{P(R) P(H | S, R) P(S)}{P(H | S) P(S)}\)

\(\Longrightarrow\) resolution: \(P(S | R) = P(S)\) (since \(S\) and \(R\) are absolutely independent when \(H\) is not involved)

 

\(\Rightarrow \frac{P(R) P(H | S, R)}{P(H | S)}\)

 

\(\Rightarrow \frac{P(R) P(H | S, R)}{P(R) P(H | S, R) \hspace{1mm}+\hspace{1mm} P(!R) P(H | S, !R)}\)

 

\(\Rightarrow \frac{1 * 0.01}{1 * 0.01 \hspace{1mm}+\hspace{1mm} 0.7 * 0.99} = 0.01422475\)

 

 

  • \(P(R | H) = \frac{P(R)P(H|R)}{P(H)}\)

 

\(\Rightarrow \frac{P(R)P(H|R)}{P(R)P(H|R) \hspace{1mm}+\hspace{1mm} P(!R)P(H|!R)}\)

 

\(\Rightarrow \frac{P(R)(\hspace{1mm}P(S)P(H|S,R)\hspace{1mm}+\hspace{1mm}P(!S)P(H|!S,R)\hspace{1mm})}{P(R)(\hspace{1mm}P(S)P(H|S,R)\hspace{1mm}+\hspace{1mm}P(!S)P(H|!S,R)\hspace{1mm}) \hspace{1mm}+\hspace{1mm} P(!R)(\hspace{1mm}P(S)P(H|S,!R)\hspace{1mm}+\hspace{1mm}P(!S)P(H|!S,!R)\hspace{1mm})}\)

 

\(\Rightarrow \frac{0.01 * 0.97}{0.01 * 0.97 + 0.99 * 0.52} = 0.01849380\)
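
Both numbers can be re-derived in a short Python sketch using the corrected value \(P(H | !S, !R) = 0.1\). Note that \(P(R | H, S) \approx 0.0142\) is smaller than \(P(R | H) \approx 0.0185\): observing that it is sunny partly explains away the happiness, exactly the Type 3 effect described earlier:

```python
# A minimal sketch re-deriving the happiness-example numbers above,
# including the corrected value P(H | !S, !R) = 0.1.

p_s, p_r = 0.7, 0.01
p_h = {(True, True): 1.0, (False, True): 0.9,
       (True, False): 0.7, (False, False): 0.1}   # P(H | S, R)

# P(R | H, S) = P(R) P(H | S, R) / P(H | S)
p_h_given_s = p_r * p_h[(True, True)] + (1 - p_r) * p_h[(True, False)]
p_r_given_hs = p_r * p_h[(True, True)] / p_h_given_s                       # ~0.0142

# P(R | H) = P(R) P(H | R) / P(H)
p_h_given_r = p_s * p_h[(True, True)] + (1 - p_s) * p_h[(False, True)]     # 0.97
p_h_given_nr = p_s * p_h[(True, False)] + (1 - p_s) * p_h[(False, False)]  # 0.52
p_r_given_h = (p_r * p_h_given_r
               / (p_r * p_h_given_r + (1 - p_r) * p_h_given_nr))           # ~0.0185

print(p_r_given_hs, p_r_given_h)
```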
