Table of contents
  1. Before you Begin
  2. Discrete and Continuous Variables
    1. Discrete Variables and Notations
    2. Continuous Random Variables
    3. Get Absolute Probability from PDF
    4. Understanding p(x) PDF
    5. Can the value of p(x) > 1?
  3. Joint and Conditional Probability
    1. Joint Probability
    2. Conditional Probability
  4. Marginals and Conditionals
    1. Crux of the Matter:
    2. Small Leap
  5. Bayes Theorem

Before you Begin

Reference book: Probabilistic Robotics (Thrun, Burgard, and Fox)

Discrete and Continuous Variables

Discrete Variables and Notations

Here the random variable X in P(X) can take on any value X = x_i, where the x_i are discrete points.

However, keep in mind that P(X = x_i), viewed as a function of x_i, is formally called the Probability Mass Function (PMF).
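As a minimal sketch of a PMF, assume a fair six-sided die (an example chosen here, not taken from the reference book). The mass function assigns a probability to each discrete outcome, and the masses sum to 1. The names `pmf` and `prob_of` are hypothetical.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, so each outcome x_i has mass 1/6.
pmf = {x_i: Fraction(1, 6) for x_i in range(1, 7)}

def prob_of(event):
    """Absolute probability of an event (a set of outcomes): sum the masses."""
    return sum(pmf.get(x_i, Fraction(0)) for x_i in event)

total = prob_of(pmf.keys())   # all outcomes together
p_even = prob_of({2, 4, 6})   # P(X is even)
print(total, p_even)
```

With discrete variables, every absolute probability is obtained by summing masses in this way.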

Continuous Random Variables

Here the random variable X in p(X) can take on any value X = x, where x ranges over a continuum (for example, an interval of real numbers).

Here we use a lowercase p to write p(x) because, in the continuous case, the probability of any single exact value is zero. Instead of an absolute probability, we work with a probability density function (PDF).

Get Absolute Probability from PDF
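Assuming a scalar random variable X with density p(x), the standard relation between the density and an absolute probability can be written as follows. The interval endpoints a and b are notation introduced here for illustration.

$$
P(a \le X \le b) = \int_a^b p(x)\,dx,
\qquad
\int_{-\infty}^{\infty} p(x)\,dx = 1
$$

The second equation is the normalization condition: the total area under any PDF is 1.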

Understanding p(x) PDF

Only the integral of p(x) over an interval (the area under the curve) gives an absolute probability.
Therefore p(x) is best pictured as a curve over x, with probabilities read off as areas beneath it.

Can the value of p(x) > 1?

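Yes. This is because p(x) is a density, not an absolute probability; only its integral has to equal 1.
Consider the example of a proximity sensor whose readings only range from 0 m to 0.5 m. The
area under its PDF over that 0.5 m range must be 1, so the density has to exceed 1 somewhere. If the readings were uniformly distributed, p(x) would equal 2 everywhere in that range.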
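The sensor example can be checked numerically. This is a minimal sketch that assumes the readings are uniformly distributed over the range, which the text does not state. The helper names `p` and `integrate` are hypothetical.

```python
# Assumed model: readings uniformly distributed on [0.0, 0.5] metres.
LOW, HIGH = 0.0, 0.5

def p(x):
    """Uniform PDF: constant height 1 / (HIGH - LOW) inside the range, 0 outside."""
    return 1.0 / (HIGH - LOW) if LOW <= x <= HIGH else 0.0

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

density_at_02 = p(0.2)                     # a density value, larger than 1
total_area = integrate(p, LOW, HIGH)       # probability of getting any reading
prob_first_10cm = integrate(p, 0.0, 0.1)   # P(0 <= X <= 0.1)
print(density_at_02, total_area, prob_first_10cm)
```

The density value exceeds 1, yet every probability computed from it stays between 0 and 1.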

Joint and Conditional Probability

Joint Probability

Note: how the absolute probability is calculated depends on the nature of the variables. Discrete variables are summed over, while continuous variables are integrated over.
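For two random variables X and Y, the two cases can be sketched in standard notation. The sets A and B are symbols introduced here for illustration.

$$
P(X \in A,\; Y \in B) = \sum_{x_i \in A} \sum_{y_j \in B} P(x_i, y_j)
\quad \text{(discrete)}
$$

$$
P(X \in A,\; Y \in B) = \int_A \int_B p(x, y)\,dy\,dx
\quad \text{(continuous)}
$$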

Conditional Probability
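In standard notation, the conditional density of x given y is the joint density rescaled by the density of the conditioning variable. This is a sketch of the usual definition, valid wherever p(y) > 0.

$$
p(x \mid y) = \frac{p(x, y)}{p(y)}, \qquad p(y) > 0
$$

If X and Y are independent, then p(x, y) = p(x) p(y), and the conditional reduces to p(x | y) = p(x).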

Marginals and Conditionals

To start, let's get an intuition for what a marginal or conditional may look like:

  • Let’s consider a multivariate probability distribution (i.e., there are, say, 2 random variables)
  • Let us consider these two variables to have their own distributions
  • Let these two variables be exam grades and study time
  • Imagine exam_grades are distributed along the y-axis, and study_time along the x-axis (sorry for asking you to imagine this much :/)
  • Let the z-axis be the joint probability of x and y
  • Now, combining everything we should have a 3D plot

If we view this plot from the top, we should see the joint distribution as a cloud spread over the study-time (x) and exam-grades (y) plane.

Crux of the Matter:

  • Think of conditionals as taking a slice of this cloud and evaluating the distribution of exam grades given a specific study time
  • Think of marginals as squishing the cloud (say, squishing all study-time data onto the exam-grades axis) and then studying the resulting distribution
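The slicing and squishing described above can be sketched on a small discrete joint table. The probabilities below are made up for illustration, and the function names `marginal_grade` and `conditional_grade` are hypothetical.

```python
# Hypothetical joint probabilities P(study_time, exam_grade); the values are
# invented for illustration and sum to 1.
joint = {
    ("low", "fail"): 0.30, ("low", "pass"): 0.10,
    ("high", "fail"): 0.10, ("high", "pass"): 0.50,
}

def marginal_grade(joint):
    """Squish study time away: sum the joint over all study times."""
    out = {}
    for (study, grade), prob in joint.items():
        out[grade] = out.get(grade, 0.0) + prob
    return out

def conditional_grade(joint, study_time):
    """Slice at one study time, then renormalise so the slice sums to 1."""
    slice_ = {g: p for (s, g), p in joint.items() if s == study_time}
    norm = sum(slice_.values())
    return {g: p / norm for g, p in slice_.items()}

grades = marginal_grade(joint)                         # P(grade)
grades_given_high = conditional_grade(joint, "high")   # P(grade | study = high)
print(grades, grades_given_high)
```

The marginal ignores study time entirely, while the conditional keeps only the "high" slice and rescales it by that slice's total mass.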

Small Leap

Now that you’ve understood the intuition behind marginals, here’s the math
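As a sketch in standard notation, with x standing for study time and y for exam grades, marginalizing means integrating (or summing, for discrete variables) the joint over the variable being squished away:

$$
p(y) = \int p(x, y)\,dx
\qquad \text{or} \qquad
P(y) = \sum_{x} P(x, y)
$$

Writing the joint as p(x, y) = p(y | x) p(x) gives the equivalent form known as the theorem of total probability:

$$
p(y) = \int p(y \mid x)\, p(x)\,dx
$$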

Bayes Theorem
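Bayes' theorem follows from writing the joint density in both of its conditional forms. The derivation below is a sketch in standard notation and assumes p(y) > 0; the primed symbol x' is an integration variable introduced here.

$$
p(x, y) = p(x \mid y)\, p(y) = p(y \mid x)\, p(x)
$$

$$
p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}
= \frac{p(y \mid x)\, p(x)}{\int p(y \mid x')\, p(x')\,dx'}
$$

For discrete variables, the integral in the denominator becomes a sum over all values x'.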