Before you Begin
Reference book: Probabilistic Robotics
Discrete and Continuous Variables
Discrete Variables and Notations
Here X in P(X) can take on any value X = x_i, where the x_i are discrete points.
However, keep in mind that P(X = x_i), viewed as a function of x_i, is formally called the Probability Mass Function (PMF).
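To make the PMF concrete, here is a minimal sketch. The fair six-sided die is a made-up example (it is not from the reference book), and the names `pmf`, `total` and `p_even` are purely illustrative.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, so X takes values x_i in {1, ..., 6}.
pmf = {x_i: Fraction(1, 6) for x_i in range(1, 7)}

# Each P(X = x_i) is itself a probability, so it must lie in [0, 1] ...
assert all(0 <= p <= 1 for p in pmf.values())

# ... and the masses over all discrete points sum to exactly 1.
total = sum(pmf.values())

# The probability of an event is a plain sum over the points it contains.
p_even = sum(p for x_i, p in pmf.items() if x_i % 2 == 0)

print(total)   # 1
print(p_even)  # 1/2
```

Exact fractions are used here only to avoid floating-point noise in the sums.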
Continuous Random Variables
Here X in p(X) can take on any value X = x from a continuous range (for example, any real number in an interval), rather than from a set of discrete points.
Note that we use a lowercase p, as in p(x), because in the continuous world we cannot speak of the absolute probability of a single value, only of a probability density function (PDF).
Get Absolute Probability from PDF
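The standard relation is that the probability of X landing in an interval [a, b] equals the area under p(x) over that interval, i.e. the integral of p(x) from a to b. A minimal numerical sketch follows, assuming a standard normal PDF and a plain trapezoidal rule; both are choices made for illustration, and `normal_pdf` and `probability` are hypothetical helper names.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density p(x) of a normal distribution (assumed example PDF)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def probability(pdf, a, b, steps=10_000):
    """Approximate P(a <= X <= b) as the area under pdf (trapezoidal rule)."""
    h = (b - a) / steps
    area = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        area += pdf(a + i * h)
    return area * h

# Probability of falling within one standard deviation of the mean.
p_one_sigma = probability(normal_pdf, -1.0, 1.0)
print(round(p_one_sigma, 4))  # 0.6827

# A single point has zero width, hence zero area and zero probability.
p_point = probability(normal_pdf, 0.3, 0.3)
print(p_point)  # 0.0
```

The second call shows why a lone value of p(x) is not a probability: only an interval with non-zero width can carry any area.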
Understanding p(x) PDF
As seen above, only the integral (the area under the curve) gives us the absolute probability.
Therefore, this p(x) must be a curve of sorts, something like this:
Can the value of p(x) > 1?
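Yes. This is because p(x) is a probability density, not an absolute probability.
Consider the example of a proximity sensor whose readings only range from 0 m to 0.5 m. The area under its PDF must still equal 1, but the curve is only 0.5 m wide, so its height has to exceed 1 somewhere. The
PDF for such a sensor would look like the graph below.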
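A quick numerical check of this claim, assuming the sensor's readings are uniformly distributed over its range. The uniform shape is an assumption made here for simplicity, and `sensor_pdf` is a hypothetical name.

```python
def sensor_pdf(x, lo=0.0, hi=0.5):
    """Assumed uniform density over the sensor range [lo, hi] in metres."""
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0

# The density at any point inside the range is greater than 1 ...
density = sensor_pdf(0.25)

# ... yet the area under the curve (midpoint Riemann sum) is still 1.
steps = 1000
width = 0.5 / steps
area = sum(sensor_pdf((i + 0.5) * width) * width for i in range(steps))

print(density)         # 2.0
print(round(area, 6))  # 1.0
```

A density of 2 per metre spread over half a metre gives a total probability of 1, which is all that a PDF is required to satisfy.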
Joint and Conditional Probability
Joint Probability
Note. The calculation of absolute probability will change depending upon the nature of the variables:
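In standard notation, the two cases can be sketched as follows. The sets A, B and the interval bounds a, b, c, d are illustrative symbols introduced here. For discrete variables the joint masses are summed:

$$
P(X \in A,\; Y \in B) = \sum_{x_i \in A} \sum_{y_j \in B} P(x_i, y_j)
$$

For continuous variables the joint density is integrated:

$$
P(a \le X \le b,\; c \le Y \le d) = \int_a^b \int_c^d p(x, y)\, dy\, dx
$$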
Conditional Probability
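A standard way to write the conditional, assuming p(y) > 0 (x and y are generic placeholder symbols here):

$$
p(x \mid y) = \frac{p(x, y)}{p(y)}
$$

In words: restrict attention to the cases where Y = y, then rescale by p(y) so that the result is a valid distribution over x again.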
Marginals and Conditionals
To start, let’s get an intuition for what a marginal or a conditional may look like:
- Let’s consider a multivariate probability distribution (i.e., there are, say, 2 random variables)
- Let us consider these two variables to have their own distributions
- Let these two distributions be exam grades and study time
- Imagine exam_grades are distributed along y-axis, and study_time along x-axis (sorry for asking you to imagine this much :/)
- Let the z-axis be the joint probability of x and y
- Now, combining everything we should have a 3D plot
If we view this plot from the top, we should see something like this:
Crux of the Matter:
- Think of conditionals as taking a slice of this cloud and evaluating distribution of exam grades given a specific study time
- Think of marginals as squishing the cloud (say squishing all study-time data onto the exam-grades axis) and then studying the distribution
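The slice and squish ideas above can be sketched on a tiny discrete version of the cloud. The joint table below is entirely made up for illustration, and `marginal_grade` and `conditional_grade` are hypothetical helper names.

```python
from fractions import Fraction

# Hypothetical joint table P(study_time, exam_grade); the numbers are invented
# for illustration and sum to 1.
joint = {
    ("1h", "C"): Fraction(4, 20), ("1h", "B"): Fraction(2, 20), ("1h", "A"): Fraction(1, 20),
    ("3h", "C"): Fraction(2, 20), ("3h", "B"): Fraction(4, 20), ("3h", "A"): Fraction(7, 20),
}

def marginal_grade(joint):
    """Squish: sum out study_time, leaving P(exam_grade)."""
    out = {}
    for (study_time, grade), p in joint.items():
        out[grade] = out.get(grade, 0) + p
    return out

def conditional_grade(joint, given_time):
    """Slice: keep one study_time, then renormalise so the slice sums to 1."""
    slice_ = {g: p for (t, g), p in joint.items() if t == given_time}
    total = sum(slice_.values())  # this is the marginal P(study_time = given_time)
    return {g: p / total for g, p in slice_.items()}

grades = marginal_grade(joint)
grades_given_3h = conditional_grade(joint, "3h")

print(grades["A"])           # 2/5
print(grades_given_3h["A"])  # 7/13
```

In this invented table, knowing that someone studied for 3 hours raises the probability of an A from 2/5 to 7/13, which is the difference between squishing the whole cloud and looking at one slice of it.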
Small Leap
Now that you’ve understood the intuition behind marginals, here’s the math
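A sketch in standard notation, taking x as study time and y as exam grade (following the axes described above). Squishing all study-time data onto the exam-grades axis means integrating x out of the joint density:

$$
p(y) = \int p(x, y)\, dx
$$

If both variables are discrete, the integral becomes a sum over the joint masses:

$$
P(y_j) = \sum_i P(x_i, y_j)
$$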