Before you Begin

Reference book: Probabilistic Robotics

Discrete and Continuous Variables

Discrete Variables and Notations

Here the random variable X in P(X) can take on any value X = x_i, where the x_i are discrete points (a countable set of values)

However, keep in mind that in the discrete case P(X) is formally called the Probability Mass Function (PMF)
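
In standard textbook notation (assumed here, with x_i indexing the discrete values), a PMF assigns a non-negative probability to every value, and those probabilities sum to one:

```latex
\[
P(X = x_i) \ge 0, \qquad \sum_i P(X = x_i) = 1
\]
```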

Continuous Random Variables

Here the random variable X in p(X) can take on any value X = x, where x ranges over a continuum (for example, any real number in an interval)

Here we use a lowercase p, as in p(x), because in the continuous world we cannot speak of the absolute probability of a single value (it is zero), only of a Probability Density Function (PDF).
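
In the usual notation (an assumption about symbols, not a new result), a density is non-negative and the total area under it is one:

```latex
\[
p(x) \ge 0, \qquad \int_{-\infty}^{\infty} p(x)\,dx = 1
\]
```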

Get Absolute Probability from PDF
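
The standard relation, written here with hypothetical interval endpoints a and b, is that the probability of X falling in [a, b] is the integral of the density over that interval:

```latex
\[
P(a \le X \le b) = \int_a^b p(x)\,dx
\]
```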

Understanding the PDF p(x)

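Only integrating p(x) over an interval (the area under the curve) gives us an absolute probability.
Therefore, p(x) must be a curve of sorts over the values of x, and the probability of an interval is the area under that curve between the interval's endpoints.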

Can the value of p(x) exceed 1?

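Yes. This is because p(x) is a density, not an absolute probability; only its integral is bounded by 1.
Consider the example of a proximity sensor whose readings only range from 0 m to 0.5 m. If, for
instance, every reading in that range were equally likely, the PDF would be a flat line of height 2 over [0, 0.5], so that the area is 2 × 0.5 = 1.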
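
A minimal sketch of the sensor example, assuming the readings are uniformly distributed over 0 m to 0.5 m (the uniform shape, the function names, and the query interval are illustrative assumptions). It shows a density value above 1 while the area under the curve stays at 1:

```python
# Assumption: readings are uniformly distributed on [0 m, 0.5 m].
LOW, HIGH = 0.0, 0.5


def sensor_pdf(x):
    """Uniform density: height 1 / (HIGH - LOW) inside the range, 0 outside."""
    return 1.0 / (HIGH - LOW) if LOW <= x <= HIGH else 0.0


def probability(a, b, steps=10_000):
    """Approximate P(a <= X <= b) with a midpoint Riemann sum of the PDF."""
    width = (b - a) / steps
    return sum(sensor_pdf(a + (i + 0.5) * width) for i in range(steps)) * width


density_at_02 = sensor_pdf(0.2)      # a density value, larger than 1
total_area = probability(LOW, HIGH)  # area under the whole curve
p_01_to_03 = probability(0.1, 0.3)   # probability of a reading in [0.1, 0.3]
print(density_at_02, total_area, p_01_to_03)
```

The density is 2 everywhere in the range, yet any probability obtained from it is at most 1 because the range is only 0.5 m wide.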

Joint and Conditional Probability

Joint Probability

Note: how the absolute probability is calculated depends on the nature of the variables (sums for discrete variables, integrals for continuous ones).
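
As a sketch in standard notation, with A denoting a hypothetical set of (x, y) outcomes of interest, the probability of landing in A is a sum of the joint PMF in the discrete case and a double integral of the joint PDF in the continuous case:

```latex
\[
P\big((X, Y) \in A\big) = \sum_{(x_i, y_j) \in A} P(X = x_i, Y = y_j) \quad \text{(discrete)}
\]
\[
P\big((X, Y) \in A\big) = \iint_A p(x, y)\,dx\,dy \quad \text{(continuous)}
\]
```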

Conditional Probability
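
The usual textbook definition, stated here as a reminder, divides the joint by the probability of the condition, and is only defined where that probability is non-zero:

```latex
\[
p(x \mid y) = \frac{p(x, y)}{p(y)}, \qquad p(y) > 0
\]
```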

Marginals and Conditionals

To start, let's build an intuition for what a marginal or a conditional may look like:

Let's consider a multivariate probability distribution (i.e., one over, say, two random variables)
Let each of these two variables have its own distribution
Let these two variables be exam grades and study time
Imagine exam_grades distributed along the y-axis and study_time along the x-axis (sorry for asking you to imagine this much :/)
Let the z-axis be the joint probability of x and y
Now, combining everything we should have a 3D plot

If we view this plot from the top, we should see something like a cloud over the study-time/exam-grades plane, densest where the joint probability is highest.

Crux of the Matter:

Think of a conditional as taking a slice of this cloud at a specific study time and looking at the distribution of exam grades within that slice
Think of a marginal as squishing the cloud (say, squishing all the study-time data onto the exam-grades axis) and then studying the resulting distribution
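
The slice and squish pictures can be sketched on a small discrete table. The numbers, category labels, and function names below are made up for illustration; the only requirement is that the joint table sums to 1:

```python
# Hypothetical joint probability table P(study_time, exam_grade).
# Rows: study time (low, high); columns: exam grade (fail, pass, distinction).
study_times = ["low", "high"]
exam_grades = ["fail", "pass", "distinction"]
joint = [
    [0.20, 0.15, 0.05],  # study_time = low
    [0.05, 0.25, 0.30],  # study_time = high
]


def marginal_grades(table):
    """Squish the study-time axis: sum each grade column over all rows."""
    return [sum(row[j] for row in table) for j in range(len(table[0]))]


def conditional_grades(table, row_index):
    """Slice one study-time row and renormalize it so it sums to 1."""
    row = table[row_index]
    total = sum(row)  # this is the marginal P(study_time = that row)
    return [value / total for value in row]


p_grade = marginal_grades(joint)
p_grade_given_high = conditional_grades(joint, study_times.index("high"))
print(dict(zip(exam_grades, p_grade)))
print(dict(zip(exam_grades, p_grade_given_high)))
```

Note that the slice has to be divided by its own total: a raw row of the joint table sums to the probability of that study time, not to 1.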

Small Leap

Now that you’ve understood the intuition behind marginals, here’s the math
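
In standard notation, taking x as study time and y as exam grades as in the picture above, marginalizing means summing (discrete case) or integrating (continuous case) the joint over the variable being squished away:

```latex
\[
P(Y = y_j) = \sum_i P(X = x_i, Y = y_j) \quad \text{(discrete)}
\]
\[
p(y) = \int p(x, y)\,dx \quad \text{(continuous)}
\]
```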