Table of contents
- Before you Begin
- PDFs
- Uses of Multi-View Geometry
- Background: Structure from Motion (SFM)
- Theory: Solving for camera parameters in the presence of scale ambiguity
- Theory: Epipolar Geometry
- Structure From Motion Step 1: Estimating the Fundamental Matrix
- Triangulation
- Bundle Adjustment
- Final Pipeline Including RANSAC
- Tracking Real World Objects in 3D
Before you Begin
- Reference Book 1
- Reference Book 2
PDFs
Uses of Multi-View Geometry
Background: Structure from Motion (SFM)
The basis for classical 3D vision is viewing an object from multiple viewpoints to build 3D understanding.
Humans don’t have to move to perceive 3D because our eyes (cameras) are already separated by a fixed distance. This is the same as having one camera move by a fixed distance. Hence, we get “stereo depth vision”.
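To make the fixed-baseline intuition concrete: for a rectified stereo pair with focal length $f$ and baseline $B$, a point whose image shifts by disparity $d$ between the two views has depth

$$Z = \frac{fB}{d}$$

so nearby points shift more between the two views than distant ones.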
Theory: Solving for camera parameters in the presence of scale ambiguity
Remember we had the following projection equation:

$$\lambda x = P X$$

The $\lambda$ above accounts for a scale factor, which is ambiguous. This equation can also be written as an equality up to scale:

$$x \equiv P X$$
Solving for Camera Params: Direct Linear Transform
Now, the above equation is a similarity relation (equality only up to scale). To solve it we make use of a neat trick called the Direct Linear Transform (DLT): $PX$ should give the same ray (vector) as $x$, hence their cross product must be zero:

$$x \times P X = 0$$
Actual Derivation
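Writing the rows of $P$ as $p_1^\top, p_2^\top, p_3^\top$ and the image point as $x = (x, y, 1)^\top$, the constraint expands to

$$x \times PX = \begin{bmatrix} y\,(p_3^\top X) - (p_2^\top X) \\ (p_1^\top X) - x\,(p_3^\top X) \\ x\,(p_2^\top X) - y\,(p_1^\top X) \end{bmatrix} = 0$$

Only two of the three rows are linearly independent, so each view contributes two linear equations; stacking the equations from multiple correspondences gives a homogeneous system solved with SVD (these are exactly the rows the triangulate code builds later).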
Theory: Epipolar Geometry
- Simply put, epipolar geometry maps a point in one view to a line in another view
- Epipolar Geometry is purely determined by camera intrinsics and camera extrinsics
Essential Matrix: Maps a point -> to a line
Derivation of the Essential Matrix: Longuet-Higgins
- Recall that the skew-symmetric form of a vector encodes the cross product:

$$a \times b = [a]_\times b, \qquad [a]_\times = \begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix}$$

- We can use this to show how three vectors can define the volume of a parallelepiped:

$$V = a \cdot (b \times c) = a^\top [b]_\times c$$

- Now, given a calibrated camera (i.e. known intrinsics) we can capture a 3D point in two views; with relative pose $(R, t)$ between the views, the coplanarity of the two rays and the baseline gives

$$x_2^\top [t]_\times R\, x_1 = 0, \qquad E = [t]_\times R$$
Longuet-Higgins Derivation
In essence, the above derivation tells us:
- The volume of the parallelepiped (defined, as seen previously, by $a \cdot (b \times c)$) is zero
- This means the three vectors lie in one plane
- Which makes sense, since the epipolar plane is what connects the two camera centers and the 3D point
Difference Between Essential Matrix and Homography Matrix
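In short (the standard distinction): a homography maps a point in one view to a point in the other, and is only valid when the scene is planar or the camera motion is a pure rotation; the essential matrix maps a point to a line (the epipolar line) and holds for a general rigid scene.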
How does this Essential Matrix map a point to a line (where is the math?)
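Concretely: for a point $x_1$ (in normalized camera coordinates) in view 1, the epipolar line in view 2 is

$$l_2 = E x_1$$

and the matching point $x_2$ must lie on that line: $x_2^\top l_2 = x_2^\top E x_1 = 0$.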
Fundamental Matrix
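The fundamental matrix plays the same role for raw pixel coordinates: with intrinsics $K_1$ and $K_2$,

$$F = K_2^{-\top} E K_1^{-1}, \qquad x_2^\top F x_1 = 0$$

which is consistent with the essentialMatrix code below, where E = (K2.T @ F) @ K1.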
Structure From Motion Step 1: Estimating the Fundamental Matrix
We are given two images of the same object from two different views:
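The code excerpts throughout these notes come from one assignment file; they assume roughly the following imports, and helper utilities such as displayEpipolarF, refineF, _singularize, rodrigues/invRodrigues, and the plotting functions are provided by the assignment's helper modules:

import os
import random
import warnings
from copy import deepcopy

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize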
'''
Q2.1: Eight Point Algorithm
Input: pts1, Nx2 Matrix
pts2, Nx2 Matrix
M, a scalar parameter computed as max (imwidth, imheight)
Output: F, the fundamental matrix
HINTS:
(1) Normalize the input pts1 and pts2 using the matrix T.
(2) Setup the eight point algorithm's equation.
(3) Solve for the least square solution using SVD.
(4) Use the function `_singularize` (provided) to enforce the singularity condition.
(5) Use the function `refineF` (provided) to refine the computed fundamental matrix.
(Remember to use the normalized points instead of the original points)
(6) Unscale the fundamental matrix
'''
def eightpoint(pts1, pts2, M):
"""
    Compute the normalized coordinates
    and the fundamental matrix using computeF
Args:
x1 (Mx2): the matched locations of corners in img1
x2 (Mx2): the matched locations of corners in img2
Returns:
F2to1: Fundamental matrix after denormalization
"""
    x1, x2 = pts1, pts2
    # Normalize the points by the scale factor M
moved_scaled_x1 = x1/M
moved_scaled_x2 = x2/M
t = np.diag([1/M, 1/M, 1])
# Compute Fundamental Matrix
F = computeF(moved_scaled_x1, moved_scaled_x2)
    # Enforce the singularity (rank-2) constraint, then refine
F = _singularize(F)
F = refineF(F, moved_scaled_x1, moved_scaled_x2)
# Denormalization
F2to1 = np.matmul(t.T, (F @ t))
F2to1 = F2to1/F2to1[2,2]
return F2to1
def computeF(x1, x2):
"""
    Computes the fundamental matrix based on
    matching points in both images
Args:
x1: keypoints in image 1
x2: keypoints in image 2
Returns:
        F2to1: the fundamental matrix
"""
    # Container for the rows of the A matrix
A_build = []
    # Define the A matrix for (Af = 0) (A matrix size = N x 9)
for i in range(x1.shape[0]):
row_1 = np.array([ x2[i,0]*x1[i,0], x2[i,0]*x1[i,1], x2[i,0], x2[i,1]*x1[i,0], x2[i,1]*x1[i,1], x2[i,1], x1[i,0], x1[i,1], 1])
A_build.append(row_1)
A = np.stack(A_build, axis=0)
    # Least-squares minimization: the solution is the eigenvector
    # corresponding to the smallest eigenvalue of A^T A
u, s, v = np.linalg.svd(A)
# here the linalg.svd gives v_transpose
# but we need just V therefore we again transpose
F2to1 = np.reshape(v.T[:,-1], (3,3))
return F2to1
def check_and_create_directory(dir_path, create):
"""
Checks for existing directories and creates if unavailable
[input]
* dir_path : path to be checked
* create : tag to specify only checking path or also creating path
"""
if create == 1:
if not os.path.exists(dir_path):
os.makedirs(dir_path)
else:
if not os.path.exists(dir_path):
warnings.warn(f'following path could not be found: {dir_path}')
if __name__ == "__main__":
correspondence = np.load('data/some_corresp.npz') # Loading correspondences
    intrinsics = np.load('data/intrinsics.npz') # Loading the intrinsics of the camera
K1, K2 = intrinsics['K1'], intrinsics['K2']
pts1, pts2 = correspondence['pts1'], correspondence['pts2']
im1 = plt.imread('data/im1.png')
im2 = plt.imread('data/im2.png')
F = eightpoint(pts1, pts2, M=np.max([*im1.shape, *im2.shape]))
print("the fundamental matrix found was \n", F)
# Q2.1
out_dir = "/home/sush/CMU/Assignment_Sem_1/CV_A/Assignment_4/code/outputs"
check_and_create_directory(out_dir, create=1)
np.savez_compressed(
os.path.join(out_dir, 'q2_1.npz'),
F,
np.max([*im1.shape, *im2.shape])
)
displayEpipolarF(im1, im2, F)
Output: epipolar lines visualized with displayEpipolarF
Estimate Essential Matrix from Fundamental Matrix (given K1 and K2)
'''
Q3.1: Compute the essential matrix E.
Input: F, fundamental matrix
K1, internal camera calibration matrix of camera 1
K2, internal camera calibration matrix of camera 2
Output: E, the essential matrix
'''
def essentialMatrix(F, K1, K2):
E = (K2.T @ F) @ K1
E = E/E[2,2]
print("rank of E is", np.linalg.matrix_rank(E))
return E
if __name__ == "__main__":
correspondence = np.load('data/some_corresp.npz') # Loading correspondences
    intrinsics = np.load('data/intrinsics.npz') # Loading the intrinsics of the camera
K1, K2 = intrinsics['K1'], intrinsics['K2']
pts1, pts2 = correspondence['pts1'], correspondence['pts2']
im1 = plt.imread('data/im1.png')
im2 = plt.imread('data/im2.png')
F = eightpoint(pts1, pts2, M=np.max([*im1.shape, *im2.shape]))
E = essentialMatrix(F, K1, K2)
print("E is \n", E)
Triangulation
triangulate3D
- Here we fix one camera (extrinsic matrix = Identity matrix). Now, we know correspondence points in image1 and image2.
- Using those correspondences we found the Fundamental Matrix
Now, to estimate the 3D location of these points, we need
- Camera matrices (extrinsic*intrinsic) for both cameras M1 and M2
- Image points (x,y) which correspond to each other
- Direct Linear Transform
DLT was mentioned above, but small recap:
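In brief: each 2D–3D correspondence contributes two rows per camera (from the cross-product trick) to one homogeneous system

$$A X = \begin{bmatrix} y_1\, p_3^{(1)\top} - p_2^{(1)\top} \\ p_1^{(1)\top} - x_1\, p_3^{(1)\top} \\ y_2\, p_3^{(2)\top} - p_2^{(2)\top} \\ p_1^{(2)\top} - x_2\, p_3^{(2)\top} \end{bmatrix} X = 0$$

where $p_k^{(j)}$ is the $k$-th row of camera matrix $C_j$; the 3D point $X$ is the right singular vector of $A$ with the smallest singular value.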
After finding the 3D points, we will reproject them back onto the image and compare them with our original correspondence points (which we either manually selected or got from some keypoint detector like ORB or BRIEF)
The formula for reprojection error in this case is:
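$$\text{err} = \sum_i \left\| \hat{x}_{1,i} - x_{1,i} \right\|^2 + \left\| \hat{x}_{2,i} - x_{2,i} \right\|^2$$

where $\hat{x}_{j,i} = \operatorname{proj}(C_j X_i)$ is the reprojection of the estimated 3D point $X_i$ into image $j$ (this is exactly what the err accumulation in the code below computes).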
def triangulate(C1, pts1, C2, pts2):
"""
Find the 3D coords of the keypoints
We are given camera matrices and 2D correspondences.
We can therefore find the 3D points (refer L17 (Camera Models) of CV slides)
    Note. We can't just use x = PX to compute the 3D point X because of scale ambiguity
    i.e. the ambiguity can be represented as x = alpha*PX (we cannot find alpha)
Therefore we need to do DLT just like the case of homography
(see L14 (2D transforms) CVB slide 61)
Args:
C1 : the 3x4 camera matrix of camera 1
pts1 : img coords of keypoints in camera 1 (Nx2)
C2 : the 3x4 camera matrix of camera 2
pts2 : img coords of keypoints in camera 2 (Nx2)
Returns:
        P : the estimated 3D points (Nx3) for the given keypoint correspondences
err : the reprojection error
"""
    P = np.zeros(shape=(1,3))  # dummy first row; stripped before returning
    err = 0
    # rows of the two camera matrices (constant across all correspondences)
    p1_1, p2_1, p3_1 = C1[0,:], C1[1,:], C1[2,:]
    p1_2, p2_2, p3_2 = C2[0,:], C2[1,:], C2[2,:]
    for i in range(len(pts1)):
x, y = pts1[i,0], pts1[i,1]
x2, y2 = pts2[i,0], pts2[i,1]
# calculate the A matrix for this point correspondence
A = np.array([y*p3_1 - p2_1 , p1_1 - x*p3_1 , y2*p3_2 - p2_2 , p1_2 - x2*p3_2])
u, s, v = np.linalg.svd(A)
# here the linalg.svd gives v_transpose
# but we need just V therefore we again transpose
X = v.T[:,-1]
# print("X is", X)
X = X.T
X = np.expand_dims(X,axis=0)
# print("X after transpose and expand is", X)
# convert X to homogenous coords
X = X/X[0,3]
# print("X after normalizing is", X)
P = np.concatenate((P, X[:,0:3]), axis=0)
X = X.T
        # reproject X into both images (3x1 = 3x4 @ 4x1) and dehomogenize
pt_1 = ((C1 @ X)/(C1 @ X)[2,0])[0:2,0]
pt_2 = ((C2 @ X)/(C2 @ X)[2,0])[0:2,0]
        # accumulate the squared reprojection error
err += np.linalg.norm(pt_1 - pts1[i,:])**2 + np.linalg.norm(pt_2 - pts2[i,:])**2
print("error in this iteration is", err)
P = P[1:,:]
return P, err
Summary
- Given two camera matrices and keypoint correspondences for two views, we triangulated the point (found 3D point)
- We found the reprojection error for this estimated 3D point
Using Triangulate to Find Second Camera Matrix after fixing First Camera Matrix = Identity
Previously we saw that we need an M2 to triangulate, but we don’t have an M2 yet :/.
However, since our first camera is fixed (identity) we can find the camera matrix M2 of our second camera as:
def camera2(E):
U,S,V = np.linalg.svd(E)
m = S[:2].mean()
E = U.dot(np.array([[m,0,0], [0,m,0], [0,0,0]])).dot(V)
U,S,V = np.linalg.svd(E)
W = np.array([[0,-1,0], [1,0,0], [0,0,1]])
if np.linalg.det(U.dot(W).dot(V))<0:
W = -W
M2s = np.zeros([3,4,4])
M2s[:,:,0] = np.concatenate([U.dot(W).dot(V), U[:,2].reshape([-1, 1])/abs(U[:,2]).max()], axis=1)
M2s[:,:,1] = np.concatenate([U.dot(W).dot(V), -U[:,2].reshape([-1, 1])/abs(U[:,2]).max()], axis=1)
M2s[:,:,2] = np.concatenate([U.dot(W.T).dot(V), U[:,2].reshape([-1, 1])/abs(U[:,2]).max()], axis=1)
M2s[:,:,3] = np.concatenate([U.dot(W.T).dot(V), -U[:,2].reshape([-1, 1])/abs(U[:,2]).max()], axis=1)
return M2s
Note: The above function gives Four possible values for M2
Why the 4 options?
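In short: from the SVD $E = U \Sigma V^\top$ there are two valid rotations ($U W V^\top$ and $U W^\top V^\top$, with $W$ the 90° rotation used in camera2 above) and two translation directions ($\pm u_3$, the last column of $U$), giving $2 \times 2 = 4$ candidate $(R, t)$ pairs. Only one of them reconstructs the scene in front of both cameras.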
Now there are 2 checks we can use to find which is the right camera:
- Determinant(Rotation component of M2) = 1 (so that the rotation belongs to SO(3))
- All Z values should be positive (i.e. the 3D point should be in front of both cameras)
Combining the above two functions
Now we have point correspondences, M1, and four candidate M2’s. We triangulate with each candidate, keep only the solutions whose 3D points lie in front of the cameras, and pick the one that minimizes the reprojection error. Wrapped into the findM2 function used later (camera 1 fixed at identity):
def findM2(F, pts1, pts2, intrinsics):
    K1, K2 = intrinsics['K1'], intrinsics['K2']
    E = essentialMatrix(F, K1, K2)
    # first camera fixed at the origin: M1 = [I | 0]
    M1 = np.hstack((np.eye(3), np.zeros((3, 1))))
    # the four candidate extrinsics for camera 2
    M2 = camera2(E)
    err_min, best_M2_i, best_pts_3d = np.inf, None, None
    # iterate over M1 (fixed) and M2 (4 possibilities) by passing them to triangulate
    for i in range(M2.shape[2]):
        M2_current = M2[:,:,i]
        # build the C1 and C2:
        pts_in_3d, err = triangulate((K1 @ M1), pts1, (K2 @ M2_current), pts2)
        # keep candidates with no 3D point behind the cameras
        if err < err_min and (np.where(pts_in_3d[:,2] < 0)[0].shape[0] == 0):
            print("satisfies the error criteria")
            err_min = err
            best_M2_i = i
            best_pts_3d = pts_in_3d
    if (best_M2_i is not None) and (best_pts_3d is not None):
        print("min err is", err_min)
    # return M2, C2, w (3D points), M1, C1
    return M2[:,:,best_M2_i], (K2 @ M2[:,:,best_M2_i]), best_pts_3d, M1, (K1 @ M1)
Finally, all together we have:
- our best_3d_points
- correct M2 matrix
Results of Triangulation on Input Images
Bundle Adjustment
We know that the error in triangulation is the difference between the projection of a 3D point and the actual 2D point on the image. Now we will perturb the 3D points (and the pose of camera 2) slightly and find the configuration in which the reprojection error reaches a minimum. The objective for this operation is shown below:
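Written out, the objective is the cumulative squared reprojection error over both views, minimized over camera 2's pose and the 3D points:

$$\min_{M_2,\,\{X_i\}} \sum_i \left\| x_{1,i} - \operatorname{proj}(K_1 M_1 X_i) \right\|^2 + \left\| x_{2,i} - \operatorname{proj}(K_2 M_2 X_i) \right\|^2$$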
The process we will follow now is very code-specific. The explanation below covers only this code, where we minimize the reprojection error over camera 2's rotation and translation (the M2 matrix) together with the 3D points.
High level procedure
- Use the 2D point correspondences to find the Fundamental Matrix (along with RANSAC to find the inlier points)
- Use the inliers to find our best F (fundamental matrix)
- Compute an initial guess for M2 by using our old findM2 function
- Now, the above function would have given us 3D points (P_init) and an M2_init
- Now, we have compiled the following:
- M1 and K1
- M2_init and K2
- F and E (E = (K2.T @ F) @ K1)
Having the above content, we will need to derive our reprojection error. We will do this in the RodriguesResidual function:
RodriguesResidual: rodriguesResidual(x, K1, M1, p1, K2, p2)
- x contains the 3D points plus the rotation and translation of camera 2. We can therefore get M2 (and the points) from x
- We can find the camera matrices C1 = K1 @ M1, C2 = K2 @ M2
- Use the above equation to get p1’ and p2’
- Compare p1’ and p1, p2’ and p2, to get the reprojection error we need in both cameras
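A minimal sketch of what rodriguesResidual can look like, assuming x is packed as [P.flatten(), r2, t2] (matching bundleAdjustment below) and that rodrigues converts a 3x1 rotation vector to a rotation matrix:
def rodriguesResidual(x, K1, M1, p1, K2, p2):
    # unpack: first 3N entries are the 3D points, last 6 are r2 and t2
    P = x[:-6].reshape(-1, 3)
    r2 = x[-6:-3].reshape(3, 1)
    t2 = x[-3:].reshape(3, 1)
    # rebuild M2 from the rotation vector and translation
    M2 = np.hstack((rodrigues(r2), t2))
    C1, C2 = K1 @ M1, K2 @ M2
    # project all 3D points into both views
    P_h = np.hstack((P, np.ones((P.shape[0], 1))))  # Nx4 homogeneous points
    p1_proj = C1 @ P_h.T
    p1_hat = (p1_proj[:2] / p1_proj[2]).T  # Nx2, dehomogenized
    p2_proj = C2 @ P_h.T
    p2_hat = (p2_proj[:2] / p2_proj[2]).T
    # stack the 2D reprojection residuals of both cameras into one vector
    residuals = np.concatenate([(p1 - p1_hat).reshape(-1), (p2 - p2_hat).reshape(-1)])
    return residuals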
Now we have a function which gives us the reprojection error for a given M2 matrix. Let's see how we'll use this reprojection error to optimize our M2
Optimization of M2
Now that we have a function which gives the reprojection error for any given M2, let's minimize this error by adjusting the 3D points (and pose) slightly so that the cumulative reprojection error over all points decreases
We do this using the scipy.optimize.minimize function
# just some repackaging/preprocessing to build the x vector
x0 = P_init.flatten()
x0 = np.append(x0, r2_0.flatten())
x0 = np.append(x0, t2_0.flatten())
# optimization step (minimize expects a scalar objective, so we pass
# residual_norm, the squared norm of the rodriguesResidual vector; see below)
res = scipy.optimize.minimize(residual_norm, x0, args=(K1, M1, p1, K2, p2))
x_opt = res.x
Finally our x_opt (i.e. the optimal x) will hold the corrected rotation and translation of camera 2 and the corrected 3D points
'''
Q5.3 Bundle adjustment.
Input: K1, the intrinsics of camera 1
M1, the extrinsics of camera 1
p1, the 2D coordinates of points in image 1
K2, the intrinsics of camera 2
       M2_init, the initial extrinsics of camera 2 (get this from findM2)
p2, the 2D coordinates of points in image 2
P_init, the initial 3D coordinates of points (get this also from findM2)
Output: M2, the optimized extrinsics of camera 2
P2, the optimized 3D coordinates of points
o1, the starting objective function value with the initial input
o2, the ending objective function value after bundle adjustment
Hints:
(1) Use the scipy.optimize.minimize function to minimize the objective function, rodriguesResidual.
You can try different (method='..') in scipy.optimize.minimize for best results.
'''
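bundleAdjustment below calls residual_norm, a small wrapper that is not shown in these notes; a minimal sketch, assuming rodriguesResidual returns the stacked residual vector:
def residual_norm(x, K1, M1, p1, K2, p2):
    # scipy.optimize.minimize needs a scalar objective, so collapse the
    # residual vector to its squared norm
    residuals = rodriguesResidual(x, K1, M1, p1, K2, p2)
    return np.linalg.norm(residuals) ** 2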
def bundleAdjustment(K1, M1, p1, K2, M2_init, p2, P_init):
# given M2_init decompose it into R and t
R2 = M2_init[:,0:3]
t2 = M2_init[:,3]
r2 = invRodrigues(R2)
x_start = P_init.flatten()
x_start = np.append(x_start, np.append(r2.flatten(), t2))
obj_start = rodriguesResidual(x_start, K1, M1, p1, K2, p2)
print("x_start shape is", x_start.shape)
# optimization step
from scipy.optimize import minimize
x_optimized_obj = minimize(residual_norm, x_start, args=(K1, M1, p1, K2, p2), method='Powell')
print("x_end shape is", x_optimized_obj.x.shape)
x_optimized = x_optimized_obj.x
obj_end = rodriguesResidual(x_optimized, K1, M1, p1, K2, p2)
# recompute the M2 and P
# decompose x
P_final = x_optimized[0:-6]
P_shape_req = int(P_final.shape[0]/3)
P_final = np.reshape(P_final, newshape=(P_shape_req,3))
r2_final = x_optimized[-6:-3]
    # reshape to 3x1 to feed to rodrigues (rotation vector -> matrix)
r2_final = r2_final.reshape(3,1)
# reshape translation matrix to combine in transformation matrix
t2_final = x_optimized[-3:].reshape(3,1)
    # compose the M2 matrix
R2_final = rodrigues(r2_final)
M2_final = np.hstack((R2_final, t2_final))
return M2_final, P_final, obj_start, obj_end
Results on optimizing points after bundle adjustment
Final Pipeline Including RANSAC
if __name__ == "__main__":
np.random.seed(1) #Added for testing, can be commented out
some_corresp_noisy = np.load('data/some_corresp_noisy.npz') # Loading correspondences
    intrinsics = np.load('data/intrinsics.npz') # Loading the intrinsics of the camera
K1, K2 = intrinsics['K1'], intrinsics['K2']
noisy_pts1, noisy_pts2 = some_corresp_noisy['pts1'], some_corresp_noisy['pts2']
im1 = plt.imread('data/im1.png')
im2 = plt.imread('data/im2.png')
templeCoords = np.load('data/templeCoords.npz')
temple_pts1 = np.hstack([templeCoords["x1"], templeCoords["y1"]])
#? getting the F matrix from noisy correspondences
M = np.max([*im1.shape, *im2.shape])
F, inliers = ransacF(noisy_pts1, noisy_pts2, M, im1, im2)
inlier_pts1, inlier_pts2 = inliers[0], inliers[1]
print("shape of noisy_pts1 is", noisy_pts1.shape)
print("shape of inlier_pts1 is", inlier_pts1.shape)
    F_naive = eightpoint(noisy_pts1, noisy_pts2, M)
    # use displayEpipolarF to compare how the RANSAC F and the naive F behave
    # displayEpipolarF(im1, im2, F)
    # displayEpipolarF(im1, im2, F_naive)
# Simple Tests to verify your implementation:
from scipy.spatial.transform import Rotation as sRot
rotVec = sRot.random()
mat = rodrigues(rotVec.as_rotvec())
assert(np.linalg.norm(rotVec.as_rotvec() - invRodrigues(mat)) < 1e-3)
assert(np.linalg.norm(rotVec.as_matrix() - mat) < 1e-3)
#? Getting the initial guess for M2 and P
# Assuming the rotation and translation of camera1 is zero
M1 = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0]])
M2_init, C2, P_init, M1, C1 = findM2(F, inlier_pts1, inlier_pts2, intrinsics)
print("M2 shape is", M2_init)
# Optimize the M2
M2_final, P_final, start_obj, end_obj = bundleAdjustment(
K1,
M1,
inlier_pts1,
K2,
M2_init,
inlier_pts2,
P_init
)
print("error before optimization is", np.linalg.norm(start_obj)**2)
print("error after optimization is", np.linalg.norm(end_obj)**2)
# compare the old M2 to optimized M2
plot_3D_dual(P_init, P_final)
Effects of RANSAC
RANSAC was used even before bundle adjustment, to remove noisy correspondences for the initial best guess of the Fundamental Matrix.
def ransacF(pts1, pts2, M, im1, im2, nIters=100, tol=10):
"""
Every iteration we init a Fundamental matrix using 4 corresponding
points and calculate number of inliers. Finally use the Homography
matrix which had max number of inliers (and these inliers as well)
to find the final Fundamental matrix
Args:
pts1: location of matched points in image1
pts2: location of matched points in image2
opts: user inputs used for distance tolerance in ransac
Returns:
bestH2to1 : The Fundamental matrix with max number of inliers
inlier_points : Final list of inliers found for best RANSAC iteration
"""
max_iters = nIters # the number of iterations to run RANSAC for
inlier_tol = tol # the tolerance value for considering a point to be an inlier
locs1 = pts1
locs2 = pts2
# define size of both locs1 and locs2
num_rows = locs1.shape[0]
    # containers for keeping track of the best iteration so far
    final_inlier_count = 0
    final_distance_error = np.inf
#? Create a boolean vector of length N where 1 = inlier and 0 = outlier
print("Computing RANSAC")
for i in range(max_iters):
test_locs1 = deepcopy(locs1)
test_locs2 = deepcopy(locs2)
        # choose a random sample of 8 points to estimate F
        rand_index = random.sample(range(int(locs1.shape[0])), k=8)
rand_points_1 = []
rand_points_2 = []
for j in rand_index:
rand_points_1.append(locs1[j,:])
rand_points_2.append(locs2[j,:])
test_locs1 = np.delete(test_locs1, rand_index, axis=0)
test_locs2 = np.delete(test_locs2, rand_index, axis=0)
correspondence_points_1 = np.vstack(rand_points_1)
correspondence_points_2 = np.vstack(rand_points_2)
ref_F = eightpoint(correspondence_points_1, correspondence_points_2, M)
inliers, inlier_count, distance_error, error_state = compute_inliers(ref_F,
test_locs1,
test_locs2,
inlier_tol,
im1,
im2)
if error_state == 1:
continue
if (inlier_count > final_inlier_count):
final_inlier_count = inlier_count
final_inliers = inliers
final_corresp_points_1 = correspondence_points_1
final_corresp_points_2 = correspondence_points_2
final_distance_error = distance_error
final_test_locs1 = test_locs1
final_test_locs2 = test_locs2
    if final_inlier_count > 0:
# print("original point count is", locs1.shape[0])
print("final inlier count is", final_inlier_count)
print("final inlier's cumulative distance error is", final_distance_error)
delete_indexes = np.where(final_inliers==0)
final_locs_1 = np.delete(final_test_locs1, delete_indexes, axis=0)
final_locs_2 = np.delete(final_test_locs2, delete_indexes, axis=0)
final_locs_1 = np.vstack((final_locs_1, final_corresp_points_1))
final_locs_2 = np.vstack((final_locs_2, final_corresp_points_2))
        bestF = eightpoint(final_locs_1, final_locs_2, M)
        return bestF, [final_locs_1, final_locs_2]
    else:
        print("SOMETHING WRONG: no RANSAC iteration produced inliers")
        bestF = eightpoint(correspondence_points_1, correspondence_points_2, M)
        return bestF, 0
Tracking Real World Objects in 3D
'''
Q6.1 Multi-View Reconstruction of keypoints.
Input: C1, the 3x4 camera matrix
pts1, the Nx3 matrix with the 2D image coordinates and confidence per row
C2, the 3x4 camera matrix
pts2, the Nx3 matrix with the 2D image coordinates and confidence per row
C3, the 3x4 camera matrix
pts3, the Nx3 matrix with the 2D image coordinates and confidence per row
Output: P, the Nx3 matrix with the corresponding 3D points for each keypoint per row
err, the reprojection error.
'''
def MultiviewReconstruction(C1, pts1, C2, pts2, C3, pts3, Thres = 200):
vis_pts_1 = np.where(pts1[:,2] > Thres)
vis_pts_2 = np.where(pts2[:,2] > Thres)
vis_pts_3 = np.where(pts3[:,2] > Thres)
    # container to save the 3D point for each corresponding 2D point
    pts_3d = np.zeros(pts1.shape)
    reproj_error = np.zeros(pts1.shape[0])
    # indices confidently visible in all three views
    overlap_all = np.intersect1d(np.intersect1d(vis_pts_1[0], vis_pts_2[0]), vis_pts_3[0])
for i in overlap_all:
pts_cam_1_2, err1 = triangulate(C1, pts1[i,:-1], C2, pts2[i,:-1])
pts_cam_2_3, err2 = triangulate(C2, pts2[i,:-1], C3, pts3[i,:-1])
pts_cam_1_3, err3 = triangulate(C1, pts1[i,:-1], C3, pts3[i,:-1])
avg_pt_i = (pts_cam_1_2 + pts_cam_2_3 + pts_cam_1_3)/3
avg_err = (err1+err2+err3)/3
pts_3d[i,:] = avg_pt_i
reproj_error[i] = avg_err
for i in vis_pts_1[0]:
# print("i is", i)
if i not in overlap_all:
# print("computing", i)
if i in vis_pts_2[0]:
pts_i, err = triangulate(C1, pts1[i,:-1], C2, pts2[i,:-1])
pts_3d[i,:] = pts_i
reproj_error[i] = err
elif i in vis_pts_3[0]:
pts_i, err = triangulate(C1, pts1[i,:-1], C3, pts3[i,:-1])
pts_3d[i,:] = pts_i
reproj_error[i] = err
else:
print("point not visible in 2 views")
for i in vis_pts_2[0]:
# print("i is", i)
if i not in overlap_all:
# print("computing", i)
if i in vis_pts_3[0]:
pts_i, err = triangulate(C2, pts2[i,:-1], C3, pts3[i,:-1])
pts_3d[i,:] = pts_i
reproj_error[i] = err
elif i in vis_pts_1[0]:
pts_i, err = triangulate(C1, pts1[i,:-1], C2, pts2[i,:-1])
pts_3d[i,:] = pts_i
reproj_error[i] = err
else:
print("point not visible in 2 views")
for i in vis_pts_3[0]:
if i not in overlap_all:
# print("computing", i)
if i in vis_pts_1[0]:
pts_i, err = triangulate(C1, pts1[i,:-1], C3, pts3[i,:-1])
pts_3d[i,:] = pts_i
reproj_error[i] = err
            elif i in vis_pts_2[0]:
                pts_i, err = triangulate(C2, pts2[i,:-1], C3, pts3[i,:-1])
pts_3d[i,:] = pts_i
reproj_error[i] = err
else:
print("point not visible in 2 views")
print("pts1 shape is", pts1.shape)
print("3d points shape is", pts_3d.shape)
return pts_3d, reproj_error, [vis_pts_1, vis_pts_2, vis_pts_3]
def MultiviewReconstructionError(x, C1, pts1, C2, pts2, C3, pts3, vis_pts_list):
# decompose x
P_init = x
P_shape_req = int(P_init.shape[0]/3)
P_init = np.reshape(P_init, newshape=(P_shape_req,3))
vis_pts_1 = vis_pts_list[0]
vis_pts_2 = vis_pts_list[1]
vis_pts_3 = vis_pts_list[2]
pts1 = pts1[:,0:2]
pts2 = pts2[:,0:2]
pts3 = pts3[:,0:2]
# list to store error values
err_list = []
# build a sub_P matrix for all visible points in pts1, pts2, pts3
sub_pts1 = np.take(pts1, vis_pts_1, axis=0)[0]
sub_P1 = np.take(P_init, vis_pts_1, axis=0)[0]
sub_pts2 = np.take(pts2, vis_pts_2, axis=0)[0]
sub_P2 = np.take(P_init, vis_pts_2, axis=0)[0]
sub_pts3 = np.take(pts3, vis_pts_3, axis=0)[0]
sub_P3 = np.take(P_init, vis_pts_3, axis=0)[0]
P_list = [sub_P1, sub_P2, sub_P3]
pts_list = [sub_pts1, sub_pts2, sub_pts3]
C_list = [C1, C2, C3]
for i in range(len(P_list)):
P = P_list[i]
p= pts_list[i]
C = C_list[i]
# homogenize P to contain a 1 in the end (P = Nx3 vector)
P_homogenous = np.append(P, np.ones((P.shape[0],1)), axis=1)
# Find the projection of P1 onto image 1 (vectorize)
# Transpose P_homogenous to make it a 4xN vector and left mulitply with C1
# 3xN = 3x4 @ 4XN
p_hat = (C @ P_homogenous.T)
# normalize and transpose to get back to format of p1
p_hat = ((p_hat/p_hat[2,:])[0:2,:]).T
error = np.linalg.norm((p-p_hat).reshape([-1]))**2
err_list.append(error)
err_total = err_list[0] + err_list[1] + err_list[2]
# print("error overall is", err_total)
return err_total
def triangulate(C1, pts1, C2, pts2):
"""
Find the 3D coords of the keypoints
We are given camera matrices and 2D correspondences.
We can therefore find the 3D points (refer L17 (Camera Models) of CV slides)
Note. We can't just use x = PX to compute the 3D point X because of scale ambiguity
i.e the ambiguity can be rep. as x = alpha*Px (we cannot find alpha)
Therefore we need to do DLT just like the case of homography
(see L14 (2D transforms) CVB slide 61)
Args:
C1 : the 3x4 camera matrix of camera 1
pts1 : img coords of keypoints in camera 1 (Nx2)
C2 : the 3x4 camera matrix of camera 2
pts2 : img coords of keypoints in camera 2 (Nx2)
Returns:
P : the estimated 3D point for the given pair of keypoint correspondences
err : the reprojection error
"""
P = np.zeros(shape=(1,3))
err = 0
# get the camera 1 matrix
p1_1 = C1[0,:]
p2_1 = C1[1,:]
p3_1 = C1[2,:]
# get the camera 2 matrix
p1_2 = C2[0,:]
p2_2 = C2[1,:]
p3_2 = C2[2,:]
x, y = pts1[0], pts1[1]
x2, y2 = pts2[0], pts2[1]
# calculate the A matrix for this point correspondence
A = np.array([y*p3_1 - p2_1 , p1_1 - x*p3_1 , y2*p3_2 - p2_2 , p1_2 - x2*p3_2])
u, s, v = np.linalg.svd(A)
# here the linalg.svd gives v_transpose
# but we need just V therefore we again transpose
X = v.T[:,-1]
# print("X is", X)
X = X.T
X = np.expand_dims(X,axis=0)
# print("X after transpose and expand is", X)
# convert X to homogenous coords
X = X/X[0,3]
# print("X after normalizing is", X)
P = np.concatenate((P, X[:,0:3]), axis=0)
X = X.T
    # reproject X into both images (3x1 = 3x4 @ 4x1) and dehomogenize
pt_1 = ((C1 @ X)/(C1 @ X)[2,0])[0:2,0]
pt_2 = ((C2 @ X)/(C2 @ X)[2,0])[0:2,0]
    # accumulate the squared reprojection error
err += np.linalg.norm(pt_1 - pts1)**2 + np.linalg.norm(pt_2 - pts2)**2
# print("error in this iteration is", err)
P = P[1:,:]
return P[0], err
'''
Q6.2 Plot Spatio-temporal (3D) keypoints
:param car_points: np.array points * 3
'''
def plot_3d_keypoint_video(pts_3d_video):
fig = plt.figure()
# num_points = pts_3d.shape[0]
ax = fig.add_subplot(111, projection='3d')
vid_len = len(pts_3d_video)
vals = np.linspace(0.1,1, num=vid_len, endpoint=False)
for i in range(len(pts_3d_video)):
pts_3d = pts_3d_video[i]
for j in range(len(connections_3d)):
index0, index1 = connections_3d[j]
xline = [pts_3d[index0,0], pts_3d[index1,0]]
yline = [pts_3d[index0,1], pts_3d[index1,1]]
zline = [pts_3d[index0,2], pts_3d[index1,2]]
ax.plot(xline, yline, zline, color=colors[j], alpha=vals[i])
np.set_printoptions(threshold=1e6, suppress=True)
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
plt.show()
#Extra Credit
if __name__ == "__main__":
pts_3d_video = []
for loop in range(10):
print(f"processing time frame - {loop}")
data_path = os.path.join('data/q6/','time'+str(loop)+'.npz')
image1_path = os.path.join('data/q6/','cam1_time'+str(loop)+'.jpg')
image2_path = os.path.join('data/q6/','cam2_time'+str(loop)+'.jpg')
image3_path = os.path.join('data/q6/','cam3_time'+str(loop)+'.jpg')
im1 = plt.imread(image1_path)
im2 = plt.imread(image2_path)
im3 = plt.imread(image3_path)
data = np.load(data_path)
pts1 = data['pts1']
pts2 = data['pts2']
pts3 = data['pts3']
K1 = data['K1']
K2 = data['K2']
K3 = data['K3']
M1 = data['M1']
M2 = data['M2']
M3 = data['M3']
#Note - Press 'Escape' key to exit img preview and loop further
# img = visualize_keypoints(im2, pts2)
C1 = K1 @ M1
C2 = K2 @ M2
C3 = K3 @ M3
pts_3d, err, vis_pts_list = MultiviewReconstruction(C1, pts1, C2, pts2, C3, pts3)
x_start = pts_3d.flatten()
        x_optimized_obj = minimize(MultiviewReconstructionError, x_start, args=(C1, pts1, C2, pts2, C3, pts3, vis_pts_list), method='Powell')
print("x_end shape is", x_optimized_obj.x.shape)
x_optimized = x_optimized_obj.x
P_final = x_optimized
P_shape_req = int(P_final.shape[0]/3)
P_final = np.reshape(P_final, newshape=(P_shape_req,3))
plot_3d_keypoint(P_final)
pts_3d_video.append(P_final)
visualize_keypoints(im1, pts1, Threshold=200)
plot_3d_keypoint_video(pts_3d_video)
out_dir = "/home/sush/CMU/Assignment_Sem_1/CV_A/Assignment_4/code/outputs"
check_and_create_directory(out_dir, create=1)
np.savez_compressed(
os.path.join(out_dir, 'q6_1.npz'),
P_final)