3D positioning using a single camera (CV Project part 3)

In this post, I use only one camera to determine the 3D position of a target whose size is known a priori. The extracted position is expressed in the camera coordinate system. The first step is a calibration stage: I photograph a known object from distances of 1, 2, …, 5 meters. The actual width of the red object is 0.082 meters, and I fix the camera resolution to 720p. In each photo, I measure the object's width in pixels. The procedure can be seen in the following figure.

Width of the object (pixels) at different distances
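The post does not say how the pixel widths were read off the photos, so the following is only a rough sketch of one way to automate that step. It measures the horizontal extent of strongly red pixels in an RGB frame. The helper name red_object_width_px, the colour thresholds, and the synthetic test frame are all assumptions made for illustration, not values from the calibration.

```python
import numpy as np

def red_object_width_px(image_rgb, min_red=150, max_other=80):
    """Return the horizontal extent (in pixels) of strongly red pixels, or 0 if none.

    Hypothetical helper: the thresholds are illustrative and would need tuning
    for real photos and lighting.
    """
    r = image_rgb[..., 0]
    g = image_rgb[..., 1]
    b = image_rgb[..., 2]
    mask = (r >= min_red) & (g <= max_other) & (b <= max_other)
    # Columns that contain at least one red pixel
    columns = np.flatnonzero(mask.any(axis=0))
    if columns.size == 0:
        return 0
    return int(columns[-1] - columns[0] + 1)

# Synthetic 720p frame (NOT a real photo): grey background, red rectangle 44 px wide.
frame = np.full((720, 1280, 3), 128, dtype=np.uint8)
frame[300:420, 600:644] = (220, 30, 30)
width_px = red_object_width_px(frame)
print(width_px)
```

A real photo would likely need a more robust colour mask (for example in HSV space) before the extent is measured.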

I then plot the number of pixels against the distance, as follows.

Distance vs number of pixels

Since the plot shows an exponential-like decay, I fit an exponential function to the data. The resulting fitted curve can be seen in the same figure. The function that I use has the following form:

p = alpha * exp(-beta * d^gamma)

d: distance
p: number of pixels
alpha, beta, gamma: unknown coefficients

The result of the fitting operation gives the following numbers for the unknown coefficients:

alpha: 329
beta: 2
gamma: 0.4
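The measured pixel widths and the fitting routine are not listed in the post, so the following is only a sketch of one way such coefficients could be obtained. It relies on the observation that, for a fixed gamma, log(p) = log(alpha) - beta * d^gamma is a straight line in d^gamma, so a scan over gamma combined with an ordinary line fit is enough. The samples are synthetic, generated from the coefficients above, and the function name fit_stretched_exponential and the gamma grid are assumptions for illustration.

```python
import numpy as np

def fit_stretched_exponential(d, p, gammas=None):
    """Fit p = alpha * exp(-beta * d**gamma) by scanning gamma (illustrative helper)."""
    if gammas is None:
        gammas = np.linspace(0.1, 1.0, 91)  # assumed search grid, step 0.01
    d = np.asarray(d, dtype=float)
    log_p = np.log(np.asarray(p, dtype=float))
    best = None
    for gamma in gammas:
        x = d ** gamma
        # For fixed gamma: log p = log(alpha) - beta * x, a straight line in x
        slope, intercept = np.polyfit(x, log_p, 1)
        residual = np.sum((slope * x + intercept - log_p) ** 2)
        if best is None or residual < best[0]:
            best = (residual, np.exp(intercept), -slope, gamma)
    _, alpha, beta, gamma = best
    return alpha, beta, gamma

# Synthetic samples generated from the post's coefficients, NOT real measurements.
distances = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pixels = 329 * np.exp(-2 * distances ** 0.4)

alpha, beta, gamma = fit_stretched_exponential(distances, pixels)
print(alpha, beta, gamma)
```

Because the line fit happens in log space, it weights the samples differently from a direct least-squares fit on p; a general nonlinear fitter such as scipy.optimize.curve_fit would be another option.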

Next, we invert the function so that it returns the distance for a given number of pixels (log is the natural logarithm):

d = (-1/beta*log(p/alpha))^(1/gamma)

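The fit holds for the reference object, whose width is 0.082 meters. For a target of a different known width, we multiply the result by the ratio (real object width/reference object width) to get the approximate distance, since for the same width in pixels the distance scales roughly in proportion to the real width. The final formula becomes: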

d = (-1/beta*log(p/alpha))^(1/gamma)*(real object width/reference object width)
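As a quick sanity check on the inversion, here is a minimal sketch that implements the fitted curve and its inverse and verifies that they round-trip. The function names and the range check are my own additions for illustration; the formula is only meaningful for 0 < p < alpha, where the logarithm is negative.

```python
import numpy as np

ALPHA, BETA, GAMMA = 329.0, 2.0, 0.4   # coefficients from the fit
REFERENCE_WIDTH_M = 0.082              # width of the calibration object in meters

def pixels_from_distance(d):
    """Fitted curve: width in pixels of the reference object at distance d (meters)."""
    return ALPHA * np.exp(-BETA * d ** GAMMA)

def distance_from_pixels(pixels, real_width_m=REFERENCE_WIDTH_M):
    """Inverse of the fitted curve, rescaled for a target of known real width."""
    if not 0 < pixels < ALPHA:
        raise ValueError("pixels must lie in (0, alpha) for the inverse to be defined")
    d_reference = (-np.log(pixels / ALPHA) / BETA) ** (1.0 / GAMMA)
    return d_reference * real_width_m / REFERENCE_WIDTH_M

p = pixels_from_distance(3.0)
print(distance_from_pixels(p))                       # reference-sized object
print(distance_from_pixels(p, real_width_m=0.164))   # object twice as wide
```

An object twice as wide as the reference that covers the same number of pixels is placed twice as far away, which is the width-ratio scaling in the final formula.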

We also need the x and y coordinates of the object of interest in order to localize it in 3D space. If we know the pixel location of the object in the image (i,j), with i the column and j the row, we can convert it into x and y coordinates using the following well-known formulae:

x = depth * (i/(width-1)-0.5)*tan(fh/2) 
y = depth * (0.5-j/(height-1))*tan(fv/2) 

width: width of the image in pixels
height: height of the image in pixels
fh: horizontal FOV of the camera in radians
fv: vertical FOV of the camera in radians
depth: equivalent to d which we already determined
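The two formulae can be wrapped in a small standalone helper, sketched below. The function name pixel_to_camera_xyz and its argument order are assumptions for illustration; the 1280x720 image size and the 100 and 70 degree fields of view match the values used later in the class. Note that under an ideal pinhole model a pixel at the image edge lies at depth * tan(fov/2), which would put a factor of 2 in front of the normalized offset; the sketch keeps the formulae exactly as written above, so treat the absolute scale of x and y with some care.

```python
import math

def pixel_to_camera_xyz(i, j, depth, width, height, fov_h_deg, fov_v_deg):
    """Map pixel (column i, row j) and a depth in meters to camera-frame (x, y, z)."""
    fh = math.radians(fov_h_deg)
    fv = math.radians(fov_v_deg)
    # Offsets are measured from the image center; x grows right, y grows up
    x = depth * (i / (width - 1) - 0.5) * math.tan(fh / 2)
    y = depth * (0.5 - j / (height - 1)) * math.tan(fv / 2)
    return x, y, depth

# Center of a 720p frame maps to the optical axis; the right edge has positive x.
center = pixel_to_camera_xyz(639.5, 359.5, 2.0, 1280, 720, 100, 70)
right_edge = pixel_to_camera_xyz(1279, 359.5, 2.0, 1280, 720, 100, 70)
print(center, right_edge)
```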

With the details covered, here is the full Python localization class:

from kalmanFilter import kalmanFilter
import numpy as np

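class localization():
    def __init__(self, frame_rate, width, height):
        # Image size in pixels and camera frame rate in frames per second
        self.width = width
        self.height = height
        self.frame_rate = frame_rate
        # Kalman filter over the state (x, y, z, vx, vy, vz), fed with (x, y, z) measurements
        self.kalman_speed = kalmanFilter(state_dim=6, measurement_dim=3,
                                         dt=1/self.frame_rate, Q=0.001, R=50)
        # Scale factor applied to the bounding-box width before the distance lookup
        self.reduction_rate = 0.8
        # Horizontal and vertical field of view in degrees
        self.fov_h = 100
        self.fov_v = 70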
    
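    def predict(self, boundingBox, real_length):
        # boundingBox is (x, y, w, h) in pixels, with (x, y) the top-left corner;
        # real_length is the real width of the target in meters
        (x,y,w,h) = boundingBox
        # Depth from the scaled bounding-box width
        depth = self.calculate_dist(w*self.reduction_rate, real_length)
        # Center of the bounding box in pixel coordinates
        loc = (x+w/2,y+h/2)
        (px,py,pz) = self.calx_xyz(depth, loc)
        # Filter the raw position; the filter returns position and velocity
        ret = self.kalman_speed.predict(np.array([px,py,pz]))
        (kx,ky,kz,kvx,kvy,kvz) = ret.ravel()
        pos = (kx,ky,kz)
        vel = (kvx,kvy,kvz)
        return pos, vel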
    
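    def calculate_dist(self, pixels, real_length):
        # Coefficients of the fitted curve p = alpha * exp(-beta * d^gamma)
        alpha = 329
        beta = 2
        gamma = 0.4

        # Invert the curve, then rescale by real width / reference width (0.082 m).
        # Only meaningful for 0 < pixels < alpha.
        return (-1/beta*np.log(pixels/alpha))**(1/gamma) * real_length/0.082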
    
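    def calx_xyz(self, depth, search_point):
        # search_point is (i, j): column and row of the target in pixels
        (i,j) = search_point
        # Field of view converted from degrees to radians
        fh = self.fov_h/180*np.pi
        fv = self.fov_v/180*np.pi

        # x grows to the right and y grows upward, both relative to the image center
        px = depth * (i/(self.width-1)-0.5)*np.tan(fh/2)
        py = depth * (0.5-j/(self.height-1))*np.tan(fv/2)
        pz = depth
        return (px,py,pz)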

Please note that the class requires the kalmanFilter which I have already shared in the following link:
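For readers who want to try the class before fetching that module, below is a hypothetical stand-in. It is not the kalmanFilter shared in the linked post, only a minimal constant-velocity Kalman filter that accepts the same constructor arguments and exposes a predict method returning the full state, which is what the localization class expects. Treating Q and R as scalar multiples of the identity, the initial state and covariance, and the 30 fps frame rate in the usage lines are all assumptions.

```python
import numpy as np

class kalmanFilter:
    """Hypothetical stand-in: constant-velocity Kalman filter (not the original module)."""

    def __init__(self, state_dim, measurement_dim, dt, Q, R):
        n, m = state_dim, measurement_dim
        if n != 2 * m:
            raise ValueError("this sketch assumes state = [position, velocity]")
        self.F = np.eye(n)
        self.F[:m, m:] = dt * np.eye(m)                     # position += velocity * dt
        self.H = np.hstack([np.eye(m), np.zeros((m, m))])   # only position is measured
        self.Q = Q * np.eye(n)                              # process noise (assumed isotropic)
        self.R = R * np.eye(m)                              # measurement noise (assumed isotropic)
        self.x = np.zeros((n, 1))                           # assumed initial state
        self.P = np.eye(n)                                  # assumed initial covariance

    def predict(self, measurement):
        """Run one time update plus one measurement update; return the state as (n, 1)."""
        z = np.asarray(measurement, dtype=float).reshape(-1, 1)
        # Time update
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Measurement update
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(self.x.shape[0]) - K @ self.H) @ self.P
        return self.x.copy()

# Same arguments as in the localization class, assuming a 30 fps camera.
kf = kalmanFilter(state_dim=6, measurement_dim=3, dt=1/30, Q=0.001, R=50)
state = kf.predict(np.array([0.1, 0.0, 2.0]))
pos, vel = state.ravel()[:3], state.ravel()[3:]
```

With R much larger than Q, as in the class, the filter trusts its own prediction far more than each new measurement, so the output position follows the raw measurements slowly but smoothly.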

