Custom object tracker (CV Project part 4)

Visual object trackers are widely used in computer vision applications. The goal of a tracker is to estimate the location of a target of interest in each frame of an image sequence. There are plenty of trackers in the literature, some of them based on convolutional neural networks (CNNs). It has been shown that features from the shallow layers of a network are well suited to tracking, while those from the deep layers are better suited to classification. In addition to neural features, some trackers utilize histogram of oriented gradients (HOG) features as well; LADCF and MFT are among the most successful trackers that use such mixed features. Similarly, SiamRPN combines a Siamese network with a region proposal subnetwork. Other trackers rely on discriminatively trained correlation filters (DCF), and classical methods such as MOSSE and KCF use fast correlation in the frequency domain.
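
As a quick aside, the frequency-domain trick behind MOSSE-style trackers can be demonstrated in a few lines of numpy. The toy snippet below (my own illustration, not code from any of the trackers above) verifies that circular cross-correlation computed with the FFT matches a direct spatial computation:

import numpy as np

# Toy demonstration: cross-correlation of an image with a template,
# computed for all shifts at once via the FFT.
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
tmpl = rng.standard_normal((64, 64))

# corr[dy, dx] = sum over pixels of img * tmpl circularly shifted by (dy, dx)
corr = np.real(np.fft.ifft2(np.fft.fft2(img) * np.conj(np.fft.fft2(tmpl))))

# Verify one shift against a direct spatial computation
dy, dx = 5, 9
direct = np.sum(img * np.roll(tmpl, (dy, dx), axis=(0, 1)))
assert np.isclose(corr[dy, dx], direct)
print('FFT correlation matches direct computation:', corr[dy, dx])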

Although tracking is maintained successfully under ideal conditions, difficulties such as camera motion, illumination change, occlusion, abrupt motion, and scale variation can disrupt the track. Advanced trackers can handle these difficulties to some extent.

In this article, our goal is to design a simple and lightweight tracker based on template matching, Kalman filtering, and adaptive scaling. In the first step, the tracker initiates the track using the ‘init’ function, where the initial bounding box is supplied by an object detector. While this bounding box corresponds to the track gate, a larger gate, the acquisition gate, defines the region in which the search is performed. The raw location obtained from the search is fed into the Kalman filter. At each frame, the template matching procedure also produces a correlation value. When the object moves, this value usually drops, mostly because of scale changes. When it falls below a certain threshold, the algorithm executes an adaptive scale search: it tries a smaller and a larger version of the tracked pattern (selected_img), and if one of them produces a higher correlation value, the track continues with that version. Note that the pattern is also updated during this adaptation. With this brief overview in place, here is the class implementation of the tracker, written in Python.

import cv2
import numpy as np
from kalmanFilter import kalmanFilter

class customTracker():
    def __init__(self):
        self.kernel = np.ones((3,3))
        self.width = 0                          # frame width, set in init()
        self.height = 0                         # frame height, set in init()
        self.tracking_method = cv2.TM_CCOEFF_NORMED
        self.boundingBox = []                   # track gate (x, y, w, h)
        self.acq_offset = 20                    # acquisition gate margin in pixels
        self.acq_scale = 1.5                    # acquisition gate size relative to track gate
        self.match_threshold = 0.5              # below this correlation, try adaptive scaling
        self.refresh_threshold = 0.5            # above this correlation, refresh the template
        self.frame_rate = 30
        self.small_image_scale = 0.9            # shrink factor for adaptive scale search
        self.large_image_scale = 1.1            # growth factor for adaptive scale search
        self.acq_pen_color = (255,255,255)
        self.acq_pen_thickness = 2
        self.selected_img = []                  # tracked pattern (template)
        self.found_x = 0
        self.found_y = 0
        self.search_point = (0,0)               # center of the acquisition gate
        self.track_loss = False
        self.kalman_tracker = kalmanFilter(state_dim=4, measurement_dim=2,
                                         dt=1, Q=10, R=1)
        self.lost_box = (0, 0, 20, 20)          # dummy box returned after track loss
    
    def clear(self):
        self.__init__()

    def init(self, frame, boundingBox):
        # Preprocess: grayscale + histogram equalization for illumination robustness
        frame = cv2.cvtColor(frame,cv2.COLOR_BGR2GRAY)
        frame = cv2.equalizeHist(frame)
        (self.height, self.width) = frame.shape
        self.boundingBox = boundingBox
        (x,y,w,h) = self.boundingBox
        self.selected_img = frame[y:y+h,x:x+w]  # crop the initial template
        self.found_x = x
        self.found_y = y
        self.search_point = (self.found_x+w/2, self.found_y + h/2)
        
        
    def update(self, frame):
        # Same preprocessing as in init()
        frame_new = cv2.cvtColor(frame,cv2.COLOR_BGR2GRAY)
        frame_new = cv2.equalizeHist(frame_new)
        if self.track_loss:
            return False, self.lost_box
        ret = True
        # Match the current template inside the acquisition gate
        max_val, top_left = self.inner_temp_match(frame_new, self.selected_img, self.boundingBox)

        if max_val < self.match_threshold:
            # Correlation dropped: run the adaptive scale search with a larger
            # and a smaller version of the tracked pattern
            rescaled_img_larger, larger_bounding_box = self.adapt_img_size(self.large_image_scale, 0)
            max_val_larger, top_left_larger = self.inner_temp_match(frame_new, rescaled_img_larger, larger_bounding_box)

            rescaled_img_smaller, smaller_bounding_box = self.adapt_img_size(self.small_image_scale, 1)
            max_val_smaller, top_left_smaller = self.inner_temp_match(frame_new, rescaled_img_smaller, smaller_bounding_box)

            print('corr (larger, smaller): ', max_val_larger, max_val_smaller)
            max_val = max(max_val_larger, max_val_smaller)

            if max_val > self.match_threshold:
                # Continue the track with whichever scale matched better
                if max_val_larger > max_val_smaller:
                    top_left = top_left_larger
                    self.selected_img = rescaled_img_larger
                    self.boundingBox = larger_bounding_box
                    if max_val > self.refresh_threshold:
                        self.refresh_selected_img(frame_new, 0)
                else:
                    top_left = top_left_smaller
                    self.selected_img = rescaled_img_smaller
                    self.boundingBox = smaller_bounding_box
                    if max_val > self.refresh_threshold:
                        self.refresh_selected_img(frame_new, 1)
            else:
                # Neither scale recovered the target: declare track loss
                self.track_loss = True
                return False, self.lost_box

        print('corr: {:.2f}'.format(max_val))

        (xx,yy,ww,hh) = self.get_acq_box(self.boundingBox)
        (_,_,w,h) = self.boundingBox

        # Smooth the raw match position (relative to the acquisition gate)
        # with the Kalman filter
        kalman_res = self.kalman_tracker.predict(np.array(top_left))
        (inner_pos_x, inner_pos_y, vx, vy) = kalman_res.ravel()

        # Convert the filtered position back to full-frame coordinates
        self.found_x = inner_pos_x + xx
        self.found_y = inner_pos_y + yy
        self.search_point = (self.found_x + w/2, self.found_y + h/2)
        return ret, (self.found_x, self.found_y, w, h)


    def plot_acq_box(self,frame):
        (xx,yy,ww,hh) = self.get_acq_box(self.boundingBox)
        cv2.rectangle(frame, (xx, yy), (xx + ww, yy + hh), self.acq_pen_color, self.acq_pen_thickness)
        cv2.putText(frame,'Acquisition Gate', org=(xx-5,yy-5),
                    fontFace=cv2.FONT_HERSHEY_COMPLEX,fontScale=0.5,color=self.acq_pen_color)

        
    def get_acq_box(self, bounding_box):
        # The acquisition gate is a scaled and padded version of the track
        # gate, centered on the current search point and clamped to the frame
        (x,y,w,h) = bounding_box
        ww = int(w*self.acq_scale + self.acq_offset)
        hh = int(h*self.acq_scale + self.acq_offset)

        xx = int(self.search_point[0] - ww/2)
        yy = int(self.search_point[1] - hh/2)

        xx = max(0,xx)
        yy = max(0,yy)

        if xx + ww >= self.width:
            xx = self.width - ww - 1

        if yy + hh >= self.height:
            yy = self.height - hh - 1

        return (xx,yy,ww,hh)
    
    def adapt_img_size(self, img_scale, scale_type):
        # Resize the tracked pattern and shift the bounding box so that the
        # rescaled gate stays centered on the target
        # (scale_type 0 = enlarge, 1 = shrink)
        (x,y,w,h) = self.boundingBox
        w_new = int(w*img_scale)
        h_new = int(h*img_scale)

        if scale_type == 0:
            residue_x = - (img_scale-1) / 2 * w_new
            residue_y = - (img_scale-1) / 2 * h_new * self.height/self.width
        else:
            residue_x = (1-img_scale) / 2 * w_new
            residue_y = (1-img_scale) / 2 * h_new * self.height/self.width

        x = int(self.found_x + residue_x)
        y = int(self.found_y + residue_y)

        new_bounding_box = (x,y,w_new,h_new)
        rescaled_img = cv2.resize(self.selected_img,(w_new,h_new))
        return rescaled_img, new_bounding_box
    
    
    def inner_temp_match(self, frame, template_img, bounding_box):
        # Search for the template only inside the acquisition gate
        (xx,yy,ww,hh) = self.get_acq_box(bounding_box)
        frame_new = frame[yy:yy+hh, xx:xx+ww]
        res = cv2.matchTemplate(frame_new, template_img, self.tracking_method)
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
        top_left = max_loc  # best-match position, relative to the gate
        return max_val, top_left

    def refresh_selected_img(self, frame, scale_type):
        # Re-crop the template from the current frame at the adapted
        # bounding box (scale_type is kept for interface symmetry)
        (x,y,w,h) = self.boundingBox
        self.selected_img = frame[y:y+h,x:x+w]
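
For completeness, here is a minimal sketch of how the tracker could be driven in a video loop. The video file name and the initial bounding box are placeholders; in the full project, the initial box comes from the object detector.

# Hypothetical driver loop: 'video.mp4' and the initial box are placeholders;
# in practice the box comes from an object detector.
cap = cv2.VideoCapture('video.mp4')
ret, frame = cap.read()

tracker = customTracker()
tracker.init(frame, (100, 80, 40, 60))   # (x, y, w, h) from the detector

while True:
    ret, frame = cap.read()
    if not ret:
        break
    ok, (x, y, w, h) = tracker.update(frame)
    if ok:
        cv2.rectangle(frame, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
        tracker.plot_acq_box(frame)
    cv2.imshow('tracking', frame)
    if cv2.waitKey(int(1000 / tracker.frame_rate)) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()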

The tracker class uses the Kalman filter class, which is available in the following article:
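
Since the link points to the original article, here is only a rough sketch of what a compatible constant-velocity Kalman filter could look like. The interface below (a ‘predict’ method that takes the new measurement, runs the predict and update steps, and returns the four-element state) is my assumption based on how the class is called above; see the linked article for the actual implementation.

import numpy as np

# Hypothetical stand-in for the kalmanFilter class used above
# (constant-velocity model; the real implementation is in the linked article).
class kalmanFilter:
    def __init__(self, state_dim=4, measurement_dim=2, dt=1, Q=10, R=1):
        # State: [x, y, vx, vy]; measurement: [x, y]
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)   # transition matrix
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)    # measurement matrix
        self.Q = Q * np.eye(state_dim)                    # process noise
        self.R = R * np.eye(measurement_dim)              # measurement noise
        self.P = np.eye(state_dim)                        # state covariance
        self.x = np.zeros((state_dim, 1))                 # state estimate

    def predict(self, measurement):
        # Predict step
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update step with the new measurement
        z = np.asarray(measurement, dtype=float).reshape(-1, 1)
        y = z - self.H @ self.x                           # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(self.P.shape[0]) - K @ self.H) @ self.P
        return self.x                                     # 4x1; ravel() -> (x, y, vx, vy)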

For multi-object tracking, I also share the code of my custom multi-object tracker below:

class customMultiTracker():

    def __init__(self):
        self.tracker_list = []

    def add(self, tracker):
        self.tracker_list.append(tracker)

    def clear(self):
        self.tracker_list = []

    def update(self,frame):
        box_list = []

        # res becomes True if at least one tracker is still following its target
        res = False
        for tracker in self.tracker_list:
            if tracker.track_loss == False:
                ret, box = tracker.update(frame)
                if ret:
                    res = True
                    box_list.append(box)

        return res, box_list
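
A minimal usage sketch follows: one customTracker per detected object, registered with the multi-tracker. Again, the video file name and the initial boxes are placeholders.

# Hypothetical usage: 'video.mp4' and the two initial boxes are placeholders.
cap = cv2.VideoCapture('video.mp4')
ret, frame = cap.read()

multi_tracker = customMultiTracker()
for box in [(100, 80, 40, 60), (220, 150, 50, 50)]:
    tracker = customTracker()
    tracker.init(frame, box)
    multi_tracker.add(tracker)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    ok, boxes = multi_tracker.update(frame)
    for (x, y, w, h) in boxes:
        cv2.rectangle(frame, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
    cv2.imshow('multi-tracking', frame)
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()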

In the future, I’m planning to include HOG features, which perform better than template matching under varying illumination conditions. That’s all for this article. Enjoy your tracker!

