Visual object trackers are commonly used in computer vision applications. The goal of a tracker is to estimate the location of a target of interest in each frame of an image sequence. The literature offers plenty of trackers, some of which are based on convolutional neural networks (CNNs). It has been shown that while the features in the shallow layers of a network are suitable for tracking, those in the deep layers are more suitable for classification tasks. In addition to neural features, some trackers utilize histogram of oriented gradients (HOG) features as well. Among the most successful trackers that use such mixed features are LADCF and MFT. Similarly, SiamRPN uses Siamese networks together with a region proposal subnetwork. Other trackers utilize discriminatively trained correlation filters (DCF). Classical methods such as MOSSE and KCF use fast correlation in the frequency domain.
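To make the frequency-domain idea concrete, here is a minimal numpy sketch of the correlation theorem those methods exploit. The function name and setup are mine for illustration; real MOSSE/KCF trackers additionally learn a filter instead of correlating a raw template.

import numpy as np

def fft_correlate(patch, template):
    # Correlation theorem: spatial correlation equals an element-wise
    # product in the Fourier domain, which is much faster for large patches.
    F = np.fft.fft2(patch)
    G = np.fft.fft2(template, s=patch.shape)  # zero-pad template to patch size
    response = np.real(np.fft.ifft2(F * np.conj(G)))
    # The response peak marks the most likely target location (row, col).
    return np.unravel_index(np.argmax(response), response.shape)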
Although tracking proceeds smoothly in ideal cases, difficult conditions such as camera motion, illumination change, occlusion, motion change, and scale variation can disrupt it. Advanced trackers can handle these difficulties to some extent.
In this article, my goal is to design a simple and lightweight tracker based on template matching, Kalman filtering, and adaptive scaling. In the first step, the tracker initiates the track via its 'init' function, with the initial bounding box supplied by an object detector. While this bounding box corresponds to the track gate, a larger gate, the acquisition gate, surrounds it, and the search is performed only within the acquisition gate. The raw location obtained there is fed into the Kalman filter. At each frame, the template matching procedure also yields a correlation value. When the object moves, this correlation value usually drops, mostly because of scale changes. When it falls below a certain threshold, the algorithm executes an adaptive scale search: it tries a smaller and a larger version of the tracked pattern (selected_img), and if one of them produces a higher correlation, the track continues with that version. Note that the pattern is also updated during this adaptation. With this brief overview in place, here is the Python implementation of the tracker class.
import cv2
import numpy as np
from kalmanFilter import kalmanFilter
class customTracker():
    def __init__(self):
        self.kernel = np.ones((3,3))
        self.width = 0                      # frame width, set in init()
        self.height = 0                     # frame height, set in init()
        self.tracking_method = cv2.TM_CCOEFF_NORMED
        self.boundingBox = []               # (x, y, w, h) of the track gate
        self.acq_offset = 20                # extra margin of the acquisition gate (pixels)
        self.acq_scale = 1.5                # acquisition gate size relative to the track gate
        self.match_threshold = 0.5          # below this correlation, run the scale search
        self.refresh_threshold = 0.5        # above this correlation, refresh the template
        self.frame_rate = 30
        self.small_image_scale = 0.9        # shrink factor for the scale search
        self.large_image_scale = 1.1        # grow factor for the scale search
        self.acq_pen_color = (255,255,255)
        self.acq_pen_thickness = 2
        self.selected_img = []              # tracked template pattern
        self.found_x = 0
        self.found_y = 0
        self.search_point = (0,0)           # center of the acquisition gate
        self.track_loss = False
        self.kalman_tracker = kalmanFilter(state_dim=4, measurement_dim=2,
                                           dt=1, Q=10, R=1)
        self.lost_box = (0, 0, 20, 20)      # dummy box returned after track loss
    def clear(self):
        self.__init__()
    def init(self, frame, boundingBox):
        # Convert to grayscale and equalize to reduce illumination effects.
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame = cv2.equalizeHist(frame)
        (self.height, self.width) = frame.shape
        self.boundingBox = boundingBox
        (x, y, w, h) = self.boundingBox
        self.selected_img = frame[y:y+h, x:x+w]   # initial template
        self.found_x = x
        self.found_y = y
        self.search_point = (self.found_x + w/2, self.found_y + h/2)
    def update(self, frame):
        frame_new = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame_new = cv2.equalizeHist(frame_new)
        if self.track_loss:
            return False, self.lost_box
        ret = True
        max_val, top_left = self.inner_temp_match(frame_new, self.selected_img, self.boundingBox)
        if max_val < self.match_threshold:
            # Low correlation: try larger and smaller versions of the template.
            rescaled_img_larger, larger_bounding_box = self.adapt_img_size(self.large_image_scale, 0)
            max_val_larger, top_left_larger = self.inner_temp_match(frame_new, rescaled_img_larger, larger_bounding_box)
            rescaled_img_smaller, smaller_bounding_box = self.adapt_img_size(self.small_image_scale, 1)
            max_val_smaller, top_left_smaller = self.inner_temp_match(frame_new, rescaled_img_smaller, smaller_bounding_box)
            print('larger/smaller correlations: ', max_val_larger, max_val_smaller)
            max_val = max(max_val_larger, max_val_smaller)
            if max_val > self.match_threshold:
                if max_val_larger > max_val_smaller:
                    top_left = top_left_larger
                    self.selected_img = rescaled_img_larger
                    self.boundingBox = larger_bounding_box
                    if max_val > self.refresh_threshold:
                        self.refresh_selected_img(frame_new, 0)
                else:
                    top_left = top_left_smaller
                    self.selected_img = rescaled_img_smaller
                    self.boundingBox = smaller_bounding_box
                    if max_val > self.refresh_threshold:
                        self.refresh_selected_img(frame_new, 1)
            else:
                # Neither scale recovers the match: declare track loss.
                ret = False
                self.track_loss = True
                return False, self.lost_box
        print('corr: {:.2f}'.format(max_val))
        (xx, yy, ww, hh) = self.get_acq_box(self.boundingBox)
        (_, _, w, h) = self.boundingBox
        # Smooth the gate-relative measurement with the Kalman filter.
        kalman_res = self.kalman_tracker.predict(np.array(top_left))
        (inner_pos_x, inner_pos_y, vx, vy) = kalman_res.ravel()
        # Convert back to frame coordinates by adding the gate offset.
        self.found_x = int(inner_pos_x + xx)
        self.found_y = int(inner_pos_y + yy)
        self.search_point = (self.found_x + w/2, self.found_y + h/2)
        return ret, (self.found_x, self.found_y, w, h)
    def plot_acq_box(self, frame):
        (xx, yy, ww, hh) = self.get_acq_box(self.boundingBox)
        cv2.rectangle(frame, (xx, yy), (xx + ww, yy + hh), self.acq_pen_color, self.acq_pen_thickness)
        cv2.putText(frame, 'Acquisition Gate', org=(xx-5, yy-5),
                    fontFace=cv2.FONT_HERSHEY_COMPLEX, fontScale=0.5, color=self.acq_pen_color)
    def get_acq_box(self, bounding_box):
        # The acquisition gate is a scaled-up track gate centered on the search point.
        (x, y, w, h) = bounding_box
        ww = int(w*self.acq_scale + self.acq_offset)
        hh = int(h*self.acq_scale + self.acq_offset)
        xx = int(self.search_point[0] - ww/2)
        yy = int(self.search_point[1] - hh/2)
        # Clamp the gate to the frame borders.
        xx = max(0, xx)
        yy = max(0, yy)
        if xx + ww >= self.width:
            xx = self.width - ww - 1
        if yy + hh >= self.height:
            yy = self.height - hh - 1
        return (xx, yy, ww, hh)
    def adapt_img_size(self, img_scale, scale_type):
        # Build a rescaled copy of the template together with a bounding box
        # that keeps it roughly centered on the last found location.
        (x, y, w, h) = self.boundingBox
        w_new = int(w*img_scale)
        h_new = int(h*img_scale)
        if scale_type == 0:     # growing template: shift the box up-left
            residue_x = -(img_scale-1) / 2 * w_new
            residue_y = -(img_scale-1) / 2 * h_new * self.height/self.width
        else:                   # shrinking template: shift the box down-right
            residue_x = (1-img_scale) / 2 * w_new
            residue_y = (1-img_scale) / 2 * h_new * self.height/self.width
        x = int(self.found_x + residue_x)
        y = int(self.found_y + residue_y)
        new_bounding_box = (x, y, w_new, h_new)
        rescaled_img = cv2.resize(self.selected_img, (w_new, h_new))
        return rescaled_img, new_bounding_box
    def inner_temp_match(self, frame, template_img, bounding_box):
        # Search for the template only inside the acquisition gate.
        (xx, yy, ww, hh) = self.get_acq_box(bounding_box)
        frame_new = frame[yy:yy+hh, xx:xx+ww]
        res = cv2.matchTemplate(frame_new, template_img, self.tracking_method)
        _, max_val, _, max_loc = cv2.minMaxLoc(res)
        # max_loc is relative to the acquisition gate, not the full frame.
        return max_val, max_loc
    def refresh_selected_img(self, frame, scale_type):
        # Re-sample the template from the current frame at the adapted scale.
        (x, y, w, h) = self.boundingBox
        self.selected_img = frame[y:y+h, x:x+w]
The tracker class relies on the Kalman filter class, which is available in the following article:
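Since that article covers the filter internals, below is only a minimal constant-velocity sketch that matches the constructor and the predict signature used above. This is my assumption of the interface, not the original implementation, and the class in that article may differ in its details; saving the sketch as kalmanFilter.py satisfies the import at the top of the tracker.

import numpy as np

class kalmanFilter():
    # Assumed constant-velocity filter: state is [x, y, vx, vy],
    # measurement is [x, y]. Sketch only; not the article's original class.
    def __init__(self, state_dim=4, measurement_dim=2, dt=1, Q=10, R=1):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # state transition
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # measurement model
        self.Q = Q * np.eye(state_dim)        # process noise covariance
        self.R = R * np.eye(measurement_dim)  # measurement noise covariance
        self.P = np.eye(state_dim)            # state estimate covariance
        self.x = np.zeros((state_dim, 1))
    def predict(self, measurement):
        # One predict step followed by a correction with the new measurement,
        # matching how customTracker calls predict() with top_left.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        z = np.asarray(measurement, dtype=float).reshape(-1, 1)
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(self.x.shape[0]) - K @ self.H) @ self.P
        return self.x   # 4x1 vector; the caller unpacks it with ravel()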
For multi-object tracking, I also share the code of my custom multi-object tracker below, which simply keeps a list of single-object trackers:
class customMultiTracker():
    def __init__(self):
        self.tracker_list = []
    def add(self, tracker):
        self.tracker_list.append(tracker)
    def clear(self):
        self.tracker_list = []
    def update(self, frame):
        box_list = []
        res = False
        for tracker in self.tracker_list:
            if not tracker.track_loss:
                ret, box = tracker.update(frame)
                if ret:
                    # At least one track is still alive.
                    res = True
                    box_list.append(box)
        return res, box_list
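For completeness, here is how the trackers can be driven in a typical capture loop. The video path and the initial boxes, which would normally come from an object detector, are placeholders:

cap = cv2.VideoCapture('video.mp4')                   # placeholder path
ret, frame = cap.read()
multi_tracker = customMultiTracker()
for box in [(120, 80, 40, 60), (300, 150, 50, 50)]:   # placeholder detections
    tracker = customTracker()
    tracker.init(frame, box)                          # one track per detection
    multi_tracker.add(tracker)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    ok, boxes = multi_tracker.update(frame)
    for (x, y, w, h) in boxes:
        cv2.rectangle(frame, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
    cv2.imshow('tracking', frame)
    if cv2.waitKey(30) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()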
In the future, I'm planning to include HOG features, which perform better than raw template matching under varying illumination conditions. That's all for this article. Enjoy your tracker!