Introduction
Depth cameras are increasingly popular in building automation, occupancy management, security and access control, enabled by low-cost depth sensors like the OPT8241 and OPT8320 3D Time-of-Flight chipsets from Texas Instruments. A key benefit of depth cameras is the ability to use depth to separate the foreground from the background. Once foreground objects are isolated, they can be recognized, tracked and counted using modern image processing algorithms available in OpenCV. In this post, I will describe how to use the OPT8241-CDK-EVM depth camera, the BSD-licensed Voxel SDK, and OpenCV to create a simple people counting and tracking application.
The general strategy of people counting and tracking is as follows:
- Foreground-Background Separation
- Convert to Binary Image and Apply Morphology Filters
- Shape Analysis
- Tracking
Foreground-Background Separation
Foreground-background separation starts with registering the background, which is necessary before the foreground can be separated from the background through image subtraction. With a depth camera, the subtraction is between two depth images. Setting the background can be as simple as capturing a frame when the scene contains no foreground objects. The drawback of this simple approach is that background objects that subsequently move will be detected as foreground, though noticing the initial change may be of interest to some applications. A more sophisticated approach is to slowly fade an alteration into the background if the alteration is not caused by an object being tracked and is no longer changing. The first condition requires recognition; the second is met when the altered area of the image shows a sustained period of no change. Under this approach, the foreground is defined as the fast-changing component of the scene, and the background as the slow-changing component. The rate at which the foreground fades into the background should be a programmable parameter that depends on the type of application. After subtraction, the remaining differences come from objects that have newly appeared or disappeared. To reduce the impact of camera noise, the “foreground” may need to be further qualified by a minimum delta depth (“thickness”) and a minimum amplitude (“brightness”).
The code example below illustrates a simple case of foreground-background separation:
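void Horus::clipBackground(Mat &dMat, Mat &iMat, float dThr, float iThr)
{
    // dMat and iMat are assumed to be single-channel float (CV_32FC1) images
    for (int i = 0; i < dMat.rows; i++) {
        for (int j = 0; j < dMat.cols; j++) {
            // Keep a pixel only if it passes both the amplitude and the depth threshold
            float val = (iMat.at<float>(i,j) > iThr && dMat.at<float>(i,j) > dThr) ? 255.0 : 0.0;
            dMat.at<float>(i,j) = val;
        }
    }
}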
where iThr is the intensity threshold, and dThr is the depth threshold.
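The slow fade-in of the background described earlier can be approximated with an exponential running average. The sketch below is only an illustration under my own assumptions: it works on flat row-major arrays rather than cv::Mat, the function names fadeIntoBackground and foregroundMask are hypothetical (they are not part of the Voxel SDK or OpenCV), and it skips the explicit "no longer changing" test by relying on a small alpha to absorb only persistent changes.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: blends the current depth frame into the background.
//   background : background depth model, updated in place
//   depth      : current depth frame, same size as background
//   tracked    : nonzero where a tracked object covers the pixel
//   alpha      : fade-in rate in [0, 1]; 0 freezes the background
void fadeIntoBackground(std::vector<float> &background,
                        const std::vector<float> &depth,
                        const std::vector<unsigned char> &tracked,
                        float alpha)
{
    for (std::size_t k = 0; k < background.size(); k++) {
        if (tracked[k]) continue;  // never absorb tracked objects
        background[k] = (1.0f - alpha) * background[k] + alpha * depth[k];
    }
}

// Hypothetical helper: marks a pixel as foreground (255) when it differs
// from the background by more than dThr in depth ("thickness") and its
// amplitude exceeds iThr ("brightness").
std::vector<unsigned char> foregroundMask(const std::vector<float> &background,
                                          const std::vector<float> &depth,
                                          const std::vector<float> &amplitude,
                                          float dThr, float iThr)
{
    std::vector<unsigned char> mask(depth.size(), 0);
    for (std::size_t k = 0; k < depth.size(); k++) {
        bool thick  = std::fabs(background[k] - depth[k]) > dThr;
        bool bright = amplitude[k] > iThr;
        mask[k] = (thick && bright) ? 255 : 0;
    }
    return mask;
}
```

Here alpha plays the role of the programmable fade rate: a larger value lets a moved chair disappear into the background sooner, at the cost of also absorbing people who stand still.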
Binary Image and Morphology Filter
The foreground from subtraction may contain speckles due to noise, as noise varies from frame to frame. Morphology operators can be applied to remove speckles and fill in small gaps. The open operator first erodes the image using the chosen morphology element, then dilates the result to fill in the gaps and smooth the edges. The OpenCV example is given below, where the image on the left is the original image, and the image on the right is the result after applying the open operator. Note that the small hole and gaps are filled in.
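To make the erode-then-dilate order concrete, here is a minimal sketch of the open operator on a binary image with a square element. It is written against a small hypothetical Binary struct instead of cv::Mat so that it stands alone; in OpenCV the same operation is done with morphologyEx. The choice to ignore pixels outside the image is my assumption.

```cpp
#include <vector>

// Hypothetical minimal binary image: pixels are 0 or 255, row-major.
struct Binary {
    int rows, cols;
    std::vector<unsigned char> px;
    unsigned char at(int r, int c) const { return px[r * cols + c]; }
};

// Applies a (2*radius+1)-wide square min filter (erode) or max filter
// (dilate). Pixels outside the image are ignored.
static Binary squareFilter(const Binary &in, int radius, bool erode)
{
    Binary out = in;
    for (int r = 0; r < in.rows; r++) {
        for (int c = 0; c < in.cols; c++) {
            unsigned char v = erode ? 255 : 0;
            for (int dr = -radius; dr <= radius; dr++) {
                for (int dc = -radius; dc <= radius; dc++) {
                    int rr = r + dr, cc = c + dc;
                    if (rr < 0 || rr >= in.rows || cc < 0 || cc >= in.cols) continue;
                    unsigned char n = in.at(rr, cc);
                    v = erode ? (n < v ? n : v) : (n > v ? n : v);
                }
            }
            out.px[r * in.cols + c] = v;
        }
    }
    return out;
}

// Open = erode, then dilate with the same element.
Binary morphOpen(const Binary &in, int radius)
{
    return squareFilter(squareFilter(in, radius, true), radius, false);
}
```

With radius 1, an isolated single-pixel speckle does not survive the erosion, while a 3x3 block shrinks to its center pixel and is then restored by the dilation.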
Shape Analysis
After the foreground is isolated as a binary image, shape analysis can be performed to find individual objects in the foreground. This step is where people counting solutions vary: algorithms that differentiate people from objects with high accuracy are considered superior to those that do not. People tracking algorithms depend heavily on camera angles. Algorithms for ceiling-mounted cameras are generally simpler than those for corner-mounted cameras because from the ceiling “people” look like well-formed blobs, but from the corner “people” become complex overlapping silhouettes which are harder to separate. Several shape analysis algorithms useful in people tracking and counting are described below. Most of them are available in OpenCV.
Blob Analysis
Blob analysis works by connecting joined, self-enclosing regions in the foreground sharing common properties such as area, thresholds, circularity, inertia and convexity. Proper selection of these properties can greatly enhance accuracy. A great summary article on blob analysis with example code is available from Satya Mallick.
Blob analysis works best when the camera is ceiling mounted, because people will generally look like well-formed blobs from that camera angle. However, people in physical contact with one another can cause their blobs to join, leading to a miscount. The erode operator is useful in this case, as it can split thinly connected blobs. Even though blob analysis is a natural fit for ceiling-mounted cameras, it can be applied to corner-mounted cameras if the overlapping issue can be resolved. One way to deal with overlap is to “slice” the observed volume along the camera’s z-axis and perform blob analysis one “slice” at a time.
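The slicing idea can be sketched as follows. This is a rough illustration rather than the method used in my application: it assumes a row-major depth array, uses a plain 4-connected flood fill in place of a full blob detector, and the function names and the minArea filter are hypothetical.

```cpp
#include <cmath>
#include <queue>
#include <vector>

// Counts 4-connected blobs of at least minArea pixels among the pixels
// whose depth lies in [zLo, zHi). depth is row-major, rows x cols.
int countBlobsInSlice(const std::vector<float> &depth, int rows, int cols,
                      float zLo, float zHi, int minArea)
{
    std::vector<char> visited(depth.size(), 0);
    auto inSlice = [&](int idx) { return depth[idx] >= zLo && depth[idx] < zHi; };
    int blobs = 0;
    for (int start = 0; start < rows * cols; start++) {
        if (visited[start] || !inSlice(start)) continue;
        int area = 0;
        std::queue<int> q;  // breadth-first flood fill from this pixel
        q.push(start);
        visited[start] = 1;
        while (!q.empty()) {
            int idx = q.front();
            q.pop();
            area++;
            int r = idx / cols, c = idx % cols;
            const int nr[4] = {r - 1, r + 1, r, r};
            const int nc[4] = {c, c, c - 1, c + 1};
            for (int k = 0; k < 4; k++) {
                if (nr[k] < 0 || nr[k] >= rows || nc[k] < 0 || nc[k] >= cols) continue;
                int n = nr[k] * cols + nc[k];
                if (!visited[n] && inSlice(n)) { visited[n] = 1; q.push(n); }
            }
        }
        if (area >= minArea) blobs++;
    }
    return blobs;
}

// Returns one blob count per slice of thickness dz between zMin and zMax.
std::vector<int> countBlobsPerSlice(const std::vector<float> &depth, int rows,
                                    int cols, float zMin, float zMax, float dz,
                                    int minArea)
{
    std::vector<int> counts;
    int slices = static_cast<int>(std::ceil((zMax - zMin) / dz));
    for (int s = 0; s < slices; s++) {
        float lo = zMin + s * dz;
        counts.push_back(countBlobsInSlice(depth, rows, cols, lo, lo + dz, minArea));
    }
    return counts;
}
```

Two people who touch in the image but stand at different distances from the camera form one blob in the flat binary mask, yet show up as one blob each in separate slices. A person who straddles a slice boundary appears in both slices, so the per-slice counts still need to be reconciled before they are summed.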
Contour Analysis
Foreground shapes can also be recognized and tracked by contours. A contour is a list of points that form a self-enclosing outline of a foreground object. A contour has a length and an enclosed area. A point in the image can be inside or outside a contour, and a contour can be nested inside another, but contours do not cross each other. Contours can be compared for similarity. With these properties set to reflect those of a “person”, the number of qualifying contours in the foreground becomes a people counter.
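As a small illustration of counting by contour properties, the sketch below computes the enclosed area and the outline length of a closed contour and counts the contours whose area falls inside a range. The area-only test and the names used here are my simplifying assumptions; a real “person” test would likely combine several properties.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vertex { double x, y; };
typedef std::vector<Vertex> Contour;

// Enclosed area of a closed contour (shoelace formula).
double enclosedArea(const Contour &c)
{
    double twice = 0.0;
    for (std::size_t i = 0; i < c.size(); i++) {
        const Vertex &a = c[i];
        const Vertex &b = c[(i + 1) % c.size()];
        twice += a.x * b.y - b.x * a.y;
    }
    return std::fabs(twice) / 2.0;
}

// Length of the closed outline, including the closing edge.
double outlineLength(const Contour &c)
{
    double len = 0.0;
    for (std::size_t i = 0; i < c.size(); i++) {
        const Vertex &a = c[i];
        const Vertex &b = c[(i + 1) % c.size()];
        len += std::hypot(b.x - a.x, b.y - a.y);
    }
    return len;
}

// Counts contours whose area lies in [minArea, maxArea] (hypothetical rule).
int countPeople(const std::vector<Contour> &contours, double minArea, double maxArea)
{
    int count = 0;
    for (const Contour &c : contours) {
        double area = enclosedArea(c);
        if (area >= minArea && area <= maxArea) count++;
    }
    return count;
}
```

The area bounds would be tuned to the camera height and resolution, since the same person covers fewer pixels when the camera is mounted higher.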
A key benefit of contours is the ability to identify appendages, or body parts, such as fingers, legs, arms and shoulders. This ability is available through contour operators like convex hull and convexity defects. In the example below, the convex hull is the set of vertices of the green convex polygon, and the convexity defects are the red points at the bottom of the “valleys”. The “valleys” are called convexity defects because they represent violations of convexity. Once the convex hull and convexity defects are identified, they can be combined with the contour centroid and some heuristics to identify the head, arms and legs of a person.
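For readers who want to see what these two operators compute, here is a self-contained sketch. It uses Andrew's monotone chain for the hull, which is my choice for brevity and not necessarily what OpenCV uses internally, and it assumes the contour is an ordered, closed outline with no repeated points. OpenCV provides convexHull and convexityDefects for the same purpose.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };
struct Defect { Pt point; double depth; };

// Cross product of (a - o) and (b - o).
static double cross(const Pt &o, const Pt &a, const Pt &b)
{
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// Andrew's monotone chain; collinear points are dropped from the hull.
std::vector<Pt> convexHull(std::vector<Pt> pts)
{
    std::sort(pts.begin(), pts.end(), [](const Pt &a, const Pt &b) {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    });
    if (pts.size() < 3) return pts;
    std::vector<Pt> hull(2 * pts.size());
    std::size_t k = 0;
    for (std::size_t i = 0; i < pts.size(); i++) {  // lower hull
        while (k >= 2 && cross(hull[k - 2], hull[k - 1], pts[i]) <= 0) k--;
        hull[k++] = pts[i];
    }
    for (std::size_t i = pts.size() - 1, t = k + 1; i > 0; i--) {  // upper hull
        while (k >= t && cross(hull[k - 2], hull[k - 1], pts[i - 1]) <= 0) k--;
        hull[k++] = pts[i - 1];
    }
    hull.resize(k - 1);  // the last point repeats the first
    return hull;
}

// For each stretch of contour between two consecutive hull vertices, reports
// the contour point farthest from the hull edge that bridges the stretch.
std::vector<Defect> convexityDefects(const std::vector<Pt> &contour)
{
    std::vector<Defect> defects;
    std::vector<Pt> hull = convexHull(contour);
    if (hull.size() < 3) return defects;
    const std::size_t n = contour.size();
    std::vector<std::size_t> idx;  // contour indices that are hull vertices
    for (std::size_t i = 0; i < n; i++)
        for (const Pt &h : hull)
            if (h.x == contour[i].x && h.y == contour[i].y) { idx.push_back(i); break; }
    for (std::size_t h = 0; h < idx.size(); h++) {
        std::size_t s = idx[h], e = idx[(h + 1) % idx.size()];
        const Pt &a = contour[s];
        const Pt &b = contour[e];
        double len = std::hypot(b.x - a.x, b.y - a.y);
        Defect d{a, 0.0};
        for (std::size_t i = (s + 1) % n; i != e; i = (i + 1) % n) {
            double dist = std::fabs(cross(a, b, contour[i])) / len;
            if (dist > d.depth) d = Defect{contour[i], dist};
        }
        if (d.depth > 0.0) defects.push_back(d);
    }
    return defects;
}
```

A heuristic built on top of this might, for example, treat deep defects as gaps between limbs and the hull vertices on either side as limb tips, though the thresholds would depend on camera placement.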
Region Growing
For corner-mounted cameras, people in the foreground may overlap, especially in a crowded room. The point cloud of the foreground pixels should be exploited to group points belonging to the same individual. The region growing algorithm can be applied to group pixels having similar
The first step is finding suitable seeding points. One way is to histogram each foreground blob and identify the top 2-3 local maxima, provided the maxima meet some minimum separation requirement. Then, for each maximum, seed the point that is closest to the centroid of all points belonging to that maximum. To grow the region, set each seed as the center, then scan the 8 neighbors to qualify or disqualify them into the group based on the
Region Growing Algorithm in People Counting [1].
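The growing step can be sketched as a breadth-first scan over the 8 neighbors of each accepted pixel. The membership test here is my assumption for illustration only: a neighbor joins the region when its depth differs from the pixel that reached it by at most maxStep. The function name, the label convention and the row-major layout are likewise hypothetical.

```cpp
#include <cmath>
#include <queue>
#include <vector>

// Grows region `id` from the seed over 8-connected pixels and returns the
// number of pixels assigned.
// label convention: 0 = unassigned foreground, -1 = background, >0 = region.
// ASSUMPTION: a neighbor qualifies when its depth is within maxStep of the
// depth of the pixel that reached it.
int growRegion(const std::vector<float> &depth, std::vector<int> &label,
               int rows, int cols, int seedRow, int seedCol,
               int id, float maxStep)
{
    int seed = seedRow * cols + seedCol;
    if (label[seed] != 0) return 0;  // seed is background or already claimed
    std::queue<int> frontier;
    frontier.push(seed);
    label[seed] = id;
    int size = 0;
    while (!frontier.empty()) {
        int idx = frontier.front();
        frontier.pop();
        size++;
        int r = idx / cols, c = idx % cols;
        for (int dr = -1; dr <= 1; dr++) {
            for (int dc = -1; dc <= 1; dc++) {
                int rr = r + dr, cc = c + dc;
                if (dr == 0 && dc == 0) continue;
                if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) continue;
                int n = rr * cols + cc;
                if (label[n] == 0 && std::fabs(depth[n] - depth[idx]) <= maxStep) {
                    label[n] = id;
                    frontier.push(n);
                }
            }
        }
    }
    return size;
}
```

Calling growRegion once per seed, with a different id each time, splits an overlapping silhouette wherever the depth jumps by more than maxStep.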
Tracking
In some applications, tracking the movement of people in a room is important. Examples include monitoring for suspicious or unusual activities, or quantifying the interest of a crowd in particular products or showcases. Tracking also enables one to maintain a proper head count in situations where people may be partially or even fully occluded. In these scenarios, if the tracker has not detected any “people” leaving the scene from the sides of the camera view, then any disappearing blobs must be due to occlusion, and the head count must remain unchanged. Tracking requires matching foreground entities in consecutive frames. The matching can be based on multiple criteria, such as shortest centroid displacement and similarity of contour shape and intensity profile. Subtraction of consecutive frames also gives a good indication of the direction of motion, enabling prediction of where the tracked object will be in the new frame.
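Matching by shortest centroid displacement can be sketched with a greedy nearest-neighbor pass. This is a simplification under my own assumptions: it uses centroid distance only, ignores contour shape and intensity, and resolves conflicts in the order the current centroids are listed. The names Centroid and matchCentroids are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Centroid { float x, y; };

// For each centroid in the current frame, returns the index of the closest
// unclaimed centroid in the previous frame, or -1 when none lies within
// maxDist (for example, a person who has just entered the scene).
std::vector<int> matchCentroids(const std::vector<Centroid> &prev,
                                const std::vector<Centroid> &curr,
                                float maxDist)
{
    std::vector<int> match(curr.size(), -1);
    std::vector<bool> taken(prev.size(), false);
    for (std::size_t i = 0; i < curr.size(); i++) {
        float best = maxDist;
        for (std::size_t j = 0; j < prev.size(); j++) {
            if (taken[j]) continue;
            float d = std::hypot(curr[i].x - prev[j].x, curr[i].y - prev[j].y);
            if (d <= best) {
                best = d;
                match[i] = static_cast<int>(j);
            }
        }
        if (match[i] >= 0) taken[match[i]] = true;
    }
    return match;
}
```

Previous centroids left unclaimed after the pass are the candidates for the occlusion rule: if they did not vanish near the edge of the view, the head count is kept unchanged.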
Simple Code Example
The code snippet below illustrates the people tracking and counting that I described above. The #if directive selects between contour tracking and blob tracking.
void Horus::update(Frame *frame)
{
    vector< vector<cv::Point> > contours;
    vector<Vec4i> hierarchy;
    RNG rng(12345);
    if (getFrameType() == DepthCamera::FRAME_XYZI_POINT_CLOUD_FRAME) {
        // Create amplitude and depth Mat
        vector<float> zMap, iMap;
        XYZIPointCloudFrame *frm = dynamic_cast<XYZIPointCloudFrame *>(frame);
        for (int i=0; i< frm->points.size(); i++) {
            zMap.push_back(frm->points[i].z);
            iMap.push_back(frm->points[i].i);
        }
        _iMat = Mat(getDim().height, getDim().width, CV_32FC1, iMap.data());
        _dMat = Mat(getDim().height, getDim().width, CV_32FC1, zMap.data());
        // Apply amplitude gain
        _iMat = (float)_ampGain*_iMat;
        // Update background as required
        if (!_setBackground) {
            _dMat.copyTo(_bkgndMat);
            _setBackground = true;
            cout << endl << "Updated background" << endl;
        }
        // Find foreground by subtraction and convert to binary
        // image based on amplitude and depth thresholds
        Mat fMat = clipBackground((float)_depthThresh/100.0, (float)_ampThresh/100.0);
        // Apply morphological open to clean up image
        fMat.convertTo(_bMat, CV_8U, 255.0);
        Mat morphMat = _bMat.clone();
        Mat element = getStructuringElement( MORPH_RECT, Size(5,5), cv::Point(1,1) );
        morphologyEx(_bMat, morphMat, MORPH_OPEN, element);
        // Draw contours that meet a "person" requirement
        Mat drawing = Mat::zeros( _iMat.size(), CV_8UC3 );
        Mat im_with_keypoints = Mat::zeros( _iMat.size(), CV_8UC3 );
        cvtColor(_iMat, drawing, CV_GRAY2RGB);
        int peopleCount = 0;
#if 1
        // Find all contours
        findContours(morphMat, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, cv::Point(0,0));
        for ( int i = 0; i < contours.size(); i++ ) {
            if (isPerson(contours[i], _dMat)) {
                peopleCount++;
                drawContours( drawing, contours, i, Scalar(0, 0, 255), 2, 8, vector<Vec4i>(), 0, cv::Point() );
            }
        }
#else
        // Find blobs
        std::vector<cv::KeyPoint> keypoints;
        SimpleBlobDetector::Params params;
        // Filter by color
        params.filterByColor = true;
        params.blobColor = 255;
        // Change thresholds - depth
        params.minThreshold = 0;
        params.maxThreshold = 1000;
        // Filter by Area.
        params.filterByArea = true;
        params.minArea = 100;
        params.maxArea = 100000;
        // Filter by Circularity
        params.filterByCircularity = false;
        params.minCircularity = 0.1;
        // Filter by Convexity
        params.filterByConvexity = false;
        params.minConvexity = 0.87;
        // Filter by Inertia
        params.filterByInertia = false;
        params.minInertiaRatio = 0.01;
        cv::Ptr<cv::SimpleBlobDetector> detector = cv::SimpleBlobDetector::create(params);
        detector->detect( morphMat, keypoints );
        cout << "Keypoints # " << keypoints.size() << endl;
        for ( int i = 0; i < keypoints.size(); i++ ) {
            cv::circle( drawing, cv::Point(keypoints[i].pt.x, keypoints[i].pt.y), 10, Scalar(0,0,255), 4 );
        }
        peopleCount = keypoints.size();
#endif
        putText(drawing, "Count = "+to_string(peopleCount), cv::Point(200, 50), FONT_HERSHEY_PLAIN, 1, Scalar(255, 255, 255));
        imshow("Binary", _bMat);
        imshow("Amplitude", _iMat);
        imshow("Draw", drawing);
        imshow("Morph", morphMat);
    }
}
Below is a video of people tracking using contours:
