"On Model-Based Analysis of Ear Biometrics" by Arbab-Zavar et al.: The task is biometrics using the ear, claiming that ears are distinctive. Their dataset is a subset of the XM2VTS dataset, sides of heads with the ears clearly visible, but not aligned.
They describe their model as a constellation model. They train it by taking one image from each of the 63 subjects and manually cropping to the ear. They run SIFT on all these images and cluster the resulting descriptors, using information from the descriptor space as well as the location space; they can use location space information because the ears are roughly registered.
Given a test image, they find the elliptical shape of the ear in the test image and crop to the ear. Then they extract SIFT features. The final match score is the cost of matching all of the cluster centers to the SIFT descriptors in the test image.
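The clustering-and-matching pipeline above can be sketched roughly as follows. This is a minimal illustration, not their implementation: I use plain k-means on descriptor vectors only (their clustering also uses location information), and I assume the match cost is the sum, over cluster centers, of the distance to the nearest test descriptor.

```python
import numpy as np

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain k-means on descriptor vectors (a stand-in for the paper's
    clustering, which also exploits the rough registration of locations)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # Assign each descriptor to its nearest center.
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers

def match_cost(centers, test_descriptors):
    """Assumed scoring rule: each model cluster pays the distance to its
    closest descriptor in the test image; lower means a better match."""
    d = np.linalg.norm(centers[:, None] - test_descriptors[None], axis=2)
    return d.min(axis=1).sum()
```

In practice the descriptors would be 128-dimensional SIFT vectors rather than the toy arrays used here.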
They do some occlusion tests by synthetically occluding images of ears (though the enrollment ears remain fully visible). Their performance drops with occlusion, but they find they do better than a competing method based on PCA.
"Robust 2D ear registration and recognition based on sift point matching" by Bustard and Nixon: Registers ears using SIFT (assuming the ear is planar). With SIFT for registration and a simple pixel-to-pixel distance, they get results equal to manual registration with PCA.
In their method, ears in the gallery are masked, separating ear pixels from non-ear pixels. Then SIFT features are extracted from the ear regions in the gallery. For a query image, SIFT features are extracted and matched across the gallery, with RANSAC used to find a consistent homography. Based on language in the experiments section, the matching appears to be image-to-image. A score for each gallery image is computed by warping the query by the recovered homography and taking squared pixel error. The error is made robust to occlusion by capping the error any one pixel can contribute. Additionally, before comparison the images are normalized to minimize effects of lighting.
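The occlusion-robust distance is easy to sketch. This is a hedged reconstruction: the per-pixel cap follows the description above, but the specific lighting normalization (zero mean, unit variance) is my assumption, not necessarily theirs.

```python
import numpy as np

def normalize(img):
    """Lighting normalization (an assumption here): zero mean, unit variance."""
    img = img.astype(float)
    return (img - img.mean()) / (img.std() + 1e-8)

def robust_distance(warped_query, gallery_img, cap=1.0):
    """Squared pixel error, with each pixel's contribution capped so that
    an occluded region cannot dominate the total score."""
    diff = (normalize(warped_query) - normalize(gallery_img)) ** 2
    return np.minimum(diff, cap).sum()
```

With the cap in place, an occluder covering a patch of pixels can add at most `cap` times the patch area to the score, instead of an unbounded squared error.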
They test with the XM2VTS dataset as well as a dataset of their own devising, showing decent results even with occlusion and viewpoint variation.
They should have also reported results for manual registration combined with their image distance, to show that, given perfect registration, their distance outperforms PCA. That would have isolated two separate contributions: the SIFT-based registration and the robust pixel distance.
Contains a good review section.
"A method for estimating and accurately extracting the eyebrow in human face image" by Chen and Cham: They extract eyebrow contours using a k-means technique, which appears to cluster image patches. They use the initial estimate of eyebrow vs. non-eyebrow pixels to initialize a snake (contour fitting) that uses only image gradients.
Once they've segmented the eyebrows, they characterize the curves of the top halves of the eyebrows and compare the characterizations. On the Olivetti face dataset, they achieve 87% accuracy.
"Active shape models-their training and application" by Cootes et al.: The desire is to capture the shapes of objects while allowing some variability in shape. Unlike active contour models, active shape models parameterize the ways the shape can change, so that only class-specific deformations are allowed.
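The core idea, a point distribution model, can be sketched briefly. This is a minimal version under standard assumptions (aligned landmark vectors, PCA on their covariance, parameters clipped to three standard deviations), not a full reproduction of the paper's training procedure.

```python
import numpy as np

def train_shape_model(shapes, n_modes):
    """shapes: (N, 2k) array of aligned landmark coordinate vectors.
    Returns the mean shape, the top principal modes, and their variances."""
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:n_modes]
    return mean, vecs[:, order], vals[order]

def generate_shape(mean, modes, variances, b):
    """Only class-specific deformations are allowed: each shape parameter
    is clipped to +/- 3 standard deviations of its mode."""
    b = np.clip(b, -3 * np.sqrt(variances), 3 * np.sqrt(variances))
    return mean + modes @ b
```

Setting the parameter vector `b` to zero reproduces the mean shape; varying one component of `b` sweeps out one learned mode of deformation.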
"Feature detection and tracking with constrained local models" by Cristinacce: Like the active appearance model, uses shape and texture to parameterize a novel image. However, this work cites active appearance models and claims to have improved localization accuracy.
"An accurate algorithm for head detection based on XYZ and HSV hair and skin color models" by Gunes and Piccardi: The goal is head segmentation from background, including even back-of-head segmentation.
They learn a Gaussian mixture model for hair color, representing color in both XYZ and HSV space. Colors with sufficient density under the GMM are considered to be hair colors. Interestingly, they seem to make an error by adding probabilities in equation (3) instead of multiplying them.
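For reference, here is a minimal sketch of classifying a color by thresholding its density under a GMM, assuming diagonal covariances (the paper's exact parameterization may differ). Note the distinction the complaint above turns on: within a single mixture, the weighted component densities are summed, whereas combining likelihoods from independent color spaces (XYZ and HSV) would call for a product.

```python
import numpy as np

def gmm_density(x, weights, means, variances):
    """Mixture density with diagonal covariances: a weighted SUM of
    component Gaussians (summing components is standard for a mixture)."""
    x = np.atleast_2d(x)
    dens = np.zeros(len(x))
    for w, mu, var in zip(weights, means, variances):
        norm = np.prod(2 * np.pi * var) ** -0.5
        dens += w * norm * np.exp(-0.5 * ((x - mu) ** 2 / var).sum(axis=1))
    return dens

def is_hair_color(x, weights, means, variances, threshold):
    """A color is accepted as a hair color if its density is high enough."""
    return gmm_density(x, weights, means, variances) >= threshold
```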
They learn a similar model for skin color, but before feeding skin pixels to their model, they first filter them with the technique of Hsu et al. A pixel is estimated as a head pixel if it is a hair or skin pixel. So no naked people. Given the detected head pixels in an image, they perform morphological closure to fill in gaps and fit an ellipse to the resulting shape.
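The morphological closing step can be sketched with a 3x3 structuring element; this is a generic illustration of the operation, not their code (the ellipse fit is omitted, and edge wrap-around from `np.roll` is ignored for brevity).

```python
import numpy as np

def dilate3x3(mask):
    """3x3 binary dilation via shifted copies of the mask.
    Note: np.roll wraps at the borders, which a careful version would mask off."""
    out = np.zeros_like(mask)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
    return out

def close_mask(mask):
    """Morphological closing: dilation then erosion, filling small gaps
    between detected hair and skin pixels."""
    dilated = dilate3x3(mask)
    # Erosion is dilation of the complement.
    return ~dilate3x3(~dilated)
```

Closing fills holes smaller than the structuring element while leaving the outer boundary of the head mask essentially unchanged, which is what makes the subsequent ellipse fit stable.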
Unfortunately, they do not talk about recognition; this is a detection-only paper.
"Eyes and eyebrows parametric models for automatic segmentation" by Hammal and Caplier: They work with video data, and assume the face in the first frame of each video has been detected. They then track faces using block matching.
They detect irises, eyes, and eyebrows, but I focused on eyebrows. Assuming they have a rectangle in which the eyebrow lies, they detect the x locations of the eyebrow endpoints by looking at the zero crossings of the first derivative of the vertical projection of the rectangle. They detect the shared y location by looking at the maximum of the horizontal projection of the rectangle. They fit a Bezier curve through the two points as well as the point in between them. They refine the curve by adjusting it locally to maximize the flow of luminance gradient through the curve.
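The projection analysis and the Bezier fit can be sketched as follows. This is a simplified reconstruction: I find the x extent by thresholding the vertical projection rather than by zero crossings of its derivative, and I assume the "point in between" is hit at the curve's midpoint (t = 0.5), which determines the quadratic Bezier's control point.

```python
import numpy as np

def eyebrow_endpoints(patch):
    """patch holds 'darkness' values, so the eyebrow is bright here.
    x extent: where the vertical projection (column sums) exceeds its mean
    (a simplification of the paper's derivative zero crossings).
    Shared y: the peak of the horizontal projection (row sums)."""
    vproj = patch.sum(axis=0)
    hproj = patch.sum(axis=1)
    cols = np.flatnonzero(vproj > vproj.mean())
    y = hproj.argmax()
    return (cols[0], y), (cols[-1], y)

def bezier_through(p0, mid, p2, n=51):
    """Quadratic Bezier forced through `mid` at t = 0.5: solve
    mid = 0.25*p0 + 0.5*c + 0.25*p2 for the control point c."""
    p0, mid, p2 = map(np.asarray, (p0, mid, p2))
    c = 2 * mid - 0.5 * (p0 + p2)
    t = np.linspace(0, 1, n)[:, None]
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * c + t ** 2 * p2
```

The subsequent gradient-based refinement would then nudge the sampled curve points locally; that step is not shown here.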