"An efficient algorithm for co-segmentation" by Hochbaum and Singh: In co-segmentation, two images are presented in which an object co-occurs. The task is to segment the co-occurring object in each image, i.e., to determine which pixels belong to the co-occurring object in each image.
This paper draws inspiration from a previous approach to the same problem [Rother 2006], in which the error function combines MRFs for each of the two images (the segmentation part of the error) with a histogram agreement term. The histogram agreement term considers the color histograms of the foregrounds of the two images and encourages them to be similar (if the foregrounds of the two images are exactly the same, the color histograms will be identical). Apparently, for useful histogram similarity functions, the entire optimization problem becomes intractable, reducing the practitioner to approximate methods. In this paper, a similar objective function is presented, but one which can be reduced to a min-cut problem and thus solved in polynomial time.
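To make the min-cut reduction concrete, here is a minimal sketch, not the paper's actual construction: a toy MRF energy with a unary cost per pixel and a pairwise smoothness penalty, minimized exactly by an s-t min cut (computed here with Edmonds-Karp). The four-pixel "image" and all costs are made up.

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow on a dict-of-dicts capacity graph.
    Returns the flow value and the set of nodes on the source side
    of a minimum cut."""
    flow = 0
    while True:
        parent = {s: None}                 # BFS for an augmenting path
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:                # no path left: parent = source side
            return flow, set(parent)
        path, v = [], t                    # walk back to find the bottleneck
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] for u, v in path)
        for u, v in path:                  # push flow, update residuals
            cap[u][v] -= b
            cap[v][u] = cap[v].get(u, 0) + b
        flow += b

# Hypothetical toy problem: 4 pixels in a row.  fg_cost[i] is the unary
# cost of labeling pixel i foreground, bg_cost[i] of labeling it
# background; lam penalizes neighboring pixels taking different labels.
fg_cost = [8, 7, 1, 2]
bg_cost = [1, 2, 8, 7]
lam = 3

cap = {i: {} for i in range(4)}
cap['s'], cap['t'] = {}, {}
for i in range(4):
    cap['s'][i] = bg_cost[i]   # cutting s->i  <=>  pixel i is background
    cap[i]['t'] = fg_cost[i]   # cutting i->t  <=>  pixel i is foreground
for i in range(3):             # pairwise smoothness, both directions
    cap[i][i + 1] = lam
    cap[i + 1][i] = lam

value, source_side = max_flow(cap, 's', 't')
foreground = sorted(n for n in source_side if n != 's')
print(value, foreground)       # min energy (9) and foreground pixels [2, 3]
```

Pixels ending on the source side of the cut are labeled foreground, and the cut value equals the minimum of the energy.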
Interestingly, the technique is robust to small changes in the color histograms of the foreground object, as well as being completely invariant to how pixels are arranged in the foreground object. This enables the authors to match pairs of real Flickr photos which have a recurring object, such as photographs of similar lawn gnomes.
They also have a medical application of co-segmentation. Image slices of two different brains are affine-aligned and co-segmented. The parts that don't match (labeled background by the co-segmentation) are likely to be lesions in one of the brains.
"Automatic Hair Detection in the Wild" by Julian et al.: The task is hair segmentation. The dataset is user-uploaded photos from a virtual fitting room for glasses (a private dataset). Thus the photos are probably frontal and fairly high quality.
They detect the face and eyes using cascade classifiers. Then they fit a constrained local model [Cristinacce and Cootes 2006] to the detected face and eyes to refine the estimate of the eye locations and to get an initial estimate of the temple locations. With the temple location estimates, they initialize a hybrid active shape / active contour model [Cootes 1995, Leventon 2000] to find the hair. They call this model the Upper Hair Shape Model (UHSM) because it only tries to find the hair on the top of the head and a little on the sides (crew cut). The model is agnostic to the color of the hair, only seeking color uniformity. They use the fitted UHSM and the eye locations to find background, hair, and face regions. They use these regions to initialize yet another model, an image-adapted appearance model which is presumably not agnostic to color. The details of this further model are given in a previous paper, which I cannot find online: "P. Julian, V. Charvillat, C. Dehais, and F. Lauze. On the interest of texture for face segmentation. In Orasis, 2009."
They present no empirical measurements (probably trade secrets). Based on the photos (qualitative performance), I'd guess all that complexity didn't buy them much.
"Toward image-based facial hair modeling" by Herrera et al.: This is a graphics paper, where the task is to transfer facial hair from one registered image to another. What's interesting is that they don't assume they know where the hair is in the source image, so they must detect hair pixels. Being a graphics paper, I hoped it would offer some fresh ideas on the subject.
They extract four features from each pixel: 1) the response of an orientation filter along the dominant angle (hair tends to all grow in the same direction at a point); 2) the magnitude of the local image gradient (hair tends to have high image gradients, smooth skin low ones); 3) the sum of the color channels, R + G + B; 4) R - G and R - B (two values), because red dominates skin color and taking differences reduces the effect of specularity.
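Features 2 through 4 are cheap to compute. A minimal numpy sketch (the helper name and the toy image are mine; feature 1 is omitted since it would need an oriented filter bank, e.g. Gabor filters):

```python
import numpy as np

def pixel_features(img):
    """Per-pixel features from an H x W x 3 float RGB image:
    gradient magnitude, channel sum, R-G, R-B (hypothetical helper,
    not the paper's code)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    lum = r + g + b                  # feature 3: sum of color channels
    gy, gx = np.gradient(lum)        # finite-difference image gradient
    grad_mag = np.hypot(gx, gy)      # feature 2: gradient magnitude
    return np.stack([grad_mag, lum, r - g, r - b], axis=-1)

img = np.zeros((4, 4, 3))
img[:, 2:] = [0.8, 0.4, 0.3]         # made-up "skin" patch on the right
feats = pixel_features(img)
print(feats.shape)                   # (4, 4, 4): four features per pixel
```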
The texture information they extract (1 and 2) can, morally speaking, be calculated from a SIFT descriptor. However, a SIFT descriptor has 128 dimensions, many of which may be useless, and calculating (1) from a SIFT descriptor would require a nonlinear function (max). It would be interesting to see how well SIFT would perform as a drop-in replacement. The color features are simply linear functions of the raw colors, so would be useless to extract if the next step were some sort of linear classifier or Mahalanobis metric learning.
But they don't do anything like that in the next step. Instead, for classification, they bin each feature and use naive Bayes to estimate the probability of the features given hair versus given skin.
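A sketch of that classification step as I understand it, with made-up training data and bin edges: per-class, per-feature binned likelihoods with add-one smoothing, combined by a naive-Bayes log-likelihood comparison.

```python
import numpy as np

def fit_histograms(feats, labels, edges):
    """Per-class, per-feature binned likelihoods (add-one smoothing)."""
    hists = {}
    for cls in (0, 1):                      # 0 = skin, 1 = hair
        rows = feats[labels == cls]
        per_feat = []
        for j, e in enumerate(edges):
            counts = np.histogram(rows[:, j], bins=e)[0] + 1.0
            per_feat.append(counts / counts.sum())
        hists[cls] = per_feat
    return hists

def log_likelihood(x, per_feat, edges):
    """Naive Bayes: sum per-feature log likelihoods of sample x."""
    ll = 0.0
    for j, e in enumerate(edges):
        b = np.clip(np.searchsorted(e, x[j]) - 1, 0, len(e) - 2)
        ll += np.log(per_feat[j][b])
    return ll

rng = np.random.default_rng(0)
# Hypothetical 1-D feature: hair pixels tend to high gradient magnitude.
skin = rng.normal(0.2, 0.05, (200, 1))
hair = rng.normal(0.8, 0.05, (200, 1))
feats = np.vstack([skin, hair])
labels = np.array([0] * 200 + [1] * 200)
edges = [np.linspace(0.0, 1.0, 11)]         # 10 bins on [0, 1]

h = fit_histograms(feats, labels, edges)
x = [0.75]
is_hair = log_likelihood(x, h[1], edges) > log_likelihood(x, h[0], edges)
print(is_hair)                              # True
```

With more features, the per-feature log likelihoods simply add up, which is the naive (independence) assumption.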
The rest of the paper is graphics.