summer 2011: head segmentation: Literature review (part 4)

"A closed-form solution to natural image matting" by Levin et al.: Image matting aims to solve an inverse problem. A given image is assumed to be equal to an alpha blend of two latent images, the foreground and background images. The problem, which seeks to recover the latent images as well as the alpha mask, is underconstrained; each pixel in an RGB image yields three constraints but produces seven degrees of freedom. However, part of the alpha mask is supplied as user input (the user labels some pixels as pure foreground and some as pure background).

This paper assumes that the foreground and background colors are approximately constant within small local windows, and additionally imposes a smoothness constraint on the alpha mask. With these assumptions, the alpha mask can be recovered exactly by solving an unconstrained quadratic program. In the case of color images, the authors relax their assumptions somewhat, assuming that the pixel values in foreground and background windows lie on lines in RGB space (in general, they could lie anywhere in the RGB cube). In this case, their optimization problem still reduces to a QP.
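
A rough sketch of the grayscale case, as I understand the paper's formulation: build a "matting Laplacian" L from 3x3 windows, then solve the linear system (L + lam * D) alpha = lam * b, where D marks the user-labeled pixels and b is 1 on foreground scribbles. The scribbles are handled here as a soft penalty with weight `lam`; the function names, the values of `eps` and `lam`, and the dense solver are assumptions for illustration only (a real image would need sparse matrices).

```python
import numpy as np

def matting_laplacian_gray(img, eps=1e-7, r=1):
    """Dense matting Laplacian for a small grayscale image (values in [0, 1])."""
    h, w = img.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    flat = img.ravel()
    L = np.zeros((n, n))
    for y in range(r, h - r):
        for x in range(r, w - r):
            win = idx[y - r:y + r + 1, x - r:x + r + 1].ravel()
            vals = flat[win]
            m = win.size
            d = vals - vals.mean()
            # Window contribution: delta_ij - (1/m) * (1 + d_i * d_j / (eps/m + var))
            G = (1.0 + np.outer(d, d) / (eps / m + vals.var())) / m
            L[np.ix_(win, win)] += np.eye(m) - G
    return L

def solve_alpha(img, fg_mask, bg_mask, lam=100.0):
    """Minimize alpha^T L alpha + lam * (squared error on labeled pixels)."""
    L = matting_laplacian_gray(img)
    D = np.diag((fg_mask | bg_mask).ravel().astype(float))
    b = fg_mask.ravel().astype(float)
    alpha = np.linalg.solve(L + lam * D, lam * b)
    return np.clip(alpha, 0.0, 1.0).reshape(img.shape)

# Toy image: dark left half, bright right half, one scribble pixel on each side.
img = np.full((6, 6), 0.2)
img[:, 3:] = 0.8
fg_mask = np.zeros((6, 6), dtype=bool)
bg_mask = np.zeros((6, 6), dtype=bool)
fg_mask[2, 4] = True
bg_mask[2, 1] = True
alpha = solve_alpha(img, fg_mask, bg_mask)
```

On this toy image the two scribbled pixels should be enough to pull the whole left half toward alpha = 0 and the whole right half toward alpha = 1, since alpha is only allowed to change where the image intensity changes.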

Their method can be interpreted as producing an affinity function between pixels, even before the alpha matte is known. This can be used to help the user of the matting system by showing which pixels are thought to be similar, allowing the user to quickly spot and fix incorrect assumptions.

It seems like a fairly simple idea, and the results are impressive. They are able, for example, to matte a woman and her fly-away hair, preserving long thin filaments of hair that are surrounded by background. A matting with this much detail is probably not needed for texture-based descriptors, but would be great for shape-based descriptors. A long, thin thread is a very distinctive shape.

"Markov random field models for hair and face segmentation" by Lee et al.: This paper does fully automatic hair / face / background segmentation using an MRF-style system.

To get the per-segment probability for a pixel, they use two sources of information. First, they look at the pixel location; since the images are aligned, hair tends to occur in the same parts of the image. Because hairstyles differ, they express the probability that a pixel will be hair using a mixture model, hand-clustering the training data into six hair types: {thin, thick} x {short, medium, long}. The second source of information is the pixel color; they model the probability of a color given a segment type using a Gaussian mixture model. With these two sources of information, they get their per-class evidence. They combine this with a prior that penalizes neighboring pixels having different segment types, assigning a penalty of either zero or one. To recover the approximately optimal solution to this MRF-type problem, they use loopy belief propagation.
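
The pipeline above can be sketched roughly as follows. This is a generic min-sum loopy BP on a 4-connected grid with a Potts (zero-or-`weight`) penalty, not necessarily the exact variant or message schedule the paper uses; the function names, `weight`, `iters`, and the toy probabilities are all assumptions for illustration. The location prior and color likelihood are taken as precomputed arrays.

```python
import numpy as np

def unary_costs(location_prior, color_likelihood):
    """Negative log of the per-pixel evidence; both inputs have shape (H, W, K)."""
    tiny = 1e-12  # avoid log(0)
    return -(np.log(location_prior + tiny) + np.log(color_likelihood + tiny))

def potts_bp(unary, weight=1.0, iters=20):
    """Approximate MAP labels via synchronous min-sum loopy BP."""
    # msgs[d][y, x] is the message arriving at pixel (y, x) from its neighbor in direction d.
    msgs = {d: np.zeros_like(unary) for d in ("up", "down", "left", "right")}

    def outgoing(total, exclude):
        # Everything the sender knows, minus what the receiver itself told it.
        h = total - exclude
        # Potts term: keep the same label for free, or switch labels for `weight`.
        out = np.minimum(h, h.min(axis=-1, keepdims=True) + weight)
        return out - out.min(axis=-1, keepdims=True)  # normalize for stability

    for _ in range(iters):
        total = unary + sum(msgs.values())
        new = {d: np.zeros_like(unary) for d in msgs}
        # A pixel sending downward excludes what it received from below, and so on.
        new["up"][1:, :] = outgoing(total, msgs["down"])[:-1, :]
        new["down"][:-1, :] = outgoing(total, msgs["up"])[1:, :]
        new["left"][:, 1:] = outgoing(total, msgs["right"])[:, :-1]
        new["right"][:, :-1] = outgoing(total, msgs["left"])[:, 1:]
        msgs = new

    return np.argmin(unary + sum(msgs.values()), axis=-1)

# Toy example: 2 labels on a 5x5 grid; the color evidence favors label 0
# everywhere except at one noisy center pixel.
location_prior = np.full((5, 5, 2), 0.5)
color_likelihood = np.empty((5, 5, 2))
color_likelihood[..., 0] = 0.7
color_likelihood[..., 1] = 0.3
color_likelihood[2, 2] = [0.4, 0.6]
unary = unary_costs(location_prior, color_likelihood)
labels_smooth = potts_bp(unary, weight=1.0)  # neighbors overrule the noisy pixel
labels_raw = potts_bp(unary, weight=0.0)     # no smoothing: per-pixel argmin
```

With the smoothness penalty switched off, the noisy center pixel keeps its own label; with it on, the four neighbors outvote it.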

The idea seems sound to me, but the methods are old school. They could probably get better initial segment type estimates if they used texture (hair vs. skin should be easy to distinguish with texture). There is probably also a clever way to replace the MRF inference with matting as described in the previous paper, though there might be issues: 1) going from foreground-background to multiple segment types, 2) conceptually dealing with "soft" segmentations (alpha values), and 3) mapping the idea of wanting to keep foreground and background constant to the idea of wanting to preserve evidence.

Wednesday, August 31, 2011