Wednesday, August 31, 2011

Literature review (part 4)

"A closed-form solution to natural image matting" by Levin et al.: Image matting aims to solve an inverse problem. A given image is assumed to be equal to an alpha blend of two latent images, the foreground and background images. The problem, which seeks to recover the latent images as well as the alpha mask, is underconstrained; each pixel in an RGB image yields three constraints but produces seven degrees of freedom. However, part of the alpha mask is supplied as user input (the user labels some pixels as pure foreground and some as pure background).


This paper assumes foreground and background windows are locally constant, and additionally imposes a smoothness constraint on the alpha mask. With these assumptions, the alpha mask can be recovered exactly by solving an unconstrained quadratic program. In the case of color images, the authors relax their assumptions somewhat, assuming the pixel values in foreground and background windows lie on lines in RGB space (generally, they could lie anywhere in the RGB cube). In this case, their optimization problem still reduces to a QP.
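The underlying compositing model is worth writing down. Here is a minimal sketch (grayscale pixels, and the simplest case where foreground and background are known constants over a window, matching the locally-constant assumption above); the function names are mine, not the paper's:

```python
def composite(alpha, F, B):
    # Matting model: each observed pixel is an alpha blend, I = a*F + (1-a)*B
    return [a * F + (1 - a) * B for a in alpha]

def recover_alpha(I, F, B):
    # With F and B known and constant over a window, alpha is determined
    # exactly by inverting the blend: a = (I - B) / (F - B)
    return [(i - B) / (F - B) for i in I]
```

The paper's contribution is doing this without knowing F and B, by folding the locally-constant (or locally-linear, for color) assumption into a quadratic cost over all pixels.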

Their method can be interpreted as producing an affinity function between pixels, even before the alpha matte is known. This can be used to help the user of the matting system, by telling the user which pixels are thought to be similar, allowing the user to quickly spot and fix incorrect assumptions.

It seems like a fairly simple idea, and the results are impressive. They are able, for example, to matte a woman and her fly-away hair, preserving long thin filaments of hair that are surrounded by background. A matting with this much detail is probably not needed for texture-based descriptors, but would be great for shape-based descriptors. A long, thin thread is a very distinctive shape.

"Markov random field models for hair and face segmentation" by Lee et al.: This paper does fully automatic hair / face / background segmentation using an MRF-style system.
To get the per-segment probability for a pixel, they use two sources of information. First, they look at the pixel location; since the images are aligned, hair occurs in the same parts of the image. Because hairstyles differ, they express the probability that a pixel will be hair using a mixture model, hand-clustering the training data into one of six hair types: {thin, thick} x {short, medium, long}. The second source of information is the pixel color; they model the probability of a color given a segment type using a Gaussian mixture model. With these two sources of information, they get their per-class evidence. They combine this with a prior that penalizes neighboring pixels having different segment types, assigning a penalty of either zero or one. To recover the approximately optimal solution to this MRF-type problem, they use loopy belief propagation.
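The objective they optimize can be sketched concretely: each pixel pays a unary cost (the negative log of the combined location and color evidence) plus a fixed penalty for each neighbor with a different segment type. This is a generic MRF-energy sketch, not their exact formulation:

```python
def mrf_energy(labels, unary, penalty=1.0):
    # labels[y][x]: segment id (e.g. hair / face / background)
    # unary[y][x][c]: negative log evidence for assigning class c to pixel (y, x)
    h, w = len(labels), len(labels[0])
    e = 0.0
    for y in range(h):
        for x in range(w):
            e += unary[y][x][labels[y][x]]
            # Potts-style prior: fixed penalty for each unlike neighbor pair
            if x + 1 < w and labels[y][x] != labels[y][x + 1]:
                e += penalty
            if y + 1 < h and labels[y][x] != labels[y + 1][x]:
                e += penalty
    return e
```

Loopy belief propagation then searches for a labeling that approximately minimizes this energy; exact minimization is intractable in general for multi-label grids.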
The idea seems sound to me, but the methods are old school. They could probably get better initial segment-type estimates if they used texture (hair vs. skin should be easy to distinguish with texture). There is probably also a clever way to replace the MRF inference with matting as described in the previous paper, though there might be issues: 1) going from a foreground/background split to multiple segment types, 2) conceptually dealing with "soft" segmentations (alpha values), and 3) mapping the idea of keeping foreground and background constant to the idea of preserving evidence.

Literature review (part 3)

"The development of differential use of inner and outer face features in familiar face identification" by Campbell et al.: Studied schoolchildren, with task of identifying whether the image of a face belongs to a fellow classmate. Had three conditions: 1) use photo of entire face, 2) use just inside of face (crop outside), 3) use just outside of face (white-out internal features). In all conditions using the entire face resulted in the best accuracy, but young children do better with external features than internal features, with the effect flipping around age 9.

Also includes a citation to Ellis, Shepherd, and Davies 1979, showing internal features are more useful than external features the more familiar a face is. They speculate this may be caused by external features changing periodically in well-known faces.

"When does the Inner-face Advantage in Familiar Face Recognition Arise and Why?" by Campbell et al.: The outer-face advantage is when external face information is more useful than internal face information. Shows inner-face advantage in adults when recognizing celebrity faces, with inner-face advantage for children, with the switch happening near age 15. Adolescents with a mental age of under 10 years showed outer-face advantage, suggesting the outer-face advantage is a developmental rather than a maturational phenomenon. This throws some water on the idea of using external face information with long-range photographs, because it is evidence the shift in humans from external focused to internal focused is independent of the improvement in eyesight with age.

Monday, August 29, 2011

Literature review (part 2)

"The contribution of external features to face recognition" by Lapedriza et al.: Point is that external features are useful for classification. They use the ARFace database (front-facing images, some with occlusion). External features are calculated using Building Blocks (BB) trained on outside-the-face rectangles (forehead, left side, right side, head).


BB is an image reconstruction / re-representation technique in which a region is reconstructed using a set of fragments. Each fragment is a small image patch, and the "correct location" of the patch is determined by convolving it with the image to be reconstructed and finding the best match. Once a set of patches has been placed, their weights are adjusted using nonnegative matrix factorization to approximately reproduce the pixels in the target image.
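The fragment-placement step above reduces to an argmax over correlation scores. A minimal 1-D stand-in for the 2-D convolution (my own sketch, not the authors' code):

```python
def place_fragment(image, fragment):
    # "Correct location" = the offset at which the fragment correlates best
    # with the image (a 1-D stand-in for the paper's 2-D convolution search)
    scores = [
        sum(image[off + k] * fragment[k] for k in range(len(fragment)))
        for off in range(len(image) - len(fragment) + 1)
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

In 2-D the search is the same idea over (x, y) offsets, typically computed efficiently as a convolution.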


Once the BB fragments are selected, a new face is characterized as a vector of the normalized convolutions of the BBs with the query image. The inside-the-face region is represented as raw pixels. In the combined condition, external features are concatenated with internal features and dimensionality is reduced using nonparametric discriminant analysis (NDA), followed by nearest-neighbor classification. In the internal-only condition, only the raw pixel values from inside the face are used, with dimensionality reduction via either PCA or NDA.
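The final matching step is plain nearest-neighbor classification on the reduced descriptors, which in sketch form (Euclidean distance; names are mine) is just:

```python
def nn_classify(query, gallery):
    # gallery: list of (feature_vector, label) pairs; returns the label of
    # the Euclidean nearest neighbor of the (reduced) query descriptor
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(gallery, key=lambda entry: sqdist(query, entry[0]))[1]
```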


The authors find that including the external information boosts performance. They conclude external information is useful, even when internal information is available.


"Face Verification using External Features" by Lapedriza et al.: Most recent of this set of papers reviewed so far. They basically do the same thing, though the application this time is face verification. The verification application seems to be motivated by a new technique they use called the Local Boosted Discriminant Projections (LBDP) Learning Algorithm, which is a dimensionality reduction technique which aims to minimize intra-class distance while maximizing inter-class distance, and works only with two classes.


Pipeline: External features are reconstructed with Building Blocks, and the BB reconstruction weights are used to represent the external features. This is in contrast to the previous paper, where the convolutions of the BB fragments with the original image were the representation. Internal features are represented as raw pixels. LBDP is used to reduce dimensionality.


Experiments contrast external features only versus internal features only, using the FRGC database and ARFace. They found external features outperform internal features in conditions of occlusion and variations in illumination, and otherwise internal features win.



Sunday, August 21, 2011

Literature review

We've set up a Mendeley group to organize our literature review. Unfortunately, sharing PDFs in public Mendeley groups is not permitted (limiting their usefulness). So we set up a parallel private group from which the public group will be occasionally updated.

"Relative Contributions of Internal and External Features to Face Recognition" by Jarudi and Sinha: This is a psychometrics paper, in which the subject's ability to recognize a celebrity's face is measured across variations in the presentation of the face. There are four conditions: 1) The eyes, nose, and mouth have been cropped and are presented individually, 2) Only the interior of the face is visible, 3) The entire head is visible, but the face has been blurred, 4) The entire head is visible.
Performance in each of these conditions is measured as a function of the amount of synthetic blur added to the image. A result: performance with external features is robust to blurring; this isn't surprising, but it is useful to have quantified, as it helps justify the use of external features when the data is blurry surveillance video. Another result: performance with whole faces is better than the sum of performance with external features alone and performance with internal features alone. I haven't yet decided what this means.
Also, the related-work section mentions the idea of a hierarchy over facial features (which features are the most useful for discrimination), citing Fraser and Haig. Apparently for unfamiliar faces, the shape of the head is the most informative, while for familiar faces, what's inside the face is most informative.
"Are External Face Features Useful for Automatic Face Classification?" by Lapedriza and Masip: A computer vision paper in which external facial cues are used to determine gender. More precisely, their system uses three rectangular regions around the face (they assume the face shot is frontal and aligned): a block to the left of the face (ears and side hair), a block above the face (forehead and top hair), and a block to the right of the face. This mostly captures hair style, which (obviously) predicts gender. They have a heuristic translation-invariant nonnegative matrix factorization (NMF) scheme which uses normalized cross correlation to determine where to place features, which is fairly cool, though also crying out for rigor. The weights they get when doing the NMF reconstruction become the descriptor for an image, which they feed to an ML black box.
This paper seemed more of a proof-of-concept than an attempt to build the best system possible; they start knowing the faces are aligned, and use that information to black out the interior of the face, surely losing something in the process. Still, we could cite this in the vein of "stuff you can do with just external features".

Tuesday, August 16, 2011

Basic idea

This research idea fits in the category of using face bounding boxes to determine identity. The techniques that I know of in this category do one of the following: 1) extract features uniformly throughout the box, 2) extract fiducials on the face, or 3) build a 3D model from an image. There are also face-agnostic techniques which have been successfully applied to recognition.

The idea: perhaps there is hitherto unexploited information in the shape of the face region in the bounding box. For example, a pudgy person could be distinguished from a gaunt person with a segmentation of face from background. Circumstantial supporting evidence: what's outside the face (jawline, hair, etc.) plays a major role in human visual perception of faces (Result 6 of Sinha et al.).

In this blog, we'll explore the "external face information" idea (scare quotes appropriate), including figuring out more precisely what it should mean and what other people have done. We'll do some literature reviews to start off, propose some ideas if they come, maybe run some tests, and generally play it by ear.