Facebook's machine learning aims to modify faces, hands and… outfits

The latest research out of Facebook sets machine learning models to tasks that, to us, seem rather ordinary — but for a computer are still monstrously difficult.
These projects aim to anonymize faces, improvise hand movements and — perhaps hardest of all — give credible fashion advice. The research here was presented recently at the International Conference on Computer Vision, among a few dozen other papers from the company, which has invested heavily in AI research, computer vision in particular. Modifying faces in motion is something we've all come to associate with "deepfakes" and other nefarious applications.
But the Facebook team felt there was actually a potentially humanitarian application of the technology. Deepfakes use a carefully cultivated understanding of a face's features and landmarks to map one person's expressions and movements onto a completely different face. The Facebook team used the same features and landmarks, but instead used them to tweak the face just enough that it's no longer recognizable to facial recognition engines. This could allow someone who, for whatever reason, wants to appear on video but not be recognized publicly to do so without something as clunky as a mask or a completely fabricated face.
Instead, they'd look a bit like themselves, but with slightly wider-set eyes, a thinner mouth, a higher forehead and so on. The system they created appears to work well, but would of course require some optimization before it could be deployed as a product.
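The core idea — nudging facial geometry just enough to defeat a recognizer while keeping the face plausible — can be sketched in miniature. This is a toy illustration, not the paper's method: the actual system learns its perturbations adversarially, whereas the bounded random offsets and the `anonymize_landmarks` helper below are assumptions made up for demonstration.

```python
import random

def anonymize_landmarks(landmarks, max_shift=4.0, seed=0):
    """Nudge each named landmark by a small, bounded amount so the
    geometry no longer matches the original face, while staying
    visually plausible. Purely illustrative."""
    rng = random.Random(seed)
    shifted = {}
    for name, (x, y) in landmarks.items():
        dx = rng.uniform(-max_shift, max_shift)
        dy = rng.uniform(-max_shift, max_shift)
        shifted[name] = (x + dx, y + dy)
    return shifted

# Hypothetical landmark coordinates for one face (pixels)
face = {"left_eye": (120.0, 90.0), "right_eye": (180.0, 90.0),
        "mouth_center": (150.0, 160.0)}
anon = anonymize_landmarks(face)
```

The point of bounding the shift is the same trade-off the researchers describe: large enough to break a recognition engine's match, small enough that the person still looks roughly like themselves.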
But one can imagine how useful such a thing might be, either for those at risk of retribution from political oppressors or for more garden-variety privacy preferences. In virtual spaces it can be difficult to recognize someone at all — partly because of the lack of nonverbal cues we perceive constantly in real life.
This next piece of research attempts to capture, catalog and reproduce these movements, or at least the ones we make with our hands. It's a little funny to think about, but really there's not a lot of data on how exactly people move their hands when they talk.
So the researchers recorded 50 full hours of pairs of people having ordinary conversations — or as ordinary as they could while suited up in high-end motion capture gear. These (relatively) natural conversations, and the body and hand motions that went with them, were then ingested by the machine learning model; it learned to associate, for example, that when people said "back then" they'd point behind them, or when they said "all over the place," they'd make a sweeping gesture. What might this be used for? More natural-seeming conversations in virtual environments, perhaps, but maybe also by animators who'd like to base the motions of their characters on real life without doing motion capture of their own.
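The association step above can be caricatured as a co-occurrence count: which gesture tends to accompany which phrase. This sketch uses discrete gesture labels and a made-up transcript; the real model works with continuous hand poses, so everything here (the labels, the `build_gesture_model` helper, the data) is an illustrative assumption.

```python
from collections import defaultdict, Counter

def build_gesture_model(transcript):
    """Count phrase/gesture co-occurrences and keep, for each phrase,
    the gesture label seen with it most often."""
    counts = defaultdict(Counter)
    for phrase, gesture in transcript:
        counts[phrase][gesture] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}

# Hypothetical annotated transcript fragments
data = [("back then", "point_behind"), ("back then", "point_behind"),
        ("all over the place", "sweep"), ("back then", "sweep")]
model = build_gesture_model(data)
```

Even this crude version shows why 50 hours of paired speech and motion matter: the mapping only becomes reliable once the same phrase has been observed with the same gesture many times.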
It turns out that the database Facebook put together is really like nothing else out there in scale or detail, which is valuable in and of itself. Similarly unique, but arguably more frivolous, is this system meant to help you improve your outfit.
If we're going to have smart mirrors, they ought to be able to make suggestions, right? Fashion++ is a system that, having ingested a large library of images labeled with both the pieces worn (e.g. hat, scarf, skirt) and overall fashionability (obviously a subjective measure), can then look at a given outfit and suggest changes.
Nothing major — it isn't that sophisticated — but rather small things like removing a layer or tucking in a shirt. It's far from a digital fashion assistant, but the paper documents early success in making suggestions for outfits that real people found credible and perhaps even a good idea.
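The "minimal edit" idea — try each small change and keep the one that most improves a learned fashionability score — can be sketched as a greedy search. The scoring heuristic below is entirely made up for illustration (Fashion++ learns its score from labeled images), as are the `toy_score` and `suggest_edit` helpers.

```python
def toy_score(outfit):
    # Hypothetical heuristic: fewer than four layers and a tucked
    # shirt score higher. Purely illustrative, not the learned model.
    score = 10 - max(0, len(outfit["pieces"]) - 3) * 2
    if outfit.get("shirt_tucked"):
        score += 1
    return score

def suggest_edit(outfit):
    """Try each small candidate edit and return the one that raises
    the score the most, or "no change" if nothing helps."""
    candidates = []
    if len(outfit["pieces"]) > 1:  # candidate: remove the outermost layer
        candidates.append(("remove outer layer",
                           dict(outfit, pieces=outfit["pieces"][:-1])))
    if not outfit.get("shirt_tucked"):  # candidate: tuck in the shirt
        candidates.append(("tuck in shirt", dict(outfit, shirt_tucked=True)))
    base = toy_score(outfit)
    best = max(candidates, key=lambda c: toy_score(c[1]), default=None)
    if best and toy_score(best[1]) > base:
        return best[0]
    return "no change"

outfit = {"pieces": ["shirt", "sweater", "jacket", "scarf"],
          "shirt_tucked": False}
```

The constraint that edits stay small — remove a layer, tuck a shirt — is what keeps the suggestions in the realm of things a person might actually do in front of a mirror.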
That's pretty impressive, given how complex this problem proves to be when you really consider it, and how ill-defined "fashionable" really is. Facebook's ICCV research shows that the company and its researchers are looking fairly broadly at the question of what computer vision can accomplish.
It's always nice to detect faces in a photo faster or more accurately, or infer location from the objects in a room, but clearly there are many more obscure or surprising aspects of digital life that could be improved with a little visual intelligence.
You can check out the rest of the papers here.