
You never saw edges2cats I take it? https://affinelayer.com/pixsrv/

> I don't understand how the edges-to-faces can possibly work. The inputs seem to be black & white, and yet the output pictures have light skin tones.

The step you're missing is that an edge detector is run over the entire database of training images to produce a database of edge images. The input edge image is matched against that corpus of edge images to find which ones it resembles; the corresponding original color images are then sampled to synthesize a new color image.
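To make the preprocessing step concrete, here is a minimal sketch of building (edge map, photo) training pairs. A hand-rolled Sobel gradient on a toy grayscale image stands in for whatever edge detector the authors actually ran over their dataset; the image and threshold are made up for illustration.

```python
# Build an (edge map, photo) training pair by running an edge detector
# over a photo. Sobel gradient magnitude is used as a stand-in detector.

def sobel_edges(img, threshold=1.0):
    """img: 2D list of grayscale floats; returns a binary edge map."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[y][x] = 1
    return edges

# Toy "photo": a bright square on a dark background.
photo = [[1.0 if 2 <= x <= 5 and 2 <= y <= 5 else 0.0
          for x in range(8)] for y in range(8)]

# The (input, target) pair the model would train on.
pair = (sobel_edges(photo), photo)
```

Run over every training photo, this gives the paired corpus: the model only ever sees the edge map as input, with the original color photo as the target.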



Thanks for that link, I'd never seen that before. In fact, the edges2shoes sample on that page exactly summarises the issue I have: you start with what is effectively a rough line sketch of a shoe, and the algorithm 'fills in' a realistic shoe to fit the sketch. The sketch never had any colour information, so the algorithm has to pick one. In their example output, the algorithm picked a black shoe, but it could just as realistically have chosen a red one. The colouring all comes from their training data (in their case, 50k shoe images from Zappos). So in short, the algorithm can't determine colour.

But shoes and cats are one thing; reconstructing people's faces is another. I know the paper and the authors are demonstrating a technology here, rather than directly saying "you can use this technology for purpose X", but the discussion in these comments has jumped straight into enhancing images and improving existing pictures/video. There is a very big difference between 'reconstituting' or 'reconstructing' an image and 'synthesising' or 'creating' an image, and it appears many people are blurring the two together. Again, in the authors' defence, they are clear that they are talking about the 'synthesis' of images, but the distinction is critical.


> So in short, the algorithm can't determine colour.

That's right. But with the caveat that a large training set can determine plausible colors and rule out implausible ones. This is more true for faces than for shoes! The point is that there is some correlation between shape and color in real life. The color comes from the context in the training set. This is what @cbr meant nearby re: "skin color is relatively predictable from facial features (ex: nose width), it should be able to do reasonably well."
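A toy illustration of that point: the model can't see colour in the edge map, but it can learn which colours co-occur with which shapes in the training set and pick a plausible one. The shape features and colour counts below are invented for the example.

```python
# Made-up (shape_feature, color) observations standing in for a
# training set. The "model" picks the color most often seen with
# a given shape -- a crude proxy for the learned correlation.
from collections import Counter

training = [("sneaker", "white"), ("sneaker", "white"),
            ("sneaker", "red"),
            ("boot", "black"), ("boot", "black"), ("boot", "brown")]

def plausible_color(shape):
    """Most frequent color co-occurring with this shape in training."""
    colors = Counter(c for s, c in training if s == shape)
    return colors.most_common(1)[0][0]

print(plausible_color("boot"))     # black
print(plausible_color("sneaker"))  # white
```

A real GAN learns a far richer conditional distribution than a frequency table, but the principle is the same: the output colour is the training set's answer to "what usually goes with this shape?", not information recovered from the input.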

There are CNNs trained to color images, and they do pretty well from training context: http://richzhang.github.io/colorization/
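The key idea in that colorization work is to treat colour as classification over quantized colour bins, then take a temperature-sharpened expectation over the predicted distribution (the paper's "annealed mean") rather than a plain average, which would wash colours out. The bins and logits below are made-up toy values, not the paper's actual quantization.

```python
# Sketch of the annealed-mean decoding step: softmax over quantized
# (a, b) color bins with a low temperature, then take the expectation.
import math

ab_bins = [(-60, 20), (0, 0), (40, 60)]   # toy quantized (a, b) bins
logits = [1.0, 0.2, 2.5]                  # toy per-pixel bin scores

def annealed_mean(logits, bins, T=0.38):
    """Temperature-sharpened softmax over bins, then expected (a, b)."""
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    a = sum(p * bn[0] for p, bn in zip(probs, bins))
    b = sum(p * bn[1] for p, bn in zip(probs, bins))
    return a, b

a, b = annealed_mean(logits, ab_bins)
```

With a low temperature the result sits close to the most likely bin, so the output stays vivid; with T=1 it would drift toward the desaturated mean of all bins.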

> there is a very big line between 'reconstituting' or 'reconstructing' an image and 'synthesising' or 'creating' an image, and it appears many people are blurring the two together.

Yep, exactly! Synthesis != enhance.




