
For a few more very common examples of humans using context:

A human will spot a person wearing headphones and recognize that the person has low situational awareness. The computer doesn't come close to having the optical resolution to do that, even if the AI were perfect: remember human vision is 570+ megapixels, and even a 4K video stream is nearly two orders of magnitude lower.
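For anyone who wants to check the gap, here is a rough sketch of the arithmetic. It takes the 570 megapixel figure quoted above at face value and assumes "4K" means one 3840x2160 UHD frame; the constant names are just for illustration.

```python
import math

# Figure quoted in the comment above, taken at face value.
HUMAN_EYE_MEGAPIXELS = 570
# Assumption: "4K" means a 3840x2160 UHD frame.
UHD_4K_PIXELS = 3840 * 2160

# How many times more pixels the quoted eye figure is than one 4K frame.
ratio = HUMAN_EYE_MEGAPIXELS * 1_000_000 / UHD_4K_PIXELS
# The same gap expressed in orders of magnitude (powers of ten).
orders_of_magnitude = math.log10(ratio)

print(f"4K frame: {UHD_4K_PIXELS / 1e6:.2f} MP")
print(f"ratio: {ratio:.1f}x, about {orders_of_magnitude:.2f} orders of magnitude")
```

Under those assumptions the gap works out to a bit under 70x.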

[Now think about the fact that if we built a camera capable of recording 400 megapixels, you'd currently need to schlep around a ~750 lb, 25-node cluster drawing about 50 horsepower of electricity just to process the video stream at 25 fps. Moore's law ain't growing that fast these days, so matching the resolution of human vision is not a realistic option.]

Another example is kids. How does the AI recognize that the 5'1" 30-year-old woman has much better awareness and can be treated differently from the 5'2" 12-year-old boy? Humans can spot that difference even from behind.

How about recognizing an adult who is drunk? Or a blind person? Mourners at a funeral, or fans celebrating after a football game? Or a million other conditions that significantly affect pedestrian situational awareness that human drivers will instantly infer from context?

What will happen when kids figure out they can stop a driverless car on its way to collect its owner just by standing in the street in front of it? They'll have a lot of fun, for sure.

How about when carjackers figure out the same? That they can dress up like construction workers, stop the car in the street, tow it onto a flatbed with a built-in RF jammer, and head straight for their underground chop shop? There goes your cheaper insurance.



> remember human vision is 570+ megapixels

This seems to come from http://www.clarkvision.com/articles/eye-resolution.html

But that number is a calculation of the maximum resolving power of the human eye applied across a 120 degree field of view. The fovea is the only portion of the retina that actually attains that acuity, and it covers only roughly 2 degrees at the center of the visual field.

There are roughly 120 million rod cells and 6 million cone cells in the retina. The cone cells handle color vision and the rod cells handle low light. As each individual cone cell is primarily sensitive to one of red, green or blue, the cones map fairly well onto the RGB channels of a pixel. So the eye could be considered to provide data roughly equivalent to a 2 megapixel color image plus a 120 megapixel grayscale image. That is roughly 15 times the pixel count of a 4K image, still well short of 570 megapixels.

Edit: And even that overestimates the amount of data the brain is actually processing. A 4K 60 fps video stream needs about 6 Gbps, while the human optic nerve has only roughly 8.75 Mbps of bandwidth.
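A quick sketch of where a figure like 6 Gbps can come from. The comment doesn't say how the video is encoded, so the 12 bits per pixel (8-bit 4:2:0 chroma subsampling) is my assumption; uncompressed 24-bit RGB would be double that. The 8.75 Mbps optic nerve figure is taken from the comment as given.

```python
# Assumptions: 3840x2160 UHD frames at 60 fps, uncompressed.
WIDTH, HEIGHT, FPS = 3840, 2160, 60
BITS_PER_PIXEL_420 = 12   # assumed: 8-bit samples with 4:2:0 chroma subsampling
BITS_PER_PIXEL_RGB = 24   # assumed: 8 bits per channel, no subsampling

# Optic nerve bandwidth as quoted in the comment above.
OPTIC_NERVE_BPS = 8.75e6


def raw_bitrate_bps(bits_per_pixel: int) -> int:
    """Uncompressed bitrate in bits per second for the frame size and rate above."""
    return WIDTH * HEIGHT * FPS * bits_per_pixel


uhd_420_bps = raw_bitrate_bps(BITS_PER_PIXEL_420)
uhd_rgb_bps = raw_bitrate_bps(BITS_PER_PIXEL_RGB)
ratio = uhd_420_bps / OPTIC_NERVE_BPS

print(f"4K60 4:2:0: {uhd_420_bps / 1e9:.2f} Gbps")
print(f"4K60 RGB:   {uhd_rgb_bps / 1e9:.2f} Gbps")
print(f"4:2:0 stream vs quoted optic nerve figure: {ratio:.0f}x")
```

With the 4:2:0 assumption the raw stream lands just under 6 Gbps, several hundred times the quoted optic nerve figure.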


A lot of processing is done along the way, starting with the ganglion cells in the retina itself, so a direct comparison with 4K 60 fps video is grossly misleading.


The computer could just assume the worst case: that all people are drunk children wearing headphones, with low situational awareness.


If the self-driving proponents were to say that human drivers would be banned, and all vehicles would therefore be self-driving, I'd have more respect for their arguments. That said, you still need to account for things like pedestrians and snow. If we're talking about "self-driving on the I-5 when it isn't raining and there are no human drivers permitted" then yes, I think we're probably close.



