That's scary. Using deep learning to recognize "obstruction" with cameras.
Discriminate "car", "bike", "pedestrian", and "bus", sure. Recognize signs and traffic lights, fine. But "obstruction"? No. That's a geometry problem. If it's not flat road, it's an obstacle. Deep learning is for deciding what kind of obstacle. Because deep learning just isn't that good. It's going to be badly wrong a few percent of the time.
"The car will almost never be on the left side of the road, and the cameras will never flip angles, so training on flipped data forces the network to overgeneralize to situations it will never see." What could possibly go wrong?
> "The car will almost never be on the left side of the road, and the cameras will never flip angles, so training on flipped data forces the network to overgeneralize to situations it will never see." What could possibly go wrong?
That quote jumped off the page for me as well. This sounds like the start of a
new blog post: Falsehoods Programmers Believe About Driving
You can throw a few more in there.
- "Pedestrians will use crosswalks"
- "Pedestrians will be walking"
- "Traffic signals will always illuminate one of the three lamps"
- "Traffic signals have three lamps"
- "Lane markings are either white or yellow"
- "Lane markings exist"
- "There are lanes"
Your other comment is right on, I think. Relying on DNNs to get near-100%
coverage on possible scenarios is a fool's errand. There's a very long tail of
possible circumstances on the road that no amount of training data is going to
cover. When you're trying to classify your photo library, this is okay. When
making real-time decisions on live input, it's not going to be okay. A more
structured and scrutable system is needed.
Until there's a breakthrough in our ability to understand what a NN "thinks", I
don't think we should place too much trust in them.
Or, I guess, we can wait and see if the imperfect NNs are doing better than
humans in the long run. From what I read, that may be true today only in ideal
circumstances (sunny, dry, maintained roads, etc.).
Structured, scrutable systems perform really badly on this task and I don’t think anyone knows how to do much better. The current state of the art is an extremely unstructured and inscrutable system with no design behind it whatsoever, and that system gets a lot of people killed. The moment anything can do better than that, we should push it hard even if it has stupid failures.
And what was an assumption again becomes a circular axiom by the end of the line: "[it will be easy to do, because] the bar to clear is not high, [because it's easy]." Do you have any data to support that? (Or more precisely, are we talking about the 80/20 Pareto bar? ("It is okay, as long as it kills fewer people per million miles, on average."))
It’s not circular. I’m referring to specific objections people raise, like misclassifying objects or failing to react or whatever. The current state of the art often spends several seconds at a time with its cameras pointed at a screen instead of at the road.
I appreciate you continuing to refer to sacks of meat as "the current state of the art". It does drive the point home that we don't need to outrun the bear, just the other camper :)
I suppose that's a part of the issue: the SDV camp has been overenthusiastic in flying their "Mission Accomplished" banners, and hitting (pun not intended) yet another unexpected problem right afterwards. In other words, we're not at that point yet - in fact, we might not even have the complete toolset to measure this.
The field is in flux - alchemic approaches ("what if we try something unrelated?") might work for unrelated reasons, etc.; practical applications that don't spontaneously combust are still some way out there.
Okay, "baseless" then. "It's easy [that's the baseless claim], we just haven't gotten around to it [which tends to point to the task not actually being easy]." In my opinion, this is the same class of "easy" that AI researchers have been chasing for half a century now, and the end goal always seems to be juuust out of reach - in other words, seems easy but isn't.
I didn't say it's easy. My point is merely that current systems are horribly flawed, and that needs to be kept in mind when evaluating the flaws of the new systems. The fact that neural net systems are hard to understand and have no systematic design is not necessarily a blocker.
Not a blocker, but an eventual solution needs to come up with a meaningful comparison metric - the ones referred to now are meaningless ("no crashes except for things that are not counted as crashes under our definition, which doesn't look suspiciously circular").
It's also concerning that the system seems to rely on processing images independently - I would have expected at least parallax and previous-frame processing in there, possibly an edge detection pass. All very traditional computer vision techniques. Parallax will tell you how far away things are and automatically separate out things at different depths. Previous-frame will show you how things are moving relative to the vehicle.
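Those traditional passes are only a few lines each (a rough sketch assuming OpenCV and consecutive grayscale frames; the actual pipeline isn't public, so this is illustrative only):

```python
import cv2
import numpy as np

def classic_passes(prev_gray, curr_gray):
    """Previous-frame motion, a crude parallax cue, and edges, computed
    from two consecutive grayscale frames (uint8 arrays)."""
    # Dense optical flow: static scenery moves coherently with ego-motion,
    # independently moving objects don't.
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # For a translating camera, flow magnitude is a rough monocular
    # parallax cue: nearer things move more between frames.
    parallax = np.linalg.norm(flow, axis=2)

    # Classic edge-detection pass, useful for lane and boundary geometry.
    edges = cv2.Canny(curr_gray, 50, 150)
    return parallax, edges
```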
If DNNs are unreliable, why is it ok to use them to discriminate a car from a pedestrian? The planning that you do in order to avoid hitting a car is very different from what you need in order to avoid a pedestrian. If you erroneously treat a pedestrian as a car, you are very likely to cause a tragedy.
If it's a pedestrian, you will do whatever is necessary to avoid it. You will brake suddenly or even make a very hard turn. In other words, you will probably put your own life in danger, and for a good reason.
If it's a car, you will NOT do whatever is necessary to avoid it (I'm speaking about low-speed crashes here). You will prefer a "soft" crash to doing certain super dangerous maneuvers. Like, you will brake and "accept" a soft crash instead of making an ultra-hard turn onto the sidewalk.
Crashing two cars together, at a low speed, will bring you some bent metal, a pile of paperwork, and expenses - "brake as hard as reasonable, don't worry too much about collision avoidance." Crashing a car into a pedestrian, even at low speed, will get you manslaughter charges - "avoid collision at all costs." Is that different enough?
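To make the asymmetry concrete, a planner's cost function might weight the two outcomes very differently (a toy sketch; every number and name here is made up):

```python
# Invented planner cost weights illustrating the asymmetry described above.
COLLISION_COST = {
    "car":        1_000,       # bent metal and paperwork: brake, accept a soft crash
    "pedestrian": 10_000_000,  # avoid at (almost) all costs, including hard swerves
}

def maneuver_cost(obstacle_type, collision_prob, maneuver_risk):
    """Score one candidate maneuver: expected collision cost plus the
    danger the maneuver itself creates (e.g. swerving onto the sidewalk)."""
    return collision_prob * COLLISION_COST[obstacle_type] + maneuver_risk

# Misclassify a pedestrian as a car and the planner happily picks
# "brake and accept contact" -- the correct choice for the wrong object.
```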
In addition to the comments about having a much stronger bias against hitting a pedestrian than a car: the pedestrian or car currently presenting an obstacle may take evasive actions (or other movements) of their own, and those will be quite different...
Forgive me hijacking a self-driving ML discussion to ask a novice question on self-driving: "stateful" vs "stateless", and how can state be used safely?
Specifically I mean pre-determined saved/downloaded knowledge about the current route being driven, or your specific instantaneous location. Anything from annotated maps, to LIDAR scans.
It seems like the paradox is that stateful algorithms can dramatically improve performance by factoring in data that real drivers benefit from tremendously, namely, familiarity with the road.
However, detecting when known state is violated due to some temporal change (road work, accident, natural disaster), and being able to shift back into "general flight" rules, seems in some ways to be even trickier than not using historical state in the first place.
So you need an algorithm that can learn to drive better than a human, without having basically any contextual knowledge of the road it’s driving on. In other words the algorithm needs to be a better driver the first time it ever goes down a road than a human who has driven the road for years.
As a human, familiarity with the road makes a tremendous difference for how I drive it. I generally drive the same roads 80% of the time (i.e. commuting), and knowing what to expect around each bend absolutely changes how I drive the road, even down to where the speed traps will be.
Any ML driving solution that depends on super-fancy stateful pre-scans of the environment seems fundamentally flawed. If you can't safely drive a road that isn't pre-scanned in hi-def LIDAR, for instance, I don't suppose you can safely drive that road on an arbitrary Monday. Maybe solutions like this were never even attempted, but certainly some amount of statefulness is inherent in some of the commercial solutions out there (Supercruise?).
So what kind of state can you use safely? My first thought was basics like the speed limit, number of lanes, and type of road surface, with your algorithm's predictions of upcoming changes being constantly weighed against its current assessment of the present state.
So maybe the fundamental rule is: no hard-coded state that can't be reliably detected in real time, so that a real-time signal is always able to override the programmed state?
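Something like this, maybe (a minimal sketch of the "map as prior, live signal overrides" rule; names are hypothetical, not any vendor's API):

```python
def effective_speed_limit(map_limit, detected_limit, detection_conf,
                          override_threshold=0.8):
    """Treat the stored map value as a prior; a sufficiently confident live
    observation (say, a temporary roadwork sign) always overrides it."""
    if detected_limit is not None and detection_conf >= override_threshold:
        return detected_limit   # real-time signal wins
    return map_limit            # otherwise fall back to the prior
```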
But then it seems that inevitably you get to the point where your real-time classifiers are basically running the show anyway so are you back to - what good is a Map anyway?
Would love interesting but accessible readings on the subject, or maybe it's so off base that it's not really part of the discussion?
The question of statefulness reminds me of some not-immediately-intuitive observations from game development. Say you're working on potentially-visible-set computation for a first-person shooter. Where do you spend your optimization efforts? Do you put a lot of work into taking advantage of object coherence between frames, for instance? Chances are good that something that's visible in one frame will remain visible in the next, right? So if you start with that assumption you might think you could save some time by not bothering to test any surfaces on that object for visibility until some nominally-unrelated condition is met, such as the player turning rapidly in place.
You can waste a lot of time thinking about optimizations like that, but at some point it'll occur to you that it's a giant waste of time to optimize for anything but the worst case, where visibility information computed during one frame is completely unusable in the next frame for whatever reason. Otherwise all that your clever frame-coherence hacks can ever do is speed up the rendering of the sorts of frames that weren't going to dominate the player's perception of the game's performance anyway. An engine that renders 95% of frames at 60 FPS and 5% at 43 FPS is going to look pretty terrible, so you're usually better off putting work into the slowest frames rather than wasting time looking for hacks and shortcuts that make the fast frames even faster.
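The tempting hack looks something like this (a toy sketch with invented names; the point is that the slow path still exists and still has to be paid for):

```python
from dataclasses import dataclass, field

TURN_THRESHOLD = 0.5  # rad/s; invented heuristic for "turning rapidly in place"

@dataclass
class VisibilityCache:
    visible_set: set = field(default_factory=set)
    valid: bool = False

def potentially_visible(objects, can_see, angular_velocity, cache):
    """Frame-coherence hack: reuse last frame's visible set unless the
    player is turning quickly. Note the slow path is untouched; the hack
    only speeds up frames that were already fast."""
    if cache.valid and abs(angular_velocity) < TURN_THRESHOLD:
        return cache.visible_set                  # fast path: reuse
    cache.visible_set = {o for o in objects if can_see(o)}
    cache.valid = True
    return cache.visible_set                      # slow path: full test
```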
Likewise, yes, you can assume that the car will almost never be traveling backwards on the left side of the road or whatever, so the temptation to take advantage of that is going to be high. But the cases where that assumption breaks down will hurt the user's experience badly, possibly disastrously. So you're better off without relying too much on assumptions that contain phrases like "hardly ever" or "most of the time" or "typically."
The AI people would argue that simply following the prime directive ("Don't hit anything") covers a multitude of driving sins, and they're not wrong in a technical sense. But the prime directive doesn't cover all of them. Driving defensively involves much more than just not hitting stuff.
My favorite example is a humorous image that went around several years ago, a photo that depicted someone driving in a Miata or similar convertible with the top down. The convertible was driving behind a sewage truck, the kind that has a large tank with a hose attachment to clean out septic tanks.
The sewage truck, in turn, was heading straight for an overpass that obviously had nowhere near enough clearance. An alert human driver would have no problem anticipating what was about to happen, but the oblivious one in the Miata was clearly about to find out the hard way.
Every time I find myself idly wondering if it would be fun to work on self-driving cars, I flash back to that image. Then I get back to work on whatever I'm actually supposed to be doing.
"But then it seems that inevitably you get to the point where your real-time classifiers are basically running the show anyway so are you back to - what good is a Map anyway?"
Yep, pretty much. Robust real-time perception, for a wide variety of tasks that happen very rarely, is one of the key bottlenecks.
"and the car will always be on the right side of the road (assuming US driving laws)." In other words, overfitting to USA, questionably usable anywhere else, definitely not in UK? Okay, this quote is worth preserving.
I find it interesting that the project team had been implementing several "improvements" that actually worsened performance, and that it took an intern to figure this out.
Or are these augmentations default-on options in deep learning frameworks?
Says so in the article. Essentially "we do this because everybody else does, therefore it's good." Note the title "ML practitioners" - it's used akin to alchemy these days: if the stars are right, it works.
Not turned on by default, but they are sort of a default choice for vision tasks. It may also be that they were useful on an earlier iteration of the model, and only became harmful later.
Nobody's quite sure why anything works, though. Hence the comparisons to alchemy: we have lots of cool empirical results, and a highly elaborated set of theories and ideas trying to explain what works and what doesn't, but in fact these theories often fail to make correct predictions. So the only way to make progress is constant blind experimentation.
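For concreteness, a "default" vision augmentation stack often looks like the sketch below (torchvision is an assumed stand-in; the article doesn't name the team's framework). The horizontal flip is the step that manufactures left-hand-traffic scenes the deployed cameras will never produce:

```python
import torchvision.transforms as T

# A typical default augmentation stack for generic vision tasks.
generic_train_tf = T.Compose([
    T.RandomHorizontalFlip(p=0.5),  # harmful for driving: flips road-side semantics
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # usually safe
    T.ToTensor(),
])

# The same stack with the flip removed, consistent with the finding that
# dropping it helped on forward-facing driving imagery.
driving_train_tf = T.Compose([
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.ToTensor(),
])
```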
>> It’s worth noting that these augmentation tricks won’t work on datasets that include images from different camera types, at different angles and scales.
Discriminate "car", "bike", "pedestrian", and "bus", sure. Recognize signs and traffic lights, fine. But "obstruction"? No. That's a geometry problem. If it's not flat road, it's an obstacle. Deep learning is for deciding what kind of obstacle. Because deep learning just isn't that good. It's going to be badly wrong a few percent of the time.
"The car will almost never be on the left side of the road, and the cameras will never flip angles, so training on flipped data forces the network to overgeneralize to situations it will never see." What could possibly go wrong?