
Yes: now we all forget how to correctly manage memory. Rust: where resources are unlimited and aborting when you run out of them is okay


"Rust: where resources are unlimited and aborting when you run out of them is okay"

I'd change "Rust" to "safe programming" here, but OK.

There have been lots of fun security vulnerabilities coming from programs trying to handle OOM gracefully. e.g. SpiderMonkey: https://bugzilla.mozilla.org/show_bug.cgi?id=982957, https://bugzilla.mozilla.org/show_bug.cgi?id=730415


Rust has provided ways not to abort on OOM for a while now.


If by "a while" you mean several months. But, yes, the recent addition of catch_unwind answered one of my biggest gripes with Rust.

  https://doc.rust-lang.org/std/panic/fn.catch_unwind.html
Does anybody know if the standard library allocator aborts the process or throws a catchable exception? When they added catch_unwind they officially split panics into two types: abort panics and unwind panics. Code can choose one or the other when panicking. Also, programs can be built in such a way that all panics become aborts. That kind of sucks, but it's probably one of the compromises needed to assuage those Rust developers who don't believe it's practical to handle OOM conditions.
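For reference, a minimal sketch of the catch_unwind mechanics being discussed. Note the caveat raised elsewhere in the thread: catch_unwind only stops unwinding panics, so it does not help with the default allocator's OOM behavior, which aborts the process outright.

```rust
use std::panic;

fn main() {
    // catch_unwind stops an unwinding panic at this boundary and turns
    // it into an Err carrying the panic payload. It cannot intercept
    // abort-style panics, nor an allocator that aborts on OOM.
    let result = panic::catch_unwind(|| {
        panic!("simulated failure");
    });
    assert!(result.is_err());
    println!("recovered from the panic");
}
```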


To be clear, _every_ panic has to have the same implementation; you can't choose unwind vs abort for some panics vs others.

The default allocator crate always does an abort, both before and after this change.


When building an app, does changing the default allocator affect all imported libraries? And is that allocator used for boxing, when boxing requires dynamic allocation?


Currently, "the allocator" is global in Rust. That is, the standard library uses liballoc to do all of its heap allocation, and it's liballoc that aborts on OOM.

You can of course write code that does anything, including use some other mechanism than liballoc.

There's motion on several fronts here:

The first, and currently unstable, is swapping out the implementation of liballoc for a different one. (we ship jemalloc and a pass-through to the system allocator with Rust)

The second, and currently in the "working on an RFC" phase, is per-object allocators.

Both of these are unstable because we're not 100% sure of the interface we want to stabilize yet, it's still a work in progress.
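(The first of these has since stabilized as the `#[global_allocator]` attribute; the sketch below assumes a current Rust toolchain, not the unstable interface being discussed here. It also answers the question above: the swap is process-wide, covering Box and every imported crate.)

```rust
use std::alloc::{GlobalAlloc, Layout, System};

// A pass-through allocator that delegates to the system allocator.
struct PassThrough;

unsafe impl GlobalAlloc for PassThrough {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

// Process-wide: every heap allocation in this binary, including Box
// and allocations made inside imported library crates, goes through
// this allocator.
#[global_allocator]
static GLOBAL: PassThrough = PassThrough;

fn main() {
    let b = Box::new(7_i32); // allocated via PassThrough
    assert_eq!(*b, 7);
}
```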

  > when boxing requires dynamic allocation?
Boxing always implies dynamic allocation.
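A small illustration of that point, with the one caveat I'm aware of: zero-sized types are the exception, since they need no storage at all.

```rust
fn main() {
    // Box heap-allocates its contents and frees them on drop.
    let b: Box<[u8]> = vec![0u8; 1024].into_boxed_slice();
    assert_eq!(b.len(), 1024);

    // The exception: zero-sized types occupy no storage, so boxing
    // them performs no actual heap allocation.
    let z: Box<()> = Box::new(());
    drop(z);
}
```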


The vast majority of programs that you run on a day to day basis will abort when they run out of memory. On a default Linux system, there's not even any way to prevent that - you need to faff with the config to make it actually return errors when allocating memory.
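The config in question is the kernel's overcommit policy. A sketch of the knobs involved (values and semantics per the Linux overcommit accounting docs; changing them requires root):

```shell
# Linux overcommit policy:
#   0 = heuristic overcommit (the default)
#   1 = always overcommit, never refuse an allocation
#   2 = strict accounting: allocations beyond the commit limit fail,
#       so malloc() can actually return NULL instead of the OOM killer
#       firing later.
cat /proc/sys/vm/overcommit_memory
sysctl vm.overcommit_memory=2

# Under strict accounting, the commit limit is swap plus this percent
# of physical RAM:
sysctl vm.overcommit_ratio=80
```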


That's all irrelevant for a language which is supposed to be a systems programming language, however that term is defined.


I don't think it is - less, bash, gcc, systemd, most interpreters, and Firefox will all pretty much just crash if malloc() fails - they're varying levels of things you'd want a "systems language" for. Unless you're trying to define systems language to mean one used for kernel or bare-metal embedded development (both cases where even in C, you skip the standard library and write your own memory allocation functions etc), there's really very few cases where you (a) are allowed to allocate memory and (b) need to fail gracefully somehow if you can't.


I'm not saying that aborting on OOM never makes sense. I'm saying there are many situations where you shouldn't abort on OOM, and those situations overlap considerably with the situations in which a systems language is ideal[1]. So a systems language that doesn't allow for robust and fine-grained OOM handling isn't much of a systems language.

Example: Basically any highly concurrent network daemon that multiplexes many clients on the same thread. In that case, you want much more control over where to put your recovery point. Even if the process doesn't abort, if the recovery point is beyond the scope of the kernel thread (i.e. a controller thread, which was the recommended solution before catch_unwind), that can be really inconvenient, and also requires a lot of unnecessary passing of mutable state between threads, which is usually something you try to avoid.

[1] Lua has robust support for OOM recovery, which is noteworthy because there can be situations where you both want to handle OOM but where a scripting language is more preferable. Example: An image manipulation program with scriptable filters, where you don't want an operation that can't complete to take down your process or thread.


On the other hand... who actually disables over-committing on their Linux servers or the machines they run image editors on so that you could actually handle out-of-memory errors? I've never come across anyone who's changed the setting unless told to by a specific piece of software (almost always database software).


I do. Not only do I disable overcommit, I disable swap.

Most software is riddled with buffer overflows and other exploits, and yet it's rare that you come across an intruder while he's installing his rootkit. That doesn't mean it's not happening, just that people are ignorant about it; and that things can appear normal even with rootkits installed.

Like buffer bloat, people can be experiencing a problem without even realizing it's a problem. When software crashes under load they just think that it's _normal_ to crash under load.

Or when it crawls to a snail's pace under load because it's swapping like mad, they think that's normal, even though QoS would have been much better if the software failed the requests it couldn't serve rather than slowing everybody down until they _all_ time out, sometimes even preventing administrators from diagnosing and fixing the problem.

OOM provides back pressure. Back pressure is much more reliable and responsive than, e.g., relying on magic constants for what kind of load you _think_ can be handled.


> Firefox will all pretty much just crash if malloc() fails

It's not nearly as simple as that. A general approach we try to use is the following.

- Small allocations are infallible (i.e. abort on failure) because if a small allocation fails you're probably in a bad situation w.r.t. memory.

- Large allocations, especially those controlled by web content, are fallible.

The distinction between small and large isn't clear cut, but anything over 1 MiB or so might be considered large. You certainly don't want to abort if a 100 MiB allocation fails, and allocations of that size aren't unusual in a web browser.
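Later Rust grew direct support for this small/large split via `Vec::try_reserve`, which reports allocation failure as an error instead of aborting. A sketch under that assumption (the function name and the 1 MiB-ish threshold framing are illustrative, not from any real codebase):

```rust
// Hypothetical decoder: large, content-controlled sizes are fallible.
fn decode_image(len: usize) -> Result<Vec<u8>, String> {
    let mut buf = Vec::new();
    // try_reserve_exact returns Err on allocation failure (or capacity
    // overflow) rather than aborting the process.
    buf.try_reserve_exact(len)
        .map_err(|e| format!("cannot allocate {} bytes: {}", len, e))?;
    buf.resize(len, 0);
    Ok(buf)
}

fn main() {
    // A small request succeeds...
    assert!(decode_image(4096).is_ok());
    // ...while an absurd one fails gracefully instead of aborting.
    assert!(decode_image(usize::MAX).is_err());
}
```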


That approach may make sense on the desktop, but it doesn't make sense for network servers. On a network server allocations tend to be small. Your 10,001st request might fail not because of a huge allocation but because it just pushed you over the line.

Aborting all 10,000 requests provides really poor QoS. You can get away with that if you're Google, because 10,000 requests is still minuscule relative to your total load across "the cloud". For almost everybody else it will hit hard, and among other things makes you more susceptible to DoS.


So, to be clear, you're implying that you shouldn't write network services in Python, Ruby, PHP, Go, Java in practice [1], Perl, C++ without exceptions, JavaScript, etc. etc.?

[1]: http://stackoverflow.com/questions/1692230/is-it-possible-to...


Malloc can always fail, even when you have overcommit enabled. You can always run out of address space. It's pretty common on mobile, which is still predominantly 32-bit.



