Not to go all 'Rust Evangelism Strike Force' but almost universally, these exploits leverage memory unsafety somewhere in the stack, usually in a parser of some kind (image, text, etc). The fact that this is still tolerated in our core systems is a pox on our industry. You don't have to use Rust, and it won't eliminate every bug (far from it), but memory safety is not optional.
We truly need to work more towards eliminating every memory unsafe language in use today, until then we're fighting a forest fire with a bucket of water.
It's worth engaging with the fact that essentially nobody disagrees with this (someone will here, but they don't matter), and that it's not happening not because Apple and Google don't want it to happen, but because it's incredibly, galactically hard to pull off. The Rust talent pool required to transition the entire attack surface of an iPhone from C, C++, and ObjC to Rust (substitute any other memory safe language, same deal) doesn't exist. The techniques required to train and scale such a talent pool are nascent and unproven.
There is probably not a check Apple can write to fix this problem with memory safe programming languages. And Apple can write all possible checks. There's something profound about that.
Well, to some extent these companies are self-sabotaging by centering interviews around algorithm problems, not only by selecting a certain kind of talent for further investment of resources, but also by signaling to the market the kinds of training needed to land a good job.
If instead, the talent pool were incentivized to increase their ability to understand abstractions, and we selected for that kind of talent, it might not be so hard to use new languages.
Wait, this whole thread is about moving to languages that eliminate classes of security holes by virtue of the language itself. The premise is that being a security conscious programmer is not by itself enough to achieve good security.
In Apple's case they wouldn't need to move everything to Rust. Swift is a little bit higher level and a lot of stuff could be moved into it, with Rust as the lower level layer to replace ObjC / C / C++.
Still a gargantuan effort, but for them it doesn't require everyone to learn Rust, just Swift, which I'm sure is kind of table stakes for a lot of user-facing dev there.
I honestly don't understand this. If Google or Apple wanted it to happen, they could force those developers to learn Rust. Are you saying the people that wrote the products in question can't learn Rust well enough to achieve the goal?
Forcing their employees to learn Rust doesn't mean Google has the capacity to rewrite all their software in Rust. They have tons and tons of code which would need to be rewritten from scratch.
Of course, if they dropped all other development and told their employees to rewrite everything in Rust, they might end up with a piece of software written in Rust but no customers.
I agree, but there's so many people at Google (132,000 if you can believe the search results), it's hard for me to believe they couldn't devote a small percentage of them to moving to a secure stack.
Let's say 50 000 write code and a small percentage is 10%. So then your idea is that 45 000 would continue to write code in the unsafe languages and 5 000 would rewrite old and newly written code in Rust? How many years do you think it would take those 10% of developers to rewrite all the old code plus everything newly written by the other 90%?
Start with the fact that practically all software development at Apple and Google would cease for multiple months while people dealt with the Rust learning curve, which is not gentle, and proceed from there to the fact that Rust demands (or, at least, urgently requests) architectural changes from typical programs designed in other languages.
Now: rewrite 20 years worth of code.
Let's make sure we're clear: I agree --- and, further, assert that every other serious person agrees --- that memory safety is where the industry needs to go, urgently.
When you say architectural changes, what do you mean? Most of the memory stuff isn't particularly exotic; there's a lot of different syntax and some functional-programming influences, but I'm curious why it would be wildly exotic compared to most C++ code. Or have I misunderstood?
It doesn't matter how exotic something is when you're talking about rewriting an entire platform - the sheer amount of man hours required to reimplement something for an advantage the vast majority of customers simply don't value enough is the limiting factor.
In that context, even a small architectural difference can be seen as a high barrier.
It'd still be like replacing the engines of an aeroplane mid-flight, surely? I know Rust can do C interop and it'd probably be done piecemeal, but it'd still be an absolutely gargantuan task. I'd say there's a fair chance the sheer time and effort such an undertaking would involve would cost more than the memory safety bugs using C or C++ introduces.
You don't need to move the entire attack surface of the iphone to Rust. There are plenty of smaller areas that tend to have the most vulnerabilities. They could absolutely write a check to radically reduce these sorts of issues.
It'll take years to have impact, but so what? They can start now, they have the money.
> nobody disagrees with this (someone will here, but they don't matter)
There are so many people out there who don't understand the basics. HN can be sadly representative.
I don't think the real question is “how feasible is it to rewrite everything in Rust”, because as you say, the answer to that question is clearly “not at all”. But “rewriting all parsers and media codec implementations” is a much smaller goal, and so is “stop writing new codec implementations in memory-unsafe languages”, yet neither of those two more achievable goals is being pursued either, which is sincerely disappointing.
macOS runs the Darwin kernel (developed at NeXT using the Mach kernel, then at Apple). NeXTSTEP was based on a BSD UNIX fork. Development of BSD at Berkeley started in 1977. NeXT worked on their kernel and the BSD UNIX fork in the '80s and '90s before being purchased by Apple. NeXTSTEP formed the base of Mac OS X (which is why much of the Objective-C base libraries start with `NS-something`). There are 45 years' worth of development on UNIX, and Linux is a completely different kernel with a completely different license. The Linux kernel has been in development for about 31 years.
Languages and understanding them is not special, but decades of development of two different kernels is a huge time investment. Even though Linus Torvalds wrote the basic Linux kernel in 5 months, it was very simple at first.
I doubt writing an entire POSIX-compatible replacement for a kernel would be a small or quick endeavor, and Apple has shown resistance to adopting anything with a GPL 3 license iirc. That is why they switched to ZSH from Bash.
For code added in the future, you need devs no matter what language they use, so switching their language is the easy part of this large hard project.
For code added in the past, more evolution means that for every X lines of code written, a smaller and smaller fraction of X still exists. Which means less work to replace the end product.
As far as I'm aware, every major company in the industry is working on exactly this. I'm telling you why we don't just have an all-memory-safe iPhone right now, despite Apple's massive checking account. I'm not arguing with you that the industry shouldn't (or isn't) moving towards memory safety.
Do you think Apple is already at the frontier of what can be done to detect or refactor out these bugs in their existing languages? Static analysis, Valgrind, modern C++, etc?
Honestly at this point I’ve given in and am now advocating that we rewrite every damned widget from scratch in Rust, because by the time we’re mostly done, my career will be winding down, and seeing that shit still gets pwned like, exactly as much, will be “good TV”.
Rust is cool because it’s got a solid-if-slow build story that doesn’t really buy into the otherwise ubiquitous .so brain damage. Rust is cool because Haskell Lego Edition is better than no Haskell at all, and Rust is cool because now that it’s proven affine/linear typing can work, someone will probably get it right soon.
But if I can buy shares in: “shit still gets rocked constantly”, I’d like to know where.
> Honestly at this point I’ve given in and am now advocating that we rewrite every damned widget from scratch in Rust, because by the time we’re mostly done, my career will be winding down, and seeing that shit still gets pwned like, exactly as much, will be “good TV”.
Rust won't solve logic bugs but it can help bring up the foundations. So long as memory safety bugs are so pervasive we can't even properly reason on a theoretical level about logic bugs. The core theorem of any type system is "type safety" which states that a well-typed program never goes wrong (gets stuck, aka UB). Only then can you properly tackle correctness issues.
> Rust is cool because Haskell Lego Edition is better than no Haskell at all, and Rust is cool because now that it’s proven affine/linear typing can work, someone will probably get it right soon.
I don't understand the condescending remarks about "Haskell Lego Edition".
I do agree that Rust has shown that substructural type systems work and are useful, and that they will be a 'theme' in the next batch of languages (or I can hope).
And frankly I don’t see how it’s even remotely fair to call a no-nonsense statement that some things are simplified versions of other things with a cheeky metaphor “condescending”.
I could just as easily throw around words like “anti-intellectual” if my goal was to distract from the point rather than substantively replying.
But Rust isn't remotely a simplified version of Haskell, and I'm not sure where you got that impression. It's inspired by several languages, but is predominantly a descendant of ML and C++. The only similarity they have is that Rust traits resemble Haskell typeclasses, but even there they are quite different in semantics and implementation.
I like Rust in a lot of ways, I write a fuckload of it and I get value from doing so. Not “praise the lord” value, but real value.
But the attitude is an invitation to getting made fun of. It’s absurdly intellectually dishonest when Rust-as-Religion people actively hassle anyone writing C and then get a little precious when anyone mentions Haskell and then extremely precious when they step on the landmine of the guy who likes Rust enough to know the standard, the compiler, the build tool, the people who wrote the build tool, and generally enough to put it in its place from a position of knowledge.
SSH servers? Yeah, I’d go with Rust. Web browsers? In a perfect world, lot of work. Even for Mozilla who timed the fuck out on it.
Everything ever so no security problem ever exists ever again? Someone called it the “green energy of software security” on HN like this year.
It’s not the coolest look that one of my “blow off some steam” hobbies is making those people look silly, but there are worse ways to blow off some steam.
It sounds like you’re saying, you spent a lot of time focused on learning rust, so now you like to discuss its shortcomings as abrasively as you can for sport.
Upthread I’ve already surrendered. There are certain gangs you just don’t pick a fight with. I’m a slow learner in some ways but I get the message. Got it, learning Rust nuts and bolts only makes it worse to say anything skeptical about it.
Nearly every answer you gave in this thread doesn't address the parent comments point at all.
It seems you are just raging and reading subtext and drama where there is none.
Further up someone mentioned Rust and Haskell aren't similar and you go on about Rust-religion and where to use Rust. Why don't you just address the point? "Lego" is also not a synonym or metaphor for simplified.
Your argument seems to mostly boil down to "Rust isn't magic", which nobody is really arguing. It does help eliminate one class of really nasty bugs, which tend to repeatedly show up in a lot of massive security hacks, and which generally everyone would like to see eliminated. Therefore: use Rust.
Comparisons to other languages like Haskell don't really work, since they don't fit in the same space nor have the same goals as Rust or C.
lol if "shit still gets rocked" means "programs exit safely but unexpectedly sometimes" we're on very different pages
I'm searching your posts in this topic trying to find something of value and coming up short. You assert that you know Rust, and therefore your opinions have merit, but... lots of people know Rust and disagree. But somehow your opinions are More Right and the others are just religious Rust shills.
I don't think you know what you're talking about honestly. If you want to pick fights on HN that's cool, we all get that urge, but you're really bad at it.
The flaw in the idea of "rewrite it in Rust" is that, next to the memory issues, the biggest issues are logic bugs.
Rewriting something from scratch isn't going to magically not have bugs, and the legacy system likely has many edge cases covered that a modern new implementation will have to learn about first.
Right, but a memory unsafety bug is what takes a harmless logic bug in an image parser with no filesystem access to an RCE and sandbox escape.
Memory unsafety allows you to change the 'category' of the bug: you become free to do whatever, whereas a logic bug forces you to work within the (flawed) logic of the original program.
Not necessarily; see https://github.com/LinusHenze/Fugu14/blob/master/Writeup.pdf for example. It's a full chain that repeatedly escalates privileges without exploiting any memory safety bugs by tricking privileged subsystems into giving it more access than it should have, all the way up through and beyond kernel code execution.
These Rust vs C comparisons often get fixated on the, somewhat unique, memory safety advantage of Rust. But the proper comparison should be ANY modern language vs. C, because those remove a heap of other C footguns as well. Most modern languages have:
- sane integers: no unsafe implicit cast, more ergonomic overflow/saturate/checked casts
- sane strings: slices with length, standardized and safe UTF-8 operations
- expressive typing preventing API misuse: monads like Optional/Result, mandatory exception handling, better typedefs, ADTs vs tagged unions
And even without the full Rust ownership model, I'd expect the following to solve a majority of the memory safety problems (a short sketch follows the list):
- array bounds checks (also string bounds checks)
- typed alloc (alloc a specific type rather than N bytes)
- non-null types by default
- double-free, use-after-free analysis
- thread-safe std APIs
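A minimal sketch of what those defaults look like in practice (the function and names here are invented for illustration, not from any real codebase):

    use std::convert::TryFrom;

    // Hypothetical parse helper: checked conversion + bounds-checked read.
    fn read_byte(buf: &[u8], offset: i64) -> Result<u8, String> {
        // Explicit, checked integer conversion; no silent sign mangling:
        let idx = usize::try_from(offset).map_err(|_| "negative offset".to_string())?;
        // Bounds-checked access: Option instead of a read past the end:
        buf.get(idx).copied().ok_or_else(|| "offset out of bounds".to_string())
    }

    fn main() {
        assert_eq!(read_byte(b"abc", 1), Ok(b'b'));
        assert!(read_byte(b"abc", -1).is_err()); // checked cast catches it
        assert!(read_byte(b"abc", 99).is_err()); // bounds check catches it
    }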
In the write-up you linked, Section 2 is a missing error check => Result<T> would surface that. The macOS case contains a relative path vs string comparison => expressive typing of Path would disallow that. DriverKit exploit is a Non-NULL vs NULL API mistake. Kernel PAC is a legit ASM logic bug, but requires a confusion of kernel stack vs. user stack => might have been typed explicitly in another language.
That’s not a zero-click vulnerability though. I didn’t read the entire pdf but 2 of the first 4 steps involve active user participation and assistance (install exploit app 1 and exploit app 2).
I think regardless, you’re right, we will still have logic bugs… but that example is also an “exception proves the rule” kind of thing.
It's not a zero click, that is correct. I presented it as an example of how every layers of Apple's stack, far beyond what is typically targeted by a zero-click exploit chain, can still have logic bugs that allow for privilege escalation. It's not just a memory corruption thing, although I will readily agree that trying to reduce the amount of unsafe code is a good place to start fixing these problems.
An improvement is an improvement. A flaw of seatbelts is that some people still die when they wear them. That's not a valid argument to not wear seatbelts.
It's important to have good foundations (memory safety) because then it becomes much more attractive to spend effort on the rest of the correctness and security. If you want to build a sturdy house, and see how to make the roof well, don't give up on it just because you'll need to do something else for good doors and windows.
Can you imagine how long it would take to compile the Linux kernel if it were Rust only? Not to mention the kernel has to allow for third-party closed source stuff like drivers; wouldn't that force you to allow unsafe Rust and put you back at square one?
That seems an insignificant price to pay if it would truly provide the promised benefits (big if). Even most Linux users don't compile the kernel themselves, and the 5% that do care can afford the time and/or computing resources.
I took the time to learn Rust well in spite of how annoying the Jehovah's Witness routine has been for like, what, 5-10 years now? I worked with Carl and Yehuda both on the project right before Cargo (which is pretty solid, those guys don’t fuck around).
Do you have a different opinion on whether or not syntax for a clumsy Maybe/Either Monad is a bit awkward? Do you think that trait-bound semantics are as clean as proper type classes as concerns trying to get some gas mileage out of ad-hoc polymorphism? Do you think that the Rust folks might have scored a three-pointer on beating the Haskell folks to PITA-but-useable affine types?
You don't know what you're asking for. In reality, you'll end up replacing C code with memory unsafety with Rust code written by people who understand Rust less than they understand C. The problem? The Rust Evangelism Strike Force always assumes that if you replace a C program with a Rust program, it'll be done by a top-tier expert Rust programmer. If that isn't the case (which it won't be), then the whole thing falls apart. There are vulnerabilities in JS and Ruby code, languages that are even easier than (and just as type-safe as) Rust.
There's something to be said for taking an entire class of vulnerability off the table.
For instance, in the past I worked at a sort of "Active Directory, but in the cloud" company. We identified parsers of user-submitted profile pictures in login windows as a privilege escalation issue. We couldn't find memory safe parsers for some of these formats that we could run in all these contexts, and ended up writing a backend service that had memory safe parsers and would recompress the resulting pixel array.
Rust parsers at the time would have greatly simplified the workflow, and I'm not sure how we would have addressed the problem except as whack-a-mole if our central service hadn't been in the middle (MMS, notably, can't do that).
This is just incorrect. The beauty of Rust is even bad programmers end up writing memory safe code because the compiler enforces it. The ONLY rule an organization needs to enforce on their crappy programmers is not allowing use of unsafe. And there are already available tools for enforcing this in CI, including scanning dependencies.
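The enforcement itself is a one-liner built into the language; scanning dependencies needs third-party tools (cargo-geiger is one). A minimal sketch:

    // At the top of lib.rs / main.rs: compilation now fails if any
    // `unsafe` block appears anywhere in this crate (dependencies are
    // not covered; that's what the scanning tools are for).
    #![forbid(unsafe_code)]

    fn main() {
        // unsafe { std::ptr::null::<u8>().read() }; // <- hard compile error
        println!("no unsafe allowed in this crate");
    }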
I think what they're saying is that by making devs use a less familiar language, you're going to end up with at least as many security bugs, just ones not related to memory safety. (Not weighing in either way, just clarifying.)
Keep in mind that, if you have an RCE bug, any other class of bug is irrelevant. It's a bit like diagnosing someone with the flu after their head has been cut off. And while acknowledging that you're not personally weighing in either way, I will personally call the idea that you'll end up with just as many bugs of a weaker class quite silly. Everyone starts as unfamiliar in every language, but not every language makes it equally easy to accidentally introduce vulnerabilities. Defaults matter, tooling matters, and community norms matter, and all of these make it less likely for a low-quality Rust programmer to introduce vulnerabilities than even a medium-quality C programmer.
> There are vulnerabilities in JS and Ruby code, languages that are even easier (and just as type-safe) as Rust.
This is completely misleading. The vulnerabilities that exist in those languages are completely different. They often are also far less impactful.
Memory safety vulnerabilities typically lead to full code execution. It is so so so much easier to avoid RCE in memory safe languages - you can grep for "eval" and "popen" and you're fucking 99% done, you did it, no more RCE.
I think the only question that matters is how much longer it takes to write a moderately-sized program in Rust vs C. If it takes around the same time, then an average C programmer will probably write code with more bugs than an average Rust programmer. If it takes longer in Rust, the Rust programmer could start taking some seriously unholy shortcuts to meet a deadline, therefore the result could be worse.
All code can have bugs, it's mostly just a question of how many. Rust code doesn't have to have zero bugs to be better than C. It's not like all C programmers are top-tier programmers and all Rust programmers are the bottom of the barrel.
I've written a few things at work in C/C++ and Rust. I can move much faster in Rust, personally, as long as the pieces of the ecosystem I need are there. Obviously I only speak for myself.
Part of that is because I'm working in code where security is constantly paramount, and trying to reason about a C or C++ codebase is incredibly difficult. Maybe I get lucky and things are using some kind of smart ptr, RAII and/or proper move semantics, but if they're not then I have to think about the entire call chain. In rust I can focus very locally on the logic and not have to try and keep the full codebase in my head
That assumes that writing unsafe code would make you go faster. It wouldn't. In general if you want to write code in Rust more quickly you don't use unsafe, which really wouldn't help much, but you copy your data. ".clone()" is basically the "I'll trade performance for productivity" lever, not unsafe.
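A contrived illustration of that lever (nothing here is from a real codebase):

    fn main() {
        let mut names = vec!["alice".to_string(), "bob".to_string()];
        // Holding a borrow of names[0] while also mutating `names`
        // won't compile:
        //   let first = &names[0];
        //   names.push("carol".to_string()); // error: `names` is borrowed
        // The quick, slower-but-still-safe fix: copy the data out.
        let first = names[0].clone();
        names.push("carol".to_string());
        println!("{} of {} people", first, names.len());
    }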
Rust doesn't need top tier programmers. It just needs competent programmers.
The C in use today wasn't written by experts either. And if it was, we can leave it alone for now, or at least until said experts get tired of maintaining it.
1. In the video he's saying that you can't replace memory safety mitigation techniques like ASLR with memory safe languages. He notes that there will always be some unsafe code and that mitigation techniques are free, so you'll always want them.
No one should disagree with that. ASLR is effectively "free", and unsurprisingly all Rust code has ASLR support and rapidly adopts new mitigation techniques as well as other methods of finding memory unsafety.
2. The link about replacing gnu utils has nothing to do with memory safety. At all.
Even if it were related, it would simply be an argument from authority.
Wouldn't it be a lot easier to just use a C compiler that produces memory-safe code?
I'm sure someone else has already thought of this, but in case not... All you need to do is represent a pointer by three addresses - the actual pointer, a low bound, and a high bound. Then *p = 0 compiles to code that checks that the pointer is in bounds before storing zero there.
I believe such a compiler would conform to the C standard. Of course, programs that assume that a pointer is 64-bits in size and such won't work. But well-written "application level" programs (eg, a text editor) that have no need for such assumptions should work fine. There would be a performance degradation, of course, but it should be tolerable.
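For what it's worth, Rust's slices are roughly the two-word version of the fat pointer described above: the pointer and its length travel together and every access is checked. A tiny sketch of what that buys, purely for comparison:

    fn main() {
        let buf = [1u8, 2, 3, 4];
        let p = &buf[..]; // "fat pointer": address and length together
        assert_eq!(p[2], 3);         // in-bounds access works as usual
        assert_eq!(p.get(99), None); // checked form returns Option
        // p[99] would panic deterministically at the access instead of
        // silently reading whatever happens to live past the buffer.
    }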
That's essentially what ASAN is, with some black magic for performance and scope reasons. The problem is that ensuring that your code will detect or catch memory unsafety isn't enough, because the language itself isn't designed to incorporate the implications of that. If you're writing a system messenger for example, you can't just crash unless you want to turn all memory unsafety into a zero-click denial of service.
Programs that would crash when using the memory-safe compiler aren't standards conforming. If you're worried that programs crashing due to bugs can be used for a denial-of-service attack... Well, yes, that is a thing.
Low-level OS and device-handling code may need to do something that won't be seen as memory safe, but I expect that for such cases you'd need to do something similarly unsafe (eg, call an assembly-language routine) in any "memory safe" language.
I'm not familiar with how ASAN is implemented, but since it doesn't change the number of bytes in a pointer variable, I expect that it either doesn't catch all out-of-bounds accesses or has a much higher (worst case) performance impact than what I outlined.
I brought up ASAN because it's a real thing that already exists and gets run regularly. The broad details of how ASAN is implemented are best summarized in the original paper [1]. The practical short of it is that there are essentially no false negatives in anything remotely approaching real-world use. A malicious attacker could get around it, but any "better algorithm" would still run into the underlying issue that C doesn't have a way to actually handle detected unsafety and no amount of compiler magic will resolve that.
You have to change the code. Whether that's by using another language or through annotations like Checked C is an interesting (but separate) discussion in its own right.
As for the point that programs with memory unsafety aren't standards conforming; correct but irrelevant. Every nontrivial C program ever written is nonconformant. It's not a matter of "just write better code" at this point.
From the linked ASAN paper: "...at the relatively low cost of 73% slowdown and 3.4x increased memory usage..."
That's too big a performance hit for production use - much bigger than you would get with the approach I outlined.
I don't agree that any nontrivial C program is nonconformant, at least if you're talking about nonconformance due to invalid memory references. Referencing invalid memory locations is not the sort of thing that good programmers tolerate. (Of course, such references may occur when there are bugs - that's the reason for the run-time check - but not when a well-written program is operating as intended.)
I usually find it safe to assume that compiler folks are conscious of optimization opportunities and make pretty intelligent tradeoffs on that spectrum. This is one such case. There's a long history of bounds checking compilers. The first that I know of is bcc in the 80s, which had a 10x slowdown! Austin et al. [1] came along a few years later way back in '94 and improved things to a mere 2-5x slowdown. That's pretty much where things stood for the next two decades because pointer accesses are everywhere in C and register pressure is nothing to sneeze at. Moreover, changing pointer sizes breaks your ability to link external things that weren't compiled with the same flags, like the system libc. ABI compatibility is make-or-break for a C compiler. You can get around that by breaking up the metadata from the actual pointer (e.g. softbound), but the performance cost is still ~3-4x [2].
ASAN was notable because
1) it was very efficient. That initial 73% was utterly fantastic at the time.
2) It was production-usable (i.e. worked on big codebases)
3) With hardware support, the performance hit is often under 10%. HWAsan on modern platforms is low-cost enough to run it all the time.
And no, I'm saying that pretty much every nontrivial C program has UB, not that they're specifically memory unsafe.
With all due respect, why do you assume that your “thought about it 3 mins straight” idea would perform better than one that has been in the works for a long time now by people working on similar topics all of their lives?
Don’t get me wrong, I often fall into this as well, but I think programmers really should get a bit of an ego-check sometimes, because (not you specifically) it often affects discussions in other fields as well, fields we don’t know jackshit about.
I do this pretty often, and it's often a very valuable exercise, even though I'm almost always wrong. Interrogating the apparent contradiction between my beliefs and existing reality is a highly fruitful learning experience. There are several serious failure modes, though:
1. I can get my ego so wrapped up in my own idea that, even once I have the necessary information to see that it's wrong, I still don't abandon it. In fact, this always happens to some extent; when I change my mind it's always embarrassing in retrospect how blind I was. But the phenomenon can be more or less extreme.
2. In a context where posturing to appear smart and competent is demanded, such as marketing, advocating totally stupid ideas puts me at a disadvantage, even if I recant later. Maybe especially then, because it reminds people who might have forgotten.
3. People who know even less than I do about a subject may be misled by my wrong ideas.
4. This approach is most productive when people who know more than I do about a subject are kind enough to take the time to explain why my ideas are wrong. This happens surprisingly often, both because people are often kind and because the people who know the most about a subject are generally very interested in it, which means they like to talk about it. Still, attention of experts is a valuable, limited resource.
5. People who know more than I do about a subject can get angry and defensive when I question something they said about it, particularly if they're mediocre and insecure. The really top people never act this way, in my experience; if they pay attention at all, either they can explain immediately why I'm wrong, as AlotOfReading did here (though I may not understand!) or they go "Hmm, now that's interesting," before figuring out why I'm wrong. (Or, occasionally, not.) But people with a good working understanding of a field may know I'm wrong without knowing why. And there are always enormously more of those in any field than really top people.
So, I try to do as much of the process as possible in my own notebook rather than on permanently archived public message boards. The worst is when group #3 and #5 start arguing with each other, producing lots of heat but no light.
My theory about why the angry and defensive people in group #5 are never the top people is that they stopped learning when they reached a minimal level of competence, because their ego became so attached to their image of competence that they stopped being able to recognize when they were wrong about things, so they are limited by whatever mistaken beliefs they still had when they reached that level. But maybe I'm just projecting from my own past experience :)
Yes, I know. But this thread is about detecting invalid memory references in production, to prevent security exploits. ASAN seems too slow to solve that problem.
Based on recent experience, you'd really want your media decoders compiled with a safe compiler, and if it crashes, don't show the media and move on. Performance is an issue, but given the choice between RCE and DoS, DoS is preferable.
It would be nice if everything was memory safe, but making media decoding memory safe would help a lot.
I absolutely agree that it's a step in the right direction. My point is that we can't get all the way to where we want to be simply by incremental improvements in compilers. At some point we have to change the code itself because it's impossible to fully retrofit safety onto C.
There are similar approaches, e.g. Checked C, which work surprisingly well. However, I'm not sure that this approach would be expressive enough to handle the edge cases of C craziness and pointer arithmetic. There's more to memory unsafety than writing to unallocated memory; even forcing a write to slightly wrong memory (e.g. setting `is_admin = true`) can be catastrophic.
I think it handles all standards-conforming uses of pointer arithmetic. Even systems-level stuff like coercing an address used for memory-mapped IO may work. For example,
    struct dev { int a, b; } *p;
    p = (struct dev *) 0x12345678;
should be able to set up p with bounds that allow access only to the a and b fields - eg, producing an error with
    int *q = (int *) p;
    q[2] = 0;
Of course, it doesn't fix logic errors, such as setting a flag to true that shouldn't be set to true.
Yes, such approaches can be compliant. There are even a few C interpreters. They were very popular back in the day for debugging C programs when you didn't have full OS debugging support for breakpoints and the like. Such an approach would be quite suitable for encapsulating untrusted code. There is definitely some major overhead, but I don't see why you couldn't use a JIT.
Good point. There's also the problem of pointers to no-longer-existing local variables. (Though I think it's rare for people to take addresses of local variables in a context where the compiler can't determine that they won't be referenced after they no longer exist.)
Nothing wrong with Rust, but I still think making operating systems with airtight sandboxing and proper permission enforcement is the only thing that can truly solve these issues.
Only if the barriers have a finer resolution than a single application. Most applications need access to more than enough data to cause problems in the case of an exploit. You need sandboxing between different components of the application as well.
Still not enough, because apps still need to interact with the outside world, so there would have to be intentional holes in the sandbox out through which the compromised app could act maliciously.
That is why you need a well designed permission system. Android and iOS had a chance of doing this at a time when the requirements could reasonably be understood, but I don't think either came close.
It is a tradeoff. Making an airtight sandbox is not that hard. Making it run programs near hardware speed is a lot harder. Making it run legacy machine code is a nightmare.
JavaScript is not machine code, but still a good deal harder to make fast than a language designed for fast sandboxing. Of course there have been bugs, but mostly I think the JS VMs have done a pretty good job of protecting browsers.
Memory safety is optional in Rust. It might not be obvious at the moment, because Rust is written by enthusiasts who enjoy fighting with the compiler until their code compiles, but once developers will be forced to use it on their jobs with tight deadlines, unsafe becomes the pass-the-borrow-checker cheat code.
I write Rust at $WORK. Using `unsafe` to meet a deadline makes 0 sense. It doesn't disable the borrow checker unless you're literally casting references through raw pointers to strip lifetimes, which is... insane and would never pass a code review.
99% of the time if you're fighting the borrow checker and just want a quick solution, that solution is `clone` or `Arc<Mutex<T>>`, not `unsafe`. Those solutions will sacrifice performance, but not safety.
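A minimal sketch of that escape hatch (illustrative only): shared ownership via Arc, interior mutability via Mutex. It costs a refcount and a lock, not memory safety:

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let total = Arc::new(Mutex::new(0u64));
        let handles: Vec<_> = (0..4u64)
            .map(|i| {
                // Cloning the Arc sidesteps the lifetime fight entirely.
                let total = Arc::clone(&total);
                thread::spawn(move || *total.lock().unwrap() += i)
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        assert_eq!(*total.lock().unwrap(), 6); // 0 + 1 + 2 + 3
    }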
Rust is used in production though at large companies: Amazon, Microsoft, Mozilla, etc. I would be highly surprised if the borrow checker were the reason code couldn't ship in the first place; once you get over the initial mental hurdles it's usually a non-issue.
Besides, equating a pervasively unsafe-by-default language with one that has an explicit, bounded opt-in is a little disingenuous. Time after time, it has been shown that even expert C developers cannot write memory safe C consistently; each line of code is a chance to blow up your entire app's security.
I think unsafe Rust is a lot more awkward to work with and easier to cause UB with compared to C and especially C++. This is just my opinion though!
&mut aliasing is a good example of running into instant UB in unsafe Rust, but there are many more that you have to be aware of.
I would check out the unsafe Rust "book" for yourself and see what you think. There is a section where you implement Vec and some other data structures from scratch!
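A minimal sketch of that footgun (it compiles cleanly; Miri is what flags it):

    fn main() {
        let mut x = 0u32;
        let p = &mut x as *mut u32;
        unsafe {
            let a = &mut *p;
            let b = &mut *p; // creating `b` invalidates `a`
            *b += 1;
            *a += 1; // UB: `a` is used after being invalidated
        }
        println!("{}", x);
    }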
I think it's easier to write correct safe Rust than C, I wouldn't say it's easier to write correct Rust with unsafe blocks than C (many operations strip provenance, you can't free a &UnsafeCell<T> created from a Box<T>, you can't mix &mut and const but you might be able to mix Vec<T> and const (https://github.com/rust-lang/unsafe-code-guidelines/issues/2...), self-referential &mut or Pin<&mut> is likely unsound but undetermined), and it's absolutely more difficult to write sound unsafe Rust than C (sound unsafe Rust must make it impossible for callers to induce UB through any possible set of safe operations including interior mutability, logically inconsistent inputs, and panics).
I must be more tired than I thought if I said “non-tight Rust” and forgot ten minutes later.
I just think if mistakes need to be kept literally as low as possible, you've got a better bet than Rust unsafe.
The language spec is smaller, the static analyzers have been getting tuned for decades, and the project leaders aren't kinda hostile to people using it in the first place.
We could set up a prediction market for this. A study would be performed of attempts to pentest randomly selected unsafe Rust and tight ANSI C programs. A prediction market would then be used to estimate the probability of either language winning before publication of the results.
Someone needs to make this a thing.
Unsafe code is rarely necessary, especially unsafe code that isn't just calling out to some component in C. You can easily forbid developers from pushing any code containing `unsafe` and use CI to automatically enforce it.
This is kinda disingenuous. Whenever people use unsafe it's like an alarm, because you can set up a CI system that warns the DevOps team regarding the usage of unsafe code.
And, most of the time unsafe code is not required. I think many people will just use clone too much, or Arc rather than unsafe. Additionally, I have never seen unsafe code at least where I work.
Following this idea, using a memory managed language like Go, Java or C# should also prevent most security issues (at least in non-core systems). Somehow I don't think this would work.
While I think garbage collected languages produce programs that are more safe, I also think they are often enablers for new classes of security issues. For example ysoserial, log4j etc.
The underlying bug in log4j is having a deserialization mechanism that can automatically deserialize to any class in the system, combined with method code that runs upon deserialization that does dangerous things. It has nothing to do with GC at all.
It's a recurring problem in dynamic scripting languages, where the language by its very nature tends to support this sort of functionality. It's actually a bit weird that Java has it, because statically-typed languages like that don't generally have the ability to do this, but Java put a lot of work into building it into the language. Ruby had a very large issue with this a few years back, where YAML submitted to a Ruby on Rails site would be automatically deserialized and would execute a payload before it got to the logic that would reject it, even if nothing was looking for it. Python's pickle module has been documented as being capable of this for a long time, so the community is constantly on the lookout for things that use pickle that shouldn't, and so far they've mostly succeeded, but in principle the same thing could happen with that, too.
It would be nearly impossible for Go (a GC'd language) to have that class of catastrophic security error, because there is nowhere the runtime can go to get a list of "all classes" for any reason, including deserialization purposes. You have to have some sort of registry of classes. It's possible to register something that can do something stupid upon unmarshaling, but you have to work a lot harder at it.
Go is not unique. You don't see the serialization bugs of this type in C or C++ either (non-GC'd languages), because there's no top-level registry of "all classes/function/whatever" in the system to access at all. You might get lots of memory safety issues, but not issues from deserializing classes that shouldn't be deserialized simply because an attacker named them. Many other languages make this effectively impossible because most languages don't have that top-level registry built in. That's the key thing that makes this bug likely.
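The same closed-registry property holds in Rust; a minimal sketch (all names invented) of why an attacker-supplied type name is meaningless there:

    // The only things untrusted input can ever become are the variants
    // listed here; there is no language-level lookup of types by name.
    enum Message {
        Ping,
        Echo(String),
    }

    fn deserialize(tag: &str, payload: &str) -> Option<Message> {
        match tag {
            "ping" => Some(Message::Ping),
            "echo" => Some(Message::Echo(payload.to_owned())),
            _ => None, // unknown names fail closed
        }
    }

    fn main() {
        if let Some(Message::Echo(s)) = deserialize("echo", "hello") {
            println!("{}", s);
        }
        // An attacker-supplied "com.evil.Gadget" simply has no meaning:
        assert!(deserialize("com.evil.Gadget", "x").is_none());
    }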
> The underlying bug in log4j is having a deserialization mechanism that can automatically deserialize to any class in the system
Getting objects out of a directory service is what JNDI is all about; I'm hesitant to call it a bug.
The bug is that Java is way too keen on dynamically loading code at runtime. Probably because it was created in the 90s, where doing that was kinda all the rage. I think retrospectively the conclusion is that it may be the easiest way to make things extensible short-term, but also the worst way for long-term maintenance. Just ask Microsoft about that.
"Getting objects out of a directory services is what JNDI is all about, I'm hesitant to call it a bug."
I didn't call it a bug. I called it a bit of functionality that makes the security problem possible. There are many things that result in security issues that come from some programmer making something just too darned convenient, but are otherwise "features", not some mistake or something.
It's the underlying problem. You should have to declare what classes are able to be deserialized. To the extent that it's inconvenient, well, so was the log4j issue.
No; failing to validate user input and passing it to some crazy feature-rich library like JNDI is possible in any language. Not denying that Java did contribute by shipping with such an overengineered mess as JNDI in the first place.
Log4Shell wasn't a bug, log4j worked as expected and documented. It's just a stupid idea for a logging library to work in such a way.
No, it's not unique. But a GC, like dynamic typing, strengthens your ability to develop more dynamically and, together with reflection or duck typing, to write code that can deal with largely unknown input more easily. You can pass arbitrary objects (including graphs of objects) around very generously. Ysoserial is based more or less on this idea. Passing arbitrary objects around was deemed so useful that it is also supported by Java's serialization mechanism, and thus could be exploited. Log4shell exploits similar mechanisms that would be hell to implement in non-GC languages.
Around 70% of security flaws are from memory unsafety (according to Google and Microsoft), which isn't "almost universally" but is still a significant percentage and worth attacking. But we'll still have a forest fire to fight afterwards from the other 30%.
Yeah, I started noticing huge flaws in Apple's Music app, which I told them about and work around mostly, but...are they because Apple software is written in C? C++, Objective-C, same thing. Like can C code ever really be airtight?
I'd say lack of QA. Apple Music (especially on macOS) is EXTREMELY buggy, unresponsive, slow, and feels like a mess to use. Same for iMessage.
Other apps are also written using the same stack with almost no bugs. I wouldn't blame the language here, but the teams working on them (or more likely their managers trying to hit unrealistic deadlines).
No, I would not blame the teams or their managers. You can't just blame a manager you've never met just because he's a manager. We're talking about the manager of Apple Music; they could very well be capable and well-minded, likely personally capable of coding.

So let me give you another example in the same vein as C, where everybody uses a technology that is terrible, questioning it only at the outset, and then just accepting it: keyboard layouts. QWERTY is obsolete. It wasn't designed at random; it would have aged better if it had been. It was designed to slow down typing so typewriters wouldn't jam, and secondly, so that salesmen could type "TYPEWRITER" with just the top row, so the poor customer being sold to didn't realize typewriters were masochistic. That's how you end up with millions of people hunting and pecking, or getting stuck for months trying to learn touch typing for real with exercises like "sad fad dad." It takes weeks before you can type "the". The network effects of keyboard layouts are just next-level. Peter Thiel talks about this in "Zero to One" as an example of a technology that is objectively inferior but is still widely used because it's so hard to switch, illustrating the power of network effects.

I for one did switch, and it was hard because I could type neither QWERTY nor Dvorak for a month. But after that Dvorak came easily. You don't need an app to learn to type; you just learn to type by typing, slowly at first, then very soon, very fast.
So with regard to C, I would say it is not objectively inferior like QWERTY became; it's actually pretty well designed. It does produce fast code. I use it myself sometimes; it's not a bad language for simple algorithm prototypes of under 60 lines. But it's based to a huge degree on characters: the difference between code that works and code that fails can come down to single characters, pretty much any character, with no margin for error. Whereas with Lisp, you have parentheses for everything, you have an interpreter but you can also compile it; I am actually able to trust Lisp in a way that is out of the question with C. There are just so incredibly many gotchas and pitfalls, buffer overflows, it's endless; you have to really know what you're doing if you want to do stunts with pointers, memory, void types.
I guess the bottom line is if you want your code to be perfect, and you write it in C, you can't delegate to the language; you yourself have to code that code perfectly, in the human capacity of perfection.
> you can't delegate to the language, you yourself have to code that code perfectly in the human capacity of perfection.
Clarifying that what I mean by this is that it's not realistic to expect large C codebases to be perfect. Bug-free, with no exploits. Perfect. Same thing.
You're being downvoted to oblivion, even though your general point (rephrased, C is an unforgiving language and safer languages are a Good Thing) is pretty mainstream. Here are my guesses why:
1. You start off by saying you can't just blame the team or their manager if you're dissatisfied with a product, then instead of explaining why the people who made a piece of software aren't responsible for its faults you go off on a long non-sequitur about QWERTY.
2. Your rant on QWERTY just isn't true. You namedrop Peter Thiel and his book, so if he's your source then he's wrong too. QWERTY is not terrible, not obsolete, it was not designed to slow down typists, and there's no record of salesmen typing "TYPEWRITER" with just the top row. It's true that it was designed to switch common letters between the left and right hands, but that actually speeds up typing. It also does not take weeks for someone to type "the"; and if you mean learning touch-typing, I don't know of any study that claims that alternative keyboard layouts are faster to learn.
The various alternative keyboard layouts (Dvorak, Colemak, Workman) definitely have their advantages and can be considered better than QWERTY, sure; people have estimated that they can be up to ~30% faster, but realistically, people report increasing their typing speeds by 5-10%, or at least the ones who had previously tried to maximize their typing speeds... If learning a new layout is the first time they'd put effort into that skill, they'd obviously improve more. It's probably also true that these layouts are more efficient in the sense that they require moving the fingers less, reducing the risk of RSI (though you'd really want to use an ergonomic keyboard if that's a concern.)
QWERTY is still used because it's not terrible, it's good enough. You can type faster than you can think with it, and for most people that's all they want. There's nothing wrong with any of the alternative layouts, I agree that they're better in some respects, but they're not order-of-magnitudes better as claimed.
3. Your opinions about C are asinine.
"not objectively inferior like QWERTY" - So, is C good or not? We're talking about memory safety, C provides literally none. Is this not objectively inferior? Now, I would argue that it's not, it's an engineering trade-off that one can make, trading safety for an abstract machine that's similar to the underlying metal, manual control over memory, etc. But you're not making that point, you're just saying that it's actually good before going on to explain that it's hard to use safely, leaving your readers confused as to what you're trying to argue.
"not a bad language for simple algorithm prototypes of under 60 lines" - It's difficult to use C in this way because the standard library is rather bare. If my algorithm needs any sort of non-trivial data-structure I'll have to write it myself, which would make it over 60 lines, or find and use an external library. If I don't have all that work already completed from previous projects, or know that you'll eventually need it in C for some reason, I generally won't reach for C... I'll use a scripting language, or perhaps even C++. Additionally, the places C is commonly used for its strengths (and where it has begun being challenged by a maturing Rust) are the systems programming and embedded spaces, so claiming C is only good for 60-line prototypes is just weird.
"C is about characters" - Um, most computer languages are "about characters". There are some visual languages, but I don't think you're comparing C to Scratch here... You can misplace a parentheses with Lisp or make any number of errors that are syntactically correct yet semantically wrong and you'll have errors too, just like in C. Now, most lisps give you a garbage collector and are more strongly typed than C, for instance, features which prevent entire categories of bugs, making those lisps safer.
4. You kinda lost the point there. You started by saying that the people who wrote Apple Music "could very well be capable and well-minded, likely personally capable of coding", i.e., they're good at what they do. Fine, let's assume that. Then, your bottom line is that in C "you have to really know what you're doing" and "you yourself have to code that code perfectly in the human capacity of perfection". What's missing here is a line explaining that humans aren't perfect, and even very capable programmers make mistakes all the time, and having the compiler catch errors would actually be very nice. Then it would flow from your initial points that these are actually fine engineers, but they were hamstrung by C.
And the tangent on QWERTY just did not help at all.
> So, is C good or not? We're talking about memory safety, C provides literally none. Is this not objectively inferior? Now, I would argue that it's not, it's an engineering trade-off that one can make, trading safety for an abstract machine that's similar to the underlying metal, manual control over memory, etc.
One might make the argument that Oberon, with its System module, provides the same memory control abilities but few of the disabilities of C.
> so claiming C is only good for 60-line prototypes is just weird.
That seems like a misrepresentation of the claim above?
> Um, most computer languages are "about characters". There are some visual languages, but I don't think you're comparing C to Scratch here... You can misplace a parentheses with Lisp or make any number of errors that are syntactically correct yet semantically wrong and you'll have errors too, just like in C.
Well, not really. Lisp is actually about trees of objects. The evaluator doesn't even understand sequences of characters. That you can enter it as a sequence of characters is purely coincidental, but there have been structured syntactic tree editors (sadly they went down for being proprietary and expensive at the time).
Sure, and that would be a good argument, there are several interesting languages out there that do various things better than C. I'm not intimately familiar with the Wirth languages, but I thought Oberon provided garbage collection?
> [...] misrepresentation [...]
Fine, they never claimed it was only good for that, but I still find it weird to claim that "it's fine, it's great for X" where X is a thing that the language is not particularly good at, while ignoring Y, the thing it's well known for.
> [...] trees of objects [...]
I just don't think that "about characters" or "about trees of objects" is an interesting way to differentiate between programming languages, and I think that this discussion is actually conflating two different properties. First is how the source code is represented and edited. It's almost always a plain text file. Some languages have variants on the plain text file: SQL stored procedures are stored on the RDBMS, Smalltalk stores source code in a live environment image. There are other approaches, such as visual editing as in Scratch, or Projectional Editing (https://martinfowler.com/bliki/ProjectionalEditing.html) as in... um... Cedalion? I don't actually know any well-known ones.
The other property is how the language internally represents its own code. Sure, Lisp has the neat property that its code is data that it can manipulate, but other languages represent their code as (abstract) syntax trees, too. Basically every compiler or interpreter for a 3rd generation language or above, i.e., anything higher-level than assembly language, parses source code the same way: tokenization then parsing into an abstract syntax tree using either manually-coded recursive descent, or a compiler generator (Bison, Yacc, Antlr, Parser combinators, etc.) So your point that the Lisp evaluator doesn't even understand sequences of characters is true for any compiler, they all operate on the AST.
I think that there's a point to be made somewhere in here that one language's syntax can be more error-prone than another's, but that wasn't the argument being made... Not that I understood, anyway.
> So your point that the Lisp evaluator doesn't even understand sequences of characters is true for any compiler, they all operate on the AST.
Lisp does not really operate on an AST. It operates on nested token lists, without syntax representation. For example (postfix 1 2 +) can be legal in Lisp, because it does not (!) parse that code according to a syntax before handing it to the evaluator.
Lisp code consists of nested lists of data. Textual Lisp code uses a data format for these nested lists, which can be read and printed. A lot of Lisp code, though, is generated without being read/printed -> via macros.
If (postfix 1 2 +) is ready to be handed to the evaluator, it's because it has been parsed. That means it must be a parsed representation. "Parse tree" doesn't apply because parse trees record token-level details; ( and ) are tokens, yet don't appear to the evaluator. "Abstract syntax tree" is better, though it doesn't meet some people's expectations if they have worked on compilers that had rich AST nodes with lots of semantic properties.
The constituents of the list are not "tokens" in Common Lisp. ANSI CL makes it clear that the characters "postfix", in the default read table, are token constituents; they get gathered into a token until the space appears. That token is then converted into a symbol name, which is interned to produce a symbol. That symbol is no longer a "token".
You're arguing semantics, I think. I would simply say that Lisp's AST is S-expressions (those nested token lists), and that the parser is Lisp's read function. Then your example is just something that's allowed by Lisp's syntax, while something like ')postfix 1 2 +(' would be something that's not allowed by the syntax.
What you say about Lisp code being generated without being read or printed is of course true, and while Lisp takes that idea and runs with it, it's not exactly unique to Lisp either; Rust's macros can do the same thing, without S-expressions. In other languages you usually generate source code; for example, Java has a lot of source code generators (e.g., JAXB's XJC that used to come with the JDK).
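A minimal sketch of that in Rust: the macro receives token trees at compile time and expands to items, with no text or S-expressions involved:

    // Generates getter methods at compile time from token fragments.
    macro_rules! getter {
        ($name:ident, $field:ident, $ty:ty) => {
            fn $name(&self) -> $ty {
                self.$field
            }
        };
    }

    struct Point {
        x: i32,
        y: i32,
    }

    impl Point {
        getter!(x_of, x, i32);
        getter!(y_of, y, i32);
    }

    fn main() {
        let p = Point { x: 1, y: 2 };
        println!("{} {}", p.x_of(), p.y_of());
    }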
The parser for s-expressions is READ. S-expressions are just a data syntax and know nothing about the Lisp programming language syntax; Lisp syntax is defined on top of s-expressions. The Lisp evaluator sees no parentheses and no text. It would not care if the input text contains )postfix 1 2 +( . The reader can actually be programmed to accept that as input. The actual Lisp forms need to be syntax checked then by a compiler or interpreter.
There are lots of language with code generators. Lisp does this as core part of the language and can do it at runtime as part of the evaluation machinery.
Bugs in Apple's Music apps have essentially nothing to do with it being written in C++ and Objective-C (and these days a significant portion of it is JavaScript and Swift).
- Added reference counting to ObjC to get rid of a lot of use-after-free bugs (still of course possible, because it's just a language suggestion and not strictly enforced like in Rust or Swift)
- Pushed for adding ObjC annotations to let the tooling help catch some set of bugs. Still not perfect by any means, but it helps a little.
- Created an entirely new memory-safe language in Swift.
Did you even read what you linked? I wouldn't say "already happening", more like first early steps. Operating systems have a massive attack surface; it would take years to convert code from C/C++ to Rust, and it would likely be more vulnerable initially (the old code base went through decades of scrutiny, hundreds of scanners/fuzzers, etc.).
My view is that it should be state mandated for products with over 1 million users.
In the long run it would pay for itself with the money that no longer has to be spent on mitigating cyber security problems.
Cyber security is national security is the people’s security. Ever since my aunt was doxxed and had her online banking money stolen I’ve become a cyber security hardliner.
The core problem is trusting a byte you read and then use to make decisions. This is a superset of memory safety and not protected against by any language as of now.
It doesn't have to be Rust. But for all the people who insisted for more than a decade that GCs and VMs don't necessarily compromise performance of practical applications, there has never been an acceptable browser engine in C#, Java, D, Common Lisp, Go, OCaml, Haskell or any of the others. Meanwhile Apple uses Objective-C with its ARC (and later Swift, which is even more like Rust) for everything and it was great.
So the community tried to make OCaml with the memory model of ObjC and as much backwards compatibility with C as they could muster. In context, this doesn't seem like a weird strategy.
How well do the best and worst things you’ve done in the last 7 days “track” with where you were at 13 years ago? Were there any ups or downs during that time?