It’s because modern devs by and large have zero concept of latency differences. I had to explain to people yesterday why an on-disk temporary table in MySQL was slower than an in-memory one. “But the disk is an SSD,” was an actual rebuttal I got. Never mind the fact that this was Aurora, so the “disk” is a slice of (multiple) SSDs exposed over a SAN…
I've been trying to impress upon people from my own research in Python packaging: Pip is slow because it defaults to pre-compiling bytecode (and doesn't invoke multiprocessing for that, although this seems to be in the works from the discussion I've seen); imports literally hundreds of modules even when it ultimately does nothing; creates complex wrappers to set up for connecting to the Internet (and using SSL, of course) even if you tell it to install from a locally downloaded wheel directly; caches things in a way that simulates network sessions instead of just actually having the files... you get the idea.
"It's written in Python and not e.g. in Rust" is simply not relevant in that context.
(For that matter, when uv is asked to pre-compile, while it does some intelligent setup for multiprocessing, it still ultimately invokes the same bytecode compiler - which is part of the Python implementation itself, written in C unless you're using an alternative implementation like PyPy.)
There may be a bit of culture at play sometimes as well. If a language isn't meant to be fast, then perhaps devs using the language do not prioritize performance very much. For some, as long as it is possible to point to some externality ("hey this is Python, what do you expect") this is sufficient.
Of course, that's not always the case. C++ is a good counterexample, with a massive range of performance "orientedness". On the other hand, I suspect there are few Rust, Zig, or C programmers who don't care about performance.
To me, Python is more of a challenge than an excuse when it comes to performance.
On the flip side, I've seen quite a few C developers using their own hand-rolled linked lists where vector-like storage would be more appropriate, without giving it a second thought. Implementing good hash tables from scratch turns out not to be very much fun, either. I'm sure there are off the shelf solutions for that sort of thing, but `#include` and static compilation in C don't exactly encourage module reuse the same way that modern languages with package managers do (even considering all the unique issues with Python package management). For better or worse.
(For what it's worth, I worked in J2ME for phones in the North American market in the mid 00s, if you remember what those were like.)
Not sure what you mean, but I guess you're getting at something like "is it really pip's fault that it imports almost everything on almost every run?".
Well first off, pip itself does defer quite a few imports - just not in a way that really matters. Notably, if you use the `--python` flag, the initial run will only import some of the modules before it manages to hand off to the subprocess (which has to import those modules again). But that new pip process will end up importing a bunch more eventually anyway.
The thing is that this isn't just about where you put `import` statements (at the top, following style guidelines, vs. in a function to defer them and take full advantage of `sys.modules` caching). The real problem is with library dependencies, and their architecture.
If I use a library that provides the `foo` top-level package, and `import foo.bar.baz.quux`, at a minimum Python will need to load the `foo`, `foo.bar`, `foo.bar.baz` and `foo.bar.baz.quux` modules (generally, from the library's `foo/__init__.py`, `foo/bar/__init__.py`, `foo/bar/baz/__init__.py` and `foo/bar/baz/quux.py` respectively). But some libraries offer tons of parallel, stand-alone functionality, such that that's the end of it; others are interconnected, such that those modules will have a bunch of their own top-level `import`s, etc. There are even cases where library authors preemptively import unnecessary things in their `__init__.py` files just to simplify the import statements for the client code. That also happens for backward compatibility reasons (if a library reorganizes some functionality from `foo` into `foo.bar`, then `foo/__init__.py` might have `from . import bar` to avoid breaking existing code that does `import foo`... and then over time `bar` might grow a lot bigger).
For pip, rich (https://pypi.org/project/rich/) is a major culprit, from what I've seen. Pip uses little of its functionality (AFAIK, just for coloured text and progress bars) and it tends to import quite a bit preemptively (such as an emoji database). It's very much not designed with modularity or import speed in mind. (It also uses the really old-fashioned system of ad-hoc testing by putting some demo code in each module behind an `if __name__ == '__main__':` block - code that is only ever used if you do `python -m rich.whatever.submodule` from the command line, but has to be processed by everyone).
And yes, these things are slow even with Python's system for precompiling and caching bytecode. It's uniformly slow, without an obvious bottleneck - the problem is the amount of code, not (as far as I can tell) any specific thing in that code.
It's also a Product issue: "fast and snappy" is almost never on their list of priorities. So product managers will push devs for more features, which satisfy the roadmap, and only get concerned about speed when it reaches painful levels.
But is it actually faster by any significant amount on your workload? Did you benchmark? Temporary tables in databases rarely actually do disk IO in any blocking code path but mostly just dirty buffers in the OS or in the database itself and then something writes it to disk in the background. This limits throughput but does not add much extra latency in the common cases.
Edit: It still might be a bad idea to waste the IO if you do not have to but the latency of a temporary table is usually RAM latency, not disk latency even for temporary tables on disk.
Didn’t have to, prod benchmarked it for us, twice.
If you’re curious, the EBS disks Aurora uses for temporary storage, when faced with a QD of approximately 240, can manage approximately 5000 IOPS. This was an r6i.32xlarge.
My hypothesis is currently that the massive context switching the CPU had to do to handle the interrupts slowed down its acceptance of new connections / processing other queries enough such that everything piled up. I’ve no idea what kind of core pinning / isolation AWS does under the hood, but CPU utilization from disk I/O alone, according to Enhanced Monitoring, was about 20%.
And I thought we were past this sort of gatekeeping and elitism. I've worked with people who had a master's degree in CS who couldn't code their way out of a wet paper bag. In my experience there's very little correlation between how someone learned programming and how deep their understanding of the entire stack is.
Yeah, deep understanding I think is a matter of how much time you spend investigating the craft and improving. Maybe the real question is: what motivates that to happen? Maybe it's school type, but maybe it's not.
My personal anecdotes, which are music centric but all apply to my software career:
1. I've studied music my whole life, and baked into music is the idea of continual practice & improvement. Because of this experiential input, I believe that I can always improve at things if I actively put a bit of energy into it and show up. I believe it because I've put in so many hours to it and have seen the results. This is deeply ingrained.
2. When I picked up bass in high school, I spent the first year learning tabs in my bedroom. It was ok, but my ability accelerated when I started a band with my friends and had to keep up. I really think the people you surround yourself with can: push you more, teach you things you didn't know, and make the process way more of a fun hang than doing it by yourself.
3. Another outcome from music education was learning that I really love how mastery feels. There's a lot of positive feeling from achieving and growing. As a result, I try to seek it out in other places as well. I imagine sports/etc are the same?
> I've worked with people who had a master's degree in CS who couldn't code their way out of a wet paper bag.
"Programming" consists of an insanely large number of isolated communities. Assuming the person in question is capable, I would guess they simply come from a very different "programming culture".
I actually observe a very related phenomenon in myself: the more I learn about some very complicated programming topics, the more "alien" I (start to) become to the kind of programming I have to do at work.
Hope this doesn't come off as disrespectful (it's not that I don't believe you), but out of personal interest, would you consider expanding on that? I'd love to hear about the particular example you were thinking of, or about the ways self-taught coders surprised you compared to academically taught ones, if you've had experience working with both over a meaningful span of time. Also, if it's the case, in what ways self-taught coders were/are maybe lacking on average.
If you've ever given answers to that in other comments on HN or elsewhere, feel free to link.
It's certainly true in my experience. The main thing that makes a difference is simply how curious and interested you are.
Plenty of graduates simply got into it to make money, and have no real deep interest. Some of them love the theory but hate the practice. And some of them are good at both, of course.
By contrast, self taught people tend to have personal interest going for them. But I've also worked with self taught people who had no real understanding (despite their interest), and who were satisfied if something just worked. Even if they are motivated to know more, they are often lacking in deeper theoretical computer science (this is a gap I've had to gradually fill in myself).
Anyway, the determining factor is rarely exactly how they acquired their skills, it's the approach they take to the subject and personal motivation and curiosity.
Makes sense; out of all the potential differentiators, the source of skill attainment simply isn't necessarily the dominant one. Thanks for the answer :)
Not disrespectful at all. I agree with the sibling comments, I think what allows someone to become a great software developer, to have great intuition, and to understand systems and abstractions on a really deep level is their curiosity, hunger for knowledge, and a lot of experience.
There are many college educated software developers who have that sort of drive (or passion, if you will) and there are just as many who don't, it's not something college can teach you, and the same is true for self-taught developers.
At the end of the day "self-taught" is also a spectrum that ranges from people who created their first "hello world" React app 2 months ago to people who have been involved in systems programming since they were 10 years old, and have a wealth of knowledge in multiple related fields like web development, systems administration, and networking. That's why I think it's silly to generalize like that.
Software development is extremely broad so depending on the discipline self-taught developers might not be missing anything essential, or they might have to learn algorithms, discrete mathematics, linear algebra, or calculus on their own. I learned all of that in college but I'd probably have to learn most of it again if I really needed it.
Thanks for the answer, very nice of you to take the time even hours after.
Guess it makes sense; I'm self-taught myself, but thought academically taught developers would have a leg up in theory and mathematics. At one point I did consider further formal education (at least paid courses and such), but I realized there isn't much I can't teach myself with the resources available (which include high-quality university lectures, available for free).
Not the GP, but here is my N=1 (N=3, actually, as you'll see).
I have a Master's in Economics. After 3 years of economics, I started a Master's program in maths (back then the Master's degree was the standard thing you achieved after 4.5–5 years of studying in my country; there was basically nothing in between high school and that). 9 years later I got a PhD in mathematical analysis (so not really close to CS). But I've been programming as a hobby since the late 80s (Commodore 64 BASIC, Logo, then QBasic, of course quite a bit of Turbo Pascal, and a tiny bit of C, too). I also read a bit (but not very much) about things like algos, data structures etc. Of course, a strong background in analysis gives one a pretty solid grasp of things like O(n) vs O(n log n) vs O(n^2) vs O(2^n). 9 years ago I started programming as a side job, and 2 years later I quit the uni.
I lack a lot of foundations – I know very little about networks, for example. But even in our small team of 5, I don't /need/ to know that much – if I have a problem, I can ask a teammate next desk (who actually studied CS).
Of course, _nemo iudex in causa sua_, but I think doing some stuff on the C64 and then Turbo Pascal gave me pretty solid feeling for what is going on under the hood. (I believe it's very probable that coding in BASIC on an 8-bit computer was objectively "closer to the bare metal" than contemporary C with all the compiler optimizations.) I would probably be able to implement the (in)famous linked list with eyes closed, and avoiding N+1 database queries is a natural thing for me to do (having grown up with a 1 MHz processor I tend to be more frugal with cycles than my younger friends). Recently I was tasked with rewriting a part of our system to optimize it to consume less memory (which is not necessarily an obvious thing in Node.js).
Another teammate (call them A) who joined us maybe 2 years ago is a civil engineer who decided to switch careers. They are mainly self-taught (well, with a bit of older-brother-taught), but they are a very intelligent person with a lot of curiosity and willingness to learn. I used to work with someone else (call them B) who had formal CS education (and wasn't even fresh out of a university; it was their second programming job, I think), but lacked general life experience (they were 5-10 years younger than A), curiosity and deep understanding, and I preferred to work with A than with B hands down. For example, B was perfectly willing to accept rules of thumb as universal truths "because I was told it was good practice", without even trying to understand why, while A liked to know _why_ it was a good practice.
So – as you yourself noticed – how you acquire knowledge is not _that_ important. IMHO the most important advantage of having a formal CS education is that your knowledge is more likely (but not guaranteed!) to be much more comprehensive. That advantage can be offset by curiosity, willingness to learn, some healthy skepticism and age. And yes, I think that young age – as in, lack of general life experience – can be a disadvantage. B was willing to accept even stupid tasks at face value and just code their way through them (and then tear the code apart because it had some fundamental design problem). A, as a more mature person, instinctively (or maybe even consciously) saw when a task did not fit the business needs and sometimes was able to find another solution which was for example simpler/easier to code and at the same time satisfied the actual needs of the customer better.
I’ve worked with both (I personally have an unrelated BS and an MS in SWE, which I used purely to get my foot in the door – it worked), and IMO if someone has a BS, not MS, there’s a decent chance they at least understand DS&A, probably took an OS course, etc.
That said, I have also worked with brilliant people who had no formal education in the subject whatsoever, they just really, really liked it.
I’m biased towards ops because that’s what I do and like, but at least in that field, the single biggest green flag I’ve found is whether or not someone has a homelab. People can cry all they want about “penalizing people for having hobbies outside of their job,” but it’s pretty obvious that if you spend more time doing something – even moreso if you enjoy it – you will learn it at a much faster rate than someone who only does it during business hours.
Maybe you should skip bashing modern devs and people who learnt from online courses when the parent poster is the one in the wrong. Unless InnoDB has a particularly bad implementation of disk-backed temporary tables, almost all disk IO should be done in the background, so the latency cost should be small. I recommend benchmarking on your particular workload if you really want to know how big it is.
I am an old-school developer with a computer engineering degree but many of the old famous devs were self taught. Yes, if you learn how to code through online courses you will miss some important fundamentals but those are possible to learn later and I know several people who have.
We have excellent metrics between PMM and AWS Perf. Insights / Enhanced Monitoring. I assure you, on-disk temp tables were to blame. To your point, though, MySQL 5.7 does have a sub-optimal implementation in that it kicks a lot of queries out to disk from memory because of the existence of a TEXT column, which is internally treated as a BLOB. Schema design is also partially to blame, since most of the tables were unnecessarily denormalized, and TEXT was often used where VARCHAR would have been more appropriate, but still.
That really should include "sync write 4KB to an SSD". People don't realize that these days it's essentially a RAM write on the drive (you pay a PCIe round trip, and that's about it).
I do want to add that many of the devs I work with and have worked with are genuinely interested in learning this stuff, and I am thrilled by it.
What bothers me is when they neither have an interest in learning it, nor in fixing their bad patterns, and leadership shrugs and tells infra teams to make it work.