
Lock-free is not the same as wait-free.

> Web application servers will often hit the wall on something like a shared cache or the session-state store. Unless 100% of the shared data uses efficient lock-free algorithms, then given enough threads eventually some mutex somewhere will be the limit.

Without strong LL/SC (which exists nowhere at useful block sizes), it's difficult if not impossible to implement complex data structures, particularly compound data structures (lockless doesn't compose well), that can't block somewhere. Many sophisticated lock-free and even nominally wait-free data structures implicitly rely on dynamic memory allocation and complex garbage collection strategies, which just hides the ball.

If you have shared resource contention, usually something is going to block somewhere. And the more you scale up, the more your optimization hacks to minimize real-world blocking break down; hence the long history of multithreaded software hitting limits at 2, 4, 8, 16, 32, 64, 128, etc. CPUs.

The way around this is to architect your application at a higher level to avoid or mitigate resource contention. How you do that is extremely context dependent, though "share memory by communicating" is a rule of thumb intended to steer you in a less failure-prone direction while keeping your options open. If you go in planning to just drop lockless implementations of basic data structures everywhere, you're going to fail hard, often because the hit to pipelining, etc., from using atomic operations in your critical path imposes huge costs upfront.
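One way to read "share memory by communicating" in Java terms is a sketch like the following (my own illustration, not from the comment): a single owner thread holds the mutable map, and other threads send it requests over a queue instead of touching the map directly, so no lock on the data structure itself is ever contended.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;

public class OwnerThreadMap {
    // A request carries a key plus a future the owner completes with the reply.
    record Get(String key, CompletableFuture<Integer> reply) {}

    private final BlockingQueue<Get> inbox = new ArrayBlockingQueue<>(1024);
    private final Map<String, Integer> data = new HashMap<>(); // touched by the owner thread only

    public OwnerThreadMap() {
        Thread owner = new Thread(() -> {
            data.put("answer", 42); // owner initializes its private state
            try {
                while (true) {
                    Get req = inbox.take(); // blocks on the queue, not on a data-structure mutex
                    req.reply().complete(data.get(req.key()));
                }
            } catch (InterruptedException ignored) {
                // shut down quietly
            }
        });
        owner.setDaemon(true);
        owner.start();
    }

    public Integer get(String key) throws InterruptedException {
        CompletableFuture<Integer> reply = new CompletableFuture<>();
        inbox.put(new Get(key, reply)); // communicate instead of sharing
        return reply.join();
    }
}
```

Contention moves from "every thread fights over the map's internals" to "threads enqueue messages," which is a far easier pattern to reason about and scale, at the cost of per-request queue traffic.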



> lockless doesn't compose well

Are you familiar with any resources on this? I've noticed this when interviewing candidates and asking follow-ups about how to make an LRU cache thread-safe. Many, many candidates reach for converting their HashMap into a ConcurrentHashMap, which effectively buys them nothing, for the reason you raised.

Seems like a missing abstraction that makes it hard to compose these structures, and therefore to construct complex ones. Another thing for the research list, regardless. Or maybe it is naturally the case that compositions of lockless structures don't scale well?

I still need to read GP's article from 1024cores, which seems to potentially get into this as well.


I'm not an expert in this area, I've just read enough literature on software transactional memory to understand some of the fundamental limitations of modern hardware architectures.

This is one of the few areas where a subscription to the ACM Digital Archive is priceless. It's been several years since I went down the rabbit hole, and many new data structures have since been published. If there's a decent self-contained solution, there's a good chance it's been published there. Any particular paper is likely floating around on the Internet, but efficiently sifting through the literature benefits from friction-free access to all the citations.


If you have a DOI, sci-hub can generally find it for you. I consider paywalls mostly a solved problem at this point.


Another interesting article here is "MCS locks and qspinlocks": https://lwn.net/Articles/590243/


Can you elaborate on the HashMap? I have used java.util.concurrent.ConcurrentHashMap for caches on machines with up to 16 cores, and I am impressed by the implementation every time I use it: easy to use, and the performance is excellent. I've never had a chance to use it on a 128-core machine, and maybe my experience is limited, but I think that for the majority of use cases ConcurrentHashMap is a solid choice for caches.


Unless I'm misunderstanding terribly, the issue is that an LRU cache is composed of both an ordered list of cache entries and the map of the data itself. Converting just the map to a thread-safe implementation doesn't actually fix the thread-safety issues.
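To make that concrete, here's a deliberately broken sketch (my own illustration): the map is thread-safe on its own, but the recency list is not, and nothing makes the pair of updates atomic.

```java
import java.util.LinkedList;
import java.util.concurrent.ConcurrentHashMap;

public class RacyLruCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final LinkedList<K> recency = new LinkedList<>(); // NOT thread-safe
    private final int capacity;

    public RacyLruCache(int capacity) { this.capacity = capacity; }

    public void put(K key, V value) {
        map.put(key, value);     // safe on its own
        recency.remove(key);     // but another thread can interleave between
        recency.addFirst(key);   // these steps and corrupt the list
        if (map.size() > capacity) {
            K evicted = recency.removeLast(); // may evict the wrong key, or throw
            map.remove(evicted);
        }
    }

    public V get(K key) {
        V v = map.get(key);
        if (v != null) {
            recency.remove(key);   // same problem: the map read and the
            recency.addFirst(key); // list update are not one atomic operation
        }
        return v;
    }
}
```

Fixing it means either one lock guarding both structures (which is essentially what a synchronized access-ordered LinkedHashMap gives you) or a far more intricate concurrent design, which is the composition problem in miniature.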


> Seems like a missing abstraction that makes it hard to compose these structures, and therefore construct complex structures. Another thing to the research list, irrespective. Or maybe it is naturally the case that the implementations of composition for lockless doesn't scale well?

Would transactional memory make lock-free algorithms compose better, or just race/conflict more?


The vast majority of STMs use lock-based transaction management. There are lock-free designs, but these tend to perform worse because they generate a lot of memory bus traffic. Unfortunately, lock-free algorithms can easily perform worse than their lock-based counterparts. There are often simpler alternatives that avoid lock contention on hot paths (like map reads) while still taking advantage of locking where contention is tolerable (like map writes).
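One such simpler alternative, sketched below under the assumption that reads vastly outnumber writes: readers see an immutable snapshot through a volatile reference and never lock, while writers serialize on one lock and publish a fresh copy (the same idea as CopyOnWriteArrayList, applied to a map).

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotMap<K, V> {
    private volatile Map<K, V> snapshot = Map.of(); // immutable, safely published
    private final Object writeLock = new Object();

    public V get(K key) {
        return snapshot.get(key); // hot path: a volatile read, no lock
    }

    public void put(K key, V value) {
        synchronized (writeLock) { // writers contend only with each other
            Map<K, V> next = new HashMap<>(snapshot);
            next.put(key, value);
            snapshot = Map.copyOf(next); // publish a new immutable snapshot
        }
    }
}
```

Writes are O(n) copies, so this only pays off for read-heavy workloads, but the read side is as contention-free as it gets without any lock-free cleverness.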



