Hacker News

I'm glad folks are starting to look at memory efficiency of Java programs - there are just so many improvements to be had there. For a fun exercise, take a look at the memory footprint of storing a million entries in a HashMap<Integer, Integer>. Between "object header" overhead, "everything is a pointer to a heap-allocated object" overhead, and "GC needs enough headroom to remain effective" overhead, the actual memory consumption can be as much as 10x that of a comparable C, C++, or Rust implementation.
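A rough way to see this for yourself. This is a sketch of my own (class and method names are mine, not from any library), and it uses GC-based measurement, so treat the number as an estimate rather than a benchmark:

```java
import java.util.HashMap;

public class MapFootprint {
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Approximate per-entry cost of a boxed HashMap<Integer, Integer>.
    // System.gc() is only a hint, so the result is a rough estimate.
    static long bytesPerEntry(int n) {
        System.gc();
        long before = usedBytes();
        HashMap<Integer, Integer> map = new HashMap<>(n * 2);
        for (int i = 0; i < n; i++) {
            map.put(i, -i); // each put boxes two ints into Integer objects
        }
        System.gc();
        return (usedBytes() - before) / map.size();
    }

    public static void main(String[] args) {
        // Two raw ints are 8 bytes; the boxed map typically needs several
        // times that per entry (node object + two Integer boxes + table slot).
        System.out.println("approx bytes/entry: " + bytesPerEntry(1_000_000));
    }
}
```

On a typical 64-bit JVM with compressed oops, each entry carries a HashMap node (header, hash, three references) plus two boxed Integers, which is where the multiple arises.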


Others have mentioned IntMap, and more generally, I'd recommend treating those standard classes as a general-purpose implementation that should be replaced by specialized implementations -- to improve performance, memory footprint, etc. -- WHERE NEEDED (avoid premature optimization!), and forgetting about it in the remaining 99% of cases where a HashMap contains fewer than 100 entries.

It's true that C/C++/Rust don't have this problem because they are optimized for this. OTOH if the problem you are solving is more suitable to garbage collection than manual or static-analysis based memory management then Java's high-performance GC will shine.


Object representation/memory layout optimization is an area that has been really neglected in the PLT field, in no small part because of restrictions imposed by specified PL semantics. There are huge amounts of slack left on the table for nearly all languages as currently implemented, both in compactness of representation and in placement in memory for efficiency of access. The bag of tricks that low-level programmers use to painstakingly optimize memory representations and placement should be available from compilers as automatic transforms, incorporated into profile-feedback-based optimizer passes, etc.

These are not just memory usage gains but also execution speed gains, since memory access is frequently the main bottleneck and CPU cache effectiveness depends on how many of your objects the caches can hold.

Java is actually one of the few languages that does at least something in this field already before this JEP: Compressed object references (https://www.baeldung.com/jvm-compressed-oops).


It's a bit of a chicken-and-egg issue (compare to the Python GIL).

Doing these kinds of analyses and optimizations takes time (and might impact language design), and most "popular" languages that predate Rust's rise seldom took memory models into account (in the mid/late 90s, lookup tables were still faster than certain kinds of "primitive" calculations that CPUs/GPUs eat for breakfast now). There were Baker's Linear Lisp (a kind of precursor to Rust) and some Standard ML dialects that had region allocation, but that research was still mostly about tradeoffs in memory liveness.

The "data-oriented design" movement in games has led to ideas of allowing structure-of-arrays to be used in identical ways to arrays-of-structures in a couple of languages. Sadly, many people involved in these spheres are allergic to GCs, so it's still a bit of a narrow design field, whereas I'm kinda curious about where layered designs could be taken.
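For anyone unfamiliar with the two layouts, here's a minimal Java sketch (names are illustrative, not from any particular library or game engine):

```java
// Array-of-structs: each Particle is its own heap object, with a header
// and no guarantee of being adjacent to its neighbors in memory.
class Particle {
    float x, y, dx, dy;
}

// Struct-of-arrays: one flat primitive array per field. Contiguous memory,
// no per-element object headers, and the hot loop scans arrays sequentially,
// which is what the CPU cache and prefetcher like best.
class Particles {
    final float[] x, y, dx, dy;

    Particles(int n) {
        x = new float[n]; y = new float[n];
        dx = new float[n]; dy = new float[n];
    }

    void step(float dt) {
        for (int i = 0; i < x.length; i++) {
            x[i] += dx[i] * dt;
            y[i] += dy[i] * dt;
        }
    }
}
```

In Java today you have to write the struct-of-arrays form by hand; the "identical ways" idea is that a language or compiler could expose the Particle-per-object view while storing the SoA layout underneath.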


The Mike Acton & co C++ gamedev-targeted data-oriented design campaign is good and deserves credit, but the emphasis on data representation/layout was old hat in other domains' perf work before that. And the "memory wall" had been talked about a lot much earlier, so the fact that memory/cache would become the bottleneck was inevitable; hence the need to start programming around it, absent hardware inventions to solve it.

Game programmers hit this on consoles a bit earlier than PC devs, due to facing platforms with relatively slow/tricky memory (the Cell processor[1]). But HPC programmers had by then been worrying about and programming around the problem for longer, since the extinction of SSI vector supercomputers and the transition to message-passing clusters. (Also, the data-oriented design movement is only partly about this, and much about other C++ / programming / measurement etc. ideas.)

I was searching for older memory wall discussions and found at least this from 1994: https://www.eecs.ucf.edu/~lboloni/Teaching/EEL5708_2006/slid...

[1] in fact if you browse some of Mike Actons slides you can see his username on slideshare is "cellperformance" - eg https://www.slideshare.net/cellperformance/data-oriented-des...


Apple's Objective-C runtime does a lot of very interesting tricks to make more efficient use of memory. Despite being fundamental types in the language, NSString, NSNumber, NSDictionary and so on are only interacted with via dynamically-linked dynamic dispatch, which gives the runtime a lot of freedom to change their representations without affecting the ABI. For example, contents of strings are sometimes stuffed into tagged pointers rather than being heap-allocated.
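You can mimic the spirit of that trick in Java by packing small payloads into a primitive long instead of allocating a heap object. This is only loosely analogous (it's not how the Objective-C runtime actually implements tagged pointers), and the class name is mine:

```java
// Packing a (key, value) int pair into one long avoids a heap allocation
// entirely -- loosely analogous to a tagged-pointer representation, where
// the payload lives in the "pointer" word itself instead of on the heap.
final class IntPair {
    static long pack(int key, int value) {
        return ((long) key << 32) | (value & 0xFFFF_FFFFL);
    }

    static int key(long packed) {
        return (int) (packed >>> 32); // high 32 bits
    }

    static int value(long packed) {
        return (int) packed; // low 32 bits
    }
}
```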


I would use the Apache Collections int map and move on, though… I see the standard Map as specifically oriented towards objects, not primitive types.


This is very true and the suck of Java today, but fyi you can improve this exact scenario by using a special IntMap etc, which there are many libraries for (LibGDX has some, and Caffeine has off-heap maps).


Not Caffeine, that's the caching library. You probably meant one of FastUtil, Eclipse Collections, Koloboke, HPPC(-RT), they all provide primitive specializations of Maps and Collections. For off-heap maps, now that's more interesting! I know about Chronicle Map, MapDB, https://github.com/snazy/ohc and https://github.com/RuedigerMoeller/fast-serialization/wiki/O....


Another to add to your collection, https://github.com/yahoo/Oak


Caffeine doesn't offer off-heap maps or primitive collections (many other great libraries do). It does use a data sketch for compact frequency storage and optimizes against the cpu cache line to avoid unnecessary memory accesses. There are many dimensions to being hardware efficient.


oh, you're right. Weird, wonder why I thought it was off heap. No wonder that service has higher memory usage than I was expecting. LOL


coming back to this - amazing what you can accomplish with String.intern() and switching to Shenandoah GC. Memory usage is about 2.2gb now for 800mb of cache entries: http://media-server.fastcomments.com:8881/stats

About 500mb of that is netty SSL session cache...


Solved by libraries such as Eclipse collections: https://www.baeldung.com/java-eclipse-primitive-collections which has memory-optimized lists, sets, stacks, maps, and bags for all the primitive types


I wouldn’t call that solved, but improved. Programmers will have to do extra thinking and work to get those benefits, code reviews may spend hours discussing whether making the switch is worthwhile.

And that’s not one-time work. In the ideal world, if the size of your data grows (or shrinks), you’ll have to revisit those choices.

Also, as a library author, you can’t know what choice to make.

Yes, you’ll always have that problem, but this change would make ‘just use the provided collections’ a much safer choice.


I disagree. In general once you have a primitive specialized library kicking around in your path, you can pretty much just always use it for anything with primitive keys no questions asked. You don't have to over-optimize by deciding which one each time.

Usually you just pick fastutil or Koloboke and then use it every time.

Edit: as comments below corroborate.


Also a lot of types like this fall back to boxing when accessed generically, which has its own performance issues


The java.util collection classes aren't very optimized in that regard.

Both GNU Trove and fastutil offer primitive collections that are still Java, but much better memory-optimized.


Fastutil is amazing. Specialized collection types for almost everything.


I would be genuinely interested to see how many bytes a HashMap<Integer, Integer> would take in Java (or other languages) for a million entries. If I am not mistaken, in Julia this would be 151 MB, or 88% more than just storing the keys and values.

     julia> k = Int32.(1:1000_0000);  v = rand(Int32,1000_0000); 
     julia> dict = Dict(zip(k,v));
     julia> Base.summarysize(dict) / (2 * 4 * 1000_0000)
     1.8874391
     julia> Base.summarysize(dict)
     150995128


1000_0000 is 10 million, you want 1000_000. And k is better initialized as `Int32(1):Int32(1000_000)`, which is significantly faster (though it doesn't matter for the measurement here).

Checking with a million values, on my Julia v1.8.0, the dict takes up 18 MB, or about 130% more than the plain values (or an array of Pairs) would.


Thank you for correcting this! (I get the same numbers as you for a million elements).


note that the overhead will vary a ton based on the number of elements you use. I believe Julia's dicts currently resize to be 4x current size (to avoid hash collisions), so you should see anywhere from 30% extra to 300% extra depending on how many elements you have. There has been some effort recently to move to a more SwissDict-like approach which should reduce the memory overhead.


> HashMap<Integer, Integer>

I feel like that's a crazily-specific worst-case. Could have made it HashMap<Integer, Optional<Integer>> to really push the boat out.


There's a reason Android offers SparseArray as a more efficient way of mapping integers to objects. Though Android is a whole different beast with its ART and register-based bytecode.
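The core idea behind SparseArray can be sketched roughly like this (an illustration, not Android's actual implementation): sorted primitive keys in one array, values in a parallel array, lookups by binary search.

```java
import java.util.Arrays;

// Sketch of the SparseArray idea: int keys kept sorted, values in a
// parallel array, binary-search lookups. No Integer boxing and no
// per-entry node objects; the tradeoff is O(n) insertion (shifting).
class IntToObjectMap<V> {
    private int[] keys = new int[8];
    private Object[] values = new Object[8];
    private int size;

    void put(int key, V value) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0) { values[i] = value; return; } // overwrite existing
        i = ~i; // insertion point that keeps keys sorted
        if (size == keys.length) {
            keys = Arrays.copyOf(keys, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        System.arraycopy(keys, i, keys, i + 1, size - i);
        System.arraycopy(values, i, values, i + 1, size - i);
        keys[i] = key;
        values[i] = value;
        size++;
    }

    @SuppressWarnings("unchecked")
    V get(int key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        return i >= 0 ? (V) values[i] : null;
    }
}
```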


ART and Dalvik bytecode don't change the Java memory model in any way, though.


Just using C# and the nice generic `Dictionary<int, int>` will fix this. The memory overhead is really Java's fault for sticking with type erasure.


I don't want to rehash this here, but this is actually a less obvious choice than it might seem at first blush! Erasure and primitive specialization have a very deep set of considerations and tradeoffs that touch a ton of language features. There's a ton written about it, value types generally, monomorphization, etc.


What a nice and condescending reply. Do you believe that anyone who knows about these considerations and tradeoffs would automatically conclude that type erasure is superior, so someone who disagrees with it is obviously uninformed?

Since you didn't actually give any actual argument apart from "I disagree and I am smart" I don't think we have anything concrete to argue about, so let's stick to my original point: would you agree that in C# it is much easier to do "the right thing" when putting primitives in a datastructure such as a list, map, or set, because C# has value types and does not use type erasure for its containers, thereby preventing both a lot of overhead and the need for specialized container types for various (combinations of) primitives?


Hey, sorry if it came across as condescending, I really didn't mean it to be. All I wanted to say was I don't think the choice is obvious and leave a few googleable buzzwords for anyone interested to go read up more on it.

I don't think I took a stance pro/anti erasure other than to say I think it's a way more complicated topic than it might seem (maybe not to you, but likely to others!) and it's worth reading up on. And I didn't want to dig up old HN comments/posts to confirm what I'd read about it, so I was necessarily vague.

Edit 1: For instance, I vaguely recall that Scala had trouble porting to C# because of erasure-related stuff. But I'm not genius enough to remember the details, so I didn't get into it.

Edit 2: So I dug one up: https://cr.openjdk.java.net/~briangoetz/valhalla/erasure.htm... And posted here https://news.ycombinator.com/item?id=33171832

Edit 3: Other good conversations on it: https://news.ycombinator.com/item?id=18679684 https://news.ycombinator.com/item?id=13051594

Again, my only stance is that it's a surprisingly deep topic — Not making any claims!


And this isn't even getting into special cases where it can be worse. There are lots of places where a better language can use an 8 bit integer, while if you want to use anything vaguely generic in Java, you have an 8 byte header (plus another 8 bytes for a pointer to it).



