Zig, Rust, and Other Languages (eatonphil.com)
139 points by tim_sw on March 15, 2024 | 113 comments


>Zig is not a mature language. But it has made enough useful choices for a number of companies to invest in it and run it in production. The useful choices make Zig worth talking about. Go and Rust are mature languages. But they have both made questionable choices that seem worth talking about.

Well, Zig's native string type support, or lack thereof, is also a questionable choice. It's not like Zig made all the right choices and only Go and Rust made questionable ones.


In fact, Zig has pretty much the same support for strings as both Rust and Go. The primary difference which people seem to complain about, that "[]const u8" is not written as "string", can be solved by writing `const string = []const u8;` at the top of your program if you like.

Rust: `&str`, 'some bytes, and a length', 'Constructing a non-UTF-8 string slice is not immediate undefined behavior, but any function called on a string slice may assume that it is valid UTF-8, which means that a non-UTF-8 string slice can lead to undefined behavior down the road.' - also `String` and `OsString`.

edit: apparently Rust enforces that &str cannot be created with invalid UTF8, at the additional performance cost of runtime checks, so it has stronger guarantees than other languages like Go/Java/Zig/etc (as usual)

Go: `string`, 'in effect a read-only slice of bytes', 'a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes.'

Zig: `[]const u8`, 'some bytes, and a length', 'Zig has no concept of strings', 'by convention parameters that are "strings" are expected to be UTF-8 encoded slices of u8.', 'Generally, you can use UTF-8 and not worry about whether something is a string.' - and the stdlib has functions for working with unicode and strings.


> 'Constructing a non-UTF-8 string slice is not immediate undefined behavior

Rust’s string constructors (for both str and String) check that the data contains strictly valid UTF8.

You can opt out of this check in performance sensitive code - but doing so is considered unsafe. In safe rust it’s impossible to construct a str / String which contains invalid utf8.

Eg: https://doc.rust-lang.org/std/str/fn.from_utf8.html
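For anyone following along, a minimal sketch (std only) of what the checked constructor does:

```rust
fn main() {
    // str::from_utf8 validates the bytes and returns a Result.
    assert_eq!(std::str::from_utf8(b"hello"), Ok("hello"));

    // A lone continuation byte is not valid UTF-8, so this fails
    // instead of producing an invalid &str.
    let invalid: &[u8] = &[0x80];
    assert!(std::str::from_utf8(invalid).is_err());
}
```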


You shouldn't misuse unsafe Rust to create `&str` or `String` with invalid UTF-8. Doing that violates assumptions some optimizations rely on. I would even contend it's undefined behavior. At least I am convinced that you should never have invalid UTF-8 in `&str` or `String`. If you need that, there's the [bstr][0] crate for byte strings which may or may not be valid UTF-8. Essentially these are just byte slices `&[u8]` (and, for the owned version, byte vectors `Vec<u8>`) plus traits and methods around them. Very useful for text data which is not reliably UTF-8.

[0]: https://crates.io/crates/bstr

The idea of using unsafety here is to have a way to tell the Rust compiler: "Trust me, I promise that I only fill the strings with valid UTF-8 data. You don't need to insert code to check UTF-8 validity." And the Rust compiler answers: "Okay. I accept that but if something goes wrong I can't catch that."

This allows avoiding repeated validations of a byte slice for UTF-8 validity, for example.
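A sketch of that pattern: validate the bytes once up front, then promise validity to the compiler on a hot path. The safety of the `unsafe` call rests entirely on the earlier check.

```rust
fn main() {
    let bytes: &[u8] = b"caf\xc3\xa9"; // "café" encoded as UTF-8

    // One up-front validation...
    assert!(std::str::from_utf8(bytes).is_ok());

    // ...lets us skip re-validating later. Safety: we just checked
    // that `bytes` is valid UTF-8 and it is never mutated.
    let s = unsafe { std::str::from_utf8_unchecked(bytes) };
    assert_eq!(s, "café");
}
```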


It changed. It used to be considered immediate UB, but isn't any longer; as your grandparent says, that's what is documented now. It still leads to UB, but isn't inherently UB on its own. There's a term for this but I'm forgetting it at the moment and search is failing me.

I agree that bstr is what you want unless you specifically understand this detail and know why it is the way it is.


The term is "library UB" versus "language UB". To be clear, you should still avoid it most of the time.


Would you say library undefined behavior is just a violation of a library invariant? Like `&str` and `String` containing valid UTF-8?

I admit that a library invariant is allowed to be violated temporarily under some strict preconditions. One example is bank account money transfers, where the credit and debit don't happen exactly simultaneously. Something similar could happen in a library.

But for language UB this doesn't work. Why? Is it because of compile-time optimizations possibly deleting code restoring the invariant?


Ah yes, thank you.


> You shouldn't misuse unsafe Rust to create `&str` or `String` with invalid UTF-8.

Sorry - I think my comment wasn't clear. The point of from_utf8_unchecked isn't to create strings with invalid UTF8. The point is that if you know that your string has already been validated, you can skip re-validating it.


good to know! I just copied that quote directly from the Rust docs[0], which were not clear that this is only possible in unsafe code.

[0] https://doc.rust-lang.org/std/primitive.str.html#invariant


String is utf8, OsString is not (necessarily). Having two separate types mean you can guarantee utf8 when you want to, but don't have to when you can't, such as when dealing with “strings” that an OS produces, which are basically just sequences of bytes (perhaps with zero bytes disallowed). When you have one string type that has to do both jobs, you lose out on the guarantees of the more restrictive type.
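A small illustration of why the split matters: converting an `OsString` into a `String` is fallible, because the OS-provided bytes may not be valid Unicode.

```rust
use std::ffi::OsString;

fn main() {
    // OsString makes no UTF-8 promise; String does. Crossing the
    // boundary is therefore an explicit, fallible conversion.
    let os: OsString = OsString::from("hello.txt");
    let s: String = os.into_string().expect("was valid Unicode");
    assert_eq!(s, "hello.txt");
}
```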


I am definitely not a rust expert, but as far as I can tell this is just not true? The quote you copied from the documentation is in the context of creating a string in an unsafe manner. If you're going to include unsafe behavior, you may as well say that nothing in rust means anything at all.


> Rust: `&str`, 'some bytes, and a length', 'Constructing a non-UTF-8 string slice is not immediate undefined behavior

If you're constructing an invalid `str` you have already severely fucked up. And the dog is eating your HDD.


> If you're constructing invalid `str` you already have severely fucked up.

Presumably it'd be better to know this immediately on initialization of the variable rather than at some point in the program that actually expects valid utf8—god forbid you transmit or persist the data before this happens.


That's why str::from_utf8 returns Result<&str, FromUtf8Error>. And if you know the string is UTF-8 encoded and don't want to pay the cost, there's `from_utf8_unchecked`, which is marked unsafe.


Ah, I thought you were referring to the benefits of such a system as compared to zig and blithely passing the responsibility of input validation on to the programmer. Rust of course demonstrates the benefit of checking validity at initialization.


> and the stdlib has functions for working with unicode and strings.

Last time I looked I don't think it did. It looks like there is a unicode namespace now, but it is very primitive. It doesn't have a simple way to iterate over codepoints, much less graphemes. Nor does it have anything to inspect the category a codepoint belongs to.


`std.unicode` supports validation, codepoint iteration, conversion from UTF8 <-> other encodings, etc.

It doesn't have graphemes (neither does Go or Rust), or other advanced unicode support. Instead that is provided by ziglyph[0]

[0] https://codeberg.org/dude_the_builder/ziglyph


> also `String` and `OsString`

Because these are different types with different semantics.

Why should there be one string type? Languages have to interface with the real world, where string data could be anything, but programs still need to be written for the common cases... hence, &str and String.


The more I've used the language, I'm starting to think that this decision makes sense in the context of Zig. Common string operations involve allocation, non-constant time access, etc. I can definitely understand how strange that sounds without the rest of the language around it as context, though.


Rust's String requires allocation, but str (typically seen as the fat pointer type &str) does not. "The cat sat on the mat".split('a') is an iterator over sub-strings, no more allocation needed here than for 105u32.leading_zeros()

A lot of the Rust string API lives in str, not String.
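To make that concrete, here is a minimal sketch. `split` hands out `&str` sub-slices that borrow the original string (the `Vec` in the test only exists to collect them; the splitting itself allocates nothing):

```rust
fn main() {
    // Each &str below borrows from the original literal; no new
    // string data is allocated by split itself.
    let parts: Vec<&str> = "The cat sat on the mat".split('a').collect();
    assert_eq!(parts, ["The c", "t s", "t on the m", "t"]);

    // ...no more allocation than this integer method needs.
    assert_eq!(105u32.leading_zeros(), 25);
}
```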


the inclusion of `bufPrint` in the standard library is really nice, you can just give it a pointer to an array. If you want to make a string on the stack in Rust you have to pull in something with a `Writer` trait like `ArrayVec` and use `write!`.
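A rough std-only sketch of what you end up writing in Rust for this (the `StackStr` name and layout here are made up for illustration; a crate like ArrayVec packages the same idea):

```rust
use std::fmt::{self, Write};

// A fixed-capacity, stack-allocated string buffer.
struct StackStr<const N: usize> {
    buf: [u8; N],
    len: usize,
}

impl<const N: usize> Write for StackStr<N> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let bytes = s.as_bytes();
        if self.len + bytes.len() > N {
            return Err(fmt::Error); // out of capacity
        }
        self.buf[self.len..self.len + bytes.len()].copy_from_slice(bytes);
        self.len += bytes.len();
        Ok(())
    }
}

fn main() {
    let mut s: StackStr<32> = StackStr { buf: [0; 32], len: 0 };
    write!(s, "x = {}", 42).unwrap();
    assert_eq!(&s.buf[..s.len], b"x = 42");
}
```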


Well, Zig does have Unicode support in its standard library. A "String" type wrapping that functionality would be slightly safer but, I don't see it as some huge wart of the language. You could always just define your own.


What advantages would there be with a native string type over []const u8. (For those not fluent in zig, a string is typed as a slice constant bytes, where a slice is a pointer + length)


A UTF-8 String type would carry the invariant of always holding valid UTF-8, removing the need for further UTF-8 validation.

If you're working with byte strings in Zig, their status is unknown, so you always have to validate, even if the data is in fact UTF-8. In Rust you can omit UTF-8 validation for bytes that came from a String.


In Zig strings are assumed to be UTF8 unless documented otherwise, so any function operating on strings is not wasting CPU cycles doing any validation. The person who creates the string is expected to ensure it is valid UTF8 if needed.


This is the "Just don't write bugs" that we laugh at for C and C++.


If there is one thing Ada does better than Rust, is that it encodes the invariants in the type system.

While Rust doesn't go as far as I would have wanted, it's miles better than what Zig and C are doing: the programming equivalent of "Trust me bro!".


Constructing a runtime-invariant-checked utf8 string is not "omitting utf8 validation".


Everybody says they want a built-in String type.

And then everybody proceeds to write their own String library anyway.

"The only winning move is not to play."


Rust has a nice solution of having a &str slice (equivalent of C++ std::string_view) as the lowest common denominator and a deref operator that can easily coerce various string types to the slice.

This allows Rust to have all kinds of string flavors (fixed-len or NUL-terminated or growable, on stack or heap or in ROM, with SSO, with CoW, interned or refcounted, atomically or not, and so on), but they all coerce to the basic &str, so they have the same basic methods, and are compatible with most functions that just want a string without caring how it's allocated.

You can use your own weird string type if you want, but usually you're not forced to convert or copy it to use it with standard library functions or 3rd party dependencies.
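A minimal sketch of that coercion in action: one function taking `&str`, fed by several different string flavors (a hypothetical helper, but the coercions are standard):

```rust
use std::borrow::Cow;

// One signature serves every string flavor that derefs to str.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let owned: String = String::from("hi");
    let boxed: Box<str> = "hi".into();
    let cow: Cow<str> = Cow::Borrowed("hi");

    // All of these coerce to &str at the call site via Deref.
    assert_eq!(shout(&owned), "HI");
    assert_eq!(shout(&boxed), "HI");
    assert_eq!(shout(&cow), "HI");
}
```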


Yeah - although the cost of this is that it makes rust harder to learn. (Try answering the common beginner question: “How is &String different from &str?”)

In rust Strings are usually passed in to functions as a &str, and returned from functions as String. It makes sense once you’re used to it, but I think philosophically Rust is much closer to C++ than C. Rust is missing the elegant minimalism of C or Zig.



OP is not talking about Golang.


The same critique applies


So Zig is not simple but Rust is? The article you ref is not talking about Zig at all by the way.


It just illustrates the point. There is no elegance in simplicity. Zig is not memory safe, and its functions "assume" UTF-8, with UB when they encounter invalid bytes.


C also has &str and &String, but they're "char* that crashes if you free() it" and "char* that leaks if you don't free() it".

Ownership is hard to learn, and it is a barrier to learning Rust. However, don't confuse it with C++. Rust has two+ string types on purpose, not because of carrying 1970's legacy.


> Try answering the common beginner question: “How is &String different from &str?”

This has nothing to do with strings though - how is &Vec<T> different from &[T] or &Box<[T]>? You need to understand ownership, references, and unsized types.
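It's the same owned/borrowed split everywhere; a tiny sketch with slices instead of strings:

```rust
// &[i32] plays the role of &str: a borrowed view any owner can provide.
fn total(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let v: Vec<i32> = vec![1, 2, 3];
    let b: Box<[i32]> = vec![4, 5].into_boxed_slice();

    // &Vec<i32> and &Box<[i32]> both coerce to &[i32],
    // exactly as &String coerces to &str.
    assert_eq!(total(&v), 6);
    assert_eq!(total(&b), 9);
}
```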


> This has nothing to do with strings

Sort of. It certainly affects strings, and you can't really use strings without understanding the difference.

My point is that it's yet another thing in the big bucket of junk you need to learn in order to use rust effectively. Rust is the only language I know of that has separate types for owned strings and borrowed strings. It's a powerful concept, and it's fast and efficient for the computer. But it's not "free". The cost is that it makes it harder to learn rust.

While we're on the topic - it also really doesn't help that the names of these types in the standard library are super idiosyncratic. String is to Vec<u8> as &str is to &[u8]. You just have to memorise that. How about Path and PathBuf? One of them is like String, and one is like &str. I swear I have to look it up every single time.
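For reference, the pairing the naming obscures; `PathBuf` is the owned one (like `String`), `Path` the borrowed one (like `str`):

```rust
use std::path::{Path, PathBuf};

fn main() {
    // PathBuf : Path :: String : str :: Vec<u8> : [u8]
    let owned: PathBuf = PathBuf::from("/tmp/demo.txt");
    let borrowed: &Path = owned.as_path();
    assert_eq!(borrowed.extension().unwrap(), "txt");
}
```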

I think the investment is worthwhile - I adore rust. But as I said, rust feels more like a better C++ than a better C. All the pain is front loaded, with the assumption that you'll eventually write enough software in rust to make the investment worthwhile.

Simpler languages like Python or Go get out of your way immediately and let you be productive. I prefer rust, but there's definitely something to be said for that attitude.


I think I would sum this up with saying that strings are not simple, and there is a delicate balance between making simple things simple and complex things possible without shooting yourself in the foot - both as a user, and as a library designer.

Since Rust prefers to have verbosity and correctness over ease, it makes sense to have multiple types to reflect the different requirements of the underlying data.

I sympathize with some of the std library verbosity when it comes to strings (and paths) but I don't think it's really a big criticism of why Rust's learning curve is steep. You have the same issues in C++, and doubly so in C because it doesn't help you at all and when you screw up the program crashes. When you use strings in an unmanaged language and need to manipulate them, you do need to understand the underlying consequences of how those strings are defined and stored. Comparing to managed languages like Python and Go is a bit unfair, because they're working in a different domain.

Python is a great example of what happens when you try to hide too much complexity from the programmer: the Python 3 ecosystem fracture took millions of dollars and over a decade to settle, because of the inherent complexity in string representations.

> Rust is the only language I know of that has separate types for owned strings and a borrowed strings.

C++ and Objective-C (kinda) have different types for owned and referenced strings. C++ in particular has the same requirements as Rust, where referenced strings need to be able to refer to literals and it's not acceptable to make the default allocate.


Agree that the naming is pretty bad, but ultimately there are like… three or four of these owned/deref pairs you actually have to remember in day to day usage? I do wish they had gone with a consistent naming scheme, but in the grand scheme of things this is one of the more minor things that Rust got wrong.


Yup, I can’t imagine switching to a language that doesn’t have something like that, since in C# you have a similar feature in the form of Span<T> and ReadOnlySpan<T>, to which all manners of sources can be coerced: heap-allocated arrays, stack-allocated buffers, inline arrays, strings, native memory, etc.


> And then everybody proceeds to write their own String library anyway.

Is this true? It was (is!) certainly true for C, but C has an especially emaciated expectation for string processing primitives. Any runtime developed after like 1995 that I can think of has fixed this by providing a sane string implementation people generally agree upon.


If you care about Unicode for real, yes.

Rust and Go both don't have a builtin package for grapheme iteration, and many naively (and incorrectly) assume that a Go unicode rune == "character". I assume the same happens with Rust's `char` type.

If you care about unicode-aware string sorting (you should), rather than the naive string sorting the Go and Rust standard libraries provide out of the box... then you probably want a proper Unicode library.

I think the only language that gets Unicode 'right' out of the box is Swift, as it actually provides grapheme iterators, Locale awareness, etc. - but it comes at the cost of the language being tied to the (ever-moving) Unicode standard.
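A quick std-only demonstration of the rune/char pitfall: "é" written as 'e' plus a combining acute accent is one grapheme to a reader, but two `char`s and three UTF-8 bytes to Rust:

```rust
fn main() {
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as "é".
    let s = "e\u{0301}";
    assert_eq!(s.chars().count(), 2); // two Unicode scalar values
    assert_eq!(s.len(), 3);           // three UTF-8 bytes
    // So iterating chars does not yield "characters" as users see them.
}
```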


> I think the only language that gets Unicode 'right' out of the box is Swift, as it actually provides grapheme iterators, Locale awareness, etc. - but it comes at the cost of the language being tied to the (ever-moving) Unicode standard.

I think this might (at least partially) be why Rust's stdlib doesn't have this. If it did, then support for it would be tied to Rust's release schedule and which version of Rust you're using. Granted it is every six weeks, and it's usually trivial to update, but that's still a connection that could be an issue.

By having this be in a separate library it means that it can update as and when it needs to, and it's not inherently connected to a specific release of Rust.


I was pessimistic about grapheme-based orientation towards text, deleted it to research more, and I've come to the conclusion that this is simply not a consensus opinion. Can you give me an example where grapheme-based sorting makes a critical difference from codepoint-oriented sorting on a normalized text? Full unicode composition certainly seems to provide a reasonable solution with western languages, CJK characters, and romanization of CJK characters, but that leaves a hell of a lot of scripts that I don't know about.

I mean unicode is incredibly complex, but it doesn't even seem like there's a consensus outside of swift's string implementation of what a grapheme even is.

(Granted, this might support the above concept that people can't even agree on what a string is, but unicode code points seems like a reasonable baseline to expect from a modern language. That said, rust doesn't even include unicode normalization in the standard library, although the common crate for it seems like a reasonable solution.)


The issue I am aware of is with the Thai language, which has zero-length unicode codepoints that get superimposed on the preceding non-zero-length codepoint (or, if none is present, an 'empty' non-zero-length placeholder). A non-zero-length codepoint can have multiple zero-length codepoints following it. (In Thai, no more than 2 for morphemically correct words.) For codepoint sorting to be correct, the order of these zero-length codepoints needs to be normalized. The standard practice in Thai is to have vowel signs before tone markers.

In recent years, application support for this has greatly improved.


> it doesn't even seem like there's a consensus outside of swift's string implementation of what a grapheme even is.

Linguistically it's easy, graphemes are the squiggles people actually draw, as distinct from how a machine encodes them. Of course since people aren't a single individual with just one consistent opinion that does mean there's room for nuance - maybe some people think this is two separate squiggles.


Even PL/I has better string handling than C.


Nope. It's not at all fixed because nobody can ever agree on what a "String" is and what performance guarantees the underlying data structure should provide.

Let's just assume a String is UTF-8 to make things "simple".

Is a String mutable or not? Should mutable and immutable Strings have the same underlying structure? If mutable, is a String extensible or not? Can a String be sliced into another String? Can those slices be shared? Should you walk across codepoints or characters (which could be multiple codepoints due to combining)? If you want to insert a codepoint in the middle of a String, what are the performance guarantees?

I can go on and on ...

"String" really has to be a library as there are simply far too many permutations once you step away from "Shove ASCII to tty".


Well sure, people may colloquially refer to a lot of things as "strings"—hell, you could refer to all sequences as strings if you just wanted to argue with people—but the idea of trying to encapsulate this all in the standard library in a single implementation seems confusing semantically and of questionable value. It seems a lot easier to work with a reasonable interpretation of a string with its associated tradeoffs—which again is implied by most standard libraries.

That said, I personally would balk at willingly adopting any runtime in 2024 that didn't enable iterating over a sequence of Unicode code points, whether stored as UTF-8 or some 16-bit form, unless I were guaranteed to avoid dealing with text processing of free-form human input.


I’ve used Java my entire career. I have seen a custom string type nor a post advocating it in Java.


Well, I see I left the word “not” out which makes this a little more confusing. Oops.

It’s supposed to be “I have NOT seen”.


Never seen one for C#, either.


Apologies for the plug haha

https://github.com/U8String/U8String

With that said, implementing it is not easy, especially in terms of ensuring it is as fast as built-in string or faster (because the project goal is to be faster than built-in string and Go implementation, and match or outperform Rust's String where applicable).

I can't imagine something like that being possible in e.g. Java or most other high-level languages - they simply do not expose appropriate low-level primitives when you need a "proper" string type.


Nim is also a strong player as a systems programming language. In terms of memory management, it's configurable, and by default you get ARC (no GC). I've written a hobby kernel (if you can call it that) in Nim[1] as well as Zig[2], and I found Nim to be much more ergonomic and approachable. The fact that Zig requires weaving an allocator through most calls that may allocate gets in the way of what I'm trying to do. I'd rather focus on core logic and have ref counting take care of dropping memory when appropriate.

One thing I wish Nim had though is true sum types with payloads. I think there's an RFC for that, but it's a shame it's not in the language yet.

[1] https://github.com/khaledh/axiom

[2] https://github.com/khaledh/axiom-zig


> The fact that Zig requires weaving an allocator through most calls

It does no such thing. Why didn't you just make a global allocator? Alternatively, you can have your allocator be attached to your objects.


Gleam is another strong candidate too


Gleam runs on the Erlang VM, and I believe it also compiles to WASM. That's pretty different than the other languages here. Cool language though, I am a big fan of the BEAM


> People have been making jokes about node_modules for a decade now, but this problem is just as bad in Rust codebases I've seen

I agree... but I think this is more indicative of a cultural problem than a language design/standard library scoping problem. A compact standard library is more resistant to scope creep. I don't want every language to end up like C++, and I think keeping the scope of the standard library small helps avoid that. On the other hand, popular third-party libraries that provide common functions have too many third-(fourth-?)party dependencies. Keeping dependency trees small should be a priority for such tools, but convenience trumps all right now so it isn't valued as it should be.


>A compact standard library is more resistant to scope creep

I don't see "scope creep" as a problem.

I see "Lack of standard library support for X brought 5 competing libraries for the same X functionality, no de facto standard, and projects using incompatible implementations and versions, maintained in different states of disarray" as a problem.

Never had a problem with Python standard library having "scope creep" for example. Where I do have problem is where there are several competing libraries and no single well supported one you can always depend on for some - which happens for things not in the standard library.


> which happens for things not in the standard library.

So to be clear you don't use Requests right?

Because there is HTTP client handling right in the Python standard library, and yet it sure seems like most Python programmers use Requests, because it's better.


Did you miss the part where I say "[and] no de facto standard"? Requests is a de facto standard, and I do use it.

But besides having a clear, well maintained, de facto standard being OK too (as I wrote), it's also great that I don't have to use Requests, as I can write for urllib3 and know that my project will just work everywhere with no added deps - which I do for simpler stuff.

Heck Requests itself is using urllib under the covers, which also has some benefits of uniform implementation of protocols and such.


You said that the stdlib needs to provide it first, you were very clear about that. So a "de facto standard" can't in fact exist; such claims, while they probably feel correct to you, make a mockery of your demand that the stdlib should do it first. If it did, Requests couldn't exist.

And for what it's worth I agree with you about that doing it first, on the whole a stdlib should aim to do it first if they're doing it at all. Take Microsoft's belated decision to ship a JSON implementation for .NET. Obviously meanwhile a de facto standard took hold, Newtonsoft, and of course the Microsoft library isn't a drop-in replacement. Result, more complexity, more training, negligible real benefit. I just disagree that they should aim to do it all.


>You said that the stdlib needs to provide it first, you were very clear about that.

Not clear enough apparently :)

What I said is: (a) I see "Lack of standard library support for X brought 5 competing libraries for the same X functionality, (b) no de facto standard, (c) and projects using incompatible implementations and versions maintained in different states of disarray" as a problem.

That's 3 things, and the 3rd occurs because of a lack of either or both of the first two. So, I didn't say stdlib "needs to provide it first", except if by that you just mean the order in which I wrote it.

To spell it out, what I'm saying is: Not having something in stdlib AND not having a defacto standard for it either, and THUS getting many incompatible implementations - that's a bigger problem than stdlib scope creep.

>And for what it's worth I agree with you about that doing it first, on the whole a stdlib should aim to do it first if they're doing it at all. Take Microsoft's belated decision to ship a JSON implementation for .NET. Obviously meanwhile a de facto standard took hold, Newtonsoft, and of course the Microsoft library isn't a drop-in replacement.

It's good to do it first and get it right, but they can as well also do this: embed the emerged de facto standard as is, if they decide to do it later.

That's what Java should have done with Joda-Time for example.


Maybe the problem is that you don't know what "de facto" means? You seem to imagine that the first prototype of Requests was uploaded and aha - it's a "de facto standard" so now we're fine.

No, the way "de facto" standards happen is it becomes apparent over considerable time that in fact this is what everybody does, it is the standard de facto. This is how the IETF's Internet standards (as opposed to RFCs) work for example. Requests became a de facto standard, meanwhile other people wrote libraries "for the same X functionality" which did not.

So in fact the thing you claim is unacceptable and happens third actually happens second and the state you claim is OK and precludes that, in fact happened later and as a consequence of the thing you think is unacceptable.


one way to approach this with compromise is to keep the shipped standard library small, but have officially maintained packages from the language foundation for exceedingly common functionality.

That gives an avenue for exploration, and tight controls and quality control on packages, without bloating the standard library.


What’s so bad about “bloating the standard library”? Java has an incredible amount of stuff and it’s incredibly useful.

The compiler should be able to eliminate the unused parts during compilation, it’s not like adding JSON to the standard library is going to make a “hello world” program bigger.


>The compiler should be able to eliminate the unused parts during compilation, it’s not like adding JSON to the standard library is going to make a “hello world” program bigger.

I agree. I would argue against keeping standard libraries terse as an appropriate goal for a language because terse (or alternatively, inadequate) standard libraries are exactly what necessitate third party libraries for common functionality, which in turn lead to this dependency hell that so many people dislike so much. Not to say that a standard library should do absolutely everything, but the more capable a language is out of the box, the fewer third party libraries a language ecosystem will come to rely on, meaning that methodologies will be more consistent in more code and dependency management will be simpler.


Mostly because it holds the language back due to backwards compatibility concerns.


I wish languages took this approach, not sure why they largely don't.


The problem is that a typical built-in standard library can't change or remove any APIs, even the ones that turned out to be mistakes or became obsolete, without breaking users. Eventually all the fancy standard parts of it get easier, faster, cleaner 3rd party replacements, and everyone is saddled with a dilemma of using the meh stdlib or a 3rd party dependency anyway.

OTOH functionality shipped as external libraries can be versioned, upgraded, and replaced by users individually, on their own schedule. The obsolete cruft can be pulled in by old projects that still need it, without burdening new projects with the same old APIs (but they still can interoperate, because multiple versions of the same library can coexist!)

For very long-lived languages (which Rust has ambition to be) a one-size-fits-all built-in big standard library is a big risk. The bigger it is, the less obvious and less timeless the APIs are. A basic Vec is easy to get right on the first try, but networking is a moving target, fashions for file formats come and go, threads/green threads/callbacks/promises/await come and go, parsers can be designed in many differently-bad ways, and OSes change — we don't have non-binary files any more, but ossified languages do! WASM/WASI has upturned even Rust's barebones standard library, and Fuchsia would even more.

This risk of maintaining mistakes forever eventually makes maintainers of the standard library overly cautious, and design very defensive opaque non-committal APIs.

IMHO languages should not have any single built-in unversioned library at all. Pull in what you use, in a version that makes sense this year, without the baggage that every API has to remain unchanged forever, and still be sensible and efficient half a century later, even on an OS that doesn't exist yet on hardware that hasn't been invented yet.


If cargo did literally any garbage collection then it wouldn't be a problem. Most of the nonsense in a target/ directory are stale incremental build artifacts.


The problem as I understand it is not the stale dependency artifacts in some directory, but the (transitive) dependency management hell itself. cargo makes it incredibly easy to pull in a shitload of dependencies of which you don't have the slightest idea.


Just a correction: most std C functions don't allocate. strdup does, but it was only recently adopted into the standard; it was previously an extension.

Similarly zig’s stdlib shouldn’t allocate behind your back, except for thread spawn where it does: https://github.com/ziglang/zig/blob/5cd7fef17faa2a40c8da23f0...

Generally speaking, it's as mentioned just a convention. A Zig library might not allow its users to pass allocators, for example.

In C++, STL containers can take an allocator as a template parameter. Recent C++ versions also provide several polymorphic allocators in the stdlib. You can also override the global allocator, or a specific class's allocator by overloading its operator new.


For spawning a thread you're literally asking for the stack of the thread to be memory mapped. I fail to see how this is "behind your back".

You also linked specifically to the POSIX threads implementation of thread spawning, which is by definition supposed to play nicely with the libc POSIX threads API. That API expects you to use the libc allocator, so that's what it does.

You might as well accuse the mmap() function in the zig standard library of allocating behind your back.


It’s something Zig touts when compared to other languages (1). The idea is that in the end it’s a convention: an allocator needs to be passed to indicate that the function allocates, which not even the stdlib adheres to religiously. I’m fine with it, since I do believe a library writer should know best what works with their library.

1. https://ziglang.org/learn/why_zig_rust_d_cpp/#no-hidden-allo...


The "behind your back" part is probably referring to the Args payload bouncing through a heap allocation. It isn't explicit from the signature that it's making an allocation. The function has no choice, though, unless you leave it up to the user to keep the payload allocation alive until the thread terminates.


Exactly


He was pointing out that it does, and he was not applying the label "behind your back" to it; that's why he said "except for". The wording makes perfect sense.


> Similarly zig’s stdlib shouldn’t allocate behind your back, except for thread spawn where it does

shouldn't X, except for Y where it does [X]

It makes perfect sense that GP was saying "thread spawn allocates behind your back".


> strdup does but it was only recently adopted into the standard, it was previously an extension.

Does it really matter, when people were already writing code with strdup while the Zig and Rust creators were in middle school? It was already there in BSD 4.3 (1986), apparently.


GNU obstacks have also existed for as long as I can remember.

Everything old is new again...


It really doesn’t matter and that’s my point.


WRT the standard library. There are tradeoffs. Yes, a big standard library that has everything you need is great, as long as it is well designed, well maintained, and has some mechanism to prevent the whole thing bloating every executable.

But executing well on that is difficult for a number of reasons:

- the release cycle of the standard library is (usually) tied to the compiler, which often means it can't evolve quickly.

- backwards compatibility is a much bigger deal for the standard library. Which means if you have a big library you will eventually have a big pile of deprecated APIs you will probably never be able to actually remove. It also feeds into the next point: new development in the stdlib can be hindered by the need to get it right the first time.

- the creators of the language probably aren't experts in every area. Which means in order to include, say, a good compression library you either need to recruit someone who is an expert to write and maintain the package in your stdlib, or link to a library in another language, and the latter is still really an external dependency.

- a large stdlib is a large maintenance burden

Python has a large standard library, but it has parts that are deprecated and/or largely abandoned, including packages for technologies that are no longer widely used. And its http packages are rarely used directly because third party packages like requests or urllib3 are better.

Go's standard library that is praised here isn't perfect either. The log and flag packages are often insufficient, and the implementation of net.IP is suboptimal [1].

I think probably the best balance is to start with a fairly small standard library, and when community libraries for key functionality become popular and stable, then pull them into the standard library or otherwise make them official. And maybe have official libraries that are external to the stdlib, with less rigid backwards compatibility guarantees, that can move faster than the language itself.

[1]: https://tailscale.com/blog/netaddr-new-ip-type-for-go


Yes. The article also says:

>Having a large standard library doesn't mean that the programmer shouldn't be able to swap out implementations easily as needed. But all that is required is for the standard library to define an interface along with the standard library implementation.

But "Just have the standard library provide an interface" also fails when the interface itself is not guaranteed to be correct. In Rust for example, the original definition of the Future trait in the futures 0.1 crate was incompatibly different from what it eventually became in the futures 0.3 crate, and the futures 0.3 version is what was imported into libstd as `std::future::Future`. If the 0.1 version had been defined in libstd it would not have worked with async-await, so either it would've stifled async-await development, or it would have to be deprecated but still kept around for backward-compatibility alongside the newer trait.
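To make the incompatibility concrete, here's a hand-rolled sketch of the trait std ended up with. The 0.1 shape is recalled from memory in a comment, and the `Ready` type and no-op waker below are invented for illustration, not anything from the futures crate:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// futures 0.1 defined the trait roughly like this (sketch from memory):
//
//     trait Future {
//         type Item;
//         type Error;
//         fn poll(&mut self) -> Result<Async<Self::Item>, Self::Error>;
//     }
//
// The 0.3 shape that became std::future::Future instead has a single Output
// type and takes a pinned receiver plus a Context carrying a Waker:

struct Ready(Option<u32>);

impl Future for Ready {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        // Resolve immediately; a real future would return Poll::Pending and
        // arrange for the waker to be called when it can make progress.
        Poll::Ready(self.get_mut().0.take().expect("polled after completion"))
    }
}

// Minimal no-op waker so we can poll by hand without an executor.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Ready(Some(42));
    assert_eq!(Pin::new(&mut fut).poll(&mut cx), Poll::Ready(42));
}
```

The `Pin<&mut Self>` receiver and the `Waker` inside `Context` are exactly the parts async-await needed and the 0.1 signature lacked, which is why the 0.1 trait could not have been grandfathered in.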


> The log […]

There’s log/slog now. Is that sufficient?

> and flag packages are often insufficient […]

Yes. Flag is really primitive and awkward to use. On the other hand, in a typical “imperative shell” style CLI, sharing flags between packages is not exactly crucial, which is the main reason to put it in std (interop). I’ve never really suffered from importing a 3p flag parser in my own code.

> and the implementation of net.IP is suboptimal [1].

Fixed for a long time. See net/netip (although I am annoyed it’s not universally available in all net functions). It’s based on the implementation from your link.


> People have been making jokes about node_modules for a decade now, but this problem is just as bad in Rust codebases I've seen.

Just w.r.t. the issue of size (not of scale): cargo caches sources in ~/.cargo so they're not duplicated in every project on the system. Additionally, rustc does dead code elimination by design, so the issue isn't nearly as bad as in JS, where it's not possible to tree-shake out every unused class or method.

Most of the bloat of target directories are stale incremental build artifacts, because cargo does not do any automatic cleanup.


Automatic GC of the cargo cache is available on nightly. Really looking forward to it landing on stable. For large projects I've seen the cache grow to hundreds of GB, and you don't notice until compilation fails because the disk is full...

https://blog.rust-lang.org/2023/12/11/cargo-cache-cleaning.h...


I've had the same issue with Conan, and realized only then that that was why our pipeline took so long, copying tens of GB to the runner. Cargo is mild by comparison.


There's also Odin [0]. I experimented with them all and Odin is pretty nice. Nim is also good, but has a lot more features.

But I concluded that language matters a lot less than APIs. Yes, the language should have enough good features to let programmers express themselves, but overall, well-designed APIs matter a lot more than the language. For example, tossing most of the C stdlib and following a consistent coding style (similar to the one described here [1]), using arenas for memory allocation, I can be just as productive in C.

[0] - https://odin-lang.org

[1] - https://nullprogram.com/blog/2023/10/08/


> Similarly, if you go looking for compression support in Rust, there's none in the standard library.

I have no idea why that is considered a criticism.

Compression support is the kind of thing which is going to evolve over time and compression formats will change and performance and APIs of different libraries will get better.

You avoid the problem where the compression library in the std library is the one that experienced programmers tell all the noobs that they should never use.


> You avoid the problem where the compression library in the std library is the one that experienced programmers tell all the noobs that they should never use.

This is not inevitable. In the .NET world we have great compressors in the standard library. Zip and gzip aren't going anywhere. I totally disagree with the idea that we would have been better off without those compressors in the standard library.


I'd replace "similarly" with "fortunately"


>Zig is practically alone in that if you write the next() method and don't pass an allocator to any method in the next() body, nothing in that next() method will allocate.

It's not quite as simple as that. This can allocate just fine:

    fn next(self: *Self) {
        self.array_list.append(5);
    }
... where `array_list` is of type `std.ArrayList(u32)`. `std.ArrayList(T)` takes an allocator at construction time (`array_list = std.ArrayList(u32).init(allocator);`) and stores it as a field, so its methods don't need to take an allocator parameter.

(And yes, there is a version of `std.ArrayList(T)` called `std.ArrayListUnmanaged(T)` that does *not* take an allocator in its constructor and does take an allocator in all its methods that need to allocate.)


That would not compile because the append operation can fail, and errors cannot be ignored.


The exact code written won't compile, but it gets the idea across: the original claim from the blog post was just false as far as I can tell. We can't in fact be sure that no allocation took place just because we didn't provide an allocator. It's a convention at most. It probably holds for code written by Zig programmers who prefer this style, but you could say the same about any convention in any language.


Right. The "will this or won't this allocate" analysis also needs to take into account all values being used by the code because they might be using allocators internally.

I haven't written zig code in a few years so I don't know if people have developed a convention for it. That said, there is a reason types like `ArrayListUnmanaged` exist. If you have some complex type with a bunch of different collections as fields, you probably want to use the same allocator for all of those collections, so there's no reason to have the collections themselves store redundant copies of the allocator. Instead you can have the allocator held by the top-level type and use the `Unmanaged` collections as fields. This also means that you do end up needing to reference `self.allocator` explicitly when calling `.append()` etc, satisfying the original claim of TFA.


Zig does not have a good package manager, does it?


It has one now, but I haven't played around enough to call it "good" or "bad". It feels more like a hack than something like cargo/pip/nuget/npm/etc, but it's a lot better than the normal c/cpp experience.

build.zig.zon discussion: https://zig.news/edyu/zig-package-manager-wtf-is-zon-2-0110-...

Example howto: https://ziggit.dev/t/how-to-package-a-zig-source-module-and-...

Actual docs: https://ziglang.org/download/0.11.0/release-notes.html#Packa...


Are you experienced with vcpkg, or just repeating old facts? Considering you are lumping C and C++ together, it seems the latter. I would take vcpkg over npm or pip any day.


The problem with vcpkg is the problem with conan, bazel, meson, or hunter: there isn't just one, and they don't reliably play well with each other, so when I receive or deliver a dependency from or to some other organization, there's no guarantee they work together.


I think the upcoming 0.12 release will be a good time to try it out. It was a major focus during this release cycle.


> Zig does not have a good package manager, does it?

This describes upcoming changes in zig 0.12 https://ziggit.dev/t/build-system-tricks/3531


You shouldn't need one. Use Nix instead, they did 99% of the work for you already.


I don't believe Zig wants to have "package manager" at all.

It has the ability to pull in a package/module. However, I believe that having a "package manager" is explicitly an anti-goal.

I'm in agreement. It's become pretty clear that having any form of "package repository" without also allocating the necessary manpower to curate that repository causes disasters.


> I don't believe Zig wants to have "package manager" at all.

? it's literally one of the project's main milestones: https://github.com/ziglang/zig/projects?type=classic


We're probably talking past one another due to my imprecision:

Zig can manage packages. It can pull in a Zig package using a reference to a Git repository on GitHub, for example. It can pin that to a hash and cache it. It can specify dependencies. This is, strictly speaking, a "package manager" but is not what people normally think of when you mention that.

Zig does not have blessed mechanisms that pull from a central repository of packages. Having a central repository is normally what people think of when they talk about a "package manager" (node, cargo, deno, etc.). Yes, those managers could conceivably not use the central repository and specify everything explicitly, but nobody ever uses them like that.
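For reference, that decentralized flow looks roughly like this in a build.zig.zon manifest. This is a hypothetical sketch: the package name, URL, and (truncated) hash below are made up, and the exact format has shifted between Zig 0.11 and 0.12:

```zig
.{
    .name = "myproject",
    .version = "0.0.1",
    .dependencies = .{
        // Hypothetical dependency: fetched straight from a tarball URL and
        // pinned by a content hash, with no central registry in the loop.
        .somelib = .{
            .url = "https://github.com/example/somelib/archive/refs/tags/v1.2.3.tar.gz",
            .hash = "1220...",
        },
    },
},
```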


So, same as Golang? Still a package manager, though.


That sounds like a horrible decision


It has a package manager in the same way Golang does, just no central repository of packages.

https://ziggit.dev/t/build-system-tricks/3531


> 3 years? 68 lines of code. Is it not safer at this point to vendor that code?

Why, though? There is no bitrot or build flakiness with unnecessary changes if no changes happen, so what's the actual argument for this? Why is it missing from the article?



