I see where you're coming from, but IMO this "Cheshire cat" idiom for hiding implementation details is not exactly like private. In fact, it can do things that private can't do, and doesn't do things private does.
The advantage of hiding your state behind an opaque struct with builders and accessors is that you can change the size and layout of said struct without it being a breaking API change. The code remains binary compatible even, no need for a recompile if you're shipping a shared lib. This is something just using private members doesn't achieve since with private members the compiler still knows and uses the layout of the struct, it just forbids access to it.
That's why you can even find C++ libraries using this idiom even though C++ obviously has `private`. It's about having a stable, opaque API.
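To make the point concrete, here is a minimal sketch of the idiom with the header half and the source half shown in one listing. The `counter` type and its functions are made-up names for illustration, not taken from any real library. Callers only ever see the incomplete type, so the struct body can grow or be reordered without recompiling them.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what would live in counter.h (public) ---- */
typedef struct counter counter;          /* incomplete type: layout hidden */
counter *counter_create(int start);
void     counter_increment(counter *c);
int      counter_value(const counter *c);
void     counter_destroy(counter *c);

/* ---- what would live in counter.c (private) ---- */
struct counter {
    int value;
    /* fields can be added or reordered here without touching callers */
};

counter *counter_create(int start) {
    counter *c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = start;
    }
    return c;                            /* NULL on allocation failure */
}

void counter_increment(counter *c) {
    assert(c != NULL);
    c->value += 1;
}

int counter_value(const counter *c) {
    assert(c != NULL);
    return c->value;
}

void counter_destroy(counter *c) {
    free(c);                             /* free(NULL) is a no-op */
}
```

Since `sizeof(counter)` is not available to callers, they cannot embed a `counter` in their own structs or place one on the stack; every access goes through a function call.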
On the other hand, because of this added indirection, there's usually a greater performance hit to accessing these opaque structs since code can't be inlined. With private, since the compiler can still see inside the struct, it's able to optimize the code more aggressively. You can also store the objects directly on the stack without requiring malloc.
IMO the right way to have private members in C structs is... to document that members shouldn't be touched directly, perhaps using a special naming convention or embedding the publicly-accessible members in a dedicated sub-struct to prevent confusion.
> IMO the right way to have private members in C structs is... to document that members shouldn't be touched directly, perhaps using a special naming convention or embedding the publicly-accessible members in a dedicated sub-struct to prevent confusion.
Reminds me of that one time when glibc broke the whole of Debian for s390 architecture by changing the fields in the jmp_buf struct (which is public): [0].
To achieve a reasonable level of encapsulation in C, a header file must be seen as a public-only interface. It should declare only the structs that are relevant for the user of the module. If that's "struct my_module_handle { ... }", declare it and document the corresponding accessor and modifier functions. Everything else must reside in the C source file with internal linkage (static storage class). The whole source file is your implementation.
There is an anti-pattern where header files are used for all the declarations needed internally by the source file. Including (pasting verbatim with the preprocessor) that file from another module would bring in all the unnecessary declarations.
I think what you say makes complete sense at module-level (as in, for a standalone lib for instance) but I never bother segregating things internally within a lib/module/exe and rely on good documentation and coding practices to avoid having member mutations all over the place.
If I code in Rust or C++ I can use namespacing and public/private to give every single object in the codebase a clean interface, but in C doing that is just frustrating, not to mention potentially inefficient.
Opaque pointers usually impose the restriction on the API that, in order to use the handle, one has to dynamically allocate the object on the heap. That's a quite unfortunate tradeoff IMO.
Why not have the API take an allocator as a parameter? Pass in pointers to malloc, realloc, and free. Then the library can use your static allocator (or some 3rd party malloc such as jemalloc, or your own arena allocator for that matter) or it can default to the system one if you pass null pointers instead.
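A rough sketch of what such an API could look like. Everything here is hypothetical (`lib_allocator`, `foo_create`, the counting allocator), and it deviates slightly from the description above: it bundles the hooks into a struct, leaves out realloc for brevity, and adds a `ctx` pointer so the callbacks can carry state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator hooks. ctx is handed back to every callback,
   and alloc reports failure by returning NULL. */
typedef struct {
    void *(*alloc)(void *ctx, size_t size);
    void  (*release)(void *ctx, void *ptr);
    void  *ctx;
} lib_allocator;

static void *default_alloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }
static void  default_release(void *ctx, void *ptr) { (void)ctx; free(ptr); }

typedef struct foo foo;
struct foo {
    lib_allocator mem;   /* remembered so destroy uses the matching release */
    int value;
};

/* Pass NULL for user to fall back to the system allocator. */
foo *foo_create(const lib_allocator *user, int value) {
    lib_allocator mem = { default_alloc, default_release, NULL };
    if (user != NULL) {
        assert(user->alloc != NULL && user->release != NULL);
        mem = *user;
    }
    foo *f = mem.alloc(mem.ctx, sizeof *f);
    if (f == NULL) {
        return NULL;
    }
    f->mem = mem;
    f->value = value;
    return f;
}

int foo_value(const foo *f) {
    assert(f != NULL);
    return f->value;
}

void foo_destroy(foo *f) {
    if (f != NULL) {
        lib_allocator mem = f->mem;   /* copy out before f is released */
        mem.release(mem.ctx, f);
    }
}

/* Caller-side example: an allocator that counts live blocks through ctx. */
void *counting_alloc(void *ctx, size_t size) {
    void *p = malloc(size);
    if (p != NULL) {
        ++*(int *)ctx;
    }
    return p;
}

void counting_release(void *ctx, void *ptr) {
    if (ptr != NULL) {
        --*(int *)ctx;
    }
    free(ptr);
}
```

A caller would fill in a `lib_allocator` with `counting_alloc`, `counting_release` and a pointer to an `int`, then pass its address to `foo_create`. Storing the hooks inside the object is one design choice among several; it costs a few words per object but guarantees the block is released by the allocator that produced it.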
Callback based APIs are better, but not good enough. The problem remains that the user still has to deal with some potentially undesirable constraints:
* I may be running in an environment where allocating memory is an asynchronous operation. A callback based API forces me to block, which can cause unpleasant side effects like halting an event loop.
* In my experience, some libraries with custom allocation hooks forget to define one or both of the two basics in the callback signature: a "context" or "user data" parameter, and a way to return an error.
The proper solution is to decouple memory allocation from object initialization entirely. There are two different approaches to this:
1. Expose get_foo_size() and get_foo_align() functions that return the size and alignment that the opaque foo struct needs (at runtime, of course). Then I as the user can allocate that memory, and initialize my opaque foo objects in-place:
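size_t foo_size = get_foo_size();
/* unsigned char* rather than void*: pointer arithmetic on void* is a GNU extension */
unsigned char* buf = alloc_aligned_memory(foo_size * 1000, get_foo_align());
for(size_t i = 0; i < 1000; ++i) {
    int err = foo_init(buf + i * foo_size, /* params here */);
    /* check err before using the object */
}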
2. Define foo_init(void* buf, size_t len, ...) which attempts to initialize an opaque foo object in the buffer defined by [buf, buf+len). If the buffer does not have enough space, return an error. Otherwise, return the number of bytes actually used by the object.
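The second approach might look roughly like this. It is only a sketch: the extra `id` and `out` parameters, the `-1` error value, and the `foo_id` accessor are assumptions made for the example, and the buffer is assumed to come from malloc or a similar source of untyped memory.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct foo foo;            /* public: incomplete type */

struct foo {                       /* private to the library */
    double weight;
    int id;
};

/* Tries to place a foo inside [buf, buf + len).
   Returns the number of bytes consumed (alignment padding included) and
   stores the object's address in *out, or returns -1 if it does not fit. */
ptrdiff_t foo_init(void *buf, size_t len, int id, foo **out) {
    assert(buf != NULL && out != NULL);
    uintptr_t addr = (uintptr_t)buf;
    size_t pad = (alignof(foo) - addr % alignof(foo)) % alignof(foo);
    if (len < pad || len - pad < sizeof(foo)) {
        return -1;
    }
    foo *f = (foo *)((unsigned char *)buf + pad);
    f->weight = 0.0;
    f->id = id;
    *out = f;
    return (ptrdiff_t)(pad + sizeof(foo));
}

int foo_id(const foo *f) {
    assert(f != NULL);
    return f->id;
}

/* Caller side: the caller owns the memory, the library only fills it in. */
int foo_demo(void) {
    foo *f = NULL;
    int id = -1;
    void *mem = malloc(128);
    if (mem != NULL && foo_init(mem, 128, 7, &f) > 0) {
        id = foo_id(f);
    }
    free(mem);
    return id;
}
```

Returning the number of bytes consumed lets the caller pack several objects into one buffer by advancing the pointer after each call.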
That second method works fine if you just want a buffer of a bunch of foo’s and they all happen to be the same size. Not so fine if foo is a data structure in its own right with growable capacity.
That is possible, and it's the reason why I said "usually". Still, it's an unfortunate complication because you now need to manage the pool of your (static) objects. Also, it's not possible to use the stack with this scheme.
// in <opaque_foo.h>
typedef struct opaque_foo opaque_foo_t;
size_t opaque_foo_sz(void);
void opaque_foo_init(opaque_foo_t* foo);
// in your code, which you could write a helper macro for if you were so inclined
char opaque_foo_mem[opaque_foo_sz()];
opaque_foo_t * my_foo = (opaque_foo_t*) opaque_foo_mem;
opaque_foo_init(my_foo);
This approach violates alignment requirements, causing misaligned accesses that on some architectures are mitigated by generating extra instructions, while on other architectures they are a violation (e.g. a segfault).
There is no way for a compiler to infer the alignment requirement of a struct because it does not see its definition. You would always have to align the char buffer by hand, but you cannot do it for the same reason the compiler cannot.
What you can do is to always greedily align the char buffer to the strictest (largest) fundamental requirement for that platform - in other words alignas(max_align_t).
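A small sketch of that greedy alignment, assuming C11 and assuming the library publishes some upper bound on the object size (the `FOO_STORAGE_BYTES` constant and the `foo_storage` helpers are invented for this example):

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound that the library would publish in its header. */
#define FOO_STORAGE_BYTES 64

/* Raw storage for one opaque object, aligned for any fundamental type. */
typedef struct {
    alignas(max_align_t) unsigned char bytes[FOO_STORAGE_BYTES];
} foo_storage;

static_assert(alignof(foo_storage) >= alignof(max_align_t),
              "storage must carry the strictest fundamental alignment");

/* Nonzero if the size the library reports at runtime fits in the storage. */
int foo_storage_fits(size_t needed) {
    return needed <= FOO_STORAGE_BYTES;
}

/* Nonzero if this particular block really is max-aligned. */
int foo_storage_is_aligned(const foo_storage *s) {
    return (uintptr_t)s->bytes % alignof(max_align_t) == 0;
}
```

A caller could declare a `foo_storage` on the stack, check `foo_storage_fits(opaque_foo_sz())`, and hand `bytes` to the init function. This only settles alignment; whether the library may then access a declared char array through a struct pointer is a separate question.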
yes, though you can fix that with compiler flags (or, #pragma if you want strict aliasing elsewhere in your code)
alternatively, gcc supports VLAs in unions, but I don't think clang does, but that makes it extra annoying to do.
edit: apparently you can probably apply the may_alias attribute to the type? Or you could try using transparent_union. No idea if clang supports either...
At that point, it is IMO better to obtain that block from alloca()[1]—it’s not standard, but where it’s available the compiler will treat the result as untyped for the purposes of aliasing. (If you’re on GCC/LLVM, __builtin_alloca_with_align is also an option, although note that the memory it returns may not outlive the current block—similar to a standard automatic variable, but unlike memory from traditional alloca.) ISO C has a gigantic hole when it comes to obtaining and recycling untyped memory, pretending the hole is not there isn’t going to help.
// in your code, which you could write a helper macro for if you were so inclined
opaque_foo_t *my_foo = alloca(opaque_foo_sz());
opaque_foo_init(my_foo);
Are unit tests common in C? In the mid-2000s, I worked on an "enterprise" system written in C and C++. There were about 300,000 lines of code and maybe 10 tests. This thing was the core of a billion dollar business.
That's very broken; more usual practice at the time was to have large numbers of tests, aiming for good code coverage. I've sometimes seen too many full-tool tests and few true unit tests (tests that just test functions for correctness), but "maybe 10 tests" is frightening.
I like the way that Windows does it, where they have as the first element of the struct a double-word size (dwSize) element that records in 32-bits the size of the structure. The size essentially acts as a version identifier, as long as you never rearrange the fields and only append fields for new versions. The opaque functions test the value of the dwSize element to see what actions can be performed on the object.
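Here is a toy version of that size-as-version scheme. The `widget_opts` structs and `widget_area` are invented for illustration and are not a real Windows API. The function copies the caller's struct with memcpy, so an old, shorter struct is never read through the newer type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Version 1 of a hypothetical options struct. */
typedef struct {
    uint32_t size;     /* caller sets this to sizeof its own struct version */
    int32_t  width;
} widget_opts_v1;

/* Version 2 only appends; existing fields keep their offsets. */
typedef struct {
    uint32_t size;
    int32_t  width;
    int32_t  height;   /* new in v2 */
} widget_opts_v2;

/* Library built against v2 that still accepts callers compiled against v1.
   Returns width * height, with height defaulting to 1 for v1 callers,
   or -1 if the size field is too small to be any known version. */
int widget_area(const void *opts) {
    uint32_t size;
    widget_opts_v2 o = { sizeof o, 0, 1 };   /* defaults for missing fields */

    assert(opts != NULL);
    memcpy(&size, opts, sizeof size);        /* size is always the first field */
    if (size < sizeof(widget_opts_v1)) {
        return -1;
    }
    if (size > sizeof o) {
        size = sizeof o;                     /* newer caller: ignore unknown tail */
    }
    memcpy(&o, opts, size);                  /* overwrite only what the caller has */
    return o.width * o.height;
}
```

The scheme depends on the discipline mentioned above: fields are only ever appended, never reordered or removed.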
The code that you develop can still access the member fields directly, and those accesses can be inlined and optimized aggressively by the compiler.
Link-time optimization means that you're probably not going to take that much of a performance hit.
But yes, opaque structs do enforce that it will be treated as a plain pointer, and the compiler (usually) cannot treat it as an aggregate of variables.
If a linker did that, changing the layout would be an ABI-breaking change. I think this opaque struct design is most common for dynamically loaded libraries, where link time optimisation does not occur (unless dynamic linkers got a lot more fancy recently).
The Java analogue of opaque structures is factory methods. Their only purpose is to hide the new keyword. That's necessary precisely because new introduces a hard dependency on the constructed type at the binary interface level, it gets literally emitted into the byte code. The supertypes returned by those methods are just there to serve as compatible pointers to the real types.
That seems like a straw man to me. He never said it was exactly equivalent - he said it provided encapsulation and isolation. 'private' is another language's mechanism which does that but they're obviously not identical.
> The code remains binary compatible even, no need for a recompile if you're shipping a shared lib. This is something just using private members doesn't achieve since with private members the compiler still knows and uses the layout of the struct, it just forbids access to it.
There is the somewhat common PIMPL idiom to work around the binary compat issue. Iirc there were some macros floating around to make it easier to manage.
What do the macros do? Isn't this as easy as forward declaring the impl class, adding std::unique_ptr<Impl> as a private field and have public methods refer to the field? I'm struggling to understand why macros would help here.
Perhaps the macros help you forward methods on the outer class to methods on the impl class? While your approach of having public methods refer to the field also works, it’s nice to have public and private methods in the same place (the impl class’s definition) and using the same syntax (neither having to go through the impl field).
I'm also only familiar with this idiom in C++, but based on the description in the parent comment, I suspect that this is sometimes used in C too, in which case you obviously can't use unique_ptr or private fields; maybe macros might be a way to avoid having to write a bunch of boilerplate to achieve a similar effect?
> On the other hand because of this added indirection, there's usually a greater performance hit to accessing these opaque structs since code can't be inlined.
Is this still true? Link-time optimization, which includes inlining, seems to be all the rage these days.
For static linking quite possibly, although you could still have un-optimizeable sources of overhead because of having to indirect through pointers everywhere, for instance in nested structs. Something like a->b->c->d is going to be more expensive than a.b.c.d, for instance.
I'm also not sure that dynamic linkers are typically smart enough to do this type of optimization, but I admit that I'm not familiar with the state of the art in dynamic linking technology.
I wouldn't call this equivalent to the private keyword:
* It does not work on a field, but on the whole struct. Either all fields are public or all are private. The latter case forces you to write getters/setters for properties you want users to access, and that in C can be even more cumbersome as you need to write the definition in the .h and the implementation in the .c.
* It breaks several actions without an explicit and specific error message. As mentioned in the article, malloc isn't possible, but neither are copies by value, sizeof() breaks... And those will break with an "incomplete type" error message, not a "this type is intentionally made private" one, which can add confusion in some situations.
* Completely incompatible with inlining code. Considering a lot of people still use C precisely for its performance, I think this can be a drawback in a lot of usecases.
I honestly think that hiding struct declarations should be done sparingly, and preferably limiting it to cases where it's actually necessary (for example, a library that doesn't expose internal struct fields so the same executable works with different versions of the dynamic library; or proprietary libraries that want to expose as little as possible). In the end it's still easy to bypass, and the distinction between header and code files already provide an indication of which functions you should use and which ones you shouldn't.
I guess I'm reading the article a different way, because what I'm getting is that the author is suggesting the C analogue of a class is an entire header/source "module".
That would make more sense, since you can use the header to craft an interface in which some components are public and some are private.
They called this "modular programming" where "classes" were represented by opaque pointers and actions could only be performed on those pointers using the functions defined in the module.
In the production C code I've written that makes use of modules with interfaces that define public and private variables and functions, I tend to avoid using pointers where reasonable. I'd prefer to either give access to the variable or provide getter and setter functions.
What's the benefit of "opaque" pointers?
For background, my C code is almost entirely on microcontrollers. So I'm looking at it from that point of view. If you're talking about event-based applications running inside a full operating system, I've always stepped up to something like C#, so I don't have much experience with function pointers for that kind of work.
For those accustomed to always using open source, the idea of hiding the implementation must seem odd.
But consider the position of a developer who implements a shared library that is distributed in binary form only. In this case, the benefit of opaque pointers for the development of a library is that the implementation remains private at the source level. One could, of course, reverse-engineer the binary but few people would do it.
If you define your structures in a public header -- and this includes C++ classes and templates with private members -- one can easily see the implementation and, with a few casts, start munging the guts of your objects and baking in a hard requirement on a specific layout and / or version of your library.
yes, there is no benefit on microcontrollers where everything is usually compiled together as one image (ok, I imagine you COULD design a microcontroller firmware with dynamically loadable sections... but let's not think about that).
But for dynamic linking, this is how you avoid breaking ABI while maintaining forward flexibility.
Yes, that's usually how it goes, each header/source is like a "class". And usually, what you have is a "main" struct that represents the "class" itself. In this case that main struct can't have both private and public fields.
> And usually, what you have is a "main" struct that represents the "class" itself.
Yes that tracks with the work I've done. It could even be so simple that only one value needs to be exposed through a getter (along with a couple "methods").
An ADC in an embedded system could operate like that.
> Yes, that's usually how it goes, each header/source is like a "class". And usually, what you have is a "main" struct that represents the "class" itself. In this case that main struct can't have both private and public fields.
You can, in a defined way (i.e., no invoking of UB). I just didn't put that in.
Out of curiosity, how would you do it? The ways I've seen require using another "public" struct and either casting public to private or using a nested pointer, each with their set of problems. In either case, it's still a bit hacky and still each struct is all-or-nothing public or private.
This is what Niklaus Wirth was getting at, in his "push back" on conventional class-based OOP. In Pascal, they are called "units", as opposed to "modules". In his Modula/Oberon, they are called modules.
Many of the newer languages have modules or packages and interface well with C, and so give the benefits the article was referring to. That includes various newer ones that don't use class-based OOP, just as C doesn't.
> malloc isn't possible, but neither are copies by value, sizeof() breaks
Those are implementation details which are deliberately being hidden.
> Completely incompatible with inlining code.
They are as incompatible with code inlining as public/private modifiers in C++ are. That is, LTO is your best friend here. Also, have you ever tried to maintain binary compatibility with several versions of a third-party C++ library that keeps adding/removing private fields to/from its classes?
The code sample still isn't equivalent. Now you have yet another pointer, which implies another allocation, another source of possible memory leaks and mistakes, and a separate memory space that will hurt the cache.
> Those are implementation details which are deliberately being hidden.
I know they're being hidden deliberately, but in C++ "private" doesn't break malloc (new), copies by value or sizeof. Or stack allocation, to add to the list.
> They are as incompatible with code inlining as public/private modifiers in C++ are. That is, LTO is your best friend here.
public/private aren't incompatible with inlining in C++. That is, you can call class functions that access private members and the compiler can inline those functions. Also, LTO is not always enabled by default, and doesn't always inline the things you want it to inline.
> Also, have you ever tried to maintain binary compatibility with several versions of a third-party C++ library that keeps adding/removing private fields to/from its classes?
I mentioned binary compatibility as one of the reasons one might want to do this. However, if you have a third party that doesn't care about API compatibility I doubt struct fields are the only thing they're going to change constantly.
Maybe have your public fields defined as a second struct, and then you can cast the pointer to your struct to the concrete struct that has all the public fields. This has the restriction that all public fields must be at the start, and you must make sure to maintain the same order between the two structs.
At this point though, I think I honestly would prefer setters/getters.
struct MyClassPublic {
    int x;
    int y;
    ...
};
/* using it */
MyClass *myclass = myclass_create();
((struct MyClassPublic *)myclass)->x = 5;
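One variant that avoids keeping two struct definitions in sync is to embed the public struct as the first member of the private one, since C guarantees that a pointer to a struct and a pointer to its first member convert to each other. A minimal sketch, with all names (`point_public`, `point_private`, `point_serial`) made up for the example:

```c
#include <assert.h>
#include <stdlib.h>

/* ---- public header: callers may touch x and y directly ---- */
typedef struct {
    int x;
    int y;
} point_public;

point_public *point_create(int x, int y);
int           point_serial(const point_public *p);
void          point_destroy(point_public *p);

/* ---- private to the implementation file ---- */
typedef struct {
    point_public pub;   /* must stay the first member */
    int serial;         /* hidden from callers */
} point_private;

static int next_serial = 1;

point_public *point_create(int x, int y) {
    point_private *p = malloc(sizeof *p);
    if (p == NULL) {
        return NULL;
    }
    p->pub.x = x;
    p->pub.y = y;
    p->serial = next_serial++;
    return &p->pub;     /* same address as p */
}

int point_serial(const point_public *p) {
    assert(p != NULL);
    /* Valid only for pointers obtained from point_create: a pointer to the
       first member converts back to the enclosing struct. */
    return ((const point_private *)p)->serial;
}

void point_destroy(point_public *p) {
    free(p);            /* the address malloc returned in point_create */
}
```

The caveats are still there: the object has to be created by the library, and copying a `point_public` by value silently drops the private part.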
Hiding members in this way is only possible with pointer indirection, which isn't satisfying.
However, having only a boolean private/public access state isn't generally satisfying either. It often leads to violation of the principle of separation of concerns when all the functions (methods) acting on certain "private" fields need to live in the same class.
In simple classes, like std::vector, it's possible to get away with private. But in many cases that are more complex than that, it seems to me that the best approach is still to expose the data and to be just very clear about the exact purpose of each member.
> I wouldn't call this equivalent to the private keyword:
I didn't mean to imply it is, I led with:
>> All too often someone, somewhere, on some forum … will lament the lack of encapsulation and isolation in the C programming language. This happens with such regularity that I now feel compelled to address the myth once and forever.
It's only about the myth that C doesn't have any level of encapsulation or isolation.
Nearly every rebuttal on the internet starts with misinterpretation of what's being said. Good reading skills are extremely rare, it seems.
To be clear: you tried to say "C offers encapsulation/isolation". People read "This solution is equivalent to 'private'", an almost completely unrelated statement, and then respond to that.
That could typically be classified as a "Straw man fallacy"[0], but I believe people who do this in many cases simply do not have the necessary reading skills to understand what proposition has been made, and therefore honestly believe themselves to be reasoning correctly (i.e. without fallacy).
Reading comprehension used to be a topic at school when I was a child. I suppose that's no longer the case??
>To be clear: you tried to say "C offers encapsulation/isolation". People read "This solution is equivalent to 'private'", an almost completely unrelated statement, and then respond to that.
>That could typically be classified as a "Straw man fallacy"[0], but I believe people who do this in many cases simply do not have the necessary reading skills
Fyi... the author's article that this thread is about has in bold heading: "Myth: C has no equivalent to “private”"
So, a reasonable interpretation of the text following that headline is how to use C Language constructs to dispel that myth.
As the other commenter said, and I want to reiterate, what led me to start with the talk about the private keyword is the big bold header that says "Myth: C has no equivalent to “private”" and the first code snippet that shows how you can have "private" fields in C++ and the following ones that show an "equivalent" implementation in C. So I think it's reasonable to infer that the author is talking about encapsulation and "the private keyword" as somewhat interchangeable (and I don't disagree, for this discussion they're practically the same). Not only that, but the points I made are about the implementation shown in the article, which is independent of whether the talk is about "private equivalency" or "lack of encapsulation": the gist of it is "yes, you 'encapsulate' things but not in the same way and it comes with disadvantages that aren't really there in other languages".
With all that said, I don't think all that condescending talk about the lack of reading comprehension or skills, without actually going into the arguments themselves, is really necessary or positive.
I mean, there's a big header that says "Myth: C has no equivalent to “private”" and a code example about the private keyword, so that's why I started saying that. Even then, my points still apply: this isn't really equivalent to how encapsulation works in other languages due to the lack of granularity and the "extras" of all the usual language behavior that stops being supported.
> I mean, there's a big header that says "Myth: C has no equivalent to “private”" and a code example about the private keyword, so that's why I started saying that.
You are correct; I should change the heading to "Myth: C provides no encapsulation". I don't want to do it now, as the discussion is still ongoing and making this change now while people are commenting on the 'private equivalence" aspect would be gaslighting those people.
> Even then, my points still apply: this isn't really equivalent to how encapsulation works in other languages due to the lack of granularity and the "extras" of all the usual language behavior that stops being supported.
So? The myth the article addresses is "C has no encapsulation", not "C has great encapsulation".
That C provides stronger encapsulation and stronger guarantees for upgrade migration is the specific myth that I am trying to address.
Yes, you can do something in C that somewhat resembles encapsulation. You could also do it in Python with a decorator that inspects the caller. You can also do inheritance and virtual functions in C. You could do anything in a Turing complete language. But usually, saying "X language has Y" means the language has Y as part of its specification. In this case encapsulation isn't part of the specification (that's why it comes with all those downsides/extras) but an artifact of other aspects of the language, and that's why I say it's not really equivalent.
To the author : your explanation can be interpreted as "correct" but also be aware that -- for some readers -- your argument is a variation of the Turing Tarpit: https://en.wikipedia.org/wiki/Turing_tarpit
In other words, the 2 different possible receptions to your post:
- YES, file-level modularity with opaque structs is _equivalent_ to class private members --> for those mindsets already sympathetic to C Language
- NO, using file-scoping rules and structs is not equivalent to class private members because it's a bunch of extra ceremonial syntax to implement a workaround. (The "Turing Tarpit"). It's using the opaque struct as a "design pattern" and as Peter Norvig famously said, "Design patterns are bug reports against your programming language."
I would argue that C++ "pimpl" design pattern brings more "ceremonial syntax" than the C equivalent.
C++ style (without "pimpl") requires recompilation of the whole dependent tree when adding a new private member function. It's encapsulation only in a formal sense
I think there’s a better example, but whether it applies depends on which of two major divisions of C code you are in: code designed to run on systems with an MMU (as typically used for Linux and other large OSes), where virtual memory makes dynamic memory allocation practical, and code for systems without one, which today is primarily the very large world of embedded devices.
For the latter, the industry best practice is to avoid malloc(), except maybe at init time, and instead allocate memory statically. And in that use case, you break your code into modules, which can contain private data, public data, private functions, and public functions.
In other words, building an app out of C modules is a lot like building an app in a more modern language just using static classes, with no instantiation. And in that design pattern, which is extremely common in the embedded world, we have a direct equivalent to the “private” qualifier: “static”, which restricts the rest of the app from accessing so-marked file-scope variables and functions.
Where this breaks down, as always with C, is when you need multiple instantiations of a module, which modern programming languages refer to as objects. The closest we can get in C is to pass the module’s public functions a struct containing the object’s non-static data. As the author explains, there are standard ways to make that data structure opaque to calling code, but those are definitely workarounds for language shortcomings.
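As a sketch of that single-instance module style, here is an ADC module in one listing. The function names and the 12-bit clamp are invented for the example and are not tied to any particular microcontroller.

```c
#include <assert.h>
#include <stdint.h>

/* ---- adc.h: the public interface ---- */
void     adc_init(void);
void     adc_feed_sample(uint16_t raw);  /* hypothetical hook, e.g. called from an ISR */
uint16_t adc_latest(void);

/* ---- adc.c: everything marked static is private to the module ---- */
static int      adc_ready;               /* set by adc_init */
static uint16_t latest_sample;

static uint16_t clamp_12bit(uint16_t raw) {
    return (uint16_t)(raw > 4095u ? 4095u : raw);
}

void adc_init(void) {
    latest_sample = 0;
    adc_ready = 1;
}

void adc_feed_sample(uint16_t raw) {
    assert(adc_ready);                   /* catch use before adc_init */
    latest_sample = clamp_12bit(raw);
}

uint16_t adc_latest(void) {
    assert(adc_ready);
    return latest_sample;
}
```

No other translation unit can name `latest_sample` or `clamp_12bit`, and no dynamic allocation is involved; the price is that there is exactly one ADC.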
But the bottom line is that those language shortcomings — the lack of objects and a private qualifier for its members — are only shortcomings if you need those features, and in the embedded world, most applications don’t, they only require all the advantages offered by C. So as always, this is about picking the right language for the project, there’s no one size fits all.
When I first encountered Object Oriented Programming around '97 or so, all of the literature focused on the three pillars of OO: encapsulation, inheritance and polymorphism. I was struck at the time at how OO mostly just formalized and gave specific names to what good programmers were already doing informally.
> I was struck at the time at how OO mostly just formalized and gave specific names to what good programmers were already doing informally.
I mean, that's the point of everything, isn't it? giving names and defining good practices so that everyone can benefit from them? because even today you can find a LOT of codebases which are an imperative mess
A lot of comments/articles complaining about Object Oriented Programming -- especially the style implemented by C++ and similar languages -- start with the premise that OOP is some academic prescription declared from up high like Moses and the Ten Commandments.
But the reality is that the best practices of imperative programming were already much like object oriented programming and that OOP is a formalization of those practices.
Strongly agree. Good C developers pretty much instinctively do structured programming[1]. While it's superficially dated, the core concepts are still all very much applicable to the working programmer today and even largely paradigm independent.
Indeed, I see many of the concepts as extensible modules, as one would use from Modula-2 or Object Pascal.
That is why Oberon takes the spartan approach of only having extensible types, everything else is just like in Modula-2. Later descendants adopted a more mainstream approach.
Likewise, how OOP is done in Ada or Modula-3 isn't quite like the mainstream approach.
Or when modules can be manipulated like variables, and given type signatures, we get Standard ML functors, with overlapping capabilities to OOP.
“Hidden” from whom and how? “Same fields in beginning” doesn’t characterize multiple inheritance. “Tagged structs” are just one implementation strategy for one kind of polymorphism, and “tagged” and “struct” are themselves jargon. Using and knowing the precise terminology is important when communicating technical concepts.
>> Just to be clear, C is an old language lacking many, many, many modern features. One of the features it does not lack is encapsulation and isolation.
The OP article seems to be someone trying to cram modern concepts into a language that explicitly rejects them. A symptom that the OP should've probably just used another language.
Sure, but it still goes against the grain of the language. You could write a similar article explaining why everyone who thinks C doesn't have automatic memory management is wrong. Opaque structs are relatively annoying to work with, especially if sibling modules in the same package have good reasons to manipulate the private parts directly. It is not nearly as convenient as it is in a language with better support for encapsulation (e.g. Java). Most of the C code I write does not encapsulate anything. It's not worth the bother, especially not when unit-testing, for which encapsulation would force you to write lots of redundant getters and setters just for the unit tests themselves. My view is that you simply shouldn't use C if you need encapsulation.
I'm appreciative for the person who wrote the article.
These are my favorite types of debates and while I disagree with it for much better articulated reasons already mentioned, primarily the exploitative nature of header files used as security through obscurity, I think it stirs up a lot of debate and keeps the spaghetti meatball of knowledge around patterns/anti-patterns, "best-practices", etc. moving forward in a much more passionate way.
I think this is the truest sense of the term "hacker" which lives up to the title of the site pushing us into these debates. Putting stuff together that doesn't always work as intended or expected but arguing for and against it.
A long winded thank you to everyone, OP and all the threads responding.
I've seen code that prefixes private members with an underscore and adds a comment saying that it's private. Not saying that's great but it does send a message.
This is a much stronger form of encapsulation than public/private. You basically have to do this if you have a generic API that wraps system specific APIs to avoid polluting your code with system specific includes/types/functions/etc.
> My Makefiles already have rules to automatically generate the interface so that the C code I write in this manner is callable from within Android applications.
I don’t know a lot about C but this interests me. Can anyone point me to where I might learn more about how this works?
The funny thing is each time when you find C has some “hacky” way of doing things other programming languages can officially support, then you find that hacky is probably the only thing you actually need.
These will have different sizes though, so it’s only safe to cast when used off the heap and where its memory is allocated with the larger variant, right?
I guess my point is that there’s a whole bunch of caveats in regards to safety that need to be considered in your solution.
That seemed a complicated way of reminding us that C++ (at least originally) is/was compiled down to pure C and thus can't do anything that C can't do.
Modern PL features are more or less wrappers around comparatively complex C code. Object inheritance is actually one that isn't too difficult or complex to implement: https://www.youtube.com/watch?v=443UNeGrFoM&t=4275s
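For anyone who doesn't want to watch the video: I haven't reproduced what the talk does, but the usual way to get single inheritance in C is to embed the base struct as the first member. A minimal sketch, with `shape` and `rect` as made-up names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, for illustration only. */
struct shape {
    double (*area)(const struct shape *self);   /* one "virtual" function */
};

struct rect {
    struct shape base;   /* must be the FIRST member for the cast below */
    double w, h;
};

/* C guarantees a pointer to a struct converts to a pointer to its first
   member and back; this check documents that the layout relies on it. */
_Static_assert(offsetof(struct rect, base) == 0, "base must come first");

static double rect_area(const struct shape *self) {
    /* Downcast: valid only because self really points into a struct rect,
       i.e. the storage was created as the larger, derived type. */
    const struct rect *r = (const struct rect *)self;
    return r->w * r->h;
}

struct rect rect_make(double w, double h) {
    struct rect r = { { rect_area }, w, h };
    return r;
}

/* Code written against the base type works on any "derived" struct. */
double shape_area(const struct shape *s) {
    assert(s != NULL && s->area != NULL);
    return s->area(s);
}
```

Passing `&r.base` to `shape_area` is the upcast. The downcast inside `rect_area` is exactly where the size caveat above applies: it is only safe when the object was created as the derived type.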
> Modern PL features are more or less wrappers around comparatively complex C code. Object inheritance is actually one that isn't too difficult or complex to implement: https://www.youtube.com/watch?v=443UNeGrFoM&t=4275s
One of the advantages C has over C++ here is that the addition of templates in C++ makes this kind of encapsulation more difficult. See the proliferation of header-only libraries where nearly everything is templatized, as well as the pimpl idiom. You still get some encapsulation in the form of private, but with "C-style" encapsulation, if the header file doesn't change and the source files implementing what's declared in that header don't change, then none of the files need to be recompiled when rebuilding. This makes recompiling much faster.
Of course, you could do much the same thing if you didn't use templates in C++, or if you were very disciplined and limited with your use of templates, but this seems to go against the grain of how I've seen C++ used.
Thanks for the downvote...? Not sure what the point of that was.
I haven't used C++ much since modules were introduced. From doing some quick reading, it seems unclear whether they significantly improve compilation times in practice. I wasn't able to find anything which addressed how they relate to the ABI... I suspect this is a pretty complex topic. If you have any good references related to either of these points, I'd be very eager to read them.
Either way, "C++ with modules" appears to me to be unlikely to clear the bar set by "C with opaque types" (which, for all intents and purposes, can be done in C++) in terms of 1) ABI stability and 2) encapsulation. I consider point (1) to be related to point (2), since details which are leaked into the ABI are not encapsulated.
Importing the whole standard library with C++23's `import std;` is quicker than doing a plain `#include <iostream>`, according to Microsoft employees in a talk they gave on their upcoming C++23 support.
Unfortunately I can't remember which one it was, but maybe someone else can share the link.
Second, template details can be marked as private in the module metadata, so they aren't directly exposed to consumers.
As for the ABI specifically, that is compiler dependent anyway.
> Hell, they cannot even malloc() their own StringBuilder instance, because even the size of the StringBuilder is hidden. They have to use creation and deletion functions provided in the implementation as specified in the interface.
And you've just made it impossible for the users of your StringBuilder to pass it around by-value. Every instance has to be malloc'ed by your library, even though it's just a tiny, word-sized struct. Awesome! And each access needs to go through an additional pointer indirection. All this just to pretend that C supports proper encapsulation. Hooray!
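For readers who haven't seen the pattern being argued about, here is a minimal sketch of an opaque `StringBuilder`. The function names (`sb_new`, `sb_append`, `sb_cstr`, `sb_free`) and the field layout are my assumptions, not the article's actual code, and one file stands in for the header/source split.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- string_builder.h (hypothetical): size and layout are hidden ---- */
typedef struct StringBuilder StringBuilder;
StringBuilder *sb_new(void);                                 /* NULL on failure */
int            sb_append(StringBuilder *sb, const char *s);  /* 0 on success */
const char    *sb_cstr(const StringBuilder *sb);
void           sb_free(StringBuilder *sb);

/* ---- string_builder.c: the only file that knows the layout ---- */
struct StringBuilder {
    char  *data;
    size_t len, cap;
};

StringBuilder *sb_new(void) {
    StringBuilder *sb = malloc(sizeof *sb);   /* allocation #1: the handle */
    if (!sb) return NULL;
    sb->cap = 16;
    sb->len = 0;
    sb->data = malloc(sb->cap);               /* allocation #2: the buffer */
    if (!sb->data) { free(sb); return NULL; }
    sb->data[0] = '\0';
    return sb;
}

int sb_append(StringBuilder *sb, const char *s) {
    assert(sb != NULL && s != NULL);
    size_t n = strlen(s);
    if (sb->len + n + 1 > sb->cap) {          /* grow by doubling */
        size_t cap = sb->cap;
        while (cap < sb->len + n + 1) cap *= 2;
        char *p = realloc(sb->data, cap);
        if (!p) return -1;
        sb->data = p;
        sb->cap = cap;
    }
    memcpy(sb->data + sb->len, s, n + 1);     /* copies the terminator too */
    sb->len += n;
    return 0;
}

const char *sb_cstr(const StringBuilder *sb) {
    assert(sb != NULL);
    return sb->data;
}

void sb_free(StringBuilder *sb) {
    if (sb) { free(sb->data); free(sb); }
}
```

With a real header/source split, a caller who sees only the header cannot write `StringBuilder sb;` or `sizeof(StringBuilder)`, because the type is incomplete there. That is both the encapsulation being praised and the forced heap allocation being criticized.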
I'm sorry that I'm targeting your blog post specifically, but it's just so stereotypical of C proponents who can't (or won't?) realize that their favorite programming language is inherently limiting and limited along several very important dimensions. It makes me think that although some of them might be excellent programmers, they make for terrible software engineers.
The term Software Engineer is something that in plenty of countries is validated by the engineering organization and is a legal title, not something one feels like calling themselves.
Which also validates that any university teaching software engineering has a certain quality level, and portfolio of lectures, to create a general background across all subjects of engineering practices besides writing code.
In my country, "engineer" is an academic degree and therefore not "validated by the engineering organization" but conferred on you by a university. Having said that, "software engineer" is NOT understood as having any connection to the academic title of "engineer", as it's a job description and you can even be a degreed software engineer simply having just a bachelor's degree in the field anyway.
Could you please give some examples of countries where "software engineer" is a protected title, together with the organizations responsible for conferring it? I think I looked it up in the past but wasn't able to find one. All I was finding was things like civil engineering certification (basically search engine garbage in the case of my search -- the quality of search engine results has really gone down the drain in the past 15 years or so, sadly).
If someone doesn't recognize when and how their tools limit the quality of their work, they can't be good craftsmen. There might still be good reasons to use those tools (e.g., no better alternatives, or an existing ecosystem), but if you don't realize that your programming language is fundamentally limiting in ways that other languages are not, then you'll never know how to build better software.
> And you've just made it impossible for the users of your StringBuilder to pass it around by-value. Every instance has to be malloc'ed by your library, even though it's just a tiny, word-sized struct. Awesome! And each access needs to go through an additional pointer indirection.
...and?
I'm guessing the implication here is that you'll trash performance by doing this. How can you assume that? The thing about optimizing code is, you don't know where your hot paths are until you profile your code. And, the one thing experience has taught me, my intuitions about what the hot spots will be rarely match reality. There's nothing wrong with that, complex systems are complex, and we have incredible profiling tools to eat through that complexity and highlight the hot spots for us.
Now, I know you'll probably go on about a death by a thousand cuts etc. The thing is, well-crafted modules typically don't encapsulate on a fine-grained level. You usually have larger systems that hide details. These systems are usually used a fraction of the time the rest of your program is. So the indirections end up usually being a very insignificant cost to the overall program.
And if you are coding in such a way that copying a string builder by value and/or the indirection imposed by encapsulating that information is a bottleneck, I highly doubt that "fixing" this by copying by value and/or removing the indirection will suddenly make your entire program performant.
> It makes me think that although some of them might be excellent programmers, they make for terrible software engineers.
You haven't actually highlighted any issues here and then go on to finish your argument with an ad hominem. Instead of attacking the competence of C programmers, you should illustrate the actual real world impact that this design philosophy results in. I know plenty of really slow Java libraries, and plenty of really fast C libraries that use this method of encapsulation. So if your argument is that using this method trashes performance, it's a poor argument that doesn't have many real world examples (unless you know of some off the top of your head).
> The thing about optimizing code is, you don't know where your hot paths are until you profile your code.
Absolutely! So, you profile your program, and it turns out that 95% of the runtime is caused by malloc/free in tight loops, which you can't get rid of, because they're hidden behind an API which had to choose between encapsulation and efficiency.
> And if you are coding in such a way that copying a string builder by value and/or the indirection imposed by encapsulating that information is a bottleneck [...]
You don't seem to realize that the StringBuilder was just an example to illustrate this style of encapsulation? Oftentimes you want to encapsulate actual "value structs", where it is sensible to create millions of them in an array. In C, you're forced to choose between following good software engineering practices (=> encapsulation) and getting good performance.
> So, you profile your program, and it turns out that 95% of the runtime is caused by malloc/free in tight loops, which you can't get rid of, because they're hidden behind an API which had to choose between encapsulation and efficiency.
I have literally never run into a library that was written so badly that using the library encouraged you to use the API to create millions of small objects. That's what I'm saying. Sure, this can happen, but in reality I've never seen it. Can you show me where this hypothetical scenario is occurring and trashing people's performance? We probably want to avoid using those libraries.
Instead, I usually see encapsulation used like it is in GLFW, or libcurl, or stbi. The encapsulation covers systems and not tiny objects, which encourages the user of the library to not make API calls millions of times or construct millions of tiny objects.
> You don't seem to realize that the StringBuilder was just an example to illustrate this style of encapsulation? Oftentimes you want to encapsulate actual "value structs", where it is sensible to create millions of them in an array.
I did realize this. Encapsulation is typically useful on larger systems. Once you get to the point of millions of objects, you usually have a larger system managing those millions of objects. And ideally, those millions of objects should be POD. If they're POD, encapsulating the data makes no sense at that point, because it makes more sense to encapsulate whatever is managing that data.
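A rough sketch of what I mean, with made-up names (`particle`, `particle_system`, `ps_*`): the per-item data is plain and public, and only the thing managing it is opaque.

```c
#include <assert.h>
#include <stdlib.h>

/* Plain data: layout is public, so it can live in arrays or on the stack. */
typedef struct { float x, y, vx, vy; } particle;

/* The manager is opaque: one allocation, regardless of particle count. */
typedef struct particle_system particle_system;
particle_system *ps_create(size_t capacity);                 /* NULL on failure */
size_t           ps_spawn(particle_system *ps, particle p);  /* returns new count */
void             ps_step(particle_system *ps, float dt);
const particle  *ps_data(const particle_system *ps, size_t *count);
void             ps_destroy(particle_system *ps);

/* ---- implementation (would live in its own .c file) ---- */
struct particle_system {
    size_t   count, capacity;
    particle items[];        /* C99 flexible array member: contiguous storage */
};

particle_system *ps_create(size_t capacity) {
    particle_system *ps = malloc(sizeof *ps + capacity * sizeof(particle));
    if (!ps) return NULL;
    ps->count = 0;
    ps->capacity = capacity;
    return ps;
}

size_t ps_spawn(particle_system *ps, particle p) {
    assert(ps != NULL);
    if (ps->count < ps->capacity)       /* silently ignored when full */
        ps->items[ps->count++] = p;
    return ps->count;
}

void ps_step(particle_system *ps, float dt) {
    assert(ps != NULL);
    for (size_t i = 0; i < ps->count; i++) {   /* tight loop, no per-item malloc */
        ps->items[i].x += ps->items[i].vx * dt;
        ps->items[i].y += ps->items[i].vy * dt;
    }
}

const particle *ps_data(const particle_system *ps, size_t *count) {
    assert(ps != NULL && count != NULL);
    *count = ps->count;
    return ps->items;
}

void ps_destroy(particle_system *ps) { free(ps); }
```

The API boundary is crossed once per step rather than once per particle, so the indirection cost does not scale with the number of objects.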
> In C, you're forced to choose between following good software engineering practices (=> encapsulation) and getting good performance.
This is a false dichotomy. There are plenty of large C projects that follow good software engineering practices (which is entirely subjective, what is "good"?). Look at any OS kernel, or the libraries I mentioned above.
So, once again, I'm curious if you know of any C libraries (ab)using encapsulation in the hypothetical scenario you've laid out. If there aren't any libraries that do this, then this is a non-issue and attacking the competence of C developers is entirely unwarranted since you've built up a strawman that doesn't exist in reality.
Plain Old Data. It's the same as POJOs in Java or POCOs in C#. It's essentially a class with no logic associated with it (so no member functions or private/public mixed data). Technically everything in C is POD. The Cpp docs have a more formal definition as well here https://en.cppreference.com/w/cpp/named_req/PODType .
What do you mean? The fact that C can do this is an example of how it's not limited. A lot of other languages instead require you to allocate everything on the heap, and there's no possibility of passing things by copy or of accessing things through anything other than a pointer. C at least is capable of allocating and accessing some things directly on the stack.
> The fact that C can do this is an example of how it's not limited.
No, C is limited because it's mutually exclusive: either encapsulation, or zero overhead by-value passing. Other languages, like C++ or Rust, allow both at the same time.
There's a third vertex to the triangle here: C++ and Rust allow both at the same time by dropping separate compilation. More thoroughly so in Rust than in C++, but header-focused libraries move C++ further toward whole-program compilation compared to C (maybe you could call it “large-overlapping-chunks-of-program compilation”).
Very valid point. I keep fond memories of Modula-2, which has DEFINITION modules and IMPLEMENTATION modules, such that you can compile the former and the latter separately.
In Modula-2, I can specify an API in its DEFINITION module, and after compiling it, client applications can use such an API without the implementation being ready yet, and still I can compile the client and check if it is free of syntax errors.
OK, I understand what you’re doing now. Thanks for clarifying.
The big downside here is that you’re leaking the size of the details into your ABI which wouldn’t happen with a fully opaque type… I could see some uses for it but haven’t felt a strong enough need to reach for it before, although it has occurred to me.
At that point you might as well just name your fields like DONTTOUCHTHIS_foo if you’re having to keep the private definition with fields in sync with the opaque public definition (and making sure the alignment and sizing are always in sync with the private one…)
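For reference, this is roughly what the "public blob, private fields" variant looks like and where the sync burden shows up. The names (`counter`, `counter_impl`, `COUNTER_STORAGE_SIZE`) are invented for the sketch, and I'm assuming C11 for `_Static_assert` and `_Alignas`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- counter.h (hypothetical): size and alignment are public, fields are not ---- */
#define COUNTER_STORAGE_SIZE 16
typedef struct {
    /* Alignment only matters if the implementation casts the blob instead
       of copying; it is shown because it is part of what must stay in sync. */
    _Alignas(max_align_t) unsigned char storage[COUNTER_STORAGE_SIZE];
} counter;

void counter_init(counter *c, long step);
long counter_next(counter *c);

/* ---- counter.c: the real fields ---- */
struct counter_impl { long value; long step; };

/* Compile-time check that the public blob can hold the private struct.
   Growing counter_impl past the blob is an ABI break, caught at build time. */
_Static_assert(sizeof(struct counter_impl) <= COUNTER_STORAGE_SIZE,
               "counter storage too small for counter_impl");

/* Copying in and out with memcpy sidesteps aliasing questions; compilers
   typically optimize these small fixed-size copies away. */
void counter_init(counter *c, long step) {
    assert(c != NULL);
    struct counter_impl impl = { 0, step };
    memcpy(c->storage, &impl, sizeof impl);
}

long counter_next(counter *c) {
    assert(c != NULL);
    struct counter_impl impl;
    memcpy(&impl, c->storage, sizeof impl);
    impl.value += impl.step;
    memcpy(c->storage, &impl, sizeof impl);
    return impl.value;
}
```

Callers can declare a `counter` on the stack and copy it by value, but `COUNTER_STORAGE_SIZE` is now part of the ABI, which is the leak described above.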
Don't let Perfect be the enemy of Good. The quality of encapsulation you can achieve in C++ is miles ahead of that of C, even if it isn't all that could ever be.
Passing around potentially large objects by value is wasteful and prone to move / copy semantics.
Most languages pass objects by reference (C# and Java chief among them).
Still, if you really want to pass by value -- even though you'll likely end up with pointer ownership problems -- you just add a few functions to the API to do so.
Creating an opaque type on the stack can be done, you just need a little more work.