The C Programming Language: Myths and Reality

simias · on July 17, 2023

I see where you're coming from but IMO this "Cheshire cat" idiom to hide the implementation details is not exactly like private; in fact it can do things that private can't do, and doesn't do things private does.

The advantage of hiding your state behind an opaque struct with builders and accessors is that you can change the size and layout of said struct without it being a breaking API change. The code remains binary compatible even, no need for a recompile if you're shipping a shared lib. This is something just using private members doesn't achieve since with private members the compiler still knows and uses the layout of the struct, it just forbids access to it.

That's why you can even find C++ libraries using this idiom even though C++ obviously has `private`. It's about having a stable, opaque API.
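As a minimal sketch of the idiom (the `counter` names are hypothetical, and header and implementation are shown in one listing for brevity): callers only ever see an incomplete type, so the fields can change without touching their code.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what would go in the public header (hypothetical counter.h) --- */
typedef struct counter counter_t;       /* incomplete type: layout is hidden */
counter_t *counter_create(int start);
void counter_increment(counter_t *c);
int counter_value(const counter_t *c);
void counter_destroy(counter_t *c);

/* --- what would stay in the implementation file (hypothetical counter.c) --- */
struct counter {
    int value;  /* fields can be added or reordered without breaking callers */
};

counter_t *counter_create(int start)
{
    counter_t *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = start;
    return c;   /* NULL if the allocation failed */
}

void counter_increment(counter_t *c)
{
    assert(c != NULL);
    c->value++;
}

int counter_value(const counter_t *c)
{
    assert(c != NULL);
    return c->value;
}

void counter_destroy(counter_t *c)
{
    free(c);    /* free(NULL) is a no-op */
}
```

Every access goes through a function call across the library boundary, which is the indirection cost weighed against the stable ABI.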

On the other hand because of this added indirection, there's usually a greater performance hit to accessing these opaque structs since code can't be inlined. With private since the compiler can still see inside the struct, it's able to more aggressively optimize the code. You can also store the objects directly on the stack without requiring malloc.

IMO the right way to have private members in C structs is... to document that members shouldn't be touched directly, perhaps using a special naming convention or embedding the publicly-accessible members in a dedicated sub-struct to prevent confusion.

Joker_vD · on July 17, 2023

> IMO the right way to have private members in C structs is... to document that members shouldn't be touched directly, perhaps using a special naming convention or embedding the publicly-accessible members in a dedicated sub-struct to prevent confusion.

Reminds me of that one time when glibc broke the whole of Debian for s390 architecture by changing the fields in the jmp_buf struct (which is public): [0].

[0] https://lwn.net/Articles/605607/

bluetomcat · on July 17, 2023

To achieve a reasonable level of encapsulation in C, a header file must be seen as a public-only interface. It should declare only the structs that are relevant for the user of the module. If that's "struct my_module_handle { ... }", declare it and document the corresponding accessor and modifier functions. Everything else must reside in the C source file with internal linkage (static storage class). The whole source file is your implementation.

There is an anti-pattern where header files are used for all the declarations needed internally by the source file. Including (pasting verbatim with the preprocessor) that file from another module would bring in all the unnecessary declarations.

simias · on July 17, 2023

I think what you say makes complete sense at module-level (as in, for a standalone lib for instance) but I never bother segregating things internally within a lib/module/exe and rely on good documentation and coding practices to avoid having member mutations all over the place.

If I code in Rust or C++ I can use namespacing and public/private to give every single object in the codebase a clean interface, but in C doing that is just frustrating, not to mention potentially inefficient.

menaerus · on July 17, 2023

Opaque pointers usually impose a restriction on the API: in order to use the handle, one has to dynamically allocate the object on the heap. That's quite an unfortunate tradeoff IMO.

chongli · on July 18, 2023

Why not have the API take an allocator as a parameter? Pass in pointers to malloc, realloc, and free. Then the library can use your static allocator (or some 3rd party malloc such as jemalloc, or your own arena allocator for that matter) or it can default to the system one if you pass null pointers instead.
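A rough sketch of that suggestion, assuming hypothetical names (`foo_allocator`, `foo_create`, `foo_destroy`) and an all-or-nothing fallback to the system allocator when NULL is passed:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator table supplied by the caller. */
struct foo_allocator {
    void *(*malloc_fn)(size_t);
    void *(*realloc_fn)(void *, size_t);
    void (*free_fn)(void *);
};

struct foo {
    struct foo_allocator alloc; /* remembered so destroy uses the matching free */
    int value;
};

static const struct foo_allocator system_allocator = { malloc, realloc, free };

struct foo *foo_create(const struct foo_allocator *a, int value)
{
    /* All-or-nothing fallback: pairing a custom malloc with the
     * system free (or the reverse) would be a bug. */
    if (a == NULL || a->malloc_fn == NULL || a->free_fn == NULL)
        a = &system_allocator;

    struct foo *f = a->malloc_fn(sizeof *f);
    if (f == NULL)
        return NULL;
    f->alloc = *a;
    f->value = value;
    return f;
}

int foo_value(const struct foo *f)
{
    assert(f != NULL);
    return f->value;
}

void foo_destroy(struct foo *f)
{
    if (f != NULL)
        f->alloc.free_fn(f);
}
```

Storing the table inside the object is one design choice among several; a library could just as well keep it in a global or a separate context handle.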

10000truths · on July 18, 2023

Callback based APIs are better, but not good enough. The problem remains that the user still has to deal with some potentially undesirable constraints:

* I may be running in an environment where allocating memory is an asynchronous operation. A callback based API forces me to block, which can cause unpleasant side effects like halting an event loop.

* In my experience, some libraries with custom allocation hooks forget to define one or both of the two basics in the callback signature: a "context" or "user data" parameter, and a way to return an error.
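To illustrate the two basics from the second bullet, here is a hedged sketch of a hook signature that carries a user-data pointer and returns a status code. The names (`foo_alloc_fn`, `arena`, `arena_alloc`, `foo_make_buffer`) are made up for illustration, and the arena assumes its base pointer is already suitably aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook signature: user data in, status code out. */
typedef int (*foo_alloc_fn)(void *ctx, size_t size, void **out);

/* Example context: a fixed arena owned by the caller.
 * Assumption: base is aligned for max_align_t. */
struct arena {
    unsigned char *base;
    size_t cap;
    size_t used;
};

/* Bump-allocates from the arena passed as ctx.
 * Returns 0 on success, -1 when the arena cannot satisfy the request. */
int arena_alloc(void *ctx, size_t size, void **out)
{
    assert(ctx != NULL && out != NULL);
    struct arena *a = ctx;
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;  /* round up */

    if (start > a->cap || size > a->cap - start)
        return -1;
    *out = a->base + start;
    a->used = start + size;
    return 0;
}

/* A library entry point would accept the hook together with its context
 * and propagate the hook's error instead of aborting. */
int foo_make_buffer(foo_alloc_fn alloc, void *ctx, size_t n, void **out)
{
    assert(alloc != NULL);
    return alloc(ctx, n, out);
}
```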

The proper solution is to decouple memory allocation from object initialization entirely. There are two different approaches to this:

1. Expose get_foo_size() and get_foo_align() functions that return the size and alignment that the opaque foo struct needs (at runtime, of course). Then I as the user can allocate that memory, and initialize my opaque foo objects in-place:

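  size_t foo_size = get_foo_size();
  /* char* rather than void*: ISO C does not define arithmetic on void pointers */
  char* buf = alloc_aligned_memory(foo_size * 1000, get_foo_align());
  for(size_t i = 0; i < 1000; ++i) {
    int err = foo_init(buf + i * foo_size, /* params here */);
    if(err) { /* handle the error */ }
  }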

2. Define foo_init(void* buf, size_t len, ...) which attempts to initialize an opaque foo object in the buffer defined by [buf, buf+len). If the buffer does not have enough space, return an error. Otherwise, return the number of bytes actually used by the object.
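The second approach might look roughly like the sketch below. The struct contents, the error codes and the `foo_value` accessor are assumptions made for illustration, and the definition of `struct foo` would live in the library's source file rather than in its header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo {    /* hidden from users in a real library */
    int value;
};

#define FOO_ERR_NO_SPACE   (-1)
#define FOO_ERR_MISALIGNED (-2)

/* Initializes a foo at the start of [buf, buf+len).
 * Returns the number of bytes used, or a negative error code. */
long foo_init(void *buf, size_t len, int value)
{
    assert(buf != NULL);
    if (len < sizeof(struct foo))
        return FOO_ERR_NO_SPACE;
    if ((uintptr_t)buf % _Alignof(struct foo) != 0)
        return FOO_ERR_MISALIGNED;

    struct foo *f = buf;
    f->value = value;
    return (long)sizeof(struct foo);
}

/* Accessor operating on a buffer previously set up by foo_init. */
int foo_value(const void *buf)
{
    assert(buf != NULL);
    return ((const struct foo *)buf)->value;
}
```

The caller can advance through a larger buffer by the returned byte count, rounding up for alignment where needed, to pack several objects back to back.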

chongli · on July 18, 2023

That second method works fine if you just want a buffer of a bunch of foo’s and they all happen to be the same size. Not so fine if foo is a data structure in its own right with growable capacity.

menaerus · on July 18, 2023

That is possible, and it is the reason why I said "usually". Still, it's an unfortunate complication because you need to manage the pool of your (static) objects now. Also, it's not possible to use the stack with this scheme.

cozzyd · on July 17, 2023

there are ways around this, if VLAs are allowed.

  // in <opaque_foo.h> 
  typedef struct opaque_foo opaque_foo_t;
  size_t opaque_foo_sz(void); 
  void opaque_foo_init(opaque_foo_t* foo);

  // in your code, which you could write a helper macro for if you were so inclined
  char opaque_foo_mem[opaque_foo_sz()]; 
  opaque_foo_t * my_foo = (opaque_foo_t*) opaque_foo_mem;
  opaque_foo_init(my_foo);

menaerus · on July 18, 2023

This approach violates alignment requirements, causing misaligned accesses that on some architectures are mitigated by generating extra instructions, while on other architectures they are a violation (e.g. a segfault).

There is no way for a compiler to infer the alignment requirement of a struct because it does not see its definition. You would always have to align the char buffer by hand, but you cannot do it for the same reason the compiler cannot.

What you can do is to always greedily align the char buffer to the strictest (largest) fundamental requirement for that platform - in other words alignas(max_align_t).

And this is actually what alloca() does for you by default. From https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

> The object is aligned on the default stack alignment boundary for the target determined by the __BIGGEST_ALIGNMENT__ macro.
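The greedy alignment described above can be sketched as follows. The function name is hypothetical, and its parameter `n` stands in for whatever a runtime size query such as opaque_foo_sz() would return. This only addresses alignment; the strict-aliasing question raised elsewhere in the thread is a separate concern.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Declares a runtime-sized stack buffer aligned for any fundamental type
 * and reports whether its address really satisfies that alignment. */
int stack_buffer_is_max_aligned(size_t n)
{
    assert(n > 0);  /* a zero-length VLA is undefined behavior */

    /* VLAs are an optional feature since C11; gcc and clang support them. */
    alignas(max_align_t) unsigned char mem[n];
    memset(mem, 0, n);

    return (uintptr_t)(void *)mem % alignof(max_align_t) == 0;
}
```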

kopecs · on July 17, 2023

Doesn't this violate strict aliasing?

cozzyd · on July 17, 2023

yes, though you can fix that with compiler flags (or, #pragma if you want strict aliasing elsewhere in your code)

alternatively, gcc supports VLAs in unions, but I don't think clang does, but that makes it extra annoying to do.

edit: apparently you can probably apply the may_alias attribute to the type? Or you could try using transparent_union. No idea if clang supports either...

mananaysiempre · on July 17, 2023

At that point, it is IMO better to obtain that block from alloca()[1]—it’s not standard, but where it’s available the compiler will treat the result as untyped for the purposes of aliasing. (If you’re on GCC/LLVM, __builtin_alloca_with_align is also an option, although note that the memory it returns may not outlive the current block—similar to a standard automatic variable, but unlike memory from traditional alloca.) ISO C has a gigantic hole when it comes to obtaining and recycling untyped memory, pretending the hole is not there isn’t going to help.

[1] https://nullprogram.com/blog/2019/10/28/, discussed at the time at https://news.ycombinator.com/item?id=21374863

jenadine · on July 17, 2023

And alignment?

kazinator · on July 18, 2023

Just use alloca.

  // in your code, which you could write a helper macro for if you were so inclined

  opaque_foo_t *my_foo = alloca(opaque_foo_sz());
  opaque_foo_init(my_foo);

cozzyd · on July 17, 2023

yes, you may need an alignas depending on platform (though you probably want it even if unaligned access is supported).

bensecure · on July 18, 2023

char* can alias anything

cozzyd · on July 18, 2023

The strict aliasing optimization in gcc in theory can cause problems (which is why many, including the Linux kernel, disable that)

hbossy · on July 17, 2023

This is how it's supposed to be done, but you always end up moving them to a header just to make writing unit tests less painful.

10000truths · on July 17, 2023

This is a smell. Your unit tests should not have to rely on internal implementation details.

gpderetta · on July 17, 2023

And in the worst case you can have module-private headers. No need to pollute your interface.

icedchai · on July 17, 2023

Are unit tests common in C? In the mid-2000s, I worked on an "enterprise" system, written in C and C++. There were about 300,000 lines of code, maybe 10 tests. This thing was the core of a billion dollar business.

not2b · on July 17, 2023

That's very broken; more usual practice at the time was to have large numbers of tests, aiming for good code coverage. I've sometimes seen too many full-tool tests and few true unit tests (tests that just test functions for correctness), but "maybe 10 tests" is frightening.

icedchai · on July 18, 2023

This particular company was very heavy on manual testing, unfortunately. (This was roughly 2003 - 2006 ish.)

coldtea · on July 17, 2023

They don't have to and shouldn't, but it's convenient. That's the parent's point ("to make writing unit tests less painful").

c-linkage · on July 17, 2023

I like the way that Windows does it, where they have as the first element of the struct a double-word size (dwSize) element that records in 32 bits the size of the structure. The size essentially acts as a version identifier, as long as you never rearrange the fields and only append fields for new versions. The opaque functions test the value of the dwSize element to see what actions can be performed on the object.

The code that you develop can still access the member fields directly, and those accesses can be inlined and optimized aggressively by the compiler.
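A small sketch of that size-as-version scheme, using a made-up `widget_info` struct rather than any real Windows structure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical versioned struct: fields are only ever appended. */
struct widget_info {
    uint32_t dwSize;    /* caller sets this to sizeof its own struct version */
    int32_t width;      /* present since version 1 */
    int32_t height;     /* appended in version 2 */
};

/* Size that a caller compiled against version 1 would have reported. */
#define WIDGET_INFO_V1_SIZE offsetof(struct widget_info, height)

/* Returns 0 on success, -1 if dwSize is too small for any known version.
 * Version-1 callers get a default height of 1. */
int widget_get_height(const struct widget_info *info, int32_t *out)
{
    assert(info != NULL && out != NULL);

    if (info->dwSize < WIDGET_INFO_V1_SIZE)
        return -1;
    if (info->dwSize >= sizeof(struct widget_info))
        *out = info->height;    /* field exists in the caller's version */
    else
        *out = 1;               /* older caller: never read past its struct */
    return 0;
}
```

The size check is the branch the library pays for on each call; callers still read and write the fields directly.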

bluejekyll · on July 17, 2023

This implies a branch statement in every function call, doesn’t it?

veltas · on July 17, 2023

There's zero cost abstractions and then there's zero features abstractions.

speed_spread · on July 17, 2023

The cost of that branch is insignificant next to that of the syscall you opted to make.

Dwedit · on July 17, 2023

Link-time optimization means that you're probably not going to take that much of a performance hit.

But yes, opaque structs do enforce that it will be treated as a plain pointer, and the compiler (usually) cannot treat it as an aggregate of variables.

Athas · on July 17, 2023

If a linker did that, changing the layout would be an ABI-breaking change. I think this opaque struct design is most common for dynamically loaded libraries, where link time optimisation does not occur (unless dynamic linkers got a lot more fancy recently).

matheusmoreira · on July 18, 2023

The Java analogue of opaque structures is factory methods. Their only purpose is to hide the new keyword. That's necessary precisely because new introduces a hard dependency on the constructed type at the binary interface level, it gets literally emitted into the byte code. The supertypes returned by those methods are just there to serve as compatible pointers to the real types.

zik · on July 17, 2023

> is not exactly like private

That seems like a straw man to me. He never said it was exactly equivalent - he said it provided encapsulation and isolation. 'private' is another language's mechanism which does that but they're obviously not identical.

zokier · on July 17, 2023

> The code remains binary compatible even, no need for a recompile if you're shipping a shared lib. This is something just using private members doesn't achieve since with private members the compiler still knows and uses the layout of the struct, it just forbids access to it.

There is a somewhat common PIMPL idiom to work around the binary compat issue. Iirc there were some macros floating around to make it easier to manage.

maleldil · on July 17, 2023

What do the macros do? Isn't this as easy as forward declaring the impl class, adding std::unique_ptr<Impl> as a private field and have public methods refer to the field? I'm struggling to understand why macros would help here.

comex · on July 17, 2023

Perhaps the macros help you forward methods on the outer class to methods on the impl class? While your approach of having public methods refer to the field also works, it’s nice to have public and private methods in the same place (the impl class’s definition) and using the same syntax (neither having to go through the impl field).

maleldil · on July 17, 2023

> it’s nice to have public and private methods in the same place (the impl class’s definition)

In my experience, the impl class is usually defined in the main class's cpp file.

saghm · on July 17, 2023

I'm also only familiar with this idiom in C++, but based on the description in the parent comment, I suspect that this is sometimes used in C too, in which case you obviously can't use unique_ptr or private fields; maybe macros might be a way to avoid having to write a bunch of boilerplate to achieve a similar effect?

rewmie · on July 18, 2023

> That's why you can even find C++ libraries use this idiom even though C++ obviously has `private`. It's about having a stable, opaque API.

In C++ this is a popular idiom and a standard technique covered by multiple references under the name pointer to implementation (pimpl).

This is not a C thing beyond the point that C has pointers.

juunpp · on July 18, 2023

> On the other hand because of this added indirection, there's usually a greater performance hit to accessing these opaque structs since code can't be inlined.

Is this still true? Link-time optimization, which includes inlining, seems to be all the rage these days.

simias · on July 18, 2023

For static linking quite possibly, although you could still have unoptimizable sources of overhead because of having to indirect through pointers everywhere, for instance in nested structs. Something like a->b->c->d is going to be more expensive than a.b.c.d, for instance.

I'm also not sure that dynamic linkers are typically smart enough to do this type of optimization, but I admit that I'm not familiar with the state of the art in dynamic linking technology.

gjulianm · on July 17, 2023

I wouldn't call this equivalent to the private keyword:

* It does not work on a field, but on the whole struct. Either all fields are public or all are private. The latter case forces you to write getters/setters for properties you want users to access, and that in C can be even more cumbersome as you need to write the definition in the .h and the implementation in the .c.

* It breaks several actions without an explicit and specific error message. As mentioned in the article, malloc isn't possible, but neither are copies by value, sizeof() breaks... And those will break with an "incomplete type" error message, not a "this type is intentionally made private" one, which can add confusion in some situations.

* Completely incompatible with inlining code. Considering a lot of people still use C precisely for its performance, I think this can be a drawback in a lot of usecases.

I honestly think that hiding struct declarations should be done sparingly, and preferably limited to cases where it's actually necessary (for example, a library that doesn't expose internal struct fields so the same executable works with different versions of the dynamic library; or proprietary libraries that want to expose as little as possible). In the end it's still easy to bypass, and the distinction between header and code files already provides an indication of which functions you should use and which ones you shouldn't.

falcrist · on July 17, 2023

I guess I'm reading the article a different way, because what I'm getting is that the author is suggesting the C analogue of a class is an entire header/source "module".

That would make more sense, since you can use the header to craft an interface in which some components are public and some are private.

c-linkage · on July 17, 2023

Brings back memories of CS101 back in 1992.

They called this "modular programming" where "classes" were represented by opaque pointers and actions could only be performed on those pointers using the functions defined in the module.

falcrist · on July 17, 2023

In the production C code I've written that makes use of modules with interfaces that define public and private variables and functions, I tend to avoid using pointers where reasonable. I'd prefer to either give access to the variable or provide getter and setter functions.

What's the benefit of "opaque" pointers?

For background, my C code is almost entirely on microcontrollers. So I'm looking at it from that point of view. If you're talking about event-based applications running inside a full operating system, I've always stepped up to something like C#, so I don't have much experience with function pointers for that kind of work.

c-linkage · on July 17, 2023

For those accustomed to always using open source, the idea of hiding the implementation must seem odd.

But consider the position of developer who implements a shared library that is distributed in binary form only. In this case, the benefit of opaque pointers for the development of a library is that the implementation remains private at the source level. One could, of course, reverse-engineer the binary but few people would do it.

If you define your structures in a public header -- and this includes C++ classes and templates with private members -- one can easily see the implementation and, with a few casts, start munging the guts of your objects and baking in a hard requirement on a specific layout and / or version of your library.

falcrist · on July 17, 2023

Ok I see what you're saying. Closed source libraries can make use of this idea.

cozzyd · on July 17, 2023

yes, there is no benefit on microcontrollers where everything is usually compiled together as one image (ok, I imagine you COULD design a microcontroller firmware with dynamically loadable sections... but let's not think about that).

But for dynamic linking, this is how you avoid breaking ABI while maintaining forward flexibility.

gjulianm · on July 17, 2023

Yes, that's usually how it goes, each header/source is like a "class". And usually, what you have is a "main" struct that represents the "class" itself. In this case that main struct can't have both private and public fields.

falcrist · on July 17, 2023

> And usually, what you have is a "main" struct that represents the "class" itself.

Yes that tracks with the work I've done. It could even be so simple that only one value needs to be exposed through a getter (along with a couple "methods").

An ADC in an embedded system could operate like that.

lelanthran · on July 17, 2023

> Yes, that's usually how it goes, each header/source is like a "class". And usually, what you have is a "main" struct that represents the "class" itself. In this case that main struct can't have both private and public fields.

You can, in a defined way (i.e., no invoking of UB). I just didn't put that in.

gjulianm · on July 17, 2023

Out of curiosity, how would you do it? The ways I've seen require using another "public" struct and either casting public to private or using a nested pointer, each with their set of problems. In either case, it's still a bit hacky and still each struct is all-or-nothing public or private.

lelanthran · on July 17, 2023

Pretty much. All the casting and hackiness isn't visible to the caller, and the implementation still maintains its ABI when the private stuff changes.

Tozen · on July 18, 2023

This is what Niklaus Wirth was getting at, in his "push back" on conventional class-based OOP. In Pascal, they are called "units", as opposed to "modules". In his Modula/Oberon, they are called modules.

Many of the newer languages have modules or packages, but interface well with C, and thus give the benefits of what the article was referring to. That includes various newer ones that don't use class-based OOP, just as C doesn't.

ricardo81 · on July 17, 2023

That's how I read it.

Clearly you can abstract away the parts of data that the API should not see.

Joker_vD · on July 17, 2023

> It does not work on a field, but on the whole struct.

    struct PrivateFieldsOfMyShinyClass;

    struct MyShinyClass {
        int somePublicData;
        double morePublicData;
        struct PrivateFieldsOfMyShinyClass *p;
    };

> malloc isn't possible, but neither are copies by value, sizeof() breaks

Those are implementation details which are deliberately being hidden.

> Completely incompatible with inlining code.

They are as incompatible with code inlining as public/private modifiers in C++ are. That is, LTO is your best friend here. Also, have you ever tried to maintain binary compatibility with several versions of a third-party C++ library that keeps adding/removing private fields to/from its classes?

gjulianm · on July 17, 2023

The code sample still isn't equivalent. Now you have yet another pointer, which implies another allocation, another source of possible memory leaks and mistakes, and a separate memory space that will hurt the cache.

> Those are implementation details which are deliberately being hidden.

I know they're being hidden deliberately, but in C++ "private" doesn't break malloc (new), copies by value or sizeof. Or stack allocation, to add to the list.

> They are as incompatible with code inlining as public/private modifiers in C++ are. That is, LTO is your best friend here.

public/private aren't incompatible with inlining in C++. That is, you can call class functions that access private members and the compiler can inline those functions. Also, LTO is not always enabled by default, and doesn't always inline the things you want it to inline.

> Also, have you ever tried to maintain binary compatibility with several versions of a third-party C++ library that keeps adding/removing private fields to/from its classes?

I mentioned binary compatibility as one of the reasons one might want to do this. However, if you have a third party that doesn't care about API compatibility I doubt struct fields are the only thing they're going to change constantly.

skribanto · on July 17, 2023

Maybe have your public fields defined as a second struct, and then you can cast the pointer to your struct to the concrete struct that has all the public fields. This has the restriction that all public fields must be at the start, and you must make sure to maintain the same order between the two structs.

At this point though, I think I honestly would prefer setters/getters.

    struct MyClassPublic {
        int x;
        int y;
        ...
    };

    /* using it */
    struct MyClass *myclass = myclass_create();
    ((struct MyClassPublic *)myclass)->x = 5;

dfawcus · on July 17, 2023

    struct public_stuff {
        ...
    };

    struct private_stuff {
        struct public_stuff public;
        ...
    };

    struct public_stuff *make_public(/* ... */) {
        struct private_stuff *prv = malloc(sizeof *prv);
        /* ... */
        return &prv->public;
    }

Then when passed in to functions, cast the passed in struct pointer to the private one.

The public struct doesn't even have to be at the start if one makes appropriate use of offsetof and ensures valid alignment.

Nothing new under the sun...
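For the offsetof variant, a minimal sketch could look like this. The field names and the `to_private`, `get_secret` and `destroy_public` helpers are hypothetical; the pointer arithmetic is the same trick as the container_of macro in the Linux kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct public_stuff {
    int x;
};

struct private_stuff {
    int secret;                 /* private field deliberately placed first */
    struct public_stuff pub;    /* public part is not at offset 0 */
};

struct public_stuff *make_public(void)
{
    struct private_stuff *prv = malloc(sizeof *prv);
    if (prv == NULL)
        return NULL;
    prv->secret = 42;
    prv->pub.x = 0;
    return &prv->pub;
}

/* Recover the enclosing private struct from a pointer to its public member.
 * Only valid for pointers that came from make_public(). */
static struct private_stuff *to_private(struct public_stuff *p)
{
    assert(p != NULL);
    return (struct private_stuff *)((char *)p - offsetof(struct private_stuff, pub));
}

int get_secret(struct public_stuff *p)
{
    return to_private(p)->secret;
}

void destroy_public(struct public_stuff *p)
{
    if (p != NULL)
        free(to_private(p));    /* free the original allocation, not the member */
}
```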

jstimpfle · on July 17, 2023

Hiding members in this way is only possible with pointer indirection, which isn't satisfying.

However, having only a boolean private/public access state isn't generally satisfying either. It often leads to violation of the principle of separation of concerns when all the functions (methods) acting on certain "private" fields need to live in the same class.

In simple classes, like std::vector, it's possible to get away with private. But in many cases that are more complex than that, it seems to me that the best approach is still to expose the data and to be just very clear about the exact purpose of each member.

lelanthran · on July 17, 2023

> I wouldn't call this equivalent to the private keyword:

I didn't mean to imply it is, I led with:

>> All too often someone, somewhere, on some forum … will lament the lack of encapsulation and isolation in the C programming language. This happens with such regularity that I now feel compelled to address the myth once and forever.

It's only about the myth that C doesn't have any level of encapsulation or isolation.

brabel · on July 17, 2023

Nearly every rebuttal on the internet starts with misinterpretation of what's being said. Good reading skills are extremely rare, it seems.

To be clear: you tried to say "C offers encapsulation/isolation". People read "This solution is equivalent to 'private'", an almost completely unrelated statement, and then respond to that.

That could typically be classified as a "Straw man fallacy"[0], but I believe people who do this in many cases simply do not have the necessary reading skills to understand what proposition has been made, and therefore honestly believe themselves to be reasoning correctly (i.e. without fallacy).

Reading comprehension used to be a topic at school when I was a child. I suppose that's no longer the case??

[0] https://en.wikipedia.org/wiki/Straw_man

jasode · on July 17, 2023

>To be clear: you tried to say "C offers encapsulation/isolation". People read "This solution is equivalent to 'private'", an almost completely unrelated statement, and then respond to that.

>That could typically be classified as a "Straw man fallacy"[0], but I believe people who do this in many cases simply do not have the necessary reading skills

Fyi... the author's article that this thread is about has in bold heading: "Myth: C has no equivalent to “private”"

So, a reasonable interpretation of the text following that headline is how to use C Language constructs to dispel that myth.

Doesn't seem like "straw man" applies here.

gjulianm · on July 17, 2023

As the other commenter said, and I want to reiterate, what led me to start with the talk about the private keyword is the big bold header that says "Myth: C has no equivalent to “private”" and the first code snippet that shows how you can have "private" fields in C++ and the following ones that show an "equivalent" implementation in C. So I think it's reasonable to infer that the author is talking about encapsulation and "the private keyword" as somewhat interchangeable (and I don't disagree, for this discussion they're practically the same). Not only that, but the points I made are about the implementation shown in the article, which is independent of whether the talk is about "private equivalency" or "lack of encapsulation": the gist of it is "yes, you 'encapsulate' things but not in the same way and it comes with disadvantages that aren't really there in other languages".

With all that said, I don't think all that condescending talk about the lack of reading comprehension or skills, without actually going into the arguments themselves, is really necessary or positive.

gjulianm · on July 17, 2023

I mean, there's a big header that says "Myth: C has no equivalent to “private”" and a code example about the private keyword, so that's why I started saying that. Even then, my points still apply: this isn't really equivalent to how encapsulation works in other languages due to the lack of granularity and the "extras" of all the usual language behavior that stops being supported.

lelanthran · on July 18, 2023

> I mean, there's a big header that says "Myth: C has no equivalent to “private”" and a code example about the private keyword, so that's why I started saying that.

You are correct; I should change the heading to "Myth: C provides no encapsulation". I don't want to do it now, as the discussion is still ongoing and making this change now while people are commenting on the "private equivalence" aspect would be gaslighting those people.

> Even then, my points still apply: this isn't really equivalent to how encapsulation works in other languages due to the lack of granularity and the "extras" of all the usual language behavior that stops being supported.

So? The myth the article addresses is "C has no encapsulation", not "C has great encapsulation".

That C provides stronger encapsulation and stronger guarantees for upgrade migration is the specific myth that I am trying to address.

gjulianm · on July 18, 2023

I think that at this point this is a discussion about semantics, just as this other commenter pointed out: https://news.ycombinator.com/item?id=36758407

Yes, you can do something in C that somewhat resembles encapsulation. You could also do it in Python with a decorator that inspects the caller. You can also do inheritance and virtual functions in C. You could do anything in a Turing complete language. But usually "X language has Y" refers to whether the language has Y as part of the specification. In this case it isn't encapsulation as part of the specification (that's why it comes with all those downsides/extras) but as an artifact of other aspects of the language; that's why I say it's not really equivalent.

throwawayiddqd2 · on July 17, 2023

With gcc's -fplan9-extensions flag:

    struct Thing { int i; };

In an implementation c file:

    struct _Thing
    {
        struct Thing;
        int b;
    };

    struct Thing *new_thing()
    {
        struct _Thing *thing = malloc(sizeof *thing);
        thing->i = 0;
        thing->b = 1;
        return thing;
    }

dmitrygr · on July 17, 2023

  >* Completely incompatible with inlining code

You were right 10 years ago. Today we have LTO

gjulianm · on July 18, 2023

LTO isn’t always going to inline things you want it to inline, that’s the main issue I have with it.

jasode · on July 17, 2023

To the author : your explanation can be interpreted as "correct" but also be aware that -- for some readers -- your argument is a variation of the Turing Tarpit: https://en.wikipedia.org/wiki/Turing_tarpit

In other words, the 2 different possible receptions to your post:

- YES, file-level modularity with opaque structs is _equivalent_ to class private members --> for those mindsets already sympathetic to C Language

- NO, using file-scoping rules and structs is not equivalent to class private members because it's a bunch of extra ceremonial syntax to implement a workaround. (The "Turing Tarpit"). It's using the opaque struct as a "design pattern" and as Peter Norvig famously said, "Design patterns are bug reports against your programming language."

alpaca128 · on July 17, 2023

I don't think boilerplate code has much to do with Turing Tarpits. It might be annoying but not Brainfuck-grade insanity.

colonwqbang · on July 17, 2023

I would argue that C++ "pimpl" design pattern brings more "ceremonial syntax" than the C equivalent.

C++ style (without "pimpl") requires recompilation of the whole dependent tree when adding a new private member function. It's encapsulation only in a formal sense

mojosam · on July 17, 2023

I think there’s a better example, but whether it applies depends on which of two major divisions of C code you are in: code designed to run on systems with an MMU (as typically used for Linux and other large OSes), where virtual memory makes dynamic memory allocation practical, and code for systems without one, which today is primarily the very large world of embedded devices.

For the latter, the industry best practice is to avoid malloc(), except maybe at init time, and instead allocate memory statically. And in that use case, you break your code into modules, which can contain private data, public data, private functions, and public functions.

In other words, building an app out of C modules is a lot like building an app in a more modern language just using static classes, with no instantiation. And in that design pattern, which is extremely common in the embedded world, we have a direct equivalent to the “private” qualifier: “static”, which restricts the rest of the app from accessing so-marked file-scope variables and functions.
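A minimal sketch of that module-as-static-class pattern, assuming a hypothetical single-instance `counter` module (every name here is illustrative, not taken from the article). The file-scope `static` objects play the role of private members, and nothing is allocated with malloc():

```c
/* counter.c: a hypothetical single-instance module; all names are illustrative. */
#include <assert.h>
#include <stdint.h>

/* "Private" data: static gives these internal linkage, so no other
 * translation unit can name them. Storage is static, not heap. */
static uint32_t count;
static const uint32_t limit = 10;

/* "Private" function: also invisible outside this file.
 * Requires a <= limit; returns a + b, capped at limit. */
static uint32_t saturating_add(uint32_t a, uint32_t b)
{
    return (b > limit - a) ? limit : a + b;
}

/* "Public" functions: these would be declared in counter.h. */
void counter_reset(void)
{
    count = 0;
}

void counter_add(uint32_t n)
{
    assert(count <= limit);   /* module invariant */
    count = saturating_add(count, n);
}

uint32_t counter_get(void)
{
    return count;
}
```

Other files can call `counter_add()` and `counter_get()`, but a reference to `count` or `saturating_add` from another translation unit would fail at link time.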

Where this breaks down, as always with C, is when you need multiple instantiations of a module, which modern programming languages refer to as objects. The closest we can get in C is to pass the module’s public functions a struct (or some similar data structure) containing the object’s non-static data. As the author explains, there are standard ways to make that data structure opaque to calling code, but those are definitely workarounds for language shortcomings.

But the bottom line is that those language shortcomings — the lack of objects and a private qualifier for its members — are only shortcomings if you need those features, and in the embedded world, most applications don’t, they only require all the advantages offered by C. So as always, this is about picking the right language for the project, there’s no one size fits all.

commandlinefan · on July 17, 2023

When I first encountered Object Oriented Programming around '97 or so, all of the literature focused on the three pillars of OO: encapsulation, inheritance and polymorphism. I was struck at the time at how OO mostly just formalized and gave specific names to what good programmers were already doing informally.

jcelerier · on July 17, 2023

> I was struck at the time at how OO mostly just formalized and gave specific names to what good programmers were already doing informally.

I mean, that's the point of everything, isn't it? giving names and defining good practices so that everyone can benefit from them? because even today you can find a LOT of codebases which are an imperative mess

wvenable · on July 17, 2023

A lot of comments/articles complaining about Object Oriented Programming -- especially the style implemented by C++ and similar languages -- start with the premise that OOP is some academic prescription declared from on high like Moses and the 10 Commandments.

But the reality is that the best practices of imperative programming were already much like object oriented programming and that OOP is a formalization of those practices.

User23 · on July 17, 2023

Strongly agree. Good C developers pretty much instinctively do structured programming[1]. While it's superficially dated, the core concepts are still all very much applicable to the working programmer today and even largely paradigm independent.

[1] https://dl.acm.org/doi/book/10.5555/1243380 (lousy interface, but PDF download available)

pjmlp · on July 17, 2023

Indeed, I see many of the concepts as extensible modules, as one would use from Modula-2 or Object Pascal.

That is why Oberon takes the spartan approach of only having extensible types, everything else is just like in Modula-2. Later descendants adopted a more mainstream approach.

Likewise, how OOP is done in Ada or Modula-3 isn't quite like the mainstream approach.

Or when modules can be manipulated like variables, and given type signatures, we get Standard ML functors, with overlapping capabilities to OOP.

rightbyte · on July 17, 2023

The words are also confusing and too much Latin.

Encapsulation - Hidden

Inheritance - Same fields in beginning

Polymorphism - Tagged struct
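As a rough illustration of what "same fields in beginning" and "tagged struct" can mean in C, here is a minimal sketch. The `shape`, `circle` and `rect` names are hypothetical, and this covers only single inheritance and one style of polymorphism:

```c
#include <assert.h>

/* The tag identifying the concrete type ("tagged struct"). */
enum shape_kind { SHAPE_CIRCLE, SHAPE_RECT };

/* "Base class": the fields every shape shares. */
struct shape {
    enum shape_kind kind;
};

/* "Derived" types put the base first. C guarantees that a pointer to a
 * struct, suitably converted, points to its first member and vice versa. */
struct circle {
    struct shape base;
    double radius;
};

struct rect {
    struct shape base;
    double width, height;
};

/* Dispatch on the tag, then convert back to the concrete type. */
double shape_area(const struct shape *s)
{
    assert(s != 0);
    switch (s->kind) {
    case SHAPE_CIRCLE: {
        const struct circle *c = (const struct circle *)s;
        return 3.141592653589793 * c->radius * c->radius;
    }
    case SHAPE_RECT: {
        const struct rect *r = (const struct rect *)s;
        return r->width * r->height;
    }
    }
    assert(!"unknown shape kind");
    return 0.0;
}
```

A caller would pass `&some_rect.base` to `shape_area()`; the tag must match the struct the pointer really came from, which the language does not check.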

slavapestov · on July 17, 2023

“Hidden” from whom and how? “Same fields in beginning” doesn’t characterize multiple inheritance. “Tagged structs” are just one implementation strategy for one kind of polymorphism, and “tagged” and “struct” are themselves jargon. Using and knowing the precise terminology is important when communicating technical concepts.

nvy · on July 17, 2023

Jumping through all these hoops just to, what? Avoid the garbage collector?

It's okay to admit C has shortcomings.

lelanthran · on July 17, 2023

> It's okay to admit C has shortcomings.

I did that, didn't I?

>> Just to be clear, C is an old language lacking many, many, many modern features. One of the features it does not lack is encapsulation and isolation.

zer8k · on July 17, 2023

The OP article seems to be someone trying to cram modern concepts into a language that explicitly rejects them. A symptom that the OP should've probably just used another language.

lelanthran · on July 17, 2023

The first in a series of blog posts I am putting together around the C programming language (Not the book!)

bjourne · on July 17, 2023

Sure, but it still goes against the grain of the language. You could write a similar article explaining why everyone who thinks C doesn't have automatic memory management is wrong. Opaque structs are relatively annoying to work with, especially if sibling modules in the same package have good reasons to manipulate the private parts directly. It is not nearly as convenient as it is in a language with better support for encapsulation (e.g. Java). Most of the C code I write does not encapsulate anything. It's not worth the bother. Especially not when unit-testing, for which encapsulation would force you to write lots of redundant getters and setters just for the unit tests themselves. My view is that you simply shouldn't use C if you need encapsulation.

mcnichol · on July 17, 2023

I'm appreciative of the person who wrote the article.

These are my favorite types of debates and while I disagree with it for much better articulated reasons already mentioned, primarily the exploitative nature of header files used as security through obscurity, I think it stirs up a lot of debate and keeps the spaghetti meatball of knowledge around patterns/anti-patterns, "best-practices", etc. moving forward in a much more passionate way.

I think this is the truest sense of the term "hacker" which lives up to the title of the site pushing us into these debates. Putting stuff together that doesn't always work as intended or expected but arguing for and against it.

A long winded thank you to everyone, OP and all the threads responding.

ape4 · on July 17, 2023

I've seen code that prefixes private members with an underscore and adds a comment saying that it's private. Not saying that's great but it does send a message.

0xfedbee · on July 17, 2023

Great article lelanthran. It’s always refreshing to see someone going against the popular opinion here and piss everyone off. Keep it up!

variadix · on July 19, 2023

This is a much stronger form of encapsulation than public/private. You basically have to do this if you have a generic API that wraps system specific APIs to avoid polluting your code with system specific includes/types/functions/etc.

bryancoxwell · on July 18, 2023

> My Makefiles already have rules to automatically generate the interface so that the C code I write in this manner is callable from within Android applications.

I don’t know a lot about C but this interests me. Can anyone point me to where I might learn more about how this works?

oneeyedpigeon · on July 17, 2023

Great article, OP, but you forgot to populate the href on your "swig" link.

lelanthran · on July 17, 2023

Thanks, fixed.

up2isomorphism · on July 18, 2023

The funny thing is each time when you find C has some “hacky” way of doing things other programming languages can officially support, then you find that hacky is probably the only thing you actually need.

ReflectedImage · on July 17, 2023

Actually, you can just have a public version of a struct and a private version of a struct with more fields.

You can give callers the public version and then cast it to the private version for internal usage.
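A minimal sketch of how that might look, assuming hypothetical `widget` names. To keep the cast well-defined, this version embeds the public struct as the first member of the private one and always allocates the private variant inside the library:

```c
#include <assert.h>
#include <stdlib.h>

/* Public view: would live in the header. */
struct widget {
    int id;
};

/* Private view: would live in the .c file. The public struct comes first,
 * so both views start at the same address. */
struct widget_private {
    struct widget pub;
    int refcount;   /* hidden from callers */
};

struct widget *widget_create(int id)
{
    /* Always allocate the larger, private variant. */
    struct widget_private *w = malloc(sizeof *w);
    if (w == NULL)
        return NULL;
    w->pub.id = id;
    w->refcount = 1;
    return &w->pub;
}

int widget_refcount(const struct widget *pub)
{
    /* Only valid for objects that widget_create() allocated. */
    assert(pub != NULL);
    const struct widget_private *w = (const struct widget_private *)pub;
    return w->refcount;
}

void widget_destroy(struct widget *pub)
{
    /* Convert back to the type that was actually allocated. */
    free((struct widget_private *)pub);
}
```

Nothing stops a caller from declaring a `struct widget` on the stack and passing it in, at which point the cast reads past the end of the object; the scheme relies on callers only using pointers handed out by `widget_create()`.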

bluejekyll · on July 17, 2023

These will have different sizes though, so it’s only safe to cast when used off the heap and where its memory is allocated with the larger variant, right?

I guess my point is that there’s a whole bunch of caveats in regards to safety that need to be considered in your solution.

ReflectedImage · on July 25, 2023

It's not my solution, it's just something I've seen done in commercial C software.

zh3 · on July 17, 2023

That seemed a complicated way of reminding us that C++ (at least originally) is/was compiled down to pure C and thus can't do anything that C can't do.

throaway23423 · on July 17, 2023

In practice, please don't do this... it breaks inlining and adds unneeded allocations.

3cats-in-a-coat · on July 17, 2023

So let's do polymorphic virtual methods now.

ryu2k2 · on July 17, 2023

>You don’t have inheritance [..]

Modern PL features are more or less wrappers around comparatively complex C code. Object inheritance is actually one that isn't too difficult or complex to implement: https://www.youtube.com/watch?v=443UNeGrFoM&t=4275s
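For a sense of scale, here is a minimal sketch of "virtual methods" done by hand with a table of function pointers. It is not taken from the linked video; the `animal`, `dog` and `bird` names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

struct animal;

/* Table of "virtual methods", shared by every instance of a type. */
struct animal_vtable {
    const char *(*sound)(const struct animal *self);
    int (*legs)(const struct animal *self);
};

/* Every object carries a pointer to its type's table. */
struct animal {
    const struct animal_vtable *vt;
};

static const char *dog_sound(const struct animal *self) { (void)self; return "woof"; }
static int dog_legs(const struct animal *self) { (void)self; return 4; }
static const struct animal_vtable dog_vtable = { dog_sound, dog_legs };

static const char *bird_sound(const struct animal *self) { (void)self; return "tweet"; }
static int bird_legs(const struct animal *self) { (void)self; return 2; }
static const struct animal_vtable bird_vtable = { bird_sound, bird_legs };

/* "Constructors" just install the right table. */
void dog_init(struct animal *a)  { a->vt = &dog_vtable; }
void bird_init(struct animal *a) { a->vt = &bird_vtable; }

/* "Virtual calls": the caller never names the concrete type. */
const char *animal_sound(const struct animal *a)
{
    assert(a != NULL && a->vt != NULL);
    return a->vt->sound(a);
}

int animal_legs(const struct animal *a)
{
    assert(a != NULL && a->vt != NULL);
    return a->vt->legs(a);
}
```

A derived type with its own data would embed `struct animal` as its first member and convert `self` back inside its own methods.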

jll29 · on July 17, 2023

> Modern PL features are more or less wrappers around comparatively complex C code. Object inheritance is actually one that isn't too difficult or complex to implement: https://www.youtube.com/watch?v=443UNeGrFoM&t=4275s

True, and if you implement it as a preprocessor, that's exactly how C++ started in 1979 (Stroustrup's "C with classes" Cfront pre-processor: https://en.cppreference.com/w/cpp/language/history).

sfpotter · on July 17, 2023

One of the advantages C has over C++ here is that the addition of templates in C++ makes this kind of encapsulation more difficult. See the proliferation of header-only libraries where nearly everything is templatized, as well as the pimpl idiom. You still get some encapsulation in the form of private, but with "C-style" encapsulation, if the header file doesn't change and the source files implementing what's declared in that header don't change, then none of the files need to be recompiled when rebuilding. This makes recompiling much faster.

Of course, you could do much the same thing if you didn't use templates in C++, or if you were very disciplined and limited with your use of templates, but this seems to go against the grain of how I've seen C++ used.

cozzyd · on July 17, 2023

It is different, but provides much better encapsulation. Keeping ABI compatibility is much easier this way.

pjmlp · on July 17, 2023

No longer the case since C++20 modules.

sfpotter · on July 17, 2023

Thanks for the downvote...? Not sure what the point of that was.

I haven't used C++ much since modules were introduced. From doing some quick reading, it seems unclear whether they significantly improve compilation times in practice. I wasn't able to find anything which addressed how they relate to the ABI... I suspect this is a pretty complex topic. If you have any good references related to either of these points, I'd be very eager to read them.

Either way, "C++ with modules" appears to me to be unlikely to clear the bar set by "C with opaque types" (which, for all intents and purposes, can be done in C++) in terms of 1) ABI stability and 2) encapsulation. I consider point (1) to be related to point (2), since details which are leaked into the ABI are not encapsulated.

pjmlp · on July 17, 2023

First of all, who told you I downvoted you?!?

Importing the whole standard library as defined by C++23 import std; is quicker than doing a plain #include <iostream>, as shared per Microsoft employees in some talk they have done regarding upcoming C++23 support.

Unfortunately I can't remember which one it was, but someone else can gladly share the link.

Second, template details are marked as private in the module metadata, so they aren't directly exposed to consumers.

As for the ABI specifically, that is compiler dependent anyway.

adwn · on July 17, 2023

> Hell, they cannot even malloc() their own StringBuilder instance, because even the size of the StringBuilder is hidden. They have to use creation and deletion functions provided in the implementation as specified in the interface.

And you've just made it impossible for the users of your StringBuilder to pass it around by-value. Every instance has to be malloc'ed by your library, even though it's just a tiny, word-sized struct. Awesome! And each access needs to go through an additional pointer indirection. All this just to pretend that C supports proper encapsulation. Hooray!
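For readers who have not seen the idiom under discussion, a minimal sketch follows. It is not the article's code: the `sb_*` names and the growth policy are assumptions, and the header and implementation parts are shown in one listing only for brevity:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* --- would be in stringbuilder.h: callers see only this --- */
struct string_builder;                 /* incomplete type: size and layout unknown */
struct string_builder *sb_new(void);
int sb_append(struct string_builder *sb, const char *text);
const char *sb_str(const struct string_builder *sb);
void sb_free(struct string_builder *sb);

/* --- would be in stringbuilder.c: the layout is private --- */
struct string_builder {
    char *data;
    size_t len;
    size_t cap;
};

struct string_builder *sb_new(void)
{
    struct string_builder *sb = malloc(sizeof *sb);
    if (sb == NULL)
        return NULL;
    sb->cap = 16;
    sb->len = 0;
    sb->data = malloc(sb->cap);
    if (sb->data == NULL) {
        free(sb);
        return NULL;
    }
    sb->data[0] = '\0';
    return sb;
}

/* Returns 0 on success, -1 if memory could not be grown. */
int sb_append(struct string_builder *sb, const char *text)
{
    assert(sb != NULL && text != NULL);
    size_t add = strlen(text);
    if (sb->len + add + 1 > sb->cap) {
        size_t new_cap = sb->cap;
        while (new_cap < sb->len + add + 1)
            new_cap *= 2;
        char *p = realloc(sb->data, new_cap);
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = new_cap;
    }
    memcpy(sb->data + sb->len, text, add + 1);   /* copies the terminator too */
    sb->len += add;
    return 0;
}

const char *sb_str(const struct string_builder *sb)
{
    assert(sb != NULL);
    return sb->data;
}

void sb_free(struct string_builder *sb)
{
    if (sb != NULL) {
        free(sb->data);
        free(sb);
    }
}
```

Because callers only ever hold a `struct string_builder *`, both costs named above are visible here: every instance comes from `malloc()` inside `sb_new()`, and every access goes through the pointer.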

I'm sorry that I'm targeting your blog post specifically, but it's just so stereotypical of C proponents, that can't (or won't?) realize that their favorite programming language is inherently limiting and limited along several very important dimensions. It makes me think that although some of them might be excellent programmers, they make for terrible software engineers.

xbar · on July 17, 2023

The best software engineers I ever worked with were all C programmers.

Further, and quite separate, the term "software engineer" was coined on behalf of assembly programmers, whose language lacks even further features.

Finally, I am not sure what you mean when you say they make bad software engineers. Perhaps we have different definitions of software engineer.

pjmlp · on July 17, 2023

The term Software Engineer is something that in plenty of countries is validated by the engineering organization and is a legal title, not something one feels like calling themselves.

Which also validates that any university teaching software engineering has a certain quality level, and portfolio of lectures, to create a general background across all subjects of engineering practices besides writing code.

jhgb · on July 18, 2023

In my country, "engineer" is an academic degree and therefore not "validated by the engineering organization" but conferred on you by a university. Having said that, "software engineer" is NOT understood as having any connection to the academic title of "engineer", as it's a job description and you can even be a degreed software engineer simply having just a bachelor's degree in the field anyway.

Could you please give some examples of countries where "software engineer" is a protected title, together with the organizations responsible for conferring it? I think I looked it up in the past but wasn't able to find one. All I was finding was things like civil engineering certification (basically search engine garbage in the case of my search -- the quality of search engine results has really gone down the drain in the past 15 years or so, sadly).

adwn · on July 17, 2023

If someone doesn't recognize when and how their tools limit the quality of their work, they can't be good craftsmen. There might still be good reasons to use those tools (e.g., no better alternatives, or an existing ecosystem), but if you don't realize that your programming language is fundamentally limiting in ways that other languages are not, then you'll never know how to build better software.

_gabe_ · on July 17, 2023

> And you've just made it impossible for the users of your StringBuilder to pass it around by-value. Every instance has to be malloc'ed by your library, even though it's just a tiny, word-sized struct. Awesome! And each access needs to go through an additional pointer indirection.

...and?

I'm guessing the implication here is that you'll trash performance by doing this. How can you assume that? The thing about optimizing code is, you don't know where your hot paths are until you profile your code. And, the one thing experience has taught me, my intuitions about what the hot spots will be rarely match reality. There's nothing wrong with that, complex systems are complex, and we have incredible profiling tools to eat through that complexity and highlight the hot spots for us.

Now, I know you'll probably go on about a death by a thousand cuts etc. The thing is, well-crafted modules typically don't encapsulate on a fine-grained level. You usually have larger systems that hide details. These systems are usually used a fraction of the time the rest of your program is. So the indirections end up usually being a very insignificant cost to the overall program.

And if you are coding in such a way that copying a string builder by value and/or the indirection imposed by encapsulating that information is a bottleneck, I highly doubt that "fixing" this by copying by value and/or removing the indirection will suddenly make your entire program performant.

> It makes me think that although some of them might be excellent programmers, they make for terrible software engineers.

You haven't actually highlighted any issues here and then go on to finish your argument with an ad hominem. Instead of attacking the competence of C programmers, you should illustrate the actual real world impact that this design philosophy results in. I know plenty of really slow Java libraries, and plenty of really fast C libraries that use this method of encapsulation. So if your argument is that using this method trashes performance, it's a poor argument that doesn't have many real world examples (unless you know of some off the top of your head).

adwn · on July 17, 2023

> The thing about optimizing code is, you don't know where your hot paths are until you profile your code.

Absolutely! So, you profile your program, and it turns out that 95% of the runtime is caused by malloc/free in tight loops, which you can't get rid of, because they're hidden behind an API which had to choose between encapsulation and efficiency.

> And if you are coding in such a way that copying a string builder by value and/or the indirection imposed by encapsulating that information is a bottleneck [...]

You don't seem to realize that the StringBuilder was just an example to illustrate this style of encapsulation? Oftentimes you want to encapsulate actual "value structs", where it is sensible to create millions of them in an array. In C, you're forced to choose between following good software engineering practices (=> encapsulation) and getting good performance.

_gabe_ · on July 17, 2023

> So, you profile your program, and it turns out that 95% of the runtime is caused by malloc/free in tight loops, which you can't get rid of, because they're hidden behind an API which had to choose between encapsulation and efficiency.

I have literally never run into a library that was written so badly that using the library encouraged you to use the API to create millions of small objects. That's what I'm saying. Sure, this can happen, but in reality I've never seen it. Can you show me where this hypothetical scenario is occurring and trashing people's performance? We probably want to avoid using those libraries.

Instead, I usually see encapsulation used like it is in GLFW, or libcurl, or stbi. The encapsulation covers systems and not tiny objects, which encourages the user of the library to not make API calls millions of times or construct millions of tiny objects.

> You don't seem to realize that the StringBuilder was just an example to illustrate this style of encapsulation? Oftentimes you want to encapsulate actual "value structs", where it is sensible to create millions of them in an array.

I did realize this. Encapsulation is typically useful on larger systems. Once you get to the point of millions of objects, you usually have a larger system managing those millions of objects. And ideally, those millions of objects should be POD. If they're POD, encapsulating the data makes no sense at that point, because it makes more sense to encapsulate whatever is managing that data.

> In C, you're forced to choose between following good software engineering practices (=> encapsulation) and getting good performance.

This is a false dichotomy. There are plenty of large C projects that follow good software engineering practices (which is entirely subjective, what is "good"?). Look at any OS kernel, or the libraries I mentioned above.

So, once again, I'm curious if you know of any C libraries (ab)using encapsulation in the hypothetical scenario you've laid out. If there aren't any libraries that do this, then this is a non-issue and attacking the competence of C developers is entirely unwarranted since you've built up a strawman that doesn't exist in reality.

skinner927 · on July 18, 2023

What’s POD? I’m having trouble searching the term.

_gabe_ · on July 18, 2023

Plain Old Data. It's the same as POJOs in Java or POCOs in C#. It's essentially a class with no logic associated with it (so no member functions or private/public mixed data). Technically everything in C is POD. The Cpp docs have a more formal definition as well here https://en.cppreference.com/w/cpp/named_req/PODType .

deadbeeves · on July 17, 2023

What do you mean? The fact that C can do this is an example of how it's not limited. A lot of other languages instead require you to allocate everything in the heap and there's no possibility of passing things by copy, or of not accessing things through anything other than a pointer. C at least is capable of allocating and accessing at least some things directly on the stack.

adwn · on July 17, 2023

> The fact that C can do this is an example of how it's not limited.

No, C is limited because it's mutually exclusive: either encapsulation, or zero overhead by-value passing. Other languages, like C++ or Rust, allow both at the same time.

dasyatidprime · on July 17, 2023

There's a third vertex to the triangle here: C++ and Rust allow both at the same time by dropping separate compilation. More thoroughly so in Rust than in C++, but header-focused libraries move C++ further toward whole-program compilation compared to C (maybe you could call it “large-overlapping-chunks-of-program compilation”).

jll29 · on July 17, 2023

Very valid point. I keep fond memories of Modula-2, which has DEFINITION modules and IMPLEMENTATION modules, such that you can compile the former and the latter separately.

In Modula-2, I can specify an API in its DEFINITION module, and after compiling it, client applications can use such an API without the implementation being ready yet, and still I can compile the client and check if it is free of syntax errors.

rightbyte · on July 17, 2023

You can do dummy defines of structs with the same size as the real one if you want the struct on the stack and encapsulation.

e4m2 · on July 17, 2023

I've done this before, works quite well actually, but isn't very popular for some reason.

> You can do dummy defines of structs with the same size

Don't forget alignment. The general pattern is: https://godbolt.org/z/6je9Yb3rf.

sfpotter · on July 17, 2023

Maybe I'm missing the idea, but I'm not sure how this idea is supposed to work without using something like alloca.

If you have:

    struct foo;

in foo.h and:

    struct foo {
        int a;
        double x;
        ...
    };

in foo.c, you won't have sizeof(struct foo) available from bar.c, so your construction won't work in bar.c. You could define a function:

    size_t sizeof_foo(void);

which just returns sizeof(struct foo) from inside foo.c, but since this size is now only known at runtime, you'll need to resort to alloca or VLAs...

gpderetta · on July 17, 2023

You define public_foo in the header with the public members and an appropriately sized byte array for the private members.

In the .c file you define private_foo, same as public_foo except that the byte array is replaced with the actual members.

You static assert that size and alignment match and cast at function boundaries.

You hope not to have violated strict aliasing rules.

This is not completely unlike type erasure with small buffer optimization done by some c++ classes like std::function.
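A minimal sketch of that layout, with one deliberate change: instead of casting between the two struct types, it copies the hidden fields in and out of the byte array with `memcpy`, which sidesteps the strict-aliasing question at the cost of some copying. The names, the 32-byte budget and the `foo_hidden` fields are all assumptions for illustration:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <string.h>

/* Assumed size budget for the private part, with room to grow. */
#define FOO_OPAQUE_SIZE 32

/* --- header: callers can put this on the stack or in arrays --- */
struct public_foo {
    int id;                                              /* public member */
    alignas(max_align_t) unsigned char opaque[FOO_OPAQUE_SIZE];
};

/* --- .c file: the real private fields --- */
struct foo_hidden {
    double total;
    int count;
};

/* Fail the build if the private part outgrows the public placeholder. */
static_assert(sizeof(struct foo_hidden) <= FOO_OPAQUE_SIZE,
              "opaque buffer too small for foo_hidden");
static_assert(alignof(struct foo_hidden) <= alignof(struct public_foo),
              "opaque buffer under-aligned for foo_hidden");

void foo_init(struct public_foo *f, int id)
{
    struct foo_hidden h = { 0.0, 0 };
    f->id = id;
    memcpy(f->opaque, &h, sizeof h);
}

void foo_add(struct public_foo *f, double sample)
{
    struct foo_hidden h;
    memcpy(&h, f->opaque, sizeof h);     /* load private state */
    h.total += sample;
    h.count += 1;
    memcpy(f->opaque, &h, sizeof h);     /* store it back */
}

double foo_mean(const struct public_foo *f)
{
    struct foo_hidden h;
    memcpy(&h, f->opaque, sizeof h);
    return h.count ? h.total / h.count : 0.0;
}
```

Callers can now declare `struct public_foo f;` locally and copy it by value, but `FOO_OPAQUE_SIZE` becomes part of the ABI, so changing it is a breaking change.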

sfpotter · on July 17, 2023

OK, I understand what you’re doing now. Thanks for clarifying.

The big downside here is that you’re leaking the size of the details into your ABI which wouldn’t happen with a fully opaque type… I could see some uses for it but haven’t felt a strong enough need to reach for it before, although it has occurred to me.

gpderetta · on July 17, 2023

Of course there is no way around that. A partial mitigation is the same as done for network protocols: reserve some space for future extensions.

vore · on July 17, 2023

At that point you might as well just name your fields like DONTTOUCHTHIS_foo if you’re having to keep the private definition with fields in sync with the opaque public definition (and making sure the alignment and sizing are always in sync with the private one…)

deadbeeves · on July 17, 2023

Eh. Arguably in C++ you don't get proper encapsulation just with private members, because changing the layout for those members changes the ABI.

adwn · on July 17, 2023

Don't let Perfect be the enemy of Good. The quality of encapsulation you can achieve in C++ is miles ahead of that of C, even if it isn't all that could ever be.

pjmlp · on July 17, 2023

For example, user defined types that behave like built-ins, while preserving invariants.

Especially great in IoT, instead of macros directly accessing I/O ports.

c-linkage · on July 17, 2023

Passing around potentially large objects by value is wasteful and prone to move / copy semantics.

Most languages pass objects by reference (C# and Java chief among them).

Still, if you really want to pass by value -- even though you'll likely end up with pointer ownership problems -- you just add a few functions to the API to do so.

Creating an opaque type on the stack can be done, you just need a little more work.