- *"If I was lucky it would be strcpy (opposed to something like strncpy)"* it r...

david2ndaccount · on Aug 7, 2024

The correct thing to do is to use memcpy and to know the size of both the destination buffer and the source buffer. If the source buffer won’t fit, then you need to take an application-specific action (is truncation ok? do you have to abort the whole operation? Do you re-alloc the destination buffer? etc.) strncpy almost always does the wrong thing.

imron · on Aug 8, 2024

Agree with the general principle of knowing your buffer sizes, but the issue with memcpy (evidenced over many years with various CVEs) is that someone invariably takes a string length and forgets to plus one, leading to non-null-terminated strings.

kragen · on Aug 7, 2024

strncpy is definitely not safer; it produces unterminated strings when it hits n

basically you should almost never use strncpy; it's specifically for fixed-size fields like this:

    struct dirent { unsigned short inode; char name[14]; };

and in those cases more often than not the pad byte should be a space rather than a nul

strncpy should never have been added to the standard library

sidewndr46 · on Aug 7, 2024

What is the preferred solution here? I usually just use "memset" to zeroize the whole destination string, then tell "strncpy" that my destination is 1 byte shorter than what it really is.

The real issue I've ran into is that "strncpy" assumes the source is null-terminated.

pjmlp · on Aug 7, 2024

Use a proper C string library like SDS.

Or move up from the 1970's Bell Labs, adopt C++ with the respective compiler switches to have bounds checking enabled for operator[]().

Better yet, use something else instead of one of those two, pick whatver is your fancy.

Sesse__ · on Aug 7, 2024

The sanest solution is, surprisingly, snprintf(dst, sizeof(dst), "%s", src).

kragen · on Aug 8, 2024

please don't fill your program with fifty zillion string buffers of arbitrarily chosen sizes and then try to separately pass the right size in seventy zillion string-processing function calls. your code will be hard to read, buggy, and probably insecure

Sesse__ · on Aug 8, 2024

I agree with that statement, but it has nothing to do with snprintf() versus e.g. strcpy_s(), where you have exactly the same requirement to pass the right size.

(Separately, there's a discussion about how many bytes you are allowed to read from the _source_, but to fix that, you need something like the Linux kernel's strscpy(), which isn't really widely supported in userspace.)

kragen · on Aug 8, 2024

i agree

david2ndaccount · on Aug 7, 2024

Use memcpy and do the size check yourself beforehand (taking the appropriate action if it doesn’t fit). Avoid any function starting with str except for strlen. Prefer pointer+length instead of relying on nul-terminated strings.

nrclark · on Aug 7, 2024

You mean strnlen.

1over137 · on Aug 8, 2024

strlcpy() is my favourite, alas the GNU folks stubbornly refuse to embrace it, last I checked.

jabl · on Aug 8, 2024

It's in glibc as of 2.38: https://sourceware.org/git/?p=glibc.git;a=commit;h=454a20c87...

1over137 · on Aug 8, 2024

Cool, didn't know, thanks for sharing. Well, literally 25 years after OpenBSD added strlcpy, but better late than never I guess.

jandrese · on Aug 8, 2024

strlcpy is still braindamaged. The need to return the length of the source string for compatibility with old code means it suffers from the some of the same issues strncpy did.

1over137 · on Aug 8, 2024

Sure, but strlcpy is better than strcpy and strncpy (for strings). I almost never see code that uses the return value of any of them.

It is a simple refactoring to change strcpy/strncpy to strlcpy and, though it doesn't solve truncation issues, it's a solid improvement by eliminating memory overruns and lack of null termination.

It was added to OpenBSD in 1998 and then in FreeBSD, Mac OS X, Solaris, IRIX, etc. but its adoption was hampered by glibc stubbornly refusing to add it (until 2023 apparently).

jandrese · on Aug 8, 2024

It is frustrating because while it is better, it is still flawed in an easily fixed way.

What I wish the standard library had:

    ssize_t strscpy(char* dst, const char* src, ssize_t dsize);

Copies src into dst, stopping when it either reaches a \0 byte in src or on copying dsize - 1 bytes, whichever happens first. dst is then null terminated.

If the copy is not truncated the strscpy returns the number of bytes copied. If the copy is truncated dsize is returned.

Returns a negative value and sets errno if either dst or src is NULL or dsize is < 1.

kragen · on Aug 8, 2024

the strlcpy paper explains why strlcpy isn't designed the way you suggest: https://www.usenix.org/legacy/event/usenix99/full_papers/mil...

they actually started out with your design and then fixed it:

> The return values started out as the number of characters copied, since this was trivial to get as a side effect of the copy or concatenation. We soon decided that a return value with the same semantics as snprintf()’s was a better choice since it gives the programmer the most flexibility with respect to truncation detection and recovery.

basically they wanted to treat string truncation due to insufficient space as an error condition, so they designed the interface to make it easy to check (code from the paper, with syntax corrections):

    len = strlcpy(path, homedir, sizeof(path));
    if (len >= sizeof(path)) return (ENAMETOOLONG);
    len = strlcat(path, "/", sizeof(path));
    if (len >= sizeof(path)) return (ENAMETOOLONG);

your proposal does permit such simple error checking for strscpy, although it is marginally less efficient:

    len = strscpy(path, homedir, sizeof(path));
    if (len <= strlen(homedir)) return (ENAMETOOLONG);

but i can't think of anything your corresponding strscat could return to permit a similarly simple check. is there anything?

jandrese · on Aug 8, 2024

Basically, it seems like they wanted to facilitate the use case of "if the string is truncated, then realloc() the buffer to make it fit using the value convienently returned from strlcpy()".

(note: the following code is probably buggy with off by one errors and doesn't check return value properly)

   copied_bytes = strlcpy(dst, src, size);
   if ( copied_bytes >= size )
   {
       realloc(dst, copied_bytes);
       size = strlcat(dst, src + size, copied_bytes);
   }

I can appreciate that sentiment, but I think it was a mistake. That behavior is still possible where it makes sense by using strlen() first, but I'd argue that in most cases if this is possible then strdup() was the better solution all along. Basically they made a tradeoff that makes one relatively uncommon use case easier at the expense of making the function explode in other cases.

You don't need strlen() to check for truncation:

   if ( strscpy(path, homedir, sizeof(path) == sizeof(path) ) return (ENAMETOOLONG);

strscat() would have a similar syntax. Return values would be the same, including returning the size parameter in the case of truncation, making it easy to check.

   if ( strscat(path, "/", sizeof(path) == sizeof(path) ) return (ENAMETOOLONG);

On a side note I always cringe when I see people using sizeof() on strings in C. I left them in here to make the comparison easier, but I wouldn't normally do it this way. That's a gun pointed directly at your foot when this bit of code gets refactored out to a function and that string degrades to a pointer.

kragen · on Aug 8, 2024

i think they wanted to facilitate the use case of 'if the string is truncated, then close the connection and log an error message', or 'if the string is truncated, then return an error code', as in the example code i quoted from the paper

strdup() is not helpful in examples like the one i quoted from the paper, where you are building up a string by concatenating substrings, but something like stralloc is. (see other subthread) the paper recommends the libmib astring functions, which are something like stralloc: http://www.mibsoftware.com/libmib/astring/. they definitely were not recommending that people copy and paste those six lines of code with slight changes every time they wanted to copy a string

i don't agree that it makes the function explode in other use cases. if you're okay with truncation then strlcpy() will silently truncate your strings if you don't check its return value

your strscpy() example has a parse error; i think you meant

   if ( strscpy(path, homedir, sizeof(path)) == sizeof(path) ) return (ENAMETOOLONG);

which leads me to think that you mean that if strlen(homedir) is 12 and sizeof(path) is 13, strscpy copies 12 characters (not counting the nul) and returns 12, not 13, while if strlen(homedir) is 13 in that case, it also copies 12 characters, but returns 13. i agree that that would work; it is so similar to the flawed design rejected in the strlcpy paper that i thought you meant the same thing, but you evidently meant something subtly different. i agree that that design would also work for strscat

at that point, though, it might be better to return -1 or INT_MAX rather than dsize on truncation; you can't use the return value you've specified for anything before you check whether it's equal to dsize or not. (this is also true of strlcpy!) actually you also specified to return a negative value on certain other errors, which means you have to check the return value twice before using it for anything; possibly this was a mistake

i also agree that using sizeof on arrays is a footgun for exactly the reason you say, although in this case the most likely result would be that you'd notice the bug and fix it, since pointers are too short for most strings

kevin_thibedeau · on Aug 8, 2024

As discussed a few weeks ago, strlen() + memcpy() is faster than strlcpy() on superscalar platforms with branch prediction. Iterating over the string twice is not a penalty if the alternative hobbles the hardware with more complex code.

https://nrk.neocities.org/articles/cpu-vs-common-sense

kragen · on Aug 8, 2024

agreed

saagarjha · on Aug 7, 2024

memccpy, then use the return value to terminate.

connicpu · on Aug 7, 2024

C11 adds `strcpy_s` which takes (dest, destsz, src) and returns an errno_t which will report an error if the src string is longer than destsz, as silent truncation is often not a desirable behavior. It also assigns dest[0]='\0' on error so you don't get an unterminated garbage string.

david2ndaccount · on Aug 7, 2024

Only msvc provides strcpy_s and they don’t conform to the standard. Other libcs don’t provide it. Ignore everything from Annex K and write your own wrappers around memcpy. You should always know the size of your buffers.

connicpu · on Aug 7, 2024

Ah that sucks. Guess C is just stuck like this for the long term. Writing your own functions is still the best advice :'(

kragen · on Aug 8, 2024

on the plus side, c is good at writing your own functions

paulryanrogers · on Aug 7, 2024

strlcpy?

kragen · on Aug 8, 2024

for general-purpose string handling in software where failure is an option, i like the qmail stralloc approach

    if (!addrparse(arg)) { err_syntax(); return; }
    flagbarf = bmfcheck();
    seenmail = 1;
    if (!stralloc_copys(&rcptto,"")) die_nomem();
    if (!stralloc_copys(&mailfrom,addr.s)) die_nomem();
    if (!stralloc_0(&mailfrom)) die_nomem();
    out("250 ok\r\n");

basically you have a struct with buffer-pointer, length, and capacity fields, like a golang slice, and you modify it with a small number of functions which reallocate the buffer if it isn't big enough. the ones you see here are stralloc_copys, which sets the buffer contents to the contents of a nul-terminated string, and stralloc_0, which appends a nul to the buffer. there are also functions for appending an arbitrary byte, for copying one stralloc to another, for copying counted strings into strallocs, and for concatenation, for determining whether one is a prefix of another, etc., but depending on the application, you may or may not need to implement these

the whole stralloc library is 97 lines of k&r c, so reimplementing the part you need for a given program is pretty trivial. it's in the public domain

for most programs, a disadvantage of the particular way that stralloc is implemented in qmail is that you have to check every single copy or concatenation operation for an out-of-memory error, as you see above. this makes your code a lot longer. many applications are better off just aborting inside the memory allocation function if they run out of memory; getting out-of-memory handling correct is very difficult, especially if you don't devote a massive amount of effort to testing out-of-memory conditions (because they won't occur often enough just by chance to test your error-handling path)

(another disadvantage of the way stralloc is implemented is that you probably don't really want to use unsigned int for the two length fields on lp64 platforms)

for some applications you might prefer just using strdup() (or xstrdup()) or non-owning string-view types (a pointer and a length, perhaps into an input file you've mapped into memory), or lisp-style symbol interning (plus some kind of buffer management probably). arena allocation, if you can afford it, makes dynamic memory allocation for strings a much more reasonable thing to do: no risk of a memory leak, fast allocation, instant deallocation. but again some applications do poorly with arena allocation

but please don't fill your program with fifty zillion string buffers of arbitrarily chosen sizes and then try to separately pass the right size in seventy zillion string-processing function calls. your code will be hard to read, buggy, and probably insecure. factor string buffer length handling into a small part of your program so that most of your code never has to think about string buffer lengths

lelanthran · on Aug 8, 2024

> for most programs, a disadvantage of the particular way that stralloc is implemented in qmail is that you have to check every single copy or concatenation operation for an out-of-memory error, as you see above. this makes your code a lot longer. many applications are better off just aborting inside the memory allocation function if they run out of memory; getting out-of-memory handling correct is very difficult, especially if you don't devote a massive amount of effort to testing out-of-memory conditions (because they won't occur often enough just by chance to test your error-handling path)

You could do that. Or you could put a field in the struct that stores an error flag. If flag is set, all `stralloc` functions return immediately. When they fail, they set the flag and then return.

This lets you do:

    stralloc_copys(&rcptto,""));
    stralloc_copys(&mailfrom,addr.s));
    stralloc_0(&mailfrom));

    if (stralloc_error(rcptto) || stralloc_error(mailfrom)) {
        die_nomem();
    }

I'd go one further and make the error checker function take variable arguments, so that the last line looks like this:

    if (stralloc_error (rcptto, mailfrom, NULL)) {
        die_nomem();
    }

IME, forgetting to terminate the parameter list with a NULL almost always causes the program to blow up on the very first execution.

> but please don't fill your program with fifty zillion string buffers of arbitrarily chosen sizes and then try to separately pass the right size in seventy zillion string-processing function calls. your code will be hard to read, buggy, and probably insecure. factor string buffer length handling into a small part of your program so that most of your code never has to think about string buffer lengths

I agree, but after years and years of looking at and writing idiomatic safe C code, I am now of the opinion that a string library is, while a better approach to slinging around raw strings, still very much the wrong approach.

Nothing stops the developer from doing Parse, Don't Validate! in C, and this means that seeing C strings being used anywhere other than at the boundaries to the system evokes my code-smell senses.

kragen · on Aug 8, 2024

these are very good ideas; thank you! by coincidence yesterday i was looking at some code i wrote in golang six years ago which uses this same approach to error handling for i/o errors. i wonder if you might be better off putting the error flag in the allocator rather than the individual string objects?

i do think parse, don't validate is much more difficult in c; c's type system is not strong enough to give you the kinds of soundness guarantees you get from ocaml or haskell. if you forget a type case in a switch, there's not a whole lot you can do to get the compiler to complain about it

the code i quoted above is from qmail-smtpd.c, which is 373 lines of code like the above and contains all of qmail's smtp input logic except for ip_scanbracket, which parses strings like [127.0.0.1] and is shared with dns.c. it's not clear to me that a parse, don't validate approach would consist of much more than just the parser

maybe using a parser generator for all your input and output handling would help? i'm still skeptical that something like a text editor or a macro processor is going to have a large body of code that is free of string handling

lelanthran · on Aug 8, 2024

> i do think parse, don't validate is much more difficult in c; c's type system is not strong enough to give you the kinds of soundness guarantees you get from ocaml or haskell. if you forget a type case in a switch, there's not a whole lot you can do to get the compiler to complain about it

Sure, I agree, but we're talking about strings here. Instead of a function taking or returning an email address in a generic string type, it can take or return an email address type.

For example, construction of a value of type `email_t` can take a parameter of raw string. Then any function in the rest of the code that receives an email would receive an `email_t`, not a `char ` or some other generic string type.

> it's not clear to me that a parse, don't validate approach would consist of much more than just the parser

It might often be nothing but* a parser, such as `email_t`, but it means that no `str()` function would then be used by the caller - any operation on the `email_t`, if `email_t` is an opaque pointer, would used the `email_type_()` functions, because the users of any `email_t` value cannot access, or even see, the fields inside an `email_t` value.

This means that passing an email to a function expecting a name would cause a compiler error.

For the IP address example you mention, that definitely should be parsed only once into the quad-byte or quad-quad integer fields.

I mean, I'm looking over my previous projects: every single instance of a string I am using is actually not just "generic string"; there's an associated type with it (name, description, comment, whatever). Making those into different types with their own operations means that the compiler will generate errors if I try to use a `description_t` where a `name_t` is expected.

kragen · on Aug 8, 2024

possibly you need to \ your *s

indeed it is the case in qmail that ip_scanbracket populates a struct ip_address. but rcptto, where the destination email address goes, is just a byte buffer in a very simple ad-hoc format which, if i understand correctly, gets written to a pipe; qmail's privilege separation design, which its author to a significant extent came to regret, adds some extra difficulties here by requiring things to run in separate processes

what you read from or write to a pipe is, at that point, necessarily just a generic string. you could write some kind of generic serialization layer, but doing that in c requires a preprocessor, and unmarshaling things in a statically type-safe way really requires compiler support for sum types, which c doesn't have

aside from that, i think it's pretty likely that trying to parse the email address in the qmail-smtpd process would have made the code more bug-prone rather than less so

lelanthran · on Aug 8, 2024

> is just a byte buffer in a very simple ad-hoc format which, if i understand correctly, gets written to a pipe; qmail's privilege separation design, which its author to a significant extent came to regret, adds some extra difficulties here by requiring things to run in separate processes

> ...

> what you read from or write to a pipe is, at that point, necessarily just a generic string.

Aren't the reader and writer of the pipe part of the same software package?

If they are, then safe[1] functions for that type make sense:

    bool email_to_bytes(email_t *src, uint8_t **dst, size_t *len);
    bool email_from_bytes(email_t *dst, uint8_t *src, size_t len);

This still means that you're only ever passing around `email_t` values, not `char *` values.

On the other hand, if the reader and writer of the pipe are in different packages, then the pipe is the boundary for each of them, and you wouldn't be passing language native types without first serialising to a language independent representation anyway.

[1] By "safe" I mean that they don't overflow and that the actual binary format allows the `from_bytes` function to determine when the input could be malicious.

kragen · on Aug 8, 2024

yes, they are part of the same software package. i guess qmail-smtpd does have to parse the email address somewhat in order to match the domain against rcpthosts so it can reject attempts to relay mail? and yeah, that's what the addrparse() call does in the code i quoted—but it just stores the email in addr as a canonicalized string. so it may end up rewriting things like lelanthran@[10.1.2.3] as lelanthran@lelanthran.com, for example, and also has code to strip out explicit smtp source routes (@foo.com:lelanthran@lelanthran.com) which were still in theory required when qmail came out

so when i implied that 'trying to parse the email address in the qmail-smtpd process' was not a thing that was already being done, i was wrong, so plausibly your recommendation is in fact applicable; maybe it would have been better to parse the email address into a struct with user and host fields, then have email_to_bytes represent the email <kragen@gentle.dyn.ml.org> as T6:kragen,17:gentle.dyn.ml.org, instead of as Tkragen@gentle.dyn.ml.org\0. i mean you could generate email_to_bytes with a code generator (an idl compiler) instead of writing it

then you wouldn't have to worry about the possibility that you'd accidentally left an @ in one part or the other—unless you did relay the mail over smtp to some other host, in which case you would have to worry about it anyway

cobbal · on Aug 8, 2024

strncpy is fine as long as it's not used in isolation. My preferred pattern (when I want the truncation) is to use it, and then unconditionally set the last byte of the buffer to null. This will always result in a valid C string.

StressedDev · on Aug 8, 2024

If you are doing Windows C/C++ development, you can use the strsafe.h functions (https://learn.microsoft.com/en-us/windows/win32/api/strsafe/). When I wrote C/C++, I found them easier to use than the standard C functions because they handled all of the usual failure cases (buffer too small, integer overflow, etc.). It was also easy to check if there was a failure because all of the functions returned a failure code if something went wrong.

In this case, StringCchCopyW(), or StringCbCopyW() would be a better choice than strcpy.

thekevan · on Aug 7, 2024

I read that he used to drive around and when he saw a skateboarder, he'd yell "do an ollie" and then give them a new helmet.

dfex · on Aug 8, 2024

"Do a kick-flip"

lloeki · on Aug 8, 2024

That's on Eric Koston though although Tony Hawk did participate.

https://m.youtube.com/watch?v=ob0dI05Xz8s

Koston did not invent the thing but has been a major popularity contributor to it.

https://www.surfertoday.com/skateboarding/why-do-skateboarde...