C11: A New C Standard Aiming at Safer Programming

Thirteen years after the ratification of the C99 standard, a new C standard is now available. Danny Kalev, a former member of the C++ standards committee, shares an overview of the goodies that C11 has to offer including multithreading support, safer standard libraries, and better compliance with other industry standards.

C11 is the informal name for ISO/IEC 9899:2011, the current standard for the C language that was ratified by ISO in December 2011. C11 standardizes many features that have already been available in common contemporary implementations, and defines a memory model that better suits multithreading. Put differently, C11 is a better C.

See also: The Biggest Changes in C++11 (and Why You Should Care)

Problems with the C99 Standard

C99, the previous C standard, brought about many new features including:

  • Variable length arrays
  • Designated initializers
  • Type-generic math library
  • New datatypes: long long, _Complex, _Bool
  • restrict pointers
  • Intermingled declarations of variables
  • Inline functions
  • One-line comments that begin with //

Alas, it hasn’t been a huge success. Finding C99-compliant implementations is a challenge even today.

Where did C99 go awry? Some of its mandatory features proved difficult to implement on some platforms. Other C99 features were considered questionable or experimental, to such an extent that certain vendors even advised C programmers to replace C with C++.

Politics also played a role in the lukewarm reception of C99. It’s no secret that the cooperation between the C and C++ standards committees in the late 1990s was lacking, to say the least. The good news is that today, the cooperation between the two committees is much better, and that the design mistakes of C99 were avoided in C11.

A New Standard, a New Hope?

C’s security has always been a matter of concern. Insecure features – such as string manipulation functions that don’t check bounds and file I/O functions that don’t validate their arguments – have been a fertile source of malicious code attacks.

C11 tackles these issues with a new set of safer standard functions that aim to replace the traditional unsafe functions (although the latter are still available in C11). Additionally, C11 includes Unicode support, compliance with IEC 60559 floating-point arithmetic and IEC 60559 complex arithmetic, memory alignment facilities, anonymous structs and unions, the _Noreturn function specifier, and most importantly – multithreading support. Yes, I said the m-word!

Let’s look at some of these features and others more closely.

Multithreading

For the typical C programmer, the biggest change in C11 is its standardized multithreading support. C of course has supported multithreading for decades. However, all of the popular C threading libraries have thus far been non-standard extensions, and hence non-portable.

The new C11 header file <threads.h> declares functions for creating and managing threads, mutexes, and condition variables. Another new header file, <stdatomic.h>, declares facilities for uninterruptible object access; these work together with the new _Atomic type qualifier, which is a language keyword rather than a library declaration. Finally, C11 introduces a new storage class specifier, _Thread_local (the C equivalent of C++11’s thread_local). A variable declared _Thread_local isn’t shared by multiple threads. Rather, every thread gets its own copy.
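As a rough illustration of how these pieces fit together, the sketch below starts a few threads that each bump a thread-local counter and a shared atomic counter. The names worker, run_counter_demo, NUM_THREADS and INCREMENTS are made up for this example, and it assumes a C library that actually ships <threads.h> (the header is optional in C11).

```c
#include <stdatomic.h>
#include <stddef.h>
#include <threads.h>

enum { NUM_THREADS = 4, INCREMENTS = 1000 };

static atomic_int shared_total;         // one object shared by all threads
static _Thread_local int local_count;   // every thread gets its own copy

// Thread entry point (hypothetical name): counts privately and globally.
static int worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; ++i) {
        ++local_count;                       // private, no synchronization needed
        atomic_fetch_add(&shared_total, 1);  // uninterruptible shared update
    }
    return local_count;  // each thread reports its own tally
}

// Returns the shared total, or -1 if a thread could not be started or joined.
// A real program would also join already-started threads on failure.
int run_counter_demo(void)
{
    thrd_t threads[NUM_THREADS];

    atomic_store(&shared_total, 0);
    for (int i = 0; i < NUM_THREADS; ++i)
        if (thrd_create(&threads[i], worker, NULL) != thrd_success)
            return -1;

    for (int i = 0; i < NUM_THREADS; ++i) {
        int result = 0;
        if (thrd_join(threads[i], &result) != thrd_success)
            return -1;
        if (result != INCREMENTS)  // the thread-local tally must be private
            return -1;
    }
    return atomic_load(&shared_total);
}
```

Each thread returns exactly INCREMENTS because local_count is never shared, while shared_total ends up at NUM_THREADS * INCREMENTS because every increment is atomic.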

As an anecdote, if you’re looking for someone to blame for the unwieldy keyword _Thread_local, blame Yours Truly. In the early 2000s, when the C++ standards committee began working on multithreading support, the original proposal for thread-local storage used the keyword __thread which I considered dangerous and opaque as it didn’t clearly express the intent of the keyword (after all, __thread didn’t create threads!), and might have conflicted with legacy code that happened to use __thread for user-declared identifiers. My proposal to change __thread to thread_local was accepted. thread_local has since percolated into other programming languages, including C11. Donations and hate mail alike are welcome!

Another C11 thread-related feature is the quick_exit() function that lets you terminate a program when exit() won’t work, e.g., when cooperative cancellation of threads is impossible. The quick_exit() function ensures that functions registered with at_quick_exit() are called in the reverse order of their registration. After that, quick_exit() calls _Exit(), which, unlike exit(), isn’t required to flush the process’s file buffers.
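The registration order can be sketched as follows. The handler names and the two wrapper functions are invented for this example; only at_quick_exit() and quick_exit() come from <stdlib.h>.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical cleanup handlers. They write to stderr, which is unbuffered,
// because quick_exit() is not required to flush buffered streams.
static void release_lock_file(void) { fputs("cleanup: releasing lock\n", stderr); }
static void log_shutdown(void)      { fputs("cleanup: logging shutdown\n", stderr); }

// Registers both handlers; returns 0 if both registrations succeeded.
int install_quick_exit_handlers(void)
{
    // Handlers run in reverse order of registration:
    // log_shutdown() first, then release_lock_file().
    if (at_quick_exit(release_lock_file) != 0)
        return -1;
    if (at_quick_exit(log_shutdown) != 0)
        return -1;
    return 0;
}

// Ends the process immediately: runs the at_quick_exit() handlers,
// skips the atexit() handlers, then calls _Exit(status).
_Noreturn void shutdown_now(int status)
{
    quick_exit(status);
}
```

A program would typically call install_quick_exit_handlers() once at startup and reserve shutdown_now() for situations where an orderly exit() is not possible.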

Anonymous structs and unions

An anonymous struct or union is one that has neither a tag name nor a typedef name. It’s useful for nesting aggregates, e.g., a union member of a struct. The following C11 code declares a struct with an anonymous union and accesses the union’s data member directly:

struct T //C++, C11
{
  int m;
  union //anonymous
  {
    char * index;
    int key;
  };
};

int main(void)
{
  struct T t;
  t.key=1300; //access the union's member directly
  return 0;
}

Type-Generic Functions

C11 doesn’t have templates yet but it does have a macro-based method of defining type-generic functions. The new keyword _Generic declares a generic expression that translates into type-dependent “specializations.”

In the following example, the generic cube root calculation macro cbrt(X) evaluates to the specializations cbrtl(long double), cbrtf(float) and the default cbrt(double), depending on the actual type of the parameter X:

//C11 only
#include <math.h>
#define cbrt(X) _Generic((X), long double: cbrtl, \
                              default: cbrt, \
                              float: cbrtf)(X)

How does it work? The compiler looks at the type of the argument X (the expression itself isn’t evaluated for this purpose) and selects the matching variant of cbrt(): cbrtl() if X is long double, cbrtf() for float, and cbrt() otherwise. The selected function is then called with X.
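The same dispatch mechanism works for any family of functions. The following sketch uses a made-up macro named absolute that picks among the standard abs(), labs() and llabs() functions; it is only an illustration of the technique, not part of the C11 library.

```c
#include <stdlib.h>

// Hypothetical type-generic wrapper over the standard integer abs family.
// There is no default branch, so passing any other type is a compile-time error.
#define absolute(X) _Generic((X), \
        int:       abs,           \
        long:      labs,          \
        long long: llabs)(X)
```

For instance, absolute(-5L) expands to a call to labs(-5L), so the result has type long, whereas absolute(-5) calls abs() and yields an int.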

Memory Alignment Control

Taking after C++11, C11 introduces facilities for probing and enforcing the memory alignment of variables and types. The _Alignas keyword specifies the requested alignment for a type or an object. The _Alignof operator reports the alignment of its operand. Finally, the aligned_alloc() function

void *aligned_alloc(size_t algn, size_t size);

allocates size bytes of memory with alignment algn and returns a pointer to the allocated memory.

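The new header file <stdalign.h> defines the convenience macros alignas and alignof, which expand to the keywords _Alignas and _Alignof; aligned_alloc() itself is declared in <stdlib.h>.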
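A short sketch of the three facilities working together follows. The struct packet type and the helper functions are invented for this example, and the 64-byte figure is only an assumed cache-line size.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical type whose payload must start on a 16-byte boundary.
struct packet {
    alignas(16) unsigned char payload[32];
    int id;
};

// Returns nonzero if p is a multiple of alignment.
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

// Allocates `lines` blocks of 64 bytes, aligned to 64 bytes (an assumed
// cache-line size). C11 requires the size to be a multiple of the alignment.
// The caller releases the memory with free().
void *alloc_cache_lines(size_t lines)
{
    return aligned_alloc(64, lines * 64);
}
```

Because of the alignas(16) member, alignof(struct packet) is 16, and a pointer returned by alloc_cache_lines() can be passed straight to free().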

The _Noreturn Function Specifier

_Noreturn declares a function that does not return. This new function specifier has two purposes: suppressing compiler warnings on a function that doesn’t return, and enabling certain optimizations that are allowed only on functions that don’t return.

_Noreturn void func (); //C11, func never returns

Unicode Support

The Unicode standard defines three encoding formats: UTF-8, UTF-16, and UTF-32. Each has advantages and disadvantages. Currently, programmers use char to encode UTF-8, unsigned short or wchar_t for UTF-16, and unsigned long or wchar_t for UTF-32. C11 eliminates these hacks by introducing two new datatypes with platform-independent widths: char16_t and char32_t for UTF-16 and UTF-32, respectively (UTF-8 encoding uses char, as before). C11 also provides u and U prefixes for Unicode strings, and the u8 prefix for UTF-8 encoded literals. Finally, Unicode conversion functions are declared in <uchar.h>.
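To make the difference between the three encodings concrete, here is a small sketch that stores text in each form and counts code units. The array names and the CODE_UNITS macro are made up for this example, and it assumes a compiler that encodes char16_t and char32_t literals as UTF-16 and UTF-32 (i.e., one that defines __STDC_UTF_16__ and __STDC_UTF_32__).

```c
#include <stddef.h>
#include <uchar.h>

// U+00E9 (e with acute accent) needs two bytes in UTF-8.
static const char utf8_text[] = u8"\u00e9";

// U+1F600 lies outside the Basic Multilingual Plane: UTF-16 needs a
// surrogate pair (two code units), UTF-32 needs a single code unit.
static const char16_t utf16_text[] = u"\U0001F600";
static const char32_t utf32_text[] = U"\U0001F600";

// Number of code units in a literal-initialized array, excluding the terminator.
#define CODE_UNITS(arr) (sizeof(arr) / sizeof((arr)[0]) - 1)
```

With these definitions, CODE_UNITS(utf8_text) and CODE_UNITS(utf16_text) are both 2, while CODE_UNITS(utf32_text) is 1.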

Static Assertions

Unlike the #if and #error preprocessor directives, static assertions are evaluated at a later translation phase, when the type of the expression is known. Therefore, static assertions let you catch errors that are impossible to detect during the preprocessing phase.
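In C11 a static assertion is written with the _Static_assert keyword, or with the static_assert macro from <assert.h>. The sketch below checks a property that the preprocessor cannot see, namely the size of a struct; the wire_header type is invented for this example.

```c
#include <assert.h>   // provides the static_assert convenience macro
#include <limits.h>
#include <stdint.h>

// Hypothetical on-the-wire header that must occupy exactly 4 bytes.
struct wire_header {
    uint8_t  version;
    uint8_t  flags;
    uint16_t length;
};

// #if could test CHAR_BIT, but not sizeof: the preprocessor knows nothing about types.
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(sizeof(struct wire_header) == 4,
              "wire_header must be exactly 4 bytes");
```

If either condition is false, compilation stops with the given message, so the mistake never reaches a running program.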

Bounds-Checking Functions

Technical Report 24731-1, which C11 incorporates as the optional Annex K, defines bounds-checking versions of standard C library string manipulation functions. The bounds-checking versions have the _s suffix appended to the original function names.

For example, the bounds-checking versions of strcat() and strncpy() are strcat_s() and strncpy_s(), respectively. Most of the bounds-checking functions take an additional parameter indicating the size of the buffer they process. Many of them also perform additional runtime checks to detect various runtime exceptions.

Let’s look at two famous string manipulation functions:

//C11, safe version of strcat
errno_t strcat_s(char * restrict s1, 
                 rsize_t s1max, 
                 const char * restrict s2);

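strcat_s() appends s2 to the string already in s1, where s1max is the total size of the destination buffer. If the space remaining in s1 cannot hold s2 and its terminating null character, the call fails instead of writing past the end of the buffer. The second function, strcpy_s(), requires that s1max be bigger than the length of s2 (more precisely, s1max must be greater than strnlen_s(s2, s1max)) in order to prevent an out-of-bounds write: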

//C11, safe version of strcpy
errno_t strcpy_s(char * restrict s1, 
                rsize_t s1max, 
                const char * restrict s2);

Originally, all of the bounds-checking libraries were developed by Microsoft’s Visual C++ team. The C11 implementation is similar but not identical.
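Since Annex K is optional, portable code has to check whether the implementation provides it. The following sketch wraps strcpy_s() in a made-up helper, copy_string_checked(), and falls back to a hand-written length check when the bounds-checking interfaces are missing. The fallback is an assumption of this example, not something the standard prescribes.

```c
#define __STDC_WANT_LIB_EXT1__ 1   // request the Annex K declarations, if any
#include <string.h>

// Copies src into dest (capacity dest_size bytes, including the terminator).
// Returns 0 on success and -1 if the copy would not fit or an argument is invalid.
int copy_string_checked(char *dest, size_t dest_size, const char *src)
{
#ifdef __STDC_LIB_EXT1__
    // Note: on a runtime-constraint violation the installed constraint
    // handler runs first; its default behavior is implementation-defined.
    return strcpy_s(dest, dest_size, src) == 0 ? 0 : -1;
#else
    // Fallback for libraries without Annex K: same check, done by hand.
    if (dest == NULL || dest_size == 0)
        return -1;
    if (src == NULL || strlen(src) >= dest_size) {
        dest[0] = '\0';            // leave an empty string, as strcpy_s does
        return -1;
    }
    memcpy(dest, src, strlen(src) + 1);
    return 0;
#endif
}
```

Either way, the caller must pass the true capacity of dest; the check is only as good as the size it is given.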

gets() Removed

gets() (declared in <stdio.h>) reads a line from the standard input and stores it in a buffer provided by the caller. gets() doesn’t know the actual size of its buffer. Malicious software tools and crackers have often exploited this security loophole for generating buffer overflow attacks. Consequently, gets() was deprecated in C99. C11 removed it entirely, replacing it with a safer version called gets_s():

char *gets_s(char *buffer, rsize_t nch);

gets_s() reads at most nch - 1 characters from the standard input, leaving room for the terminating null character.

New fopen() Interface

fopen(), a widely used file I/O function, gets a facelift in C11. It now supports a new exclusive create-and-open mode (“...x”). The new mode behaves like O_CREAT|O_EXCL in POSIX and is commonly used for lock files. The “...x” family of modes includes the following options:

  • wx: create text file for writing with exclusive access.
  • wbx: create binary file for writing with exclusive access.
  • w+x: create text file for update with exclusive access.
  • w+bx or wb+x: create binary file for update with exclusive access.

Opening a file with any of the exclusive modes above fails if the file already exists or cannot be created. Otherwise, the file is created with exclusive (non-shared) access. Additionally, a safer version of fopen() called fopen_s() is also available.
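The lock-file use case can be sketched like this. The function name try_create_lock and the text written into the file are invented for this example; the only C11 feature involved is the "wx" mode string.

```c
#include <stdio.h>

// Tries to create a lock file at path.
// Returns 1 if this call created the file, 0 if the file already
// exists or cannot be created.
int try_create_lock(const char *path)
{
    FILE *fp = fopen(path, "wx");   // fails if the file already exists
    if (fp == NULL)
        return 0;
    fputs("locked\n", fp);
    fclose(fp);
    return 1;
}
```

Calling try_create_lock() twice with the same path succeeds the first time and fails the second, until the file is deleted with remove().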

In Conclusion

C11 attempts to fix what was broken in C99. It makes some of the mandatory features of C99 (variable length arrays, complex types and more) optional, and introduces new features that were already available in various implementations. No less important, the C11 designers worked closely with the C++ standards committee to keep the two languages as compatible as possible. Chances are good that, unlike its predecessor, C11 will receive a warm reception. As a bonus, software written in C11 will be more robust against security loopholes and malware attacks.

Danny Kalev is a certified system analyst by the Israeli Chamber of System Analysts and software engineer specializing in C++. Kalev has written several C++ textbooks and contributes C++ content regularly on various software developers’ sites. He was a member of the C++ standards committee and has a Master’s degree in general linguistics.







Comments

  1. Aaron Davies says:

    you forgot to close your TT tag in “New fopen() Interface” (twice, actually)

  2. Dario Niedermann says:

    I have no problems with what I’ve seen so far EXCEPT the _Noreturn specifier. That is a GROSS deviation from the standard C syntax. It’s HORRIBLE. It just shows that the people who approved such a thing cannot be trusted to do the right thing. 
     
    ‘_Noreturn’ should have been a function type, like ‘register’ is for variables.

  3. I find it moronic to keep trying to massage C into a safe language when there are already safe languages out there, use them instead, let C and its ilk die – and no Java isn’t it either. 
     
    It’s also pretty stupid to state: 
     
    C of course has supported multithreading for decades. 
     
    When it hasn’t and then follow that statement with: 
     
    However, all of the popular C threading libraries have thus far been non-standard extensions, and hence non-portable. 
     
    Basically rendering the former statement false. 
     
    We all know C can use libs to do threading, doesn’t make it part of the language, i.e. supporting it. 

    • Depends how you define “support”. I would have said “specified” or “required” to imply it was part of the standard. Just referring to it as “C” is totally ambiguous too.

  4. Dennis Farr says:

    Imagine if they reconsidered ipv6 every 13 years. Ipv6 is like trying to change out the engines in-flight on a jumbo jet carrying more than 10% of the global economy, and ending up with 20-year-old ideas and no better security, even after the hardest possible transition.

  5. testman says:

    You wrote “C11 introduces a new storage class specifier, _Thread_local (the C equivalent of C++11’s thread_local)” 
     
    FYI, to my understanding thread_local is the equivalent of Java’s ThreadLocal ;-) 
     
    Anyway, lots of good news on the C side, not sure it is not too late.

  6. Jeffrey Carter says:

    At this rate, in 50 years C might get to where Ada was in 1983.

    • At this point, C is much more than just a language. It’s also the native ABI for all things built on top of it (basically everything).

  7. Lots of languages are rubbish for lots of purposes. That’s why lots exist! The trick is picking the right one for the job. C is a brilliant language for lots of applications. Why troll about engineering?

  8. Paulo Pinto says:

    To be honest, I fail to see how Technical Report 24731-1, improves C security. 
     
    You still need to let the function know how big the buffer is, s1max in your examples. 
     
    So if s1max is bigger than the real size of target buffer, the exploit still happens. 
     
    The only way to fix C fertile ground for exploits, is to deprecate arrays decaying into pointers, which is not feasible. 
     
    So the way forward to fix C fertile ground for exploits is to replace it with a more type safe language, not by library technical reports. 
     
    Or do I miss something?

    • Sure, but if *you* allocated the buffer, then it’s *your* problem if you overflow it. How is that not an improvement over null-termination?

      The point is that the C library now gives you a knife to cut the rope you were hanging with but it doesn’t see fit to lock you in a padded cell and confiscate your shoelaces.

  9. David Kastrup says:

    Fortran lovers should actually pay the C standard committee for their 
    work on keeping C irrelevant for serious numeric work. 
     
    C99 finally introduces variable-sized multidimensional arrays about 50 
    years after Fortran (the important thing for numeric libraries is 
    having variable-sized arrays as function parameters; the lack of 
    variable-sized automatic arrays is easier to work around), and C11 
    makes them optional again? Way to go. 
     
    Another shot in the foot are restrict pointers: if the optimizer is 
    supposed to have an inkling of a chance at pipelining and strength 
    reduction, numerical subroutines must be able to assume independent 
    structures in their arguments. 
     
    This is the case for naively programmed Fortran functions, written by 
    good mathematicians and mediocre programmers. 
     
    For C, you get equivalent conditions only for genius programmers. The 
    non-aliasing assumption is _lethally_ important when working with 
    arrays as objects. 
     
    Now what does the standard do? It provides a non-aliasing promise in 
    the form of the restrict keyword that does not work for array 
    declarations. It _only_ works for pointer declarations. And the case 
    where it is really important is when your pointer is just used for 
    passing an _array_. 
     
    The _sane_ way to get naively written code work optimally would have 
    been to say “array declaration as function parameter -> restrict is 
    implied”. This would have made array and pointer declarations 
    slightly different by default, in a manner matching actual use of 
    those constructs. 
     
    Instead, the standard has chosen the path “if you want to tell the 
    compiler that you indeed want your function arguments to be treated as 
    whole non-aliased objects, you can’t declare them as whole non-aliased 
    objects but rather need to write them explicitly as pointers to object 
    parts, with an additional qualification”. 
     
    Way to go. Good preparation for being at one time able to celebrate 
    100 years of actively using 
     
    extern “FORTRAN” …; 
     
    What is it that makes C standard committees obsessed with making C 
    arrays useless for serious, optimizer-supported work? 
     
    C++ is not actually better. They say “C arrays are too low-level to 
    be useful”. But then it is not like the language offers anything that 
    the compiler could better work with. 

  10. Well, it is moronic to only deprecate gets(), when well known functions like strcpy, scanf can all be exploited….  
     
    Also, none of the existing thread ABIs seem to be compatible with the new standard – which makes the new feature such a nuisance! 
     
    I really think that making the standard, for the heck of it, is a BAD idea.

  11. Signal31 says:

    Aside from, perhaps, the note about eliminating certain functions that, no matter how they’re used, are not safe, this article and, indeed, the notion of protecting programmers from a programming language is pure nonsense. 
     
    All languages I’ve used over the decades allow a programmer to employ techniques that result in “unsafe” operations. Changing the language will not change the programmer, and introducing more checks in current functions rely on the compiler developer to keep things running quickly (i.e. develop efficient code themselves). 
     
    C++, for instance, introduces extra toys that result in slower programs. Benchmarking hashes, vector and the like against their C equivalents results, every time, in slower, bulkier code in C++. The exception to this are classes, which thankfully are just as quick as passing a struct to a function. 
     
    Ultimately, the idea is to teach programmers how to manage their code properly – not to change a language to “protect” programmers. 
     
    This idea is one big fail. 

  12. If you polish a turd, it’s still a turd! C is a horrible language for engineering applications!

    • Nope, it’s just a sharp knife to be wielded for a specific purpose. You on the other hand, are not very sharp at all, and definitely seem to be on the wrong end of the Dunning-Kruger effect.

    • By the way, what languages would you consider not “horrible”?

      Python? It’s quite a large, highly engineered program in itself, and is implemented in C.

      Perl/Ruby/Lua/JavaScript/Every-other-language? Yep, they’re large programs implemented in C (or C++) too.

      Come back when you get a clue please, junior.

      • python is not implemented in C. Cpython is implemented in C. Jython in Java. Ironpython in .NET. PyPy in python (and rpython) itself. ClPython in common lisp. Pyjamas in javascript. and the list goes on. All the above are python.

        Python is a language not an implementation.

        Same applies to at least some of the other candidates you offered. C is popular but not that popular.

        Also please note even in the case of Cpython only a 40% of the implementation is written in C and a 50% is written in python. I take these number from ohloh page about python. I mention it so people have a good idea of what is what.

        Saying that , yes C is the most popular language. And to tell you the truth I enjoy it a lot more than I enjoy C++. If I want OO one can use GObject or another programming language like python.

        • chasesan says:

          But what is java implemented in? C, how about .Net/Mono? C, of course. Lisp? C (well most of them, there might be an ASM or C++ one). Javascript? C, hah of course.

          So while you say that Python is not implemented in C, in the long view you would be wrong.

  13. Eric Miller says:

    I think Dario’s comment refers to an error in the text. 
     
    _Noreturn is syntactically similar to inline, so a line of code using _Noreturn would actually look more like: 
     
    _Noreturn void myfunc(void);

  14. Danny Kalev says:

    Eric and Dario: you’re both right, _Noreturn is a specifier, not an attribute and therefore it appears before the function’s name, as inline, extern etc. Not that it’s going to make _Noreturn more appetizing in any way…

  15. not clear

  16. Guys, it’s not like anyone is holding a gun to your head, forcing you to use C. If you don’t like the language, then don’t use it. Most people in the world code C, even a great many of the people who think they code C++. So as for C being ‘irrelevant’… yeah. right.. sure… Now, go stand in the corner.

    • C provides the low level ABI for basically every other language implementation in existence. Anyone who claims C is “irrelevant” is so laughably misinformed, that it’s almost not worth the effort of pointing out their stupidity.

      • chasesan says:

        I still write in C every day, even for numerous “new” programs. Low level, fast, simple and clean.

  17. The new standard is certainly better than the previous one, but the changes are too timid. 
     
    If you want safe string manipulation, you should have high level functions such as string_append and string_replace that reallocate buffers when necessary. 
     
    C should also include hash maps and resizable arrays to just start thinking of being on par with other, safer, languages. 
     
    _Generic is cool, and it fixes the “too magic” tgmath.h introduced in C99, finally giving the ability to overload functions to normal programmers rather than requiring compiler magic to implement tgmath.h

  18. Aren’t these low level functions what make C the C that it is. If anyone thinks s/he wants high level (safe? :eek:) functions, then why don’t you use C++, Java, etc. Nothing can stop a stupid programmer from writing while(1);

    • chasesan says:

      You can write while(1) in almost any language. Though in this case, C11 now provides the _Noreturn specifier, so that we can tell the compiler that we did it on purpose.

  19. I think there is an error in your post when you write strcpy_s() requires that s1max isn’t bigger than the size of s2 in order to prevent an out-of-bounds read. According to the standard http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf chapter K.3.7.1.3 The strcpy_s function the paragraph Runtime-constraints says that s1max shall be greater than strnlen_s(s2, s1max) and that is the opposite of your sentence :).

  20. “Low level” is not the same as “broken” (such as gets, sprintf, which are almost impossible to use safely). 
     
    People do need high level constructs in C, and they rewrite them from everyday. If the best alternative is moving to C++ (which brings its own problems), we are in trouble. Java is not an alternative at all.

    • Exactly. Java is completely reliant on the native ABI. Some language still has to provide that.

      I think many people hold the naive view that Java can “replace” C, when it is (and probably always will be) built *on top of* C.

  21. James Gosling says:

    What’s with the Java haters?

  22. Danny Kalev says:

    Thanks Halex. I revised the description of strcpy_s according to the text of the C11 standard. It should be C11-compliant now.

  23. chasesan says:

    I hope that the next C, maybe C14, or C16 will provide some of the nice sexy extensions we have as default language features. Such as anonymous struct’s (MS) so we can do struct xyz { struct abc; }, and access abc’s values as if they were native to xyz. I also wouldn’t default anonymous functions (GNU C, Clang, etc), or at the very least, default nested functions (GNU C).

  24. These naive arguments about C are very amusing! Please do not display your shallow knowledge of IT this way. C was written in machine code and assembler and to make it readable by humans. OOP was therefore out of the question. The speed of C is directly linked to assembler/machine code. See? Every other derivative inherits this property of C but not everything. Can you imagine OOP languages written completely in Machine code or even Assembler? What would their size be?

  25. Delphi Nostalgic says:

    Delphi had all of these features implemented a long time ago. I don’t understand how a ghastly language like C++ ever survived.
