Meeting 148 - April mailing, UB, static analysis

Media

Video

Podcast

WG21 April mailing

The April committee mailing is out (Reddit). There are several updates for papers that we talked about previously, and a few new proposals. Let’s quickly go through some of them.

Proxy: A Polymorphic Programming Library

P0957R7 by Mingxin Wang, Microsoft, introduces a template-based notion of a “proxy” that combines OOP and functional programming to provide an efficient implementation of polymorphism. It is supposedly capable of replacing virtual function mechanism entirely, while being more efficient. There is a header-only test implementation for C++20. The author illustrates the proposal by comparing it against a standard polymorphic class hierarchy with the following abstract base class defining an interface:

 1class IDrawable {
 2public:
 3  virtual void Draw() const = 0;
 4};
 5
 6class Rectangle : public IDrawable {
 7  void Draw() const override;
 8  <...>
 9}
10
11class Circle : public IDrawable {
12  void Draw() const override;
13  <...>
14}
15
16void DoSomethingWithDrawable(IDrawable* p);

With “proxy” you would define a “dispatch” “Draw” as a type:

1struct Draw : std::dispatch<void()> {
2  template <class T>
3  void operator()(const T& self) { self.Draw(); }
4};

Then you define a “facade”:

1struct FDrawable : std::facade<Draw> {};

The actual implementation classes can then be defined without using any virtual functions:

 1class Rectangle {
 2public:
 3  void Draw() const;
 4  <...>
 5};
 6
 7class Circle {
 8public:
 9  void Draw() const;
10  <...>
11};

A function that uses these would be defined as follows:

1void DoSomethingWithDrawable(std::proxy<FDrawable> p)
2{
3    p.invoke<Draw>();
4}

Now let’s say we need to add another function to the Drawable interface. Instead of adding a new virtual function to the base class and overriding it in all the derived classes, with “proxy” we only need to add another “dispatch”:

1struct Area : std::dispatch<double()> {
2  template <class T>
3  double operator()(const T& self) { return self.Area(); }
4};
5struct FDrawable : std::facade<Draw, Area> {};

The author also describes a factory function use case.

Regarding his motivation, the author writes:

Currently, the standard polymorphic wrapper types, including std::function and std::any, are based on value semantics. Polymorphic wrappers based on value semantics has certain limitations in lifetime management comparing to pointer semantics. Designing the “proxy” library based on pointer semantics decouples the responsibility of lifetime management from the “proxy”, which provides more flexibility and helps consistency in API design without reducing runtime performance.

Interesting technique, and according to the author, the generated assembly is also better than that of virtual functions. The implementation uses type erasure and stores function pointers in std::unique_ptr (which unfortunately means there is a cost of heap allocation). The author compares “proxy” to other similar libraries, like Dyno which uses value semantics, and “Dynamic Generic Programming with Virtual Concepts” (DGPVC).

Even if this is not accepted into C++, it could be implemented as a library, since it doesn’t require any new language features. The test implementation doesn’t currently build in Clang, as it lacks support of Conditionally Trivial Special Member Functions (P0848).

Structured Bindings can introduce a Pack

P1061R2 by Barry Revzin and Jonathan Wakely proposes to allow the following syntax:

1std::tuple<X, Y, Z> f();
2
3auto [x,y,z] = f(); // OK today
4auto [...xs] = f(); // proposed: xs is a pack of length 3 containing an X, Y, and a Z

There is a difficulty with implementation needed to support the following usage:

1auto sum_non_template(SomeConcreteType tuple) {
2    auto [...elems] = tuple;
3    return (... + elems);
4}

The authors write:

We have not yet in the history of C++ had this notion of packs outside of dependent contexts. This is completely novel, and imposes a burden on implementations to have to track packs outside of templates where they previously had not.

There is a test implementation in Clang and Compiler Explorer.

`#embed` - a scannable, tooling-friendly binary resource inclusion mechanism

P1967R5 by JeanHeyd Meneide resurfaced after a pause, raising hope to be able to easily include binary data in programs:

1const unsigned char icon_display_data[] = {
2    #embed "art.png"
3};

To remind you of the history of this proposal, it is an evolution of the initial std::embed feature that was supposed to be a library function. But the author decided to go with a preprocessor directive-based feature as a start.

`std::execution`

The P2300R5 Senders/Receivers paper has been updated, addressing some of the issues and review comments.

Just look at this example code implementing recursive file copying. Beautiful.

`= delete("should have a reason");`

P2573R0 by Yihe Li proposes to add a message to = delete so that in case of usage the error message is more meaningful. I guess it would be better than a comment.

C++ Modules Discovery in Prebuilt Library Releases

PP2577R0 by Daniel Ruoso, Bloomberg, proposes a mechanism to allow a pre-built library to specify which modules it provides to clients by distributing metadata files alongside library binaries to use them in client linker commands.

This specification may become obsolete by a wider scale convergence in the area of package management in the C++ ecosystem.

I’m not holding my breath.

An interesting comment from the thread:

Man, reflection really fell off the map. There was a lot of activity for a while.

To which Niall Douglas replies:

There should be a whole bunch more activity soon. The Standard C++ Foundation have been funding a dedicated developer in this area since last year I think, just like how the development of Ranges was funded. It just takes time to bake the cookie, that’s all.

Of all the future C++ features Reflection is my most anticipated one.

A Rust-style borrow checker for C++

Shuai Mu from New York created a Rust-style borrow checker for C++: GitHub, Reddit. The author says:

Initially all the checks are at runtime, which already eases some debugging issues for me. <…> I also tried many static analysis tools, including cppcheck, clang/clang-tidy, and MSVC (the most recent one with lifetime support). I had high hopes for them but then I found they mainly support single function/file level check. <…> Or in other cases (like MSVC) the checker would mark everything as (false) positives. <…> The other day I came across Facebook’s Infer and it seems to have implemented a Rust-like lifetime checker. So I tested it with my borrow-cpp and it seems to work well. It can accurately tells which line of code violates the rule.

When a run-time borrow check fails, the library triggers a null pointer dereference to cause run-time error. Unfortunately, as a redditor suggested, this is a problem: null pointer dereference is undefined behaviour (UB) in C++, which means compilers are free to interpret and optimise away code that causes it as they please. When this issue was raised, the author promised to use abort() instead.

The issue of handling potential UB by compilers spawned quite a discussion in the Reddit thread. The author suggested that compilers warn about UB they find in code:

So, I think, just from the case of memory management, many UB are not intentional and maybe they are just bugs. The right thing the compilers should do is to warn them instead of optimize on them?

The following reply addressed this idea and explained why it’s impossible:

There are switches in compilers to try and do that: search for mention of “hardening”, or for sanitizers. Some checks are relatively cheap, most are not however.

Warning about UB, however, is otherwise nigh impossible. In the middle layer of a compiler UB is normal; there’s an assumption that the front-end will have created an IR where UB is only in paths that cannot be reached during execution – which the front-end knows from higher-level semantics. <…> optimizers are incredibly dumb. They’re composed of hundreds of very simple, very focused analysis and transformation passes and faced by the emerging behavior of the pipeline it may look like they’re smart (or annoying), but really each pass is fairly dumb and so is the whole.

Chris Lattner on UB

The commenter suggested reading articles by Chris Lattner (the main author of LLVM) on how UB helps optimization and how big of a minefield it creates.

Part 1:

Undefined behavior exists in C-based languages because the designers of C wanted it to be an extremely efficient low-level programming language. In contrast, languages like Java (and many other ‘safe’ languages) have eschewed undefined behavior because they want safe and reproducible behavior across implementations, and willing to sacrifice performance to get it. While neither is “the right goal to aim for,” if you’re a C programmer you really should understand what undefined behavior is.

Chris provides the following examples of UB.

Use of an uninitialised variable being UB helps optimisation as Java-like zero-initialisation guarantee would be too costly.

Signed integer overflow being UB helps optimisation, e.g. loops. It can be treated as defined by using -fwrapv in Clang and GCC.

Oversized shift amounts is UB as it behaves differently on various CPUs.

Dereferencing bad pointers/out-of-bounds array access is another example of unavoidable UB. To prevent this each array access would have to be checked, and each pointer would have to carry size information alongside it, thus breaking C ABI.

Dereferencing a NULL pointer is UB and not necessarily a crash. If you want a crash, dereference a volatile NULL pointer when using Clang.

Violating type rules is UB, like type punning using any type other than char*. Chris illustrates this with the following example optimisation:

Before:

1float *P;
2void zero_array() {
3  int i;
4  for (i = 0; i < 10000; ++i)
5    P[i] = 0.0f;
6}

After:

1memset(P, 0, 40000);

Part 2, or as Chris calls it, “Why undefined behavior is often a scary and terrible thing for C programmers”:

Reordering different optimizations can produce baffling results when you want to rely on, say, a NULL check, and the compiler decides: “Nah, I’m good, don’t need it”:

Before:

1void contains_null_check(int *P) {
2  int dead = *P;
3  if (P == 0)
4    return;
5  *P = 4;
6}

After Redundant Null Check Elimination (RNCE) and subsequent Dead Code Elimination (DCE):

1void contains_null_check_after_RNCE_and_DCE(int *P) {
2  //int dead = *P; // dead code removed (2)
3  //if (false)     // redundant check removed - pointer is dereferenced (1)
4  //  return;
5  *P = 4;
6}

UB-dependent optimization can allow security exploits due to buffer overflows, like in the following code, where the check is optimised out:

1void process_something(int size) {
2  // Catch integer overflow.
3  if (size > size+1) // UB => "can't happen" => check eliminated
4    abort();
5  char *string = malloc(size+1);

Solution:

1void process_something(int size) {
2  // Catch integer overflow.
3  if (size == MAX_INT)
4    abort();
5  char *string = malloc(size+1);

Some hard-code developers debug optimised code, which often doesn’t make sense. In this case it’s advisable to disable optimisations with -O0 to still be able to debug release builds.

Then there is a worrisome aspect of UB that changing or upgrading compilers can expose new latent bugs because of changing memory layout or different compiler behaviour. Even worse, there is no reliable way to determine if a codebase contains UB. There are tools that can help.

Valgrind (pronounced “Val-grind” as “grinned”, not “Val-grind” as in “find”) memcheck tool:

Valgrind is limited because it is quite slow, it can only find bugs that still exist in the generated machine code (so it can’t find things the optimizer removes), and doesn’t know that the source language is C (so it can’t find shift-out-of-range or signed integer overflow bugs).

Clang has an experimental switch that I didn’t know about: -fcatch-undefined-behavior. It inserts runtime checks for certain types of UB, but slows down execution.

Clang also has the switch -ftrapv which makes signed integer overflows trap at run time.

The Clang Static Analyzer can detect many bugs and is built into Apple Xcode. It is also available as a separate tool.

An experimental project Klee from LLVM can produce a test case for a piece of code.

The C-Semantics tool can detect some UB at run time.

In Part 3 Chris explains why warning about UB at compile time is impossible.

The challenges with this approach are that it is 1) likely to generate far too many warnings to be useful - because these optimizations kick in all the time when there is no bug, 2) it is really tricky to generate these warnings only when people want them, and 3) we have no good way to express (to the user) how a series of optimizations combined to expose the opportunity being optimized.

He presents a hypothetical example UB warning:

“warning: after 3 levels of inlining (potentially across files with Link Time Optimization), some common subexpression elimination, after hoisting this thing out of a loop and proving that these 13 pointers don’t alias, we found a case where you’re doing something undefined. This could either be because there is a bug in your code, or because you have macros and inlining and the invalid code is dynamically unreachable but we can’t prove that it is dead.”

and then says:

Unfortunately, we simply don’t have the internal tracking infrastructure to produce this, and even if we did, the compiler doesn’t have a user interface good enough to express this to the programmer.

Given this sad state of things, Chris suggests to use warning flags -Wall -Wextra as a way to detect more bugs at compile time. But his conclusion is not very uplifting:

Ultimately, the real problem here is that C just isn’t a “safe” language and that (despite its success and popularity) many people do not really understand how the language works.

Infer

Facebook Infer is a static analyzer for C, Objective-C, C++, and Java.

There is a short intro video about Infer:

Infer supports many build systems and can be included in the build process. For C++ it requires that your code compiles with Clang, but will also work with GCC as its front-end. It doesn’t support Windows at this time. Infer is open-source on GitHub under MIT licence and is written in OCaml.

Twitter: elderly programmers

I am doing a project about elderly programmers. If you are a programmer and over 25 please DM
— AKHIL 🪡 hiring C++ engineers (dm) (@fkasummer) April 28, 2022