151. JWST, Dogbolt, May-June ‘22 mailings, developer survey, errors, Unicode, mold

Media

Video

Podcast

Powered by RedCircle

JWST images

The first images from James Webb Space Telescope have arrived, and they are absolutely mind-blowing. I’m mentioning this here not just because this is an amazing achievement and I like science, but also because JWST runs on C++. In the related thread someone commented:

Struggling to see why thats special. So do millions of other things

And I mean, technically yes? But can you imagine being this person, looking at JWST and going, “meh, my microwave is also powered by C++”. Have we lost the sense of awe and wonder?

Dogbolt, the decompiler explorer

Everyone knows Godbolt Compiler Explorer by now. Enter Dogbolt: the decompiler explorer! Dogblt uses several open-source decompilers to try and produce C-like source code for a binary program uploaded by the user (up to 2 MB in size). It’s not a trivial task, and the success/readability rate of these tools varies widely, but some results could actually be useful. The website offers a few example binaries to decompile as a showcase.

Dogbolt is open source and is available on GitHub.

WG21 May and June mailings

The May 2022 Committee mailing is available. The Reddit thread is here.

The June 2022 Committee mailing is also available (I did take my time didn’t I). The Reddit thread is here.

Let’s look at some papers.

N4912: 2022-11 Kona hybrid meeting information

N4912

The 2022 November meeting in Kona will be the first in-person committee meeting since the pandemic started. It will also be the first (?) hybrid meeting, with remote participation via Zoom, which is probably going to be challenging, according to some feedback I heard. Also, COVID is still here, despite what the authorities say, and I hope that the in-person participants will all be vaccinated and boosted, there will be good ventilation in the venue, and face masks will be used sensibly, so that the meeting doesn’t turn into a super-spreader event.

P0543R1: Saturation arithmetic

P0543R1 by Jens Maurer, GitHub

The first version of the paper was written in January 2017!

Signed integer operations have undefined behavior when the result is not representable. <…> In order to implement some algorithms, the use of saturation arithmetic is necessary, where an operation yielding a result whose absolute value is too large instead returns the smallest or largest representable number. For example, when determining the color of a pixel, it would not make sense that brightening a white pixel suddenly turns it black or dark-grey. Instead, brightening a white pixel should simply yield a white pixel.

This paper proposes to add simple free functions for basic saturating operations on all signed and unsigned integer types. Further, a saturate_cast<T> is provided that can convert from any of those types to any other, saturating the value as needed.

Is it an interesting new way to make signed integer overflow a defined operation? The author mentions that a lot of SIMD instruction sets already have special instructions for saturating arithmetics. The paper proposes that the new saturating functions have short names, like add_sat and sub_sat as these are basic low-level operations. Wouldn’t it be better to have a set of special operators for that instead? One can dream…

The paper was accepted by SG6 and forwarded to LEWG.

P2580R0: Tuple protocol for C-style arrays

P2580R0 by Paolo Di Giglio

This paper proposes to make C-style arrays of known size behave like tuples, which should improve their usability in cases where C-style arrays can’t be avoided, like when using C-style interfaces.

P2581R0: Specifying the Interoperability of Binary Module Interface Files

P2581R0 by Daniel Ruoso (Bloomberg)

This paper specifies a mechanism to allow build systems to identify if a binary module interface shipped with a pre-built library can be used directly, or if the build system needs to produce its own version of the binary module interface file.

Binary modules need to have some sort of metadata included, so that a build system can determine if the pre-built binary module interface files are compatible with the currently used toolchain. I can see how this could work in an enterprise setting, where compilers are upgraded across the board but this upgrade doesn’t happen very often, and so projects that depend on other libraries could often re-use pre-built module information files shipped with their internal dependencies.

P2589R0: Static operator[]

P2589R0

This paper proposes to enable operator[] to be static, in line with an existing proposal P1169R4 for static operator().

P2590R1: Explicit lifetime management

P2590R1 by Timur Doumler and Richard Smith.

Since C++20 you can use certain ‘blessed’ standard library functions (malloc, bit_cast, memcpy) to start object lifetime:

1struct X { int a, b; };
2X* make_x() {
3    X* p = (X*)malloc(sizeof(struct X));
4    p->a = 1;
5    p->b = 2;
6    return p;
7}

For memory allocated using any other function, including user-defined allocator (for example, a memory pool), the above code snippet is UB.

This paper proposes a set of library functions that would start object lifetime given arbitrary memory block. This is only being proposed for implicit-lifetime types, like aggregates, as no constructor is actually being called:

1X* make_x() {
2    X* p = std::start_lifetime_as<X>(user_malloc(sizeof(struct X));
3    p->a = 1;
4    p->b = 2;
5    return p;
6}

The proposed functions are std::start_lifetime_as and std::start_lifetime_as_array, although in my humble opinion they could have been called something like std::create or, say, std::bless. Or better yet, std::evolve! From a lowly flat memory buffer into a real actual object! On the other hand, I guess, being explicit about your intent is probably better. Also, naming is hard.

P0447R20 std::hive

P0447R20 by Matthew Bentley addresses quite a few issues raised by the reviewers, including improvements to the technical specification, addition of C++20 Ranges overloads, and API extension and clarification.

P2505R4 Monadic Functions for std::expected

P2505R3 by Jeff Garland proposes to add monadic functions available for std::optional to std::expected. Those are:

  • and_then: compose a chain of functions returning std::expected;
  • or_else: returns if the std::expected has value, or calls a function with the error;
  • transform: apply a function to change value/type;

Additional functions have been proposed:

  • transform_error, which applies a function to change value/type, or calls a function with the error type;
  • error_or, which returns a value when there is no error.

To illustrate, here is a snippet of code before this proposal:

 1// ts: a time string in ISO format
 2// from_iso_str: a function returning an expected of time or an error string
 3// next_day: computes next day date given a date
 4// print_error: prints an error string
 5time_expected d = from_iso_str(ts);
 6if (d)
 7{
 8    ptime t = next_day(*d);
 9}
10else
11{
12    print_error(d.error());
13}

This is what this code would look like with this proposal:

1// chain a series of functions until there's an error
2auto d = from_iso_str(ts)
3    .or_else(print_error)
4    .transform(next_day);

This would make std::expected even more like Rust’s Result type.

The available implementations are:

  • Sy Brand’s tl::expected (GitHub);
  • Jonathan Wakely’s libstdc++ implementation;
  • C++20 implementation by Rishabh Dwivedi (GitHub).

P2608R0 Allow multiple init-statements

P2608R0 by Justin Cooke proposes to allow multiple init-statements wherever an init statement is currently allowed, specifically in for, if, and switch statements. Currently you can only declare more than one variable there if all declared variables are of the same type.

Before (C++20):

1for (int i = 0, j = 0; i < m; i++) { ... }

After:

1for (int k = 0; double s = 0.; k < n; k++) { ... }

Many redditors have a problem with this. They say that it makes the statement change its meaning depending on the number of semicolons in it.

P1689R5 Format for describing dependencies of source files

P1689R5 by Ben Boeckel and Brad King of Kitware describes a format for discovery of source file and module dependencies, to be generated consumed by build systems. The proposed format is JSON, and I just can’t. I wish developers would stop being so obsessed with JSON and trying to use it for anything remotely related to structured text or configuration information (I’m so old I remember when the same thing was happening with XML). I guess it’s OK as an intermediate data-exchange format, but it’s not very human readable, and it doesn’t even support dates (“just use a string in ISO format”) or comments! I would prefer TOML over JSON. Not YAML though! I had my share of working with YAML, didn’t like it one bit. (Significant whitespace, ugh.)

C++ Annual Developer Survey ‘Lite’

The survey closed 7 June.

This is probably the 1st time in several years when I didn’t finish filling a developer survey. The questions were subjective and seemed biased towards a particular understanding of the development world by the survey authors. As an example, when asking about IDEs and compilers, the only choices for usage were ‘Primary, ‘Secondary’, and ‘Occasional’. I often use more that 3 IDEs and they take different priority on different platforms. Another problem was the multiple-checkbox question at the beginning asking where I use C++, with the following choices: at work, at school, and in personal time. It should be clear that these settings may be significantly different to the point that the subsequent questions should be separate for each of the ticked settings, but instead the authors just joined everything together.

It appeared to me that extracting any meaningful results from such a survey would be impossible.

The corresponding Reddit thread starts with this:

I think an important missing question is “How much do you care about ABI stability of C++”. The answer of that should guide many decisions of the standard committee.

Yes, let’s use surveys to guide the Committee, because as we all know (especially my UK colleagues) referendums work really well for making important decisions.

And now the results are in (Reddit thread) and they confirm my fears. The results themselves are in PDF format and oh my word. Clipped text, bars instead of pie charts, mysterious 100% x-scale for all the questions. The data also wasn’t sanitised: just look at the word clouds at pages 33 and 38. Given the questions, I had low expectations, but damn.

How to handle errors

This old chestnut again. A redditor wants to get an idea what people use for error handling in C++ these days.

This redditor says:

For me, it’s exceptions alone until I can see, through the measurement and with a profiler, that they too costly. Then, it’s still exceptions alone except for the parts that show “hot” in the measurement.

I understand that some domains cannot use exceptions, but I rather think they are a few and between, too many people think they are “special” when they are not. HFT people working with exceptions tells me I have leeway.

Their response to the statement “Exceptions break control flow” was:

Early return also breaks control flow and is considered a correct way to do it, so… 🤷‍♂️

Someone said:

my problem with exceptions is less about performance and more about being very anxious of a random exception from a random function that I didn’t think can throw exceptions.

To that the same redditor replied:

Oh. I absolutely don’t worry about that! What you say is quite common and IMNSHO just needs a small bit of understanding:

  • code must be designed with exception safety guarantees in mind; note that early return and goto are similar to exceptions, point being, this kind of thinking is far from specific to exceptions
  • the no-throw guarantees exist for an exceedingly small number of functions and other code artifacts, notably “extern C” calls, non-throwing swap, POD type assignment, things like that, they are easily visible.

Another reply to the same post:

That’s the wrong mindset with exceptions. You should assume every line can throw, and write your code so that the cleanup is done automatically by default (so you have the basic exception guarantee always), and only when it matters for you to have stronger guarantees, you use constructs that you know can’t throw in order to build the guarantees you need at that point.

It should never be a matter of being worried that a random function may throw. You either know for sure or don’t care.

The general vibe of the thread seems to be to just use exceptions and not worry too much about their cost, which is perhaps a bit surprising, given the number of new error handling mechanisms proposed recently, and widely published perceived problems with exceptions. Some posters in the thread state quite correctly that there is no universal error handling solution that is going to suit every need and use case, and in some cases you may want to use std::expected or similar class as a function result; however, the problem with this is that you’ll have to either handle the errors locally or propagate them manually, which exceptions give you automatically.

While waiting for C++23 you can use Sy Brand’s tl::expected which is std::expected with functional extensions. It is available on GitHub, where the author put it in public domain. It’s available via Conan or vcpkg, has nice documentation, works with C++11/14/17 and compiles with GCC, Clang and MSVC.

And another implementation of std::expected was just announced on Reddit. This one also supports monadic extensions (see P2505) but requires C++20.

As one redditor said:

There’s no single solution that fits every case because “what an error is” and “how to handle an error” can have different answers even inside a single application. So you will generally want different approaches at different points, depending on your needs at that point.

So that’s settled then.

inline_try

No, wait, this is what settles it. A redditor created a library called inline_try:

I decided to do a thing and solve this issue once and for all! With inline_try you can turn any exception based function into an expected based function!

The library wraps function calls in try...catch block and returns std::expected, thus reducing exceptions down to mere return codes that you check after each function call.

The funniest thing about this whole thing is that the author clearly meant this as a joke, but the redditors in the thread seem to have completely missed it! As expected (see what I did there?) the commenters descended into the usual discussions of exceptions vs. no-exceptions, herbceptions, how this is similar to Boost.LEAF, and efficiency of the proposed code.

Stay tuned for more error handling discussions, I guess.

Oh, BTW the library is under GPL, so now you can’t wrap exceptions and return expected without open-sourcing your entire program.

STL’s Core C++ videos

An old (C++11, 2016) but still useful 10-lecture course by Stephan T. Lavavej on YouTube.

Unicode in C++ sources

Tom Honermann (the Chair of SG16 Unicode and text processing study group) posted a quiz on Twitter:

Most people, including me, chose the answer “-123” which was wrong! But why?!

See that Hebrew character used for parameter name? It’s called Tav and is pronounced as voiceless /t/. But more importantly, its Unicode bidirectional class is right-to-left, and its mere presence causes nearby characters to be interpreted in the right-to-left order. So the expression

1x - 3'2'1 - ת

is seen by the compiler as

1x - 1'2'3 - ת

And so the correct answer to the quiz is 75. Some text editors, like VSCode, try to mitigate this by inserting a special Unicode character called left-to-right mark after each token. By the way, trying to paste this code snippet and then editing it in VSCode was an exercise in frustration, as the cursor was moving all over the place on the line containing the ’tav’. Tom writes:

SG16 plans to propose allowances for implicit directional marks to appear in conjunction with other whitespace characters in a future C++ standard.

In the meantime, if you value your sanity try not to use non-left-to-right characters in your source code.

Nicolai Josuttis is writing a book on C++20

It’s 95% complete and you can buy it on Leanpub for a suggested price of $44.90 (minimum price $22.90) + VAT. Updates are free so that you’ll be able to download new versions of the book as it is being completed. The table of contents suggests that the book is very detailed and thorough. It looks like a great addition to your C++ library.

Martin Richtarsky, a developer from Germany, wrote a blog post on his Productive C++ blog called Using the mold linker for fun and 3x-8x link time speedups. (Reddit thread is here, and there is also one on HackerNews).

It’s a very interesting article, starting with a quick and very high level introduction to the C++ build process.

Best practices for writing C++ code and a distributed build system can go a long way in reducing compile times. <…> But in this post we want to focus on speeding up the linking step, which comes after building the object files of a library or executable.

One of my current projects suffers from long linking times, so naturally I was interested. One tip I intend to try right away was a linker switch I didn’t know about: -gsplit-dwarf. The author says:

This outsources the debug info from the object file into an adjacent file and therefore reduces the work the linker has to perform

More on this option from Martin in a separate article.

What’s most interesting though is the author’s real-world experience using mold, which is going to be very useful to me really soon (I hope). There is even a solution for using mold with ICC-compiled objects.

The provided benchmarks show marked improvement in link times when using mold.

An interesting related tweet by Rui Ueyama, the creator of mold:

Twitter

Quote

Kevin Farzad @KevinFarzad:

Sure, I made mistakes when I was younger. But now that I’m older I’ve learned how to make different, often far more serious mistakes.

Reddit

Via James O’Gorman @jogbert:

This person wins Reddit for this answer on “How to mock DBs”

I usually start by saying, “Ooo look at me I’m a database! I could be replaced with a text file, but I’m oh so important,” in a really sarcastic way.