152. Carbon

Media

Video

Podcast

Powered by RedCircle

Retrofitting Temporal Memory Safety on C++

As a prelude to the main topic we are going to discuss this article posted on the Google Security Team blog.

The Google Chrome Security Team posted an article on their progress with improving memory safety in Chrome. They talk about temporal safety improvements, which mitigates use-after-free errors, as opposed to spatial safety which is about dealing with out-of-range access like buffer overflows.

The current state-of-the-art ways for ensuring temporal safety are:

  • static analysis - doesn’t see all the errors
  • sanitizers - slow down program execution significantly
  • smart pointers - you should use them
  • Chrome has a garbage collector called Oilpan - changes C++ semantics
  • MiraclePtr ensures deterministic crash on dangling pointer access

The new mitigations discussed in the article are memory quarantine, heap scanning, and memory tagging.

The article explains memory quarantine as follows:

The basic idea is to put explicitly freed memory into quarantine and only make it available when a certain safety condition is reached.

The main idea behind assuring temporal safety with quarantining and heap scanning is to avoid reusing memory until it has been proven that there are no more (dangling) pointers referring to it. To avoid changing C++ user code or its semantics, the memory allocator providing new and delete is intercepted. Upon invoking delete, the memory is actually put in a quarantine, where it is unavailable for being reused for subsequent new calls by the application. At some point a heap scan is triggered which scans the whole heap, much like a garbage collector, to find references to quarantined memory blocks. Blocks that have no incoming references from the regular application memory are transferred back to the allocator where they can be reused for subsequent allocations.

So this is sort of like a GC but not really, as you still have to manage memory manually. There are several versions of these memory scanning algorithms, collectively called StarScan, or *Scan (try googling that!). There is a small problem: when used in Chrome, StarScan slows it down by 12%.

Hardware memory tagging comes to the rescue. It’s a new extension of ARM architecture to help detect spatial (out-of-bounds access) or temporal (use-after-free) memory errors. This is how it works:

Every 16 bytes of memory are assigned a 4-bit tag. Pointers are also assigned a 4-bit tag. The allocator is responsible for returning a pointer with the same tag as the allocated memory. The load and store instructions verify that the pointer and memory tags match. In case the tags of the memory location and the pointer do not match a hardware exception is raised.

Combined with StarScan, memory tagging lowers the performance regression to just 1%.

So, these are the latest memory error mitigations from Google. Pretty ground-breaking. But they seem to be oriented towards fixing memory errors in code that uses new and delete, which are discouraged in modern C++. We’ve had smart pointers since C++11 (auto_ptr doesn’t really count). I’m sure this is a simplistic view of things, and I don’t know all the details, of course, but what if, and bear with me here, Google used smart pointers in their C++ codebase, instead of coming up with elaborate ways to fix C-like C++ code?

Here is a quote from a tweet by Titus Winters:

<…> there’s a load bearing 10BLoC in C++, and it can’t be usefully improved.

As the next item will demonstrate, improving the existing C++ codebase is not an acceptable solution to Google. What they chose to do instead was to create an entire new language.

Carbon

The new experimental language Carbon was unveiled by Chandler Carruth in his CppNorth conference talk. I think it’s a terrible name for a language. Google couldn’t choose a name that you can actually google? Oh wait, I know – it’s a pun in itself, a play on C (the language), which is also the chemical symbol for carbon. It is so clever it overflows and becomes stupid.

Also, apparently:

Carbon, in the context of software, is trademarked by Apple Inc. Seems like a foot bullet right out of the gate.

Apple Carbon is the name of a legacy MacOS application programming interface.

Chandler Carruth

Here are some quotes from Chandler’s talk.

C++ is not doing as well as we would like.

C++ is very unsafe.

So what is the root cause for C++ falling short of its goals?

  • Accumulating decades of technical debt.
  • Prioritizing backwards compatibility.

Ah, I know where this is going: ABI and backward compatibility.

Backwards compatibility prevents us from fixing technical debt. This is not a recipe for success.

C++ evolution process makes improvements even more difficult.

Chandler says that’s why C++ has failed. (What he is really implying is: C++ has failed Google.)

Correction: C++ has failed Google, according to some people at Google. There is no evidence currently that Carbon represents the official direction for Google.

Carbon is an experimental successor to C++. The main design goals for it are:

  • C++ interoperability.
  • Migration support.
  • Language evolution - tool-based upgrades as Carbon improves. Specifically, no backwards compatibility or stable ABI!

Carbon tries to make parsing code simpler, by adding more syntax. For example, these keywords act as introducers:

  • fn introduces functions
  • var introduces variables
  • let introduces constants (hi Scala!)

The if statement is actually an expression, like in Scala.

There are no references, just pointers. Output parameters of functions are pointers with a fancy new syntax, which are non-nullable and non-incrementable. Interestingly the current Google C++ Style Guide requires using non-const pointers, not references, for output parameters. There is also an explicit object parameter for class ‘methods’ (like ‘deducing this in C++’) which can be prefixed with addr to become mutable.

Files define namespaces via packages, and there is no global namespace.

Members are public by default, and can be declared private by using the keyword private before each of them.

There is single inheritance only, and classes are final by default (use base prefix to enable inheriting from the class).

There are generics with definition checking (like in the initial C++0x Concepts proposal), defined using the keyword interface. Pure virtual functions are marked with abstract keyword.

Example interface and usage from the talk:

interface Summary {
  fn Summarize[me: Self]() -> String;
}

fn PrintSummary[T:! Summary](x: T) {
  Console.Print(x.Summarize());
}

class NewsArticle {
  impl as Summary {
    fn Summarize[me: Self]() -> String { ... }
  }
}

This also enables extending classes you don’t own (extension points). There is no ADL in Carbon.

import OtherPackage;

external impl OtherPackage.Tweet as Summary {
  fn Summarize[me: Self]() -> String { ... }
}

fn SummarizeTweet(t: Tweet) {
  PrintSummary(t);
}

To prevent ODR violations, class extensions can only come with the original class or the interface.

I’m disappointed at the missed opportunity to adopt := as assignment operator.

The Carbon designers seem to love syntax. I hope they provide a pronunciation guide for it.

Chandler admitted that C++ is going to be here for a very long time. Therefore the most important point is C++ interoperability. You can import C++ header files (import Cpp library "foo.h";), which synthesizes a matching Carbon AST by using C++ front-end (like Clang), which is then imported into Carbon. The imported C++ types end up in the Cpp package. The other way around is when C++ imports a Carbon package as a header file (#include "foo.carbon.h") and treats it as a namespace (Foo). This is based on the old Clang modules, not the new C++ modules. It also reminds me of how Swift and Objective-C interop works. Come to think of it, how will this work with C++20 Modules, you know, the proper ones?

Finally, Chandler explained what the Carbon language process is going to be. It’s NOT ISO, but a “community” process, including GitHub, Discord and other modern collaboration tools (similar to Swift, I’m guessing). The licence is LLVM. The language evolution process with be based on GitHub pull requests. For bigger decisions there is a governance process headed by Chandler Carruth, Kate Gregory, and Richard Smith.

Batteries included: everything is provided out of the box: tools, ecosystem, package manager (eventually) etc. The build system is Bazel. I liked this quote by Oliver Smith @kfsone:

  • Looks at Carbon
  • Sees “Bazel”
  • Deletes search history
  • Throws computer out of the window

The slide with the current contributors to Carbon had some notable names in it:

  • Kate Gregory
  • David Sankel
  • Chandler Carruth
  • Daisy Hollman
  • Hana Dusíková
  • Matt Godbolt
  • Richard Smith

Companies and organisations involved so far are Google, Adobe, Indiana University.

There were some live examples of Carbon using VSCode as an IDE. Also, Compiler Explorer supports Carbon already.

In the Q&A session the following statements were made:

  • There is no metaprogramming or reflection story for Carbon at the moment.
  • There will be an independent foundation for developing Carbon, but at the moment it’s governed by Google CLA (Contributor License Agreement) – which is a showstopper for many people.
  • Inspiration was drawn from Rust, Zig, Haskell, Kotlin (members public by default), Swift (generics).
  • Carbon will have no fixed library ABI. When you build a Carbon program, you build everything. (Just like Google does now). There will be a way to define a stable ABI on the boundary (“a rare case!” – Richard Smith).
  • Source compatibility will be handled by tooling. (“A really easy way of migration” – Richard Smith). There may be some amount of versioning, or epochs. I get worried when I hear the words “really easy” in a context like this.
  • Build system will not necessarily be Bazel (phew!)
  • For C++ interop, C++17 is the baseline.
  • There will not be support for old hardware architectures.
  • Governance: there will be rotations of the three “benevolent dictators”. The goal is to end up with a “bench of people who have been a lead” – Chandler Carruth. A layered consensus model, with tiers of escalation. Like small groups. This sounds like a committee, with working groups.
  • Carbon will be willing to break old code. This is actually pretty inconvenient, as I experienced with Swift until it finally reached source stability. The migration tools sucked. Source stability: GOOD.
  • It’s unclear what the change rate will be, and the users will be able to influence the progress. How about 3 years?
  • As little undefined behaviour as possible, but there will be C++ boundary with potential UB. Richard Smith: “if there is UB, we will provide you with the ability to check for it”. Ooh, I’d like to see how they do that. Previously, by Chris Lattner

Interestingly, exceptions were not mentioned in the talk at all, and nobody asked about them during Q&A. This confirms my suspicion that Google positions Carbon as a successor to the Google’s flavour of C++, where exceptions are already prohibited. I’ll be happy to be proven wrong – after all, it’s early stages for Carbon, but Chandler could have at least mentioned such an important feature of C++, given Carbon’s focus on interop.

Come to think of it, there wasn’t any mention of Carbon error handling at all…

Peter Dimov tweets, presumably quoting Carbon docs:

“For now, Carbon does not have language features dedicated to error handling, but we would consider adding some in the future.”

Viktor Zverovich replies:

Carbon developers simply write code that always succeeds.

More from Viktor:

BREAKING: The C++ committee threatens to add new features faster than Carbon is able to support them.

Doug Gregor subtweets:

I don’t think any programming language unveiled in 2022 should lack memory safety. We have to move on from the “it must be as fast as unsafe C” mindset, because the engineering cost of unsafe-by-default is so very high. Nice syntax and whiz-bang features don’t make up for it.

It takes an enormous amount of effort to bring a new language into the world and make it useful, to port code, reimplement core libraries. If you aren’t getting safety out of it, why incur that cost? Is the end result actually better, or just more pretty?

I know a thing or two about programming language design and tradeoffs. I’m saying that not making a new language memory safe by default is, to me, a critical error that you cannot recover from once you have users.

To remind you, Doug Gregor helped create Swift.

Ville Voutilainen, regarding Carbon generic/concepts/interfaces:

I’ll be really interested to see how you write generic duck-typed glue functions that take two arbitrary types without opting in to either. That is, if that’s possible.

And the puns, the puns!

Viktor:

If you write spaghetti code in Carbon do you get Carbonara?

Marius Băncilă replies:

If you copy-paste code in Carbon do you get Carbon copies?

On a more serious note, here are some quotes from Reddit, another Reddit, and HackerNews.

Jonathan Müller:

To give some context, in February of 2020 there was a crucial vote in the C++ standard committee about breaking ABI compatibility in favor of performance, mostly pushed by Google employees.

The vote failed. Consequently, many Googlers have stopped participating in the standardization of C++, resigned from their official roles in the committee, and development of clang has considerably slowed down.

Now, they’ve revealed that they’ve been working on a successor language to C++. This is really something that should be taken seriously.

A reply to that goes:

Google is one of, if not the worst maintainer of languages there is. Their methodology is exactly what you see here. “Our way or the highway.”

Their documentation is snarky, where they insist some hacky way of doing something is the RIGHT way to do it. It is always written in a condescending manner.

Their developer resources are insulated from critique and criticism, where they are in charge, and if you disagree, too bad.

Let them do their thing, and take their toys and play in their sandbox at home away from anybody. They won’t have to share, but they’ll get bored with it and kill it in two years, anyhow.

This one though:

<…> coming up with a new language after losing a vote about a standardized language is a bit like an angry child throwing a tantrum transposed to the giant tech company world.

A reply:

This sentiment is blatantly uncharitable. If anything it’s difficult to understate the importance of C++ at Google. They didn’t spend billions upon billions writing hundreds of millions of lines of C++, only to throw it all away over a few rejected proposals. Indeed, it’s one thing to lose a vote on a proposal, but it’s another thing to lose faith in the standardization process as Google has.

A very long and very detailed comment that I’m not citing here, but definitely go and read it.

Jonathan Müller responds on Twitter:

To countless people in the reddit comment section: No, somebody implementing their own language after they’ve failed to change an existing one, isn’t throwing a tantrum.

Programming languages are tools. If they don’t work for you, switch. If no alternative exists, invent one.

My take on it: one outcome of the Carbon project (which is already happening) is that some very notable people who used to work on advancing C++ are now going to go and work on Carbon, and whether or not it succeeds and reaches its objectives, C++ may lose. On the other hand, what if the people who weren’t particularly interested in advancing C++ (or became disillusioned by the ISO process and the Committee) leave to work on Carbon and by doing that indirectly improve C++ process? I guess we’ll see fairly soon how this plays out.

The fact that Google is behind Carbon also suggests that there is a non-zero possibility the entire project will be abandoned in a few years (see Killed By Google).

Enough time has now passed since the initial announcement that hot takes and reaction videos started to appear.

TechRadar posted an article titled Google thinks its new programming language can topple C++. So did 9TO5Google.

Watch this video by Code Report (Conor Hoekstra), called Carbon Language: The C++ Killer?, which is very good:

A first impressions video from the creator of Odin programming language (no, I haven’t heard of that one either before):

A recent Reddit post titled What is ABI and why did Google create their own language for it? has many links on ABI, including articles and videos.

Bryce Lelbach and Conor Hoekstra

Bryce Lelbach and Conor Hoekstra talked about Carbon on their ADSP podcast:

  • Bryce: “Carbon evolution process is drastically (10x) better than the C++ ISO evolution process which is rather dated”. Bryce is much more optimistic about the future of Carbon and Rust because of that than the future of C++. (Reminder: this is the person that wants to be in charge of C++ evolution.) He said he hopes that the C++ process can be “fixed” in the future to not be tied to ISO.
  • Conor: “Why not just make C++ better?” Bryce: “People who created Carbon spent a decade trying to make C++ better, and it didn’t work out.” (For them? It’s a bit of a weird statement, to be honest. Is he saying there was no progress in C++ for the last 10 years?) He names other people - Doug Gregor, Dave Abrahams - who “tried to make C++ better” but then departed to make other languages, notably Swift. (Dave Abrahams is back to C++, but Bryce says he is not involved in the Committee).
  • Bryce: “We should be willing to breaking changes to fix mistakes” – and not only because of performance considerations.
  • Conor talked about fixing C++ defaults – all those constexpr and [[nodiscard]] we have to add because they weren’t available when the defaults were decided. Bryce said that Carbon doesn’t have to get the defaults right, because they will be able to fix them later. So when you are looking at Carbon code you have to know its ‘version’ or ‘revision’ in order to understand what is implied, because the defaults will change between versions. Will there be a way to specify which version of Carbon is used in a particular file? How will the upgrade tooling deal with it? Will it be possible to combine Carbon versions in a single program?

Looks like Bryce is super enthusiastic about Carbon and only time will tell how this will affect his involvement with C++.

Bryce refers to the CppNorth keynote by Sean Parent titled The Tragedy of C++. There is no video available yet, but I’m going to watch it as soon as it drops.

What makes a tragedy a tragedy? It requires the protagonist to be successful, or the story is simply tragic and not a tragedy. C++ is successful. In Act One, we explored the reasons why C++ is successful in introducing our character.

In Act Two, we looked at how some of the very strengths of C++ can also be very damaging. The standards process is a crucial example. It is a key reason for C++’s success and its ever-increasing complexity and size. Another area is performance, where the dogmatic pursuit of performance in the small is preventing C++ from unlocking the hardware’s full potential. At the end of Act Two, the contradiction is hopefully apparent to the audience and the protagonist. The very traits that have made C++ successful may also bring its demise.

Jonathan Müller

Jonathan Müller @foonathan wrote an article on Carbon’s function parameter passing titled Carbon’s most exciting feature is its calling convention.

By default, i.e. if you don’t write anything else, Carbon parameters are passed by the equivalent of a const T& in C++. <…> However – and this is the import part – the compiler is allowed to convert that to a T under the as-if rule.

Jonathan claims that the Carbon compiler will do the right thing so that you don’t have to think about it, selecting the best way to pass a concrete type. C++ can also optimise away const references and replace them with values, but apparently this creates a copy, whereas in Carbon it doesn’t? I’m not sure how this works with primitive types though. He says that in Carbon the value is simply set into a register. For primitive types this is a copy, isn’t it?

Jonathan says that C++ optimisation is limited because function signature is part of ABI, but in Carbon we don’t care about that of course.

Another advantage that Jonathan is especially excited about is that in Carbon you can’t get address of a parameter unless it’s specially annotated. When you can’t take a parameter address, the compiler can avoid escape analysis, i.e. parameters can’t be indirectly unexpectedly modified in the function.

Jonathan really dislikes C++ references. In the Reddit thread he said:

References are misfeatures, the language is better off without them. For example, const T& parameters are unnecessary, it’s the default.

He clarifies further:

The long version: https://www.jonathanmueller.dev/talk/cppnow2018/ TL;DW: References were added to C++ to allow operator overloading that return lvalues like indexing. Given that by design they act like aliases to other objects, they are problematic to generic code (see optional<T&>) and heavily complicate the type system. There is a lot of complexity in C++ solely because of reference types. If references did not exist, and the few use cases where they’re really useful replaced with different features (returning lvalues from functions, parameter passing, lifetime extension, range for), C++ would be a lot simpler.

In the same reply he also said of Carbon “concepts”:

The language has concepts, and the far superior C++0x version, not the stripped down sugar for SFINAE we’ve gotten in 20.

Wow, we’re witnessing a lot of venting about C++, aren’t we.

Tristan Brindle

Tristan’s Twitter thread is insightful:

I think this could actually be good for C++: Carbon can move fast, innovate and make mistakes that can be corrected. The successful ideas can be “back-ported” to C++ in a compatible way without the pressure of having to get it right first time.

As to the language itself: signature-based checked generics are clearly the way forward.

(As an aside, it’s interesting that Swift, Rust and now Carbon have all converged on a generics model very close to C++0x concepts – a validation of @dgregor79’s ideas. What we could have had…)

The memory safety story (“we’ll figure it out later”) is a concern.

Likewise error handling: it seems like exceptions will be necessary for C++ interop, but will they be idiomatic in Carbon? Or will it be monadic error types like other modern languages? Or something similar to “Herbceptions”?

Syntax quibbles: mandatory semicolons seem dreadfully old-fashioned in a language designed in 2022.

There is an unofficial Carbon subreddit, go check it out.

Recruiters don’t waste any time. Not sure if this is real, but this tweet shows a photo of a job posting which requires “10 years of Carbon experience - no exceptions and C++ doesn’t count”.

Amir Kirsh

Sooo. Exciting times for C++. See this brilliant tweet by Amir Kirsh: