Overload 180
In Context
Herb Sutter discusses C++’s current security problems and potential solutions
Judgment Day
Teedy Dee finds out what happens if AI takes your job
accu.org
OVERLOAD
April 2024
ISSN 1354-3172
Overload is a publication of the ACCU
For details of the ACCU, our publications and activities, visit the ACCU website: www.accu.org

Editor
Frances Buontempo
[email protected]

Advisors
Paul Bennett
[email protected]
Matthew Dodkins
[email protected]
Paul Floyd
[email protected]
Jason Hearne-McGuiness
[email protected]
Mikael Kilpeläinen
[email protected]
Steve Love
[email protected]
Christian Meyenburg
[email protected]
Chris Oldwood
[email protected]
Roger Orr
[email protected]
Balog Pal
[email protected]
Honey Sukesan
[email protected]
Jonathan Wakely
[email protected]
Anthony Williams
[email protected]

Advertising enquiries
[email protected]

Cover design
Original design by Pete Goodliffe
[email protected]
Cover photo by Daniel James, of a double row of the ‘colours’ of the Royal Tank Regiment that can be seen in the church of St Mary Aldermary.

ACCU
ACCU is an organisation of programmers who care about professionalism in programming. We care about writing good code, and about writing it in a good way. We are dedicated to raising the standard of programming.
Many of the articles in this magazine have been written by ACCU members – by programmers, for programmers – and all have been contributed free of charge.

CONTENTS
4 C++ Safety, In Context
Herb Sutter discusses C++’s current security problems and potential solutions.
14 To See a World in a Grain of Sand
Jez Higgins shows how to refactor code that has grown organically, making it clearer and more concise.
20 User-Defined Formatting in std::format
Spencer Collyer demonstrates how to provide formatting for a simple user-defined class.
27 Judgment Day
Teedy Dee finds out what happens if AI takes your job.

Copy deadlines
All articles intended for publication in Overload 181 should be submitted by 1st May 2024 and those for Overload 182 by 1st July 2024.

Copyrights and trademarks
Some articles and other contributions use terms that are either registered trade marks or claimed as such. The use of such terms is not intended to support nor disparage any trade mark claim. On request, we will withdraw all references to a specific trade mark and its owner.
By default, the copyright of all material published by ACCU is the exclusive property of the author. By submitting material to ACCU for publication, an author is, by default, assumed to have granted ACCU the right to publish and republish that material in any medium as they see fit. An author of an article or column (not a letter or a review of software or a book) may explicitly offer single (first serial) publication rights and thereby retain all other rights.
Except for licences granted to 1) corporate members to copy solely for internal distribution 2) members to copy source code for use on their own computers, no material can be copied from Overload without written permission from the copyright holder.

I recently spoke at CppOnline [CppOnline], a new online-only conference. It was loads of fun, though it always feels odd talking to your monitor and hoping someone is listening. We were advised to close unnecessary applications and browser tabs down to ensure smooth performance of our machines while we spoke. You may find this hard to believe, but I spent about four hours closing browser tabs, taking up time I could have otherwise spent on an editorial. I currently have 62 open; a grand improvement on the 99 or more before the conference. No editorial though, sorry.

If you’re not a tab hoarder you might find spending so much time closing tabs very strange, but I know I am not the only person who does this. I could just bookmark pages, but I gave up on bookmarks years ago, because links went stale and I had so many I couldn’t find anything. If I have a tab open, it’s usually something I do want to read or listen to at some point, and then maybe make notes or buy music or similar. One tab I closed was for a new turntable, because our old one seemed to have stopped working. I bit the bullet and bought the new turntable. It’s excellent and in the process of setting it up, I discovered why the old turntable didn’t work. The pre-amp was unplugged. The new bit of kit does have a USB port though, so I can record all my old records one day. Closing that tab was expensive, informative and has probably caused another time consuming job.

Another tab was The Return of -1/12 by Numberphile on YouTube [Numberphile]. They discussed infinite series. As many of you know, 1 + ½ + ¼ + … equals 2. We can prove this, since writing

S = 1 + 1/2 + 1/4 + 1/8 + ...

means

2S = 2 + 1 + 1/2 + 1/4 + 1/8 + ...

which tells us when we subtract both we get 2S - S = S = 2. QED. That doesn’t seem unreasonable. However, if we were now to try writing 1 + 10 + 100 + … we get into trouble. Writing

S = 1 + 10 + 100 + …

would mean we could have

10S = 10 + 100 + 1000 + …

so we would then be claiming 10S - S = 9S = -1. I’m not sure about you, but this suggests the sum, S, is -1/9, which seems very unlikely. Of course, there is a restriction on the terms of the infinite sum. The terms need to decrease by enough so that we can actually write the equals sign, otherwise the sum doesn’t converge on a number and we end up with unbelievable nonsense. Maybe you already know about infinite series and analytic continuations [Wikipedia], which allow us to extend the domain of functions. They are not to be confused with algebraic continuations which allow us to continue execution using futures and similar, and might mean I end up with more tabs open again were I to try to explain in detail. The take away message is that reasoning is often caveated with prerequisites; for example, a radius of convergence for a series. Applying similar logic in different circumstances may lead to surprises or mistakes. If something seems unbelievable, like adding positive numbers and getting a negative answer, an assumption you are making might be wrong.

A relevant computing example concerns benchmarking. A long time ago, Roger Orr wrote an article entitled ‘Order notation in practice’, based on his talk at an ACCU conference [Orr14]. He demonstrated various factors which also influence the performance of an algorithm besides its complexity measure. He discussed strlen, and discovered many compilers had optimised away the call, so the theory didn’t match the practice. Trying to build up an intuition about possible outcomes, so you spot when something is amiss, is an important skill, so well spotted Roger. Kevin Carpenter talked about building intuition at MeetingCpp [Carpenter23], and discussed making educated guesses, which may or may not be true. I couldn’t attend his talk, because it clashed with mine, so I had a tab open to listen at some point. Fortunately, I managed to catch his re-working of the talk live at CppOnline and even ask a question. So, I closed another tab.

Our intuition can be wrong, but we need to start somewhere. Lots of interesting mathematics falls out of proving a first guess is incorrect, or finding circumstances under which the ordinary does not happen, leaving us with something extraordinary. And wondering what-if can be fruitful. Whether that’s imagining a square root of -1, or exploring what is possible at compile time, new disciplines emerge. However, sometimes wondering why we have 5 test cases for a function with 7 if/else branches leads us to deduce we can delete the extra branches. The tests may still pass, however there’s a chance someone forgot to add more tests when they added more code. Mutation testing might well pick this kind of thing up. If you’re not familiar with this, at a high level it randomly mutates the code, dropping branches, changing + to – and similar, and reports back if any tests still pass. Filip von Laenen wrote an article about mutation testing for us back in 2012 [vonLaenen12] if you want to know more. He did say at the time he wasn’t a C++ programmer so could only give details on other languages and mention a couple of frameworks in C++ he was aware of. Perhaps the time has come for someone to write a new article telling us about current tools?

Tests for branches in code came to mind because Jez Higgins recently tooted [Higgins24a] about some flappy code he refactored, which had

Frances Buontempo has a BA in Maths + Philosophy, an MSc in Pure Maths and a PhD using AI and
data mining. She’s written a book about machine learning: Genetic Algorithms and Machine Learning
for Programmers. She has been a programmer since the 90s, and learnt to program by reading the
manual for her Dad’s BBC model B machine. She can be contacted at [email protected].

Some background

Scope. To talk about C++’s current safety problems and solutions well, I need to include the context of the broad landscape of security and safety threats facing all software. I chair the ISO C++ standards committee and I work for Microsoft, but these are my personal opinions and I hope they will invite more dialog across programming language and security communities.

Acknowledgments. Many thanks to people from the C, C++, C#, Python, Rust, MITRE, and other language and security communities whose feedback on drafts of this material has been invaluable, including: Jean-François Bastien, Joe Bialek, Andrew Lilley Brinker, Jonathan Caves, Gabriel Dos Reis, Daniel Frampton, Tanveer Gani, Daniel Griffing, Russell Hadley, Mark Hall, Tom Honermann, Michael Howard, Marian Luparu, Ulzii Luvsanbat, Rico Mariani, Chris McKinsey, Bogdan Mihalcea, Roger Orr, Robert Seacord, Bjarne Stroustrup, Mads Torgersen, Guido van Rossum, Roy Williams, Michael Wong.

Terminology. (See ISO/IEC 23643:2020 [ISO]). Software security (or cybersecurity or similar) means making software able to protect its assets from a malicious attacker. Software safety (or life safety or similar) means making software free from unacceptable risk of causing unintended harm to humans, property, or the environment. Programming language safety means a language’s (including its standard libraries’) static and dynamic guarantees, including but not limited to type and memory safety, which helps us make our software both more secure and more safe. When I say safety unqualified here, I mean programming language safety, which benefits both software security and software safety.

We must make our software infrastructure more secure against the rise in cyberattacks (such as on power grids, hospitals, and banks), and safer against accidental failures with the increased use of software in life-critical systems (such as autonomous vehicles and autonomous weapons).

The past two years in particular have seen extra attention on programming language safety as a way to help build more-secure and -safe software; on the real benefits of memory-safe languages (MSLs); and that C and C++ language safety needs to improve – I agree.

But there have been misconceptions, too, including focusing too narrowly on programming language safety as our industry’s primary security and safety problem – it isn’t. Many of the most damaging recent security breaches happened to code written in MSLs (e.g., Log4j [CISA-1]) or had nothing to do with programming languages (e.g., Kubernetes Secrets stored on public GitHub repos [Kadkoda23]).

In that context, I’ll focus on C++ and try to:

highlight what needs attention (what C++’s problem is), and how we can get there by building on solutions already underway;

address some common misconceptions (what C++’s problem isn’t), including practical considerations of MSLs; and

leave a call to action for programmers using all languages.

tl;dr: I don’t want C++ to limit what I can express efficiently. I just want C++ to let me enforce our already-well-known safety rules and best practices by default, and make me opt out explicitly if that’s what I want. Then I can still use fully modern C++… just nicer.

Let’s dig in.

pointers); it’s not just about linked lists [Rust-2] but those are a simple well-known illustrative example.

If we can get a 98% improvement and still have fully compatible interop with existing C++, that would be a holy grail worth serious investment.

A 98% reduction
A 98% reduction across those four categories is achievable in new/updated C++ code, and partially in existing code

Since at least 2014, Bjarne Stroustrup has advocated addressing safety in C++ via a ‘subset of a superset’: That is, first ‘superset’ to add essential items not available in C++14, then ‘subset’ to exclude the unsafe constructs that now all have replacements.

As of C++20, I believe we have achieved the ‘superset’, notably by standardizing span, string_view, concepts, and bounds-aware ranges. We may still want a handful more features, such as a null-terminated zstring_view, but the major additions already exist.

Now we should ‘subset’: Enable C++ programmers to enforce best practices around type and memory safety, by default, in new code and code they can update to conform to the subset. Enabling safety rules by default would not limit the language’s power but would require explicit opt-outs for non-standard practices, thereby reducing inadvertent risks. And it could be evolved over time, which is important because C++ is a living language and adversaries will keep changing their attacks.

ISO C++ evolution is already pursuing Safety Profiles for C++ [Stroustrup23]. The suggestions in the Appendix are refinements to that, to demonstrate specific enforcements and to try to maximize their adoptability and useful impact. For example, everyone agrees that many safety bugs will require code changes to fix. However, how many safety bugs could be fixed without manual source code changes, so that just recompiling existing code with safety profiles enabled delivers some safety benefits? For example, we could by default inject a call-site bounds check 0 <= b < a.size() on every subscript expression a[b] when a.size() exists and a is a contiguous container, without requiring any source code changes and without upgrading to a new internally bounds-checked container library; that checking would Just Work out of the box with every contiguous C++ standard container, span, string_view, and third-party custom container with no library updates needed (including therefore also no concern about ABI breakage).

Rules like those summarized in the Appendix would have prevented (at compile time, test time or run time) most of the past CVEs I’ve reviewed in the type, bounds, and initialization categories, and would have prevented many of the lifetime CVEs. I estimate a roughly 98% reduction in those categories is achievable in a well-defined and standardized way for C++ to enable safety rules by default while still retaining perfect backward link compatibility. See the Appendix on page 9 for a more detailed description.

We can and should emphasize adoptability and benefit also for C++ code that cannot easily be changed. Any code change to conform to safety rules carries a cost; worse, not all code can be easily updated to conform to safety rules (e.g., it’s old and not understood, it belongs to a third party that won’t allow updates, it belongs to a shared project that won’t take upstream changes and can’t easily be forked). That’s why above (and in the Appendix) I stress that C++ should seriously try to deliver as many of the safety improvements as practical without requiring manual source code changes, notably by automatically making existing code do the right thing when that is clear (e.g., the bounds checks mentioned above, or emitting static_cast pointer downcasts as effectively dynamic_cast without requiring the code to be changed), and by offering automated fixits that the programmer can choose to apply (e.g., to change the source for static_cast pointer downcasts to actually say dynamic_cast). Even though in many cases a programmer will need to thoughtfully update code to replace inherently unsafe constructs that can’t be automatically fixed, I believe for some percentage of cases we can deliver safety improvements by just recompiling existing code in the safety-rules-by-default mode, and we should try because it’s essential to maximizing safety profiles’ adoptability and impact.

What the problem “isn’t”: Some common misconceptions

(1) The problem “isn’t” defining what we mean by “C++’s most urgent language safety problem.” We know the four kinds of safety that most urgently need to be improved: type, bounds, initialization, and lifetime safety.

We know these four are the low-hanging fruit (see ‘The immediate problem “is”…’ on page 4). It’s true that these are just four of perhaps two dozen kinds of ‘safety’ categories, including ones like safe integer arithmetic. But:

Most of the others are either much smaller sources of problems, or are primarily important because they contribute to those four main categories. For example, the integer overflows we care most about are indexes and sizes, which fall under bounds safety.

Most MSLs don’t address making these safe by default either, typically due to the checking cost. But all languages (including C++) usually have libraries and tools to address them. For example, Microsoft ships a SafeInt library for C++ to handle integer overflows [Microsoft-1], which is opt-in. C# has a checked arithmetic language feature [Microsoft-2] to handle integer overflows, which is opt-in. Python’s built-in integers are overflow-safe by default because they automatically expand; however, the popular NumPy fixed-size integer types do not check for overflow by default and require using checked functions, which is opt-in.

Thread safety is obviously important too, and I’m not ignoring it. I’m just pointing out that it is not one of the top target buckets: Most of the MSLs that NIST/NSA/CISA/etc. recommend over C++ (except uniquely Rust, and to a lesser extent Python) address thread safety impact on user data corruption about as well as C++. The main improvement MSLs give is that a program data race will not corrupt the language’s own
virtual machine (whereas, in C++, a data race is currently all-bets-are-off undefined behavior). Some languages do give some additional protection, such as that Python guarantees two racing threads cannot see a torn write of an integer and reduces other possible interleavings because of the global interpreter lock (GIL).

(2) The problem “isn’t” that C++ code is not formally provably safe

Yes, C++ code makes it too easy to write silently-unsafe code by default (see ‘The immediate problem “is”…’ on page 4).

But I’ve seen some people claim we need to require languages to be formally provably safe, and that would be a bridge too far. Much to the chagrin of CS theorists, mainstream commercial programming languages aren’t formally provably safe. Consider some examples:

None of the widely-used languages we view as MSLs (except uniquely Rust) claim to be thread-safe and race-free by construction, as covered in the previous section. Yet we still call C#, Go, Java, Python, and similar languages “safe”. Therefore, formally guaranteeing thread safety properties can’t be a requirement to be considered a sufficiently safe language.

That’s because a language’s choice of safety guarantees is a tradeoff: For example, in Rust, safe code uses tree-based dynamic data structures only. This feature lets Rust deliver stronger thread safety guarantees than other safe languages, because it can more easily reason about and control aliasing. However, this same feature also requires Rust programs to use unsafe code more often to represent common data structures that do not require unsafe code to represent in other MSLs such as C# or Java, and so 30% to 50% of Rust crates use unsafe code [Wang22], compared for example to 25% of Java libraries [Mastrangelo15].

C#, Java, and other MSLs still have use-before-initialized and use-after-destroyed type safety problems too: They guarantee not accessing memory outside its allocated lifetime, but object lifetime is a subset of memory lifetime (objects are constructed after, and destroyed/disposed before, the raw memory is allocated and deallocated; before construction and after dispose, the memory is allocated but contains “raw bits” that likely don’t represent a valid object of its type). If you doubt, please run (don’t walk) and ask ChatGPT about Java and C# problems with: access-unconstructed-object bugs (e.g., in those languages, any virtual call in a constructor is “deep” and executes in a derived object before the derived object’s state is initialized); use-after-dispose bugs; “resurrection” bugs; and why those languages tell people never to use their finalizers. Yet these are great languages and we rightly consider them safe languages. Therefore, formally guaranteeing no-use-before-initialized and no-use-after-dispose can’t be a requirement to be considered a sufficiently safe language.

Rust, Go, and other languages support sanitizers too [Rust-3], including ThreadSanitizer and undefined behavior sanitizers [Rust-4], and related tools like fuzzers. Sanitizers are known to be still needed as a complement to language safety, and not only for when programmers use ‘unsafe’ code; furthermore, they go beyond finding memory safety issues. The uses of Rust at scale that I know of also enforce use of sanitizers. So using sanitizers can’t be an indicator that a language is unsafe – we should use the supported sanitizers for code written in any language.

Note: “Use your sanitizers” does not mean to use all of them all the time. Some sanitizers conflict with each other, so you can only use those one at a time. Some sanitizers are expensive, so they should only be run periodically. Some sanitizers should not be run in production, including because their presence can create new security vulnerabilities.

(3) The problem “isn’t” that moving the world’s C and C++ code to memory-safe languages (MSLs) would eliminate 70% of security vulnerabilities

MSLs are wonderful! They just aren’t a silver bullet.

An oft-quoted number [Gaynor20] is that “70%” of programming language-caused CVEs (reported security vulnerabilities) in C and C++ code are due to language safety problems. That number is true and repeatable, but has been badly misinterpreted in the press: No security expert I know believes that if we could wave a magic wand and instantly transform all the world’s code to MSLs, that we’d have 70% fewer CVEs, data breaches, and ransomware attacks. (For example, see this February 2024 example analysis paper [Hanley24].)

Consider some reasons.

That 70% is of the subset of security CVEs that can be addressed by programming language safety. See figure 1 again: Most of 2023’s top 10 “most dangerous software weaknesses” were not related to memory safety. Many of 2023’s largest data breaches and other cyberattacks and cybercrime had nothing to do with programming languages at all. In 2023, attackers reduced their use of malware because software is getting hardened and endpoint protection is effective (CRN) [Alspach23], and attackers go after the slowest animal in the herd. Most of the issues listed in NISTIR-8397 [Black21] affect all languages equally, as they go beyond memory safety (e.g., Log4j [CISA-1]) or even programming languages (e.g., automated testing, hardcoded secrets, enabling OS protections, string/SQL injections, software bills of materials). For more detail, see the Microsoft response to NISTIR-8397 [Microsoft-3], for which I was the editor. (More on this in the ‘Call to Action’, below.)

MSLs get CVEs too, though definitely fewer (again, e.g., Log4j). For example, see MITRE list of Rust CVEs, including six so far in 2024 [MITRE-2]. And all programs use unsafe code; for example, see the ‘Conclusions’ section of Firouzi et al.’s study of uses of C#’s unsafe on StackOverflow [Firouzi20] and prevalence of vulnerabilities, and that all programs eventually call trusted native libraries or operating system code.
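
As a small illustration of the data race and sanitizer points made in this section, here is a minimal C++ sketch that is not taken from the article. The names count_with_two_threads and usage_example are hypothetical. The counter is a std::atomic, so the concurrent increments are well-defined; if it were a plain int, the two threads would race, which is undefined behavior in C++ and the kind of defect ThreadSanitizer is designed to report (GCC and Clang enable it with -fsanitize=thread).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper for illustration: two threads each perform
// `per_thread` increments on a shared counter and the total is returned.
inline int count_with_two_threads(int per_thread) {
    // std::atomic makes each increment a well-defined atomic operation.
    // With a plain `int counter`, the two threads would race (undefined behavior).
    std::atomic<int> counter{0};

    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            counter.fetch_add(1, std::memory_order_relaxed);
        }
    };

    std::thread first(work);
    std::thread second(work);
    first.join();   // joining also makes the final value visible to this thread
    second.join();
    return counter.load();
}

// Usage example: no increments are lost, so the total is exactly twice per_thread.
inline void usage_example() {
    assert(count_with_two_threads(100000) == 200000);
}
```

Depending on the toolchain, building this may need -pthread. Running the plain-int variant under a thread sanitizer in a test build, not in production, is one way to act on the "use your sanitizers" advice above.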

Saying the quiet part out loud: CVEs are known to be an imprecise metric. We use it because it’s the metric we have, at least for security vulnerabilities, but we should use it with care. This may surprise you, as it did me, because we hear a lot about CVEs. But whenever I’ve suggested improvements for C++ and measuring “success” via a reduction in CVEs (including in this essay), security experts insist to me that CVEs aren’t a great metric to use… including the same experts who had previously quoted the 70% CVE number to me. Reasons why CVEs aren’t a great metric include that CVEs are self-reported and often self-selected, and not all are equally exploitable; but there can be pressure to report a bug as a vulnerability even if there’s no reasonable exploit because of the benefits of getting one’s name on a CVE. In August 2023, the Python Software Foundation became a CVE Numbering Authority (CNA) for Python and pip distributions [MITRE-3], and now has more control over Python and pip CVEs. The C++ community has not done so.

CVEs target only software security vulnerabilities (cyberattacks and intrusions), and we also need to consider software safety (life-critical systems and unintended harm to humans).

(4) The problem “isn’t” that C++ programmers aren’t trying hard enough/using the existing tools well enough. The challenge is making it easier to enable them.

Today, the mitigations and tools we do have for C++ code are an uneven mix, and all are off-by-default:

Kind. They are a mix of static tools, dynamic tools, compiler switches, libraries, and language features.

Acquisition. They are acquired in a mix of ways: in-the-box in the C++ compiler, optional downloads, third-party products, and some you need to google around to discover.

Accuracy. Existing rulesets mix rules with low and high false positives. The latter are effectively unadoptable by programmers, and their presence makes it difficult to ‘just adopt this whole set of rules’.

Determinism. Some rules, such as ones that rely on interprocedural analysis of full call trees, are inherently nondeterministic (because an implementation gives up when fully evaluating a case exceeds the space and time available; a.k.a. ‘best effort’ analysis). This means that two implementations of the identical rule can give different answers for identical code (and therefore nondeterministic rules are also not portable, see below).

Efficiency. Existing rulesets mix rules with low and high (and sometimes impossible) cost to diagnose. The rules that are not efficient enough to implement in the compiler will always be relegated to optional standalone tools.

Portability. Not all rules are supported by all vendors. ‘Conforms to ISO/IEC 14882 (Standard C++)’ is the only thing every C++ tool vendor supports portably.

To address all these points, I think we need the C++ standard to specify a mode of well-agreed and low-or-zero-false-positive deterministic rules that are sufficiently low-cost to implement in-the-box at build time.

Call(s) to action

As an industry generally, we must make a major improvement in programming language memory safety – and we will.

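
One example of a rule that is local, deterministic and cheap to state is the call-site subscript check the article describes: 0 <= b < a.size() on every a[b] for a contiguous container. The following is a hand-written sketch of what such a check amounts to, not the compiler-injected mechanism the article proposes. The name checked_at is hypothetical, and throwing std::out_of_range is an assumption made here for illustration (the Appendix describes a customizable violation handler that terminates by default).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a compiler-injected check on a[b].
// Works with any container that provides size() and operator[],
// without changing the container library itself.
template <typename Container>
decltype(auto) checked_at(Container& a, std::ptrdiff_t b) {
    // Enforce 0 <= b < a.size() before subscripting.
    if (b < 0 || static_cast<std::size_t>(b) >= a.size()) {
        // Assumption for this sketch: report the violation by throwing.
        throw std::out_of_range("subscript out of range");
    }
    return a[static_cast<std::size_t>(b)];
}

// Usage example: a valid index returns the element,
// an invalid index is rejected instead of reading out of bounds.
inline bool usage_example() {
    std::vector<int> v{10, 20, 30};
    assert(checked_at(v, 1) == 20);
    try {
        checked_at(v, 3);   // one past the end
    } catch (const std::out_of_range&) {
        return true;        // violation detected
    }
    return false;
}
```

Because the check sits at the call site, the same helper works unchanged for std::vector, std::string and similar types, which mirrors the article's point that no library update is needed.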
In C++ specifically, we should first target the four key safety categories that are our perennial empirical attack points (type, bounds, initialization, and lifetime safety), and drive vulnerabilities in these four areas down to the noise for new/updated C++ code – and we can.

But we must also recognize that programming language safety is not a silver bullet to achieve cybersecurity and software safety. It’s one battle (not even the biggest) in a long war: Whenever we harden one part of our systems and make that more expensive to attack, attackers always switch to the next slowest animal in the herd. Many of 2023’s worst data breaches did not involve malware, but were caused by inadequately stored credentials (e.g., Kubernetes Secrets on public GitHub repos [Kadkoda23]), misconfigured servers (e.g., DarkBeam [Okunytė23a], Kid Security [Okunytė23b]), lack of testing, supply chain vulnerabilities, social engineering, and other problems that are independent of programming languages. Apple’s white paper about 2023’s rise in cybercrime emphasizes improving the handling, not of program code, but of the data [Madnick23]:

    it’s imperative that organizations consider limiting the amount of personal data they store in readable format while making a greater effort to protect the sensitive consumer data that they do store [including by using] end-to-end [E2E] encryption.

No matter what programming language we use, security hygiene is essential:

Do use your language’s static analyzers and sanitizers. Never pretend using static analyzers and sanitizers is unnecessary “because I’m using a safe language.” If you’re using C++, Go, or Rust, then use those languages’ supported analyzers and sanitizers. If you’re a manager, don’t allow your product to be shipped without using these tools. (Again: This doesn’t mean running all sanitizers all the time; some sanitizers conflict and so can’t be used at the same time, some are expensive and so should be used periodically, and some should be run only in testing and never in production including because their presence can create new security vulnerabilities.)

Do keep all your tools updated. Regular patching is not just for iOS and Windows, but also for your compilers, libraries, and IDEs.

Do secure your software supply chain. Do use package management for library dependencies. Do track a software bill of materials for your projects.

Don’t store secrets in code. (Or, for goodness’ sake, on GitHub!)

Do configure your servers correctly, especially public Internet-facing ones. (Turn authentication on! Change the default password!)

Do keep non-public data encrypted, both when at rest (on disk) and when in motion (ideally E2E… and oppose proposed legislation that tries to neuter E2E encryption with ‘backdoors only good guys will use’ because there’s no such thing).

Do keep investing long-term in keeping your threat modeling current, so that you can stay adaptive as your adversaries keep trying different attack methods.

We need to improve software security and software safety across the industry, especially by improving programming language safety in C and C++, and in C++ a 98% improvement in the four most common problem areas is achievable in the medium term. But if we focus on programming language safety alone, we may find ourselves fighting yesterday’s war and missing larger past and future security dangers that affect software written in any language.

Sadly, there are too many bad actors. For the foreseeable future, our software and data will continue to be under attack, written in any language and stored anywhere. But we can defend our programs and systems, and we will.

Be well, and may we all keep working to have a safer and more secure 2024.

Appendix: Illustrating why a 98% reduction is feasible

This Appendix exists to support why I think a 98% reduction in type/bounds/initialization/lifetime CVEs in C++ code is believable. This is not a formal proposal, but an overview of concrete ways to achieve such an improvement in new and updatable code, and ways to even get some fraction of that improvement in existing code we cannot update but can recompile. These notes are aligned with the proposals currently being pursued in the ISO C++ safety subgroup, and if they pan out as I expect in ongoing discussions and experiments, then I intend to write further details about them in a future paper.

There are runtime and code size overheads to some of the suggestions in all four buckets, notably checking bounds and casts. But there is no reason to think those overheads need to be inherently worse in C++ than other languages, and we can make them on by default and still provide a way to opt out to regain full performance where needed.

Note: For example, bounds checking can cause a major impact on some hot loops, when using a compiler whose optimizer does not hoist bounds checks; not only can the loops incur redundant checking, but they also may not get other optimizations such as not being vectorized. This is why making bounds-checking on by default is good, but all performance-oriented languages also need to provide a way to say “trust me” and explicitly opt out of bounds checking tactically where needed.

This appendix refers to the ‘profiles’ in the C++ Core Guidelines safety profiles [CPP], a set of about two dozen enforceable rules for type and memory safety of which I am a co-author. I refer to them only as examples, to show ‘what’ already-known rules exist that we can enforce, to support that my claimed improvement is possible. They are broadly consistent
with rules in other sources, such as: The C++ Programming Language’s
advice on type safety [Stroustrup13]; C++ Coding Standards’ section on
type safety [Sutter04]; the Joint Strike Fighter Coding Standards [LM05];
High Integrity C++ [Perforce13]; the C++ Core Guidelines section on
safety profiles (a small enforceable set of safety rules) [CPP-1]; and the
recently-released MISRA C++:2023 [MISRA].

The best way for ‘how’ to let the programmer control enabling those rules
(e.g., via source code annotations, compiler switches, and/or something
else) is an orthogonal UX issue that is now being actively discussed in the
C++ standards committee and community.

Type safety
Enforce the Pro.Type safety profile by default [CPP-2]. That includes
either banning or checking all unsafe casts and conversions (e.g.,
static_cast pointer downcasts, reinterpret_cast), including
implicit unsafe type punning via C union and vararg.

However, these rules haven’t yet been systematically enforced in
the industry. For example, in recent years I’ve painfully observed a
significant set of type safety-caused security vulnerabilities whose root
cause was that code used static_cast instead of dynamic_cast for
pointer downcasts, and ‘C++’ gets blamed even when the actual problem
was failure to follow the well-publicized guidance to use the language’s
existing safe recommended feature. It’s time for a standardized C++
mode that enforces these rules by default.

Note: On some platforms and for some applications, dynamic_cast
has problematic space and time overheads that hinder its use. Many
implementations bundle dynamic_cast indivisibly with all C++ run-time
typing (RTTI) features (e.g., typeid), and so require storing
full potentially-heavyweight RTTI data even though dynamic_cast
needs only a small subset. Some implementations also use needlessly
inefficient algorithms for dynamic_cast itself. So the standard must
encourage (and, if possible, enforce for conformance, such as by
setting algorithmic complexity requirements) that dynamic_cast
implementations be more efficient and decoupled from other RTTI
overheads, so that programmers do not have a legitimate performance
reason not to use the safe feature. That decoupling could require
an ABI break; if that is unacceptable, the standard must provide an
alternative lightweight facility such as a fast_dynamic_cast that
is separate from (other) RTTI and performs the dynamic cast with
minimum space and time cost.

Bounds safety
Enforce the Pro.Bounds safety profile [CPP-3] by default, and
guarantee bounds checking. We should additionally guarantee that:

• Pointer arithmetic is banned (use std::span instead); this enforces
  that a pointer refers to a single object. Array-to-pointer decay, if
  allowed, will point to only the first object in the array.
• Only bounds-checked iterator arithmetic is allowed (also, prefer
  ranges instead).
• All subscript operations are bounds-checked at the call site, by
  having the compiler inject an automatic subscript bounds check
  on every expression of the form a[b], where a is a contiguous
  sequence with a size/ssize function and b is an integral index.
  When a violation happens, the action taken can be customized
  using a global bounds violation handler; some programs will want
  to terminate (the default), others will want to log-and-continue,
  throw an exception, or integrate with a project-specific critical
  fault infrastructure.

Importantly, the latter explicitly avoids implementing bounds-checking
intrusively for each individual container/range/view type. Implementing
bounds-checking non-intrusively and automatically at the call site makes
full bounds checking available for every existing standard and
user-written container/range/view type out of the box: Every subscript into
a vector, span, deque, or similar existing type in third-party and
company-internal libraries would be usable in checked mode without any
need for a library upgrade.

It’s important to add automatic call-site checking now before libraries
continue adding more subscript bounds checking in each library, so
that we avoid duplicating checks at the call site and in the callee. As a
counterexample, C# took many years to get rid of duplicate
caller-and-callee checking, but succeeded and .NET Core addresses this
better now; we can avoid most of that duplicate-check-elimination
optimization work by offering automatic call-site checking sooner.

Language constructs like the range-for loop are already safe by
construction and need no checks.

In cases where bounds checking incurs a performance impact, code can
still explicitly opt out of the bounds check in just those paths to retain
full performance and still have full bounds checking in the rest of the
application.

Initialization safety
Enforce initialization-before-use by default. That’s pretty easy to
statically guarantee, except for some cases of the unused parts of lazily
constructed array/vector storage. Two simple alternatives we could
enforce are (either is sufficient):

• Initialize-at-declaration as required by Pro.Type [CPP-2] and ES.20
  [CPP-4]; and possibly zero-initialize data by default as currently
  proposed in P2723 [Bastien23]. These two are good but with
  some drawbacks; both have some performance costs for cases that
  require ‘dummy’ writes that are never used but hard for optimizers
  to eliminate, and the latter has some correctness costs because it
  ‘fixes’ some uninitialized cases where zero is a valid value but
  masks others for which zero is not a valid initializer and so the
  behavior is still wrong, but because a zero has been jammed in it’s
  harder for sanitizers to detect.
• Guaranteed initialization-before-use, similar to what Ada and C#
  successfully do. This is still simple to use, but can be more efficient
  because it avoids the need for artificial ‘dummy’ writes, and can be
  more flexible because it allows alternative constructors to be used
In a recent blog post [Higgins24] about my sadness and disappointment
about the candidates we were getting for interview, I talked about the
refactoring exercise we give people, and the conversations we have
afterwards.

I’m not able to show any of that code, but I am going to talk about some
code here of the type we often see. According to the version history, it’s
passed through a number of hands, and I want to be clear I know none of
the people involved nor have I spoken to them. They are, though, exactly
the type of person presenting themselves for interview, and so for my
purposes here they are exemplars.

Here’s some Python code. It’s from a larger document processing pipeline.

Documents come shoved into the system, get squished around a bit, have
metadata added, some formatting fixups, then squirt out the other end as
nice looking pdfs. Standard stuff.

    def canonicalise_reference(reference_type, reference_match, canonical_form):
        if (
            (reference_type == "RefYearAbbrNum")
            | (reference_type == "RefYearAbbrNumTeam")
            | (reference_type == "YearAbbrNum")
        ):
            components = re.findall(r"\d+", reference_match)
            year = components[0]
            d1 = components[1]
            d2 = ""
            corrected_reference = canonical_form.replace("dddd", year).replace("d+", d1)
        elif (
            (reference_type == "RefYearAbbrNumNumTeam")
            | (reference_type == "RefYearAbrrNumStrokeNum")
            | (reference_type == "RefYearNumAbbrNum")
        ):
            components = re.findall(r"\d+", reference_match)
            year = components[0]
            d1 = components[1]
            d2 = components[2]
            corrected_reference = (
                canonical_form.replace("dddd", year).replace("d1", d1).replace("d2", d2)
            )
        elif (
            (reference_type == "AbbrNumAbbrNum")
            | (reference_type == "NumAbbrNum")
            | (reference_type == "EuroRefC")
            | (reference_type == "EuroRefT")
        ):
            components = re.findall(r"\d+", reference_match)
            year = ""
            d1 = components[0]
            d2 = components[1]
            corrected_reference = canonical_form.replace("d1", d1).replace("d2", d2)

        return corrected_reference, year, d1, d2

Listing 1
This is not about them, though. I hold them blameless, and wish them
only happiness. This is about the places that they worked, about the wider
trade, about a culture that says this is fine.

To see a world in a grain of sand
Documents can have references to other documents, both within the
existing corpus, and to a variety of external sources. These references
have standard forms, and when we find something that looks like a
document reference, we do a bit of work to make sure it’s absolutely
clean and proper.

That’s where the function in Listing 1, canonicalise_reference, comes
in. I have obfuscated identifiers in the code sample, but its structure and
behaviour are as I found it.

I’d been kind-of browsing around a git repository, looking at folder
structure, getting the general picture. A chunk of the system is a Django
webapp and thus has that shape, so I went digging for a bit of the meat
underneath. This was almost the first thing I saw and, well, I kind of
flinched. Poking around some more confirmed it’s not an anomaly. It is
representative of the system.

You’ve probably had some kind of reaction of your own. This is what
immediately leapt out at me:

• The length
• The width!¹

¹ As this is a printed publication, in most listings the very wide lines are
wrapped. Listing 1 is presented full-width, as is Listing 6.

Jez Higgins lives on the Pembrokeshire coast, largely to make
return-to-office mandates impractical. Truth is, he hasn’t worked in
an office for nearly 25 years, and has no intention of starting now.
He’s been programming for a living that whole time and thinks he
might be starting to get the hang of it. He can be contacted at
[email protected] or @[email protected]
14 | Overload | April 2024
Jez Higgins Feature
• The visual repetition, both in the if conditions and in the bodies of
  the conditionals
• The string literals
• The string literal with the spelling mistake
• The extraneous brackets in the second conditional body – written
  by someone else?
• The extra line before the return – functionally, of course, it makes
  no difference, but still, urgh

Straightaway I’m thinking that more than one person has worked on this
over time. That’s normal, of course, but I shouldn’t be able to tell. If I can,
it’s a contra-indicator.

Looking a little longer, there’s a lot of repetition – in shape, and in
detail. Looking a little longer still, and I think function parameters are
in the wrong order. reference_type and canonical_form are
correlated and originate within the system. They should go together. It’s
reference_match which comes from the input document, it’s the
only true variable and so, for me anyway, should be the first parameter. I
suspect this function only had two parameters initially, and the third was
added without a great deal of thought to the aesthetics of the change.

That’s a lot to not like in not a lot of code.

But at least there are tests
And hurrah for that! There are tests for this function, tangled up in a source
file with some unrelated tests that pull in a mock database connection and
some other business, but they do exist.

There are two test functions, one checking well-formed references, the
other malformed references, but, in fact, each function checks multiple
cases.

It’s a start, but the test code is much the same as the code it exercises –
long and repetitious – which isn’t, perhaps, that surprising. A quick visual
check shows they’re deficient in other, more serious ways. There are ten
reference types named in canonicalise_reference. The tests check
seven of them and, in fact, there is a whole branch of the if/else ladder
that isn’t exercised. That’s the branch I already suspect of being a later
addition.

Curiously too, while canonicalise_reference returns a 4-tuple, the
tests only check the corrected reference and the year, ignoring the other two
values. That sent me off looking for the canonicalise_reference
call sites, where all four elements of the tuple are used. Again, I’d suggest
the 4-tuple came in after the tests were first written, and they were not
updated to match. After all, they still passed.

I am sure these tests were written post-hoc. They did not inform the
design and development of the code they support.

Miasma
If the phrase coming to mind is code smells, then I guess you’re right. This
code is a stinky bouquet of bad odours, except they aren’t clues to some
deeper problem with the code. We don’t need clues – it’s right out there
front and centre. No, these smells emanate from within the organisation,
from a failure to develop the programmers whose hands this code has
passed through. The code works, let’s be clear, but there’s a clumsiness to
it and a lack of care in its evolution. That’s a cultural and organisational
failing.

I keep saying this is about organisations. I’m not saying these are bad
places to work, where maladjusted managers delight in making their
underlings squirm. Quite the contrary, I’ve worked at more than one of
the organisations responsible for the code above and had a great time.

But there is something wrong – an unacknowledged failure. An unknown
failure even. There’s so much potential, and it’s just not being taken up.

I came across this code because I was talking about potential work on it,
going back into one of those organisations. That didn’t pan out, but had
I been able I would absolutely have signed up for it. It’s fascinating stuff
and right up a multiplicity of my alleys.

Let’s imagine for a moment that I was sitting down for my first day on
this job. What would I do with this code? Well, at a guess, nothing. Well,
nothing until I needed to, and then I’d spend a bit of time on it. But I’d
absolutely be talking to my new colleagues about, well, everything.

One step at a time
The code in Listing 1 is just not great. It’s long, for a start, and it’s long
because it’s repetitious. The line

    components = re.findall(r"\d+", reference_match)

appears in every branch of the if/else. Let’s start by hoisting that up.

Clearing visual noise
The unnecessary brackets in the first elif body just jar. They catch the
eye and make it appear that something different is happening in the
middle there, when in fact they add nothing and are just visual noise.
(The result of this change and the previous one is shown in Listing 2.)

Move the action down
The if/else ladder sets up a load of variables, which are then used to
build corrected_reference.

The lines building corrected_reference aren’t the same, but they
are pretty similar. We can move them out of the if/else ladder and
combine them together.

    def canonicalise_reference(reference_type, reference_match, canonical_form):
        components = re.findall(r"\d+", reference_match)

        if (
            (reference_type == "RefYearAbbrNum")
            | (reference_type == "RefYearAbbrNumTeam")
            | (reference_type == "YearAbbrNum")
        ):
            year = components[0]
            d1 = components[1]
            d2 = ""
            corrected_reference = canonical_form.replace("dddd", year).replace("d+", d1)

        elif (
            (reference_type == "RefYearAbbrNumNumTeam")
            | (reference_type == "RefYearAbrrNumStrokeNum")
            | (reference_type == "RefYearNumAbbrNum")
        ):
            year = components[0]
            d1 = components[1]
            d2 = components[2]
            corrected_reference = canonical_form.replace("dddd", year).replace("d1", d1).replace("d2", d2)

        elif (
            (reference_type == "AbbrNumAbbrNum")
            | (reference_type == "NumAbbrNum")
            | (reference_type == "EuroRefC")
            | (reference_type == "EuroRefT")
        ):
            year = ""
            d1 = components[0]
            d2 = components[1]
            corrected_reference = canonical_form.replace("d1", d1).replace("d2", d2)

        return corrected_reference, year, d1, d2

Listing 2
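Before shoving the code about any further, the gaps in the tests noted earlier (three reference types unchecked, two of the four returned values ignored) could be closed with a characterisation test. What follows is a minimal sketch, not the project's real test code: the reference strings and canonical forms are invented for illustration, check_all_reference_types is a hypothetical helper name, and the function is a condensed stand-in with the same behaviour as Listing 2 so that the block runs on its own.

```python
import re


def canonicalise_reference(reference_type, reference_match, canonical_form):
    # Condensed stand-in with the same behaviour as Listing 2 (an assumption
    # made so this block is self-contained); real tests would import the
    # production function instead.
    components = re.findall(r"\d+", reference_match)
    if reference_type in ("RefYearAbbrNum", "RefYearAbbrNumTeam", "YearAbbrNum"):
        year, d1, d2 = components[0], components[1], ""
        corrected_reference = canonical_form.replace("dddd", year).replace("d+", d1)
    elif reference_type in ("RefYearAbbrNumNumTeam", "RefYearAbrrNumStrokeNum",
                            "RefYearNumAbbrNum"):
        year, d1, d2 = components[0], components[1], components[2]
        corrected_reference = (canonical_form.replace("dddd", year)
                               .replace("d1", d1).replace("d2", d2))
    elif reference_type in ("AbbrNumAbbrNum", "NumAbbrNum", "EuroRefC", "EuroRefT"):
        year, d1, d2 = "", components[0], components[1]
        corrected_reference = canonical_form.replace("d1", d1).replace("d2", d2)
    return corrected_reference, year, d1, d2


# One entry per branch of the if/else ladder. The match strings and canonical
# forms are hypothetical; only the reference type names come from Listing 1
# (including the misspelt one, which is part of the current behaviour).
CASES = {
    ("RefYearAbbrNum", "RefYearAbbrNumTeam", "YearAbbrNum"): (
        "[2024] ABC 12", "[dddd] ABC d+",
        ("[2024] ABC 12", "2024", "12", "")),
    ("RefYearAbbrNumNumTeam", "RefYearAbrrNumStrokeNum", "RefYearNumAbbrNum"): (
        "[2023] 4 ABC 56", "[dddd] d1 ABC d2",
        ("[2023] 4 ABC 56", "2023", "4", "56")),
    ("AbbrNumAbbrNum", "NumAbbrNum", "EuroRefC", "EuroRefT"): (
        "ABC 7 XYZ 89", "ABC d1 XYZ d2",
        ("ABC 7 XYZ 89", "", "7", "89")),
}


def check_all_reference_types():
    """Check the full 4-tuple for every reference type; return how many ran."""
    checked = 0
    for reference_types, (match, form, expected) in CASES.items():
        for reference_type in reference_types:
            actual = canonicalise_reference(reference_type, match, form)
            assert actual == expected, (reference_type, actual)
            checked += 1
    return checked


print(check_all_reference_types(), "reference types checked")
```

Because each case compares the whole returned tuple, a refactoring step that quietly changes d1 or d2 would fail here even though the existing tests, which look only at the corrected reference and the year, would still pass.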
    def canonicalise_reference(reference_type, reference_match, canonical_form):
        components = re.findall(r"\d+", reference_match)
        if (
            (reference_type == "RefYearAbbrNum")
            | (reference_type == "RefYearAbbrNumTeam")
            | (reference_type == "YearAbbrNum")
        ):
            year = components[0]
            d1 = components[1]
            d2 = ""
        elif (
            (reference_type == "RefYearAbbrNumNumTeam")
            | (reference_type == "RefYearAbrrNumStrokeNum")
            | (reference_type == "RefYearNumAbbrNum")
        ):
            year = components[0]
            d1 = components[1]
            d2 = components[2]

        elif (
            (reference_type == "AbbrNumAbbrNum")
            | (reference_type == "NumAbbrNum")
            | (reference_type == "EuroRefC")
            | (reference_type == "EuroRefT")
        ):
            year = ""
            d1 = components[0]
            d2 = components[1]
        corrected_reference = (canonical_form.replace("dddd", year)
                               .replace("d1", d1)
                               .replace("d2", d2))

Listing 3

    YearAbbrNum_Group = [
        RefYearAbbrNum,
        RefYearAbbrNumTeam,
        YearAbbrNum
    ]

Having tried it, I like that. Let’s roll it out to the rest of the types (see
Listing 5).

Love it.

Remembered Python calls arrays lists, but also that it has tuples too.
Tuples are immutable, so they’re a better choice for our groups.

    def canonicalise_reference(reference_type, reference_match, canonical_form):
        components = re.findall(r"\d+", reference_match)

        if reference_type in YearNum_Group:
            year = components[0]
            d1 = components[1]
            d2 = ""
        elif reference_type in YearNumNum_Group:
            year = components[0]
            d1 = components[1]
            d2 = components[2]
        elif reference_type in NumNum_Group:
            year = ""
            d1 = components[0]
            d2 = components[1]
        corrected_reference = (canonical_form.replace("dddd", year)
                               .replace("d1", d1)
                               .replace("d2", d2))

Listing 5
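The step from chained == comparisons to a group membership test can be checked in isolation. The sketch below assumes the bare names in the group (RefYearAbbrNum and friends) are module-level constants holding the string literals from Listing 1; that definition is not shown in the text here, so treat it as a guess. The two helper functions are hypothetical and exist only to compare the two styles.

```python
# Assumed constants: the article's groups use bare names, so something like
# this presumably exists somewhere in the module.
RefYearAbbrNum = "RefYearAbbrNum"
RefYearAbbrNumTeam = "RefYearAbbrNumTeam"
YearAbbrNum = "YearAbbrNum"

# A tuple rather than a list: the group is fixed, so it may as well be immutable.
YearNum_Group = (RefYearAbbrNum, RefYearAbbrNumTeam, YearAbbrNum)


def matches_by_comparison(reference_type):
    # The shape used in Listings 1 to 3. The brackets matter: | binds more
    # tightly than ==, so without them the expression means something else.
    return (
        (reference_type == "RefYearAbbrNum")
        | (reference_type == "RefYearAbbrNumTeam")
        | (reference_type == "YearAbbrNum")
    )


def matches_by_membership(reference_type):
    # The shape used in Listing 5.
    return reference_type in YearNum_Group


# Both styles agree for members and non-members alike.
for candidate in ("RefYearAbbrNum", "RefYearAbbrNumTeam", "YearAbbrNum", "EuroRefC", ""):
    assert matches_by_comparison(candidate) == matches_by_membership(candidate)
```

The membership form also stops at the first match, whereas | evaluates every comparison; for three short strings that hardly matters, but it is one less thing for the reader to parse.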
The result of swapping tuples for lists by switching [] to () is:

    YearAbbrNum_Group = (
        RefYearAbbrNum,
        RefYearAbbrNumTeam,
        YearAbbrNum
    )

Destructure FTW!
We can collapse the

    def canonicalise_reference(reference_type, reference_match, canonical_form):
        components = re.findall(r"\d+", reference_match)

        if reference_type in YearNum_Group:
            year, d1, d2 = components[0], components[1], ""
        elif reference_type in YearNumNum_Group:
            year, d1, d2 = components[0], components[1], components[2]
        elif reference_type in NumNum_Group:
            year, d1, d2 = "", components[0], components[1]

    YearNum_Group = {
        "Types": [
            RefYearAbbrNum,
            RefYearAbbrNumTeam,
            YearAbbrNum
        ],
        "Parts": lambda cmpts: (cmpts[0], cmpts[1], "")
    }

    def canonicalise_reference(reference_type, reference_match, canonical_form):
        components = re.findall(r"\d+", reference_match)

        if reference_type in YearNum_Group.Types:
            year, d1, d2 = YearNum_Group.Parts(components)
        elif reference_type in YearNumNum_Group.Types:
            year, d1, d2 = YearNumNum_Group.Parts(components)
        elif reference_type in NumNum_Group.Types:
            year, d1, d2 = NumNum_Group.Parts(components)
        corrected_reference = (canonical_form.replace("dddd", year)
                               .replace("d1", d1)
                               .replace("d2", d2))

That worked, and Listing 8 shows it extended across the two elif
branches.

Yoink out the decision making
It’s not really clear in the code, but there are only two things
really going on in this function. The first is pulling chunks out of
reference_match, and the second is putting those parts back together
into canonical_reference. Let’s make that clearer (see Listing 9).

    def reference_components(reference_type, reference_match):
        components = re.findall(r"\d+", reference_match)
        if reference_type in YearNum_Group.Types:
            year, d1, d2 = YearNum_Group.Parts(components)
        elif reference_type in YearNumNum_Group.Types:
            year, d1, d2 = YearNumNum_Group.Parts(components)
        elif reference_type in NumNum_Group.Types:
            year, d1, d2 = NumNum_Group.Parts(components)

        return year, d1, d2

    def canonicalise_reference(reference_type, reference_match, canonical_form):
        year, d1, d2 = reference_components(reference_type, reference_match)
        corrected_reference = (canonical_form.replace("dddd", year)
                               .replace("d1", d1)
                               .replace("d2", d2))
    def canonicalise_reference(reference_type, reference_match, canonical_form):
        year, d1, d2 = reference_components(reference_type, reference_match)
        corrected_reference = (canonical_form.replace("dddd", year)
                               .replace("d1", d1)
                               .replace("d2", d2))

        return corrected_reference, year, d1, d2

Listing 11

He was right and I knew it. Had this code been in C#, for instance, I’d
probably have gone straight from the if ladder to a LINQ expression.

He set me off. I knew Python’s list comprehensions were its LINQ-a-like,
and I had half an idea I could use one here.

However, I thought list comprehensions only created new lists. If I’d
done that here, it would mean I’d still have to extract the first element.
That felt at least as clumsy as the for loop.

Turns out I’d only ever half used them, though. Drop the square brackets
and a comprehension becomes a generator expression, which hands back an
iterator that produces its values on demand. Combined with next(), which
pulls the next element off the iterator, and well, it’s more pythonic.
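The behaviour of next() over a generator expression is easy to try on its own. This is a small sketch with made-up names (first_match and first_match_with_trace are not from the code base); it shows that only as many elements are examined as are needed to find the first match, and that next() accepts a default for the case where nothing matches.

```python
def first_match(items, predicate, default=None):
    """Return the first item satisfying predicate, or default if none does."""
    # Round brackets make this a generator expression: nothing is evaluated
    # until next() asks for a value, and evaluation stops at the first hit.
    return next((item for item in items if predicate(item)), default)


def first_match_with_trace(items, predicate):
    """Like first_match, but also report which items were actually examined."""
    seen = []

    def traced(item):
        seen.append(item)
        return predicate(item)

    return first_match(items, traced), seen


def is_even(n):
    return n % 2 == 0


result, examined = first_match_with_trace([1, 3, 4, 5, 6], is_even)
print(result, examined)  # prints: 4 [1, 3, 4]
```

Without the default argument, next() raises StopIteration when the iterator is exhausted, so a call site has to decide whether "no match" is an error or an expected outcome.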
    def reference_components(reference_type, reference_match):
        components = re.findall(r"\d+", reference_match)
        return next(group.Parts(components)
                    for group in TypeGroups
                    if reference_type in group.Types)

What’s kind of fascinating about this change is that the list comprehension
has the exact same elements as the for version, but the intent, as Barney
suggested, is very different.

At the same time, Barney came up with almost exactly the same thing, too
[Dellar24]. We’d done a weird long-distance almost-synchronous little
pairing session. Magic.

Reflecting
This is contrived, obviously, because it’s a single function I’ve pulled out
of a larger code base.

But, but, but, I do believe that now I’ve shoved it about that it’s better
code.

If I was able to work my way out from here, I’m confident I could make
the whole thing better. It’d be smaller, it would be easier to read, easier
to change.

    def reference_components(reference_match, reference_type):
        components = re.findall(r"\d+", reference_match)
        for group in TypeGroups:
            if reference_type in group.Types:
                return group.Parts(components)

    def build_canonical_form(canonical_form, year, d1, d2):
        return (canonical_form.replace("dddd", year)
                .replace("d1", d1)
                .replace("d2", d2))

    def canonicalise_reference(reference_match, reference_type, canonical_form):
        year, d1, d2 = reference_components(reference_match, reference_type)

        corrected_reference = build_canonical_form(canonical_form, year, d1, d2)

        return corrected_reference, year, d1, d2

Listing 13
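For anyone wanting to run the end state, the pieces can be pulled together into one module. This is a sketch under assumptions rather than the code as it exists in the repository: the definition of TypeGroups is not shown in the listings here, so it is reconstructed from the three branches of Listing 1; TypeGroup is a hypothetical NamedTuple chosen because the listings use attribute access (group.Types, group.Parts), which a plain dict would not support; and the sample canonical forms in the usage lines are invented.

```python
import re
from typing import Callable, NamedTuple, Sequence, Tuple


class TypeGroup(NamedTuple):
    # Hypothetical container: which reference types belong together, and how
    # to turn the digit runs found in the match into (year, d1, d2).
    Types: Tuple[str, ...]
    Parts: Callable[[Sequence[str]], Tuple[str, str, str]]


# Reconstructed from the three branches of Listing 1 (an assumption). The
# misspelt "RefYearAbrrNumStrokeNum" is kept so behaviour is unchanged.
TypeGroups = (
    TypeGroup(("RefYearAbbrNum", "RefYearAbbrNumTeam", "YearAbbrNum"),
              lambda c: (c[0], c[1], "")),
    TypeGroup(("RefYearAbbrNumNumTeam", "RefYearAbrrNumStrokeNum",
               "RefYearNumAbbrNum"),
              lambda c: (c[0], c[1], c[2])),
    TypeGroup(("AbbrNumAbbrNum", "NumAbbrNum", "EuroRefC", "EuroRefT"),
              lambda c: ("", c[0], c[1])),
)


def reference_components(reference_match, reference_type):
    components = re.findall(r"\d+", reference_match)
    # Generator expression plus next(): take the parts from the first group
    # that lists this reference type.
    return next(group.Parts(components)
                for group in TypeGroups
                if reference_type in group.Types)


def build_canonical_form(canonical_form, year, d1, d2):
    return (canonical_form.replace("dddd", year)
            .replace("d1", d1)
            .replace("d2", d2))


def canonicalise_reference(reference_match, reference_type, canonical_form):
    year, d1, d2 = reference_components(reference_match, reference_type)
    corrected_reference = build_canonical_form(canonical_form, year, d1, d2)
    return corrected_reference, year, d1, d2


# Invented inputs, purely to show the call shape.
print(canonicalise_reference("[2023] 4 ABC 56", "RefYearNumAbbrNum", "[dddd] d1 ABC d2"))
print(canonicalise_reference("ABC 7 XYZ 89", "EuroRefC", "ABC d1 XYZ d2"))
```

Two things are worth knowing about this shape. An unknown reference type makes next() raise StopIteration, whereas the for loop in Listing 13 falls off the end and returns None, which then fails at the tuple unpacking; neither is a friendly error, so a real version would probably want to say what went wrong. And the combined replace chain only looks for "d1", so canonical forms that still use the "d+" placeholder from the first branch of Listing 1 would need updating to match.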
User-Defined Formatting
in std::format
std::format allows us to format values quickly and safely.
Spencer Collyer demonstrates how to provide formatting
for a simple user-defined class.
In a previous article [Collyer21], I gave an introduction to the
std::format library, which brings modern text formatting
capabilities to C++.

That article concentrated on the output functions in the library and how
they could be used to write the fundamental types and the various string
types that the standard provides.

Being a modern C++ library, std::format also makes it relatively easy
to output user-defined types, and this series of articles will show you how
to write the code that does this.

There are three articles in this series. This article describes the basics
of setting up the formatting for a simple user-defined class. The second
article will describe how this can be extended to classes that hold objects
whose type is specified by the user of your class, such as containers.
The third article will show you how to create format wrappers, special
purpose classes that allow you to apply specific formatting to objects of
existing classes.

A note on the code listings: The code listings in this article have lines
labelled with comments like // 1. Where these lines are referred to in
the text of this article, it will be as ‘line 1’ for instance, rather than ‘the
line labelled // 1’.

Interface changes
Since my previous article was first published, based on the draft C++20
standard, the paper [P2216] was published which changes the interface
of the format, format_to, format_to_n, and formatted_size
functions. They no longer take a std::string_view as the format
string, but instead a std::format_string (or, for the wide-character
overloads, std::wformat_string). This forces the format string to
be a constant at compile time. This has the major advantage that compile
time checks can be carried out to ensure it is valid.

The interfaces of the equivalent functions prefixed with v (e.g. vformat)
have not changed and they can still take runtime-defined format specs.

One effect of this is that if you need to determine the format spec
at runtime then you have to use the v-prefixed functions and pass the
arguments as an argument pack created with make_format_args or
make_wformat_args. This will impact you if, for instance, you want
to make your program available in multiple languages, where you would
read the format spec from some kind of localization database.

Another effect is on error reporting in the functions that parse the format
spec. We will deal with this when describing the parse function of the
formatter classes described in this article.

C++26 and runtime_format
Forcing the use of the v-prefixed functions for non-constant format
specs is not ideal, and can introduce some problems. The original
P2216 paper mentioned possible use of a runtime_format to allow
non-constant format specs but did not add any changes to enable that.
A new proposal [P2918] does add such a function, and once again
allows non-constant format specs in the various format functions. This
paper has been accepted into C++26, and the libstdc++ library that
comes with GCC should have it implemented by the time you read this
article, if you want to try it out.

Creating a formatter for a user-defined type
To enable formatting for a user-defined type, you need to create a
specialization of the struct template formatter. The standard defines
this as:

    template<class T, class charT = char>
    struct formatter;

where T is the type you are defining formatting for, and charT is the
character type your formatter will be writing.

Each formatter needs to declare two functions, parse and format,
that are called by the formatting functions in std::format. The purpose
and design of each function is described briefly in the following sections.

Inheriting existing behaviour
Before we dive into the details of the parse and format functions, it is
worth noting that in many cases you can get away with re-using existing
formatters by inheriting from them. Normally, you would do this if the
standard format spec does everything you want, so you can just use the
inherited parse function and write your own format function that
ultimately calls the one on the parent class to do the actual formatting.

For instance, you may have a class that wraps an int to provide
some special facilities, like clamping the value to be between min and
max values, but when outputting the value you are happy to have the
standard formatting for int. In this case you can just inherit from
std::formatter<int> and simply override the format function to
call the one on that formatter, passing the appropriate values to it. An
example of doing this is given in Listing 1.

Or you may be happy for your formatter to produce a string representation
of your class and use the standard string formatting to output that string.
You would inherit from std::formatter<std::string> and just
override the format function to generate your string representation and
then call the parent format function to actually output the value.
    #include <format>
    #include <iostream>
    #include <type_traits>

    class MyInt
    {
    public:
        MyInt(int i) : m_i(i) {};
        int value() const { return m_i; };
    private:
        int m_i;
    };

    template<>
    struct std::formatter<MyInt>
        : public std::formatter<int>
    {
        using Parent = std::formatter<int>;
        auto format(const MyInt& mi,
                    std::format_context& format_ctx) const
        {
            return Parent::format(mi.value(), format_ctx);
        }
    };

    int main()
    {
        MyInt mi{1};
        std::cout << std::format("{0} [{0}]\n", mi);
    }

Listing 1

It should store any formatting information from the format-spec in the
formatter object itself¹.

¹ There is nothing stopping you storing the formatting information in a
class variable or even a global variable, but the standard specifies that
the output of the format function in the formatter should only
depend on the input value, the locale, and the format-spec as parsed by
the last call to parse. Given these constraints, it is simpler to just store
the formatting information in the formatter object itself.

As a reminder of what is actually being parsed, my previous article had
the following for the general format of a replacement field:

    ‘{’ [arg-id] [‘:’ format-spec] ‘}’

so the format-spec is everything after the : character, up to but not
including the terminating }.

Assume we have a typedef PC defined as follows:

    using PC = basic_format_parse_context<charT>;

where charT is the template argument passed to the formatter
template. Then the parse function prototype looks like the following:

    constexpr PC::iterator parse(PC& pc);

The function is declared constexpr so it can be called at compile time.

The standard defines specialisations of the basic_format_parse_context
template called format_parse_context and wformat_parse_context,
with charT being char and wchar_t respectively.

On entry to the function, pc.begin() points to the start of the
format-spec for the replacement field being formatted. The value of
pc.end() is such as to allow the parse function to read the entire
format-spec. Note that the standard specifies that an empty format-spec
can be indicated by either pc.begin() == pc.end() or *pc.begin() == '}',
so your code needs to check for both conditions.

The parse function should process the whole format-spec. If it
encounters a character it doesn’t understand, other than the } character
that indicates the format-spec is complete, it should report an error. The
way to do this is complicated by the need to allow the function to be
called at compile time. Before that change was made, it would be normal
to throw a std::format_error exception. You can still do this, with
the proviso that the compiler will report an error, as throw cannot be
used when evaluating the function at compile time. Until such time as
a workaround has been found for this problem, it is probably best to
just throw the exception and allow the compiler to complain. That is the
solution used in the code presented in this article.

If the whole format-spec is processed with no errors, the function should
return an iterator pointing to the terminating } character. This is an
important point – the } is not part of the format-spec and should not be
consumed, otherwise the formatting functions themselves will throw an
error.

Format specification mini-language
The format-spec for your type is written in a mini-language which you
design. It does not have to look like the one for the standard format-specs
defined by std::format. There are no rules for the mini-language, as
long as you can write a parse function that will process it.

An example of a specialist mini-language is that defined by std::chrono
for its formatters, given for instance at [CppRef]. Further examples are
given in the code samples that make up the bulk of this series of articles.

There are some simple guidelines to creating a mini-language in the
appendix at the end of this article: ‘Simple Mini-Language Guidelines’.

The format function
The format function does the work of actually outputting the value of
the argument for the replacement field, taking account of the format-spec
that the parse function has processed.

Assume we have a typedef FC defined as follows:

    using FC = basic_format_context<Out, charT>;

where Out is an output iterator and charT is the template argument
passed to the formatter template. Then the format function prototype
looks like the following:

    FC::iterator format(const T& t, FC& fc) const;

where T is the template argument passed to the formatter template.

Note that the format function should be const-qualified. This is
because the standard specifies that it can be called on a const object.
The standard defines specialisations of the basic_format_context template called format_context and wformat_context, with charT being char and wchar_t respectively.
The function should format the value t passed to it, using the formatting information stored by parse, and the locale returned by fc.locale() if it is locale-dependent. The output should be written starting at fc.out(), and on return the function should return the iterator just past the last output character.
If you just want to output a single character, the easiest way is to write something like the following, assuming iter is the output iterator and c is the character you want to write:
  *iter++ = c;
If you need more complex formatting than just writing one or two characters, the easiest way to create the output is to use the formatting functions already defined by std::format, as they correctly maintain the output iterator.
The most useful function to use is std::format_to, as that takes the iterator returned by fc.out() and writes directly to it, returning the updated iterator as its result. Or if you want to limit the amount of data written, you can use std::format_to_n.
Using the std::format function itself has a couple of disadvantages. Firstly, it returns a string which you would then have to send to the output. And secondly, because it has the same name as the function in formatter, you have to use a std namespace qualifier on it, even if you have a using namespace std; line in your code, as otherwise function name resolution will pick up the format function from the formatter rather than the std::format one.

Formatting a simple object
For our first example we are going to create a formatter for a simple Point class, defined in Listing 2.

  class Point
  {
  public:
    Point() {}
    Point(int x, int y) : m_x(x), m_y(y) {}

    const int x() const { return m_x; }
    const int y() const { return m_y; }

  private:
    int m_x = 0;
    int m_y = 0;
  };
Listing 2

Default formatting
Listing 3 shows the first iteration of the formatter for Point. This just allows default formatting of the object.

  #include "Point.hpp"
  #include <format>
  #include <iostream>
  #include <type_traits>
  #include <utility> // std::move

  template<>
  struct std::formatter<Point>
  {
    constexpr auto parse(
      std::format_parse_context& parse_ctx)
    {
      auto iter = parse_ctx.begin();
      // Next character of the format-spec, or 0 if there is none.
      auto get_char = [&]() { return iter
        != parse_ctx.end() ? *iter : 0; }; // 1
      char c = get_char();
      if (c != 0 && c != '}') // 2
      {
        throw std::format_error(
          "Point only allows default formatting");
      }
      return iter;
    }
    auto format(const Point& p,
      std::format_context& format_ctx) const
    {
      return std::format_to(std::move(
        format_ctx.out()), "{},{}", p.x(), p.y());
    }
  };
  int main()
  {
    Point p;
    std::cout << std::format("{0} [{0}]\n", p);
    try
    {
      // The 's' format-spec is not supported, so this throws.
      std::cout << std::vformat("{0:s}\n",
        std::make_format_args(p));
    }
    catch (std::format_error& fe)
    {
      std::cout << "Caught format_error : "
        << fe.what() << "\n";
    }
  }
Listing 3

In the parse function, the lambda get_char defined in line 1 acts as a convenience function for getting either the next character from the format-spec, or else indicating the format-spec has no more characters by returning zero. It is not strictly necessary in this function as it is only called once, but will be useful as we extend the format-spec later.
The if-statement in line 2 checks that we have no format-spec defined. The value 0 will be returned from the call to get_char if the begin and end calls on parse_ctx return the same value.
The format function has very little to do – it just returns the result of calling format_to with the appropriate output iterator, format string, and details from the Point object. The only notable thing to point out is that we wrap the format_ctx.out() call which gets the output iterator
in std::move. This is in case the user is using an output that has move-only iterators.

Adding a separator character and width specification
Now we have seen how easy it is to add default formatting for a class, let’s extend the format specification to allow some customisation of the output.
The format specification we will use has the following form:
  [sep] [width]
where sep is a single character to be used as the separator between the two values in the Point output, and width is the minimum width of each of the two values. Both elements are optional. The sep character can be any character other than } or a decimal digit.
The code for this example is in Listing 4.

Member variables
The first point to note is that we now have to store information derived from the format-spec by the parse function so the format function can do its job. So we have a set of member variables in the formatter defined from line 10 onwards.
The default values of these member variables are set so that if no format-spec is given, a valid default output will still be generated. It is a good idea to follow the same principle when defining your own formatters.

The parse function
The parse function has expanded somewhat to allow parsing of the new format-spec. Line 1 gives a short-circuit if there is no format-spec defined, leaving the formatting as the default.

  #include "Point.hpp"
  #include <cctype>  // isdigit
  #include <format>
  #include <iostream>
  #include <utility> // move

  using namespace std;

  template<>
  struct std::formatter<Point>
  {
    constexpr auto parse(
      format_parse_context& parse_ctx)
    {
      auto iter = parse_ctx.begin();
      auto get_char = [&]() { return iter
        != parse_ctx.end() ? *iter : 0; };
      char c = get_char();
      if (c == 0 || c == '}') // 1
      {
        return iter;
      }
      auto IsDigit = [](unsigned char uc) { return
        isdigit(uc); }; // 2
      if (!IsDigit(c)) // 3
      {
        m_sep = c;
        ++iter;
        if ((c = get_char()) == 0 || c == '}') // 4
        {
          return iter;
        }
      }
      auto get_int = [&]() { // 5
        int val = 0;
        char c;
        while (IsDigit(c = get_char())) // 6
        {
          val = val*10 + c-'0';
          ++iter;
        }
        return val;
      };
      if (!IsDigit(c)) // 7
      {
        throw format_error("Invalid format "
          "specification for Point");
      }
      m_width = get_int(); // 8
      m_width_type = WidthType::Literal;
      if ((c = get_char()) != '}') // 9
      {
        throw format_error("Invalid format "
          "specification for Point");
      }
      return iter;
    }
    auto format(const Point& p,
      format_context& format_ctx) const
    {
      if (m_width_type == WidthType::None)
      {
        return
          format_to(std::move(format_ctx.out()),
            "{0}{2}{1}", p.x(), p.y(), m_sep);
      }
      return format_to(std::move(format_ctx.out()),
        "{0:{2}}{3}{1:{2}}", p.x(), p.y(), m_width,
        m_sep);
    }
  private:
    char m_sep = ','; // 10
    enum WidthType { None, Literal };
    WidthType m_width_type = WidthType::None;
    int m_width = 0;
  };
  int main()
  {
    Point p1(1, 2);
    cout << format("[{0}] [{0:/}] [{0:4}]"
      "[{0:/4}]\n", p1);
  }
Listing 4

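As a rough companion to Listing 4, the parsing rules for [sep] [width] can also be exercised outside a formatter. The sketch below is not one of the article's listings: PointSpec, is_digit and parse_point_spec are hypothetical names, the function takes only the format-spec text (without the surrounding braces) as a std::string instead of a parse context, and it adds a guard against widths that do not fit in an int, which Listing 4 does not have.

```cpp
#include <climits>
#include <cstddef>
#include <stdexcept>
#include <string>

// Result of parsing a "[sep] [width]" format-spec (illustrative names).
struct PointSpec
{
  char sep = ',';         // default separator
  bool has_width = false; // true only if a width was given
  int width = 0;
};

inline bool is_digit(char c) { return c >= '0' && c <= '9'; }

// Parses specs such as "", "/", "4" or "/4".
inline PointSpec parse_point_spec(const std::string& spec)
{
  PointSpec result;
  std::size_t pos = 0;
  if (pos == spec.size())
  {
    return result; // empty spec: keep all defaults
  }
  if (!is_digit(spec[pos]))
  {
    if (spec[pos] == '}')
    {
      throw std::invalid_argument("'}' cannot be the separator");
    }
    result.sep = spec[pos++];
    if (pos == spec.size())
    {
      return result; // separator only
    }
  }
  if (!is_digit(spec[pos]))
  {
    throw std::invalid_argument("expected a width");
  }
  int value = 0;
  while (pos < spec.size() && is_digit(spec[pos]))
  {
    const int digit = spec[pos] - '0';
    // Refuse the digit if value*10 + digit would exceed INT_MAX.
    if (value > (INT_MAX - digit) / 10)
    {
      throw std::out_of_range("width does not fit in an int");
    }
    value = value * 10 + digit;
    ++pos;
  }
  if (pos != spec.size())
  {
    throw std::invalid_argument("unexpected text after the width");
  }
  result.has_width = true;
  result.width = value;
  return result;
}
```

Keeping the rules in a plain function like this is just one way to try out a mini-language before wiring it into a formatter; the same checks could equally live inside parse.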
In the code following the line 1 check, we need to check if the character we have is a decimal digit. The normal way to do this is to use std::isdigit, but because this function has undefined behaviour if the value passed cannot be represented as an unsigned char, we define lambda IsDigit at line 2 as a wrapper which ensures the value passed to isdigit is an unsigned char.
As mentioned above, any character that is not } or a decimal digit is taken as being the separator. The case of } has been dealt with by line 1 already. The if-statement at line 3 checks for the second case. If we don’t have a decimal digit character, the value in c is stored in the member variable m_sep. We need to increment iter before calling get_char in line 4 because get_char itself doesn’t touch the value of iter.
Line 4 checks to see if we have reached the end of the format-spec after reading the separator character. Note that we check for the case where get_char returns 0, which indicates we have reached the end of the format string, as well as the } character that indicates the end of the format-spec. This copes with any problems where the user forgets to terminate the replacement field correctly. The std::format functions will detect such an invalid condition and throw a std::format_error exception.
The get_int lambda function defined starting at line 5 attempts to read a decimal number from the format-spec. On entry iter should be pointing to the start of the number. The while-loop controlled by line 6 keeps reading characters until a character that is not a decimal digit is found. In the normal case this would be the } that terminates the format-spec. We don’t check in this function for which character it was, as that is done later. Note that as written, the get_int function has undefined behaviour if a user uses a value that overflows an int – a more robust version could be written if you want to check against users trying to define width values greater than the maximum value of an int.
The check in line 7 ensures we have a width value. Note that the checks in lines 3 and 4 will have caused the function to return if we just have a sep element.
The width is read and stored in line 8, with the following line indicating we have a width given.
Finally, line 9 checks that we have correctly read all the format-spec. This is not strictly necessary, as the std::format functions will detect any failure to do so and throw a std::format_error exception, but doing it here allows us to provide a more informative error message.

The format function
The format function has changed to use the sep and width elements specified. It should be obvious what is going on, so we won’t go into it in any detail.

Specifying width at runtime
In this final example we will allow the width element to be specified at runtime. We do this by allowing a nested replacement field to be used, specified as in the standard format specification with either {} or {n}, where n is an argument index.
The format specification for this example is identical to the one above, with the addition of allowing the width to be specified at runtime.
The code for this example is in Listing 5. When labelling the lines in this listing, corresponding lines in Listing 4 and Listing 5 have had the same labels applied. This does mean that some labels are not used in Listing 5 if there is nothing additional to say about those lines compared to Listing 4. We use uppercase letters for new labels introduced in Listing 5.

  #include "Point.hpp"
  #include <cctype>      // isdigit
  #include <format>
  #include <iostream>
  #include <iterator>    // back_insert_iterator
  #include <limits>      // numeric_limits
  #include <string>
  #include <type_traits> // is_integral_v
  #include <utility>     // move
  using namespace std;
  template<>
  struct std::formatter<Point>
  {
    constexpr auto
      parse(format_parse_context& parse_ctx)
    {
      auto iter = parse_ctx.begin();
      auto get_char = [&]() { return iter
        != parse_ctx.end() ? *iter : 0; };
      char c = get_char();
      if (c == 0 || c == '}')
      {
        return iter;
      }
      auto IsDigit = [](unsigned char uc)
        { return isdigit(uc); };
      if (c != '{' && !IsDigit(c)) // 3
      {
        m_sep = c;
        ++iter;
        if ((c = get_char()) == 0 || c == '}')
        {
          return iter;
        }
      }
      auto get_int = [&]() {
        int val = 0;
        char c;
        while (IsDigit(c = get_char()))
        {
          val = val*10 + c-'0';
          ++iter;
        }
        return val;
      };
      if (!IsDigit(c) && c != '{') // 7
      {
        throw format_error("Invalid format "
          "specification for Point");
      }
      if (c == '{') // A
      {
        m_width_type = WidthType::Arg; // B
        ++iter;
        if ((c = get_char()) == '}') // C
        {
          m_width = parse_ctx.next_arg_id();
        }
        else // D
        {
          m_width = get_int();
          parse_ctx.check_arg_id(m_width);
        }
        ++iter;
      }
      else // E
      {
        m_width = get_int(); // 8
        m_width_type = WidthType::Literal;
      }
      if ((c = get_char()) != '}')
      {
        throw format_error("Invalid format "
          "specification for Point");
      }
      return iter;
    }
    auto format(const Point& p,
      format_context& format_ctx) const
    {
      if (m_width_type == WidthType::None)
      {
        return
          format_to(std::move(format_ctx.out()),
            "{0}{2}{1}", p.x(), p.y(), m_sep);
      }
      if (m_width_type == WidthType::Arg) // F
      {
        m_width = get_arg_value(format_ctx,
          m_width);
      }
      return format_to(std::move(format_ctx.out()),
        "{0:{2}}{3}{1:{2}}", p.x(), p.y(), m_width,
        m_sep);
    }
  private:
    int get_arg_value(format_context& format_ctx,
      int arg_num) const // G
    {
      auto arg = format_ctx.arg(arg_num); // H
      if (!arg)
      {
        string err;
        back_insert_iterator<string> out(err);
        format_to(out, "Argument with id {} not "
          "found for Point", arg_num);
        throw format_error(err);
      }
      int width = visit_format_arg([]
        (auto value) -> int { // I
          if constexpr (
            !is_integral_v<decltype(value)>)
          {
            throw format_error("Width is not "
              "integral for Point");
          }
          else if (value < 0
            || value > numeric_limits<int>::max())
          {
            throw format_error("Invalid width for "
              "Point");
          }
          else
          {
            return value;
          }
        }, arg);
      return width;
    }
  private:
    mutable char m_sep = ',';
    enum WidthType { None, Literal, Arg };
    mutable WidthType m_width_type
      = WidthType::None;
    mutable int m_width = 0;
  };
  int main()
  {
    Point p1(1, 2);
    cout << format(
      "[{0}] [{0:-}] [{0:4}] [{0:{1}}]\n", p1, 4);
    cout << format(
      "With automatic indexing: [{:{}}]\n", p1, 4);
    try
    {
      // Argument index 2 does not exist, so this throws.
      // Current library versions require lvalue arguments
      // for make_format_args, hence the named variable.
      int width = 4;
      cout << vformat("[{0:{2}}]\n",
        std::make_format_args(p1, width));
    }
    catch (format_error& fe)
    {
      cout << format("Caught exception: {}\n",
        fe.what());
    }
  }
Listing 5

Nested replacement fields
The standard format-spec allows you to use nested replacement fields for the width and prec fields. If your format-spec also allows nested replacement fields, the basic_format_parse_context class has a couple of functions to support their use: next_arg_id and check_arg_id. They are used in the parse function for Listing 5, and a description of what they do will be given in that section.

The parse function
The first change in the parse function is on line 3. As can be seen, in the new version, it has to check for the { character as well as for a digit when checking if a width has been specified. This is because the dynamic width is specified using a nested replacement field, which starts with a { character.
The next difference is in line 7, where we again need to check for a { character as well as a digit to make sure we have a width specified.
The major change to this function starts at line A. This if-statement checks if the next character is a {, which indicates we have a nested replacement field. If the test passes, line B marks that we need to read the width from an argument, and then we proceed to work out what the argument index is.
The if-statement in line C checks if the next character is a }, which means we are using automatic indexing mode. If the test passes, we call the next_arg_id function on parse_ctx to get the argument number. That function first checks if manual indexing mode is in effect, and if it is, it throws a format_error exception, as you cannot mix manual and automatic indexing. Otherwise, it enters automatic indexing mode and returns the next argument index, which in this case is assigned to the m_width variable.
If the check in line C fails, we enter the else-block at line D to do manual indexing. We get the argument number by calling get_int, and then we call the check_arg_id function on parse_ctx. The function checks if automatic indexing mode is in effect, and if so it throws a format_error exception. If automatic indexing mode is not in effect then check_arg_id enters manual indexing mode.
The else-block starting at line E just handles the case where we have a literal width specified in the format-spec, and is identical to the code starting at line 8 in Listing 4.
Note that when used at compile time, next_arg_id or check_arg_id check that the argument id returned (for next_arg_id) or supplied (for
check_arg_id) is within the range of the arguments, and if not will fail to compile. However, this is not done when called at runtime.

The format function
The changes to the format function are just the addition of the if-statement starting at line F. This checks if we need to read the width value from an argument, and if so it calls the get_arg_value function to get the value and assign it to the m_width variable, so the format_to call following can use it.

The get_arg_value function
The get_arg_value function, defined starting at line G, does the work of actually fetching the width value from the argument list.
Line H tries to fetch the argument from the argument list. If the argument number does not represent an argument in the list, it returns a default constructed value. The following if-statement checks for this, and reports the error if required. Note that in your own code you might want to disable or remove any such checks from production builds, but have them in debug/testing builds.
If the argument is picked up correctly, line I uses the visit_format_arg function to apply the lambda function to the argument value picked up in line H. The visit_format_arg function is part of the std::format API. The lambda function checks that the value passed is of the correct type – in this case, an integral type – and that its value is in the allowed range. Failure in either case results in a format_error exception. Otherwise, the lambda returns the value passed in, which is used as the width.

Summary
We have seen how to add a formatter for a user-defined class, and gone as far as allowing the user to specify certain behaviour (in our case the width) at runtime. We will stop at this point as we’ve demonstrated what is required, but there is no reason why a real-life Point class couldn’t have further formatting abilities added.
In the next article in the series, we will explain how you can write a formatter for a container class, or any other class where the types of some elements of the class can be specified by the user.

Appendix: Simple mini-language guidelines
As noted when initially describing the parse function of the formatters, the format-spec you parse is created using a mini-language, the design of which you have full control over. This appendix offers some simple guidelines to the design of your mini-language.
Before giving the guidelines, I’d like to introduce some terminology. These are not ‘official’ terms but hopefully will make sense.
An element of a mini-language is a self-contained set of characters that perform a single function. In the standard format-spec most elements are single characters, except for the width and prec values, and the combination of fill and align.
An introducer is a character that says the following characters make up a particular element. In the standard format-spec the ‘.’ at the start of the prec element is an introducer.
Remember, the following are guidelines, not rules. Feel free to bend or break them if you think you have a good reason for doing so.

Enable a sensible default
It should be possible to use an empty format-spec and obtain sensible output for your type. Then the user can just write {} in the format string and get valid output. Effectively this means that every element of your mini-language should be optional, and have a sensible default.

Shorter is better
Your users are going to be using the mini-language each time they want to do non-default outputting of your type. Using single characters for the elements of the language is going to be a lot easier to use than having to type whole words.

Keep it simple
Similar to the above, avoid having complicated constructions or interactions between different elements in your mini-language. A simple interaction, like in the standard format-spec where giving an align element causes any subsequent ‘0’ to be ignored, is fine, but having multiple elements interacting or controlling others is going to lead to confusion.

Make it single pass
It should be possible to parse the mini-language in a single pass. Don’t have any constructions which necessitate going over the format-spec more than once. This should be helped by following the guideline above to ‘Keep it simple’. This is as much for ease of programming the parse function as it is for ease of writing format-specs.

Avoid ambiguity
If it is possible for two elements in your mini-language to look alike then you have an ambiguity. If you cannot avoid this, you need a way to make the second element distinguishable from the first.
For instance, in the standard format-spec, the width and prec elements are both integer numbers, but the prec element has ‘.’ as an introducer so you can always tell what it is, even if no width is specified.

Use nested-replacement fields like the standard ones
If it makes sense to allow some elements (or parts of elements) to be specified at run-time, use nested replacement fields that look like the ones in the standard format-spec to specify them, i.e. { and } around an optional number.

Avoid braces
Other than in nested replacement fields, avoid using braces ({ and }) in your mini-language, except in special circumstances.

References
[Collyer21] Spencer Collyer (2021) ‘C++20 Text Formatting – An Introduction’ in Overload 166, December 2021, available at: https://accu.org/journals/overload/29/166/collyer/
[CppRef] std::formatter<std::chrono::sys_time>: https://en.cppreference.com/w/cpp/chrono/system_clock/formatter
[P2216] P2216R3 – std::format improvements, Victor Zverovich, 5 Feb 2021, https://wg21.link/P2216
[P2918] P2918R2 – Runtime format strings II, Victor Zverovich, 7 Nov 2023, https://wg21.link/P2918

Judgment Day
What if AI takes your job?
Teedy Deigh finds out.

Teedy Deigh
Teedy says she’s been dealing with artificial intelligence her whole career, that many of her colleagues qualify and are not as smart as they make themselves out to be, (deeply) faking and (heavily) bluffing their way through codebases, technologies and business decisions, playing an imitation game informed by Stack Overflow, hype cycles and group think, and that it’s not imposter syndrome if they are actually imposters.

TD what?
MD I’ve been trying to get in touch.
TD i know
   got the same desperate msg from you on a dozen platforms
   repeated enough times to buffer overflow
   you even left voicemail msgs
   who even uses phones for that anymore?
   and all before a reasonable person’s had the chance to have a 4th coffee
   so what’s app?
MD We have a problem and we need your help.
TD i don’t work for you any more
MD But we’ve got a problem.
TD you fired all the developers just over 2 weeks ago
MD It’s serious.
TD so was firing all the developers
MD We had no choice. Our new AI-only development strategy was approved by the board. We followed through. There’s no turning back. We’re embracing the future.
TD who proposed the strategy?
MD That’s not important.
TD who proposed the strategy?
MD I did. But it was based on a thorough study and supported by a number of others.
TD who?
MD Some managers, the finance department, marketing, HR and C-level execs.
TD C-level?
   sounds like you went overboard
   you involve any techies?
MD Yes, a couple of senior architects did the study.
TD i meant bit wranglers not hand wavers
MD You mean developers?
   Of course not! That’s like getting turkeys to vote on Xmas.
TD seriously WTF?!
MD Sorry about that. Sensitivity training’s not booked until next month.
   Anyway, the architects said lots of technical things that sounded very impressive and quite persuasive.
   That all you need are product owners describing the functionality and architects filling in some technical bits, the non-functional stuff.
   AI generates all the code.
   They called it the Skynet strategy, for some reason, and said it would terminate our need for developers.
TD oh I know which architects you mean
   ‘non-functional’ is definitely the right description
   that ‘thorough study’ means they saw a couple of videos, read some press releases and spent the rest of the day binge-watching classic sci-fi
MD I’m sure they were more thorough than that.
TD fraid not
   been dealing with their ‘architectures’ for years
   me and the other devs had sweepstakes bout what was gonna come up
   both the questionable technical choices and the movie refs
MD Movie references?
TD plus we kept a repo of ADRs to deal with their decisions
MD ADRs?
TD Architecture Denial Records
   ways of working around and avoiding the official architecture
   TBH might’ve been the most enjoyable and creative part of my job
MD I found their presentations compelling and insightful.
TD that’s not how you spell inciteful
   your predecessor made them architects to keep them out of the code
   reckoned they couldn’t do as much damage with PowerPoint marketecture
   guess we now know that wasn’t true
MD Which is why I’m contacting you.
   It’s not working.
TD what’s not working?
MD It. You know. The software. The stuff you develop.
TD developed
MD Whatever. It’s not working. After the last sprint things started going wrong, and it’s all blown up this morning.
TD when you say last sprint you mean the first sprint using 100% LLM-based codegen?
MD Yes, and we don’t understand what’s wrong. I’ve been told all the tests are passing.
TD which tests?
MD The ones generated by the AI.
TD has anyone looked at the code?
MD Yes, the architects.
TD what did they say?
MD They shrugged and said ‘LGTM’, if I recall correctly. Not quite sure what they meant.
TD when a dev uses LGTM it means they couldn’t be bothered to look through it
   when an architect uses LGTM it means they haven’t a clue
   basically your CI/CD pipeline is now a GIGO pipeline
MD Is that bad?
TD very
MD I also overheard them later on being concerned about someone called Ellie.
TD that would probably be ELE
   Extinction Level Event
MD What does that mean?
TD they were probably talking about the deep impact on the company’s prospects
MD This is even worse than I thought!
TD perhaps your product owners could have a go at fixing things
   i mean it’s their code right?
MD They just told the AI what they wanted it to do.
TD did they precisely and rigorously specify what they wanted?
MD They’re product owners, what do you think?
TD ah
   guess that also means they didn’t check the results or specify at a high-level of detail?
MD Do they need to do that? It seems like a lot of work. I thought they just needed to nudge the AI and it would all work.
TD ‘prompt’ not ‘nudge’
   you need to be very detailed and very precise and to pay a lot of attention
   and then you do the nudging
   (and often quite a lot of shoving)
   if not, it’s no better than telling your cat you farted
MD I don’t recall all this stuff about ‘precision’, ‘rigour’, ‘detail’ and ‘checking’ being mentioned in the study. Is this what they call ‘prompt engineering’?
TD it’s what we call programming
   tell you what
   i’ll help you sort out this mess if you give me my old job back
MD We can’t do that. There’s no software development department anymore. We let it go, and the budget for software is frozen.
TD well that’s all very Disney of you but no job means no help
   to be clear
   what you need is someone to correctly specify, verify, adapt and adjust prompts?
MD Exactly.
TD that would be like a product owner right?
MD Yes.
   I see.
   We have hiring capacity for POs. But that would mean hiring you back at a higher pay grade than when you were a software developer.
TD i have no problem with that
   and as a senior PO i’d be able to take advantage of this (re)hiring capacity yes?
MD Wait, why would you be senior?
TD you need a PO with the specific ability to be specific in a way that is correct?
   that seems to be a higher grade of ability than the other POs
MD That’s true.
TD and you have a (very very) big problem that needs to be solved asap
MD That’s also true.
TD just to check: senior PO is higher up the hierarchy than senior architect?
MD Correct.
TD then i accept
   pls tell the architects i’ll be back