Error Message Style Guides of Various Languages
PyPy has been trying to produce good SyntaxErrors and other errors for a long time. CPython has also made an enormous push to `improve its SyntaxErrors in the last few releases`_. These improvements are great, but the process feels somewhat arbitrary sometimes. To see what other languages are doing, I asked people on Twitter whether they know of error message style guides for other programming languages.
Wonderfully, people answered me with lots of helpful links (full list at the end of the post), thank you everybody! All those sources are very interesting and contain many great points, I recommend reading them directly! In this post, I'll try to summarize some common themes or topics that I thought were particularly interesting.
Almost all guides stress the need for plain and simple English, as well as conciseness and clarity [Flix, Racket, Rust, Flow]. Flow suggests to put coding effort into making the grammar correct, for example in the case of plurals or to distinguish between "a" and "an".
The suggested tone should be friendly and neutral, the messages should not blame the Programmer [Flow]. Rust and Flix suggest to not use the term 'illegal' and use something like 'invalid' instead.
Flow suggests to avoid "compiler speak". For example terms like 'token' and 'identifier' should be avoided and terms that are more familiar to programmers be used (eg "name" is better). The Racket guide goes further and has a list of allowed technical terms and some prohibited terms.
Several guides (such as Flix and Flow) point out a 80/20 rule: 80% of the times an error message is read, the developer knows that message well and knows exactly what to do. For this use case it's important that the message is short. On the other hand, 20% of the times this same message will have to be understood by a developer who has never seen it before and is confused, and so the message needs to contain enough information to allow them to find out what is going on. So the error message needs to strike a balance between brevity and clarity.
The Racket guide proposes to use the following general structure for errors: 'State the constraint that was violated ("expected a"), followed by what was found instead.'
The Rust guides says to avoid "Did you mean?" and questions in general, and wants the compiler to instead be explicit about why something was suggested. The example the Rust guide gives is: 'Compare "did you mean: Foo" vs. "there is a struct with a similar name: Foo".' Racket goes further and forbids suggestions altogether because "Students will follow well‐meaning‐but‐wrong advice uncritically, if only because they have no reason to doubt the authoritative voice of the tool."
Formatting and Source Positions
The Rust guide suggests to put all identifiers into backticks (like in Markdown), Flow formats the error messages using full Markdown.
The Clang, Flow and Rust guides point out the importance of using precise source code spans to point to errors, which is especially important if the compiler information is used in the context of an IDE to show a red squiggly underline or some other highlighting. The spans should be as small as possible to point out the source of the error [Flow].
I am quite impressed how advanced and well-thought out the approaches are. I wonder whether it would makes sense for Python to adopt a (probably minimal, to get started) subset of these ideas as guidelines for its own errors.
More about the research that lead to the Racket guidelines (including the referenced papers): https://twitter.com/ShriramKMurthi/status/1451688982761381892
Elm's error message catalog: https://github.com/elm/error-message-catalog