Just Call Me Johnny 262-Seed
Assumed audience: web developers and web platform implementers

Like all good folklore, this tale starts with conformance tests. Test262 is the official test suite for the JavaScript programming language. From browsers to servers and even embedded systems, lots of projects “interpret” JavaScript, and Test262 is one way they make sure they’re doing it right.
In 2014, there were a bunch of new features coming to the language, and the test suite needed a lot of work to include tests for all the new stuff. My day job at Bocoup gave me the amazing opportunity to participate in that effort. As a result, Test262’s reputation improved throughout the web industry. Secondarily, I developed an understanding of how the project could be improved still further.
I’ve been maintaining a project called JSHint for many years now. JSHint accepts JavaScript code as input and looks for potential sources of mistakes. To do this, it doesn’t need to execute the code; it only needs to read it. In terms of software, the process of “reading without executing” is known as “parsing,” and it turns out that many projects parse JavaScript code.
Back at Test262, all those tests were written with JavaScript runtimes in mind, not JavaScript parsers. I’m still proud of the work Bocoup has done (and is doing), but back in 2014, I felt like we were missing a huge audience. If JavaScript parsers were also using the project, not only would they become more correct, but they would be better-positioned to contribute tests of their own, so everyone else would benefit, too.
That’s why I set out to disseminate Test262 among JavaScript parsers.
Refactoring
First things first: the tests needed to clearly state their “pass” and “fail” conditions. This might sound like an obvious requirement for a well-established suite like Test262, so why hadn’t we addressed it yet? The problem was that the expectations were expressed in a way that could only be verified by runtimes. For instance:
/*---
description: IdentifierReference to a non-existent binding produces a ReferenceError
negative: ReferenceError
---*/
thisVariableDoesNotExist;
This is a “negative” test because it expects an error to be reported. Loosely speaking, it verifies that a JavaScript ReferenceError is thrown whenever code tries to use an undefined variable. A JavaScript parser can’t reliably pass this test since it can’t reliably detect when a variable is defined.
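That first test is beyond a parser’s reach because the binding could be created by some other code before the statement runs. Here’s my own illustration of the problem (it’s not part of the test suite):

// If some earlier script happens to define the binding, the lookup
// succeeds and no ReferenceError occurs; only a runtime can know.
globalThis.thisVariableDoesNotExist = 1;

thisVariableDoesNotExist; // evaluates without error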
Compare that with a test like this:
/*---
description: Number values are not valid assignment targets
negative: ReferenceError
---*/
0 = 0;
This test also expects a ReferenceError, but unlike the one before it, it MUST be reported by a JavaScript parser.
In order to help its consumers make this distinction, Test262 needed to provide additional information about expected errors. Specifically, it needed to specify if the errors should occur while parsing or while executing. In that way, JavaScript parsers would know that the first test should be parsed without error (it is valid JavaScript, after all) and the second should produce a parsing error.
So after some public discussion and a few false starts, I extended the test metadata to specify the “phase” of the expected error. Put in these terms, that second test would look like this:
/*---
description: Number values are not valid assignment targets
negative:
phase: early
type: ReferenceError
---*/
0 = 0;
With this metadata in place, Test262 was ready for use by parsers!
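For a sense of how a parser might consume the new metadata, here is a minimal sketch of a test-running function. It assumes a test object that exposes the parsed YAML metadata as attrs and the source text as contents; those names, and the use of Acorn, are illustrative assumptions rather than anything Test262 prescribes.

// Minimal sketch: classify one Test262 test from a parser's perspective.
// The shape of "test" is an assumption for illustration purposes.
const acorn = require('acorn');

function runParserTest(test) {
  const expectsEarlyError =
    test.attrs.negative && test.attrs.negative.phase === 'early';

  try {
    // Parsers only need to read the source, never execute it.
    acorn.parse(test.contents, { ecmaVersion: 'latest' });
  } catch (error) {
    return expectsEarlyError ? 'PASS' : 'FAIL: unexpected ' + error.name;
  }

  return expectsEarlyError ? 'FAIL: expected a parsing error' : 'PASS';
}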
Dissemination
JSHint made for a good first choice since it’s a project I’ve been maintaining for a few years. Compared to designing and implementing spanning changes to the widely used Test262 project, adding a bit of testing infrastructure to JSHint was refreshingly straightforward. The most notable aspect was the number of tests the project was failing: a whopping 1,406 tests, or just under 6% of the test suite. We had our work cut out for us in JSHint, but that’s a topic for another post. For now, I felt confident bringing Test262 to other parsers.
Next up was the parser of the popular Babel tool. The reviewers were kind and courteous, and we had a solution merged in just under a month.
Both JSHint and Babel are Node.js-based projects, so there ended up being a lot in common between their integrations. Since there were still two more Node.js-based parsers on the list, I abstracted the shared code into a standalone library, test262-stream. It produces a Node.js object stream that emits test material for a given file system directory containing the Test262 repository.
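Usage looks roughly like this (reconstructed from the library’s documentation, so treat the property names as approximate):

const TestStream = require('test262-stream');

// Point the stream at a local checkout of the Test262 repository.
const stream = new TestStream('/path/to/test262');

stream.on('data', (test) => {
  // test.file     - path to the file from which the test was derived
  // test.scenario - e.g. "default" or "strict mode"
  // test.attrs    - the parsed YAML metadata
  // test.contents - the source text, with harness files included
  console.log(test.file, test.scenario);
});

stream.on('end', () => console.log('No more tests.'));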
The next parsers on the list were Esprima and Acorn, and test262-stream made the patches much more manageable. Although it can be tough to convince open source project maintainers to accept your feature branch, adding tests is decidedly easier. The Esprima maintainers helped merge my patch in short order, but the story with Acorn was a little more interesting.
Folks had noticed my contribution to Babel, and they had already repurposed it for use in Acorn. Only in the world of free software will you compete with your own work! Stranger still, the Acorn maintainers ultimately selected the other patch. I felt a little like Charlie Chaplin in the apocryphal Charlie Chaplin lookalike contest, but my ego could stand to take a hit or two. The important thing was that Acorn was using Test262.
More recently, I’ve been pitching Test262 integration to syntax highlighting projects. The idea is that in order to highlight JavaScript source code, these projects must first recognize each syntax fragment. I used to think that entailed following a subset of the language’s parsing rules, and that with still more metadata, Test262 could be used to validate these projects, too.
The thing is, you can’t tokenize JavaScript without parsing it. Any syntax highlighter that wants to be correct enough to satisfy Test262 (even a new version tailored to their needs) would need to implement a full-blown parser anyway. Most highlighting projects (and their consumers) will probably agree that lexing JavaScript is a lot like horseshoes and hand grenades: being “close” is good enough.
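To see why, consider the language’s famous ambiguity around the / character (my example, not one from Test262): the same characters form either a regular expression literal or division depending on the surrounding grammar, which only a parser can track.

var x, foo, g;

{} /foo/g;     // "{}" is a block statement, so "/foo/g" is a RegExp literal
x = {} /foo/g; // "{}" is an object literal, so this is x = ({} / foo) / g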
“Incredibly inconvenient”
Meanwhile! Other developers were starting to scrutinize Test262 with an eye toward parsing. They found a great many tests about parsing rules were structured in a way that only runtimes could use. Basically: the tests used eval.
Here’s an example:
/*---
description: with statement in strict mode throws SyntaxError
---*/
"use strict";
assert.throws(SyntaxError, function() {
  eval("with ({}) { throw new Error();}");
});
That’s a valid test, but the interesting parsing part is tied up in the string that’s being passed to eval, and parsers can’t do much with it. We could use the new extended metadata to rewrite the test like this:
/*---
description: with statement in strict mode throws SyntaxError
negative:
phase: parse
type: SyntaxError
---*/
"use strict";
with ({}) {}
In addition to being meaningful to parsers, this made the test more direct for runtimes (eval is a pretty complicated function).
Isiah Meadows was kind enough to provide an extensive list of candidates for refactoring, so I set out to remove the parser-hostile pattern from Test262.
This seemed to scream for a tool-assisted approach, so I experimented with some light (and ugly) scripts built from regular expressions. Unfortunately, applying them required a lot of oversight because only some of the tests were as straightforward as the example above. Many others were more like these tests for for statements: they were misnamed and needed to be relocated. Some were actually duplicates and had to be removed (e.g. these tests for line terminators). Still others were incomplete and needed to be extended in order to offer full coverage (e.g. these tests for the directive prologue).
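The scripts themselves don’t appear in this post, but the approach amounted to something like the following reconstruction, which flags eval-based negative tests for manual review (the search pattern is an assumption based on the earlier example):

// Rough reconstruction: list files that wrap their interesting syntax
// in a string passed to eval, so a human can plan the rewrite.
const fs = require('fs');

const pattern =
  /assert\.throws\(SyntaxError,\s*function\(\)\s*\{\s*eval\("([^"]*)"\);?\s*\}\);/;

for (const file of process.argv.slice(2)) {
  const source = fs.readFileSync(file, 'utf8');
  const match = pattern.exec(source);

  if (match) {
    console.log(file + ': ' + match[1]);
  }
}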
All told, the effort took over a year. Throughout, longtime Test262 maintainers Rick Waldron and Leo Balter performed the crucial yet unglamorous job of code review. These were some of the most boring patches I’ve ever written; their dullness could only be worse for a reviewer. All the same, Rick and Leo looked over each one without delay.
Refinements
In the years since enabling consumption by parsers, we’ve made a few improvements to the test structure. For one, we began inserting an unreachable throw statement at the top of negative syntax tests:
/*---
description: Invalid use of RegExp quantifier
negative:
phase: early
type: SyntaxError
---*/
throw "Test262: This statement should not be evaluated.";
/?/;
This doesn’t matter much for a parser because the code is invalid with or without the throw statement. The new statement ensures that runtimes reject the code without executing it. Believe it or not, that actually caught bugs.
Even today, V8 (the engine that powers the Google Chrome browser and Node.js) does not report invalid regular expression literals as “early” errors: it produces a runtime exception when the faulty literals are evaluated. Prior to this change, V8 had been passing tests like the one above. With the inclusion of the throw statement, it could no longer pass those tests.
Later, we recognized that one kind of “early” error is not necessarily detectable by parsers. When JavaScript code imports one or more modules, all of the modules are retrieved and parsed before any code is executed. If one module imports another, and that second module includes a syntax error, then a syntax error should be reported.
A test along these lines might look like this:
/*---
description: Syntax error during module resolution
negative:
phase: early
type: SyntaxError
---*/
import 'FIXTURE.js';
The code in this test is syntactically valid. The test only works if a file named FIXTURE.js is also available, and if that file has invalid syntax.
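For instance, FIXTURE.js might contain nothing more than a guaranteed parse error (hypothetical contents; the actual fixture files vary):

// FIXTURE.js (hypothetical): any syntax error will do.
var var;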
While JavaScript parsers could resolve module references and detect problems like this, none of them currently do, and more importantly, it’s not clear if they should. In recognition of this, we further expanded the test metadata by introducing a new phase for negative tests and named it “resolution”. This allows parsers to further refine the way they interpret the tests according to whether or not they resolve module references.
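Expressed with the new metadata, the test above would read:

/*---
description: Syntax error during module resolution
negative:
  phase: resolution
  type: SyntaxError
---*/
import 'FIXTURE.js';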
Conclusion
Today, all of Test262’s 70,000+ tests are useful to JavaScript runtimes and JavaScript parsers. Thanks to the conventions we established, all new tests satisfy these criteria as a matter of course. Thanks to the tooling we developed, more projects share the same infrastructure for interpreting the tests (the general-purpose test262-harness now relies on test262-stream, so even runtimes make use of the new project).
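For runtimes, that shared infrastructure is typically reached through the test262-harness command line, along these lines (the exact flags may differ between versions, so check the project’s README):

test262-harness --test262Dir=/path/to/test262 \
  --hostType=node --hostPath="$(which node)" \
  "test/language/**/*.js"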
More than any specific test change, these shifts in process have been the most satisfying. Although much of the work was pretty mundane, I’m grateful for the help I received from implementers and other web standards experts. They were consistently welcoming and enthusiastic, and that really helped me see this project through.
I’ve reached my goals for this project, but I still enjoy the work of integrating codebases with public test suites. Please contact me if you have a project that could benefit from Test262 (or from the web-platform-tests, for that matter). I’d love to help you get involved!