Just Call Me Johnny 262-Seed
Assumed audience: web developers and web platform implementers

Like all good folklore, this tale starts with conformance tests. Test262 is the official test suite for the JavaScript programming language. From browsers to servers and even embedded systems, lots of projects “interpret” JavaScript, and Test262 is one way they make sure they’re doing it right.
In 2014, there were a bunch of new features coming to the language, and the test suite needed a lot of work to include tests for all the new stuff. My day job at Bocoup gave me the amazing opportunity to participate in that effort. As a result, Test262’s reputation improved throughout the web industry. Secondarily, I developed an understanding of how the project could be improved still further.
I’ve been maintaining a project called JSHint for many years now. JSHint accepts JavaScript code as input and looks for potential sources of mistakes. To do this, it doesn’t need to execute the code; it only needs to read it. In terms of software, the process of “reading without executing” is known as “parsing,” and it turns out that many projects parse JavaScript code.
Back at Test262, all those tests were written with JavaScript runtimes in mind, not JavaScript parsers. I’m still proud of the work Bocoup has done (and is doing), but back in 2014, I felt like we were missing a huge audience. If JavaScript parsers were also using the project, not only would they become more correct, but they would be better-positioned to contribute tests of their own, so everyone else would benefit, too.
That’s why I set out to disseminate Test262 among JavaScript parsers.
Refactoring
First things first: the tests needed to clearly state their “pass” and “fail” conditions. This might sound like an obvious requirement for a well-established suite like Test262, so why hadn’t we addressed it yet? The problem was that the expectations were expressed in a way that could only be verified by runtimes. For instance:
/*---
description: IdentifierReference to a non-existent binding produces a ReferenceError
negative: ReferenceError
---*/
thisVariableDoesNotExist;
This is a “negative” test because it expects an error to be reported. Loosely speaking, it verifies that a JavaScript ReferenceError is thrown whenever code tries to use an undefined variable. A JavaScript parser can’t reliably pass this test since it can’t reliably detect when a variable is defined.
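That first test is beyond a parser’s reach because the binding could be created by some other code before the statement runs. Here’s my own illustration of the problem (it’s not part of the test suite):

// If some earlier script happens to define the binding, the lookup
// succeeds and no ReferenceError occurs; only a runtime can know.
globalThis.thisVariableDoesNotExist = 1;

thisVariableDoesNotExist; // evaluates without error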
Compare that with a test like this:
/*---
description: Number values are not valid assignment targets
negative: ReferenceError
---*/
0 = 0;
This test also expects a ReferenceError, but unlike the one before it, it MUST be reported by a JavaScript parser.
In order to help its consumers make this distinction, Test262 needed to provide additional information about expected errors. Specifically, it needed to specify if the errors should occur while parsing or while executing. In that way, JavaScript parsers would know that the first test should be parsed without error (it is valid JavaScript, after all) and the second should produce a parsing error.
So after some public discussion and a few false starts, I extended the test metadata to specify the “phase” of the expected error. Put in these terms, that second test would look like this:
/*---
description: Number values are not valid assignment targets
negative:
phase: early
type: ReferenceError
---*/
0 = 0;
With this metadata in place, Test262 was ready for use by parsers!
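For a sense of how a parser might consume the new metadata, here is a minimal sketch of a test-running function. It assumes a test object that exposes the parsed YAML metadata as attrs and the source text as contents; those names, and the use of Acorn, are illustrative assumptions rather than anything Test262 prescribes.

// Minimal sketch: classify one Test262 test from a parser's perspective.
// The shape of "test" is an assumption for illustration purposes.
const acorn = require('acorn');

function runParserTest(test) {
  const expectsEarlyError =
    test.attrs.negative && test.attrs.negative.phase === 'early';

  try {
    // Parsers only need to read the source, never execute it.
    acorn.parse(test.contents, { ecmaVersion: 'latest' });
  } catch (error) {
    return expectsEarlyError ? 'PASS' : 'FAIL: unexpected ' + error.name;
  }

  return expectsEarlyError ? 'FAIL: expected a parsing error' : 'PASS';
}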
Dissemination
JSHint made for a good first choice since it’s a project I’ve been maintaining for a few years. Compared to designing and implementing spanning changes to the widely used Test262 project, adding a bit of testing infrastructure to JSHint was refreshingly straightforward. The most notable aspect was the number of tests the project was failing: a whopping 1,406 tests, or just under 6% of the test suite. We had our work cut out for us in JSHint, but that’s a topic for another post. For now, I felt confident bringing Test262 to other parsers.
Next up was the parser of the popular Babel tool. The reviewers were kind and courteous, and we had a solution merged in just under a month.
Both JSHint and Babel are Node.js-based projects, so there ended up being a lot in common between their integrations. Since there were still two more Node.js-based parsers on the list, I abstracted the shared code into a standalone library, test262-stream. It produces a Node.js object stream that emits test material for a given file system directory containing the Test262 repository.
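Usage looks roughly like this (reconstructed from the library’s documentation, so treat the property names as approximate):

const TestStream = require('test262-stream');

// Point the stream at a local checkout of the Test262 repository.
const stream = new TestStream('/path/to/test262');

stream.on('data', (test) => {
  // test.file     - path to the file from which the test was derived
  // test.scenario - e.g. "default" or "strict mode"
  // test.attrs    - the parsed YAML metadata
  // test.contents - the source text, with harness files included
  console.log(test.file, test.scenario);
});

stream.on('end', () => console.log('No more tests.'));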
The next parsers on the list were Esprima and Acorn, and test262-stream made the patches much more manageable. Although it can be tough to convince open source project maintainers to accept your feature branch, adding tests is decidedly easier. The Esprima maintainers helped merge my patch in short order, but the story with Acorn was a little more interesting.
Folks had noticed my contribution to Babel, and they had already repurposed it for use in Acorn. Only in the world of free software will you compete with your own work! Stranger still, the Acorn maintainers ultimately selected the other patch. I felt a little like Charlie Chaplin in the apocryphal Charlie Chaplin lookalike contest, but my ego could stand to take a hit or two. The important thing was that Acorn was using Test262.
More recently, I’ve been pitching Test262 integration to syntax highlighting projects. The idea is that in order to highlight JavaScript source code, these projects must first recognize each syntax fragment. I used to think that entailed following a subset of the language’s parsing rules, and that with still more metadata, Test262 could be used to validate these projects, too.
The thing is, you can’t tokenize JavaScript without parsing it. Any syntax highlighter that wants to be correct enough to satisfy Test262 (even a new version tailored to their needs) would need to implement a full-blown parser anyway. Most highlighting projects (and their consumers) will probably agree that lexing JavaScript is a lot like horseshoes and hand grenades: being “close” is good enough.
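To see why, consider the language’s famous ambiguity around the / character (my example, not one from Test262): the same characters form either a regular expression literal or division depending on the surrounding grammar, which only a parser can track.

var x, foo, g;

{} /foo/g;     // "{}" is a block statement, so "/foo/g" is a RegExp literal
x = {} /foo/g; // "{}" is an object literal, so this is x = ({} / foo) / g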
“Incredibly inconvenient”
Meanwhile! Other developers were starting to scrutinize Test262 with an eye toward parsing. They found a great many tests about parsing rules were structured in a way that only runtimes could use. Basically: the tests used eval.
Here’s an example:
/*---
description: with statement in strict mode throws SyntaxError
---*/
"use strict";
assert.throws(SyntaxError, function() {
  eval("with ({}) { throw new Error();}");
});
That’s a valid test, but the interesting parsing part is tied up in the string that’s being passed to eval, and parsers can’t do much with it. We could use the new extended metadata to rewrite the test like this:
/*---
description: with statement in strict mode throws SyntaxError
negative:
phase: parse
type: SyntaxError
---*/
"use strict";
with ({}) {}
In addition to being meaningful to parsers, this made the test more direct for runtimes (eval is a pretty complicated function).
Isiah Meadows was kind enough to provide an extensive list of candidates for refactoring, so I set out to remove the parser-hostile pattern from Test262.
This seemed to scream for a tool-assisted approach, so I experimented with some light (and ugly) scripts built from regular expressions. Unfortunately, applying them required a lot of oversight because only some of the tests were as straightforward as the example above. Many others were more like these tests for for statements: they were misnamed and needed to be relocated. Some were actually duplicates and had to be removed (e.g. these tests for line terminators). Still others were incomplete and needed to be extended in order to offer full coverage (e.g. these tests for the directive prologue).
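The scripts themselves don’t appear in this post, but the approach amounted to something like the following reconstruction, which flags eval-based negative tests for manual review (the search pattern is an assumption based on the earlier example):

// Rough reconstruction: list files that wrap their interesting syntax
// in a string passed to eval, so a human can plan the rewrite.
const fs = require('fs');

const pattern =
  /assert\.throws\(SyntaxError,\s*function\(\)\s*\{\s*eval\("([^"]*)"\);?\s*\}\);/;

for (const file of process.argv.slice(2)) {
  const source = fs.readFileSync(file, 'utf8');
  const match = pattern.exec(source);

  if (match) {
    console.log(file + ': ' + match[1]);
  }
}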
All told, the effort took over a year. Throughout, longtime Test262 maintainers Rick Waldron and Leo Balter performed the crucial yet unglamorous job of code review. These were some of the most boring patches I’ve ever written; their dullness could only be worse for a reviewer. All the same, Rick and Leo looked over each one without delay.
Refinements
In the years since enabling consumption by parsers, we’ve made a few improvements to the test structure. For one, we began inserting an unreachable throw statement at the top of negative syntax tests:
/*---
description: Invalid use of RegExp quantifier
negative:
phase: early
type: SyntaxError
---*/
throw "Test262: This statement should not be evaluated.";
/?/;
This doesn’t matter much for a parser because the code is invalid with or without the throw statement. The new statement ensures that runtimes reject the code without executing it. Believe it or not, that actually caught bugs.
Even today, V8 (the engine that powers the Google Chrome browser and Node.js) does not report invalid regular expression literals as “early” errors: it produces a runtime exception when the faulty literals are evaluated. Prior to this change, V8 had been passing tests like the one above. With the inclusion of the throw statement, it could no longer pass those tests.
Later, we recognized that one kind of “early” error is not necessarily detectable by parsers. When JavaScript code imports one or more modules, all of the modules are retrieved and parsed before any code is executed. If one module imports another, and that second module includes a syntax error, then a syntax error should be reported.
A test along these lines might look like this:
/*---
description: Syntax error during module resolution
negative:
phase: early
type: SyntaxError
---*/
import 'FIXTURE.js';
The code in this test is syntactically valid. The test only works if a file named FIXTURE.js is also available, and if that file has invalid syntax.
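For instance, FIXTURE.js might contain nothing more than a guaranteed parse error (hypothetical contents; the actual fixture files vary):

// FIXTURE.js (hypothetical): any syntax error will do.
var var;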
While JavaScript parsers could resolve module references and detect problems like this, none of them currently do, and more importantly, it’s not clear if they should. In recognition of this, we further expanded the test metadata by introducing a new phase for negative tests and named it “resolution”. This allows parsers to further refine the way they interpret the tests according to whether or not they resolve module references.
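Expressed with the new metadata, the test above would read:

/*---
description: Syntax error during module resolution
negative:
  phase: resolution
  type: SyntaxError
---*/
import 'FIXTURE.js';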
Conclusion
Today, all of Test262’s 70,000+ tests are useful to JavaScript runtimes and JavaScript parsers. Thanks to the conventions we established, all new tests satisfy these criteria as a matter of course. Thanks to the tooling we developed, more projects share the same infrastructure for interpreting the tests (the general-purpose test262-harness now relies on test262-stream, so even runtimes make use of the new project).
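For runtimes, that shared infrastructure is typically reached through the test262-harness command line, along these lines (the exact flags may differ between versions, so check the project’s README):

test262-harness --test262Dir=/path/to/test262 \
  --hostType=node --hostPath="$(which node)" \
  "test/language/**/*.js"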
More than any specific test change, these shifts in process have been the most satisfying. Although much of the work was pretty mundane, I’m grateful for the help I received from implementers and other web standards experts. They were consistently welcoming and enthusiastic, and that really helped me see this project through.
I’ve reached my goals for this project, but I still enjoy the work of integrating codebases with public test suites. Please contact me if you have a project that could benefit from Test262 (or from the web-platform-tests, for that matter). I’d love to help you get involved!