This is the spec for the JavaScript syntax tree returned by jsparse.
Notably, every token from the source code appears in order in this syntax tree, which makes it
a Concrete Syntax Tree (CST) rather than an Abstract Syntax Tree (AST).
{{spacer}}
The tree consists of nodes and tokens. A node consists of a name followed
by zero or more child nodes or tokens. A token consists of
one or more characters from the source code.
For example, the following tree, which might have come from parsing 1 + 2,
contains nodes named "binary", "number", and "number"
and the tokens 1, +, and 2:
{{#nodespec}}["binary", ["number", `1`], `+`, ["number", `2`]]{{/nodespec}}
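Because every token appears in source order, a depth-first walk recovers the token sequence. The following is a minimal sketch, not jsparse's own API: it assumes, purely for illustration, that a node is a JavaScript array whose first element is the node name and that a token is a plain string. The objects jsparse actually returns may be shaped differently, and collectTokens is a hypothetical helper.

```javascript
// Assumption: nodes are arrays of the form [name, ...children] and
// tokens are plain strings. jsparse's real objects may differ.
function collectTokens(tree) {
  if (typeof tree === "string") {
    return [tree]; // a token contributes itself
  }
  // Skip the node name at index 0 and recurse into the children.
  return tree.slice(1).flatMap((child) => collectTokens(child));
}

const exampleTree = ["binary", ["number", "1"], "+", ["number", "2"]];
const tokens = collectTokens(exampleTree);
console.log(tokens.join(" ")); // 1 + 2
```

Nodes without tokens, such as a placeholder node, simply contribute nothing to the result.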
{{spacer}}
The following notation is used throughout this document to give schematic definitions of
the different types of nodes.
An italicized word refers to a list of one or more possible
nodes or tokens that could fill a certain spot. For example, the following is the
definition of the "binary" node:
{{#nodespec}}["binary", expression, binaryOp, expression]{{/nodespec}}
The expression spot could be filled by one of over 20 types
of expression nodes, and if you look up binaryOp you'll see
it refers to any of a list of binary operator tokens.
A vertical bar (|) separates alternatives, of which exactly
one must be present in the tree. An ellipsis (...) stands
for a sequence of zero or more of the preceding item. For example:
{{#nodespec}}["program", (statement | functionDecl), ...]{{/nodespec}}
This definition says that a program node contains zero or more items, each of which may be either a
statement or a functionDecl.
A question mark (?) indicates that the previous item
(or items enclosed in parentheses) may or may not be present.
Token classes like IDENTIFIER match any token in the class. The BOOLEAN, NUMBER, STRING, and REGEX classes are all literals. An example STRING would be "hello".
{{spacer}}
Some definitions with ellipses may seem strangely permissive, such as:
{{#nodespec}}["varStmnt", `var`, (varDecl | `,`), ..., semi]{{/nodespec}}
This seems to say that var; and var,,x, are valid statements.
The reason for this style of definition is to discourage reliance on exact comma
positions, which may occasionally vary between JavaScript implementations and versions.
For example, the 5th edition of ECMAScript allows a trailing comma in object literals whereas
the 3rd edition doesn't.
For most applications, the commas aren't that important, and you can read all the varDecls
by just skipping any commas you encounter.
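Skipping commas in this way might look like the following sketch. It assumes, for illustration only, that nodes are arrays of the form [name, ...children] and that tokens are plain strings; varDecls is a hypothetical helper, not part of jsparse.

```javascript
// Assumption: nodes are arrays [name, ...children]; tokens are strings.
function varDecls(varStmnt) {
  // Children are: `var`, then varDecl nodes and `,` tokens, then semi.
  // Keeping only the varDecl nodes ignores comma positions entirely.
  return varStmnt
    .slice(1)
    .filter((child) => Array.isArray(child) && child[0] === "varDecl");
}

// Hand-built tree for: var x = 1, y;
const varStatement = [
  "varStmnt", "var",
  ["varDecl", "x", "=", ["number", "1"]],
  ",",
  ["varDecl", "y"],
  ";",
];
const declaredNames = varDecls(varStatement).map((decl) => decl[1]);
console.log(declaredNames.join(", ")); // x, y
```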
{{#nodespec}}program:{{/nodespec}}
{{#nodespec}}["program", (statement | functionDecl), ...]{{/nodespec}}
{{#nodespec}}statement:{{/nodespec}}
{{#nodespec}}["expressionStmnt", expression, semi]{{/nodespec}}
Function calls and assignments are expressions.
{{#nodespec}}["emptyStmnt", `;`]{{/nodespec}}
Only an actual ; token can create an empty statement.
{{#nodespec}}["blockStmnt", `{`, statement, ..., `}`]{{/nodespec}}
{{#nodespec}}["varStmnt", `var`, (varDecl | `,`), ..., semi]{{/nodespec}}
You can assume at least one varDecl is present.
{{#nodespec}}["ifStmnt", `if`, `(`, expression, `)`, statement, (`else`, statement)?]{{/nodespec}}
{{#nodespec}}["whileStmnt", `while`, `(`, expression, `)`, statement]{{/nodespec}}
{{#nodespec}}["doStmnt", `do`, statement, `while`, `(`, expression, `)`, semi]{{/nodespec}}
{{#nodespec}}["forStmnt", `for`, `(`, forSpec, `)`, statement]{{/nodespec}}
{{#nodespec}}["returnStmnt", `return`, (expression | nil), semi]{{/nodespec}}
{{#nodespec}}["continueStmnt", `continue`, (IDENTIFIER | nil), semi]{{/nodespec}}
{{#nodespec}}["breakStmnt", `break`, (IDENTIFIER | nil), semi]{{/nodespec}}
{{#nodespec}}["throwStmnt", `throw`, expression, semi]{{/nodespec}}
{{#nodespec}}["withStmnt", `with`, `(`, expression, `)`, statement]{{/nodespec}}
{{#nodespec}}["switchStmnt", `switch`, `(`, expression, `)`, `{`, (case | default), ..., `}`]{{/nodespec}}
There's at most one default clause, but it can be anywhere in the list of cases and defaults.
{{#nodespec}}["tryStmnt", `try`, statement, (catch | nil), (finally | nil)]{{/nodespec}}
The statement is always a blockStmnt.
{{#nodespec}}["labelStmnt", IDENTIFIER, `:`, statement]{{/nodespec}}
{{#nodespec}}["debuggerStmnt", `debugger`, semi]{{/nodespec}}
{{#nodespec}}functionDecl:{{/nodespec}}
{{#nodespec}}["functionDecl", `function`, IDENTIFIER, `(`, (IDENTIFIER | `,`), ..., `)`, `{`, (statement | functionDecl), ..., `}`]{{/nodespec}}
Differs from a functionExpr only in that the function name is required.
{{#nodespec}}expression:{{/nodespec}}
{{#nodespec}}["this", `this`]{{/nodespec}}
{{#nodespec}}["null", `null`]{{/nodespec}}
{{#nodespec}}["number", NUMBER]{{/nodespec}}
{{#nodespec}}["boolean", BOOLEAN]{{/nodespec}}
{{#nodespec}}["regex", REGEX]{{/nodespec}}
{{#nodespec}}["string", STRING]{{/nodespec}}
{{#nodespec}}["identifier", IDENTIFIER]{{/nodespec}}
{{#nodespec}}["parens", `(`, expression, `)`]{{/nodespec}}
{{#nodespec}}["array", `[`, (expression | `,`), ..., `]`]{{/nodespec}}
All commas are significant because of element elision, and any
combination of commas and expressions is possible.
The array [,,,7,,8,,] has 7 and 8 at indices 3 and 5, making them its 4th and 6th elements.
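The index of each element can be recovered by counting commas, since every comma closes one slot whether or not an expression filled it. A minimal sketch, assuming for illustration that nodes are arrays of the form [name, ...children] and tokens are plain strings (arrayElements is a hypothetical helper):

```javascript
// Assumption: nodes are arrays [name, ...children]; tokens are strings.
function arrayElements(arrayNode) {
  // Drop the node name, the opening `[`, and the closing `]`.
  const items = arrayNode.slice(2, -1);
  const elements = []; // one { index, node } entry per expression present
  let index = 0;
  let slotFilled = false; // true if an expression occupies the current slot
  for (const item of items) {
    if (item === ",") {
      index++; // a comma ends the current slot, filled or not
      slotFilled = false;
    } else {
      elements.push({ index, node: item });
      slotFilled = true;
    }
  }
  // A final expression with no comma after it still occupies a slot.
  const length = slotFilled ? index + 1 : index;
  return { elements, length };
}

// Hand-built tree for: [,,,7,,8,,]
const sparse = [
  "array", "[", ",", ",", ",", ["number", "7"], ",", ",",
  ["number", "8"], ",", ",", "]",
];
const result = arrayElements(sparse);
// Yields indices 3 and 5, and an array length of 7.
console.log(result.elements.map((e) => e.index), result.length);
```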
{{#nodespec}}["object", `{`, (prop | `,`), ..., `}`]{{/nodespec}}
{{#nodespec}}["functionExpr", `function`, (IDENTIFIER | nil), `(`, (IDENTIFIER | `,`), ..., `)`, `{`, (statement | functionDecl), ..., `}`]{{/nodespec}}
{{#nodespec}}["dot", expression, `.`, identifierName]{{/nodespec}}
{{#nodespec}}["bracket", expression, `[`, expression, `]`]{{/nodespec}}
{{#nodespec}}["call", expression, `(`, (expression | `,`), ..., `)`]{{/nodespec}}
{{#nodespec}}["new", `new`, expression]{{/nodespec}}
{{#nodespec}}["newcall", `new`, expression, `(`, (expression | `,`), ..., `)`]{{/nodespec}}
{{#nodespec}}["unary", unaryOp, expression]{{/nodespec}}
{{#nodespec}}["binary", expression, binaryOp, expression]{{/nodespec}}
{{#nodespec}}["postfix", expression, postfixOp]{{/nodespec}}
{{#nodespec}}["ternary", expression, `?`, expression, `:`, expression]{{/nodespec}}
{{#nodespec}}["assignment", expression, assignmentOp, expression]{{/nodespec}}
There is no simple constraint on what the first expression can be.
A left-hand-side expression could be a simple identifier like foo,
or it could be any expression ending in a property access, such as foo().bar[baz].
Parentheses are also allowed around the left-hand side, so surprisingly ((x)) = 3
is completely legal and equivalent to x = 3. This is because
JavaScript evaluates the left-hand side all the way down to the final variable reference.
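If you need the underlying target of an assignment, one approach is to peel off any enclosing "parens" nodes. This is a sketch under the illustrative assumption that nodes are arrays of the form [name, ...children] and tokens are plain strings; assignmentTarget is a hypothetical helper.

```javascript
// Assumption: nodes are arrays [name, ...children]; tokens are strings.
function assignmentTarget(assignment) {
  let target = assignment[1]; // the left-hand expression
  while (target[0] === "parens") {
    target = target[2]; // ["parens", `(`, expression, `)`]
  }
  return target;
}

// Hand-built tree for: ((x)) = 3
const parenAssign = [
  "assignment",
  ["parens", "(", ["parens", "(", ["identifier", "x"], ")"], ")"],
  "=",
  ["number", "3"],
];
const target = assignmentTarget(parenAssign);
console.log(target[0], target[1]); // identifier x
```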
{{#nodespec}}["comma", (expression | `,`), ...]{{/nodespec}}
You can assume there are at least two expressions, since there would have been no comma otherwise.
{{#nodespec}}nil:{{/nodespec}}
{{#nodespec}}["nil"]{{/nodespec}}
Serves as a placeholder for optional parts of nodes.
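Because the placeholder keeps child positions fixed, an optional part can be read at a known index and then checked. A small sketch for returnStmnt, assuming for illustration that nodes are arrays of the form [name, ...children] and tokens are plain strings (returnValue is a hypothetical helper):

```javascript
// Assumption: nodes are arrays [name, ...children]; tokens are strings.
function returnValue(returnStmnt) {
  // ["returnStmnt", `return`, (expression | nil), semi]
  const value = returnStmnt[2];
  return value[0] === "nil" ? null : value;
}

// Hand-built trees for: return;  and  return 42;
const bareReturn = ["returnStmnt", "return", ["nil"], ";"];
const valueReturn = ["returnStmnt", "return", ["number", "42"], ";"];
console.log(returnValue(bareReturn)); // null
console.log(returnValue(valueReturn)[0]); // number
```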
{{#nodespec}}semi:{{/nodespec}}
{{#nodespec}}[";"] | `;`{{/nodespec}}
An optional semicolon at the end of a statement. Most semicolons in JavaScript can legally
be omitted if a line break follows. If the semicolon was omitted, a ";" node takes its place.
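The two cases can be told apart by checking whether the last child of a statement is a token or a node. A minimal sketch, assuming for illustration that nodes are arrays of the form [name, ...children] and tokens are plain strings (semicolonOmitted is a hypothetical helper, meant only for statements whose definition ends in semi):

```javascript
// Assumption: nodes are arrays [name, ...children]; tokens are strings.
function semicolonOmitted(statement) {
  const last = statement[statement.length - 1];
  // An omitted semicolon appears as a childless node named ";",
  // whereas a written semicolon appears as the token itself.
  return Array.isArray(last) && last[0] === ";";
}

const written = ["expressionStmnt", ["identifier", "x"], ";"];
const omitted = ["expressionStmnt", ["identifier", "x"], [";"]];
console.log(semicolonOmitted(written)); // false
console.log(semicolonOmitted(omitted)); // true
```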
{{#nodespec}}prop:{{/nodespec}}
{{#nodespec}}["prop", propertyName, `:`, expression]{{/nodespec}}
{{#nodespec}}propertyName:{{/nodespec}}
{{#nodespec}}["idPropName", identifierName]{{/nodespec}}
{{#nodespec}}["strPropName", STRING]{{/nodespec}}
{{#nodespec}}["numPropName", NUMBER]{{/nodespec}}
A property name in an object literal can be given as a bare identifier, a quoted string literal,
or a number literal. These nodes indicate which it is.
{{#nodespec}}varDecl:{{/nodespec}}
{{#nodespec}}["varDecl", IDENTIFIER, (`=`, expression)?]{{/nodespec}}
{{#nodespec}}forSpec:{{/nodespec}}
{{#nodespec}}["forSpec", (expression | nil), `;`, (expression | nil), `;`, (expression | nil)]{{/nodespec}}
{{#nodespec}}["forVarSpec", `var`, (varDecl | `,`), ..., `;`, (expression | nil), `;`, (expression | nil)]{{/nodespec}}
{{#nodespec}}["forInSpec", expression, `in`, expression]{{/nodespec}}
{{#nodespec}}["forVarInSpec", `var`, varDecl, `in`, expression]{{/nodespec}}
There are technically four types of for-loops in JavaScript.
These definitions suggest some odd possibilities, and interestingly, they work.
The form for (x.foo in y) will actually set x.foo each time through the loop,
and for (var x = bar() in y) will call bar() and assign the result to x first thing,
even if x is immediately overwritten before the first iteration of the loop.
The semicolons in forSpec and forVarSpec are mandatory, even at a line break.
{{#nodespec}}case:{{/nodespec}}
{{#nodespec}}["case", `case`, expression, `:`, statement, ...]{{/nodespec}}
{{#nodespec}}default:{{/nodespec}}
{{#nodespec}}["default", `default`, `:`, statement, ...]{{/nodespec}}
{{#nodespec}}catch:{{/nodespec}}
{{#nodespec}}["catch", `catch`, `(`, IDENTIFIER, `)`, statement]{{/nodespec}}
The statement is always a blockStmnt.
{{#nodespec}}finally:{{/nodespec}}
{{#nodespec}}["finally", `finally`, statement]{{/nodespec}}
The statement is always a blockStmnt.
{{#nodespec}}identifierName:{{/nodespec}}
{{#nodespec}}(IDENTIFIER | KEYWORD | BOOLEAN | `null`){{/nodespec}}
As of ECMAScript 5th edition, keywords and reserved words can be used in some places
where an identifier is expected. For example, x.return() or {true: 'yes'}.
{{#nodespec}}unaryOp:{{/nodespec}}
{{#nodespec}}(`delete` | `void` | `typeof` | `++` | `--` | `+` | `-` | `~` | `!`){{/nodespec}}
{{#nodespec}}binaryOp:{{/nodespec}}
{{#nodespec}}(`*` | `/` | `%` | `+` | `-` | `<<` | `>>` | `>>>` | `<` | `>` | `<=` | `>=` | `instanceof` | `in` | `==` | `!=` | `===` | `!==` | `&` | `^` | `|` | `&&` | `||`){{/nodespec}}
{{#nodespec}}postfixOp:{{/nodespec}}
{{#nodespec}}(`++` | `--`){{/nodespec}}
{{#nodespec}}assignmentOp:{{/nodespec}}
{{#nodespec}}(`=` | `*=` | `/=` | `%=` | `+=` | `-=` | `<<=` | `>>=` | `>>>=` | `&=` | `^=` | `|=`){{/nodespec}}