disallow keywords as identifiers
Summary:
In PHP and Hack, some keywords are reserved in the sense that they cannot be used as identifiers. For example, control flow keywords like `if` are reserved: when one is used as a function or class name, php or HPHPC issues an error message like "unexpected token T_IF". Other keywords, like `self`, `bool` or `attribute`, can usually be used as identifiers. To be more precise, this is what is allowed where:
* `function <name>() {}`: all non-reserved keywords are allowed for `<name>`
* `class <name> {}`: all non-reserved keywords are allowed for `<name>`, except for types like `bool`, in which case the error is not `Unexpected token T_BOOL` but `bool is reserved`.
* `const <name> = ...`: all keywords are allowed, even reserved ones like `if`
* `function <name>() {}` as a method definition: same, all keywords are allowed
* enums: all keywords are allowed
* as a consequence of these, all keywords, even reserved ones, are allowed after a `$`, `->` or `::` symbol, but in any other place where we used to call `next_token_as_name` (see below), we should actually not allow reserved keywords.
We used to have:
* `next_token` (get next token such that non-keyword words like `blah` have token kind `Name` while keywords like `bool` have the corresponding keyword token kind, like `Bool`)
* `next_token_as_name` (get next token, such that all words, including keywords, have token kind `Name` (but a symbol like `==>` would have token kind `Arrow` obviously))
So we now need a third alternative:
* `next_token_non_reserved_as_name`: get next token such that non-keyword words like `blah`, and non-reserved keywords like `bool`, have token kind `Name` while reserved keywords like `if` have the corresponding keyword token kind, like `If`.
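To make the difference between the three modes concrete, here is a minimal OCaml sketch. It is a toy model, not the real lexer: the `kind` type, the `keywords` table, the `mode` type and `classify` are all hypothetical names introduced for illustration, and only `if` and `bool` are modelled.

```ocaml
(* Toy model of the three lexing modes; not the real lexer. *)
type kind = Name | If | Bool | Arrow

(* Hypothetical keyword table: (text, keyword kind, is_reserved). *)
let keywords = [ ("if", If, true); ("bool", Bool, false) ]

(* One constructor per lexer entry point described above. *)
type mode = Default | Non_reserved_as_name | As_name

(* Classify one already-scanned piece of text under the given mode. *)
let classify mode text =
  if text = "==>" then Arrow (* symbols are never names *)
  else
    match List.find_opt (fun (t, _, _) -> t = text) keywords with
    | None -> Name (* ordinary word such as blah *)
    | Some (_, kind, reserved) -> (
        match mode with
        | Default -> kind (* every keyword keeps its kind *)
        | As_name -> Name (* every keyword becomes a name *)
        | Non_reserved_as_name -> if reserved then kind else Name)
```

Under these assumptions, `classify Non_reserved_as_name "bool"` yields `Name` while `classify Non_reserved_as_name "if"` yields `If`, which is the behaviour the new entry point is meant to add.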
Similarly, we used to have:
* `require_name`: get next token with `next_token` and check if it is a `Name`
* `require_name_allow_keyword`: get next token with `next_token_as_name` and check if it is a `Name` (IOW, just check we don't have a symbol with punctuation like `==>`)
We now need in addition:
* `require_name_allow_non_reserved`: get next token with `next_token_non_reserved_as_name` and check that we get a `Name`
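The three `require_*` helpers share one shape: lex a token in some mode, then insist that it is a `Name`. A rough OCaml sketch of that shape follows. The one-word lexer stand-ins and the generic `require` function are assumptions made for illustration (the real helpers operate on parser state, not on strings).

```ocaml
type kind = Name | If | Bool | Arrow

(* Hypothetical one-word stand-ins for the three lexer entry points. *)
let next_token = function
  | "if" -> If
  | "bool" -> Bool
  | "==>" -> Arrow
  | _ -> Name

let next_token_as_name = function "==>" -> Arrow | _ -> Name

let next_token_non_reserved_as_name = function
  | "if" -> If (* reserved: keeps its keyword kind *)
  | "==>" -> Arrow
  | _ -> Name (* plain words and non-reserved keywords *)

(* Shared shape of the require_* helpers: lex, then insist on a Name. *)
let require lex word =
  match lex word with
  | Name -> Ok word
  | _ -> Error ("expected a name, got " ^ word)

let require_name = require next_token
let require_name_allow_keyword = require next_token_as_name
let require_name_allow_non_reserved = require next_token_non_reserved_as_name
```

In this sketch, `bool` is rejected by `require_name` but accepted by `require_name_allow_non_reserved`, and `if` is accepted only by `require_name_allow_keyword`.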
Non-reserved keywords are flagged as `allowed_as_identifier` in `Full_fidelity_schema` (suggestions welcome for the name of this flag), so that the generated `Full_fidelity_token_kind.from_string` has additional guards. In terms of performance, this additional guard costs one boolean check each time a non-reserved keyword is lexed. Alternatively, we could simply run a full additional match on tokens each time we call `next_token_non_reserved_as_name`. It's not obvious to me which is less costly, nor whether that matters at all.
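For illustration, a guarded `from_string` could look roughly like the OCaml sketch below. This is a hand-written guess at the shape of the generated code, not a copy of it: the `~only_reserved` parameter name and the three-keyword table are assumptions.

```ocaml
type kind = If | Bool | Self

(* Hand-written approximation of a generated from_string with guards.
   Keywords flagged allowed_as_identifier get a `when` guard, so they are
   recognised as keywords only when the caller did not ask for reserved
   keywords only. None means "treat the text as a plain Name". *)
let from_string ~only_reserved = function
  | "if" -> Some If (* reserved: no guard *)
  | "bool" when not only_reserved -> Some Bool
  | "self" when not only_reserved -> Some Self
  | _ -> None
```

With this shape, the extra cost is the single `not only_reserved` test, paid only on the branches for non-reserved keywords; reserved keywords and ordinary words take the same path as before.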
Reviewed By: jamesjwu
Differential Revision: D8768546
fbshipit-source-id: dca0d04debba7a14127ee675911c8c5bba3ecf86