How to handle left recursion in syntax highlighting?

I want to handle the optional type in syntax highlighting, shown below (expressed as a PEG-style grammar):

type -> optional_type | array_type | identifier
optional_type -> type "?"
array_type -> "[" type "]"
identifier -> regularExpression("[a-zA-Z_][a-zA-Z0-9_]*")

To parse optional_type, the highlighter first has to look ahead and find the trailing ?. I don’t know how to do that.

Are you asking how to do lookaheads with regular expressions? The regex lookahead pattern is (?=...) so to match type in type? the regex would be type(?=\?).

If you want to match all of type? but only capture type, then you could use (type)\?, and then use a capture syntax XML element like <capture number="1" name="mysyntax.type.optional" />. For an example, see Nova matching documentation.
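As a rough sketch of the lookahead variant, assuming type is a plain identifier matching the [a-zA-Z_][a-zA-Z0-9_]* pattern from your grammar: the first scope matches the identifier only when a ? follows, without consuming the ?, and the second scope then picks up the ? itself. The scope name mysyntax.operator.optional is made up for illustration, and I have not tested this in Nova.

```xml
<!-- Matches an identifier only if it is immediately followed by "?".
     The lookahead (?=\?) checks for the "?" but leaves it unmatched. -->
<scope name="mysyntax.type.optional.identifier">
  <expression>\b[a-zA-Z_][a-zA-Z0-9_]*(?=\?)</expression>
</scope>

<!-- Hypothetical scope name: highlights the "?" left over by the scope above. -->
<scope name="mysyntax.operator.optional">
  <expression>\?</expression>
</scope>
```

With the capture variant, a single scope consumes both the identifier and the ?, and the capture element names just the identifier part.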

If you don’t know about it yet, regex101.com is an incredibly useful tool for testing your regexes. It also has a reference in the lower right for finding the syntax for regex patterns.

It seems that this approach works only if type can be parsed by a regular expression.

But often, we want type to be a scope defined by a collection (defined in Nova syntax), so it can be recursively reused.

Yes, scopes are defined by regular expression matches. So for a scope you want to reuse, you put it in a separate collection. Here’s an example collection from what you have above:

<collection name="types">
  <scope name="mysyntax.type.optional">
    <expression>\b([a-zA-Z_][a-zA-Z0-9_]*)\?</expression>
    <capture number="1" name="mysyntax.type.optional.identifier" />
  </scope>
  <scope name="mysyntax.type.array">
    <expression>\[([a-zA-Z_][a-zA-Z0-9_]*)\]</expression>
    <capture number="1" name="mysyntax.type.array.identifier" />
  </scope>
  <scope name="mysyntax.type.identifier">
    <expression>(?&lt;!\[)\b[a-zA-Z_][a-zA-Z0-9_]*\b(?!\]|\?)</expression>
  </scope>
</collection>

You can then use this collection in the subscopes of another scope (one with starts-with and ends-with elements) through an include element. Say you want to create a variable definition scope for a syntax that looks like this:

var name1: SomeType = some expression;
var name2: [SomeType];

This scope may look like the following:

<scope name="mysyntax.definition.variable">
  <starts-with>
    <expression>\b(var)\s+([a-zA-Z_][a-zA-Z0-9_]*)(:)?\s*</expression>
    <capture number="1" name="mysyntax.keyword.construct.variable" />
    <capture number="2" name="mysyntax.identifier.name" />
  </starts-with>
  <ends-with>
    <expression>(\=)(?!\=)|(\;)</expression>
    <capture number="1" name="mysyntax.operator" />
    <capture number="2" name="mysyntax.semicolon" />
  </ends-with>
  <subscopes>
    <include syntax="self" collection="types" />
  </subscopes>
</scope>

The types collection can be reused/included in other scopes like function parameter types, return types, etc. Moreover, if this variable definition scope is in a collection, then that definition collection can be used in other subscopes to achieve the “recursive” behavior.
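To make the nesting concrete, here is a sketch of an alternative types collection that includes itself, so that something like [[SomeType]?] is highlighted level by level. It assumes Nova lets a collection include itself from a subscope, which I have not verified here. The scope name mysyntax.type.optional-marker is hypothetical. The postfix ? becomes its own small scope instead of a rule that begins with type, which sidesteps the left recursion in the grammar.

```xml
<collection name="types">
  <!-- "[" type "]": a bracketed region whose contents are parsed as types again -->
  <scope name="mysyntax.type.array">
    <starts-with>
      <expression>\[</expression>
    </starts-with>
    <ends-with>
      <expression>\]</expression>
    </ends-with>
    <subscopes>
      <!-- Assumption: a collection may include itself, giving the recursion -->
      <include syntax="self" collection="types" />
    </subscopes>
  </scope>

  <!-- Postfix "?" after any type (hypothetical scope name) -->
  <scope name="mysyntax.type.optional-marker">
    <expression>\?</expression>
  </scope>

  <!-- Plain identifier; no lookarounds needed, since brackets and "?" have their own scopes -->
  <scope name="mysyntax.type.identifier">
    <expression>\b[a-zA-Z_][a-zA-Z0-9_]*\b</expression>
  </scope>
</collection>
```

The trade-off is that this version no longer tells you whether a given identifier is optional or an array element; it only highlights the pieces. If you need distinct scope names for those cases, the regex-based scopes above are the simpler route for one level of nesting.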