Matlab anonymous functions and quantified instances of a scope

mcclurgm · November 21, 2020, 12:18am

Hi all,
I’m trying to implement a Matlab language definition for Nova. It’s mostly going fine, and I was able to implement a basic form that highlights my most-used syntax very quickly! However, now that I’m starting to experiment with more advanced syntax and scoping, I’m running into some conceptual issues.

I’ll describe an example of what I’m trying to accomplish first. Matlab has an anonymous function syntax (much like Python’s lambda or Mathematica’s #...& syntax). It looks like this:

@(x,y) x*y

The @ operator starts the syntax form, immediately followed by the list of arguments in parentheses ((x,y) in this example), then a single expression as the body.

Here’s my current implementation (which neither works nor is generic enough to be useful):

<scope name="matlab.expression.anonymous-function">
    <symbol type="function" anonymous="true">
        <context behavior="subtree" />
    </symbol>
    <starts-with>
        <expression>@(?=\()</expression>
    </starts-with>
    <ends-with>
        <expression>\)</expression>
    </ends-with>
    <subscopes>
        <include syntax="self" collection="arguments" />
    </subscopes>
</scope>

I am having some conceptually related problems: I always end up with scopes capturing much more than I need. One problem is that I keep capturing multiple sets of parameters. I have a matlab.arguments scope that starts with ( and ends with ) and defines argument symbols. It perfectly captures the argument clause. But if I create a case like @(x) (x^2), it captures both (x) and (x^2) as argument lists, rather than treating (x^2) as the body. This is problematic for my current implementation, since it means that the body is guaranteed to be picked up as another arguments scope.
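To make the double-capture problem concrete, here is a minimal Python sketch (not Nova syntax XML). It assumes a toy model in which the arguments subscope is a plain regex for a flat parenthesized group; `ARGUMENTS`, `match_repeatedly`, and `match_once` are hypothetical names used only for illustration, and the real Nova parser is certainly more involved.

```python
import re

# Hypothetical stand-in for the matlab.arguments scope: a flat parenthesized group.
ARGUMENTS = re.compile(r"\([^()]*\)")

def match_repeatedly(source):
    """Toy model of a parser that keeps retrying the same subscope."""
    return [m.group(0) for m in ARGUMENTS.finditer(source)]

def match_once(source):
    """Toy model of the desired behavior: the subscope matches a single time."""
    m = ARGUMENTS.search(source)
    return [m.group(0)] if m else []

print(match_repeatedly("@(x) (x^2)"))  # ['(x)', '(x^2)']
print(match_once("@(x) (x^2)"))        # ['(x)']
```

With repeated matching, the parenthesized body is indistinguishable from a second argument list; limiting the subscope to one match leaves (x^2) free to be treated as the body.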

The trickier problem for me is that like with other languages, an anonymous function can be defined in another expression. Here’s an example of what I mean:

q = integral(@(x) x.^2,0,1);

Conceptually, the body of an anonymous function can contain only a single expression, but I am unsure how to express that in Nova’s language spec. Nova seems to require that I have a well-defined set of characters that terminate a scope, while Matlab terminates its expression more conceptually (similar to what you’d see in a Backus-Naur form spec). I suppose I could do something like ends-with: (?=;|$|,|\)), but this is very brittle. I could also artificially require that the body expression be contained in parentheses, but this seems uncommon in the Matlab world. (This is what I’ve done for now, but I would like to get rid of it.)
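A quick way to see why the lookahead terminator is brittle is to try it outside Nova. The Python sketch below is only a regex model of the idea, under the assumption that the body simply runs until the first ;, comma, closing parenthesis, or end of line; `ANON` and `body_of` are hypothetical names.

```python
import re

# Hypothetical model: '@', a flat argument list, then a body that stops at the
# first terminator found by the lookahead (?=;|$|,|\)).
ANON = re.compile(r"@\(([^()]*)\)\s*(.*?)(?=;|$|,|\))")

def body_of(source):
    """Return the text this model treats as the anonymous function's body."""
    m = ANON.search(source)
    return m.group(2) if m else None

print(body_of("q = integral(@(x) x.^2,0,1);"))  # x.^2
print(body_of("f = @(x) max(x,0);"))            # max(x
```

The first case works because the body contains no terminator characters. The second is cut short at the comma inside max(x,0), since a character-based terminator cannot tell a nested comma from the one that ends the expression.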

This leads me to what I would like to be able to do: define a scope such that it contains a specified number of instances of a subscope. The anonymous function scope would then be defined something like this:

Starts with @.
Contains a single arguments scope.
Contains a single expression (as I’ve implemented it now, this would be an include from the expression collection).
Ends after these are complete.
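The four steps above can be sketched as a tiny hand-written parser. This is Python rather than Nova syntax XML, and it rests on simplifying assumptions: the argument list is flat, and the body ends at the first comma, semicolon, or closing parenthesis that is not nested inside the body's own parentheses. Brackets, braces, and string literals are ignored, and `parse_anonymous_function` is a hypothetical name.

```python
def parse_anonymous_function(source, start=0):
    """Return (arguments, body, end_index) for an '@(...) expr' at start."""
    # Step 1: the scope starts with '@' immediately followed by '('.
    if not source.startswith("@(", start):
        raise ValueError("expected '@(' at index %d" % start)
    # Step 2: exactly one arguments scope (a flat, comma-separated list).
    close = source.index(")", start + 2)
    arguments = [a.strip() for a in source[start + 2:close].split(",") if a.strip()]
    # Step 3: exactly one expression, ending at the first ',', ';' or ')'
    # that is not nested inside the body's own parentheses.
    pos = close + 1
    depth = 0
    while pos < len(source):
        ch = source[pos]
        if ch == "(":
            depth += 1
        elif ch == ")":
            if depth == 0:
                break
            depth -= 1
        elif ch in ",;" and depth == 0:
            break
        pos += 1
    body = source[close + 1:pos].strip()
    # Step 4: the scope ends here, without consuming the terminator.
    return arguments, body, pos

src = "q = integral(@(x) x.^2,0,1);"
print(parse_anonymous_function(src, src.index("@")))  # (['x'], 'x.^2', 22)
print(parse_anonymous_function("@(x) (x^2)"))         # (['x'], '(x^2)', 10)
```

Because the arguments step runs exactly once, a parenthesized body such as (x^2) is read as the expression rather than as a second argument list.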

I realize this may be straying into “XY problem” territory, but I would appreciate help with this. If there’s a simple solution to the anonymous function problems, that would be great. And if quantified subscopes aren’t currently possible, I’d like this to be considered a feature request. In my mind, this would make it much simpler to adapt language syntax specs (for example in BNF) to the Nova API.

(Sorry that I went on for so long!)

logan · November 23, 2020, 5:53pm

I believe a feature that we are planning for Nova 4 will help here, and it follows what you describe pretty closely.

Currently, the parser repeats subscopes continuously until the ends-with expression of a parent is matched. However, with the upcoming changes to the parser, you’ll be able to specify that subscopes should be matched in order, allowing procedural constructs to be parsed much more effectively.

In your specific case, you’ll be able to specify that your “arguments” collection should only match once before the parent is closed, which is how pretty much all procedural languages would like to be parsed.

We hope to have this feature ready by early December.

mcclurgm · November 23, 2020, 8:25pm

Awesome, thanks for the reply! That sounds like exactly what I was looking for.

One thing I’d like to clear up: would it be possible to close the parent as soon as the last of these subscopes has matched? I can’t tell whether, once it matches the one arguments scope, I could tell it to then just close the parent. Would that be possible, or would it continue (just matching nothing) until it reaches the ends-with expression?

logan · November 23, 2020, 9:18pm

Yes. It would support either matching the ends-with expression (if one is provided) or simply closing out the parent with a zero-length range if there are no further subscope matches to make.
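As a rough mental model of the behavior described in this thread, the following Python sketch matches a list of subscopes once each, in order, and then closes the parent. It is not Nova's implementation or its XML syntax: `match_in_order`, `SUBSCOPES`, and the two regexes are hypothetical, and stopping at the first subscope that fails to match is an assumption.

```python
import re

def match_in_order(source, pos, subscopes, ends_with=None):
    """Match each (name, pattern) once, in order, then close the parent.

    Returns (matches, close_range). The close range is the ends_with match
    when one is given and matches at the current position; otherwise it is
    a zero-length range.
    """
    matches = []
    for name, pattern in subscopes:
        m = re.compile(r"\s*(" + pattern + ")").match(source, pos)
        if m is None:
            break  # assumption: no further subscope matches to make
        matches.append((name, m.group(1)))
        pos = m.end()
    if ends_with is not None:
        m = re.compile(ends_with).match(source, pos)
        if m is not None:
            return matches, m.span()
    return matches, (pos, pos)  # zero-length close

# Hypothetical subscopes: one flat argument list, then one simple expression.
SUBSCOPES = [("arguments", r"\([^()]*\)"), ("expression", r"[^,;()]+")]

# pos=1 skips the leading '@' that opened the parent scope.
print(match_in_order("@(x) x.^2,0,1", 1, SUBSCOPES))
# ([('arguments', '(x)'), ('expression', 'x.^2')], (9, 9))
print(match_in_order("@(x) x.^2;", 1, SUBSCOPES, ends_with=r";"))
# ([('arguments', '(x)'), ('expression', 'x.^2')], (9, 10))
```

In the first call the parent closes with a zero-length range just before the comma, so the enclosing integral(...) call still sees its own argument separators.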