Hi all,
I’m trying to implement a Matlab language definition for Nova. It’s mostly going fine, and I was able to implement a basic form that highlights my most-used syntax very quickly! However, now that I’m starting to experiment with more advanced syntax and scoping, I’m running into some conceptual issues.
I’ll describe an example of what I’m trying to accomplish first. Matlab has an anonymous function syntax (much like Python’s lambda or Mathematica’s #...& syntax). It looks like this:
@(x,y) x*y
The @ operator starts the syntax form, immediately followed by the list of arguments in parentheses ((x,y) in this example), then a single expression as the body.
Here’s my current implementation (which neither works nor is generic enough to be useful):
<scope name="matlab.expression.anonymous-function">
    <symbol type="function" anonymous="true">
        <context behavior="subtree" />
    </symbol>
    <starts-with>
        <expression>@(?=\()</expression>
    </starts-with>
    <ends-with>
        <expression>\)</expression>
    </ends-with>
    <subscopes>
        <include syntax="self" collection="arguments" />
    </subscopes>
</scope>
I am having some conceptually related problems: I always end up with a scope capturing much more than I need. One problem is that I keep capturing multiple sets of parameters. I have a matlab.arguments scope that starts with \( and ends with \) and defines argument symbols. It perfectly captures the argument clause. But if I create a case like @(x) (x^2), it captures both the (x) and the (x^2) as argument lists, rather than treating the second as the body. This is problematic for my current implementation, since it means that a parenthesized body is guaranteed to be picked up as another arguments scope.
The trickier problem for me is that, as in other languages, an anonymous function can be defined inside another expression. Here’s an example of what I mean:
q = integral(@(x) x.^2,0,1);
Conceptually, the body of an anonymous function can contain only a single expression, but I am unsure how to express that in Nova’s language spec. Nova seems to require that I have a well-defined set of characters that terminate a scope, while Matlab terminates its expression more conceptually (similar to what you’d see in a Backus-Naur form spec). I suppose I could do something like ends-at: (?=;|$|,|\)), but this is very brittle. I could also artificially require that the body expression be contained in parentheses, but this seems uncommon in the Matlab world. (This is what I’ve done for now, but I would like to get rid of it.)
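Here is a small Python sketch of why the lookahead terminator is brittle. Again, Python's re module is only an approximation of how Nova would apply the pattern, and ANON and body_of are hypothetical names made up for this illustration.

```python
import re

# Approximation of a scope that starts at "@(...)" and whose body ends at
# the lookahead (?=;|$|,|\)). Python's re is a stand-in for Nova here.
ANON = re.compile(r"@\([^()]*\)\s*(?P<body>.*?)(?=;|$|,|\))")

def body_of(source):
    """Return the text the terminator-based pattern would treat as the body."""
    match = ANON.search(source)
    return match.group("body") if match else None

# Works when the body contains none of the terminator characters.
print(body_of("q = integral(@(x) x.^2,0,1);"))      # x.^2

# Breaks as soon as the body contains a call with its own comma.
print(body_of("q = integral(@(x) max(x,1),0,1);"))  # max(x
```

The second case is cut off at the comma inside max(x,1), because the terminator has no notion of nesting.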
This leads me to what I expected to be able to do: define a scope such that it contains a specified number of instances of a subscope. So the anonymous function scope would be defined something like this:
- Starts with @
- Contains a single arguments scope.
- Contains a single expression (as I’ve implemented it now, this would be an include from the expression collection).
- Ends after these are complete.
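For comparison, this is roughly what I mean by "ends after these are complete", written as a tiny recursive-descent parser in Python. It is only a sketch and has nothing to do with Nova's API: tokenize, Parser, and anonymous_function_span are hypothetical names, and the grammar is deliberately small (identifiers, numbers, binary operators, calls, and parentheses; no unary minus, strings, or matrices).

```python
import re

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|\.[\^*/]|[-+*/^@(),;=])")
OPERATORS = {"+", "-", "*", "/", "^", ".^", ".*", "./"}

def tokenize(source):
    """Split source into (text, end_offset) pairs; stop at the first unknown character."""
    tokens, pos = [], 0
    while True:
        match = TOKEN.match(source, pos)
        if not match:
            return tokens
        tokens.append((match.group(1), match.end()))
        pos = match.end()

class Parser:
    def __init__(self, tokens, index=0):
        self.tokens, self.index = tokens, index

    def peek(self):
        return self.tokens[self.index][0] if self.index < len(self.tokens) else None

    def expect(self, text):
        if self.peek() != text:
            raise SyntaxError(f"expected {text!r}, found {self.peek()!r}")
        self.index += 1

    def anonymous_function(self):
        self.expect("@")     # starts with @
        self.arguments()     # exactly one arguments scope
        self.expression()    # exactly one expression, then the scope is done

    def arguments(self):
        self.expect("(")
        if self.peek() != ")":
            self.identifier()
            while self.peek() == ",":
                self.index += 1
                self.identifier()
        self.expect(")")

    def expression(self):
        self.term()
        while self.peek() in OPERATORS:
            self.index += 1
            self.term()

    def term(self):
        token = self.peek()
        if token == "@":                      # nested anonymous function
            self.anonymous_function()
        elif token == "(":                    # parenthesized sub-expression
            self.index += 1
            self.expression()
            self.expect(")")
        elif token is not None and re.fullmatch(r"\d+(?:\.\d+)?", token):
            self.index += 1
        else:
            self.identifier()
            if self.peek() == "(":            # call such as max(x,1)
                self.index += 1
                if self.peek() != ")":
                    self.expression()
                    while self.peek() == ",":
                        self.index += 1
                        self.expression()
                self.expect(")")

    def identifier(self):
        token = self.peek()
        if token is None or not re.fullmatch(r"[A-Za-z_]\w*", token):
            raise SyntaxError(f"expected identifier, found {token!r}")
        self.index += 1

def anonymous_function_span(source):
    """Return the text of the first anonymous function in source, or None."""
    tokens = tokenize(source)
    for i, (text, end) in enumerate(tokens):
        if text == "@":
            parser = Parser(tokens, i)
            parser.anonymous_function()
            return source[end - 1:tokens[parser.index - 1][1]]
    return None

print(anonymous_function_span("q = integral(@(x) x.^2,0,1);"))      # @(x) x.^2
print(anonymous_function_span("q = integral(@(x) max(x,1),0,1);"))  # @(x) max(x,1)
print(anonymous_function_span("@(x) (x^2)"))                        # @(x) (x^2)
```

Here the function ends wherever the single expression ends, with no terminator characters listed anywhere; that is the behavior I'd like to be able to describe in a syntax definition.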
I realize this may be reaching into the “XY problem” territory, but I would appreciate help with this. If there’s a simple solution to the anonymous function problems, that would be great. And if quantified expressions aren’t currently possible, I’d consider it a feature request. In my mind, this would make it much simpler to adapt language syntax specs (for example in BNF) into the Nova API.
(Sorry that I went on for so long!)