Code folding, symbol contexts and scope

Many thanks to everyone at Panic for creating what’s already a fantastic editor, and an extension framework that is (mostly!) a joy to work with. It’s particularly great to see such extensive and well-written documentation. I really feel motivated to integrate my workflows with Nova.

Having said that, I’m struggling to fully understand how code folding, symbol contexts and scope interact. I’ve published a couple of versions of my R extension so far, but I’ve been going around in circles somewhat when trying to handle function definitions. For background, R allows unbraced single-statement functions like

add1 <- function(x) x+1

and compound functions with curly braces, viz.

add2 <- function(x)
{
    return (x + 2)
}

Arguments can be fairly complex, with or without default values, and can potentially include anonymous functions, calls to other functions, etc.

My strategy so far has been to use a match scope to capture assignment with the function keyword on the right-hand side, a start–end scope to capture the bracketed arguments, and a pair of scopes to match the braces at the beginning and end of blocks, where present. This allows the function name to be symbolicated, and the function code block to fold between the braces, but (presumably) because the function itself has no end-point in this arrangement, argument symbols don’t seem to be available within the function body.

I have looked at various examples from other syntax definitions, but every alternative I’ve tried has produced less successful results than the current one. Whatever I try I can’t get function and argument symbolication and code folding all to work properly. Perhaps I’m thinking about the concepts wrongly – apologies if so – but any pointers would be welcome!

As a subquestion, I was surprised that

<collection name="blocks">
<scope name="r.block.start">
    <symbol type="block">
        <context behavior="start" group="block" />
    </symbol>
    <expression>\{</expression>
    <capture number="0" name="r.bracket.brace" />
</scope>

<scope name="r.block.end">
    <symbol type="block">
        <context behavior="end" group="block" />
    </symbol>
    <expression>\}</expression>
    <capture number="0" name="r.bracket.brace" />
</scope>
</collection>

produces code folding between brace-pairs, but

<collection name="blocks">
<scope name="r.blocks">
    <symbol type="block">
        <context behavior="subtree" group="block" />
    </symbol>
    <starts-with>
        <expression>\{</expression>
        <capture number="0" name="r.bracket.brace" />
    </starts-with>
    <ends-width>
        <expression>\}</expression>
        <capture number="0" name="r.bracket.brace" />
    </ends-width>
    <subscopes>
        <include syntax="self" collection="all" />
    </subscopes>
</scope>
</collection>

does not. In fact, the latter approach (which is the first one I tried) doesn’t match on braces at all, judging from the syntax inspector. This is awkward, because the two-scopes approach doesn’t allow for subscopes so I can’t control parsing within the block.

Sorry to go on for so long!

Hello! Sorry it’s taken me a bit to get to your questions.

For the subtree issues specifically, in your example, you have the ends-with tag spelled incorrectly (ends-width). I think that might be what’s causing it to not match, as the parser discards it if there isn’t a valid end expression. This is definitely something we could be reporting better to extensions (we currently have a feature request out for better [or any] syntax validation).

If this helps your case somewhat, please do let me know if any of the other parts of your question still apply or you are still having trouble!

:man_facepalming:

What a ridiculous mistake on my side – sorry Logan! Thanks for spotting it, though, and I do agree that validation would be very helpful, as I’ve run into other simple issues that a validator would probably have highlighted. Anyway, fixing the typo does indeed resolve the “subquestion”.

I think the wider question still stands, but I’ll try and phrase it more concisely. Even with separate definitions for single-expression functions and block functions, I don’t see how to properly mark their beginning and end points so that argument symbols can be properly scoped for completion, without the parser over-eagerly preferring one syntax or the other and hence producing an incorrect parse. Is it possible, for example, to have a subscope or include that can only match once? Can a started symbol context be abandoned somehow? Is there another way that I can express the end of a single-expression function without there being any special syntax that demarcates it, and without also incorrectly consuming definitions with a block?

I realise this is a complex question, so I entirely understand if you’ve got better things to do than help me unravel it(!), but I thought it was worth asking, in case it’s one of those things that is “easy if you know how”. Thanks again.

I run xmllint to catch syntax errors in my XML files; it comes with macOS in /usr/bin. If we had some kind of XML schema, it could also catch simple mistakes like a misspelled element name, wrong nesting, etc.

I don’t know how difficult it would be to create an XML schema, though. xmllint supports both RelaxNG and W3C XML Schema files.
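Until a real schema exists, a rough stand-in for the "misspelled element" check could look like the Python sketch below. It is only an illustration: the `KNOWN_ELEMENTS` allow-list and the `unknown_elements` helper are hypothetical, and the list contains just the element names that appear in this thread, not the full Nova syntax vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical allow-list: only the element names seen in this thread.
# A real check would need the complete set from the Nova documentation.
KNOWN_ELEMENTS = {
    "collection", "scope", "symbol", "context", "filter", "auto-close",
    "expression", "capture", "starts-with", "ends-with", "subscopes",
    "include",
}


def unknown_elements(xml_text):
    """Return the sorted names of elements that are not in KNOWN_ELEMENTS."""
    # fromstring raises ParseError if the text is not well-formed XML.
    root = ET.fromstring(xml_text)
    return sorted({el.tag for el in root.iter() if el.tag not in KNOWN_ELEMENTS})


# Cut-down version of the subtree scope from the first post, typo included.
SAMPLE = r"""<collection name="blocks">
<scope name="r.blocks">
    <starts-with>
        <expression>\{</expression>
    </starts-with>
    <ends-width>
        <expression>\}</expression>
    </ends-width>
</scope>
</collection>"""

print(unknown_elements(SAMPLE))
```

Run against the sample, this reports `ends-width` as the only unrecognised element. It says nothing about nesting or attributes, which is where a proper RelaxNG or W3C schema would still be needed.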

I have a similar issue with my Ada syntax which also supports “expression functions”, leading to three similar syntactic constructs:

-- Forward declaration.
function Sum(X, Y: Integer) return Integer;

-- Expression function.
function Sum(X, Y: Integer) return Integer is
  (X + Y);

-- Normal function definition with body.
function Sum(X, Y: Float) return Float is
   Result: Float;
begin
   Result := X + Y;
   return Result;
end Sum;

I want a symbol context that spans the whole function body to get proper code folding and completion for the function arguments in the body. My current solution looks like this (captures elided for brevity):

<scope name="ada.definition.function">
    <symbol type="function">
        <!-- Do not start symbol for declarations or expression functions. -->
        <filter match-end="(?&lt;=is)(?!\s*\()"/>
        <context behavior="start" group-by-name="true">
            <auto-close string="end " completion="$name;"/>
        </context>
    </symbol>
    <starts-with>
        <expression>\b(function|procedure)\s+(\w+)</expression>
    </starts-with>
    <ends-with>
        <expression>\b(is)\b|(;)</expression>
    </ends-with>
    <subscopes>
        <include syntax="self" collection="parameter-profile"/>
    </subscopes>
</scope>

This scope matches the beginning of all three constructs, but the <filter match-end=... element means that the symbol context only applies to the full-body function definition. (Thankfully, the regex lookahead can see across newlines to spot the ( on the line after is in the expression function.)
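To sanity-check the lookarounds outside Nova, here is a small Python sketch. It only exercises the regular expressions themselves: the `classify` helper is hypothetical, and treating "the position just after the first standalone `is`" as the place where `match-end` is evaluated is my assumption, not documented Nova behaviour. Python's `re` and Nova's regex engine may also differ in other details.

```python
import re

# The two filter patterns, with &lt; unescaped to <.
BODY_ONLY = re.compile(r"(?<=is)(?!\s*\()")
EXPRESSION_ONLY = re.compile(r"(?<=is)(?=\s*\()")


def matches_at(pattern, source, pos):
    """True if the zero-width pattern matches exactly at index pos."""
    return any(m.start() == pos for m in pattern.finditer(source))


def classify(source):
    """Classify an Ada snippet by what follows the first standalone 'is'."""
    m = re.search(r"\bis\b", source)
    if m is None:
        return "declaration"  # ended by ';' with no 'is' at all
    if matches_at(EXPRESSION_ONLY, source, m.end()):
        return "expression"
    if matches_at(BODY_ONLY, source, m.end()):
        return "body"
    return "unknown"


DECL = "function Sum(X, Y: Integer) return Integer;"
EXPR = "function Sum(X, Y: Integer) return Integer is\n  (X + Y);"
BODY = (
    "function Sum(X, Y: Float) return Float is\n"
    "   Result: Float;\n"
    "begin\n"
    "   Result := X + Y;\n"
    "   return Result;\n"
    "end Sum;"
)

for snippet in (DECL, EXPR, BODY):
    print(classify(snippet))
```

The three snippets come out as declaration, expression and body respectively, which confirms that `\s*` in the lookahead crosses the newline and that the two filters are mutually exclusive at that position.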

Unfortunately, this means that I don’t get any symbol defined for expression functions, so they don’t appear in the Symbols sidebar nor in completions.

I tried using two symbol elements with mutually exclusive filters:

    <symbol type="function">
        <!-- Only expression functions with a `(` following `is` . -->
        <filter match-end="(?&lt;=is)(?=\s*\()"/>
    </symbol>
    <symbol type="function">
        <!-- Only function bodies. -->
        <filter match-end="(?&lt;=is)(?!\s*\()"/>
        <context behavior="start" group-by-name="true">
            <auto-close string="end " completion="$name;"/>
        </context>
    </symbol>

Unfortunately, this doesn’t seem to work – Nova only uses one of the symbol elements as far as I can tell.

I thought about using <context unclosed="truncate"...:

    <symbol type="function">
        <!-- Definitions only: expression functions or full bodies. -->
        <filter match-end="(?&lt;=is)"/>
        <context unclosed="truncate" behavior="start" group-by-name="true">
            <auto-close string="end " completion="$name;"/>
        </context>
    </symbol>

In this case, the symbol would be created for expression functions, but since the end Sum; never appears, the context is truncated to just the function header. Is this an appropriate use of that feature?