Symbolication of method implementations

I’m trying to create symbols for a Rust extension. For anyone unfamiliar, Rust defines struct methods (and enum methods, cuz Rust be wildin’) in an implementation block outside the scope of the struct definition. It looks like this:

struct MyStruct {
    prop: String,
}

impl MyStruct {
    fn my_method(&self) {
        // do something
    }
}

Is there a way for me to do something like one of the following?

  • Nest my_method under the MyStruct symbol
  • Grab the struct name from the parent scope so I can provide a better display name for the method symbol (e.g., MyStruct.my_method)

The first option (single symbol) would be preferred.

Code for my feeble attempt to do the second option above
<scope name="rust.definition.implementation">
  <starts-with>
    <expression>(impl)</expression>
    <capture number="0" name="rust.keyword.construct" />
  </starts-with>
  <ends-with />
  <subscopes anchored="true">
    <include syntax="self" collection="generics" optional="true" />
    <include syntax="self" collection="types" name="rust.type.implementation.name" />
    <include syntax="self" collection="generics" optional="true" />
    <scope name="rust.block.implementation">
      <starts-with>
        <expression>\{</expression>
        <capture number="0" name="rust.bracket" />
      </starts-with>
      <ends-with>
        <expression>\}</expression>
        <capture number="0" name="rust.bracket" />
      </ends-with>
      <subscopes>
        <scope name="rust.definition.method">
          <symbol type="method">
            <display-name>
              <component selector="rust.type.implementation.name" />
              <component variable="name" />
            </display-name>
          </symbol>
          <expression>\b(?:(pub)\s+)?(fn)\s+([a-zA-Z_][a-zA-Z0-9_]*)\b</expression>
          <capture number="1" name="rust.keyword.modifier.visibility" />
          <capture number="2" name="rust.keyword.construct" />
          <capture number="3" name="rust.identifier.method.name" />
        </scope>
      </subscopes>
    </scope>
  </subscopes>
</scope>

The code above doesn’t work: I just see the method name without the struct name. Any suggestions would be greatly appreciated. Thanks!


My attempt at nesting methods kind of worked with the start-next-end context. I set the struct definition as behavior="start" and the implementation block as behavior="next", set group-by-name="true", and didn’t set a symbol type for the implementation block (if I did, then it would create a nested struct with the same name with nested methods). There are a few issues I can’t resolve:

  • Because there can be multiple impl blocks per struct/enum, I can’t set an end for the behavior. I couldn’t seem to end the symbol, so anything defined afterwards was also nested under the struct symbol. Setting the unclosed attribute had no effect.
  • The Rust compiler allows structs to be defined after struct methods. In this case, the methods aren’t nested below the struct.

Due to the above, I’m not sure if the structs are actually being combined, or if it just looks like it because the untyped symbol doesn’t nest things, and then the start symbol nests everything below it.
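For anyone who wants to reproduce the two cases listed above, here is a minimal Rust sketch. The Counter and Other types are made up for illustration and are not from my extension. It compiles even though the first impl block comes before the struct definition and a second impl block follows an unrelated item:

```rust
// First impl block. It appears before the struct it targets, which the
// compiler accepts because item order within a module does not matter.
impl Counter {
    fn new() -> Self {
        Counter { count: 0 }
    }
}

// The struct definition comes after some of its methods.
struct Counter {
    count: u32,
}

// An unrelated item sitting between the two impl blocks.
#[allow(dead_code)]
struct Other;

// Second impl block for the same type.
impl Counter {
    fn increment(&mut self) {
        self.count += 1;
    }

    fn value(&self) -> u32 {
        self.count
    }
}
```

Ideally new, increment, and value would all appear under a single Counter symbol, regardless of where the impl blocks sit in the file.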

Relevant start-next-end syntax definitions for structs
<scope name="definition.struct.rust">
  <symbol type="struct">
    <context behavior="start" group-by-name="true" unclosed="truncate" />
  </symbol>

and the impl scope:

<scope name="rust.definition.implementation">
  <symbol>
    <context behavior="next" group-by-name="true" foldable="false" unclosed="truncate" />
  </symbol>
  <starts-with>
    <expression>(impl)</expression>
    <capture number="0" name="rust.keyword.construct" />
  </starts-with>
  <ends-with />
  <subscopes anchored="true">
    <include syntax="self" collection="generics" optional="true" />
    <scope name="rust.identifier.type.struct">
      <expression>\b([a-zA-Z_][a-zA-Z0-9_]*)\b</expression>
      <capture number="1" name="rust.identifier.type.struct.name" />
    </scope>
    <include syntax="self" collection="generics" optional="true" />
    <scope name="rust.block.implementation">
      <starts-with>
        <expression>\{</expression>
        <capture number="0" name="rust.bracket" />
      </starts-with>
      <ends-with>
        <expression>\}</expression>
        <capture number="0" name="rust.bracket" />
      </ends-with>
      <subscopes>
        <include syntax="self" collection="method-definition" />
      </subscopes>
    </scope>
  </subscopes>
</scope>

Revisiting this. @logan is this a bug? If I define a bunch of structs one after the other (NOT nested), then here’s what I see:

Problems I see above:

  • Structs (which have behavior="start") nest everything that follows under themselves, well past when their scope ends
  • I added type="struct" to the impl block, so note that group-by-name="true" is not being honored for ‘TypedData’ above.
  • Again, changing the unclosed attribute had no effect on this nesting behavior

Has anyone else successfully implemented start-next-end contexts?

Unfortunately, without seeing the grammar itself, I can’t say for certain. Would you be able to link me to it?

Here’s the current definition of structs:

And here is the definition of struct methods:

For reference, here’s the Rust Book chapter on Method Syntax – in case that’s what you meant by “the grammar itself”.

It seems that the reason things are being collected recursively is that your function-blocks collection includes the entire grammar recursively (using syntax="self"). While the parse tree is likely being constructed properly, using a “start” and “next” symbol behavior without an explicit “end” somewhere is likely not designed to do what you intend.

Truncating symbolic ranges with the “truncate” option is only applied at the end of a document when there are no more symbols to apply, not when a new “start” context is encountered. By then, each Rust struct has already been embedded within its (incorrect) “parent,” which is why your screenshot shows it recursively nesting.
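The nesting effect described above can be illustrated with a toy model. This is only a sketch under assumptions: the Event and Symbol types and the build_outline function are hypothetical and are not Nova's actual parser. It assumes each "start" opens a container, each "end" closes the innermost one, and anything still open is closed only when the input runs out.

```rust
/// A symbol in the toy outline tree (hypothetical, for illustration only).
#[derive(Debug)]
struct Symbol {
    name: String,
    children: Vec<Symbol>,
}

/// Hypothetical events a symbolicator might emit while scanning a file.
enum Event {
    Start(&'static str), // opens a container, like a "start" context
    Leaf(&'static str),  // a plain symbol such as a method
    End,                 // closes the innermost open container
}

/// Attaches a finished symbol to the innermost open container,
/// or to the top level when nothing is open.
fn attach(sym: Symbol, open: &mut Vec<Symbol>, roots: &mut Vec<Symbol>) {
    match open.last_mut() {
        Some(parent) => parent.children.push(sym),
        None => roots.push(sym),
    }
}

fn build_outline(events: &[Event]) -> Vec<Symbol> {
    let mut roots = Vec::new();
    let mut open: Vec<Symbol> = Vec::new(); // stack of open containers
    for event in events {
        match event {
            Event::Start(name) => open.push(Symbol {
                name: name.to_string(),
                children: Vec::new(),
            }),
            Event::Leaf(name) => attach(
                Symbol { name: name.to_string(), children: Vec::new() },
                &mut open,
                &mut roots,
            ),
            Event::End => {
                if let Some(done) = open.pop() {
                    attach(done, &mut open, &mut roots);
                }
            }
        }
    }
    // End of input: close whatever is still open, innermost first.
    // In this model, this is the only point where unclosed containers end.
    while let Some(done) = open.pop() {
        attach(done, &mut open, &mut roots);
    }
    roots
}

/// Depth of the deepest chain of nested symbols, counting the symbol itself.
fn depth(sym: &Symbol) -> usize {
    1 + sym.children.iter().map(depth).max().unwrap_or(0)
}
```

In this model, three Start events with no End in between produce a single top-level symbol with the other two nested inside it, while the same three with an End after each produce three siblings.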

My suggestion would be to symbolicate the ending bracket } in some way as an "end" marker. It should automatically end whatever container is active (be it struct or method). Another possibility might be to symbolicate the struct and impl separately.

Unfortunately, this is a consequence of how Rust separates the struct definition and impl definitions. The parser wasn’t designed to handle this behavior, as we haven’t seen a counterpart in other languages so far. I can investigate whether a change to this in the future might be helpful!

Thank you so much for looking into this and for your detailed write-up!

I mentioned above that I don’t think I can designate an "end" marker because the syntax allows multiple impl blocks for structs. The answer for now seems to be to not try and nest methods under structs, so I appreciate the help!

I don’t think Rust is entirely unique in attaching methods to structs outside of the struct definition. I can at least point to Go as an example where methods are attached to structs by defining a “receiver” for the function (for reference). I would agree that this approach is uncommon.

For @logan or anyone else constructing symbol queries with Tree-sitter – does Tree-sitter enable anything new as far as combining symbol definitions? Am I correct that the scope.groupByName setting only applies to queries using @start or @end captures?

It would be great if scope.groupByName could “merge” symbols with the same name with a @subtree scope. The @start and @end captures don’t seem to work well with impl blocks – these blocks often are not contiguous and there can be several of them. I think it would be a lot cleaner and more understandable and navigable to have all methods nested under a single struct symbol, rather than symbols for each impl block.
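To make the “merge” idea concrete, here is a rough Rust sketch of the grouping I have in mind. It is not an existing Nova or Tree-sitter API: FlatSymbol and merge_by_name are hypothetical names, and the input is assumed to be a flat list of struct and impl symbols in document order.

```rust
use std::collections::BTreeMap;

/// Hypothetical flat symbol, as a per-block symbolicator might produce it.
enum FlatSymbol {
    Struct { name: &'static str },
    Impl { target: &'static str, methods: Vec<&'static str> },
}

/// Groups the methods from every impl block under the type they target.
/// An impl block seen before its struct still creates the entry, so
/// definition order and non-contiguous impl blocks do not matter.
fn merge_by_name(symbols: &[FlatSymbol]) -> BTreeMap<String, Vec<String>> {
    let mut merged: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for symbol in symbols {
        match symbol {
            FlatSymbol::Struct { name } => {
                // Make sure the struct appears even if it has no methods.
                merged.entry(name.to_string()).or_default();
            }
            FlatSymbol::Impl { target, methods } => {
                merged
                    .entry(target.to_string())
                    .or_default()
                    .extend(methods.iter().map(|m| m.to_string()));
            }
        }
    }
    merged
}
```

Feeding it an impl block for MyStruct, then the MyStruct definition, an unrelated struct, and a second MyStruct impl block yields one MyStruct entry holding the methods of both blocks in document order.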

I’d love to hear if anyone has other ideas!