Especially Conditional Injections

I’ve observed how injections can be made conditional on certain regions of a syntax using queries. But I have a more advanced case.

I work in modding Minecraft. Minecraft has its own special language, Molang, for handling imperative work in a declarative environment. In particular, Molang, a C-like language, is provided as the content of strings in JSON files. But only a small subset of strings in such JSON files actually use Molang. Ideally, I’d like to author a syntax (or syntaxes) that highlights strings as normal JSON for object keys that don’t support Molang and highlights string contents as Molang for fields that do.

Unfortunately, the only paths I see are either to write the most abominable injection queries ever or to highlight Molang in every string field in this custom JSON, even in fields that don’t support it. Is there another path for me to take? Or am I stuck with this dilemma?

Depending on how a Tree-sitter parser for JSON + Molang builds its syntax tree, you may be able to do this with query predicates, especially if the JSON key-value pairs which use Molang in their string value are predefined / known.

If, for example, the syntax tree is built such that there is a discrete node for the JSON key and its value as siblings to each other in the tree, you can target both simultaneously and then filter on the key.

A potential pseudo-code query, in which the JSON keys “foo” and “bar” are those which use Molang in their values:

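(
  ; Hypothetical node names; adjust to the nodes the actual grammar produces.
  (json_key) @key
  ":"
  (string) @injection.content
  ; Group the alternation so both anchors apply to every key.
  (#match? @key "^(foo|bar)$")
  (#set! injection.language molang)
)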

This is likely not accurate to whatever tree the Molang parser is actually generating, but perhaps something like this would be possible?
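One detail to watch when filtering on several keys with a single pattern: alternation binds more loosely than the anchors, so a pattern like foo|bar needs to be grouped before ^ and $ are applied. Here is a quick check in Python. It is only an illustration of the regex behavior, the key list is made up, and the regex engine behind #match? may differ in other details.

```python
import re

# Made-up keys, purely to exercise the two patterns.
keys = ["foo", "bar", "foobar", "rebar", "food"]

# Ungrouped: parsed as (^foo) OR (bar$), so prefixes and suffixes also match.
loose = re.compile(r"^foo|bar$")
# Grouped: both anchors apply to the whole alternation, so only exact keys match.
strict = re.compile(r"^(foo|bar)$")

loose_hits = [k for k in keys if loose.search(k)]
strict_hits = [k for k in keys if strict.search(k)]

print(loose_hits)
print(strict_hits)
```

With the ungrouped pattern, a key like "foobar" or "rebar" would also get Molang injected into its value, which is probably not what you want.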

I also just realized that my suggestion assumes these JSON + Molang files are not presented as standard JSON files (e.g. with a .json file extension). If they are, then Nova’s built-in JSON parser will be in effect by default, which presents further challenges to highlighting Molang within specific keys in any JSON file.

Yeah, it’s going to get complicated and won’t be very future-proof, but I can manage with such queries and a custom JSON implementation in the meantime.

Thank you.

This is certainly not the first or last case where something like this might be nice or necessary for certain frameworks or tooling (using a specific injected language within another well known language, without the outer language having support for code fences like, say, Markdown has). If you think of specific functionality or developer features that might make this easier to accomplish, feel free to suggest them. I’ll make some notes to think about this further as well.

Tree-sitter has been… interesting. It took a bit to get it set up, and even once I did, it didn’t produce the results I was expecting. But after toying with it for many hours, I think I have some semblance of a strategy for constructing the grammars and actually meaningfully using them in Nova.

I had been working on an extension that used a syntax’s tokens for selection expansion. That’s compelling… if the grammar of a language is robust and the queries are useful. There’s no universe where I could accommodate every language and every situation. I can’t account for the grammars, but I could theoretically account for the queries. Unless I missed something, I didn’t see queries being accessible to the scripting interfaces whatsoever. If they were, my extensions could be a lot more powerful without requiring some massive endeavor on my part.

Using another example, I was able to set up preferences for an extension I’m working on that featured a text box that would forward custom-authored JSON schemas to the attached LSP. This was incredibly powerful and meant that the burden of customization was rightfully placed on the user of my extension, not on me. If queries were usable from the scripting interface, I could use the same pattern (or a number of other patterns due to the scripting interface’s flexibility) to really make powerful, customizable extensions with custom injection points, fold regions… or theoretically whatever.

I’m not sure whether exactly that is on the table, but exposing something to the scripting interface (token types, parsing results such as the node tree, or queries) would be huge. Having just symbols accessible to scripting has proven very insufficient (at least for my needs).

With regard to accessing the parse tree and tokens via the JavaScript API, what sort of query operations would you be looking to do?

If you are looking to just read out parts of the parse tree with a query and then, say, build a list of items for something, that’s potentially possible.

You specifically mention making custom injection points and fold regions. If your intent would be to perform a query against a document’s parse tree and then build a custom set of injection points to hand back to the parser, that would be far more complex for us to handle in a performant manner, since bridging to the JavaScript API is costly.

Fold regions are less impacted by this, I think, as they can be delivered asynchronously after a short delay without the user taking too much notice. Injection points for sublanguages, though, are likely to be more of a concern since the parser needs that information up front as it is recursively updating the document’s parse tree(s) with each keystroke. It wouldn’t necessarily be impossible, but I do wonder if there would be a more performant way that didn’t necessarily require invoking extension code.

Some further context that might be useful: extension code in Nova is executed in a secondary process and all calls back and forth into the IDE are done over an IPC bridge. Furthermore, parsing with tree-sitter is performed in a tertiary process for third-party grammars (those not bundled with Nova), as it is a requirement for properly conforming to macOS security and codesigning validation. As such, any interaction between the extension API and the parse tree must cross those IPC bridges, which means every such operation has to be asynchronous given the amount of data a parse tree may contain.

I think the most general thing I can do with all this knowledge is to make a custom JSON language and introduce potential injections into any JSON string for Molang. Maybe I could even extend this to commands or localization keys or whatever else would have a parseable format in add-on JSON. Is that possible? To use predicates to peek at the start of the string content to figure out which injected language to use? If it is, I think this is the most sensible solution, and I can roll up an LSP to throw errors if the “type” of string content wouldn’t be appropriate for that JSON key.

Finally, I still believe access to the parse tree from a scripting interface could be useful. Again, symbols are interesting… and not good enough for many things.

Sorry for my own confusion, I’m still not quite sure what specifically you’re trying to accomplish.

You mention commands and localization keys—how exactly do you see those fitting in with JavaScript access to the parse tree?

In cases where you want your language server to be able to flag instances where the wrong language is used in a key-value pair, it’s likely much more appropriate for the language server itself to know what keys are valid instead of accessing Nova’s parse tree and forwarding that along, since that sort of document intelligence should be on the server’s side in the first place. Most language servers perform the parsing and validation of the document contents within themselves to have this sort of knowledge.
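As a rough sketch of what "the server knows the keys" could look like: the server parses the document itself and walks it, collecting the string values that sit under keys expected to hold Molang. Everything named here is hypothetical (the MOLANG_KEYS set just reuses the "foo" and "bar" keys from the earlier example, and molang_strings is an illustrative helper, not part of any real server).

```python
import json

# Hypothetical key names, borrowed from the example query earlier in the thread.
MOLANG_KEYS = {"foo", "bar"}

def molang_strings(node, path=()):
    """Yield (path, text) for every string value stored under a Molang key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in MOLANG_KEYS and isinstance(value, str):
                yield path + (key,), value
            else:
                # Not a Molang string here; keep looking in nested values.
                yield from molang_strings(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from molang_strings(item, path + (index,))

doc = json.loads(
    '{"foo": "query.is_alive", "name": "plain", "items": [{"bar": "math.sin(1)"}]}'
)
found = list(molang_strings(doc))
print(found)
```

A real server would then hand each collected string to its own Molang parser and report diagnostics at the matching document position; that part is left out here.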

Ah, sorry. It was two different scenarios. I’d say to ignore the final paragraph in my last response.

The various injected languages in add-on JSON would be commands, localization keys, and Molang. I was wondering if it was possible for me to just let any add-on JSON string be one of these three injected languages. Maybe I could use predicates and regular expressions to compare the start of the strings to try to assign one of the three injected languages (or not)?
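To make the idea concrete, the decision logic could look roughly like the sketch below: try a few patterns against the start of the string and take the first one that matches. The patterns are guesses for illustration only (a leading slash for commands, a Molang-style prefix such as query. or math., a dotted lowercase identifier for localization keys), not an official description of add-on JSON. It also says nothing about whether injection predicates can express this; it only models the classification.

```python
import re

# Hypothetical heuristics; adjust to what your add-on JSON actually contains.
# Order matters: Molang is tested before localization keys because both
# can look like dotted identifiers.
RULES = [
    ("command", re.compile(r"^/")),                              # e.g. "/say hi"
    ("molang", re.compile(r"^(query|q|math|variable|v)\.")),     # e.g. "query.is_alive"
    ("localization", re.compile(r"^[a-z_]+(\.[a-z_]+)+$")),      # e.g. "item.sword.name"
]

def classify(content):
    """Return the name of the first rule whose pattern matches, else None."""
    for language, pattern in RULES:
        if pattern.search(content):
            return language
    return None

for sample in ["/say hi", "query.is_alive", "item.sword.name", "hello world"]:
    print(sample, "->", classify(sample))
```

In query form this would presumably become one injection pattern per language, each with its own match predicate on the string content, with plain strings matching none of them.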