Feedback Wanted: On Parsers, Grammars, & Other Such Things

I’m so happy and excited about this!

I’m so excited for this! I’ve written language extensions for other editors (Atom, VSCode, et cetera) and when I tried to write one for Nova I really struggled. I think I probably started one five separate times and finally gave up completely.

Off-hand, it seems like the tree-sitter option makes the most sense, as it’s more agnostic and has a community/ecosystem around it.

From a developer perspective, I would love to be able to find an existing extension for another editor, fork/copy the theme file into a Nova extension, and have it Just Work™️.

Alternatively, if there was a system (like tree-sitter) that I could reference in order to write my own grammar/themes, that would also work.

Lastly, I think that no matter what is chosen (even doing nothing and sticking with the current custom grammar format), the real step change will come from how good the documentation is. For example, I read the tree-sitter documentation and it feels pretty technical to me. After reading it, it’s not really obvious to me how I would go about making a theme for it. Then I tried to find existing tree-sitter themes to get a look at what the theme format actually looks like, and I sort of struggled even with that. Great guides/walkthroughs around what developers will commonly want to do (e.g. taking a theme file from another editor’s language extension and porting it to Nova) are what I think the secret sauce is to creating a vibrant developer community.

Thanks for working on this!

5 Likes

I’ve been struggling with my language extensions for Nova, and I’m glad to see this post.

From an extension developer’s point of view, the TextMate grammar format sucks: it is far removed from the real syntax structure of a typical programming language, and hence requires tremendous effort to produce even a not-that-bad syntax highlighting extension. It’s really hard to translate a language specification, which is usually written in a PEG-like format, into the TextMate format. Nova’s current XML-based syntax highlighting suffers from the same problem.

I also believe that LSP semantic tokenization is not the future of syntax highlighting in code editors; it just works as a patch over the current poorly designed syntax highlighting system.

Compared to TextMate, Tree-sitter generates the entire syntax tree, and its grammar format is much closer to the language specifications we commonly find. Tree-sitter is much more developer-friendly. I believe that Tree-sitter is the right way to go.

3 Likes

I don’t have a strong opinion on the path forward, but thought I’d share a few of my thoughts. I haven’t looked into Tree Sitter, so I don’t know much about it beyond your description above. It does sound technically interesting, and I’ll always applaud efforts to not let Microsoft run away with setting all the standards since it doesn’t have a good track record.

That said, Tree Sitter does seem to be the path most at odds with your “The Stress of Building and Maintaining Language Grammars” section above. If Nova is mostly alone as an actively-developed editor using Tree Sitter, then extensions would be less likely to benefit from the work of other editors in defining and maintaining language grammars. Many language extension devs could likely find themselves in the same situation as now of starting from scratch (though with better tools).

Speaking for myself, I wouldn’t mind the extra work of Tree Sitter if necessary. What’s most important to me is for Nova to work toward better LSP support, as language servers offer a lot of targeted help that would be difficult for me to replicate. I get a good number of issues inquiring why my extension lacks a particular LSP feature. If Tree Sitter is the path least likely to monopolize Panic’s focus, then that’s an important benefit in my opinion.

3 Likes

Hey Logan, thank you for this extensive insight into the state of things at Panic, and on your thinking. As someone who, for a year, has struggled to get a language extension for a deceptively complex language off the ground, I would say that, in order of the options:

  1. The current language parsing engine of Nova is incredibly flexible and powerful. One of the reasons my extension still isn’t published is, in fact, that Nova’s engine allows me to get language parsing to a level of detail and correctness I haven’t encountered in published extensions yet. Sublime’s newest iteration of their engine maybe could enable this too, and a sufficiently complex Tree-sitter grammar no doubt could too, but other engines definitely cannot. I’ll get back to that in a ’mo. Now, as you clearly explain, this impressive technical feat comes with a lot of baggage re. maintenance and ecosystem integration. So, both despite and because of the huge amount of work I have sunk into creating a language grammar on par with “the best out there”, I think switching away from Nova’s current engine is warranted.

  2. However, both for the technical legacy reasons you name and for the fact they are technically incredibly, frustratingly limited in their ability to express non C-ish languages, Textmate grammars would be the worst choice. I know they are alluring due to VSCode’s continued support for them, but switching to such a legacy format would essentially downgrade Nova two or three notches in language support, from “best in class, small ecosystem”, to “lots of mostly crappy support”. When it comes to modern and off-beat languages, Textmate grammars simply sh*t the bed. It’s not their fault – age and neglect often come with incontinence –, but it is a fact.

  3. Which leaves Tree-sitter. Full disclosure: I like Tree-sitter, despite the fact the initial investment in creating a grammar is even higher than for Nova, and I like it particularly because you cannot create crappy “works for a handful of situations” parsers in it. Tree-sitter parsers are real parsers. In fact, I think I am on the record wishing Nova had used it off the bat [nope; seems I’m misremembering].

Tree-sitter has been quietly gathering steam since its inception for Atom, and its predicted demise with the sunsetting of the latter has not come to pass:

BTW, for the latter: read Federico Viticci’s rave review on how Runestone handles huge, complex, faulty files without breaking stride. That is a non-dev noticing how well Tree-sitter implements its original goals of creating a workable AST out of anything thrown at it, correct or not. The number of grammars also has been quietly growing, including one dear to my heart.

If all of this reads like a pitch, you are not wrong: I think that if Panic decide to switch out Nova’s parser engine for something out there, Tree-sitter is the way to go. Compared to it, going TM grammars would be an admission of defeat (“we can’t support a good syntax engine, so here you get a crappy one you know and only love if you never looked at it in anger”).

[EDIT: backfilled links and corrected typos rather than fighting the Discourse app any longer; my heartfelt apologies to anybody whose notifications reflect my struggles to get this post out].

4 Likes

I am sure we can all agree how important the community/ecosystem is! I am not an LSP developer per se, but based on what has been discussed here and by @Logan, carrying on as is seems out of the question… it’s just an extra unnecessary step, patchwork, and a reinventing-the-wheel sort of problem. Very hard to maintain and keep up with. It’s just not efficient, and I am sure you guys would rather spend all that time working on other important features than playing catch-up games that, to be honest, never worked perfectly. Option one seems to be outdated and, judging by the comments here, not the favourite! Moving to Tree-sitter seems quite exciting. The fact that it is being actively developed and probably favoured/maintained by ex-Atom people (who hate Microsoft for ruining and lying about Atom) sounds promising. As long as a project is maintained, open source, and has a “community” around it, I am sure you will be successful whatever parser/engine you choose, even if it’s not Tree-sitter!

Future looks promising!
Good luck! :crossed_fingers:

1 Like

The Tree Sitter implementation sounds best to me.

1 Like

For me, the main pain point about Nova’s custom engine is that it has bad debugging support. To this day I haven’t found a solution to my problem writing a Zig grammar. A proper grammar engine should recognize the possibility of infinite recursion in a grammar and make this an error. Nova’s apparently doesn’t. Also because it’s a custom, proprietary engine, I have no possibility of inspecting what the engine does when this happens. This makes me unable to finish that plugin without direct support from Panic.
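To make the recursion point concrete, here is a minimal sketch of the kind of check I mean. It assumes a made-up grammar shape (rule name mapped to a list of alternatives, each a list of symbols) and is not Nova’s or Tree Sitter’s actual format; `rules` and `findLeftRecursion` are hypothetical names. It only flags rules that can reach themselves through the first symbol of an alternative, and it ignores empty alternatives, so treat it as an illustration rather than a complete analysis.

```javascript
// Hypothetical grammar shape (an assumption for this sketch):
// rule name -> list of alternatives, each alternative a list of symbols.
const rules = {
  expr: [["expr", "'+'", "term"], ["term"]],
  term: [["'('", "expr", "')'"], ["NUMBER"]],
};

// Returns the rules that can reach themselves through the FIRST symbol of
// some alternative, i.e. without consuming any input in between.
// Simplification: rules that can match the empty string are not considered.
function findLeftRecursion(grammarRules) {
  const recursive = [];
  for (const start of Object.keys(grammarRules)) {
    const seen = new Set();
    const stack = [start];
    let found = false;
    while (stack.length > 0 && !found) {
      const rule = stack.pop();
      for (const alternative of grammarRules[rule]) {
        const first = alternative[0];
        if (first === start) {
          found = true;
          break;
        }
        // Follow the first symbol only if it names another rule.
        if (Object.hasOwn(grammarRules, first) && !seen.has(first)) {
          seen.add(first);
          stack.push(first);
        }
      }
    }
    if (found) recursive.push(start);
  }
  return recursive;
}

console.log(findLeftRecursion(rules)); // [ 'expr' ]
```

An engine that ran a check like this when loading a grammar could report the offending rule by name instead of hanging.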

Here’s my first impression of Tree Sitter:

  • Tree Sitter says that it is robust, i.e. produces useful results even in case of errors. This is a huge improvement over the current engine, which requires grammar authors to manually add things like <cut-off> to handle errors gracefully.
  • Regarding symbolication, which Tree Sitter doesn’t support, I recommend having a look at what JetBrains does in their IDEs: They have a Program Structure Interface on top of their parsers which provides such features. This also serves as an example that separating parsing from analysis works well in an IDE.
  • While the usage of JS for grammars (really?) is a questionable decision, the fact that these are compiled and not interpreted directly makes it quite likely that Tree Sitter will be faster than the current regex-based engine.
  • Tree Sitter grammars are not overly complicated. Their complexity is necessary for doing a good job. Sure, we all would like to just throw some keywords into a magic tool that produces a readily usable syntax that does everything we want, but that’s impossible. Decades of research into parsers and grammars have provided knowledge on how to parse input quickly and handle error cases well, and the sad fact is that for the most part, this knowledge is ignored by overly simple syntax highlighting engines in editors. Those will get you quick adoption because writing plugins for additional languages is simple, but they will also impose a limit on the features that your editor can support.

So yeah, I am wholly supporting the adoption of Tree Sitter.

4 Likes

I’ve worked on at least two different attempts at Clojure language support (syntax and navigation, more recently LSP), and Tree Sitter sounds like the best next step to me too. Thanks for this insight into the company’s thoughts and process and for gathering community feedback on this!

2 Likes

Hi Logan;

While I’ve not worked with Tree-Sitter, I’ve developed using the TextMate grammar as well as Nova, and while I’ve found Nova considerably more painful to develop for than TextMate (lack of complete documentation and lack of tooling to support grammar development being the biggest current pain points), it’s also considerably more flexible than TextMate ever was.

While implementing TextMate could provide some “free” language grammars, it’s also a significant limitation for the future, since, why develop a new grammar with the new features, when an existing TextMate grammar is right there?

I think what would be ideal here is to provide Tree-Sitter support and tooling to make it easier to port my grammar to Tree-Sitter, so that I’m not thrown straight into “I’d like new features but also doing Tree-Sitter right now is a bit too much.”

Looking forward to seeing what comes of this!

1 Like

I’m super excited by the idea of a parser that can benefit from other ecosystems and grow/evolve on its own. Obviously there are risks depending on someone else’s project, but as Logan and others have said, it seems the benefits outweigh the costs. Count me in on being excited about Tree-sitter, even if it means I need to rewrite a couple of my plugins.

Here are a couple other things I haven’t seen mentioned yet:

  • Parser combinators. Someone can tell me if this is exactly what tree sitter is and maybe it’s a moot point. But the idea is that you can build up extremely small rules (like parse a number, combined (combinator) with parse a symbol, to then parse a mathematical equation). It might be cool to have people build these parser combinators that run inside Nova to do the parsing.
  • EBNF. I’ve seen this format for describing a syntax a couple times. For example, here’s an EBNF for cooklang. Maybe there’s a tool out there to convert these into Tree-sitter or something else that makes language definitions more portable/approachable.
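For anyone who hasn’t met parser combinators, here is a rough sketch of the idea in plain JavaScript. None of this is Nova or Tree-sitter API; `regex`, `seq`, `many`, and `map` are names I made up for the illustration. Each parser is a function from (input, position) to a result or null, and the combinators glue small parsers into bigger ones.

```javascript
// A parser is a function (input, pos) -> { value, pos } on success, null on failure.

// Primitive: match a regular expression exactly at the current position.
const regex = (re) => (input, pos) => {
  const sticky = new RegExp(re.source, "y"); // "y" anchors the match at lastIndex
  sticky.lastIndex = pos;
  const match = sticky.exec(input);
  return match ? { value: match[0], pos: pos + match[0].length } : null;
};

// Combinator: run parsers one after another, collecting their values.
const seq = (...parsers) => (input, pos) => {
  const values = [];
  for (const parser of parsers) {
    const result = parser(input, pos);
    if (!result) return null;
    values.push(result.value);
    pos = result.pos;
  }
  return { value: values, pos };
};

// Combinator: repeat a parser zero or more times.
const many = (parser) => (input, pos) => {
  const values = [];
  for (;;) {
    const result = parser(input, pos);
    if (!result || result.pos === pos) break; // stop on failure or no progress
    values.push(result.value);
    pos = result.pos;
  }
  return { value: values, pos };
};

// Combinator: transform the value a parser produces.
const map = (parser, fn) => (input, pos) => {
  const result = parser(input, pos);
  return result ? { value: fn(result.value), pos: result.pos } : null;
};

// Small rules: a number and a symbol, each allowing leading whitespace.
const number = map(regex(/\s*\d+(?:\.\d+)?/), (text) => Number(text.trim()));
const operator = map(regex(/\s*[+\-]/), (text) => text.trim());

// Combined rule: sum := number (operator number)*
const sum = map(seq(number, many(seq(operator, number))), ([first, rest]) =>
  rest.reduce((acc, [op, n]) => (op === "+" ? acc + n : acc - n), first)
);

console.log(sum("1 + 2 - 4", 0).value); // -1
```

The appeal is that `number` and `operator` are reusable on their own, and `sum` is just a composition of them.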

Anyway, not suggestions so much as other ideas in this space that I haven’t seen mentioned yet. Wonderful write-ups everyone. What a great and exciting discussion.

Hi there, I asked for tree-sitter over a year ago when I developed my Polis theme. I ran into many issues, like language-in-language theming (say a shell script containing JavaScript, or having Elixir in Markdown code blocks). The biggest issue to me was, and still is, tracking down variables and giving them the same color, making it easier to follow how data flows. The current engine doesn’t support that type of coloring.

// 'a' would always have the same color
// 'b' would also have the same color but different from 'a'
const getSmaller = (a, b) => {
    if (a <= b) {
        return a
    } else {
        return b
    }
}
// Nova currently renders 'a' and 'b' with the same color
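As a rough sketch of what I have in mind: assuming the parser hands the theme a list of identifier names, each name could be hashed to a stable hue. This is not something Nova or tree-sitter offers as far as I know; `hashName` and `colorFor` are made-up helpers, and the saturation and lightness values are arbitrary.

```javascript
// FNV-1a-style 32-bit hash over the UTF-16 code units of the name.
function hashName(name) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < name.length; i++) {
    hash ^= name.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime in 32 bits
  }
  return hash >>> 0; // convert to an unsigned 32-bit integer
}

// Map an identifier to a CSS color: the same name always gets the same hue.
// Saturation and lightness are arbitrary choices for this sketch.
function colorFor(name) {
  const hue = hashName(name) % 360;
  return `hsl(${hue}, 70%, 45%)`;
}

// Every occurrence of 'a' gets one color, every 'b' another.
for (const name of ["a", "b", "a"]) {
  console.log(name, colorFor(name));
}
```

Two different names can still land on the same hue by chance, so a real implementation would probably want collision handling or a curated palette.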

Another issue for me, as some have mentioned, is that creating a new grammar is a pain. I wanted to make a vlang.io/ grammar, but I didn’t get far and gave up.

(+1) Tree-Sitter

Allow me to follow up on this: Seeing that TreeSitter defines grammar rules as JS functions that may refer to other functions, these are, in fact, parser combinators.

Concerning EBNF, yeah that’s been around a long time, particularly in computer science. I would argue that the format is subpar for this use-case because it misses a lot of convenience TreeSitter provides, e.g.

  • precedence
  • associativity
  • hiding rules in the syntax tree
  • naming subtrees
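To show what precedence and associativity declarations actually decide, here is a small hand-written sketch. It is deliberately not TreeSitter’s grammar DSL (there, as far as I know, this is declared on rules with helpers such as `prec.left` and `prec.right`); instead it uses a precedence-climbing parser over a hypothetical `OPERATORS` table, so it runs in plain Node. The `left` and `right` properties on each node stand in for named subtrees.

```javascript
// Hypothetical operator table: higher `prec` binds tighter,
// `assoc` decides how a chain of equal-precedence operators groups.
const OPERATORS = {
  "+": { prec: 1, assoc: "left" },
  "-": { prec: 1, assoc: "left" },
  "*": { prec: 2, assoc: "left" },
  "^": { prec: 3, assoc: "right" },
};

// Split the input into number and operator tokens (whitespace is skipped).
function tokenize(source) {
  return source.match(/\d+|[-+*^]/g) ?? [];
}

// Precedence climbing: consume operators whose precedence is at least `minPrec`.
function parseExpression(tokens, minPrec = 1) {
  let left = { type: "number", value: Number(tokens.shift()) };
  while (tokens.length > 0 && OPERATORS[tokens[0]].prec >= minPrec) {
    const operator = tokens.shift();
    const { prec, assoc } = OPERATORS[operator];
    // Left-associative: the right operand may only contain tighter operators.
    // Right-associative: it may also contain operators of the same precedence.
    const right = parseExpression(tokens, assoc === "left" ? prec + 1 : prec);
    left = { type: "binary", operator, left, right }; // named subtrees
  }
  return left;
}

// Render the tree with explicit parentheses so the grouping is visible.
function show(node) {
  return node.type === "number"
    ? String(node.value)
    : `(${show(node.left)} ${node.operator} ${show(node.right)})`;
}

console.log(show(parseExpression(tokenize("1 - 2 - 3 * 4 ^ 5 ^ 6"))));
// ((1 - 2) - (3 * (4 ^ (5 ^ 6))))
```

Plain EBNF can express the same grouping, but only by splitting the grammar into one rule per precedence level, which is exactly the bookkeeping these conveniences save.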
3 Likes

FYI, I posted up an extension for C using the tree-sitter grammar. I wrote that extension in about an hour, maybe less. Tree-Sitter is fantastic. Admittedly that extension only provides for syntax highlighting at the moment (I intend to add folding and symbolication as soon as I figure out the particular queries Nova needs), but still this is a huge leap forward with Nova 10.

I also have a D extension but I need to tweak it slightly before I post it.

One thing I can say is that the captures that Nova uses are not particularly well suited for C family languages. I would like to see a few additional captures added that would help us with highlighting such languages and give more freedom to Theme authors. For example, there is no good way to tag primitive data types like “int”, and there isn’t particularly good support for highlighting include files or package imports. In those cases we wind up settling for the closest match.

3 Likes

Btw, if anyone comes across this – my C-Dragon extension is a lot more friendly for C, C++, and Objective-C. D-Velop does the same for D, and Go-Bee does it for Go. (All three were written by me.) There are still things I wish Nova would add to make highlighting / captures a bit better for these strongly typed languages though.

1 Like