Excluding a node from an @injection.content subtree

the3emran · May 28, 2023, 9:03pm

Hi,

I am a bit stuck, and I was wondering if someone experienced with tree-sitter could help.

Please consider the following:

<div>
    {{hello world}}
</div>

Here is the tree-sitter parse for the code above, based on the grammar I wrote:

(blade [0, 0] - [3, 0]
  (html_element [0, 0] - [2, 6]
    (html_start_tag [0, 0] - [0, 5])
    (php_statement [1, 4] - [1, 19]
      (text [1, 6] - [1, 17]))
    (html_end_tag [2, 0] - [2, 6])))

This is my injection.scm:

(blade
    [
        (text)
        (html_element)
    ] @injection.content
    (#set! injection.combined)
    (#set! injection.language html))


(php_statement
    (text) @injection.content
    (#set! injection.combined)
    (#set! injection.language php))

The problem I am facing is that html_element recursively contains more html_element nodes, as well as php_statement nodes, as shown in the example above. In fact, it contains the whole language definition via repeat(_definition).

The @injection.content capture sets the node and its entire subtree as the injection region. As a result, {{hello world}} is parsed as HTML, even though the tree-sitter parse CLI correctly picks it up as a php_statement.

Is there a way to exclude a node from the subtree when using @injection.content? Basically, saying: "Hey, do not include php_statement (or some other nodes), but treat all the other subtree nodes as HTML."
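Conceptually, the question amounts to range subtraction: the HTML injection should cover the captured node's byte range minus the ranges of any php_statement nodes inside it. The Python sketch below shows only that computation, with byte offsets worked out by hand from the example above. It does not use tree-sitter's API, and subtract_ranges is a hypothetical helper name.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end) byte range


def subtract_ranges(outer: Range, holes: List[Range]) -> List[Range]:
    """Return the parts of `outer` not covered by any range in `holes`."""
    start, end = outer
    result: List[Range] = []
    cursor = start  # first byte not yet emitted or skipped
    for hole_start, hole_end in sorted(holes):
        # Clip each hole to the outer range; skip holes that fall outside it.
        hole_start, hole_end = max(hole_start, start), min(hole_end, end)
        if hole_start >= hole_end:
            continue
        if hole_start > cursor:
            result.append((cursor, hole_start))
        cursor = max(cursor, hole_end)  # overlapping holes merge naturally
    if cursor < end:
        result.append((cursor, end))
    return result


src = "<div>\n    {{hello world}}\n</div>"
# html_element covers the whole source; php_statement is [1, 4] - [1, 19],
# i.e. bytes 10..25, because row 1 starts at byte 6.
html_ranges = subtract_ranges((0, len(src)), [(10, 25)])
print(html_ranges)                         # [(0, 10), (25, 32)]
print([src[s:e] for s, e in html_ranges])  # ['<div>\n    ', '\n</div>']
```

The remaining slices are what an HTML parser should see, while the excluded slice is what the PHP injection should see.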

I initially tried creating a text node with a regex (as tree-sitter-php does) and setting those nodes as the HTML injection region. However, because I cannot use lookahead or lookbehind in the regex, I could not make that approach work: the text nodes conflict with Blade components like <x-component/>, because those also include <.

Thanks!

the3emran · June 13, 2023, 6:59pm

I managed to fix my problem using lexical precedence. It was quite a mission to get it working, though.

So if anyone else is stuck writing such a grammar: you are better off creating a childless, universal text node. That is easier said than done, though, because your text token will almost certainly conflict with your language's important tokens, so you really need to have your precedences bang on.
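A toy Python lexer can illustrate why precedence matters for a universal text token. This is a sketch under stated assumptions, not tree-sitter's generated lexer: the token names, regexes, and precedence numbers are hypothetical, and it ignores tree-sitter's context-aware lexing (where only tokens valid in the current parse state are tried). At each offset it tries every rule and keeps the match with the highest precedence, then the longest match, then the earliest rule. That mirrors the order the tree-sitter documentation gives for resolving conflicting tokens, where precedence set with token(prec(...)) is considered before match length.

```python
import re
from typing import List, Tuple

# Hypothetical token rules: (name, regex, precedence). Not a real Blade grammar.
RULES = [
    ("php_open", r"\{\{", 2),
    ("php_close", r"\}\}", 2),
    ("component_open", r"<x-[a-z-]+", 2),
    # "Universal" text: a run of non-space, non-brace characters, a whitespace
    # run, or a lone brace. Note that it also matches "<x-alert/>" in full.
    ("text", r"[^{}\s]+|\s+|[{}]", 0),
]
COMPILED = [(name, re.compile(rx), prec) for name, rx, prec in RULES]


def lex(src: str, precedence_first: bool = True) -> List[Tuple[str, str]]:
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(src):
        candidates = []
        for order, (name, rx, prec) in enumerate(COMPILED):
            m = rx.match(src, pos)  # anchored at pos
            if m:
                # -order makes earlier rules win remaining ties under max().
                candidates.append((prec, m.end() - pos, -order, name, m.group()))
        if not candidates:
            raise ValueError(f"no token matches at offset {pos}")
        if precedence_first:
            best = max(candidates, key=lambda c: (c[0], c[1], c[2]))
        else:  # length first, for comparison
            best = max(candidates, key=lambda c: (c[1], c[0], c[2]))
        _, length, _, name, text = best
        tokens.append((name, text))
        pos += length
    return tokens


print(lex("a {{ b }} <x-alert/>")[-2:])
# [('component_open', '<x-alert'), ('text', '/>')]
```

With precedence_first=False, the greedy text rule wins at < because its match (<x-alert/>) is longer than component_open's (<x-alert), which is the kind of conflict with <x-component/> described earlier in the thread.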

I will post the repo link once the grammar is public on GitHub, as I am sure it will be very helpful for current and future frameworks with highly unconventional, token-conflicting injection points.