I have a couple of questions regarding packaging Tree-sitter parsers for Fedora.
Some background. Tree-sitter is a library for writing parsers for source code, for use in, for example, syntax highlighters and text editors. Nearly 500 different parsers are available separately (https://github.com/tree-sitter/tree-sitter/wiki/List-of-parsers). However, we only have one of them packaged for Fedora so far.
Official bindings are available for using the parsers from a number of languages (https://tree-sitter.github.io/tree-sitter/#language-bindings). The Tree-sitter project has tooling to automatically generate these bindings, which are committed to the Git repository of each parser. On release, the bindings are uploaded to the native repository for each language (e.g., crates.io for Rust, pypi.org for Python).
First question. Should we be building all bindings we care about for a parser from a single SRPM (using the upstream source code), rather than making a bunch of duplicate SRPMs (one for each language, from the language-specific releases)? I think a single SRPM per parser makes more sense, but it does have the drawback of making the .spec files more complicated, and it prevents us from generating them with, say, rust2rpm.
This brings me to my second question. Spec files for different Tree-sitter parsers are pretty much identical to each other, so should we generate them using macros?
I’ve put together some draft macros (https://src.fedoraproject.org/rpms/tree-sitter/pull-request/1#request_diff), and built them into a tree-sitter-srpm-macros package (https://copr.fedorainfracloud.org/coprs/mavit/tree-sitter/). Here is an example complete spec file using these macros:
```
Name: tree-sitter-c
Version: 0.21.4
Release: %autorelease
License: MIT
URL: https://github.com/tree-sitter/%{name}
Source: %{url}/archive/v%{version}/%{name}-%{version}.tar.gz
BuildSystem: treesitter

%{treesitter -l C}

%changelog
%autochangelog
```
I’m sure the macros are in need of review, but does this seem like a generally good approach?
On Tue, Nov 5, 2024 at 2:29 PM Peter Oliver via packaging <packaging@lists.fedoraproject.org> wrote:
> I have a couple of questions regarding packaging Tree-sitter parsers for Fedora.
Hi,
Thank you for starting this discussion!
> Some background. Tree-sitter is a library for writing parsers for source code, for use in, for example, syntax highlighters and text editors. Nearly 500 different parsers are available separately (https://github.com/tree-sitter/tree-sitter/wiki/List-of-parsers). However, we only have one of them packaged for Fedora so far.
> Official bindings are available for using the parsers from a number of languages (https://tree-sitter.github.io/tree-sitter/#language-bindings). The Tree-sitter project has tooling to automatically generate these bindings, which are committed to the Git repository of each parser. On release, the bindings are uploaded to the native repository for each language (e.g., crates.io for Rust, pypi.org for Python).
> First question. Should we be building all bindings we care about for a parser from a single SRPM (using the upstream source code), rather than making a bunch of duplicate SRPMs (one for each language, from the language-specific releases)? I think a single SRPM per parser makes more sense, but it does have the drawback of making the .spec files more complicated, and it prevents us from generating them with, say, rust2rpm.
This is kind of against the spirit (or the rules) for packaging projects that are available on PyPI and on crates.io. For Rust crates, we even have a MUST NOT rule for building rust-*-devel packages from non-crates.io sources.
There is one exception for Rust crates - when they're part of a larger project and not feasible to be packaged separately. But that is obviously not the case here - we already have tree-sitter crates from crates.io packaged in a compliant way.
So in this case the "convenient" solution of building all the bindings from the upstream project is not compliant with the packaging guidelines for Rust crates (as things stand, it even violates a MUST NOT rule).
*If* it is guaranteed that the code published on crates.io is the same as the one in tagged upstream releases, then I think we *could* make an exception here. However, even then, the packages would need to be compliant with the Rust packaging guidelines.
I don't think this is easy to do in the current form. For example, the spec file in the "rust" branch here: https://src.fedoraproject.org/rpms/tree-sitter-java/blob/rust/f/tree-sitter-... has problems - it can't be generated by rust2rpm, and it's currently *only* correct because the project has no feature flags other than "default".
> This brings me to my second question. Spec files for different Tree-sitter parsers are pretty much identical to each other, so should we generate them using macros?
This sounds like it would be a good idea, especially if all the packages would be essentially identical other than having a different "tree-sitter-$lang" name and source repo. If we end up making this an acceptable way to build the Rust bindings, macros to build and generate the "rust-*-devel" subpackages correctly would definitely help too.
Fabio
On Wed, 6 Nov 2024, Fabio Valentini via packaging wrote:
> we already have tree-sitter crates from crates.io packaged in a compliant way.
Not that it really changes the argument, but I don’t think we have any for Tree-sitter parsers, yet, which are what we’re discussing here.
> *If* it is guaranteed that the code published on crates.io is the same as the one in tagged upstream releases, then I think we *could* make an exception here.
Guarantee is a strong word, but if there are differences, then I think we can consider that a bug. Official Tree-sitter parsers have automation to build the various release artefacts from a git tag (e.g., https://github.com/tree-sitter/tree-sitter-java/blob/master/.github/workflow...).
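Whether a published crate really matches the tagged upstream source can also be checked mechanically. The following is a minimal sketch, assuming the crate tarball from crates.io and the upstream tag archive have already been unpacked into two directories. The function names and the `CARGO_GENERATED` set are illustrative, not part of any existing Fedora or Tree-sitter tooling. The set lists files that `cargo package` adds or rewrites on publication, so they are expected to differ; it may need adjusting for a given crate.

```python
import hashlib
from pathlib import Path

# Files that `cargo package` adds or rewrites when a crate is published,
# so a difference from the Git checkout is expected (assumed list; adjust as needed).
CARGO_GENERATED = {".cargo_vcs_info.json", "Cargo.toml", "Cargo.toml.orig"}


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_crate_to_upstream(crate_dir: Path, upstream_dir: Path) -> dict[str, list[str]]:
    """Report crate files that are absent from, or differ from, the upstream tree.

    Only files shipped in the crate are examined; upstream files that the
    crate does not include (other bindings, CI configuration) are ignored.
    """
    missing: list[str] = []
    changed: list[str] = []
    for path in sorted(crate_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(crate_dir)
        if rel.as_posix() in CARGO_GENERATED:
            continue
        counterpart = upstream_dir / rel
        if not counterpart.is_file():
            missing.append(rel.as_posix())
        elif file_digest(path) != file_digest(counterpart):
            changed.append(rel.as_posix())
    return {"missing_upstream": missing, "changed": changed}
```

An empty result for both keys would support treating the two sources as interchangeable for a given release; anything else would be the kind of difference that could be reported upstream as a bug.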
> However, even then, the packages would need to be compliant with the Rust packaging guidelines.
> I don't think this is easy to do in the current form. For example, the spec file in the "rust" branch here: https://src.fedoraproject.org/rpms/tree-sitter-java/blob/rust/f/tree-sitter-... has problems - it can't be generated by rust2rpm, and it's currently *only* correct because the project has no feature flags other than "default".
I’m tempted to say that we could cross this bridge when we come to it (if ever). Given that the Rust code and Cargo configuration are copied from a template when the parser developer runs the `tree-sitter generate` command, I think variation from parser to parser will be rare, and hence we can make a one-size-fits-all .spec in a simple-minded way.
I accept, though, that it would not be ideal to have Rust packaging assumptions hard-coded into something that isn’t part of the Rust packaging tooling.
There’s a trade-off here between consistent Rust packaging and getting Rust subpackages “for free” when Tree-sitter parsers are packaged for other languages (and vice versa). I think that’s something for people with more Rust packaging experience than me to weigh up.
On Wed, 6 Nov 2024, Fabio Valentini via packaging wrote:
> On Tue, Nov 5, 2024 at 2:29 PM Peter Oliver via packaging <packaging@lists.fedoraproject.org> wrote:
> > This brings me to my second question. Spec files for different Tree-sitter parsers are pretty much identical to each other, so should we generate them using macros?
> This sounds like it would be a good idea, especially if all the packages would be essentially identical other than having a different "tree-sitter-$lang" name and source repo.
I have submitted a package review for a new package containing these macros, tree-sitter-srpm-macros.
https://bugzilla.redhat.com/show_bug.cgi?id=2333124
There appears to be little immediate enthusiasm for building Python or Rust binding subpackages, so I have made that an opt-in feature that I have labelled as experimental.