My day job currently involves working on a Python CLI (and potentially a backing socket-activated service) that needs to run across Fedora/RHEL/CentOS/SCLs, *without* accidentally exposing a Python level API that we might inadvertently end up needing to support.
(Note: this CLI is not being, and will likely never be, proposed for incorporation into Fedora itself - it's a tool to help migrate applications between different operating system versions without doing an in-place upgrade, so the update cycles need to be decoupled from those of the operating system itself.)
At the moment, we offer two different ways of installing that:
1. via pipsi, which uses the system Python, but has no access to system level Python libraries
2. via RPM, which has access to system level Python libraries, exposes the application's internal libraries for import by other applications (which we don't really want to do) and also requires that *all* dependencies be available as system packages
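For concreteness, those two paths today boil down to something like the following (using "leapp" purely as a stand-in for the actual package name):

    # Option 1: pipsi creates an isolated virtualenv and a launcher script
    pipsi install leapp

    # Option 2: a conventional RPM install against the system site-packages
    sudo dnf install leapp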
Both approaches have significant downsides:
* the pipsi based approach is *too* decoupled from the host OS, installing things into the virtual environment even when a perfectly acceptable version is already installed and maintained as a system package. It also means we can't benefit from distro level patches to packages like requests, so the app is decoupled from the system certificate store
* the RPM based approach isn't decoupled from the OS *enough*, so we can't readily do things like selectively installing private copies of newer versions of dependencies on RHEL/CentOS, while using the system packages on Fedora. It also means the Python packages implementing the application itself are globally available for import rather than only being usable from within the application
While we haven't implemented it yet, the approach I'm considering to tackle this problem [1] involves integrating creation of an app-specific private virtual environment into the definition of the application RPM, with the following details:
* unlike pipsi, this virtual environment would be configured to allow access to the system site packages, giving us the best of both worlds: we'd use system packages if readily available, otherwise we'd stick our own pinned dependency in the virtual env and treat it as part of the application (and hence the app developers' responsibility to keep up to date)
* we'd come up with some way of turning the Python level dependencies into additional entries in the RPM's Sources list, and then turn those into a local sdist index during the %prep phase. That way, we'd support offline builds automatically, and be well positioned to have pip autofill any gaps where system level dependencies didn't meet the needs of the application
* we'd deliberately omit some of the packages injected into the virtual environment from the resulting RPM (most notably: we'd either remove pip, wheel, and setuptools, or else avoid installing them in the first place)
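As a very rough sketch of how that could look in the spec file (all paths, package names, and macro choices below are illustrative guesses rather than a worked-out design, and a real version would also need to fix up the buildroot paths that the venv records internally):

    # %prep: collect the sdists listed as additional SourceN entries into a local index
    %prep
    %autosetup
    mkdir -p %{_builddir}/local-index
    cp %{SOURCE1} %{SOURCE2} %{_builddir}/local-index/

    # %install: create the app-private venv with access to the system site-packages,
    # then let pip fill only the gaps from the offline index
    %install
    python3 -m venv --system-site-packages %{buildroot}%{_prefix}/lib/leapp/venv
    %{buildroot}%{_prefix}/lib/leapp/venv/bin/pip install \
        --no-index --find-links %{_builddir}/local-index leapp
    # deliberately drop the installer tooling so it never ships in the RPM
    rm -rf %{buildroot}%{_prefix}/lib/leapp/venv/bin/pip* \
           %{buildroot}%{_prefix}/lib/leapp/venv/lib/python*/site-packages/{pip,setuptools,wheel,pkg_resources}*

The %files list would then presumably just own the whole application directory plus whatever launcher ends up in %{_bindir}.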
Where I think this idea crosses over into being a suitable topic for the Fedora Python SIG relates to the current modularity initiatives and various problems we've faced over the years around separating the challenges of "provide an application that happens to be written in Python" and "provide a supported Python API as part of the system Python installation".
Some examples:
* the helper library for the "mock" CLI tool had to be renamed to "mockbuild" to fix a conflict with the upstream "mock" testing library
* despite officially having no supported public API, people still write "import pip" instead of running the pip CLI in a subprocess
* ditto for the yum CLI (and even for DNF, some non-trivial changes were recently needed to better separate the "supported for third party use with defined backwards compatibility guarantees" APIs from the "for internal use by the DNF CLI and may change at any time" APIs)
All of those could have been avoided if the recommended structure for "applications that happen to be written in Python" included a virtual environment that isolated the "private to the application" Python modules (including the application's own source code) from the "intended for third party consumption" public APIs.
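In other words, the only deliberately public surface would be the CLI entry point itself, along the lines of the following thin launcher (names purely illustrative):

    #!/bin/sh
    # /usr/bin/exampleapp: the sole supported interface to the application
    # everything under /usr/lib/exampleapp/venv stays off the system sys.path
    exec /usr/lib/exampleapp/venv/bin/exampleapp "$@"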
In the near term, my own focus is going to be on figuring out the details of this structure specifically for LeApp, but I wanted to raise the notion here early so I didn't go down any paths that would later prove to be an absolute deal-breaker for updating the distro level recommendations.
Cheers, Nick.
[1] https://github.com/leapp-to/prototype/issues/126
Hi Nick,
Very interesting topic and certainly these ideas are worth exploring.
(Subscribing to this thread so I can revisit it more thoroughly in the future).
Regards,
Charalampos Stratakis
Associate Software Engineer
Python Maintenance Team, Red Hat