Last week I stumbled upon the fact that the newest python-docutils does not pass its unit tests if PyXML is installed. Looking into the issue led me to conclude that retiring PyXML may be the best thing to do, which is what rrakus (the current PyXML maintainer) was interested in doing back in February:
http://lists.fedoraproject.org/pipermail/devel/2012-February/163039.html
The main reason to actively rid ourselves of the package as opposed to simply letting it die a slow death is that the python2 stdlib will replace its own xml module with PyXML if PyXML is installed. Since PyXML is dead upstream, the code there is older and buggier than the code in the stdlib that it is replacing (the root of my docutils unittest failure, for instance).
With dmalcolm's help, I've taken a look at finding everything that uses PyXML and figuring out how feasible it would be to retire PyXML:
https://fedoraproject.org/wiki/User:Toshio/Remove_PyXML#Dep_analysis
Most of the dependent packages are likely false deps. They can all be fixed just by removing the requires: PyXML from the spec file. But a handful of packages (9 by my count) actually use PyXML and would be broken by removal. I'm going to start looking at patches for these but I'm pretty sure that at least comoonics will require more work and a lot of cooperation from upstream.
How do people feel about this work?
* Seems generally like a good thing?
* Seems like we can sacrifice comoonics and possibly some of the other packages on the Coding Required list if they aren't ported in time for F-18?
* Need to explore other options such as stopping the python stdlib from replacing its xml module with PyXML and patching our packages that can't be ported to import pyxml explicitly?
Thanks, -Toshio
On Wed, 2012-07-25 at 13:16 -0700, Toshio Kuratomi wrote:
> Last week I stumbled upon the fact that the newest python-docutils does not pass its unit tests if PyXML is installed. Looking into the issue led me to conclude that retiring PyXML may be the best thing to do, which is what rrakus (the current PyXML maintainer) was interested in doing back in February:
[...snip excellent summary...]
> - Need to explore other options such as stopping the python stdlib from replacing its xml module with PyXML and patching our packages that can't be ported to import pyxml explicitly?
I did some investigating of how to do this.
The replacement of xml with PyXML in the stdlib happens in xml/__init__.py (see e.g. /usr/lib64/python2.7/xml/__init__.py): as "xml" is imported, it tries to import _xmlplus, and if _xmlplus.version_info >= (0, 8, 4) it replaces sys.modules[__name__] with _xmlplus, effectively replacing "xml" with "_xmlplus".
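To make the mechanism concrete, here is a minimal simulation of the swap described above. It is a sketch, not the stdlib source: `make_package`, `resolve_xml`, and the `registry` dict (standing in for `sys.modules`) are hypothetical names, and the stand-in packages are plain module objects so the snippet runs without PyXML installed. The `extend` call mirrors the `_xmlplus.__path__.extend(__path__)` line quoted later in this thread.

```python
import types

# Minimum PyXML version the stdlib accepts, per the description above.
MINIMUM_XMLPLUS_VERSION = (0, 8, 4)


def make_package(name, directory, version_info=None):
    """Build a stand-in package object (hypothetical helper, not a real API)."""
    pkg = types.ModuleType(name)
    pkg.__path__ = [directory]
    if version_info is not None:
        pkg.version_info = version_info
    return pkg


def resolve_xml(stdlib_xml, xmlplus, modules):
    """Mimic the swap: return whichever package ends up registered as "xml"."""
    modules["xml"] = stdlib_xml
    version = getattr(xmlplus, "version_info", None)
    if version is not None and version >= MINIMUM_XMLPLUS_VERSION:
        # PyXML's directory stays first; the stdlib directory is appended.
        xmlplus.__path__.extend(stdlib_xml.__path__)
        modules["xml"] = xmlplus
    return modules["xml"]


stdlib_xml = make_package("xml", "/usr/lib64/python2.7/xml")
xmlplus = make_package(
    "_xmlplus", "/usr/lib64/python2.7/site-packages/_xmlplus", (0, 8, 4)
)
registry = {}  # stands in for sys.modules
winner = resolve_xml(stdlib_xml, xmlplus, registry)
print(winner.__name__, winner.__path__)
```

With a new enough `version_info`, the name "xml" ends up bound to the `_xmlplus` stand-in, and its `__path__` lists the PyXML directory ahead of the stdlib one.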
For example, with PyXML installed:
    >>> import xml
    >>> xml.__file__
    '/usr/lib64/python2.7/site-packages/_xmlplus/__init__.pyc'
Note how "xml" now refers to PyXML.
However, we can subvert this, for example, by hacking up _xmlplus's version_info so that the test mentioned above fails. In a fresh process:
    >>> import _xmlplus
    >>> _xmlplus.version_info = (0, 0, 0)
    >>> import xml
    >>> xml.__file__
    '/usr/lib64/python2.7/xml/__init__.pyc'
and so "xml" is now really using the stdlib's "xml", despite PyXML being installed.
So for code that really needs to use the stdlib's xml, not PyXML (even if the latter is installed), how about the following code fragment?
    # Neutralize _xmlplus, if present; we want "xml" to be the
    # stdlib's "xml"
    try:
        import _xmlplus
        # Prevent _xmlplus from being used in place of "xml"
        _xmlplus.version_info = (0, 0, 0)
    except ImportError:
        pass  # _xmlplus aka PyXML not installed
With the above:
    >>> import xml
    >>> xml.__file__
    '/usr/lib64/python2.7/xml/__init__.pyc'
and it's using the stdlib's "xml".
This does potentially break PyXML for the lifetime of the process, so you could run into nasty situations where some modules you're using really want "xml" to be "xml", and others want "xml" to be "_xmlplus", but hopefully we won't run into that...
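Because the stdlib makes its choice when "xml" is first imported, the fragment above only helps if it runs before that first import. One way to make the ordering requirement explicit is to wrap the fragment in a small helper. The sketch below is an assumption-laden illustration: `neutralize_xmlplus` is a hypothetical name, and raising `RuntimeError` when "xml" is already loaded is one possible policy, not something proposed in this thread.

```python
import sys


def neutralize_xmlplus():
    """Disable PyXML's takeover of "xml" for this process.

    Returns True if _xmlplus was found and neutralized, False if PyXML
    is not installed. Raises RuntimeError if "xml" has already been
    imported, since resetting version_info afterwards has no effect.
    """
    try:
        import _xmlplus
    except ImportError:
        return False  # _xmlplus aka PyXML not installed; nothing to do
    if "xml" in sys.modules:
        raise RuntimeError('"xml" was imported before _xmlplus was neutralized')
    # Make the stdlib's version check fail so it keeps its own "xml".
    _xmlplus.version_info = (0, 0, 0)
    return True


if __name__ == "__main__":
    # Call this before the first "import xml" anywhere in the program.
    print(neutralize_xmlplus())
```

The caveat above still applies: the change is process-wide, so any other module in the same process that expects "xml" to be PyXML will see the stdlib instead.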
Hope this is helpful
Dave
On 07/26/2012 09:10 AM, David Malcolm wrote:
> I did some investigating of how to do this.
> The replacement of xml with PyXML in the stdlib happens in xml/__init__.py (see e.g. /usr/lib64/python2.7/xml/__init__.py): as "xml" is imported, it tries to import _xmlplus, and if _xmlplus.version_info >= (0, 8, 4) it replaces sys.modules[__name__] with _xmlplus, effectively replacing "xml" with "_xmlplus".
I suggest a different tactic. Try replacing the line:
_xmlplus.__path__.extend(__path__)
with
_xmlplus.__path__[0:0] = __path__
At the moment, the replacement works as follows:
1. Both the stdlib and PyXML package dirs are on the package path
2. In the case of name clashes, prefer the PyXML version
By changing it to put the stdlib path first, you would instead set up the following:
1. Both the stdlib and PyXML package dirs are on the package path
2. In the case of name clashes, prefer the stdlib version
(This should work, because PyXML doesn't import anything implicitly in __init__.py)
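The difference between the two lines can be illustrated with plain lists and a first-match lookup, which is how a package `__path__` is searched. This is only a sketch: `find_submodule` and the `CONTENTS` table are hypothetical, and the submodule names listed for each directory are illustrative rather than an inventory of either package.

```python
PYXML_DIR = "/usr/lib64/python2.7/site-packages/_xmlplus"
STDLIB_DIR = "/usr/lib64/python2.7/xml"

# Hypothetical directory contents, for illustration only.
CONTENTS = {
    PYXML_DIR: {"dom", "sax", "xpath"},
    STDLIB_DIR: {"dom", "sax", "etree"},
}


def find_submodule(package_path, name):
    """Return the first directory on package_path that provides name."""
    for directory in package_path:
        if name in CONTENTS[directory]:
            return directory
    return None


# Current behaviour: the stdlib directory is appended after PyXML's.
current = [PYXML_DIR]
current.extend([STDLIB_DIR])

# Suggested behaviour: the stdlib directory is inserted at the front.
proposed = [PYXML_DIR]
proposed[0:0] = [STDLIB_DIR]

for name in ("dom", "xpath", "etree"):
    print(name, find_submodule(current, name), find_submodule(proposed, name))
```

Only the clashing name ("dom" here) resolves differently; names that exist in just one directory are found either way, which is why the PyXML-only pieces stay reachable.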
Since Toshio reports that the current behaviour is genuinely causing problems, you could probably make a case for this approach as an upstream bugfix, especially if you implement it in Fedora first with no apparent ill-effects for affected applications.
Cheers, Nick.
On Thu, 2012-07-26 at 16:03 +1000, Nick Coghlan wrote:
> On 07/26/2012 09:10 AM, David Malcolm wrote:
>> I did some investigating of how to do this.
>> The replacement of xml with PyXML in the stdlib happens in xml/__init__.py (see e.g. /usr/lib64/python2.7/xml/__init__.py): as "xml" is imported, it tries to import _xmlplus, and if _xmlplus.version_info >= (0, 8, 4) it replaces sys.modules[__name__] with _xmlplus, effectively replacing "xml" with "_xmlplus".
> I suggest a different tactic. Try replacing the line:
>     _xmlplus.__path__.extend(__path__)
> with
>     _xmlplus.__path__[0:0] = __path__
> At the moment, the replacement works as follows:
> - Both the stdlib and PyXML package dirs are on the package path
> - In the case of name clashes, prefer the PyXML version
> By changing it to put the stdlib path first, you would instead set up the following:
> - Both the stdlib and PyXML package dirs are on the package path
> - In the case of name clashes, prefer the stdlib version
> (This should work, because PyXML doesn't import anything implicitly in __init__.py)
> Since Toshio reports that the current behaviour is genuinely causing problems, you could probably make a case for this approach as an upstream bugfix, especially if you implement it in Fedora first with no apparent ill-effects for affected applications.
I'm nervous about this - it seems like this would be a change of behavior: what about code that's out there that assumes that PyXML is preferred? We can't control the code that people install into random virtualenvs with pip. If we unilaterally change the stdlib, that would seem to increase the confusion.
With my proposed approach, you have to opt-in, your code can say: when I say "xml", I really mean "xml", not "_xmlplus".
Or am I missing something?
Dave
On 07/27/2012 07:28 AM, David Malcolm wrote:
> With my proposed approach, you have to opt-in, your code can say: when I say "xml", I really mean "xml", not "_xmlplus".
You can do much the same thing at the application level without patching the stdlib:
    import xml
    xml.__path__.reverse()  # If both are available, prefer stdlib over PyXML
The key point is to keep both path fragments, and just rearrange the order so the standard lib is first. That way the PyXML-only stuff will still be accessible, but the stdlib will be preferred for any name conflicts in package-level components that get imported after the path reversal.
Cheers, Nick.
On Fri, Jul 27, 2012 at 12:27:31PM +1000, Nick Coghlan wrote:
> On 07/27/2012 07:28 AM, David Malcolm wrote:
>> With my proposed approach, you have to opt-in, your code can say: when I say "xml", I really mean "xml", not "_xmlplus".
> You can do much the same thing at the application level without patching the stdlib:
>     import xml
>     xml.__path__.reverse()  # If both are available, prefer stdlib over PyXML
> The key point is to keep both path fragments, and just rearrange the order so the standard lib is first. That way the PyXML-only stuff will still be accessible, but the stdlib will be preferred for any name conflicts in package-level components that get imported after the path reversal.
Thanks, Nick!
I just patched docutils with this and submitted the patch to docutils upstream. Works in my testing.
I've still got the wiki page for removing PyXML open: https://fedoraproject.org/wiki/User:Toshio/Remove_PyXML
with bug reports for all the packages that require PyXML but I guess it's not as critical since there's a workaround. I still haven't heard back from rrakus about whether he is still a big believer in deprecating the PyXML package in Fedora so I guess we'll see what happens.
-Toshio
On 07/31/2012 03:16 PM, Toshio Kuratomi wrote:
> On Fri, Jul 27, 2012 at 12:27:31PM +1000, Nick Coghlan wrote:
>> On 07/27/2012 07:28 AM, David Malcolm wrote:
>>> With my proposed approach, you have to opt-in, your code can say: when I say "xml", I really mean "xml", not "_xmlplus".
>> You can do much the same thing at the application level without patching the stdlib:
>>     import xml
>>     xml.__path__.reverse()  # If both are available, prefer stdlib over PyXML
>> The key point is to keep both path fragments, and just rearrange the order so the standard lib is first. That way the PyXML-only stuff will still be accessible, but the stdlib will be preferred for any name conflicts in package-level components that get imported after the path reversal.
> Thanks, Nick!
> I just patched docutils with this and submitted the patch to docutils upstream. Works in my testing.
It occurs to me that this will behave a little strangely if two different libraries in the same process try to make the same change.
A more robust check would be:
    if "_xmlplus" in xml.__path__[0]:
        xml.__path__.reverse()
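The problem with two callers can be shown with a stand-in package object, so the snippet runs without PyXML installed. This is a sketch under stated assumptions: `prefer_stdlib` and `fake_xml` are hypothetical names, and the two-entry path mirrors the PyXML-first ordering described earlier in the thread.

```python
import types

PYXML_DIR = "/usr/lib64/python2.7/site-packages/_xmlplus"
STDLIB_DIR = "/usr/lib64/python2.7/xml"


def prefer_stdlib(xml_pkg):
    """Put the stdlib directory first, at most once (hypothetical helper)."""
    path = xml_pkg.__path__
    if path and "_xmlplus" in path[0]:
        path.reverse()
    return path


def fake_xml():
    """Stand-in for an "xml" package that PyXML has taken over."""
    pkg = types.ModuleType("xml")
    pkg.__path__ = [PYXML_DIR, STDLIB_DIR]
    return pkg


# Two libraries each reversing unconditionally cancel each other out.
naive = fake_xml()
naive.__path__.reverse()
naive.__path__.reverse()

# With the guard, the second call sees the stdlib first and does nothing.
guarded = fake_xml()
prefer_stdlib(guarded)
prefer_stdlib(guarded)

print(naive.__path__[0])
print(guarded.__path__[0])
```

The unguarded double reversal leaves PyXML first again, while the guarded version keeps the stdlib directory at the front however many times it is called.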
Cheers, Nick.
python-devel@lists.fedoraproject.org