Hi all, looks like PyXML package is deprecated since python itself provides xml mechanisms. When you look deeper, python's xml provides: "dom", "parsers", "sax", "etree" and PyXML provides: 'dom', 'marshal', 'parsers', 'sax', 'schema', 'utils', 'xpath', 'xslt'
So, PyXML duplicates dom, parsers and sax (and looks like python's is in better shape). Is any package using marshall, schema or any other not in python itself?
Deprecate PyXML or just remove duplicated parts?
RR
On 21.2.2012 18:48, Roman Rakus wrote:
So, PyXML duplicates dom, parsers and sax (and looks like python's is in better shape). Is any package using marshall, schema or any other not in python itself?
Deprecate PyXML or just remove duplicated parts?
What packages require PyXML? Could they be rebuilt just with xml tools in stock Python (I think so)? Did you try?
Matěj
On 02/22/2012 11:11 AM, Matej Cepl wrote:
What packages require PyXML? Could they be rebuilt just with xml tools in stock Python (I think so)? Did you try?
Matěj
I guess rebuilding isn't enough. Only running the affected scripts can show errors. The packages requiring PyXML in F16 are:
$ repoquery --whatrequires PyXML
SOAPpy-0:0.11.6-12.fc16.noarch
bkchem-0:0.14.0-3.pre2.fc15.noarch
comoonics-cdsl-py-0:0.2-18.noarch
comoonics-cluster-py-0:0.1-24.noarch
fedora-business-cards-0:0.2.4.3-2.fc15.noarch
grc-0:0.70-7.fc15.noarch
heartbeat-0:3.0.4-1.fc15.1.x86_64
inksmoto-0:0.7.0-5.fc15.noarch
libopensync-plugin-google-calendar-1:0.22-5.fc15.x86_64
openxcap-0:1.1.2-3.fc15.noarch
pida-0:0.5.1-13.fc15.x86_64
pypar2-0:1.4-7.fc15.noarch
python-MythTV-0:0.24.1-4.fc16.x86_64
python-MythTV-0:0.24.2-1.fc16.x86_64
python-ZSI-0:2.0-9.fc15.noarch
python-nova-0:2011.3-4.fc16.noarch
python-nova-0:2011.3.1-2.fc16.noarch
python-webdav-library-0:0.3.0-1.fc16.noarch
salt-0:0.9.6-2.fc16.noarch
spacewalk-backend-tools-0:1.4.39-1.fc16.noarch
subscription-manager-0:0.99.4-1.fc16.x86_64
synce-sync-engine-0:0.15.1-1.fc16.x86_64
xen-0:4.1.1-8.fc16.x86_64
xen-0:4.1.2-6.fc16.x86_64
zeroinstall-injector-0:1.2-1.fc16.noarch
RR
Quoting Roman Rakus (2012-02-22 11:21:38)
I guess rebuilding isn't enough. Only running the affected scripts can show errors. The packages requiring PyXML in F16 are:
$ repoquery --whatrequires PyXML ... python-webdav-library-0:0.3.0-1.fc16.noarch
Fixed in rawhide/F17. Thanks for pointing it out
On 22.2.2012 11:21, Roman Rakus wrote:
fedora-business-cards-0:0.2.4.3-2.fc15.noarch
I wonder why this is on the list: if I am not mistaken, it doesn't use anything else than xml.dom.minidom (in generate.py), which was already present in python 2.4 (the oldest Python currently living in Fedora/EPEL universe). And yes PyXML is hard-coded in Requires: (maybe this is just one more example why hard coded Requires are evil).
Putting the maintainer on Cc: to ask for the reasons why this package wants PyXML at all.
Best,
Matěj
On Mon, Feb 27, 2012 at 01:03:32AM +0100, Matej Cepl wrote:
On 22.2.2012 11:21, Roman Rakus wrote:
fedora-business-cards-0:0.2.4.3-2.fc15.noarch
I wonder why this is on the list: if I am not mistaken, it doesn't use anything else than xml.dom.minidom (in generate.py), which was already present in python 2.4 (the oldest Python currently living in Fedora/EPEL universe). And yes PyXML is hard-coded in Requires: (maybe this is just one more example why hard coded Requires are evil).
Putting the maintainer on Cc: to ask for the reasons why this package wants PyXML at all.
I'll remove the dependency sometime this week.
On Tue, Feb 21, 2012 at 06:48:11PM +0100, Roman Rakus wrote:
Hi all, looks like PyXML package is deprecated since python itself provides xml mechanisms. When you look deeper, python's xml provides: "dom", "parsers", "sax", "etree" and PyXML provides: 'dom', 'marshal', 'parsers', 'sax', 'schema', 'utils', 'xpath', 'xslt'
So, PyXML duplicates dom, parsers and sax (and looks like python's is in better shape). Is any package using marshall, schema or any other not in python itself?
Deprecate PyXML or just remove duplicated parts?
It's not as simple as saying that a library provides something that has the same names as modules in the stdlib, you also have to figure out compatibility and whether removing it will cause any problems for software that Fedora ships.
Looking at the sourceforge page, the authors of PyXML are heavily involved in python core development so it's likely that they worked to merge the useful bits of PyXML into the stdlib before they abandoned it: http://sourceforge.net/projects/pyxml/
However, it also looks like PyXML is a collection of works that the sourceforge authors didn't necessarily originate. With that in mind, they may not have been able to get permission of the various original authors to merge a particular module into the python stdlib. However, there may exist independent upstream versions of those modules that would be better to ship than shipping PyXML in those cases.
The best way to proceed is likely similar to how I looked at whether it would be okay to retire python-sqlite2: Take all the packages that depend on PyXML and grep through their sources to find where they use the PyXML modules (rpm -ql PyXML shows that everything in PyXML is in an "_xmlplus" python package so you'll see things like "import _xmlplus.dom" and "from _xmlplus import dom". Grepping for _xmlplus will probably work). In some cases, you'll likely find this was an old dep and the source no longer uses it. Others may be conditionalized:
    try:
        from xml import sax
    except ImportError:
        from _xmlplus import sax
Testing these packages to know that they behave properly is good but at least you can be confident that the upstream code is intended to work with the stdlib modules instead of the PyXML modules.
If you find any code that's using _xmlplus unconditionally, you'll have to write patches to use the stdlib or separate modules. Then test your changes and send the patches upstream. Given the upstream note that PyXML is deprecated and the authors do not intend for people to use it, this category would hopefully be very small. But you won't know until you look.
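The grep step described above can be sketched in Python. This is only an illustration: the function name `find_xmlplus_imports` and the regular expression are assumptions made for the example, and the scan does not tell conditional imports (inside try/except) apart from unconditional ones, so every hit still needs a manual look.

```python
import os
import re

# Matches "import _xmlplus", "import _xmlplus.dom", "from _xmlplus import dom"
# and "from _xmlplus.sax import handler", at any indentation level.
XMLPLUS_IMPORT = re.compile(r"^\s*(?:import|from)\s+_xmlplus\b")


def find_xmlplus_imports(root):
    """Return (path, line_number, line) for each _xmlplus import under root.

    Hypothetical helper: it only reports where _xmlplus is imported.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files with odd encodings.
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if XMLPLUS_IMPORT.match(line):
                        hits.append((path, number, line.strip()))
    return hits
```

Run against an unpacked source tree, for example `find_xmlplus_imports("path/to/prepped/sources")`. An empty result does not prove the package is independent of PyXML, since a plain `import xml` may still end up using PyXML code.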
-Toshio
On 02/23/2012 04:54 PM, Toshio Kuratomi wrote:
The best way to proceed is likely similar to how I looked at whether it would be okay to retire python-sqlite2: Take all the packages that depend on PyXML and grep through their sources to find where they use the PyXML modules (rpm -ql PyXML shows that everything in PyXML is in an "_xmlplus" python package so you'll see things like "import _xmlplus.dom" and "from _xmlplus import dom". Grepping for _xmlplus will probably work). In some cases, you'll likely find this was an old dep and the source no longer uses it. Others may be conditionalized:
    try:
        from xml import sax
    except ImportError:
        from _xmlplus import sax
Look at the stdlib xml package: it tries to import _xmlplus and, if that succeeds, replaces the stdlib implementation with the non-standard one. It's kind of a "what?" moment. I can try to report a bug on it.
Testing these packages to know that they behave properly is good but at least you can be confident that the upstream code is intended to work with the stdlib modules instead of the PyXML modules.
If you find any code that's using _xmlplus unconditionally, you'll have to write patches to use the stdlib or separate modules. Then test your changes and send the patches upstream. Given the upstream note that PyXML is deprecated and the authors do not intend for people to use it, this category would hopefully be very small. But you won't know until you look.
-Toshio
Currently I'm going through the packages and running pylint, with its --deprecated-modules option, on the *.py files in the %prep'ed sources. I will post results, report bugs and so on...
Anyway, you're right that it is better to ask upstream how different the stdlib and _xmlplus are.
RR
On Thu, Feb 23, 2012 at 05:19:01PM +0100, Roman Rakus wrote:
Look at the stdlib xml package: it tries to import _xmlplus and, if that succeeds, replaces the stdlib implementation with the non-standard one. It's kind of a "what?" moment. I can try to report a bug on it.
Ugh. Yeah -- so it looks like the code there assumes that python-2.7 xml libraries don't have any changes that aren't in the 0.8.4 release of PyXML.
This is untrue, but many of the changes won't affect what the code does: "if value in dict" vs "if dict.has_key(value)" type stuff and workarounds for older python releases. There are a few changes which might affect the output of the code but someone would need to analyze them more to know for sure.
The method used also makes it harder for people to simply grep the sources and tell if code really depends on PyXML or not as an import of the stdlib's xml module may be getting _xmlplus instead of the stdlib code.
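Since grepping alone cannot tell which implementation an `import xml` really gets, a runtime check can help. The following is a minimal sketch. It assumes the replacement works as described in this thread, so that the module registered under the name "xml" is really the `_xmlplus` package when PyXML is installed. The function name `xml_implementation` is made up for the example.

```python
import sys
import xml  # noqa: F401  (imported so that sys.modules["xml"] is populated)


def xml_implementation():
    """Return 'pyxml' if 'xml' resolves to _xmlplus, otherwise 'stdlib'.

    Assumption: when the python-2.x stdlib swaps itself out, the module
    object found under sys.modules["xml"] is the _xmlplus package, so its
    name or file location gives it away.
    """
    module = sys.modules["xml"]
    location = getattr(module, "__file__", "") or ""
    if module.__name__ == "_xmlplus" or "_xmlplus" in location:
        return "pyxml"
    return "stdlib"
```

On python-3.x, where the import of PyXML was removed, this should always report the stdlib.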
Currently I'm going through the packages and running pylint, with its --deprecated-modules option, on the *.py files in the %prep'ed sources. I will post results, report bugs and so on...
It's great that you're doing this. Hopefully PyXML has been deprecated long enough that you'll find most code only needs the stdlib to function.
-Toshio
On Thu, Feb 23, 2012 at 9:09 AM, Toshio Kuratomi a.badger@gmail.com wrote:
On Thu, Feb 23, 2012 at 05:19:01PM +0100, Roman Rakus wrote:
Currently I'm going through the packages and running pylint, with its --deprecated-modules option, on the *.py files in the %prep'ed sources. I will post results, report bugs and so on...
It's great that you're doing this. Hopefully PyXML has been deprecated long enough that you'll find most code only needs the stdlib to function.
To revive a very old thread --
Did you get anywhere with this? I've just found that the latest version of docutils doesn't run if pyxml is installed (because the stdlib replaces its own implementations with pyxml's if pyxml is installed). For now, in rawhide, I've added a Conflicts on PyXML but that's not going to work in the long run (not least because you can't install packages that (perhaps bogusly) require pyxml at the same time as docutils).
I can see several ways forward --
* Deprecate pyxml as you were thinking of doing
* Locally patch the python stdlib to not replace its implementation of xml with pyxml's. I don't know if upstream python will take that as python-2.7 is in maintenance mode but they may as it's somewhat agreed that this importing of pyxml is not kosher in upstream (and has been removed in python-3.x)
* I can file a bug and we can figure out how to fix PyXML so that docutils works when it's installed. (We may have to do this anyway as there's a chance I may need to push a docutils update to older Fedora for some bugfixes.)
Thanks, -Toshio
On Fri, Jul 20, 2012 at 4:28 PM, Toshio Kuratomi a.badger@gmail.com wrote:
I can see several ways forward --
- Deprecate pyxml as you were thinking of doing
- Locally patch the python stdlib to not replace its implementation of xml with pyxml's. I don't know if upstream python will take that as python-2.7 is in maintenance mode but they may as it's somewhat agreed that this importing of pyxml is not kosher in upstream (and has been removed in python-3.x)
Some followup. I've made a page for removing pyxml:
https://fedoraproject.org/wiki/User:Toshio/Remove_PyXML
dmalcolm added the dep tree. I remember that fedora-business-cards is a false dep from when this came up in February. I'll remove the dep shortly. Since SOAPpy affects so much stuff I had a look. It seems like the dep is false there as well. The README says that PyXML is needed but inspection of the code and the ChangeLog, ReleaseNotes, and the upstream scm points to that being a documentation bug; the PyXML requirement was supposedly removed in 2003.
-Toshio
Just finished analyzing all of the packages that claim a PyXML dependency:
https://fedoraproject.org/wiki/User:Toshio/Remove_PyXML#Dep_analysis
There seem to be a hefty number of packages where we can remove the dependency (the Easy fixes section). I've opened bugs for those in case the maintainers know something about the PyXML dep that I'm unaware of.
Because PyXML overwrites the python stdlib xml module, it is possible that I've missed some feature that PyXML adds to some portion of the stdlib xml module. As an example of the level of replacement that can occur: Both modules have sax.handler. The sax handlers can both be configured by setting sax.handler.feature_* attributes. However, PyXML has feature_namespace_prefixes while the stdlib module does not. If you run across one of these in the course of trying to remove the PyXML dependency, please let me know, update the wiki page, add to the bug report what's blocking things, etc. I can try to check for these things across all of the PyXML-using packages if I'm made aware that they exist.
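One way to check for differences like this without relying on attribute names alone is to ask a reader whether it accepts a feature. This is a hedged sketch: `reader_supports` is a name invented for the example, the feature URI is the standard SAX2 namespace-prefixes identifier, and whether the difference shows up as a missing attribute or as a reader refusing the feature is something to verify on the system in question.

```python
import xml.sax
from xml.sax import SAXNotRecognizedException, SAXNotSupportedException

# Standard SAX2 feature identifier, spelled out so the probe does not depend
# on the attribute being defined in sax.handler.
NAMESPACE_PREFIXES = "http://xml.org/sax/features/namespace-prefixes"


def reader_supports(feature_uri):
    """Return True if the default SAX reader lets the feature be switched on."""
    reader = xml.sax.make_parser()
    try:
        reader.setFeature(feature_uri, True)
    except (SAXNotRecognizedException, SAXNotSupportedException):
        return False
    return True
```

Calling `reader_supports(NAMESPACE_PREFIXES)` with and without PyXML installed would show whether a package that enables this feature can keep working on the stdlib alone.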
The Require Coding section is more problematic. I'll list them all here:
* bkchem: uses xpath. If this is all, it's probably doable to write a patch that uses lxml's xpath instead. Since it's a plugin, it's also possible to stop shipping that plugin.
* libopensync-plugin-google-calendar: uses xpath. Writing a patch should be as doable as bkchem.
* python-ZSI: I think we could fix with a package update. Upstream has a 2.1alpha1 release that Debian and Gentoo both ship that would fix this for us.
* subscription-manager: looks like it just needs a port to a different library (the code isn't even xml related, it's date parsing).
* spacewalk-backend: both uses of PyXML-only code were in test cases. One looks like it could be ported to some other library (it's for converting between character encodings). The other one is for writing xml using a SAX api. I haven't looked at that one too hard yet.
* openxcap: makes use of a non-stdlib feature of the sax reader. I haven't looked into this one too hard either but I suspect there's not just a drop-in replacement for this. Some sort of custom code will have to be written to handle what they're trying to do.
* comoonics*: these packages make heavy use of the PyXML-only API. I think someone will need to spend a while getting to know the code and porting it to use a different library. We could also drop the comoonics packages until upstream ports.
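For the xpath cases in the list above, simple queries may not even need lxml. The sketch below swaps in the stdlib's xml.etree.ElementTree, which implements only a limited XPath subset; lxml would be needed for full XPath 1.0 expressions. The sample document and the `atom_ids` helper are hypothetical and not taken from bkchem or any other package.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
SAMPLE = """<molecule>
  <atom id="a1" element="C"/>
  <atom id="a2" element="O"/>
  <bond start="a1" end="a2"/>
</molecule>"""


def atom_ids(xml_text, element):
    """Return the id of every <atom> whose element attribute matches."""
    root = ET.fromstring(xml_text)
    # ".//tag[@attr='value']" is part of ElementTree's supported XPath subset.
    query = ".//atom[@element='%s']" % element
    return [atom.get("id") for atom in root.findall(query)]
```

Whether a real package can be ported this way depends on which XPath features its queries use; axes and functions outside the ElementTree subset would still call for lxml.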
rrakus, dmalcolm, others -- how do you want to proceed? I can open bugs for the remaining packages and we can plow forward on this but some of the packages needing code changes may not make it for F18 and may have to be blocked. We can also pursue one of the alternatives (for instance, making the python2 stdlib not replace itself with PyXML and patching those packages which can't be ported in time to import _xmlplus instead of import xml).
-Toshio
On 23.2.2012 16:54, Toshio Kuratomi wrote:
It's not as simple as saying that a library provides something that has the same names as modules in the stdlib, you also have to figure out compatibility and whether removing it will cause any problems for software that Fedora ships.
Completely agree with what you were writing, just to note that IIRC PyXML was the first Python XML library, so I would believe plenty of projects use it just because of conservatism/laziness/fear of change.
Matěj
On Tue, Feb 21, 2012 at 06:48:11PM +0100, Roman Rakus wrote:
Hi all, looks like PyXML package is deprecated since python itself provides xml mechanisms. When you look deeper, python's xml provides: "dom", "parsers", "sax", "etree" and PyXML provides: 'dom', 'marshal', 'parsers', 'sax', 'schema', 'utils', 'xpath', 'xslt'
So, PyXML duplicates dom, parsers and sax (and looks like python's is in better shape). Is any package using marshall, schema or any other not in python itself?
Deprecate PyXML or just remove duplicated parts?
PyXML has not been maintained upstream for many years, so it should not be used. The distribution-specific PyXML-0.8.4-python2.6.patch included in the srpm is a sign of ongoing issues.
Anyway it provides many features which are missing in python stdlib as far as I know (for example xpath or magnificent HTML-to-DOM reader).
There are alternatives outside of stdlib described here: http://wiki.python.org/moin/PythonXml
(lxml is my personal favorite, however it is not compatible with PyXML and it isn't pure python).
I believe PyXML should be kept unchanged (my personal code relies on it as well) but deprecated, and its users should be strongly encouraged to switch to some alternative if the stdlib doesn't satisfy their requirements.