On Thu, Jul 26, 2012 at 09:26:30AM -0400, Steve Gordon wrote:
> Hi all,
> Has anyone attempted converting PyDoc to Docbook XML recently with success? I've
> tried a project called HappyDoc but it seems to be unmaintained and pretty poorly
> documented, though it claims on the project site to be able to generate DocBook XML.
> I have also tried generating the PyDoc HTML with `pydoc -w <module>`, converting it
> to XHTML with tidy:
> tidy -indent -m -asxhtml -clean -bare -omit ovirtsdk.infrastructure.brokers.html
> ...and then turning it into Docbook with an XSLT [1]:
> java -cp "/usr/share/java/xalan-j2.jar" org.apache.xalan.xslt.Process -XSL
> html2db.xsl -IN <module>.html > <module>.xml
> The resulting XML still has a *lot* of errors according to XMLLINT, even before applying
> the DTD rules. Anyone had any success with a variation of this or maybe a different method
> entirely?
Do you need to be able to turn anything that pydoc displays into docbook or
just certain things that you have written? Is this for an ongoing
conversion or a one-off? pydoc itself is a somewhat naive displayer of text
that's in docstrings. docstrings are just text formatted so that humans can
look at it and assign meaning based on their past experience. So simply
converting from pydoc's output to docbook isn't likely to work well.
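To make that concrete, here is a rough sketch of the most you can do with raw
docstrings: wrap each one verbatim in DocBook. The function name
docstrings_to_docbook and the choice of <section> and <literallayout> are my
own assumptions for illustration, and I have not validated the output against
the DocBook DTD.

```python
import inspect
from xml.sax.saxutils import escape


def docstrings_to_docbook(module):
    """Wrap each public routine/class docstring of `module` in a DocBook section.

    The docstring text is emitted verbatim inside <literallayout>, because
    nothing in a plain docstring says which words are parameters, types, or
    return values.
    """
    parts = ['<section><title>%s</title>' % escape(module.__name__)]
    for name, obj in inspect.getmembers(module):
        # Skip private names and dunder attributes such as __doc__.
        if name.startswith('_'):
            continue
        # Only document callables and classes, not constants or submodules.
        if not (inspect.isroutine(obj) or inspect.isclass(obj)):
            continue
        doc = inspect.getdoc(obj) or ''
        parts.append('<section><title>%s</title>' % escape(name))
        # escape() protects '<', '>' and '&' so the result stays well-formed.
        parts.append('<literallayout>%s</literallayout>' % escape(doc))
        parts.append('</section>')
    parts.append('</section>')
    return '\n'.join(parts)
```

The result should be well-formed XML, but it carries no more meaning than the
pydoc text did, which is the limitation I mean.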
There are two different approaches I can think of to take. I haven't used
either, but they have a higher likelihood of working than raw pydoc:
* Happydoc is the only tool I've found that extracts python API
documentation and spits out docbook. happydoc parses python source files
to do this. This allows doing things like extracting both comments and
docstrings but it also means it does not work with C extensions, only pure
python code. I haven't played around with happydoc in years so I don't
know how good it is at guessing what the semantic meaning of docstrings is
in order to output nice docbook.
http://happydoc.sourceforge.net/
* Sphinx is a tool that is widely used by python modules. (The python stdlib
uses docs handcoded in sphinx/restructuredtext. Other python modules use
sphinx to extract documentation from docstrings.) You write your
docstrings in restructuredtext with many extensions to mark the semantic
content of your document. Then the sphinx toolchain is used to convert
that into an output format. There is not currently a builder for docbook,
so if you went this route, someone would have to write one. The advantage
that offsets this is that sphinx is a semantic markup system, so you have
well-defined markup entities that you can convert from rather than
guessing.
http://sphinx.pocoo.org/contents.html
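As a rough illustration of both ideas, the sketch below parses python source
without importing it (the way happydoc works from source files) and turns
sphinx-style ":param name: description" fields into a DocBook <variablelist>.
It is hand-rolled and is not how sphinx or happydoc work internally. The
function name params_to_docbook and the element mapping are my own
assumptions. It only handles the simple one-line form of the field.

```python
import ast
import re
from xml.sax.saxutils import escape

# Matches one-line Sphinx-style info fields such as ":param name: description".
FIELD_RE = re.compile(r'^:param\s+(\w+):\s*(.*)$')


def params_to_docbook(source):
    """Return DocBook <variablelist> markup for each documented function.

    `source` is Python source text; it is parsed, never imported or run.
    """
    out = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        doc = ast.get_docstring(node) or ''
        entries = []
        for line in doc.splitlines():
            match = FIELD_RE.match(line.strip())
            if match:
                # The field marker tells us this is a parameter, so no guessing.
                entries.append(
                    '<varlistentry><term><parameter>%s</parameter></term>'
                    '<listitem><para>%s</para></listitem></varlistentry>'
                    % (escape(match.group(1)), escape(match.group(2))))
        # Functions without any :param: fields produce no output.
        if entries:
            out.append('<variablelist><title>%s</title>%s</variablelist>'
                       % (escape(node.name), ''.join(entries)))
    return '\n'.join(out)
```

Because the docstring says explicitly which words are parameters, the
conversion is a lookup rather than a guess. A real sphinx builder would work
from the parsed document tree instead of regular expressions.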
-Toshio