Re: Converting PyDoc to Docbook XML?

Thursday, 26 July 2012

On Thu, Jul 26, 2012 at 09:26:30AM -0400, Steve Gordon wrote:
...
 Hi all,

 Has anyone attempted converting PyDoc to Docbook XML recently with success? I've
tried a project called HappyDoc but it seems to be unmaintained and pretty poorly
documented, though it claims on the project site to be able to generate DocBook XML.

 I have also tried generating the PyDoc HTML with `pydoc -w <module>`, converting it
to XHTML with tidy:

     tidy -indent -m -asxhtml -clean -bare -omit ovirtsdk.infrastructure.brokers.html

 ...and then turning it into Docbook with an XSLT [1]:

     java -cp "/usr/share/java/xalan-j2.jar" org.apache.xalan.xslt.Process -XSL html2db.xsl -IN <module>.html > <module>.xml

 The resulting XML still has a *lot* of errors according to xmllint, even before applying
the DTD rules. Has anyone had any success with a variation of this, or maybe a different method
entirely?

Do you need to be able to turn anything that pydoc displays into docbook, or
just certain things that you have written?  Is this for an ongoing
conversion or a one-off?  pydoc itself is a somewhat naive displayer of the
text in docstrings, and docstrings are just text formatted so that humans can
look at them and assign meaning based on their past experience.  So simply
converting from pydoc's output to docbook isn't likely to work well.
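
To make that concrete, here's a minimal sketch (stdlib only; `module_to_docbook` is a
hypothetical helper, not part of pydoc) that skips pydoc's HTML and reads docstrings
straight from the imported module with `inspect`.  Because a docstring is just a string,
the best it can do is drop each one verbatim into a DocBook `<para>`:

```python
import inspect
import xml.etree.ElementTree as ET


def module_to_docbook(module):
    """Return a DocBook <section> (as a string) for *module*'s public routines.

    Hypothetical helper: each docstring is copied verbatim into one <para>,
    because plain docstring text carries no markup to map onto richer
    DocBook elements.
    """
    root = ET.Element("section")
    ET.SubElement(root, "title").text = module.__name__
    for name, obj in inspect.getmembers(module, inspect.isroutine):
        # Skip private names and routines merely imported from elsewhere.
        if name.startswith("_") or getattr(obj, "__module__", None) != module.__name__:
            continue
        sec = ET.SubElement(root, "section")
        ET.SubElement(sec, "title").text = name
        # inspect.getdoc() strips indentation but leaves the text unstructured.
        ET.SubElement(sec, "para").text = inspect.getdoc(obj) or ""
    return ET.tostring(root, encoding="unicode")
```

Something like `print(module_to_docbook(json))` would dump the stdlib json module's
functions.  Since this imports the module, it should also see C-extension routines, but
any structure inside the text (parameter lists, examples) ends up flattened into one para.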

There are two different approaches I can think of -- I haven't used
either, but they are more likely to work than raw pydoc:

* HappyDoc is the only tool I've found that extracts python API
  documentation and spits out docbook.  HappyDoc parses python source files
  to do this.  This allows doing things like extracting both comments and
  docstrings, but it also means it does not work with C extensions, only pure
  python code.  I haven't played around with HappyDoc in years, so I don't
  know how good it is at guessing the semantic meaning of docstrings in
  order to output nice docbook.
  http://happydoc.sourceforge.net/

* Sphinx is widely used by python projects: the python stdlib docs are
  handwritten in Sphinx/reStructuredText, and other python modules use
  Sphinx to extract documentation from docstrings.  You write your
  docstrings in reStructuredText, with many extensions to mark the semantic
  content of your document.  Then the Sphinx toolchain converts that into
  an output format.  There is not currently a builder for docbook, so if
  you went this route, someone would have to write one.  The advantage that
  offsets this is that Sphinx is a semantic markup system, so you have
  well-defined markup entities to convert from rather than having to guess.
  http://sphinx.pocoo.org/contents.html
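
A rough sketch of how the two ideas combine, again stdlib only and with hypothetical
names (`source_to_docbook`, `PARAM_RE`): like HappyDoc it parses source with `ast`
instead of importing it, so C extensions are out of reach, and like Sphinx it maps
explicit reST info fields (`:param name:`) onto specific DocBook elements rather than
guessing.  Handling only top-level functions and single-line fields is a simplifying
assumption; a real Sphinx builder would walk the docutils doctree instead.

```python
import ast
import re
import xml.etree.ElementTree as ET

# Matches single-line Sphinx-style info fields such as ":param name: description".
PARAM_RE = re.compile(r"^:param\s+(\w+):\s*(.*)$")


def source_to_docbook(source, title):
    """Build a DocBook <section> string from Python *source* without importing it.

    Prose lines are joined into one <para>; ":param x:" fields become a
    <variablelist>.  Continuation lines of a field are treated as prose.
    """
    tree = ast.parse(source)
    root = ET.Element("section")
    ET.SubElement(root, "title").text = title
    for node in tree.body:
        # Only top-level (non-async) functions; classes etc. are skipped.
        if not isinstance(node, ast.FunctionDef):
            continue
        doc = ast.get_docstring(node) or ""
        sec = ET.SubElement(root, "section")
        ET.SubElement(sec, "title").text = node.name
        prose, params = [], []
        for line in doc.splitlines():
            m = PARAM_RE.match(line.strip())
            if m:
                params.append(m.groups())
            elif line.strip():
                prose.append(line.strip())
        if prose:
            ET.SubElement(sec, "para").text = " ".join(prose)
        if params:
            vlist = ET.SubElement(sec, "variablelist")
            for name, desc in params:
                entry = ET.SubElement(vlist, "varlistentry")
                ET.SubElement(entry, "term").text = name
                item = ET.SubElement(entry, "listitem")
                ET.SubElement(item, "para").text = desc
    return ET.tostring(root, encoding="unicode")
```

Parameters end up in a `<variablelist>` of `<varlistentry>`/`<term>`/`<listitem>`
elements, which is one reasonable DocBook mapping among several.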

-Toshio
