On Fri, 2012-02-24 at 13:44 +0000, Daniele Varrazzo wrote:
Hello,
I'm using the CPython static analyzer on psycopg2 and it's being quite
helpful: so far it has discovered several quirks. They are mostly
relative to unlikely code paths, such as out of memory during module
import, nonetheless I'm trying to fix everything. Here is something to
report:
Thanks for trying it.
I should note that even the modules within Python's own standard library
don't seem to gracefully handle all possible out-of-memory errors during
initialization. In the mass analysis I've been doing on all python
extensions in Fedora 17 I wrote a triaging script (see
misc/fedora/makeindex.py), and I rate any errors that happen during
module initialization at a lower priority than those at other times.
I'll try to answer the various questions inline below.
1. Running the repos head out-of-the-box, the program stops quite
early with an error. The attached steal-ref.patch resolves it, but I
don't know if the fix is correct.
2. The library uses PyWeakref_GetObject, returning a borrowed ref:
I've implemented it, with tests, in the other attached patch.
Thanks! This looks very good.
I spotted one minor issue with the patch, in the handling of the case of
a NULL argument:
self.state.raise_any_null_ptr_func_arg(stmt, 0, v_op,
why=invokes_Py_TYPE_via_macro(fnmeta,
'PyWeakref_Check'))
The above is the logic to use for a function that will segfault when
passed NULL for a given argument: the analysis will backtrack (if
necessary) to consider the NULL vs non-NULL case, and for the NULL case,
an exception will be raised which will be converted to a compile-time
warning.
Looking at the implementation of PyWeakref_GetObject in Python 2.6 and
2.7 (in hg), I see this logic:
PyObject *
PyWeakref_GetObject(PyObject *ref)
{
if (ref == NULL || !PyWeakref_Check(ref)) {
PyErr_BadInternalCall();
return NULL;
}
So it looks like this function gracefully handles a NULL input value
without segfaulting the interpreter: instead, an exception is raised via
PyErr_BadInternalCall (in recent versions of 2.6 and 2.7 at least)
So I've committed and pushed your patch (as
1a2b735554c9c3cc4d416f827512309a54709b1e), but I've changed the error
handling to this:
if isinstance(v_op, UnknownValue):
self.state.raise_split_value(v_op, stmt.loc)
if v_op.is_null_ptr():
s_failure = self.state.mkstate_concrete_return_of(stmt, 0)
s_failure.cpython.bad_internal_call(stmt.loc)
return [Transition(self.state,
s_failure,
'%s() fails due to NULL argument' %fnmeta.name)]
and added a test case for this (also the log of the transitions in
stdout.txt covering the case of a NULL input changed for the other two
cases, reflecting the above change).
3. The run still doesn't complete: after a lot of output it
tracebacks with:
What kind of code do you see this on? (fwiw the gcc error for the
traceback should have the file/line number of the statement which the
analyzer was working on when the traceback happened)
....
File "/home/piro/fs/gcc-python-plugin/libcpychecker/absinterp.py",
line 2451, in eval_rhs
return self.eval_rvalue(rhs[0], stmt.loc)
File "/home/piro/fs/gcc-python-plugin/libcpychecker/absinterp.py",
line 1498, in eval_rvalue
region = self.get_field_region(expr, loc)#.target, expr.field.name)
File "/home/piro/fs/gcc-python-plugin/libcpychecker/absinterp.py",
line 1660, in get_field_region
Looks like it's trying to evaluate:
some_ptr->some_field
where some_ptr is an UnknownValue
self.raise_split_value(ptr)
Here it tries to split the analysis into the NULL vs non-NULL cases
File
"/home/piro/fs/gcc-python-plugin/libcpychecker/absinterp.py",
line 1984, in raise_split_value
check_isinstance(ptr_rvalue.gcctype, gcc.PointerType)
Here it's
checking that it is indeed a pointer...
File "/home/piro/fs/gcc-python-plugin/gccutils.py",
line 629, in
check_isinstance
raise TypeError('%s / %r is not an instance of %s' % (obj, obj, types))
TypeError: None / None is not an instance of <type 'gcc.PointerType'>
...but it isn't; the type is None, for some reason.
I haven't debugged it yet: will try when I arrive to that point,
for
the moment I'm fixing psycopg instead.
4. Another weird function psycopg uses is _PyString_Resize
(_PyBytes_Resize in Python 3): the function looks no more public (it
was in the API docs for Python 2.0), and is used in the
reimplementation of the % operator for bytes that psycopg uses
internally (see [1], as _Bytes_Resize). This weird function takes a
PyString as input and returns 0 on success, else returns -1 *and
steals a reference*. The quirkness is reported in [2]. I guess it
would take a "CPython.make_transitions_for_success_or_steal()" method,
but haven't seen yet how to implement it.
You can make the transitions individually. Have a look at e.g.
impl_PyObject_SetAttrString() which manually makes separate states for
the success and failure cases, then uses
self.state.make_transitions_for_fncall to make the transitions to those
states. Or see e.g. impl_PyObject_IsTrue which makes three new states,
then manually makes three transitions to them. See e.g.
impl_PyModule_AddObject which has an example of stealing a reference.
5. While at it, make_transitions_for_borrowed_ref_or_fail docstring
says it takes an optional name: it's probably a pasto from the above
docstring.
Oops, yes - good catch; fixed in git as
69d1ea676435b02ea1d1d6aaa4a84c6d3a23d768
6. Finally, haven't been able to create a ticket to
https://fedorahosted.org/gcc-python-plugin/: I should probably log in
but can't find how to create an account, unless opening a project on
fedora. Maybe I'm just sleepy.
I believe that the Trac is tied into Fedora's account system, so you
need a Fedora account:
https://admin.fedoraproject.org/accounts/user/new
(reload that page if the captcha is too insane)
Will try to provide further patches as I go ahead cleaning up
psycopg.
The branch where I'm cleaning up is at [3] (in case you want to
reproduce the reported issues).
Excellent - thanks!
Thank you very much for the tool: it seems very useful.
You're welcome - I'm very happy to see it get used.