Dear Jiri,

I have been under enormous stress these last few weeks, so:

I thank you very much for the research and all the code!

I will have more time soon and look forward to adding your changes!
(and to running latrace on a real powerpc soon, without args parsing at first!)

Many thanks,
Bernhard

On Wed, 2010-10-06 at 11:21 +0200, Jiri Olsa wrote:
hi,

I finally got gdb running... hallelujah :)

you can try vfork-setjmp branch in
git://fedorapeople.org/home/fedora/jolsa/public_git/latrace.git

the fix is probably just temporary, since I'm not sure there's
a better way to fix it directly in glibc

I'll keep you posted... I'm attaching the description of the issues,
any ideas are welcome ;)

once I run out of ideas, I'll make a glibc BZ and unleash some anger ;)

wbr,
jirka


----------------------------------------------------------------------
setjmp calls family:

This is caused by the runtime linker calling the pltexit handler.

In this case:

        - the 'framesize' portion of stack is copied on top of the
          current stack
        - setjmp is called over the new stack
        - pltexit handler is called

The setjmp function works by storing the environment and return IP
in the jmp_buf buffer. A later longjmp reads the jmp_buf
and jumps to that return IP.
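
For reference, the plain setjmp/longjmp round trip (no auditing involved)
can be sketched as below. This is only an illustration: the names
jump_back and setjmp_roundtrip are made up for this sketch and are not
part of latrace or glibc.

```c
#include <setjmp.h>

static jmp_buf env;        /* holds the saved environment and return IP */
static volatile int passes;

/* Jumps back to the setjmp call site; never returns normally. */
static void jump_back(void)
{
        longjmp(env, 1);
}

/* Hypothetical helper: counts how often the code after setjmp is reached. */
int setjmp_roundtrip(void)
{
        passes = 0;

        if (setjmp(env) == 0) {
                passes = 1;     /* direct return from setjmp */
                jump_back();    /* resumes at the setjmp above, non-zero */
        }

        passes++;               /* reached only through longjmp */
        return passes;
}
```

Called once, setjmp_roundtrip() passes the setjmp call site twice: first
directly, then via the return IP that longjmp reads back from the jmp_buf.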

The point is that setjmp gets the return IP from the stack via:

        movq (%rsp), %rax       /* Save PC we are returning to now.  */

So if it is called through the pltexit handler path,
it will get the wrong return IP, since it does not expect the LD_AUDIT stuff
on the stack..

this could probably be fixed within the setjmp function itself,
by checking whether auditing is active... I'm looking into that ;)


----------------------------------------------------------------------
vfork:

there are 2 main things about vfork:
- once vfork is called, the parent is stopped and the child continues.
- both parent and child share the same address space

So whatever function is called right after the vfork call within
the child path will clobber the vfork return address for the parent.

And again only in the pltexit path, since it uses the stack for storing
return values.
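
For comparison, a minimal sketch of vfork used the way it is meant to be
used, outside of any auditing: the child touches nothing on the shared
stack and leaves right away. The function name vfork_child_status and the
exit code 7 are arbitrary choices for this sketch.

```c
#define _DEFAULT_SOURCE         /* makes glibc declare vfork in strict modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the child's exit status, or -1 on error. */
int vfork_child_status(void)
{
        int status;
        pid_t pid = vfork();

        if (pid < 0)
                return -1;

        if (pid == 0)
                _exit(7);       /* child: only _exit() or exec*() are safe */

        /* parent: resumes only once the child has exited (or exec'd) */
        if (waitpid(pid, &status, 0) < 0)
                return -1;

        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With the pltexit handler active the child cannot stay this careful, since
the return from vfork itself already goes through extra code that writes
to the stack shared with the parent.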

I'm not sure this could be fixed in glibc..