Em Sat, 23 Feb 2013 06:47:39 +0100
poma <pomidorabelisima(a)gmail.com> escreveu:
On 02/23/13 02:30, Mauro Carvalho Chehab wrote:
> Em Sat, 23 Feb 2013 02:09:01 +0100
> poma <pomidorabelisima(a)gmail.com> escreveu:
>
>> On 02/23/13 01:09, Mauro Carvalho Chehab wrote:
>>> Em Fri, 22 Feb 2013 18:48:25 -0500
>>> Josh Boyer <jwboyer(a)redhat.com> escreveu:
>>>
>>>> On Sat, Feb 23, 2013 at 12:29:24AM +0100, poma wrote:
>>>>> On 02/22/13 19:15, Bruno Wolff III wrote:
>>>>>> On Fri, Feb 22, 2013 at 10:17:17 -0500,
>>>>>> Josh Boyer <jwboyer(a)redhat.com> wrote:
>>>>>>> Just a quick heads up that the 3.9 merge window kernels are
being built
>>>>>>> in rawhide now. I've tried to at least test boot
kernels on my machine
>>>>>>> before submitting them to koji, but testers beware. Merge
window
>>>>>>> kernels can be tricky.
>>>>>>
>>>>>
>>>>> Euphemism for broken? ;)
>>>>
>>>> No. A euphemism for that would be "operationally
challenged".
>>>>
>>>>>> I have git3 running on three machines and things seem fine.
I'll be
>>>>>> trying out git5 very soon.
>>>>>
>>>>> Linux version 3.9.0-0.rc0.git5.1.fc19.x86_64
>>>>> (mockbuild(a)bkernel01.phx2.fedoraproject.org) (gcc version 4.8.0
20130220
>>>>> (Red Hat 4.8.0-0.14) (GCC) ) #1 SMP Fri Feb 22 16:35:19 UTC 2013
>>>>> …
>>>>> =============================================
>>>>> [ INFO: possible recursive locking detected ]
>>>>> 3.9.0-0.rc0.git5.1.fc19.x86_64 #1 Not tainted
>>>>> ---------------------------------------------
>>>>> kworker/0:1/36 is trying to acquire lock:
>>>>> (hdl->lock){+.+...}, at: [<ffffffffa042d745>]
find_ref_lock+0x25/0x60
>>>>> [videodev]
>>>>>
>>>>> but task is already holding lock:
>>>>> (hdl->lock){+.+...}, at: [<ffffffffa0430888>]
>>>>> v4l2_ctrl_add_handler+0x78/0xf0 [videodev]
>>>>>
>>>>> other info that might help us debug this:
>>>>> Possible unsafe locking scenario:
>>>>>
>>>>> CPU0
>>>>> ----
>>>>> lock(hdl->lock);
>>>>> lock(hdl->lock);
>>>>>
>>>>> *** DEADLOCK ***
>>>>>
>>>>> May be due to missing lock nesting notation
>>>>>
>>>>> 3 locks held by kworker/0:1/36:
>>>>> #0: (events){.+.+.+}, at: [<ffffffff8108d870>]
>>>>> process_one_work+0x190/0x680
>>>>> #1: ((&wfc.work)){+.+.+.}, at: [<ffffffff8108d870>]
>>>>> process_one_work+0x190/0x680
>>>>> #2: (hdl->lock){+.+...}, at: [<ffffffffa0430888>]
>>>>> v4l2_ctrl_add_handler+0x78/0xf0 [videodev]
>>>>>
>>>>> stack backtrace:
>>>>> Pid: 36, comm: kworker/0:1 Not tainted
3.9.0-0.rc0.git5.1.fc19.x86_64 #1
>>>>> Call Trace:
>>>>> [<ffffffff810da843>] __lock_acquire+0x19e3/0x1a80
>>>>> [<ffffffff81021d69>] ? sched_clock+0x9/0x10
>>>>> [<ffffffff810acf15>] ? sched_clock_cpu+0xb5/0x100
>>>>> [<ffffffff810db092>] lock_acquire+0xa2/0x1f0
>>>>> [<ffffffffa042d745>] ? find_ref_lock+0x25/0x60 [videodev]
>>>>> [<ffffffff81711820>] mutex_lock_nested+0x80/0x3c0
>>>>> [<ffffffffa042d745>] ? find_ref_lock+0x25/0x60 [videodev]
>>>>> [<ffffffffa042d745>] ? find_ref_lock+0x25/0x60 [videodev]
>>>>> [<ffffffff810d8a7d>] ? trace_hardirqs_on_caller+0xfd/0x190
>>>>> [<ffffffffa042d745>] find_ref_lock+0x25/0x60 [videodev]
>>>>> [<ffffffffa042fce6>] handler_new_ref+0x46/0x1f0 [videodev]
>>>>> [<ffffffffa04308c7>] v4l2_ctrl_add_handler+0xb7/0xf0
[videodev]
>>>>> [<ffffffffa042a657>] v4l2_device_register_subdev+0x97/0x180
[videodev]
>>>>> [<ffffffffa04caa5e>] ivtv_gpio_init+0x13e/0x180 [ivtv]
>>>>> [<ffffffffa04c5cb4>] ivtv_probe+0x914/0x1bf0 [ivtv]
>>>>> [<ffffffff810d8904>] ? mark_held_locks+0xb4/0x130
>>>>> [<ffffffff81714636>] ? _raw_spin_unlock_irqrestore+0x36/0x70
>>>>> [<ffffffff810d8b1d>] ? trace_hardirqs_on+0xd/0x10
>>>>> [<ffffffff813890ae>] local_pci_probe+0x3e/0x70
>>>>> [<ffffffff810898a4>] work_for_cpu_fn+0x14/0x20
>>>>> [<ffffffff8108d8d9>] process_one_work+0x1f9/0x680
>>>>> [<ffffffff8108d870>] ? process_one_work+0x190/0x680
>>>>> [<ffffffff8108e0e1>] worker_thread+0x111/0x3c0
>>>>> [<ffffffff8108dfd0>] ? rescuer_thread+0x270/0x270
>>>>> [<ffffffff810938fd>] kthread+0xed/0x100
>>>>> [<ffffffff81093810>] ? insert_kthread_work+0x80/0x80
>>>>> [<ffffffff8171e16c>] ret_from_fork+0x7c/0xb0
>>>>> [<ffffffff81093810>] ? insert_kthread_work+0x80/0x80
>>>>> …
>>>>
>>>>> ------------[ cut here ]------------
>>>>> WARNING: at lib/dma-debug.c:947 check_for_stack+0xa0/0x100()
>>>>> …
>>>>> Pid: 38, comm: kworker/2:1 Not tainted
3.9.0-0.rc0.git5.1.fc19.x86_64 #1
>>>>> Call Trace:
>>>>> [<ffffffff81068670>] warn_slowpath_common+0x70/0xa0
>>>>> [<ffffffff810686ec>] warn_slowpath_fmt+0x4c/0x50
>>>>> [<ffffffff81378dd0>] check_for_stack+0xa0/0x100
>>>>> [<ffffffff81379180>] debug_dma_map_page+0x100/0x140
>>>>> [<ffffffff814e6fdb>] usb_hcd_map_urb_for_dma+0x55b/0x620
>>>>> [<ffffffff814e7335>] usb_hcd_submit_urb+0x295/0x8d0
>>>>> [<ffffffff810d625c>] ? lockdep_init_map+0x9c/0x470
>>>>> [<ffffffff814e8905>] usb_submit_urb+0x155/0x3d0
>>>>> [<ffffffff814e9274>] usb_start_wait_urb+0x74/0x190
>>>>> [<ffffffff814e9831>] usb_bulk_msg+0xc1/0x170
>>>>> [<ffffffffa058c646>] dvb_usbv2_generic_rw+0xc6/0x260
[dvb_usb_v2]
>>>>> [<ffffffffa05943be>] af9015_ctrl_msg+0x16e/0x260
[dvb_usb_af9015]
>>>>> [<ffffffff8136be2a>] ? debug_object_deactivate+0x8a/0x170
>>>>> [<ffffffffa0594bbc>] af9015_identify_state+0x3c/0x90
[dvb_usb_af9015]
>>>>> [<ffffffffa058b08d>] dvb_usbv2_init_work+0x6d/0x10e0
[dvb_usb_v2]
>>>>> [<ffffffff8108d8d9>] process_one_work+0x1f9/0x680
>>>>> [<ffffffff8108d870>] ? process_one_work+0x190/0x680
>>>>> [<ffffffff8108e0e1>] worker_thread+0x111/0x3c0
>>>>> [<ffffffff8108dfd0>] ? rescuer_thread+0x270/0x270
>>>>> [<ffffffff810938fd>] kthread+0xed/0x100
>>>>> [<ffffffff81093810>] ? insert_kthread_work+0x80/0x80
>>>>> [<ffffffff8171e16c>] ret_from_fork+0x7c/0xb0
>>>>> [<ffffffff81093810>] ? insert_kthread_work+0x80/0x80
>>>>> ---[ end trace b4a4208b0ed39626 ]---
>>>>
>>>> Mauro, any idea on these two?
>>>
>>> The first one looks to be a know issue: __initdev inside ivtv/cx18, is
delaying
>>> initialization to happen after registering the device. There's a patch
for it,
>>> pending upstream merge.
>>>
>>
>> This one is gone after after applying
>> 7d180f91eaabfbb813f1eda9e39e4445bfb826fc branch 'master' of
>>
git://linuxtv.org/media_build
>
> OK.
>
>>> The second one seems to be a bad lock. I think I saw some discussion about
>>> that at the media ML, but I can't remember the odd details. Perhaps
there's
>>> a fix for it already. It should be noticed that there's a pending pull
>>> request with about ~600 patches for 3.9 that is pending its merge.
>>> The ivtv fix should be there. I hope Linus will merge it soon.
>>>
>>
>> This one stayed;
>> …
>> ------------[ cut here ]------------
>> WARNING: at lib/dma-debug.c:947 check_for_stack+0xa7/0xf0()
>> Hardware name: M720-US3
>> ehci-pci 0000:00:04.1: DMA-API: device driver maps memory fromstack
>> [addr=ffff8801283d7be1]
>> Modules linked in: dvb_usb_af9015(OF+)… dvb_usb_v2(OF)… dvb_core(OF)
>> wm8775(OF) tda9887(OF) tda8290(OF)… tuner(OF)… cx25840(OF)… bttv(OF+)…
>> rc_core(OF) btcx_risc(OF) ivtv(OF) snd_bt87x… videobuf_dma_sg(OF)
>> videobuf_core(OF)… cx2341x(OF) tveeprom(OF) v4l2_common(OF)…
>> videodev(OF)… video i2c_algo_bit… (irrelevant omitted)
>> Pid: 38, comm: kworker/2:1 Tainted: GF O
>> 3.9.0-0.rc0.git5.1.fc18.x86_64 #1
>> Call Trace:
>> usb 4-2: pl2303 converter now attached to ttyUSB0
>> [<ffffffff81069b5f>] warn_slowpath_common+0x7f/0xc0
>> [<ffffffff81069c56>] warn_slowpath_fmt+0x46/0x50
>> [<ffffffff81386467>] check_for_stack+0xa7/0xf0
>> [<ffffffff81388228>] debug_dma_map_page+0x118/0x150
>> [<ffffffff814fa5d9>] usb_hcd_map_urb_for_dma+0x569/0x630
>> [<ffffffff811be091>] ? set_track+0x61/0x1b0
>> [<ffffffff814fa945>] usb_hcd_submit_urb+0x2a5/0x900
>> [<ffffffff810d814c>] ? lockdep_init_map+0xac/0x540
>> [<ffffffff814fc07f>] usb_submit_urb+0xff/0x3d0
>> [<ffffffff814fd042>] usb_start_wait_urb+0x82/0x1a0
>> [<ffffffff814fbdae>] ? usb_alloc_urb+0x1e/0x50
>> [<ffffffff814fd22c>] usb_bulk_msg+0xcc/0x180
>> [<ffffffffa05be7e8>] dvb_usbv2_generic_rw+0xd8/0x270 [dvb_usb_v2]
>> [<ffffffffa05cc5da>] af9015_ctrl_msg+0x1da/0x280 [dvb_usb_af9015]
>> [<ffffffff810ae7aa>] ? cpuacct_charge+0xfa/0x210
>> [<ffffffff810ae6c3>] ? cpuacct_charge+0x13/0x210
>> [<ffffffffa05ccb4c>] af9015_identify_state+0x3c/0x90 [dvb_usb_af9015]
>> [<ffffffffa05bcf18>] dvb_usbv2_init_work+0x68/0x11a0 [dvb_usb_v2]
>> [<ffffffff8108e36f>] ? process_one_work+0x19f/0x690
>> [<ffffffff8108e3db>] process_one_work+0x20b/0x690
>> [<ffffffff8108e36f>] ? process_one_work+0x19f/0x690
>> [<ffffffff8108ec10>] worker_thread+0x110/0x380
>> [<ffffffff8108eb00>] ? rescuer_thread+0x260/0x260
>> [<ffffffff8109569d>] kthread+0xed/0x100
>> [<ffffffff810955b0>] ? flush_kthread_worker+0x190/0x190
>> [<ffffffff8173882c>] ret_from_fork+0x7c/0xb0
>> [<ffffffff810955b0>] ? flush_kthread_worker+0x190/0x190
>> ---[ end trace 4f4bc1c2abe300b5 ]---
>> --
>
> Ah, you're running it on a non-x86 hardware, right? On (modern) x86 machines,
> almost all memories can do DMA, including the area used by stack. That's
> not true on other archs like arm.
>
uname -m
x86_64
Morning is wiser than evening. :)
Well, sometimes, even on x86, the stack may happen to be on a memory
area where DMA transfers aren't allowed (or maybe two drivers may be
trying to do DMA transfers at the stack at the same time).
> I suspect that Antti never tried to run this driver outside
x86.
>
> Let me look into it...
>
> static int af9015_ctrl_msg(struct dvb_usb_device *d, struct req_t *req)
> {
> #define BUF_LEN 63
> #define REQ_HDR_LEN 8 /* send header size */
> #define ACK_HDR_LEN 2 /* rece header size */
> struct af9015_state *state = d_to_priv(d);
> int ret, wlen, rlen;
> u8 buf[BUF_LEN];
> ...
> ret = dvb_usbv2_generic_rw(d, buf, wlen, buf, rlen);
>
>
>
> int dvb_usbv2_generic_rw(struct dvb_usb_device *d, u8 *wbuf, u16 wlen, u8 *rbuf,
> u16 rlen)
> {
> int ret, actual_length;
>
> ...
> ret = usb_bulk_msg(d->udev, usb_sndbulkpipe(d->udev,
> d->props->generic_bulk_ctrl_endpoint), wbuf, wlen,
> &actual_length, 2000);
>
>
> That's the problem: "buf" is at stack, and it is passed directly to
> usb_bulk_msg(). This doesn't work on -arm (and sometimes may cause
> data to be lost even on x86).
>
> In order to fix it, the driver should be doing, instead, something like:
>
> static int af9015_ctrl_msg(struct dvb_usb_device *d, struct req_t *req)
> {
> ...
> u8 *buf = kmalloc(BUF_LEN, GFP_ATOMIC);
>
> ...
>
> kfree(buf);
> ...
> }
>
> You shouldn't blame the merge windows kernel here. This never worked
> on arm, even on older kernel versions ;)
>
Given that a zillion of code is in the rush, to work at all is miracle
enough.
Agreed.
> A more permanent fix would be to allocate a temporary 80 bytes buffer
> embedded with dvb_usb_v2 structs, and change dvb_usbv2_generic_rw to
> always use this temporary buffer for the control URB transfers.
>
> Antti,
>
> Could you please tale a look in it?
>
> Thanks!
> Mauro
>
Mauro, with your patch 'af9015.c-3.9.0-0.rc0.git5.1.fc18.x86_64.diff'
now the media tree is fine
--- linux/drivers/media/usb/dvb-usb-v2/af9015.c.orig 2013-02-23
05:57:03.911801068 +0100
+++ linux/drivers/media/usb/dvb-usb-v2/af9015.c 2013-02-23 05:57:21.327733911 +0100
@@ -35,7 +35,7 @@
#define ACK_HDR_LEN 2 /* rece header size */
struct af9015_state *state = d_to_priv(d);
int ret, wlen, rlen;
- u8 buf[BUF_LEN];
+ u8 *buf = kmalloc(BUF_LEN, GFP_ATOMIC);
u8 write = 1;
buf[0] = req->cmd;
@@ -114,6 +114,8 @@
/* read request, copy returned data to return buf */
if (!write)
memcpy(req->data, &buf[ACK_HDR_LEN], req->data_len);
+
+ kfree(buf);
error:
return ret;
}
Yeah, getting hid of stack usage there is enough to solve such
problem ;)
, *but* network… here we go again, rock you
like a hurricane! :)
:) I can't you help much with the net stack. I barely touched on that
area of the Kernel.
------------[ cut here ]------------
WARNING: at lib/dma-debug.c:933 check_unmap+0x407/0x8a0()
Hardware name: M720-US3
skge 0000:01:09.0: DMA-API: device driver failed to check map
error[device address=0x0000000105d52522] [size=90 bytes] [mapped as single]
Modules linked in:
…uetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_conntrack_ipv4
nf_defrag_ipv6 nf_defrag_ipv4 xt_conntrack ip6table_filter nf_conntrack
ip6_tables… vhost_net tun macvtap… macvlan… nfsd auth_rpcgss nfs_acl
lockd… r8169… mii skge… sunrpc
Funny enough, it is also a DMA error...
I'm starting to suspect that maybe some change at the dma stack made
it a little more pedantic ;)
Two similar errors on different drivers is enough for me to take a deeper
look into it to solve the mystery ;)
Basically, those got generated by lib/dma-debug.c.
On the af9015 case, this is a simple test, done at debug_dma_map_page().
It checks if the area is in stack and print an error. I suspect that, on
x86, the driver will still run, on most cases.
In the case of the network driver, it detected that
debug_dma_mapping_error() was never called. Also, I believe that the
network driver will still work, and this is more a warning that bad
things might happen with the driver, if the check is not done.
This error indicator seems to be cleaned by this code, at
arch/x86/include/asm/dma-mapping.h:
static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
{
struct dma_map_ops *ops = get_dma_ops(dev);
debug_dma_mapping_error(dev, dma_addr);
if (ops->mapping_error)
return ops->mapping_error(dev, dma_addr);
return (dma_addr == DMA_ERROR_CODE);
}
From Documentation/DMA-API.txt:
int
dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
In some circumstances dma_map_single and dma_map_page will fail to create
a mapping. A driver can check for these errors by testing the returned
dma address with dma_mapping_error(). A non-zero return value means the mapping
could not be created and the driver should take appropriate action (e.g.
reduce current DMA mapping usage or delay and try again later).
I _suspect_ that the network driver is calling dma_map_single() or
dma_map_page() but not calling dma_map_error().
The only network driver there on the list of drivers on your error is r8169.
Weird enough, at least on linux-next, r8169 seems ok:
mapping = dma_map_single(d, rtl8169_align(data), rx_buf_sz,
DMA_FROM_DEVICE);
if (unlikely(dma_mapping_error(d, mapping))) {
if (net_ratelimit())
netif_err(tp, drv, tp->dev, "Failed to map RX
DMA!\n");
goto err_out;
}
Perhaps it was not on r8169 driver, but on some other network driver that
does a similar logic, but without checking for errors.
$ git grep -L dma_mapping_error $(git grep -l -e dma_map_single -e dma_map_page
drivers/net/)
drivers/net/ethernet/broadcom/bcm63xx_enet.c
drivers/net/ethernet/brocade/bna/bnad.c
drivers/net/ethernet/cadence/at91_ether.c
drivers/net/ethernet/cadence/macb.c
drivers/net/ethernet/dec/tulip/de4x5.c
drivers/net/ethernet/freescale/fec.c
drivers/net/ethernet/freescale/fec_mpc52xx.c
drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c
drivers/net/ethernet/freescale/gianfar.c
drivers/net/ethernet/freescale/ucc_geth.c
drivers/net/ethernet/i825xx/lib82596.c
drivers/net/ethernet/ibm/emac/core.c
drivers/net/ethernet/intel/e1000/e1000_ethtool.c
drivers/net/ethernet/lantiq_etop.c
drivers/net/ethernet/marvell/mv643xx_eth.c
drivers/net/ethernet/marvell/pxa168_eth.c
drivers/net/ethernet/mellanox/mlx4/en_tx.c
drivers/net/ethernet/natsemi/sonic.c
drivers/net/ethernet/octeon/octeon_mgmt.c
drivers/net/ethernet/realtek/8139cp.c
drivers/net/ethernet/renesas/sh_eth.c
drivers/net/ethernet/seeq/sgiseeq.c
drivers/net/ethernet/sfc/net_driver.h
drivers/net/ethernet/sgi/meth.c
drivers/net/ethernet/smsc/smc911x.h
drivers/net/ethernet/smsc/smc91x.h
drivers/net/ethernet/stmicro/stmmac/chain_mode.c
drivers/net/ethernet/stmicro/stmmac/ring_mode.c
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
drivers/net/ethernet/sun/niu.c
drivers/net/ethernet/sun/sunbmac.c
drivers/net/ethernet/sun/sunhme.c
drivers/net/ethernet/ti/cpmac.c
drivers/net/ethernet/ti/davinci_cpdma.c
drivers/net/ethernet/toshiba/ps3_gelic_net.c
drivers/net/ethernet/tundra/tsi108_eth.c
drivers/net/ethernet/xilinx/ll_temac_main.c
drivers/net/ethernet/xilinx/xilinx_axienet_main.c
drivers/net/fddi/defxx.c
drivers/net/wireless/brcm80211/brcmsmac/dma.c
Oh, that's scary... Ok, a few of the above may be false-positives, but
someone should take a deeper look on it ;)
Regards,
Mauro
Pid: 0, comm: swapper/3 Tainted: GF O
3.9.0-0.rc0.git5.1.fc18.x86_64 #1
Call Trace:
<IRQ> [<ffffffff81069b5f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81069c56>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81387a47>] check_unmap+0x407/0x8a0
[<ffffffff810ab62f>] ? try_to_wake_up+0x1ff/0x350
[<ffffffff813880fe>] debug_dma_unmap_page+0x5e/0x70
[<ffffffffa00750ac>] ? skge_intr+0x2c/0x570 [skge]
[<ffffffffa0073525>] skge_poll+0x1b5/0x9c0 [skge]
[<ffffffff815e7672>] net_rx_action+0x172/0x380
[<ffffffff810737e0>] __do_softirq+0x100/0x400
[<ffffffff81021dd3>] ? native_sched_clock+0x13/0x80
[<ffffffff81739c7c>] call_softirq+0x1c/0x30
[<ffffffff8101c435>] do_softirq+0xa5/0xe0
[<ffffffff81073cd5>] irq_exit+0xd5/0xe0
[<ffffffff8173a5b3>] do_IRQ+0x63/0xe0
[<ffffffff8172f732>] common_interrupt+0x72/0x72
<EOI> [<ffffffff8172f150>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff8172f154>] ? _raw_spin_unlock_irq+0x34/0x50
[<ffffffff8172f150>] ? _raw_spin_unlock_irq+0x30/0x50
[<ffffffff810a3ddc>] finish_task_switch+0x7c/0x120
[<ffffffff810a3d9f>] ? finish_task_switch+0x3f/0x120
[<ffffffff8172cd02>] __schedule+0x442/0xa00
[<ffffffff8172d5f9>] schedule+0x29/0x70
[<ffffffff81023abf>] cpu_idle+0x12f/0x140
[<ffffffff8171a70e>] start_secondary+0x269/0x26b
---[ end trace d41e441560f83f6a ]---
Mapped at:
[<ffffffff813881b9>] debug_dma_map_page+0xa9/0x150
[<ffffffffa0074c83>] skge_xmit_frame+0x163/0x560 [skge]
[<ffffffff815e8330>] dev_hard_start_xmit+0x260/0x6f0
[<ffffffff8160bb0e>] sch_direct_xmit+0xfe/0x2a0
[<ffffffff815e8a07>] dev_queue_xmit+0x247/0x970
…
--
Cheers,
poma
--
Cheers,
Mauro