[etherlab-users] Debug device crashes while sending PDOs under RTAI
Frank Heckenbach
f.heckenbach at fh-soft.de
Fri Apr 15 21:46:35 CEST 2011
Hi,
after solving my problem with the e1000 driver
(http://lists.etherlab.org/pipermail/etherlab-users/2011/001190.html),
I can now communicate with the slave devices, send and receive SDOs
and use EoE. However, when I tried PDO communication under RTAI,
using the RTAI example program, the first thing I got after
inserting the module was a kernel bug (reproducible). Afterwards, as
usual in such cases, I couldn't unload the module properly, and
after some attempts, the system would lock up completely and I had
to reboot.
Apr 11 20:34:46 (none) kernel: [344979.737086] ec_rtai_sample: Starting...
Apr 11 20:34:46 (none) kernel: [344979.737102] EtherCAT: Requesting master 0...
Apr 11 20:34:46 (none) kernel: [344979.737192] EtherCAT: Successfully requested master 0.
Apr 11 20:34:46 (none) kernel: [344979.737206] ec_rtai_sample: Registering domain...
Apr 11 20:34:46 (none) kernel: [344979.737256] ec_rtai_sample: Configuring PDOs...
Apr 11 20:34:46 (none) kernel: [344979.737293] ec_rtai_sample: Registering PDO entries...
Apr 11 20:34:46 (none) kernel: [344979.737328] ec_rtai_sample: Activating master...
Apr 11 20:34:46 (none) kernel: [344979.737362] EtherCAT: Domain0: Logical address 0x00000000, 7 byte, expected working counter 3.
Apr 11 20:34:46 (none) kernel: [344979.737387] EtherCAT: Datagram domain0-0: Logical offset 0x00000000, 7 byte, type LRW.
Apr 11 20:34:46 (none) kernel: [344979.737411] EtherCAT: Stopping EoE processing.
Apr 11 20:34:46 (none) kernel: [344979.737486] EtherCAT: Master thread exited.
Apr 11 20:34:46 (none) kernel: [344979.737519] EtherCAT: Starting EtherCAT-OP thread.
Apr 11 20:34:46 (none) kernel: [344979.737559] EtherCAT: Starting EoE processing.
Apr 11 20:34:46 (none) kernel: [344979.737573] ec_rtai_sample: Starting cyclic sample thread...
Apr 11 20:34:46 (none) kernel: [344979.737593] ec_rtai_sample: RT timer started with 597/597 ticks.
Apr 11 20:34:46 (none) kernel: [344979.737609] ec_rtai_sample: Initialized.
Apr 11 20:34:46 (none) kernel: [344979.738106]
Apr 11 20:34:46 (none) kernel: [344979.738107] LXRT CHANGED MODE (TRAP), PID = 3360, VEC = 6, SIGNO = 4.
Apr 11 20:34:46 (none) kernel: [344979.738146] ------------[ cut here ]------------
Apr 11 20:34:46 (none) kernel: [344979.738161] Kernel BUG at c013a453 [verbose debug info unavailable]
Apr 11 20:34:46 (none) kernel: [344979.738178] invalid opcode: 0000 [#1]
Apr 11 20:34:46 (none) kernel: [344979.738193] Modules linked in: ec_rtai_sample(F) ec_e1000 ec_master xt_tcpudp ipt_MASQUERADE ipv6 af_packet ipt_REJECT xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables rtai_math rtai_fifos dm_mod loop rt_e1000_new rtnet snd_hda_intel rtai_rtdm snd_pcm_oss snd_mixer_oss rtai_sem rtai_lxrt rtai_hal snd_pcm snd_timer snd_page_alloc snd_hwdep serio_raw snd i2c_i801 psmouse heci i2c_core e1000e intel_agp agpgart pcspkr soundcore evdev ext3 jbd mbcache sg sd_mod usbhid hid ata_generic ata_piix r8169 pata_jmicron libata scsi_mod ehci_hcd uhci_hcd usbcore fuse
Apr 11 20:34:46 (none) kernel: [344979.738403]
Apr 11 20:34:46 (none) kernel: [344979.738415] Pid: 3360, comm: U:HARD:0:14 Tainted: PF (2.6.24-16-rtai #1)
Apr 11 20:34:46 (none) kernel: [344979.738440] EIP: 0060:[<c013a453>] EFLAGS: 00010202 CPU: 0
Apr 11 20:34:46 (none) kernel: [344979.738459] EIP is at __ipipe_restore_root+0xc/0x22
Apr 11 20:34:46 (none) kernel: [344979.738475] EAX: 00000001 EBX: c02f5b80 ECX: 00000000 EDX: c020cb1c
Apr 11 20:34:46 (none) kernel: [344979.738496] ESI: 00000000 EDI: f7f75f00 EBP: c02f5bec ESP: dfa09e90
Apr 11 20:34:46 (none) kernel: [344979.738511] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Apr 11 20:34:46 (none) kernel: [344979.738526] Process U:HARD:0:14 (pid: 3360, ti=dfa08000 task=dfbfaac0 task.ti=dfa08000)<0>
Apr 11 20:34:46 (none) kernel: [344979.738542] I-pipe domain Linux
Apr 11 20:34:46 (none) kernel: [344979.738554] Stack: c0153ee9 f7807cd0 c02dd400 f7f75f00 00000020 f7c6d020 0000004c c02f5b80
Apr 11 20:34:46 (none) kernel: [344979.738589] 0000002e 00000020 c020cb1c f7f1be08 00000000 f7f1be08 df909058 0000003c
Apr 11 20:34:46 (none) kernel: [344979.738625] 0000002e 0000003c f8eaf45d ffffffff df909084 f7fe7812 00000011 df909058
Apr 11 20:34:46 (none) kernel: [344979.738660] Call Trace:
Apr 11 20:34:46 (none) kernel: [344979.738680] [<c0153ee9>] kmem_cache_alloc+0x6e/0xa6
Apr 11 20:34:46 (none) kernel: [344979.738698] [<c020cb1c>] __alloc_skb+0x2d/0x10c
Apr 11 20:34:46 (none) kernel: [344979.738714] [<f8eaf45d>] ec_debug_send+0x31/0x17b [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738738] [<f8ea37fd>] ecdev_receive+0x48/0x5b [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738761] [<f8832887>] e1000_clean_rx_irq+0x2b8/0x4cc [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738782] [<f8832637>] e1000_clean_rx_irq+0x68/0x4cc [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738803] [<f88325cf>] e1000_clean_rx_irq+0x0/0x4cc [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738823] [<f882dde0>] e1000_intr+0xc9/0x15c [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738842] [<f8ea3635>] ec_device_poll+0x10/0x11 [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738863] [<f8eaaa14>] ecrt_master_receive+0x11/0xca [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738886] [<f8a374c2>] rt_schedule+0x3ca/0x742 [rtai_lxrt]
Apr 11 20:34:46 (none) kernel: [344979.738912] [<f890e217>] run+0x26/0xcb [ec_rtai_sample]
Apr 11 20:34:46 (none) kernel: [344979.738928] [<f8a39a4a>] kthread_fun+0x113/0x181 [rtai_lxrt]
Apr 11 20:34:46 (none) kernel: [344979.738952] [<f8a39937>] kthread_fun+0x0/0x181 [rtai_lxrt]
Apr 11 20:34:46 (none) kernel: [344979.738973] [<c0104087>] kernel_thread_helper+0x7/0x10
Apr 11 20:34:46 (none) kernel: [344979.738989] =======================
Apr 11 20:34:46 (none) kernel: [344979.739002] Code: 0b eb fe fa 0f ba 35 a4 14 2e c0 00 83 3d a8 14 2e c0 00 74 08 83 c8 ff e8 f4 fb ff ff fb c3 81 3d 24 19 2e c0 80 31 38 c0 74 04 <0f> 0b eb fe 85 c0 74 09 0f ba 2d a4 14 2e c0 00 c3 e9 b2 ff ff
Apr 11 20:34:46 (none) kernel: [344979.739129] EIP: [<c013a453>] __ipipe_restore_root+0xc/0x22 SS:ESP 0068:dfa09e90
Apr 11 20:34:46 (none) kernel: [344979.739351] ---[ end trace 97ed01d355d65d2b ]---
Apr 11 20:35:06 (none) kernel: [344999.105184] ec_rtai_sample: Stopping...
Apr 11 20:35:06 (none) kernel: [344999.105230] EtherCAT: Releasing master 0...
Apr 11 20:35:06 (none) kernel: [344999.105268] EtherCAT: Stopping EoE processing.
Apr 11 20:35:06 (none) kernel: [344999.105399] EtherCAT: Master thread exited.
Apr 11 20:35:06 (none) kernel: [344999.105460] EtherCAT: Starting EtherCAT-IDLE thread.
Apr 11 20:35:06 (none) kernel: [344999.105525] EtherCAT: Starting EoE processing.
Apr 11 20:35:06 (none) kernel: [344999.105563] EtherCAT: Released master 0.
Apr 11 20:35:06 (none) kernel: [344999.105600] ec_rtai_sample: Unloading.
Apr 11 20:35:06 (none) kernel: [344999.114247] EtherCAT DEBUG: Slave 1 is not configured.
Apr 11 20:35:06 (none) kernel: [344999.149701] EtherCAT DEBUG: Slave 0 is not configured.
Fortunately, the call trace quickly showed what went wrong: the
debug network interface (ec_debug_send) tried to allocate a socket
buffer, and that allocation hits the kernel BUG when it is made from
the cyclic RTAI task. I suppose that's what this paragraph in the
manual alludes to:
"Attention The socket buffers needed for the operation of debug
interfaces have to be allocated dynamically. Some Linux realtime
extensions do not allow this in realtime context!"
BTW, I think that's a bit of an understatement. If something is not
allowed, I'd expect to get some kind of error message rather than a
system lockup. But I guess that's just a matter of wording in the
manual, since the problem itself isn't going away.
Anyway, since I need to use the debug device to analyze SDO traffic,
and I really don't want to reboot every time I forget to shut it
down before starting PDO transfers, I implemented the following
workaround: a flag that temporarily disables the debug device and
can be set with ec_debug_disable(). I've modified rtai_sample.c to
set it around the send and receive calls in the cyclic task. Of
course, one still cannot analyze PDO packets this way, but at least
other packets can be captured without crashing.
--- ethercat-1.4.0/include/ecrt.h.orig 2008-12-29 16:27:39.000000000 +0100
+++ ethercat-1.4.0/include/ecrt.h 2011-04-12 14:16:02.000000000 +0200
@@ -897,6 +897,12 @@
ec_sdo_request_t *req /**< SDO request. */
);
+/** Temporarily disable the debug interface.
+ */
+void ec_debug_disable(
+ int disable /**< 1 to disable, 0 to re-enable. */
+ );
+
/******************************************************************************
* Bitwise read/write macros
*****************************************************************************/
--- ethercat-1.4.0/master/debug.c.orig 2008-12-29 15:10:27.000000000 +0100
+++ ethercat-1.4.0/master/debug.c 2011-04-12 14:05:34.000000000 +0200
@@ -39,6 +39,8 @@
/*****************************************************************************/
+static int ec_debug_disabled = 0;
+
// net_device functions
int ec_dbgdev_open(struct net_device *);
int ec_dbgdev_stop(struct net_device *);
@@ -120,7 +122,7 @@
{
struct sk_buff *skb;
- if (!dbg->opened) return;
+ if (!dbg->opened || ec_debug_disabled) return;
// allocate socket buffer
if (!(skb = dev_alloc_skb(size))) {
@@ -142,6 +144,17 @@
netif_rx(skb);
}
+/*****************************************************************************/
+
+/**
+ Temporarily disable the debug interface.
+*/
+
+void ec_debug_disable(int disable)
+{
+ ec_debug_disabled = disable;
+}
+
/******************************************************************************
* NET_DEVICE functions
*****************************************************************************/
@@ -203,3 +216,11 @@
}
/*****************************************************************************/
+
+/** \cond */
+
+EXPORT_SYMBOL(ec_debug_disable);
+
+/** \endcond */
+
+/*****************************************************************************/
--- ethercat-1.4.0/examples/rtai/rtai_sample.c.orig 2008-12-29 16:19:16.000000000 +0100
+++ ethercat-1.4.0/examples/rtai/rtai_sample.c 2011-04-12 14:11:20.000000000 +0200
@@ -204,8 +204,12 @@
// receive process data
rt_sem_wait(&master_sem);
+ // disable the debug interface which is not RTAI-safe
+ ec_debug_disable(1);
ecrt_master_receive(master);
ecrt_domain_process(domain1);
+ // re-enable the debug interface
+ ec_debug_disable(0);
rt_sem_signal(&master_sem);
// check process data state (optional)
@@ -230,8 +234,12 @@
EC_WRITE_U8(domain1_pd + off_dig_out, blink ? 0x06 : 0x09);
rt_sem_wait(&master_sem);
+ // disable the debug interface which is not RTAI-safe
+ ec_debug_disable(1);
ecrt_domain_queue(domain1);
ecrt_master_send(master);
+ // re-enable the debug interface
+ ec_debug_disable(0);
rt_sem_signal(&master_sem);
rt_task_wait_period();
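For anyone who wants to see the effect of the flag without loading a
kernel module, here is a minimal userspace sketch of the same guard.
It is only a model: struct dbg_model, dbg_model_send() and alloc_count
are hypothetical names made up for this illustration, malloc() stands
in for dev_alloc_skb(), and free() stands in for handing the buffer to
netif_rx(). Only the early-return logic mirrors the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the module-level flag added in master/debug.c. */
static int ec_debug_disabled = 0;

/* Hypothetical counter: how many buffers the modelled debug path allocated. */
static unsigned int alloc_count = 0;

/* Hypothetical, simplified stand-in for the debug device; only the
 * 'opened' state is modelled here. */
struct dbg_model {
    int opened;
};

/* Same shape as the function added by the patch: set or clear the flag. */
void ec_debug_disable(int disable)
{
    ec_debug_disabled = disable;
}

/* Model of the guarded send path.
 * Returns 1 if a copy of the frame was made, 0 if the frame was skipped. */
int dbg_model_send(struct dbg_model *dbg, const unsigned char *data,
                   size_t size)
{
    unsigned char *copy;

    assert(dbg != NULL && data != NULL);

    /* The guard from the patch: bail out before any allocation. */
    if (!dbg->opened || ec_debug_disabled)
        return 0;

    copy = malloc(size);        /* stands in for dev_alloc_skb() */
    if (!copy)
        return 0;
    memcpy(copy, data, size);
    alloc_count++;
    free(copy);                 /* stands in for passing the skb to netif_rx() */
    return 1;
}
```

With an opened struct dbg_model, a call to dbg_model_send() returns 1
and increments alloc_count; after ec_debug_disable(1) the same call
returns 0 and allocates nothing, and ec_debug_disable(0) restores the
original behaviour.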
Another thing: when I looked through the code for other places that
might be problematic with respect to memory allocation in the cyclic
task, I found the following one, where I think a check for
adapter->ecdev is needed. I only looked at the e1000 driver for
2.6.24 because that's what we will be using. The issue may also
exist in other versions.
--- ethercat-1.4.0/devices/e1000/e1000_main-2.6.24-ethercat.c.orig 2011-03-24 18:27:40.000000000 +0100
+++ ethercat-1.4.0/devices/e1000/e1000_main-2.6.24-ethercat.c 2011-04-12 13:13:16.000000000 +0200
@@ -3386,7 +3386,8 @@
if (!__pskb_pull_tail(skb, pull_size)) {
DPRINTK(DRV, ERR,
"__pskb_pull_tail failed.\n");
- dev_kfree_skb_any(skb);
+ if (!adapter->ecdev)
+ dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
}
len = skb->len - skb->data_len;
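The intent of that check can be modelled in userspace as follows. This
is a rough sketch under an assumption that is not spelled out above,
namely that a buffer passed in while adapter->ecdev is set does not
belong to the driver and therefore must not be released by it. The
types struct buf_model and struct adapter_model and the function
tx_error_path() are hypothetical and exist only for this illustration;
they are not part of the e1000 driver or of the master.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a transmit buffer (not the kernel's sk_buff). */
struct buf_model {
    int freed;   /* becomes 1 once the error path has released the buffer */
};

/* Hypothetical stand-in for the adapter; 'ecdev' is non-zero when the
 * interface is used as an EtherCAT device. */
struct adapter_model {
    int ecdev;
};

/* Model of the patched error path: the buffer is released only when the
 * adapter is not used as an EtherCAT device. */
void tx_error_path(const struct adapter_model *adapter,
                   struct buf_model *buf)
{
    assert(adapter != NULL && buf != NULL);

    if (!adapter->ecdev)
        buf->freed = 1;   /* stands in for dev_kfree_skb_any(skb) */
}
```

Calling tx_error_path() with ecdev set leaves the buffer untouched,
while calling it with ecdev cleared marks the buffer as freed, which
is the behaviour the one-line change above is meant to produce.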
Regards,
Frank
--
Dipl.-Math. Frank Heckenbach <f.heckenbach at fh-soft.de>
Systemprogrammierung, EDV-Beratung
Stubenlohstr. 6, 91052 Erlangen, Deutschland
Tel.: +49-9131-21359