[etherlab-users] Debug device crashes while sending PDOs under RTAI

Frank Heckenbach f.heckenbach at fh-soft.de
Fri Apr 15 21:46:35 CEST 2011


Hi,

after solving my problem with the e1000 driver
(http://lists.etherlab.org/pipermail/etherlab-users/2011/001190.html),
I can now communicate with the slave devices, send and receive SDOs
and use EoE. However, when I tried PDO communication under RTAI,
using the RTAI example program, I got a kernel BUG immediately after
inserting the module (reproducibly). Afterwards, as usual in such
cases, I couldn't unload the module properly, and after a few
attempts the system locked up completely and I had to reboot.

Apr 11 20:34:46 (none) kernel: [344979.737086] ec_rtai_sample: Starting...
Apr 11 20:34:46 (none) kernel: [344979.737102] EtherCAT: Requesting master 0...
Apr 11 20:34:46 (none) kernel: [344979.737192] EtherCAT: Successfully requested master 0.
Apr 11 20:34:46 (none) kernel: [344979.737206] ec_rtai_sample: Registering domain...
Apr 11 20:34:46 (none) kernel: [344979.737256] ec_rtai_sample: Configuring PDOs...
Apr 11 20:34:46 (none) kernel: [344979.737293] ec_rtai_sample: Registering PDO entries...
Apr 11 20:34:46 (none) kernel: [344979.737328] ec_rtai_sample: Activating master...
Apr 11 20:34:46 (none) kernel: [344979.737362] EtherCAT: Domain0: Logical address 0x00000000, 7 byte, expected working counter 3.
Apr 11 20:34:46 (none) kernel: [344979.737387] EtherCAT:   Datagram domain0-0: Logical offset 0x00000000, 7 byte, type LRW.
Apr 11 20:34:46 (none) kernel: [344979.737411] EtherCAT: Stopping EoE processing.
Apr 11 20:34:46 (none) kernel: [344979.737486] EtherCAT: Master thread exited.
Apr 11 20:34:46 (none) kernel: [344979.737519] EtherCAT: Starting EtherCAT-OP thread.
Apr 11 20:34:46 (none) kernel: [344979.737559] EtherCAT: Starting EoE processing.
Apr 11 20:34:46 (none) kernel: [344979.737573] ec_rtai_sample: Starting cyclic sample thread...
Apr 11 20:34:46 (none) kernel: [344979.737593] ec_rtai_sample: RT timer started with 597/597 ticks.
Apr 11 20:34:46 (none) kernel: [344979.737609] ec_rtai_sample: Initialized.
Apr 11 20:34:46 (none) kernel: [344979.738106]
Apr 11 20:34:46 (none) kernel: [344979.738107] LXRT CHANGED MODE (TRAP), PID = 3360, VEC = 6, SIGNO = 4.
Apr 11 20:34:46 (none) kernel: [344979.738146] ------------[ cut here ]------------
Apr 11 20:34:46 (none) kernel: [344979.738161] Kernel BUG at c013a453 [verbose debug info unavailable]
Apr 11 20:34:46 (none) kernel: [344979.738178] invalid opcode: 0000 [#1]
Apr 11 20:34:46 (none) kernel: [344979.738193] Modules linked in: ec_rtai_sample(F) ec_e1000 ec_master xt_tcpudp ipt_MASQUERADE ipv6 af_packet ipt_REJECT xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_filter ip_tables x_tables rtai_math rtai_fifos dm_mod loop rt_e1000_new rtnet snd_hda_intel rtai_rtdm snd_pcm_oss snd_mixer_oss rtai_sem rtai_lxrt rtai_hal snd_pcm snd_timer snd_page_alloc snd_hwdep serio_raw snd i2c_i801 psmouse heci i2c_core e1000e intel_agp agpgart pcspkr soundcore evdev ext3 jbd mbcache sg sd_mod usbhid hid ata_generic ata_piix r8169 pata_jmicron libata scsi_mod ehci_hcd uhci_hcd usbcore fuse
Apr 11 20:34:46 (none) kernel: [344979.738403]
Apr 11 20:34:46 (none) kernel: [344979.738415] Pid: 3360, comm: U:HARD:0:14 Tainted: PF       (2.6.24-16-rtai #1)
Apr 11 20:34:46 (none) kernel: [344979.738440] EIP: 0060:[<c013a453>] EFLAGS: 00010202 CPU: 0
Apr 11 20:34:46 (none) kernel: [344979.738459] EIP is at __ipipe_restore_root+0xc/0x22
Apr 11 20:34:46 (none) kernel: [344979.738475] EAX: 00000001 EBX: c02f5b80 ECX: 00000000 EDX: c020cb1c
Apr 11 20:34:46 (none) kernel: [344979.738496] ESI: 00000000 EDI: f7f75f00 EBP: c02f5bec ESP: dfa09e90
Apr 11 20:34:46 (none) kernel: [344979.738511]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Apr 11 20:34:46 (none) kernel: [344979.738526] Process U:HARD:0:14 (pid: 3360, ti=dfa08000 task=dfbfaac0 task.ti=dfa08000)<0>
Apr 11 20:34:46 (none) kernel: [344979.738542] I-pipe domain Linux
Apr 11 20:34:46 (none) kernel: [344979.738554] Stack: c0153ee9 f7807cd0 c02dd400 f7f75f00 00000020 f7c6d020 0000004c c02f5b80
Apr 11 20:34:46 (none) kernel: [344979.738589]        0000002e 00000020 c020cb1c f7f1be08 00000000 f7f1be08 df909058 0000003c
Apr 11 20:34:46 (none) kernel: [344979.738625]        0000002e 0000003c f8eaf45d ffffffff df909084 f7fe7812 00000011 df909058
Apr 11 20:34:46 (none) kernel: [344979.738660] Call Trace:
Apr 11 20:34:46 (none) kernel: [344979.738680]  [<c0153ee9>] kmem_cache_alloc+0x6e/0xa6
Apr 11 20:34:46 (none) kernel: [344979.738698]  [<c020cb1c>] __alloc_skb+0x2d/0x10c
Apr 11 20:34:46 (none) kernel: [344979.738714]  [<f8eaf45d>] ec_debug_send+0x31/0x17b [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738738]  [<f8ea37fd>] ecdev_receive+0x48/0x5b [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738761]  [<f8832887>] e1000_clean_rx_irq+0x2b8/0x4cc [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738782]  [<f8832637>] e1000_clean_rx_irq+0x68/0x4cc [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738803]  [<f88325cf>] e1000_clean_rx_irq+0x0/0x4cc [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738823]  [<f882dde0>] e1000_intr+0xc9/0x15c [ec_e1000]
Apr 11 20:34:46 (none) kernel: [344979.738842]  [<f8ea3635>] ec_device_poll+0x10/0x11 [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738863]  [<f8eaaa14>] ecrt_master_receive+0x11/0xca [ec_master]
Apr 11 20:34:46 (none) kernel: [344979.738886]  [<f8a374c2>] rt_schedule+0x3ca/0x742 [rtai_lxrt]
Apr 11 20:34:46 (none) kernel: [344979.738912]  [<f890e217>] run+0x26/0xcb [ec_rtai_sample]
Apr 11 20:34:46 (none) kernel: [344979.738928]  [<f8a39a4a>] kthread_fun+0x113/0x181 [rtai_lxrt]
Apr 11 20:34:46 (none) kernel: [344979.738952]  [<f8a39937>] kthread_fun+0x0/0x181 [rtai_lxrt]
Apr 11 20:34:46 (none) kernel: [344979.738973]  [<c0104087>] kernel_thread_helper+0x7/0x10
Apr 11 20:34:46 (none) kernel: [344979.738989]  =======================
Apr 11 20:34:46 (none) kernel: [344979.739002] Code: 0b eb fe fa 0f ba 35 a4 14 2e c0 00 83 3d a8 14 2e c0 00 74 08 83 c8 ff e8 f4 fb ff ff fb c3 81 3d 24 19 2e c0 80 31 38 c0 74 04 <0f> 0b eb fe 85 c0 74 09 0f ba 2d a4 14 2e c0 00 c3 e9 b2 ff ff
Apr 11 20:34:46 (none) kernel: [344979.739129] EIP: [<c013a453>] __ipipe_restore_root+0xc/0x22 SS:ESP 0068:dfa09e90
Apr 11 20:34:46 (none) kernel: [344979.739351] ---[ end trace 97ed01d355d65d2b ]---
Apr 11 20:35:06 (none) kernel: [344999.105184] ec_rtai_sample: Stopping...
Apr 11 20:35:06 (none) kernel: [344999.105230] EtherCAT: Releasing master 0...
Apr 11 20:35:06 (none) kernel: [344999.105268] EtherCAT: Stopping EoE processing.
Apr 11 20:35:06 (none) kernel: [344999.105399] EtherCAT: Master thread exited.
Apr 11 20:35:06 (none) kernel: [344999.105460] EtherCAT: Starting EtherCAT-IDLE thread.
Apr 11 20:35:06 (none) kernel: [344999.105525] EtherCAT: Starting EoE processing.
Apr 11 20:35:06 (none) kernel: [344999.105563] EtherCAT: Released master 0.
Apr 11 20:35:06 (none) kernel: [344999.105600] ec_rtai_sample: Unloading.
Apr 11 20:35:06 (none) kernel: [344999.114247] EtherCAT DEBUG: Slave 1 is not configured.
Apr 11 20:35:06 (none) kernel: [344999.149701] EtherCAT DEBUG: Slave 0 is not configured.

Fortunately, the call trace quickly showed what went wrong: the
debug network interface tried to allocate a socket buffer
(ec_debug_send() calling __alloc_skb()), and that allocation crashes
when it is called from the cyclic RTAI task. I suppose that's what
this paragraph in the manual alludes to:

  "Attention  The socket buffers needed for the operation of debug
  interfaces have to be allocated dynamically. Some Linux realtime
  extensions do not allow this in realtime context!"

BTW, I think that's a bit of an understatement. If something is not
allowed, I'd expect some kind of error message rather than a system
lockup. But I guess that's just a matter of wording in the manual,
since the problem itself isn't going away.

Anyway, since I need the debug device to analyze SDO traffic, and I
really don't want to reboot every time I forget to shut it down
before starting PDO transfers, I implemented the following
workaround: a flag, set via ec_debug_disable(), that temporarily
disables the debug device. I've modified rtai_sample.c to set it
around the receive and send calls in the cyclic task. Of course, PDO
packets still cannot be analyzed this way, but at least other
packets can, without crashing.

--- ethercat-1.4.0/include/ecrt.h.orig	2008-12-29 16:27:39.000000000 +0100
+++ ethercat-1.4.0/include/ecrt.h	2011-04-12 14:16:02.000000000 +0200
@@ -897,6 +897,12 @@
         ec_sdo_request_t *req /**< SDO request. */
         );
 
+/** Temporarily disable the debug interface.
+ */
+void ec_debug_disable(
+        int disable /**< 1 to disable, 0 to re-enable. */
+        );
+
 /******************************************************************************
  * Bitwise read/write macros
  *****************************************************************************/
--- ethercat-1.4.0/master/debug.c.orig	2008-12-29 15:10:27.000000000 +0100
+++ ethercat-1.4.0/master/debug.c	2011-04-12 14:05:34.000000000 +0200
@@ -39,6 +39,8 @@
 
 /*****************************************************************************/
 
+static int ec_debug_disabled = 0;
+
 // net_device functions
 int ec_dbgdev_open(struct net_device *);
 int ec_dbgdev_stop(struct net_device *);
@@ -120,7 +122,7 @@
 {
     struct sk_buff *skb;
 
-    if (!dbg->opened) return;
+    if (!dbg->opened || ec_debug_disabled) return;
 
     // allocate socket buffer
     if (!(skb = dev_alloc_skb(size))) {
@@ -142,6 +144,17 @@
     netif_rx(skb);
 }
 
+/*****************************************************************************/
+
+/**
+   Temporarily disable the debug interface.
+*/
+
+void ec_debug_disable(int disable)
+{
+    ec_debug_disabled = disable;
+}
+
 /******************************************************************************
  *  NET_DEVICE functions
  *****************************************************************************/
@@ -203,3 +216,11 @@
 }
 
 /*****************************************************************************/
+
+/** \cond */
+
+EXPORT_SYMBOL(ec_debug_disable);
+
+/** \endcond */
+
+/*****************************************************************************/
--- ethercat-1.4.0/examples/rtai/rtai_sample.c.orig	2008-12-29 16:19:16.000000000 +0100
+++ ethercat-1.4.0/examples/rtai/rtai_sample.c	2011-04-12 14:11:20.000000000 +0200
@@ -204,8 +204,12 @@
 
         // receive process data
         rt_sem_wait(&master_sem);
+        // disable the debug interface which is not RTAI-safe
+        ec_debug_disable(1);
         ecrt_master_receive(master);
         ecrt_domain_process(domain1);
+        // re-enable the debug interface
+        ec_debug_disable(0);
         rt_sem_signal(&master_sem);
 
         // check process data state (optional)
@@ -230,8 +234,12 @@
         EC_WRITE_U8(domain1_pd + off_dig_out, blink ? 0x06 : 0x09);
 
         rt_sem_wait(&master_sem);
+        // disable the debug interface which is not RTAI-safe
+        ec_debug_disable(1);
         ecrt_domain_queue(domain1);
         ecrt_master_send(master);
+        // re-enable the debug interface
+        ec_debug_disable(0);
         rt_sem_signal(&master_sem);
 		
         rt_task_wait_period();
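
To make the control flow of the patch easier to follow, here is a
small user-space model of it. It is only a sketch: debug_send(),
debug_disable() and cyclic_step() are hypothetical stand-ins for
ec_debug_send(), ec_debug_disable() and the cyclic task, malloc()
stands in for dev_alloc_skb(), and an allocation counter shows that
no buffer is allocated while the flag is set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-ins; only the control flow of the
 * patch is modelled, not the real kernel objects. */
static int debug_opened = 1;    /* models dbg->opened */
static int debug_disabled = 0;  /* models ec_debug_disabled */
static int alloc_count = 0;     /* number of buffer allocations so far */

/* Models ec_debug_disable(): 1 to disable, 0 to re-enable. */
static void debug_disable(int disable)
{
    assert(disable == 0 || disable == 1);
    debug_disabled = disable;
}

/* Models ec_debug_send(): returns before allocating anything when the
 * interface is closed or temporarily disabled. */
static void debug_send(const unsigned char *data, size_t size)
{
    unsigned char *buf;

    if (!debug_opened || debug_disabled)
        return;

    buf = malloc(size);         /* stands in for dev_alloc_skb() */
    if (!buf)
        return;
    alloc_count++;
    memcpy(buf, data, size);
    free(buf);                  /* the real code hands the buffer to netif_rx() */
}

/* Models one receive step of the cyclic task: frames passing through
 * while the flag is set are not mirrored to the debug interface. */
static void cyclic_step(const unsigned char *frame, size_t size)
{
    debug_disable(1);
    debug_send(frame, size);    /* in reality reached via ecrt_master_receive() */
    debug_disable(0);
}
```

Calling cyclic_step() leaves the counter unchanged and the flag
cleared again afterwards, while a call to debug_send() outside such
a bracketed section allocates one buffer as before.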

Another thing: when I looked through the code for other problematic
places with regard to possible memory allocation in the cyclic task,
I found the following one, where I think a check for adapter->ecdev
is needed. I only looked at the e1000 driver for 2.6.24 because
that's what we will be using; the issue may also exist in other
versions.

--- ethercat-1.4.0/devices/e1000/e1000_main-2.6.24-ethercat.c.orig	2011-03-24 18:27:40.000000000 +0100
+++ ethercat-1.4.0/devices/e1000/e1000_main-2.6.24-ethercat.c	2011-04-12 13:13:16.000000000 +0200
@@ -3386,7 +3386,8 @@
 				if (!__pskb_pull_tail(skb, pull_size)) {
 					DPRINTK(DRV, ERR,
 						"__pskb_pull_tail failed.\n");
-					dev_kfree_skb_any(skb);
+					if (!adapter->ecdev)
+						dev_kfree_skb_any(skb);
 					return NETDEV_TX_OK;
 				}
 				len = skb->len - skb->data_len;
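
The intent of this check can be modelled in a few lines of user-space
C. This is a hypothetical sketch: struct adapter, xmit_error_path()
and free_count are illustrative names, and free() stands in for
dev_kfree_skb_any(). It assumes that when adapter->ecdev is set, the
socket buffer belongs to the EtherCAT master (and the error path runs
from the cyclic task), so the driver's error path, presumably in the
transmit function given the NETDEV_TX_OK return, must leave it alone.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced model of the driver state. */
struct adapter {
    void *ecdev;   /* non-NULL when the NIC is used as an EtherCAT device */
};

static int free_count = 0;   /* how many buffers the driver freed */

/* Models the "__pskb_pull_tail failed" error path after the patch:
 * the buffer is freed only in normal (non-EtherCAT) mode; in
 * EtherCAT mode it is assumed to be owned by the master. */
static int xmit_error_path(struct adapter *adapter, void *skb)
{
    if (!adapter->ecdev) {
        free(skb);           /* stands in for dev_kfree_skb_any() */
        free_count++;
    }
    return 0;                /* stands in for NETDEV_TX_OK */
}
```

In EtherCAT mode the buffer survives the error path and remains
available to its owner; in normal mode it is freed as before.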

Regards,
Frank
 
-- 
Dipl.-Math. Frank Heckenbach <f.heckenbach at fh-soft.de>
Systemprogrammierung, EDV-Beratung
Stubenlohstr. 6, 91052 Erlangen, Deutschland
Tel.: +49-9131-21359


