[etherlab-dev] CCAT startup failures
Patrick Brünn
P.Bruenn at beckhoff.com
Mon Oct 26 09:21:50 CET 2015
Hi Scott,
To fix the startup stall, please try the appended patch.
I tested it over the weekend and found no more "BUG! Tx Ring full" in the kernel log this morning.
However, I cannot reproduce the hard lockups. Do you have any hint on how to provoke them?
My setup:
Beckhoff CX5130
Ubuntu 14.04
Kernel 4.1.7-rt8
Stable 1.5.2 Etherlab with recent CCAT v0.13 bundle
My test script:
while :
do
/etc/init.d/ethercat restart
sleep 5
done
The patch:
# HG changeset patch
# User Patrick Bruenn <p.bruenn at beckhoff.com>
# Date 1445844490 -3600
# Mon Oct 26 08:28:10 2015 +0100
# Branch stable-1.5
# Node ID 1d9be5716deec3c7e6a21dae85a8ad2ef8825b92
# Parent 0613017547feee7a2189fe71724ca5bcd1fcb61f
fix ccat startup stall. In case the dma memory was allocated above 4 GB limit, CCAT is unable to access it.
-> limit the dma_mask to 32-bit to omit this situation
diff -r 0613017547fe -r 1d9be5716dee devices/ccat/module.c
--- a/devices/ccat/module.c Tue Oct 13 12:24:16 2015 +0200
+++ b/devices/ccat/module.c Mon Oct 26 08:28:10 2015 +0100
@@ -288,9 +288,8 @@
goto cleanup_pci_device;
}
- if (!dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64))) {
- pr_debug("64 bit DMA supported, pci rev: %u\n", revision);
- } else if (!dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32))) {
+ /* CCAT is unable to access memory above 4 GB */
+ if (!dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32))) {
pr_debug("32 bit DMA supported, pci rev: %u\n", revision);
} else {
pr_warn("No suitable DMA available, pci rev: %u\n", revision);
Regards,
Patrick
From: etherlab-dev [mailto:etherlab-dev-bounces at etherlab.org] On Behalf Of Tillman, Scott
Sent: Freitag, 23. Oktober 2015 05:52
To: etherlab-dev at etherlab.org
Subject: [etherlab-dev] CCAT startup failures
I am having problems with the startup of the CCAT driver (both v12 to the new v13). When I boot up the driver is loading (ccat shows via lsmod). When I start the master it swaps over to ec_ccat, but sometimes the master never begins communicating on the bus. In these situations I can often remove and reinsert the ec_ccat module and the master will start up. Occasionally I get a hard lockup of the system (Alt-SysRq-B won't reboot) when trying to unload or reload the module, but that's possibly unrelated. I'm looking for solutions to both problems (startup stall and hard lockups).
System details:
Beckhoff CX5130
Ubuntu 14.04 (Server) with ubuntu-desktop installed
Kernel 3.16.7 with RTAI 4.1 ipipe patch
Stable 1.5.2 Etherlab with recent CCAT v0.13 bundle
Some changes that are included in the bundle I've already posted
I am accessing the master via the user space library and using the LXRT library for accessing RTAI.
At this point I'm looking for suggestions on how to try to establish the root cause of these problems. I don't have much experience debugging kernel crashes, and what experience I do have is 15 years out of date. I have no prior exposure to the linux network subsystem.
Any help would be appreciated. Below are snippets of the syslog around various interesting conditions. Let me know if I can provide any other possibly useful data.
-Scott Tillman
Advisory Software Engineer
Bell + Howell
919-767-6745
What I know so far:
I occasionally see this constantly repeating in my syslog during the stalled state:
Oct 21 10:58:03 ubuntu kernel: [ 17.184386] (unregistered net_device): BUG! Tx Ring full when queue awake!
Oct 21 10:58:03 ubuntu kernel: [ 17.237043] (unregistered net_device): BUG! Tx Ring full when queue awake!
Oct 21 10:58:03 ubuntu kernel: [ 17.289702] (unregistered net_device): BUG! Tx Ring full when queue awake!
Oct 21 10:58:03 ubuntu kernel: [ 17.342599] (unregistered net_device): BUG! Tx Ring full when queue awake!
etc. This continues until I rmmod ec_ccat.
Relevant logs on initial startup try in stalled state:
Oct 22 23:25:58 ubuntu dhclient: DHCPDISCOVER on CX5130_x001 to 255.255.255.255 port 67 interval 3 (xid=0x730db6c3)
Oct 22 23:25:58 ubuntu kernel: [ 12.476197] EtherCAT: Master driver 1.5.2 71593adce088+
Oct 22 23:25:58 ubuntu kernel: [ 12.481558] EtherCAT 0: Registering RTDM device EtherCAT.
Oct 22 23:25:58 ubuntu kernel: [ 12.481586] EtherCAT: 1 master waiting for devices.
Oct 22 23:25:58 ubuntu avahi-daemon[623]: Interface CX5130_ebus.IPv6 no longer relevant for mDNS.
Oct 22 23:25:58 ubuntu avahi-daemon[623]: Leaving mDNS multicast group on interface CX5130_ebus.IPv6 with address fe80::201:5ff:fe1a:d56d.
Oct 22 23:25:58 ubuntu avahi-daemon[623]: Withdrawing address record for fe80::201:5ff:fe1a:d56d on CX5130_ebus.
Oct 22 23:25:58 ubuntu avahi-daemon[623]: Withdrawing workstation service for CX5130_ebus.
Oct 22 23:25:58 ubuntu NetworkManager[964]: SCPlugin-Ifupdown: devices removed (path: /sys/devices/virtual/net/CX5130_ebus, iface: CX5130_ebus)
Oct 22 23:25:58 ubuntu kernel: [ 12.603567] ec_ccat: registered ec_ccat as gpio chip with #12 GPIOs.
Oct 22 23:25:58 ubuntu kernel: [ 12.603770] EtherCAT: Accepting 00:01:05:1A:D5:6D as main device for master 0.
Oct 22 23:25:58 ubuntu kernel: [ 12.603794] EtherCAT 0: Starting EtherCAT-IDLE thread.
Oct 22 23:25:58 ubuntu kernel: [ 12.603846] (unregistered net_device): NIC Link is Up
Oct 22 23:25:58 ubuntu kernel: [ 12.603870] EtherCAT 0: Link state of ecm0 changed to UP.
Oct 22 23:25:58 ubuntu kernel: [ 12.608008] ec_ccat: registered ccat_update0.
Oct 22 23:25:58 ubuntu kernel: [ 12.608031] ec_ccat: ccat_sram_probe: 0x0016 rev: 0x0000
Oct 22 23:25:58 ubuntu kernel: [ 12.655994] EtherCAT WARNING 0: 1 datagram TIMED OUT!
Oct 22 23:25:59 ubuntu NetworkManager[964]: <warn> error monitoring device for netlink events: error processing netlink message: Object busy
Oct 22 23:25:59 ubuntu avahi-daemon[623]: Joining mDNS multicast group on interface CX5130_x001.IPv6 with address fe80::201:5ff:fe1d:7dac.
Oct 22 23:25:59 ubuntu avahi-daemon[623]: New relevant interface CX5130_x001.IPv6 for mDNS.
Oct 22 23:25:59 ubuntu avahi-daemon[623]: Registering new address record for fe80::201:5ff:fe1d:7dac on CX5130_x001.*.
Oct 22 23:25:59 ubuntu kernel: [ 13.706520] EtherCAT WARNING 0: 20 datagrams TIMED OUT!
Oct 22 23:26:00 ubuntu kernel: [ 14.748575] EtherCAT WARNING 0: 20 datagrams TIMED OUT!
Oct 22 23:26:01 ubuntu dbus[443]: [system] Activating service name='org.freedesktop.UPower' (using servicehelper)
Oct 22 23:26:01 ubuntu dbus[443]: [system] Successfully activated service 'org.freedesktop.UPower'
Oct 22 23:26:01 ubuntu dhclient: DHCPDISCOVER on CX5130_x001 to 255.255.255.255 port 67 interval 5 (xid=0x730db6c3)
Oct 22 23:26:02 ubuntu kernel: [ 15.790895] EtherCAT WARNING 0: 20 datagrams TIMED OUT!
Oct 22 23:26:02 ubuntu kernel: [ 15.893849] (unregistered net_device): BUG! Tx Ring full when queue awake!
Oct 22 23:26:02 ubuntu kernel: [ 15.946642] (unregistered net_device): BUG! Tx Ring full when queue awake!
Oct 22 23:26:02 ubuntu kernel: [ 16.000730] (unregistered net_device): BUG! Tx Ring full when queue awake!
At this time the 'ethercat master' command returns something like this:
Master0
Phase: Idle
Active: no
Slaves: 0
Ethernet devices:
Main: 00:01:05:1a:d5:6d (attached)
Link: UP
Tx frames: 63
Tx bytes: 3780
Rx frames: 0
Rx bytes: 0
Tx errors: 1294
Tx frame rate [1/s]: 0 0 0
Tx rate [KByte/s]: 0.0 0.0 0.1
Rx frame rate [1/s]: 0 0 0
Rx rate [KByte/s]: 0.0 0.0 0.0
Common:
Tx frames: 63
Tx bytes: 3780
Rx frames: 0
Rx bytes: 0
Lost frames: 63
Tx frame rate [1/s]: 0 0 0
Tx rate [KByte/s]: 0.0 0.0 0.1
Rx frame rate [1/s]: 0 4 2405
Rx rate [KByte/s]: 0.0 4.7 1455.5
Loss rate [1/s]: 0 0 0
Frame loss [%]: 0.0 100.0 100.0
Distributed clocks:
Reference clock: None
Application time: 0
2000-01-01 00:00:00.000000000
On an unsuccessful unload/reload of ec_ccat:
Oct 22 23:30:33 ubuntu kernel: [ 287.196371] (unregistered net_device): BUG! Tx Ring full when queue awake!
Oct 22 23:30:33 ubuntu kernel: [ 287.249216] (unregistered net_device): BUG! Tx Ring full when queue awake!
Oct 22 23:30:33 ubuntu kernel: [ 287.301650] EtherCAT 0: Master thread exited.
Oct 22 23:30:33 ubuntu kernel: [ 287.301675] EtherCAT 0: Releasing main device 00:01:05:1A:D5:6D.
Oct 22 23:30:40 ubuntu kernel: [ 293.382320] ec_ccat: registered ec_ccat as gpio chip with #12 GPIOs.
Oct 22 23:30:40 ubuntu kernel: [ 293.382606] EtherCAT: Accepting 00:01:05:1A:D5:6D as main device for master 0.
Oct 22 23:30:40 ubuntu kernel: [ 293.382624] EtherCAT 0: Starting EtherCAT-IDLE thread.
Oct 22 23:30:40 ubuntu kernel: [ 293.382691] (unregistered net_device): NIC Link is Up
Oct 22 23:30:40 ubuntu kernel: [ 293.382715] EtherCAT 0: Link state of ecm0 changed to UP.
Oct 22 23:30:40 ubuntu kernel: [ 293.384380] EtherCAT WARNING 0: 8 datagrams TIMED OUT!
Oct 22 23:30:40 ubuntu kernel: [ 293.386028] ec_ccat: registered ccat_update0.
Oct 22 23:30:40 ubuntu kernel: [ 293.386073] ec_ccat: ccat_sram_probe: 0x0016 rev: 0x0000
Oct 22 23:30:41 ubuntu kernel: [ 294.431521] EtherCAT WARNING 0: 20 datagrams TIMED OUT!
Oct 22 23:30:42 ubuntu kernel: [ 295.472201] EtherCAT WARNING 0: 20 datagrams TIMED OUT!
Oct 22 23:30:43 ubuntu kernel: [ 296.511475] EtherCAT WARNING 0: 20 datagrams TIMED OUT!
Oct 22 23:30:43 ubuntu kernel: [ 296.669281] (unregistered net_device): BUG! Tx Ring full when queue awake!
Oct 22 23:30:43 ubuntu kernel: [ 296.721893] (unregistered net_device): BUG! Tx Ring full when queue awake!
Oct 22 23:30:43 ubuntu kernel: [ 296.774468] (unregistered net_device): BUG! Tx Ring full when queue awake!
On a successful unload and reload of ec_ccat:
Oct 22 23:32:34 ubuntu kernel: [ 408.038227] (unregistered net_device): BUG! Tx Ring full when queue awake!
Oct 22 23:32:34 ubuntu kernel: [ 408.090745] (unregistered net_device): BUG! Tx Ring full when queue awake!
Oct 22 23:32:34 ubuntu kernel: [ 408.143279] (unregistered net_device): BUG! Tx Ring full when queue awake!
Oct 22 23:32:34 ubuntu kernel: [ 408.188016] EtherCAT 0: Master thread exited.
Oct 22 23:32:34 ubuntu kernel: [ 408.188049] EtherCAT 0: Releasing main device 00:01:05:1A:D5:6D.
Oct 22 23:32:37 ubuntu kernel: [ 411.085382] ec_ccat: registered ec_ccat as gpio chip with #12 GPIOs.
Oct 22 23:32:37 ubuntu kernel: [ 411.085704] EtherCAT: Accepting 00:01:05:1A:D5:6D as main device for master 0.
Oct 22 23:32:37 ubuntu kernel: [ 411.085726] EtherCAT 0: Starting EtherCAT-IDLE thread.
Oct 22 23:32:37 ubuntu kernel: [ 411.085788] (unregistered net_device): NIC Link is Up
Oct 22 23:32:37 ubuntu kernel: [ 411.085809] EtherCAT 0: Link state of ecm0 changed to UP.
Oct 22 23:32:37 ubuntu kernel: [ 411.086907] ec_ccat: registered ccat_update0.
Oct 22 23:32:37 ubuntu kernel: [ 411.086995] ec_ccat: ccat_sram_probe: 0x0016 rev: 0x0000
Oct 22 23:32:37 ubuntu kernel: [ 411.090003] EtherCAT WARNING 0: 2 datagrams TIMED OUT!
Oct 22 23:32:37 ubuntu kernel: [ 411.090014] EtherCAT WARNING 0: 1 datagram UNMATCHED!
Oct 22 23:32:37 ubuntu kernel: [ 411.094116] EtherCAT 0: 14 slave(s) responding on main device.
Oct 22 23:32:37 ubuntu kernel: [ 411.094128] EtherCAT 0: Slave states on main device: PREOP.
Oct 22 23:32:37 ubuntu kernel: [ 411.094226] EtherCAT 0: Scanning bus.
Oct 22 23:32:39 ubuntu kernel: [ 412.698595] EtherCAT 0: Bus scanning completed in 1604 ms.
Oct 22 23:32:39 ubuntu kernel: [ 412.698606] EtherCAT 0: Using slave 0 as DC reference clock.
Oct 22 23:32:39 ubuntu kernel: [ 412.700210] EtherCAT: Requesting master 0...
Oct 22 23:32:39 ubuntu kernel: [ 412.700226] EtherCAT: Successfully requested master 0.
Oct 22 23:32:39 ubuntu kernel: [ 412.709429] EtherCAT 0: Domain0: Logical address 0x00000000, 114 byte, expected working counter 32.
Oct 22 23:32:39 ubuntu kernel: [ 412.709440] EtherCAT 0: Datagram domain0-0-main: Logical offset 0x00000000, 114 byte, type LRW.
Oct 22 23:32:39 ubuntu kernel: [ 412.709833] EtherCAT 0: Master thread exited.
Oct 22 23:32:39 ubuntu kernel: [ 412.709858] EtherCAT 0: Starting EtherCAT-OP thread.
Oct 22 23:32:39 ubuntu kernel: [ 412.710010] EtherCAT WARNING 0: 1 datagram TIMED OUT!
Oct 22 23:32:39 ubuntu kernel: [ 412.742278] EtherCAT WARNING 0: No app_time received up to now, but master already active.
Oct 22 23:32:39 ubuntu kernel: [ 412.784239] EtherCAT 0: Domain 0: Working counter changed to 1/32.
Oct 22 23:32:39 ubuntu kernel: [ 412.895474] EtherCAT WARNING 0-2: Slave does not support changing the PDO mapping!
Oct 22 23:32:39 ubuntu kernel: [ 412.895484] EtherCAT WARNING 0-2: Currently mapped PDO entries: 0x7010:01/1 0x7010:02/1 0x7010:03/1 0x7010:04/1 0x0000:00/4 0x0000:00/8 0x7010:11/32. Entries to map: 0x7010:01/1 0x7010:02/1 0x7010:03/1 0x7010:04/1 0x0000:00/12 0x7010:11/32
Oct 22 23:32:39 ubuntu kernel: [ 413.102650] EtherCAT WARNING 0-3: Slave does not support changing the PDO mapping!
Oct 22 23:32:39 ubuntu kernel: [ 413.102660] EtherCAT WARNING 0-3: Currently mapped PDO entries: 0x6000:01/1 0x6000:02/1 0x6000:03/2 0x6000:05/2 0x6000:07/1 0x0000:00/1 0x0000:00/6 0x1800:07/1 0x1800:09/1 0x6000:11/16. Entries to map: 0x6000:01/1 0x6000:02/1 0x6000:03/2 0x6000:05/2 0x6000:07/1 0x0000:01/1 0x0000:02/6 0x1800:07/1 0x1800:09/1 0x6000:11/16
Oct 22 23:32:40 ubuntu kernel: [ 413.714330] EtherCAT WARNING 0: 6 datagrams UNMATCHED!
Oct 22 23:32:40 ubuntu kernel: [ 413.715135] EtherCAT WARNING: Datagram ffff880096230d98 (domain0-0-main) was SKIPPED 6 times.
Oct 22 23:32:40 ubuntu kernel: [ 413.787102] EtherCAT 0: Domain 0: 7 working counter changes - now 13/32.
Oct 22 23:32:40 ubuntu kernel: [ 414.508889] EtherCAT 0: Slave states on main device: OP.
Oct 22 23:32:41 ubuntu kernel: [ 414.792978] EtherCAT 0: Domain 0: 5 working counter changes - now 21/32.
I am aware of the PDO mapping warnings displayed (this system happened to have a new type of slice), but the stalls were occurring well before I added that device's support, plus the stalls are before any comms with the slaves anyway.
Beckhoff Automation GmbH & Co. KG | Managing Director: Dipl. Phys. Hans Beckhoff
Registered office: Verl, Germany | Register court: Guetersloh HRA 7075
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.etherlab.org/pipermail/etherlab-dev/attachments/20151026/11bb6545/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-ccat-startup-stall.patch
Type: application/octet-stream
Size: 1107 bytes
Desc: fix-ccat-startup-stall.patch
URL: <http://lists.etherlab.org/pipermail/etherlab-dev/attachments/20151026/11bb6545/attachment-0001.obj>
More information about the etherlab-dev
mailing list