[etherlab-users] Bus Scan

Mon Jun 8 17:19:10 CEST 2015

In my case after calling (the user lib version of) ecrt_master_get_slave() the error_flag entry of ec_slave_info_t is zero even for the slaves whose Vendor ID and Product Code are zero.  For reference here are examples of the log entries I'm seeing when one of the slaves is left in this corrupted state:

Jun  8 10:41:25 LPS-3 kernel: [323780.968384] EtherCAT DEBUG 0: Scanning slave 3 on main link.
Jun  8 10:41:25 LPS-3 kernel: [323780.973710] EtherCAT DEBUG 0: TIMED OUT datagram ffff88002eae66e8, index 61 waited 20000 us
Jun  8 10:41:25 LPS-3 kernel: [323780.975570] EtherCAT DEBUG 0: UNMATCHED datagram:
Jun  8 10:41:25 LPS-3 kernel: [323780.975603] EtherCAT DEBUG: 04 61 04 00 30 01 02 00 00 00 02 00 01 00
Jun  8 10:41:25 LPS-3 kernel: [323780.979456] EtherCAT DEBUG 0-3: Slave has the System Time register.
Jun  8 10:41:25 LPS-3 kernel: [323780.983177] EtherCAT DEBUG 0-3: Port 0 link status changed to up.
Jun  8 10:41:25 LPS-3 kernel: [323780.983181] EtherCAT DEBUG 0-3: Port 0 loop status changed to open.
Jun  8 10:41:25 LPS-3 kernel: [323780.983184] EtherCAT DEBUG 0-3: Port 0 signal status changed to yes.
Jun  8 10:41:25 LPS-3 kernel: [323780.983187] EtherCAT DEBUG 0-3: Port 1 link status changed to up.
Jun  8 10:41:25 LPS-3 kernel: [323780.983190] EtherCAT DEBUG 0-3: Port 1 loop status changed to open.
Jun  8 10:41:25 LPS-3 kernel: [323780.983193] EtherCAT DEBUG 0-3: Port 1 signal status changed to yes.
Jun  8 10:41:25 LPS-3 kernel: [323781.276172] EtherCAT DEBUG 0: TIMED OUT datagram ffff88002eae66e8, index F1 waited 20000 us
Jun  8 10:41:25 LPS-3 kernel: [323781.276500] EtherCAT DEBUG 0: UNMATCHED datagram:
Jun  8 10:41:25 LPS-3 kernel: [323781.276504] EtherCAT DEBUG: 05 F1 04 00 02 05 04 00 00 00 80 01 78 00 01 00
Jun  8 10:41:25 LPS-3 kernel: [323781.335062] EtherCAT DEBUG 0: TIMED OUT datagram ffff88002eae66e8, index 09 waited 12000 us
Jun  8 10:41:25 LPS-3 kernel: [323781.335960] EtherCAT DEBUG 0: UNMATCHED datagram:
Jun  8 10:41:25 LPS-3 kernel: [323781.335970] EtherCAT DEBUG: 04 09 04 00 02 05 0A 00 00 00 40 00 8E 00 00 00
Jun  8 10:41:25 LPS-3 kernel: [323781.335990] EtherCAT DEBUG: 65 6C 20 36 01 00
Jun  8 10:41:25 LPS-3 kernel: [323781.352260] EtherCAT DEBUG 0: TIMED OUT datagram ffff88002eae66e8, index 0F waited 8000 us
Jun  8 10:41:25 LPS-3 kernel: [323781.356474] EtherCAT DEBUG 0: UNMATCHED datagram:
Jun  8 10:41:25 LPS-3 kernel: [323781.356499] EtherCAT DEBUG: 05 0F 04 00 02 05 04 00 00 00 80 01 94 00 01 00
Jun  8 10:41:25 LPS-3 kernel: [323781.356589] EtherCAT ERROR 0-3: Reception of SII read datagram failed: No response.
Jun  8 10:41:25 LPS-3 kernel: [323781.356596] EtherCAT ERROR 0-3: Failed to fetch SII contents.
Jun  8 10:41:25 LPS-3 kernel: [323781.356600] EtherCAT DEBUG 0: Scanning slave 4 on main link.

From there it carries on with the rest of the bus scan, but slave 3 reports all zeros for just about everything.  It's clearly the timeouts that are leading to the invalid info, but it sounds like the master should be recording some kind of slave error and isn't.
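
As an aside for anyone reading the log above: the UNMATCHED dumps can be decoded by hand. The following is a minimal sketch, assuming each dump is one complete datagram in the standard EtherCAT layout (10-byte header, payload, 2-byte working counter) and that a dump split over two log lines has been joined first. The struct and function names are made up for illustration and are not part of the master's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of one EtherCAT datagram as it appears on the wire. */
typedef struct {
    uint8_t        cmd;      /* command type: 0x04 = FPRD, 0x05 = FPWR */
    uint8_t        index;    /* same value as "index" in the TIMED OUT lines */
    uint16_t       adp;      /* for FPRD/FPWR: configured station address */
    uint16_t       ado;      /* register offset inside the slave */
    uint16_t       data_len; /* payload length (low 11 bits of the length word) */
    uint16_t       irq;
    const uint8_t *data;     /* points into the caller's buffer, not a copy */
    uint16_t       wkc;      /* working counter */
} ecat_datagram_t;

/* Read a little-endian 16-bit value. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode one dumped datagram.
 * Returns 0 on success, -1 if the buffer size does not match the header. */
int decode_datagram(const uint8_t *buf, size_t size, ecat_datagram_t *out)
{
    assert(buf != NULL && out != NULL);

    if (size < 12)                       /* header (10) + working counter (2) */
        return -1;

    uint16_t len = le16(buf + 6) & 0x07FF;
    if (size != (size_t)len + 12)
        return -1;

    out->cmd      = buf[0];
    out->index    = buf[1];
    out->adp      = le16(buf + 2);
    out->ado      = le16(buf + 4);
    out->data_len = len;
    out->irq      = le16(buf + 8);
    out->data     = buf + 10;
    out->wkc      = le16(buf + 10 + len);
    return 0;
}
```

Applied to the first dump (04 61 04 00 30 01 ...), this gives an FPRD with index 0x61 to station address 4, register 0x0130 (AL status), two bytes of data and a working counter of 1. A nonzero working counter suggests the slave did answer and the reply simply came back after the master had already given up on it, which would fit the TIMED OUT / UNMATCHED pairing.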

I did find a patch, submitted in February 2013, that added an ecrt_master_rescan(master) function (it also seems to mention the zeroed VendorID and ProductCode fields).  I'm assuming that the fact that such a simple patch wasn't merged implies there's some problem with it, but the lack of commentary on the mailing list makes it hard to say why it wasn't accepted.

I haven't found any other patches relating to the bus scan via the mailing list.  I'm not very familiar with Mercurial, so maybe there is a way to check for patches that have been pushed but never merged?  Pointers to the bus scan time patches would be appreciated.

-Scott Tillman

From: Gavin Lambert [mailto:gavinl at compacsort.com]
Sent: Sunday, June 07, 2015 7:58 PM
To: Tillman, Scott
Cc: etherlab-users at etherlab.org
Subject: RE: Bus Scan

I've posted some patches in the past that make some additional information available to the master application.  But in general to detect errors you look at the "error_flag" from ecrt_master_get_slave() or ecrt_slave_config_state().  There are also some patches (made by others) that dramatically improve bus scan time.

It's also possible to open a separate handle and issue ioctls on it as needed - that's what the command-line client does after all.  Just bear in mind that the ioctl API is more volatile and isn't intended for application use, and that using it concurrently with realtime access may harm performance.

If your network is changing, you have two choices:

1. Drop out of operation phase, figure out the new configuration based on the auto-increment addresses of the devices that have actually shown up, and return to operation phase.

2. Assign aliases to at least the "tree points" (the first slave of every group that can appear or disappear as a whole unit), and then configure the maximum configuration based on the relative offsets from these known aliases.  You can then remain in operation phase as devices appear and disappear from the network.  (Note however that you'll also have to create different domains for each group, or have some way to tell from the data itself whether a particular group has disappeared from a larger domain.)
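
To illustrate why option 2 survives topology changes, here is a minimal sketch of alias-relative addressing. It assumes a simplified model in which an (alias, offset) pair refers to the slave `offset` places after the first slave carrying that alias in ring order. The function name and the array of per-slave aliases are hypothetical and only stand in for whatever the master does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Resolve an (alias, offset) pair to a ring position.
 * ring_aliases[i] is the alias of the slave at ring position i (0 = none).
 * Returns the ring position, or -1 if the alias is not on the bus or the
 * offset runs past the last slave. Absolute addressing (alias 0) is left
 * out of this sketch. */
int resolve_alias_offset(const uint16_t *ring_aliases, size_t count,
                         uint16_t alias, uint16_t offset)
{
    assert(ring_aliases != NULL || count == 0);

    if (alias == 0)
        return -1;

    for (size_t i = 0; i < count; i++) {
        if (ring_aliases[i] == alias) {
            size_t pos = i + offset;
            return pos < count ? (int)pos : -1;
        }
    }
    return -1;
}
```

With a ring of aliases {0, 0, 100, 0, 0, 200, 0}, the pair (200, 1) resolves to ring position 6. If the three-slave group headed by alias 100 is unplugged, the ring becomes {0, 0, 200, 0} and (200, 1) still resolves to a valid slave (now position 3), whereas a configuration keyed to auto-increment position 6 would no longer match anything.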

EtherLab doesn't have a direct way to assign aliases, but they are persistent so you can use another master to assign them and EtherLab will recognise them.  In some cases you might be able to do a command-line CoE download using EtherLab to set an alias, but this depends on slave support.

From: etherlab-users [mailto:etherlab-users-bounces at etherlab.org] On Behalf Of Tillman, Scott
Sent: Saturday, 6 June 2015 09:14
To: etherlab-users at etherlab.org
Subject: [etherlab-users] Bus Scan

I am only beginning to grok the etherlab stack, but I've got some questions about bus scanning and topology discovery.

First, is there a reason that the user library doesn't have a method to trigger a bus scan?  There is an ioctl for it (EC_IOCTL_MASTER_RESCAN), and the ethercat tool uses that to trigger a rescan, but I don't see any way to reach this from the library (struct ec_master is opaque to the library user, so I can't just call ioctl).

When the bus scan fails (under a VM this happens a lot during SII reading) there doesn't seem to be any direct indication of this.  The best indicator I can find is that the VendorId and ProductCode fields are zeroed out for a slave that failed.  Shouldn't there be a status of some kind?
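
Until the master reports scan failures directly, an application can combine the two indicators mentioned in this thread: the error_flag and the zeroed identity fields. The sketch below is one way to do that, assuming the caller has already copied the relevant fields out of ec_slave_info_t for each slave. The stand-in struct and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in holding only the ec_slave_info_t fields used here. */
typedef struct {
    uint16_t position;
    uint32_t vendor_id;
    uint32_t product_code;
    uint8_t  error_flag;
} slave_ident_t;

/* A slave is suspect if the master flagged an error, or if its identity
 * is all zeros (which is what a failed SII read leaves behind). */
int slave_scan_suspect(const slave_ident_t *s)
{
    assert(s != NULL);
    return s->error_flag != 0 ||
           (s->vendor_id == 0 && s->product_code == 0);
}

/* Store the ring positions of suspect slaves in positions[] (up to
 * capacity entries) and return how many suspect slaves were seen. */
size_t collect_suspect_slaves(const slave_ident_t *slaves, size_t count,
                              uint16_t *positions, size_t capacity)
{
    size_t found = 0;

    for (size_t i = 0; i < count; i++) {
        if (!slave_scan_suspect(&slaves[i]))
            continue;
        if (found < capacity)
            positions[found] = slaves[i].position;
        found++;
    }
    return found;
}
```

The resulting list of positions could then drive a retry (for example a full rescan followed by another check), since there is no per-slave rescan to call.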

Since the bus scan takes a significant amount of time, is it possible to request a scan of only the slave(s) that failed?  It doesn't look like it, but I thought maybe I missed something.

When the master is active I understand the very low response timeout (500us).  However, when the master is deactivated we aren't guaranteed to be in real time context, so that's a *really* small timeout.  Isn't it reasonable to have a much longer timeout value for deactivated communications?  On my development VM (using a USB to Ethernet adapter) I have to increase the timeout more than 1000x to avoid timeouts and failures in the bus scan.  More generally, is there a reason that a user shouldn't be able to set the timeouts programmatically, so I can set it via config file based on deployment platform?

I see a wait queue tied to the bus scan completion, but there doesn't appear to be a way to use it to just wait until the scan completes.  I'd like to be able to set up a thread that monitors the bus for changes and reacts, but it looks like that requires polling the scan_busy flag?  At the moment it looks like the only way to reach this wait queue is via ec_master_enter_operation_phase.
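
For reference, the polling fallback would look roughly like the sketch below. It is deliberately independent of the master: the busy state and the pause between polls are supplied as callbacks, and all names here are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int  (*scan_busy_fn)(void *ctx); /* 1 = scanning, 0 = idle, <0 = error */
typedef void (*pause_fn)(void *ctx);     /* sleeps between polls; may be NULL */

/* Poll until the scan is reported idle.
 * Returns 0 when idle, 1 if max_polls was exhausted, -1 on a callback error. */
int wait_for_scan(scan_busy_fn busy, pause_fn pause, void *ctx,
                  unsigned max_polls)
{
    assert(busy != NULL);

    for (unsigned i = 0; i < max_polls; i++) {
        int state = busy(ctx);
        if (state < 0)
            return -1;
        if (state == 0)
            return 0;
        if (pause != NULL)
            pause(ctx);
    }
    return 1;
}

/* Stand-in busy source for trying the loop without hardware:
 * reports "busy" until the counter behind ctx reaches zero. */
int countdown_busy(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return 1;
    }
    return 0;
}
```

In a real application the busy callback could call ecrt_master() and return the scan_busy field of ec_master_info_t, with the pause callback sleeping for a few milliseconds. That is still polling, so it does not replace a proper blocking wait on the scan-complete queue.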

When topologies change during active operation, is there a way to use the slave port connectivity graph to indicate device configuration information?  I know what my maximal configuration will be, and I know that dynamic connect/disconnect will affect a specific port<->port edge.  That doesn't alter any configuration except devices sharing that edge.  Tying config info to auto-inc addresses seems fragile.  Has this or anything like it been discussed previously?

Thanks,
-Scott Tillman