[etherlab-dev] EoE patchs and questions

Fri Feb 16 07:47:17 CET 2018

Those sound like great changes to have.

I suspect the EoE-OP thing came from an assumption that the slave had to be
in OP to transfer EoE frames; there was previously a similar assumption
regarding the DC reference clock that was fixed in
<https://sourceforge.net/p/etherlabmaster/code/ci/559f2f9c5b08700f2e4722f498799236a2c9f78a/>
[559f2f].  I don't have any experience with EoE myself but
a quick glance through the manual for EL6614 does suggest that it will
happily do EoE in PREOP and above.  Do you think there could be any older
slaves that might need OP for that?

The register write to 0x808 as a recovery from that condition seems a bit
peculiar - most of those registers are read-only while SM1 is enabled -
though you're writing 0 to 0x80E, which should disable the SM, which then
ought to stop it working entirely, unless something reconfigures it.
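
For reference, here is a rough sketch of which registers that 8-byte write
touches.  It assumes the usual ESC SyncManager register map (SM n starting at
0x800 + 8*n, with start address, length, control, status, activate and PDI
control in that order).  The helper names are made up for illustration and
are not part of the master or the ethercat tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed ESC layout: one 8-byte register block per SyncManager. */
#define SM_REG_BASE   0x0800u
#define SM_REG_STRIDE 8u

/* Byte offsets inside one SyncManager block (illustrative names). */
enum sm_reg_offset {
    SM_PHYS_START = 0, /* 2 bytes: physical start address */
    SM_LENGTH     = 2, /* 2 bytes: length */
    SM_CONTROL    = 4, /* 1 byte */
    SM_STATUS     = 5, /* 1 byte */
    SM_ACTIVATE   = 6, /* 1 byte: writing 0 here disables the SM */
    SM_PDI_CTRL   = 7  /* 1 byte */
};

/* Address of one register of SyncManager `sm`. */
static uint16_t sm_reg_addr(unsigned sm, enum sm_reg_offset off)
{
    return (uint16_t)(SM_REG_BASE + sm * SM_REG_STRIDE + (unsigned)off);
}

/* True if a write of `len` bytes starting at `start` includes `reg`. */
static bool write_covers(uint16_t start, unsigned len, uint16_t reg)
{
    return reg >= start && (unsigned)(reg - start) < len;
}
```

With this layout, sm_reg_addr(1, SM_PHYS_START) is 0x808 and
sm_reg_addr(1, SM_ACTIVATE) is 0x80E, so a uint64 write at 0x808 spans the
whole SM1 block, including the activate byte.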

Perhaps inspecting other SM registers might be interesting?  Or see if
there's anything noticeable around that time in a Wireshark trace (if you
have some way to detect exactly when it stops)?  Does the problem still
happen with fewer patches applied?

From: Graeme Foot
Sent: Friday, 16 February 2018 19:01
To: etherlab-dev at etherlab.org
Subject: [etherlab-dev] EoE patchs and questions

Hi,

I've been setting up my system to use EoE (Ethernet over EtherCAT) with an
RTAI user space application.

I've updated my master to revision 33b922ec1871 (default branch) and applied
the gavinl (Gavin Lambert) patch set 20171108.

Linux 2.6.32.11

RTAI 3.8.1

Firstly I have a bit of a different use case for my EoE.  The current
implementation auto creates and removes the eoe interfaces as the EoE
capable slaves are configured and removed.  This means the interface is not
available until the slave is scanned, and is not available if it is removed.
The eoe interface is also temporarily destroyed on a bus rescan.  In my use
case I want to bridge the eoe interface to a real Ethernet interface.  So I
want the eoe interface to always exist whether the slave is plugged in or
not.

So the first patch does a few things:

1) adds explicit eoe_addif and eoe_delif tool functions so that you can
manually add/remove an eoe iface without the slave existing

2) no longer deletes an eoe iface if the slave disappears

3) will relink a slave to an eoe iface when it is configured

4) will let you configure eoe ifaces via the sysconfig/ethercat config file

5) will let you turn off auto creation of eoe ifaces via the
sysconfig/ethercat config file

6) no longer keeps slaves with EoE capability in OP mode when the master is
deactivated

The above is made possible by using the netif_carrier_on() and
netif_carrier_off() functions of the iface.  (The same as having a normal
network interface up, but not plugged in.)

The other thing the patch does is fix a race condition bug in the eoe iface
code.  The current implementation uses a struct list_head queue with a
semaphore to protect it between the iface tx callback and the ethercat
thread.  Sleeps are not allowed in the iface's tx callback as it is in an
interrupt context.  To fix this I have changed the queue to a ring buffer so
that it no longer needs a lock.
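
To illustrate the idea (this is not the code from the patch), here is a
minimal user-space sketch of a lock-free ring buffer with one producer and
one consumer.  It assumes C11 atomics; kernel code would use the kernel's own
barrier primitives instead.  The names tx_ring_t, tx_ring_push and
tx_ring_pop and the capacity are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_RING_SIZE 8 /* hypothetical capacity; must be a power of two */

typedef struct {
    void *slots[TX_RING_SIZE];
    atomic_size_t head; /* next slot to write; advanced only by the producer */
    atomic_size_t tail; /* next slot to read; advanced only by the consumer */
} tx_ring_t;

static void tx_ring_init(tx_ring_t *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side (the role of the tx callback).  Never blocks or sleeps;
 * returns false when the ring is full so the caller can drop or retry. */
static bool tx_ring_push(tx_ring_t *r, void *frame)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == TX_RING_SIZE)
        return false; /* full */

    r->slots[head & (TX_RING_SIZE - 1)] = frame;
    /* Publish the slot contents before the consumer can see the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (the role of the EtherCAT thread).  Returns NULL when empty. */
static void *tx_ring_pop(tx_ring_t *r)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return NULL; /* empty */

    void *frame = r->slots[tail & (TX_RING_SIZE - 1)];
    /* Release the slot only after its contents have been read. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return frame;
}
```

Because each index has exactly one writer, neither side ever has to wait for
the other, which is why no semaphore is needed.  This only holds with a
single producer and a single consumer.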

FYI, when the race condition occurred I was getting:

BUG: scheduling while atomic

Call Trace:

[<c0146aa2>] ? ktime_get_real+0x0/0x29

[<c0146987>] ? ktime_get+0x0/0x88

Florian you may be interested in this patch, especially the bug fix part.

The second patch is so that I can run the EoE pump without callbacks.  As I
am using a user space RTAI application I cannot use callbacks as they would
need to call back from a kernel context to the user space context.  Instead
I am running a thread in my application that makes calls into EtherCAT in a
similar fashion to the master's EoE thread.  I have created two functions
(ecrt_master_eoe_is_open() and ecrt_master_eoe_process()) to call without
application locks as the locks only need to be around the
ecrt_master_receive() and ecrt_master_send_ext() calls.

Now for the question.  I have been hammering my test rig pretty hard with
various communications (pings with multiple fragments multiple times a
second from both directions, SDO calls to the EoE slave without a pause
approx. 100 per second).  Every now and then (after around 10 to 30 minutes
with the above tests) the receive mailbox (SM1) of the EoE slave stops
responding (slave to master).  CoE reads to the slave also fail.  The
transmit mailbox still continues to function.  The RX SM1 status register
continually returns a zero value.  I have found that if I send the command
below the receive mailbox starts to function again (until it doesn't):

  ethercat reg_write -p3 0x808 -tuint64 0

Has anyone else come across this?  At the moment I suspect a slave
firmware bug (EL6614).  Does anyone have any other ideas?

Regards,

Graeme Foot.
