[etherlab-dev] ethercat-1.5: Various issues

Wed Jun 11 08:41:22 CEST 2014

On Friday, 6 June 2014 02:23, quoth Frank Heckenbach:
> I attach my complete set of patches, including the patches I've sent in
previous 
> mails (02-* to 08-*, slightly adjusted; 01-ethercat-1.5-header.patch was
applied 
> in your code already).

I've been experimenting with merging most of these into my local code
(ignoring some of the EoE-specific patches as I don't have any way to test
that at the moment), with the intent to re-release as 1.5.2-compatible
patches once it seems to be behaving itself.  While merging I came across a
few oddities that I was hoping you might be able to clarify:

>    (09-ethercat-1.5-mailbox-tag.patch and
>    10-ethercat-1.5-mailbox-allocate-buffer.patch contain the boring
>    parts (preparations, new data structures) that shouldn't change
>    the behaviour. The main change is in
>    11-ethercat-1.5-mailbox-buffer.patch.)

In patch 11, you have a change to master.c that checks for jiffies_poll >
jiffies_sent in addition to the original (jiffies_poll - jiffies_sent) >
timeout_jiffies.

What was the motivation for this?  It doesn't seem like it will be
wraparound-safe, and the safe version of it would be redundant with the
original code.

>    After I backported code from your repository to add locking in
>    ec_master_clear_slaves()
>    (18-ethercat-1.5-locking-fix-backport.patch)

I'm curious where this was backported from.  The current 1.5.2 code doesn't
have this.

>    I see that in newer versions (e.g. commit 53b5128e1313), you
>    apparently reverted the callback mechanism from send/receive
>    callbacks back to lock/unlock callbacks as it was in 1.4. I also
>    prefer the latter since they can be used more generally.

The specified commit does not seem to be related to callbacks, was made in
2011, and the latest 1.5.2 code still has send/receive callbacks.  So this
confuses me.

>  To avoid this, I fetch the mailbox once before using it for the
>  first time, ignoring any result, whether empty or not.
>  (26-ethercat-1.5-clear-mailbox.patch)

In this patch, you log a message saying that data was cleared if the fetch
datagram working counter != 1.  Shouldn't that be != 0 or == 1 instead?
AFAIK the working counter will be 0 if the mailbox is already empty and 1 if
it fetched and discarded the mailbox contents.

Also, this blindly clears the mailbox whenever the slave is rescanned.  It's
possible for the slave to be rescanned during operation (eg. if the number
of responding slaves on the network changes).  I'm not sure if this will
have any negative consequences for pending CoE/FoE/EoE requests (and
presumably unsolicited EoE received packets) or if these are just
abandoned/reset on rescan anyway (which might also be a problem, but at
least not a new one).  I haven't looked closely enough at the code in
question to be sure.

>  The next problem then is that some code (e.g.
>  ec_fsm_master_exec()) just assumes that the FSM has a datagram to
>  send out in every state, so it always returns 1 unless it's
>  waiting for a reply. With my previous change, this isn't the case
>  anymore, and it cannot be -- unless I'd block the FSM completely
[...]
>  (27-ethercat-1.5-coe-lock.patch)

This one was also a bit tricky.  Since the patch was made, it looks like
ec_fsm_coe_exec had already been changed to include the concept of not
sending a datagram and returning 0 -- except that most of the places that it
gets called just ignore the return value and then return 1 from the
higher-level state machine anyway, and in other places assume that not using
the datagram means that the FSM has completed.  So I'm not sure what's up
with that, although I suspect the original issue you were trying to resolve
remains.