[etherlab-dev] ethercat-1.5: Various issues

Wed Jun 11 20:42:19 CEST 2014

> On Friday, 6 June 2014 02:23, quoth Frank Heckenbach:
> > I attach my complete set of patches, including the patches I've sent in
> previous 
> > mails (02-* to 08-*, slightly adjusted; 01-ethercat-1.5-header.patch was
> applied 
> > in your code already).
> 
> I've been experimenting with merging most of these into my local code
> (ignoring some of the EoE-specific patches as I don't have any way to test
> that at the moment), with the intent to re-release as 1.5.2-compatible
> patches once it seems to be behaving itself.

Perhaps you can at least make sure they compile with the new
version. Ultimately, Florian will have to integrate the patches into
the development sources, or not.

> While merging I came across a
> few oddities that I was hoping you might be able to clarify:
> 
> >    (09-ethercat-1.5-mailbox-tag.patch and
> >    10-ethercat-1.5-mailbox-allocate-buffer.patch contain the boring
> >    parts (preparations, new data structures) that shouldn't change
> >    the behaviour. The main change is in
> >    11-ethercat-1.5-mailbox-buffer.patch.)
> 
> In patch 11, you have a change to master.c that checks for jiffies_poll >
> jiffies_sent in addition to the original (jiffies_poll - jiffies_sent) >
> timeout_jiffies.
>
> What was the motivation for this?  It doesn't seem like it will be
> wraparound-safe, and the safe version of it would be redundant with the
> original code.

Yes, I think I was overly cautious here (same with cycles, just
before that). If we can assume that cycles_t is always an unsigned
type, I think we can rely on C wraparound rules and don't need those
additional checks.

> >    After I backported code from your repository to add locking in
> >    ec_master_clear_slaves()
> >    (18-ethercat-1.5-locking-fix-backport.patch)
> 
> I'm curious where this was backported from.  The current 1.5.2 code doesn't
> have this.

Indeed, it doesn't, but the development code as obtained by the
"hg clone" command as listed on
http://etherlab.org/de/ethercat/index.php does. Though it uses
master->io_mutex instead of master->io_sem now. For what I can tell
(seeing that 1.5.0 doesn't have io_mutex at all), an intermediate
version introduced the code with io_sem (which I then took) and was
later changed to io_mutex along with other changes. (This all might
not be very relevant except to explain where I got the code from;
otherwise you can treat it like it was one of my patches, I guess.)

However, looking at the ChangeLog entries from 2012-12-06 and
2013-01-10 there have been changes regarding io_sem in 1.5.2. This
might be a conflict area with my patches, so you might have to check
these whole patches (#18, #19, #20) carefully when applying.

My goals, as I tried to make clear, were:

1. To have a single semaphore responsible for locking datagram_queue
   in any situation. (Whether this queue is called io_sem or io_mutex
   is irrelevant, of course.)

2. To have a reliable way of locking between RTAI and normal code;
   as I wrote, semaphores won't do, that's why io_sem (or now
   perhaps io_mutex) shouldn't be used anywhere at all except in the
   default callback which my RTAI-using application would override
   in favour of a locking mechanism that cooperates with RTAI (see
   #21, #22 -- these patches apply to the example programs, but of
   course, my actual application does basically the same).

I don't know if 1. is already the case in 1.5.2, but 2. certainly
isn't (send_callback still uses an RTAI semaphore which isn't viable
as I described, because a single responsible semaphore as required
by 1. can also be used via cdev from any process which is not RT
hardened).

> >    I see that in newer versions (e.g. commit 53b5128e1313), you
> >    apparently reverted the callback mechanism from send/receive
> >    callbacks back to lock/unlock callbacks as it was in 1.4. I also
> >    prefer the latter since they can be used more generally.
> 
> The specified commit does not seem to be related to callbacks, was made in
> 2011, and the latest 1.5.2 code still has send/receive callbacks.  So this
> confuses me.

I admit I'm not very proficient with hg, so I probably mixed up the
commits. I'd have to read up on it and dig deeper, maybe you're
faster at it. In any case, the cloned code (see above) does contain
lock_cb and unlock_cb in place of send_cb and receive_cb as in 1.5.0
and 1.5.2. So I figure, Florian made this change, but hasn't pushed
it into a release yet.

> >  To avoid this, I fetch the mailbox once before using it for the
> >  first time, ignoring any result, whether empty or not.
> >  (26-ethercat-1.5-clear-mailbox.patch)
> 
> In this patch, you log a message saying that data was cleared if the fetch
> datagram working counter != 1.  Shouldn't that be != 0 or == 1 instead?
> AFAIK the working counter will be 0 if the mailbox is already empty and 1 if
> it fetched and discarded the mailbox contents.

Yes, it should be != 0. (I didn't notice it during testing since I
could never know the actual state of the mailbox, and it was only a
log message anyway.)

> Also, this blindly clears the mailbox whenever the slave is rescanned.  It's
> possible for the slave to be rescanned during operation (eg. if the number
> of responding slaves on the network changes).  I'm not sure if this will
> have any negative consequences for pending CoE/FoE/EoE requests (and
> presumably unsolicited EoE received packets) or if these are just
> abandoned/reset on rescan anyway (which might also be a problem, but at
> least not a new one).  I haven't looked closely enough at the code in
> question to be sure.

AFAICS the scanning ultimately originates in
ec_fsm_master_state_broadcast, starting with
ec_fsm_master_enter_clear_addresses. Several lines before that,
there's a call to ec_master_clear_slaves which does what it says.
Otherwise, I'd also have to clear my buffers (#10) on a rescan; this
way, I only need to free them in ec_slave_clear as is done in the
ec_master_clear case anyway.

So I don't think continuing communication over a rescan is possible
anyway. (Consider that slaves may even change their position if
other slaves were inserted before them.)

I don't know about your application, but in my case, I only want to
run with the correct (configured) number of slaves anyway.
Therefore, in my cyclic code I always call ecrt_master_state and
check that in the result link_up is set and slaves_responding is
equal to the number I expect; otherwise I abort immediately because
something's fishy.

> >  The next problem then is that some code (e.g.
> >  ec_fsm_master_exec()) just assumes that the FSM has a datagram to
> >  send out in every state, so it always returns 1 unless it's
> >  waiting for a reply. With my previous change, this isn't the case
> >  anymore, and it cannot be -- unless I'd block the FSM completely
> [...]
> >  (27-ethercat-1.5-coe-lock.patch)
> 
> This one was also a bit tricky.  Since the patch was made, it looks like
> ec_fsm_coe_exec had already been changed to include the concept of not
> sending a datagram and returning 0 -- except that most of the places that it
> gets called just ignore the return value

ec_fsm_coe_exec could return 0 in 1.5.0 already. The return value is
only used by those callers that go on to do other things after
calling it (to return early if 0); those that call it at their end
don't (need to) check the result, and I think that's ok.

One of my changes was that ec_fsm_master_exec doesn't always return
1 after executing a state (it still does in 1.5.2). It's a
consequence of my previous changes: If CoE is reserved by the slave
FSM, the master FSM must wait. In order to do that I set
datagram->state to EC_DATAGRAM_INVALID, and prevent it from being
queued. Returning 0 from ec_fsm_master_exec is an easy way to
achieve this (for both callers) since the 0 return (the first branch
in ec_fsm_master_exec) was already there in case the FSM is still
waiting for a reply and doesn't execute a state at all (and
therefore obviously doesn't send a new datagram either).

Does this clarify things for you? If not, please ask again.

Regards,
Frank

-- 
Dipl.-Math. Frank Heckenbach <f.heckenbach at fh-soft.de>
Stubenlohstr. 6, 91052 Erlangen, Germany, +49-9131-21359
Systems Programming, Software Development, IT Consulting