[etherlab-dev] ethercat-1.5: Various issues

Sat Jun 14 00:59:24 CEST 2014

> Quoth Frank Heckenbach:
> > [2] We were basically told, e1000 is broken, don't use it, use
> >     r8169. Well, my experience was different, we tried r8169, but
> >     also had some problems with it (I didn't debug them further),
> >     whereas e1000, after fixing this bug (patch #02) worked and
> >     still works very reliably for us.
> 
> I'm using a mix of r8169 and e1000e.  There were some bugs in e1000e when I
> started, but they've been resolved now (that's one of the patches that did
> make it into mainline).
> 
> I haven't looked at your e1000 patch yet, but maybe it made it in too?  The
> 1.5.x history does show some fixes to those drivers.

One part of the patch seems to have been applied (or made
independently) in the 3.4 version (as for the other part, the context
has changed, so I can't tell quickly whether it's OK now), but not in
the 2.6.24 one. That also seems questionable to me: If they support
various kernel versions, they should keep them all patched. So if
someone uses 2.6.24, it still won't be fixed with 1.5.2. (Of course,
that may concern you too: If you want to maintain my patches, you
might consider applying it to all kernel versions.)

> > The former shouldn't stop compilation (except for some warnings), the
> > latter (used before introduction) would be a mistake on my side.
> > If they cause problems, let me know which ones, and I'll try to
> > rearrange my patches.
> 
> It's not a big deal.  The patches in question are related anyway so it's
> unlikely they'd be applied piecemeal; I just wanted to have each patch in a
> separate commit for tracking purposes.

Yes, that was also my plan and the reason for splitting them up.

> > As I mentioned, I will probably do some work on our EtherCAT application
> > this year, but this will be (probably) comparatively easy stuff with
> > (hopefully) no new bugs found and (quite certainly) not requiring a new
> > EtherCAT version. So you might not hear from me about that at all on the
> > lists, and in fact, when that's done, I'll probably unsubscribe from the
> > lists.
> 
> While you're free to do so, of course, I hope you don't.  Part of the beauty
> of open source is that even if the original developers get distracted for a
> while or even abandon something entirely (which I must point out again is
> not the case here), the users of it can still share patches and keep it
> updated.

While I share the general sentiment, I must point out again that
this was a paid job, and quite frankly, RTAI/kernel module/EtherCAT
code is not my idea of a fun project I'd do in my spare time (and
finally, I don't even have EtherCAT hardware myself; my client has
it). So unless someone hires me, I won't have any long-term
involvement with EtherCAT, sorry.

> I'm planning to set up a forked repository on SF consisting of the current
> 1.5.2 plus several of the patches I've submitted in the past, in the hopes
> that maybe it'll be easier for IgH to do an hg pull rather than applying a
> patch from a mailing list

Of course, I hope this will happen sometime, and if it does (after I
unsubscribe) feel free to let me know or ask me if questions arise.

> > >  - there's a very large number of "overriding mailbox check = 0"
> > > "buffering mailbox response" "overriding mailbox check = 1" "fetching
> > > mailbox response" sequences.  Does this just happen for every mailbox 
> > > exchange or is it significant (eg. showing an out-of-order response)?
> > 
> > Yes, that's normal. It shows that my patch is working.
> 
> It's probably a little too spammy to stay like that long-term.  It should
> log only when something unusual happens (or require level 2, maybe).  It
> also seems kinda annoying to go through the trouble of buffering the
> response in the happy-day case when there's nothing pending and it's for the
> correct protocol already (if nothing else, it's two extra memcpys and one
> extra state machine cycle, if I'm understanding it correctly), but I suppose
> that's a side-effect of trying to shoe-horn it in to the low level without
> altering the higher level state machines.

Yes, it is. Given the design of the state machines, I don't see an
easy alternative. The situation is: An FSM sends a check datagram,
the slave answers yes, and the FSM takes this to mean it should go on
and fetch, but the mailbox response may be for someone else.
Therefore, we must either change the FSMs (many places!) or delay the
"yes" answer, which requires this additional buffering. On the good
side, this does not affect PDOs (which are usually the most
time-critical data) since they don't go through the mailbox at all.
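
To make the idea concrete, here is a minimal sketch in plain C. It is
not the actual patch and does not use the master's real data
structures; all names (mbox_model, mbox_check, mbox_fetch, the
protocol tags) are made up for illustration. It only models the
principle: when the slave reports a pending response, it is moved
into a buffer belonging to its protocol, and a check returns "yes"
only to the FSM whose protocol has a buffered response.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical protocol tags, not the master's real identifiers. */
enum mbox_proto { PROTO_COE = 0, PROTO_FOE = 1, PROTO_COUNT = 2 };

#define MBOX_MAX 64

/* One buffered response per protocol (assumed layout). */
struct mbox_slot {
    bool full;
    size_t len;
    unsigned char data[MBOX_MAX];
};

struct mbox_model {
    /* What the slave currently holds in its mailbox, if anything. */
    bool hw_pending;
    enum mbox_proto hw_proto;
    size_t hw_len;
    unsigned char hw_data[MBOX_MAX];
    /* Low-level buffers, one per protocol. */
    struct mbox_slot slots[PROTO_COUNT];
};

/* Simulate the slave placing a response into its mailbox. */
static void mbox_slave_reply(struct mbox_model *m, enum mbox_proto proto,
                             const void *data, size_t len)
{
    assert(len <= MBOX_MAX);
    m->hw_pending = true;
    m->hw_proto = proto;
    m->hw_len = len;
    memcpy(m->hw_data, data, len);
}

/* Mailbox check as seen by an FSM handling protocol `want`. */
static bool mbox_check(struct mbox_model *m, enum mbox_proto want)
{
    if (m->hw_pending) {
        /* Buffer the response for its real owner, whoever asked. */
        struct mbox_slot *s = &m->slots[m->hw_proto];
        if (!s->full) {
            memcpy(s->data, m->hw_data, m->hw_len);
            s->len = m->hw_len;
            s->full = true;
            m->hw_pending = false;
        }
    }
    /* Answer "yes" only if something is buffered for this protocol. */
    return m->slots[want].full;
}

/* Fetch from the buffer; returns the length, or 0 if nothing fits. */
static size_t mbox_fetch(struct mbox_model *m, enum mbox_proto want,
                         void *out, size_t out_size)
{
    struct mbox_slot *s = &m->slots[want];
    if (!s->full || s->len > out_size)
        return 0;
    memcpy(out, s->data, s->len);
    s->full = false;
    return s->len;
}
```

In this model, an FSM that checks for the "wrong" protocol simply
keeps getting "no" and never fetches data meant for someone else. The
price is the extra copy into and out of the buffer that you mentioned.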

> I changed several of these to level 2 in my local copy. :)

Reasonable. I didn't run in debug mode very often, so it didn't
bother me, but level 2 (or even higher) seems fine.
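
Just to illustrate what gating by level amounts to, here is a small
stand-alone sketch in plain C. It does not use the master's actual
logging macros; debug_level, emitted and dbg() are hypothetical names
chosen for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical global debug level; 0 means no debug output. */
static unsigned int debug_level = 1;

/* Number of messages actually printed, for demonstration only. */
static unsigned int emitted;

/* Print the message only if the configured level is high enough.
 * Returns 1 if the message was printed, 0 if it was suppressed. */
static int dbg(unsigned int level, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    if (debug_level < level)
        return 0;

    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    emitted++;
    return 1;
}
```

With debug_level set to 1, a call such as
dbg(2, "buffering mailbox response\n") prints nothing, while the same
call prints once the level is raised to 2.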

> So, I looked into the history now and I'm even more confused.
>
> [...]
>
> So in a way we're both right -- the callbacks did get changed back to
> lock/unlock but for some reason that didn't end up in 1.5.
> 
> I don't know why the default branch has seemingly been abandoned/shelved;
> that's something only IgH can answer.

Oh well, doesn't look pretty. Anyway, I think I've made it clear why
I favour lock/unlock callbacks; now it's up to others (including IgH
if they decide to look at it) whether or not to agree.

Regards,
Frank

-- 
Dipl.-Math. Frank Heckenbach <f.heckenbach at fh-soft.de>
Stubenlohstr. 6, 91052 Erlangen, Germany, +49-9131-21359
Systems Programming, Software Development, IT Consulting