[etherlab-dev] Multiple mailbox protocols and other issues

Tue Feb 10 23:18:32 CET 2015

On 10 February 2015 20:47:
>> 4. Regarding 13_domain_lock.patch, I believe the original rationale of
the
>> master is that locking between concurrent application tasks is the
>> responsibility of the application, not the master -- that's why in
>> kernelspace it has send/receive callbacks (formerly lock/unlock
callbacks) so
>> that eg. RTAI locks can be substituted, or locking can be avoided if the
>> application has some other way to schedule things to avoid actual
concurrency
>> (or if it's only running a single task).  See the "Concurrent Master
Access"
>> section in the docs.  I don't have any personal objections to this patch
>> though.
>
> Yes, we just faced some cases where the application developers did not
> include the necessary locking, which have quite severe impact for the
> complete system. We have not used RTAI, but I am not sure I understand why
> the extra locks become a problem for RTAI?

Not a problem as such, but not necessarily sufficient to protect anything.
RTAI/Xenomai is a separate kernel, so concurrent tasks would be using RTAI
locks instead of regular kernel locks, so they would make the kernel locking
redundant.  It does trivially hurt performance to take a lock that is never
contended, but it's usually not worth worrying about that unless it's in a
tight loop.

Having said that, I don't use RTAI *or* concurrent tasks, so it doesn't
really affect me either way. :)

>> 6. Regarding 16_improved_ethercat_rescan_performance.patch, it looks like
a
>> stray temporary file was included in the patch.  Also, I'm not sure it's
safe
>> to retrieve the data only by serial number.  Serial numbers are not
>> guaranteed unique between vendors, or even between product lines -- I
think
>> at minimum you should include the vendor id and product code in the
index.
>> Also, possibly this should have a #define config guard to disable this
>> functionality in case the master will be used at a site with pathological
>> slaves (eg. multiple slaves with identical non-zero vendor/product/serial
>> triplets, since *technically* they're not guaranteed unique at all --
>> although any slave vendor who does this deserves a kick).
>
> Yes sorry, my mistake with the temporary file. I can only agree with
> you that vendor and productcode should be included in the index in order
for
> this patch to be used in large scale, I will add this. I can also agree
with
> the #define guard.

To reduce network cycles a bit, I suggest trying the alias (if nonzero)
first, as this is required to be network-unique if defined (meaning that you
wouldn't need to check vendor/product/serial); falling back to reading
serial, check if nonzero, and only then read the vendor and product codes.
(I'm not sure if the alias is already known at that point or if that
requires a network cycle to read as well, but even if the latter it means
one read instead of at least three.)

Of course, I might be a little biased since as I mentioned before I usually
configure aliases on all slaves. :)

>> I haven't had a chance to test things locally yet, but at least
everything is
>> compiling ok with these patches. :)

I've given it some minimal testing now, and I'm happy to report that in a
network with about 10 slaves (all with serial numbers) this reduces the
total rescan time from about 45 seconds to about 2, at least for subsequent
scans.  (Numbers are vague because the test conditions weren't quite
identical in each case.)

Although the SDO dictionary patch means that I'm not really testing your
mailbox patches anymore, because dictionary vs. other SDO requests were the
main cause of mailbox conflicts that I see with an unpatched master.  (I'm
not using EoE, which is the other main source of conflicts.)  That's also a
good patch, as the dictionary scan of those 10 slaves takes about two
minutes, and normally the information isn't required for standard running,
only when commissioning.  (It's a little disconcerting to see "ethercat
sdos" just sit apparently dead for a few minutes though.  Maybe it needs
some kind of progress reporting.  Although it's not as bad when limited to a
single slave, which is probably the more common use case.)