[etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

Esben Haabendal esben.haabendal at gmail.com
Fri Mar 2 10:27:03 CET 2018


Graeme Foot <Graeme.Foot at touchcut.com> writes:

>> Graeme Foot <Graeme.Foot at touchcut.com> writes:
>> 
>> > The ec_ioctl_lock_up() and ec_ioctl_lock_down_interruptible() calls 
>> > were added to protect the following functions when multiple 
>> > send/receive loops are
>> > running:
>> > - ecrt_master_send()
>> > - ecrt_master_receive()
>> > - ecrt_master_domain_process()
>> > - ecrt_master_domain_queue()
>> >
>> > In my opinion any locking on these functions should be at the 
>> > application level instead.
>> 
>> Well, I have a different opinion on that.
>> 
>> Implementation of locking is inherently difficult to get right.  You
>> both have to ensure against race conditions while avoiding deadlocks.
>> But you also do not want to block for too long.
>> 
>> While I do acknowledge that for trivial cases where there are only a
>> single application, it is not a big program for that single
>> application to maintain synchronization, I don't think that it is a
>> good solution to let each application developer in the more
>> complicated situations (like multiple independent processes) have to
>> do this without any help from etherlabmaster.  Forcing all (or at
>> least some) application developers to solve this same problem again
>> and again should not be the best we can do.
>
> These 4 functions are special.  The master should be written (and
> mostly seems to be) in a fashion that between a ecrt_master_activate()
> and ecrt_master_deactivate() the above calls do not require any locks
> to synchronize with the master code, except for the EOE thread which
> uses callbacks.

Well, mostly is the same as not in my world.  If they mostly do not
require any locks, they do require locks.

I guess some of the confusion I have is caused by the fact that it is
unclear exactly which functions are allowed to be called between
ecrt_master_activate() and ecrt_master_deactivate(), and which functions
are not.  Do we have such a list?  Or how can I create such a list
exactly?  And would it be possible to enforce this directly in the API,
so if you call a function you are not allowed to call, you get an error
back instead of introducing random breakage by modifying unsynchronized
shared data structures?

As for the EOE thread, I might be overlooking something obvious.  But
how are you supposed to use callbacks when using the library API?

As far as I read the code, if you are using EoE and not RTDM, you will
always use the standard ec_master_internal_send_cb() and
ec_master_internal_receive_cb() callbacks.  See ec_ioctl_activate().

And with them, I don't see how you are supposed to do locking at the
application level.  I will not be able to specify application level
locks that are used by both application and eoe, unless they are
implemented directly in the master (kernel).

And with ec_master_eoe_thread() running inside master, you really need
to do locking to synchronize access to master->datagram_queue (i.e. the
io_sem lock).  If not, you will have race conditions between eoe and 3
of the 4 functions:

   ecrt_master_send()
   ecrt_master_receive()
   ecrt_domain_queue()

So we really do not to do locking in at least these 3 when using EOE.

Or maybe the whole master->datagram_queue should be refactored to something
that can be synchronized safely in a lock-free way?

> And the reason it uses callbacks is because only your application
> knows if it is appropriate to allow the EOE thread to call
> ecrt_master_recieve() and ecrt_master_send_ext() etc at any particular
> time.

I see what you mean.  But I just don't see how it is possible to
accomplish this for user space applications.

> However, if your application has multiple send/receive loops then they need to
> be synchronized with each other (see next comment).  There are a few more
> functions such as the distributed clock calls that are also in the
> send/receive loops.  They also do not require ethercat level locks as they
> should be safe to call between the activate and deactivate.  They also do not
> need application level locking as they should really only be called by one
> send/receive loop per master.  All other ethercat function calls should not be
> in a hard realtime context.

Ok.  But you imply a struct rule for how to design applications doing
EtherCAT communication: "There can only be one send/receive loop per
master".

That might (is not in my case) accepted by all application developers.
I need to continue support for applications that use multiple
send/receive loops for multiple EtherCAT domains, each with different
cycle time.

One solution could be to create a single send/receive loop per master,
and then let the multiple application loops communicate with that
instead of directly with the master.  That might actually be a really
good idea.  But, if that is done, I think it should fit very well as a
general feature of the EtherCAT master.  It should be possible to
implement without any application specific code.



The final "problem" is as you say a design question.  Should etherlabmaster
support synchronization of access from multiple threads/processes, or should
this be left entirely to the application developer.  But if
master->datagram_queue is changed to something lock-free, this "problem" will
likely be solved also.

>> > However, if they are being called from multiple processes (rather than
>> > threads) then you need to use something like a named semaphore so that 
>> > all processes share the same lock.  Of course if you are using 
>> > callbacks (for EoE) you are probably doing that anyway.
>> 
>> You easily have We have multiple processes.  Even with just a single
>> application, you have EtherCAT-OP, EtherCAT-EoE and the application.
>> All using some of the same shared data structures.  Throwing more multiple
>> application just adds to that, but I think it is critical enough with just a
>> single application.
>
> Even if you have multiple processes (rather than threads), it is your design
> decision as to which process takes priority and whether a particular
> send/receive loop should wait a bit even if it could get the lock now.  You
> can only do that if your application controls the locking of these functions.

While that sounds reasonable, I believe it is not entirely true.  You
could write a layer responsible for handling this, which gets
information from the application(s) for how to deal with this.  Think of
it like a kind of scheduler.

Example from an application I have seen.  You have two applications
(processes), each driving it's own EtherCAT domain.  One
application/domain is running every 2 ms, the other application/domain
every 10 ms.  In addition, EoE slaves are also used.  For each
application, you could specify the priority, the cycle time and the
start time.  EoE should also be given a priority.  This EtherCAT
"scheduler" would then be able to decide which EtherCAT communication to
do and when.

So while I agree that it might be most obvious to implement the
decission of when to send/receive directly in the application, I think
it could also be implemented in a more generic way.

>> > The reason that they have been named differently is that they are the 
>> > functions that are called from the realtime send/receive loop and the 
>> > define allows them to be ignored.  RTAI (and Xenomi?) are NOT allowed 
>> > to call standard Linux blocking functions from a hard realtime thread 
>> > (to maintain hard realtime).
>> 
>> Yes, I get that.
>> 
>> > The EC_IOCTL_RTDM define turns off these locks for (RTAI / Xenomi) 
>> > RTDM applications to avoid this problem.
>> 
>> Yes, EC_IOCTL_RTDM turns off these locks.  But it does not really do
>> anything, as it is never defined.
>
> EC_IOCTL_RTDM gets defined for "rtdm-ioctl.c" if --enable-rtdm is used (as you
> say below).  I don't know if you noticed, but "rtdm-ioctl.c" is just a link to
> "ioctl.c".  So the rtdm version gets it, but the standard version does not.

Argh, I didn't notice that.  Thanks for pointing that out.
It makes much more sense then :)

>> > They should probably also be turned off for RTAI / Xenomi in general 
>> > and as I said above use application level locking.
>> >
>> > You can pass --enable-rtdm when compiling to enable RTDM (and compile 
>> > rtdm-ioctl.o).
>> 
>> Passing --enable-rtdm to ./configure will define EC_RTDM macro and enable
>> the automake conditional ENABLE_RTDM.
>> 
>> This will trigger Kbuild to build rtdm-ioctl.o (from rtdm-ioctl.c), and this
>> will be done with EC_IOCTL_RTDM macro defined.
>> 
>> But ioctl.c will be compiled as always, and that is without EC_IOCTL_RTDM
>> defined.  Was it supposed to be defined?  If so, it should be easy to fix,
>> but someone should definitely do some real testing of it.
>
> I think the locks should be disabled in both "rtdm-ioctl.c" and "ioctl.c" if
> using RTAI, but I use RTDM so haven't confirmed this.

I don't see how it could ever be disabled for ioctl.c with the current
code.

> Further to that I don't think they should be there at all.  Simple
> applications have one send/receive loop so don't need locks.
> Applications with multiple send/receive loops or EOE need to control
> their own locking for optimal results anyway.

Again, I don't fully agree with you on that.

But more importantly, it is not possible for user-space applications in
combination with EoE, due to inability to set application callbacks.

> Also, having these lock functions use master_sem the send/receive functions
> block unnecessarily with non-realtime ethercat function calls.  At a minimum
> they should be locking on their own semaphore.

True.  I have patches for fixing that.  The master_sem is definitely a
no-go for real-time.

/Esben


More information about the etherlab-dev mailing list