[etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

Mon Mar 5 00:02:41 CET 2018

> From: Esben Haabendal [mailto:esbenhaabendal at gmail.com] On Behalf Of Esben Haabendal
> 
> >> Graeme Foot <Graeme.Foot at touchcut.com> writes:
> >> 
> >> > The ec_ioctl_lock_up() and ec_ioctl_lock_down_interruptible() calls 
> >> > were added to protect the following functions when multiple 
> >> > send/receive loops are
> >> > running:
> >> > - ecrt_master_send()
> >> > - ecrt_master_receive()
> >> > - ecrt_master_domain_process()
> >> > - ecrt_master_domain_queue()
> >> >
> >> > In my opinion any locking on these functions should be at the 
> >> > application level instead.
> >> 
> >> Well, I have a different opinion on that.
> >> 
> >> Implementation of locking is inherently difficult to get right.  You 
> >> both have to ensure against race conditions while avoiding deadlocks.
> >> But you also do not want to block for too long.
> >> 
> >> While I do acknowledge that for trivial cases where there are only a 
> >> single application, it is not a big program for that single 
> >> application to maintain synchronization, I don't think that it is a 
> >> good solution to let each application developer in the more 
> >> complicated situations (like multiple independent processes) have to 
> >> do this without any help from etherlabmaster.  Forcing all (or at 
> >> least some) application developers to solve this same problem again 
> >> and again should not be the best we can do.
> >
> > These 4 functions are special.  The master should be written (and 
> > mostly seems to be) in a fashion that between a ecrt_master_activate() 
> > and ecrt_master_deactivate() the above calls do not require any locks 
> > to synchronize with the master code, except for the EOE thread which 
> > uses callbacks.
> 
> Well, mostly is the same as not in my world.  If they mostly do not require any locks, they do require locks.
> 

When I say mostly, I mean the vanilla Etherlab version is ok, but the patch that added ec_ioctl_lock_up() and ec_ioctl_lock_down() broke this.  I do not use this patch.

> I guess some of the confusion I have is caused by the fact that it is unclear exactly which functions are allowed to be called between
> ecrt_master_activate() and ecrt_master_deactivate(), and which functions are not.  Do we have such a list?  Or how can I create such a list exactly?  And would it be possible to enforce this directly in the API, so if you call a function you are not allowed to call, you get an error back instead of introducing random breakage by modifying unsynchronized shared data structures?
> 

The functions that I think should not require locks between ecrt_master_activate() and ecrt_master_deactivate() are:
- ecrt_master_receive()
- ecrt_domain_process()
- ecrt_domain_state()
- ecrt_master_state()
- ecrt_domain_queue()
- ecrt_master_reference_clock_time()
- ecrt_master_sync_slave_clocks()
- ecrt_master_sync_reference_clock()
- ecrt_master_64bit_reference_clock_time_queue()
- ecrt_master_64bit_reference_clock_time()
- ecrt_master_application_time()
- ecrt_master_send()
- ecrt_master_send_ext()
- ecrt_master_deactivate_slaves()

And with some of my recent patchs
- ecrt_master_eoe_is_open()
- ecrt_master_eoe_process()
- ecrt_master_rt_slave_requests()
- ecrt_master_exec_slave_requests()

There may be some others I don't use.

> As for the EOE thread, I might be overlooking something obvious.  But how are you supposed to use callbacks when using the library API?
> 
> As far as I read the code, if you are using EoE and not RTDM, you will always use the standard ec_master_internal_send_cb() and
> ec_master_internal_receive_cb() callbacks.  See ec_ioctl_activate().
> 

If you do not set EoE callbacks (with ecrt_master_callbacks()) there will be NO callbacks once the master goes active.  (It will not use ec_master_internal_send_cb() and ec_master_internal_receive_cb().  It will not start an EoE processing thread.  You will get a "No EoE processing because of missing callbacks!" message in your system log.)

If your application is in kernel space then you can specify callback functions and then use application level locking.  If your application is in user space, then you cannot specify callback functions as the kernel cannot call back into user space.  This is why I created my latest patch "0002-eoe-via-rtdm.patch".  This patch lets you create your own EoE processing thread in your own application.  This of course also lets you use application level locking.

> And with them, I don't see how you are supposed to do locking at the application level.  I will not be able to specify application level locks that are used by both application and eoe, unless they are implemented directly in the master (kernel).
> 
> And with ec_master_eoe_thread() running inside master, you really need to do locking to synchronize access to master->datagram_queue (i.e. the io_sem lock).  If not, you will have race conditions between eoe and 3 of the 4 functions:
> 

The io_sem lock is only used by the master when it is idle.  Once you activate the master it is up to your application to provide the locking.

>    ecrt_master_send()
>    ecrt_master_receive()
>    ecrt_domain_queue()
> 
> So we really do not to do locking in at least these 3 when using EOE.
> 
> Or maybe the whole master->datagram_queue should be refactored to something that can be synchronized safely in a lock-free way?
> 

The point of the callback functions is so that you can make sure your application does not call any of ecrt_master_receive(), ecrt_master_send(), ecrt_master_send_ext(), ecrt_domain_process(), ecrt_domain_queue() for a given master at the same time.  If you don't use EoE you don't need the callbacks, but you still need to protect ecrt_master_receive(), ecrt_master_send(),ecrt_domain_process(), ecrt_domain_queue().

> > And the reason it uses callbacks is because only your application 
> > knows if it is appropriate to allow the EOE thread to call
> > ecrt_master_recieve() and ecrt_master_send_ext() etc at any particular 
> > time.
> 
> I see what you mean.  But I just don't see how it is possible to accomplish this for user space applications.
> 

Exactly why I create my patch "0002-eoe-via-rtdm.patch". 

> > However, if your application has multiple send/receive loops then they 
> > need to be synchronized with each other (see next comment).  There are 
> > a few more functions such as the distributed clock calls that are also 
> > in the send/receive loops.  They also do not require ethercat level 
> > locks as they should be safe to call between the activate and 
> > deactivate.  They also do not need application level locking as they 
> > should really only be called by one send/receive loop per master.  All 
> > other ethercat function calls should not be in a hard realtime context.
> 
> Ok.  But you imply a struct rule for how to design applications doing EtherCAT communication: "There can only be one send/receive loop per master".
> 
> That might (is not in my case) accepted by all application developers.
> I need to continue support for applications that use multiple send/receive loops for multiple EtherCAT domains, each with different cycle time.
> 
> One solution could be to create a single send/receive loop per master, and then let the multiple application loops communicate with that instead of directly with the master.  That might actually be a really good idea.  But, if that is done, I think it should fit very well as a general feature of the EtherCAT master.  It should be possible to implement without any application specific code.
> 

That's not what I said.  You can have multiple send/receive loops per master, but it is up to your application to make sure they don't call any of the ecrt functions I listed at the top at the same time (per master).  If your application is a single process with multiple threads you can use an application semaphore (for example), but if it is multiple processes you will need a named semaphore (or similar) that all of the processes share.

> 
> 
> The final "problem" is as you say a design question.  Should etherlabmaster support synchronization of access from multiple threads/processes, or should this be left entirely to the application developer.  But if
> master->datagram_queue is changed to something lock-free, this "problem" 
> master->will
> likely be solved also.
> 
> >> > However, if they are being called from multiple processes (rather 
> >> > than
> >> > threads) then you need to use something like a named semaphore so 
> >> > that all processes share the same lock.  Of course if you are using 
> >> > callbacks (for EoE) you are probably doing that anyway.
> >> 
> >> You easily have We have multiple processes.  Even with just a single 
> >> application, you have EtherCAT-OP, EtherCAT-EoE and the application.
> >> All using some of the same shared data structures.  Throwing more 
> >> multiple application just adds to that, but I think it is critical 
> >> enough with just a single application.
> >
> > Even if you have multiple processes (rather than threads), it is your 
> > design decision as to which process takes priority and whether a 
> > particular send/receive loop should wait a bit even if it could get 
> > the lock now.  You can only do that if your application controls the locking of these functions.
> 
> While that sounds reasonable, I believe it is not entirely true.  You could write a layer responsible for handling this, which gets information from the application(s) for how to deal with this.  Think of it like a kind of scheduler.
> 
> Example from an application I have seen.  You have two applications (processes), each driving it's own EtherCAT domain.  One application/domain is running every 2 ms, the other application/domain every 10 ms.  In addition, EoE slaves are also used.  For each application, you could specify the priority, the cycle time and the start time.  EoE should also be given a priority.  This EtherCAT "scheduler" would then be able to decide which EtherCAT communication to do and when.
> 
> So while I agree that it might be most obvious to implement the decission of when to send/receive directly in the application, I think it could also be implemented in a more generic way.
> 

You don't need the EtherCAT master to do that scheduling.  That is already available in Linux / RTAI or whatever you use.  But once one of the above ecrt function calls is in progress you need to ensure no other process / thread will call one at the same time.

> >> > The reason that they have been named differently is that they are 
> >> > the functions that are called from the realtime send/receive loop 
> >> > and the define allows them to be ignored.  RTAI (and Xenomi?) are 
> >> > NOT allowed to call standard Linux blocking functions from a hard 
> >> > realtime thread (to maintain hard realtime).
> >> 
> >> Yes, I get that.
> >> 
> >> > The EC_IOCTL_RTDM define turns off these locks for (RTAI / Xenomi) 
> >> > RTDM applications to avoid this problem.
> >> 
> >> Yes, EC_IOCTL_RTDM turns off these locks.  But it does not really do 
> >> anything, as it is never defined.
> >
> > EC_IOCTL_RTDM gets defined for "rtdm-ioctl.c" if --enable-rtdm is used 
> > (as you say below).  I don't know if you noticed, but "rtdm-ioctl.c" 
> > is just a link to "ioctl.c".  So the rtdm version gets it, but the standard version does not.
> 
> Argh, I didn't notice that.  Thanks for pointing that out.
> It makes much more sense then :)
> 
> >> > They should probably also be turned off for RTAI / Xenomi in 
> >> > general and as I said above use application level locking.
> >> >
> >> > You can pass --enable-rtdm when compiling to enable RTDM (and 
> >> > compile rtdm-ioctl.o).
> >> 
> >> Passing --enable-rtdm to ./configure will define EC_RTDM macro and 
> >> enable the automake conditional ENABLE_RTDM.
> >> 
> >> This will trigger Kbuild to build rtdm-ioctl.o (from rtdm-ioctl.c), 
> >> and this will be done with EC_IOCTL_RTDM macro defined.
> >> 
> >> But ioctl.c will be compiled as always, and that is without 
> >> EC_IOCTL_RTDM defined.  Was it supposed to be defined?  If so, it 
> >> should be easy to fix, but someone should definitely do some real testing of it.
> >
> > I think the locks should be disabled in both "rtdm-ioctl.c" and 
> > "ioctl.c" if using RTAI, but I use RTDM so haven't confirmed this.
> 
> I don't see how it could ever be disabled for ioctl.c with the current code.
> 

By not using the patch that added them.

> > Further to that I don't think they should be there at all.  Simple 
> > applications have one send/receive loop so don't need locks.
> > Applications with multiple send/receive loops or EOE need to control 
> > their own locking for optimal results anyway.
> 
> Again, I don't fully agree with you on that.
> 
> But more importantly, it is not possible for user-space applications in combination with EoE, due to inability to set application callbacks.
> 
> > Also, having these lock functions use master_sem the send/receive 
> > functions block unnecessarily with non-realtime ethercat function 
> > calls.  At a minimum they should be locking on their own semaphore.
> 
> True.  I have patches for fixing that.  The master_sem is definitely a no-go for real-time.
> 
> /Esben
>

So, I believe the following two patches are the problem:
- base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch
- base/0018-Use-call-back-functions.patch

My new EoE patch rolls back the changes from "base/0018-Use-call-back-functions.patch" and I use EC_IOCTL_RTDM so ignore the changes in "base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch".

Regards,
Graeme.