[etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

Mon May 7 06:16:49 CEST 2018

Hi,

I found a couple of bugs in my patches.  New patches attached.

Graeme.

-----Original Message-----
From: etherlab-dev [mailto:etherlab-dev-bounces at etherlab.org] On Behalf Of Graeme Foot
Sent: Wednesday, 7 March 2018 10:53 AM
To: Gavin Lambert <gavinl at compacsort.com>
Cc: etherlab-dev at etherlab.org
Subject: Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

Hi Gavin,

I have attached updated eoe patches to maintain compatibility with your patchset for non-RTAI / RTDM applications.

Patch 0001:
This will now continue with the existing behaviour of auto-create and auto-delete of eoe ports as the eoe slaves are detected and removed.  It does contain the fix for removing the lock from the transmit callback so eoe users will probably want this update even if they want to keep existing behaviour.

Patch 0002:
Still only targeted at RTAI / RTDM users, but will now maintain existing behaviour (internal locking via the internal callbacks) for Linux user space applications.

Regards,
Graeme.

-----Original Message-----
From: etherlab-dev [mailto:etherlab-dev-bounces at etherlab.org] On Behalf Of Graeme Foot
Sent: Tuesday, 6 March 2018 1:40 PM
To: Esben Haabendal <esben.haabendal at gmail.com>
Cc: etherlab-dev at etherlab.org
Subject: Re: [etherlab-dev] ec_lock_* vs. ec_ioctl_lock in master/ioctl.c

Hi,

The email's getting a little hard to read so I'll try to summarise what I'm doing here first and reply to specifics below.

The primary problem with RTAI is that you cannot use Linux locks in RTAI hard realtime calls.  So the base 0017 and 0018 patches are useless for RTAI.  Secondly RTAI / RTDM cannot use callbacks, so EoE does not work for this framework.

I put the patches in the features branch of Gavin's patchset as they should be optional and especially patch 2 (0002-eoe-via-rtdm.patch) is specifically for RTAI / RTDM for EoE.  I haven't looked too hard at making this one generic as it does break the previous patches I mentioned (base 0017 and 0018).  But it also needs to break them as the whole point of it is to use an application level lock (i.e. RTAI lock) instead (as per the design of the vanilla Etherlab master).  So no, the patch does not support "plain" user-space applications (i.e. not using RTDM) as my focus was only for RTDM applications.  Patch 2 could probably have a few more defines to make sure it only affects RTDM users and keeps 0017/0018 active for non-RTDM installs.

patch 1 (0001-eoe-addif-delif-tools.patch) only looks at changing the lifetime options of the eoe interfaces so anyone should be able to use this.

patch 2 (0002-eoe-via-rtdm.patch) is targeted to add the ability for RTDM users to provide a user space alternate to the callbacks framework.  If people prefer this method to allow "plain" Linux user space applications to supply their own locking then it could be extended.  But it does mean that the base patches "0017-Master-locks-to-avoid-corrupted-datagram-queue.patch" and "0018-Use-call-back-functions.patch" do need to be removed.

So using patch 2 is a matter of preference: Patch 2 uses the Vanilla EtherLab EtherCAT master approach of "the user application knows best" as to when a lock can be acquired, but also covers the "realtime RTAI can't use Linux locks" issue; OR don't use patch 2 and go with the patches 0017/0018 approach where the master handles this locking (which as I said is not an option for RTAI).

> From: Esben Haabendal [mailto:esbenhaabendal at gmail.com] On Behalf Of 
> Esben Haabendal
> 
> Hi Graeme
> 
> Need to get one thing straight first.
> 
> RTDM user-space is not the same as "plain" Linux user-space.
> I know that you are 100% aware that, but your replies to me seems less clear about it.
> 
> Your 0002-eoe-via-rtdm.patch does not seem usable for "plain" user-space applications, so does not apply to the applications we support.
> 
> >> > These 4 functions are special.  The master should be written (and 
> >> > mostly seems to be) in a fashion that between a
> >> > ecrt_master_activate() and ecrt_master_deactivate() the above 
> >> > calls do not require any locks to synchronize with the master 
> >> > code, except for the EOE thread which uses callbacks.
> >> 
> >> Well, mostly is the same as not in my world.  If they mostly do not 
> >> require any locks, they do require locks.
> >
> > When I say mostly, I mean the vanilla Etherlab version is ok, but 
> > the patch that added ec_ioctl_lock_up() and ec_ioctl_lock_down() 
> > broke this.  I do not use this patch.
> 
> Do you mean that the patches (and I guess you mean 
> 0017-Master-locks-to-avoid-corrupted-datagram-queue.patch and
> 0018-Use-call-back-functions.patch) introduces the need for locking?
> 
> That does not sound right to me.  As I read them (and your writing in this thread), the need for locking is there with and without these patches.  Without them, locking needs to be handled at application level.  With the patches, the master tries to handle the locking.
> 
> And that is a clearly a design change, and one which you clearly does not agree with.
> 
> But if you really mean that these patches broke etherlabmaster (and not "just" changed the design), would you please explain it a bit more detailed?
> 

Locking is required if multiple processes / threads will access these functions, regardless of which method you prefer.

Vanilla EtherLab approach is, user application knows best when to lock, leave it up to them.  But EoE thread is running in the master so allow the user application to provide some callbacks so that the EoE thread can obtain the lock in the user application and call the appropriate send/receive function.

Gavin's patchset approach (base patches 0017/0018) is to change this to provide the locking within the master.
- Problem 1) Patch base/0017 uses master->master_sem which can be held by a non-realtime process
- Problem 2) Patch base/0017 uses master->master_sem which can't be used by RTAI, so for RTAI it is disabled anyway
- Problem 3) the callbacks are designed to allow the EoE send/receive to be synced with your application.  Patch base/0018 repurposes this so that all user space ecrt_master_send() calls use the send callback and user space ecrt_master_receive() calls use the receive callback.  The send callback is designed to call ecrt_master_send_ext() specifically for EoE processing, not ecrt_master_send().  The receive callback is designed to call ecrt_master_receive() so no real problem there.

Note: ecrt_master_send_ext() will queue any external datagrams and then call ecrt_master_send().  The drawbacks to calling ecrt_master_send_ext() rather than ecrt_master_send() from your main send/receive loops are:
1) If you have a very fast and tight send/receive loop this may add extra overhead you don't want
2) You can get extra jitter between calling ecrt_master_application_time() and the frame going onto the wire

> >> As for the EOE thread, I might be overlooking something obvious.  
> >> But how are you supposed to use callbacks when using the library API?
> >> 
> >> As far as I read the code, if you are using EoE and not RTDM, you 
> >> will always use the standard ec_master_internal_send_cb() and
> >> ec_master_internal_receive_cb() callbacks.  See ec_ioctl_activate().
> >
> > If you do not set EoE callbacks (with ecrt_master_callbacks()) there 
> > will be NO callbacks once the master goes active.  (It will not use
> > ec_master_internal_send_cb() and ec_master_internal_receive_cb().  
> > It will not start an EoE processing thread.  You will get a "No EoE 
> > processing because of missing callbacks!" message in your system
> > log.)
> 
> In ec_ioctl_activate() there is:
> 
> #ifndef EC_IOCTL_RTDM
>     ecrt_master_callbacks(master, ec_master_internal_send_cb,
>             ec_master_internal_receive_cb, master); #endif
> 
> So at least for (non RTDM) user-space applications, that activate the 
> master with ioctl, EoE callbacks are always set to
> ec_master_internal_send_cb() and ec_master_internal_receive_cb().
> 

Sorry, I misread your comment and the code.  I'm using RTDM so don't notice this.

For RTAI/Xenomi the internal callbacks are not allowed as they use master->io_sem which is a non-RTAI compatible semaphore.  The reason for patch base/0018 is to allow master->io_sem to be shared for the user space send/receive calls and the EoE thread (which will be created as the callbacks are defined).  But it will break if your application also uses a kernel process, as it will not be using the master->io_sem semaphore (not likely I know).

The above ecrt_master_callbacks() in ec_ioctl_activate() was added way back in 2012 (rev 2433) when RTDM became more fully supported.  I don't know what behaviour it had before that but that is probably irrelevant as it has been like this for so long.

> > If your application is in kernel space then you can specify callback 
> > functions and then use application level locking.  If your 
> > application is in user space, then you cannot specify callback 
> > functions as the kernel cannot call back into user space.  This is why I created my latest patch "0002-eoe-via-rtdm.patch".
> > This patch lets you create your own EoE processing thread in your 
> > own application.  This of course also lets you use application level locking.
> 
> But only for RTDM.  The functions added are guarded by
> 
> #if !defined(__KERNEL__) && defined(EC_RTDM) && (EC_EOE)
> 
> which means that they do not apply when using non-RTDM user-space.
> 

Correct.

> >> And with them, I don't see how you are supposed to do locking at 
> >> the application level.  I will not be able to specify application 
> >> level locks that are used by both application and eoe, unless they 
> >> are implemented directly in the master (kernel).
> >> 
> >> And with ec_master_eoe_thread() running inside master, you really 
> >> need to do locking to synchronize access to master->datagram_queue 
> >> (i.e. the io_sem lock).  If not, you will have race conditions 
> >> between eoe and 3 of the 4
> >> functions:
> >
> > The io_sem lock is only used by the master when it is idle.  Once 
> > you activate the master it is up to your application to provide the locking.
> 
> Ok.  But I dare to say that it is an open question if that is an design for etherlabmaster.  I believe it has both pros and cons.
> 
> Pro: Application developers can implement synchronization as they find most optimal for them.
> 
> Cons: Application developers need to figure out and implement synchronization without much (any) help from the etherlabmaster code.
> And getting synchronization is IMHO something that is often tricky to get right (at least for many developers).
> 

Generally agree, but RTAI can't use EtherCAT master locking in hard realtime, so it needs an alternate solution anyway.  And the vanilla Etherlab design is to use application level locking (except for the ecrt_master_callbacks() call in ec_ioctl_activate() which I suspect snuck through).

> >>    ecrt_master_send()
> >>    ecrt_master_receive()
> >>    ecrt_domain_queue()
> >> 
> >> So we really do not to do locking in at least these 3 when using EOE.
> >> 
> >> Or maybe the whole master->datagram_queue should be refactored to 
> >> something that can be synchronized safely in a lock-free way?
> >
> > The point of the callback functions is so that you can make sure 
> > your application does not call any of ecrt_master_receive(), 
> > ecrt_master_send(), ecrt_master_send_ext(), ecrt_domain_process(),
> > ecrt_domain_queue() for a given master at the same time.  If you 
> > don't use EoE you don't need the callbacks, but you still need to 
> > protect ecrt_master_receive(), ecrt_master_send(),ecrt_domain_process(), ecrt_domain_queue().
> 
> Which I personally see as a trivial omission in the etherlabmaster design (requiring all API users to spend time and code writing the same code for protect the API from itself).
> 

Not a trivial omission as locking can't be done in the master for RTAI for hard realtime calls, so it need an alternative.  Also I suspect most simple applications only require one send/receive loop, so don't require locking anyway.  I've only needed to start adding locks since starting to use EoE.

> >> > And the reason it uses callbacks is because only your application 
> >> > knows if it is appropriate to allow the EOE thread to call
> >> > ecrt_master_recieve() and ecrt_master_send_ext() etc at any 
> >> > particular time.
> >> 
> >> I see what you mean.  But I just don't see how it is possible to 
> >> accomplish this for user space applications.
> >> 
> >
> > Exactly why I create my patch "0002-eoe-via-rtdm.patch". 
> 
> And again, which does not extend to non-RTDM user-space.
> 

Agreed, but it could be, but that will require application level locks for those that don't currently use them (for non-RTDM user-space apps).

> >> > However, if your application has multiple send/receive loops then 
> >> > they need to be synchronized with each other (see next comment).
> >> > There are a few more functions such as the distributed clock 
> >> > calls that are also in the send/receive loops.  They also do not 
> >> > require ethercat level locks as they should be safe to call 
> >> > between the activate and deactivate.  They also do not need 
> >> > application level locking as they should really only be called by 
> >> > one send/receive loop per master.  All other ethercat function calls should not be in a hard realtime context.
> >> 
> >> Ok.  But you imply a struct rule for how to design applications 
> >> doing EtherCAT communication: "There can only be one send/receive 
> >> loop per master".
> >> 
> >> That might (is not in my case) accepted by all application developers.
> >> I need to continue support for applications that use multiple 
> >> send/receive loops for multiple EtherCAT domains, each with different cycle time.
> >> 
> >> One solution could be to create a single send/receive loop per 
> >> master, and then let the multiple application loops communicate 
> >> with that instead of directly with the master.  That might actually 
> >> be a really good idea.  But, if that is done, I think it should fit 
> >> very well as a general feature of the EtherCAT master.  It should 
> >> be possible to implement without any application specific code.
> >
> > That's not what I said.  You can have multiple send/receive loops 
> > per master, but it is up to your application to make sure they don't 
> > call any of the ecrt functions I listed at the top at the same time 
> > (per master).  If your application is a single process with multiple 
> > threads you can use an application semaphore (for example), but if 
> > it is multiple processes you will need a named semaphore (or similar) that all of the processes share.
> 
> For the single process, multiple threads application, it is not that bad.  It is requring all such application developers to do the same over and over again, wasting time and introducing the same bugs again and again.
> 
> The multiple process world can be different though.  You basically end up with a new EtherCAT API.  A combination of the etherlabmaster API and a custom named semaphore API.  Without this, applications will not work properly together.  Why not include such a feature directly in etherlabmaster?  Without it, I think we are making the user-space applications (non-RTDM) into a second-class citizen.
> 
> It might not matter to you, but as this how we use etherlabmaster, it matters to me.
> 

It always maters and that's why the patch is optional.

For RTAI, application level locks is the only allowable case.  For non-RTAI apps either approach is possible, but most would probably prefer the built in EtherCAT master level locks (until they aren't powerful enough anymore).

> >> The final "problem" is as you say a design question.  Should 
> >> etherlabmaster support synchronization of access from multiple 
> >> threads/processes, or should this be left entirely to the 
> >> application developer.  But if
> >> master->datagram_queue is changed to something lock-free, this "problem" 
> >> master->will
> >> likely be solved also.
> >> 
> >> >> > However, if they are being called from multiple processes 
> >> >> > (rather than
> >> >> > threads) then you need to use something like a named semaphore 
> >> >> > so that all processes share the same lock.  Of course if you 
> >> >> > are using callbacks (for EoE) you are probably doing that anyway.
> >> >> 
> >> >> You easily have We have multiple processes.  Even with just a 
> >> >> single application, you have EtherCAT-OP, EtherCAT-EoE and the application.
> >> >> All using some of the same shared data structures.  Throwing 
> >> >> more multiple application just adds to that, but I think it is 
> >> >> critical enough with just a single application.
> >> >
> >> > Even if you have multiple processes (rather than threads), it is 
> >> > your design decision as to which process takes priority and 
> >> > whether a particular send/receive loop should wait a bit even if 
> >> > it could get the lock now.  You can only do that if your 
> >> > application controls the locking of these functions.
> >> 
> >> While that sounds reasonable, I believe it is not entirely true.  
> >> You could write a layer responsible for handling this, which gets 
> >> information from the
> >> application(s) for how to deal with this.  Think of it like a kind 
> >> of scheduler.
> >> 
> >> Example from an application I have seen.  You have two applications 
> >> (processes), each driving it's own EtherCAT domain.  One 
> >> application/domain is running every 2 ms, the other 
> >> application/domain every 10 ms.  In addition, EoE slaves are also 
> >> used.  For each application, you could specify the priority, the 
> >> cycle time and the start time.  EoE should also be given a priority.
> >> This EtherCAT "scheduler" would then be able to decide which EtherCAT communication to do and when.
> >> 
> >> So while I agree that it might be most obvious to implement the 
> >> decission of when to send/receive directly in the application, I 
> >> think it could also be implemented in a more generic way.
> >
> > You don't need the EtherCAT master to do that scheduling.  That is 
> > already available in Linux / RTAI or whatever you use.
> 
> What do you mean?
> 

In Linux you can give processes various priorities (via its nice value).  For RTAI you can also give each task a priority, but I think it is more strict that standard Linux where higher priority tasks will fully run before lower priority tasks.  If you have a single CPU (or single allocated CPU to all your EtherCAT processes) then the higher priority process will get more CPU and be woken up before a lower priority process if they are both waiting on a lock.  Even with multiple CPU's the higher priority task should always get a lock before a lower priority task.

So you don't need the EtherCAT master to have a scheduler on top of this.

> > But once one of the above ecrt function calls is in progress you 
> > need to ensure no other process / thread will call one at the same time.
> 
> Yes, that is your preferred design (that **I** need to ensure that).  I prefer a design where the master will do that for me (actually the application developers using our platform).
> 

For RTAI its not preferred, it's required.  Can't use Linux locks in RTAI hard realtime calls.

> >> >> > They should probably also be turned off for RTAI / Xenomi in 
> >> >> > general and as I said above use application level locking.
> >> >> >
> >> >> > You can pass --enable-rtdm when compiling to enable RTDM (and 
> >> >> > compile rtdm-ioctl.o).
> >> >> 
> >> >> Passing --enable-rtdm to ./configure will define EC_RTDM macro 
> >> >> and enable the automake conditional ENABLE_RTDM.
> >> >> 
> >> >> This will trigger Kbuild to build rtdm-ioctl.o (from 
> >> >> rtdm-ioctl.c), and this will be done with EC_IOCTL_RTDM macro defined.
> >> >> 
> >> >> But ioctl.c will be compiled as always, and that is without 
> >> >> EC_IOCTL_RTDM defined.  Was it supposed to be defined?  If so, 
> >> >> it should be easy to fix, but someone should definitely do some 
> >> >> real testing of it.
> >> >
> >> > I think the locks should be disabled in both "rtdm-ioctl.c" and 
> >> > "ioctl.c" if using RTAI, but I use RTDM so haven't confirmed this.
> >> 
> >> I don't see how it could ever be disabled for ioctl.c with the current code.
> >
> > By not using the patch that added them.
> 
> Of-course.  But back to the same discussion again.
> 
> >> > Further to that I don't think they should be there at all.  
> >> > Simple applications have one send/receive loop so don't need locks.
> >> > Applications with multiple send/receive loops or EOE need to 
> >> > control their own locking for optimal results anyway.
> >> 
> >> Again, I don't fully agree with you on that.
> >> 
> >> But more importantly, it is not possible for user-space 
> >> applications in combination with EoE, due to inability to set application callbacks.
> >> 
> >> > Also, having these lock functions use master_sem the send/receive 
> >> > functions block unnecessarily with non-realtime ethercat function 
> >> > calls.  At a minimum they should be locking on their own semaphore.
> >> 
> >> True.  I have patches for fixing that.  The master_sem is 
> >> definitely a no-go for real-time.
> >
> > So, I believe the following two patches are the problem:
> > - base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch
> > - base/0018-Use-call-back-functions.patch
> 
> Well, that depends on how you look at it.  Removing those patches removes the problem from etherlabmaster.  But then I just have to do exactly the same thing on top of etherlabmaster, and I will be back to square one.
> 
> > My new EoE patch rolls back the changes from 
> > "base/0018-Use-call-back-functions.patch" and I use EC_IOCTL_RTDM so 
> > ignore the changes in 
> > "base/0017-Master-locks-to-avoid-corrupted-datagram-queue.patch".
> 
> Ok.  I think you could have been a bit more explicit about that roll-back.
> 
> /Esben
>

Regards,
Graeme.
_______________________________________________
etherlab-dev mailing list
etherlab-dev at etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-eoe-addif-delif-tools.patch
Type: application/octet-stream
Size: 74582 bytes
Desc: 0001-eoe-addif-delif-tools.patch
URL: <http://lists.etherlab.org/pipermail/etherlab-dev/attachments/20180507/1813505b/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-eoe-via-rtdm.patch
Type: application/octet-stream
Size: 14987 bytes
Desc: 0002-eoe-via-rtdm.patch
URL: <http://lists.etherlab.org/pipermail/etherlab-dev/attachments/20180507/1813505b/attachment-0007.obj>