[etherlab-dev] Pre-announcement: unofficial patchset update (Gavin Lambert)

Mon May 8 02:30:45 CEST 2017

On 7 May 2017 20:33, quoth Graeme Foot:
> I don't use ec_master_set_send_interval() or --enable-hrtimer since my
> master's operational thread has always run at 10ms (100Hz) anyway.  (I
> probably should, so will look at adding that in at some stage.)  My Linux
> kernel is configured to run at 100Hz and the master thread is not realtime
> so is scheduled by the Linux scheduler (RTAI).  Because Linux is set to
> 100Hz, it only runs the master's operation thread once every 10ms.
>
> Prior to your SDO patch, ec_fsm_master_action_process_sdo() was being
> called by the master's fsm, but from its idle state.  So it would only be
> called after all other processing and housekeeping was complete and was
> only being fired approx once every 800ms on my setup with 50 odd slaves.
> After the patch ec_master_exec_slave_fsms() is now called every time the
> master's operational thread fires.  All good except that on my system that
> is still only once every 10ms.

Right, but what I was saying is that prior to my patch it would actually
service the requests faster than once per 10ms even on your system, due to
the way the re-scheduling is done.  The master wasn't idle until it finished
any outstanding requests, so it called schedule() instead of
schedule_timeout(1) -- i.e. if there was no other work for the kernel to do
it would reschedule immediately instead of waiting for the next 10ms time
slice.  After my patch the master is idle even while requests are in
progress, so the condition it checks is no longer sufficient and it calls
schedule_timeout(1) too soon, making it slower than it should be.
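
To make the difference concrete, here is a rough userspace model of the two
behaviours.  It is not master code: model_latency_ms() and its parameters are
made up for illustration, and schedule() is modelled as costing nothing,
which only holds when the kernel has no other work to do.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only; none of this is IgH EtherCAT master code.
 * Each loop pass stands for one pass of the master thread that advances
 * an outstanding request by one step.
 */
static unsigned model_latency_ms(unsigned steps, bool reports_idle_while_busy,
                                 unsigned tick_ms)
{
    unsigned elapsed_ms = 0;
    unsigned i;

    assert(tick_ms > 0);

    for (i = 0; i < steps; i++) {
        if (reports_idle_while_busy) {
            /* schedule_timeout(1): sleep one jiffy before the next pass */
            elapsed_ms += tick_ms;
        } else {
            /* schedule(): assumed to return at once on an otherwise
             * idle CPU, so it is modelled as costing 0 ms */
        }
    }
    return elapsed_ms;
}
```

For a hypothetical request that needs 5 passes with a 10ms tick,
model_latency_ms(5, false, 10) gives 0 and model_latency_ms(5, true, 10)
gives 50, which is the shape of the slowdown described above.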

I didn't notice this regression because I *am* using --enable-hrtimer, which
does not have the same issue.  So what I was suggesting is that *you* try
using --enable-hrtimer, which I'm reasonably certain will solve that
performance issue without needing to try to exec slave FSMs from the
realtime context.

If you can't use --enable-hrtimer for some reason, then the most likely
solution to the above issue is to find the two lines where it checks for
ec_fsm_master_idle (in ec_master_idle_thread and ec_master_operation_thread)
and change the condition from this:

        if (ec_fsm_master_idle(&master->fsm)) {

to this:

        if (ec_fsm_master_idle(&master->fsm)
            && !master->fsm_exec_count) {

This is just air code and I haven't tested it, but it seems reasonably
likely to solve the issue and restore performance without --enable-hrtimer
to pre-patch levels or better.  There might be a risk that it will make the
kernel do some busy-waiting in some cases, but that shouldn't bother an
RTAI application.
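
As a quick sanity check of the logic only, the two conditions can be compared
in a small userspace sketch.  struct model_master and both helper names are
hypothetical stand-ins, and it assumes fsm_exec_count is non-zero whenever
slave FSMs did work on the last pass, which I haven't verified against the
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two pieces of master state the check reads. */
struct model_master {
    bool fsm_idle;               /* stands in for ec_fsm_master_idle(&master->fsm) */
    unsigned int fsm_exec_count; /* field name from the snippet above; semantics assumed */
};

/* Original condition: take the one-jiffy sleep whenever the master FSM is idle. */
static bool sleeps_original(const struct model_master *m)
{
    assert(m != NULL);
    return m->fsm_idle;
}

/* Proposed condition: only sleep if no slave FSM did any work either. */
static bool sleeps_proposed(const struct model_master *m)
{
    assert(m != NULL);
    return m->fsm_idle && !m->fsm_exec_count;
}
```

The only case where the two differ is an idle master FSM with a non-zero
fsm_exec_count: the original check sleeps for a jiffy there and the proposed
one does not.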

> So my new patch allows ec_master_exec_slave_fsms() to be called from my
> realtime context.  As you pointed out the master_sem lock would cause a
> deadlock, so I don't use it.  Because I don't use the lock I have instead
> added some flags to track whether it is currently safe to make the
> ec_master_exec_slave_fsms() call.  It's generally just the rescan that's a
> problem.

I haven't looked at your patch in detail, but it makes me nervous to pull
code outside of a lock like that; there are a lot of data structures that it
protects, and some of them might be more subtle than rescan.  Also, while
this probably isn't a problem with RTAI (since it can pre-empt the Linux
kernel), this API probably would be unsafe to use with regular kernel or
userspace code due to the inverse problem -- what if the app code is in the
middle of executing ec_master_exec_slave_fsms() when the master thread
decides to start a rescan (or otherwise mutate data structures it depends
on)?
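
For illustration only, one common way to cover both directions without
blocking is a try-style guard that *both* sides must take.  This is a
userspace sketch using C11 atomics, not what either patch does; every name in
it is hypothetical, and a real version would have to guard every structure
master_sem protects, not just rescan.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical guard shared by the master thread and the application. */
static atomic_bool topology_busy = false;

/* Counts passes of the stand-in for ec_master_exec_slave_fsms(). */
static unsigned slave_fsm_passes = 0;

/* Non-blocking acquire: returns true only if nobody else holds the guard. */
static bool guard_try_enter(void)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&topology_busy, &expected, true);
}

static void guard_leave(void)
{
    assert(atomic_load(&topology_busy)); /* caller must hold the guard */
    atomic_store(&topology_busy, false);
}

/* Application side: run the slave FSMs, or skip this cycle if a rescan
 * (or any other guarded mutation) is in progress.  Never blocks. */
static bool app_try_exec_slave_fsms(void)
{
    if (!guard_try_enter())
        return false;
    slave_fsm_passes++; /* stand-in for the real work */
    guard_leave();
    return true;
}
```

The master thread would wrap its rescan in the same guard_try_enter() /
guard_leave() pair and retry later on failure, so neither side can be in the
middle of its work when the other starts.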

> I don't know if the patch will be useful for anyone else, but it is useful
> if Linux is configured for 100Hz.  It may also be useful on short cycle
> time systems, e.g. 100 - 250us cycle times, where you want to process the
> SDOs faster.  Even if Linux is set to 1000Hz it will only schedule the
> master operational thread at 1ms.  The master thread may also be delayed if
> the Linux side gets some heavy CPU usage.

SDOs by design are intended to be slower-than-cycle tasks.  They're for
occasional configuration, diagnostics, or slow acyclic tasks, not for rapid
activity, so if you're trying to get response times of 1ms or better out of
them, you're probably doing it wrong.  (Recommended timeouts for SDO tasks
are generally measured in *seconds*, not milliseconds.)