[etherlab-users] Userspace scheduling with RTAI
Ravi Chemudugunta
chemuduguntar at gmail.com
Thu Mar 24 01:53:56 CET 2011
Hello,
I am experiencing some timing issues when using Etherlab from
userspace. I am using RTAI in user mode to schedule the process on a
1 ms cycle. I would like to explain the procedure I used to measure
the timing first, and then show the actual results on my system.
The system is a Beckhoff Unit with the following specifications:
Intel Celeron-M 999 MHz CPU
1 GB RAM
e100 Network Interface
The code
---
tick() {
    <t1>
    ecrt_master_receive(master);
    <t2>
    ecrt_domain_process(domain);
    // some processing
    ecrt_domain_queue(domain);
    <t3>
    ecrt_master_send(master);
    <t4>
}
This is the basic read/process/write workflow, with a timestamp
captured (in nanoseconds) at each stage to get the timings:
t_receive = t2 - t1
t_process = t3 - t2
t_send = t4 - t3
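For reference, capturing the timestamps is just a monotonic clock
read, e.g. something like the following (the helper name
timestamp_ns() is illustrative only; RTAI's rt_get_time_ns() would
work equally well):

#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. */
static uint64_t timestamp_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* e.g. t1 = timestamp_ns(); ecrt_master_receive(master); t2 = timestamp_ns(); */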
The main thread is scheduled via RTAI:
main_thread() {
    while (1) {
        sleep_until_next_time_period();
        tick();
        update_and_print_timing();
    }
}
The call to sleep_until_next_time_period() wakes the thread up when it
is time; tick() performs the actual work of receiving, processing and
sending, followed by update_and_print_timing(), which computes the
time taken for each stage and compares it against an overall worst case.
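The RTAI user-mode scheduling around this loop is set up roughly like
this (a simplified sketch - task name, priority, CPU mask and error
handling are placeholders, not the exact values from my test code):

#include <sched.h>
#include <sys/mman.h>
#include <rtai_lxrt.h>

#define CYCLE_NS 1000000 /* 1 ms */

int main(void)
{
    RT_TASK *task;
    RTIME period;

    mlockall(MCL_CURRENT | MCL_FUTURE);  /* avoid page faults in the loop */

    task = rt_task_init_schmod(nam2num("ECTSK"), 0, 0, 0, SCHED_FIFO, 0xF);
    rt_set_oneshot_mode();
    start_rt_timer(0);

    period = nano2count(CYCLE_NS);
    rt_make_hard_real_time();
    rt_task_make_periodic(task, rt_get_time() + period, period);

    while (1) {
        rt_task_wait_period();           /* sleep_until_next_time_period() */
        tick();
        update_and_print_timing();
    }

    /* not reached in this sketch */
    rt_make_soft_real_time();
    rt_task_delete(task);
    return 0;
}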
The results:
The timings fluctuate wildly, and it is hard to analyse them
quantitatively, as one run can differ vastly from the next; however, a
common pattern emerges. The system is run with no additional processes
running (except for ssh, bash etc.). The timings are at their worst
while the system is still spooling up (although I am not sure this
analogy is useful here); after a couple of cycles the system settles
into a rhythm.
The min/max bounds of the various measurements are as follows (in nanoseconds):
t_receive = 10107 | 54297
t_process = 17275 | 49137
t_send = 7774 | 36845
Occasionally the receive or send time will spike to 200 000 ns
(approximately 200 microseconds). This is still acceptable, since the
sum of all the times remains less than the cycle period of 1 ms,
although at the moment the logic is trivial; in a real situation it
would leave little time to do any processing.
These spikes to ~200k happen about once every 10-15 seconds (they
might happen sooner or later - hard to predict)!
The above illustrates a performance level that is acceptable for the
current workload (toggling output pins). However, every 30 to 90
seconds the delays become unacceptable for real-time operation, and we
end up over-running the 1 ms tick.
Absolute worst case,
t_receive = 996141 (ns) ~= 996 (micro-s)
t_process = 181540 (ns) ~= 182 (micro-s)
t_send = 241299 (ns) ~= 241 (micro-s)
These absolute-worst-case events are distinct from the general
oscillations in that they happen only once in a while (every 30-90
seconds) and do not repeat until some 30-90 seconds later.
Running other programs on the system, writing to disk etc. (cat
/dev/zero > /foo) does not seem to increase or decrease the chances of
this happening, nor does it skew the distribution of the values in any
noticeable way (to the naked eye at least).
Comparison against an in-kernel version
----
To see whether the latency and jitter experienced are due to the
application being in userspace, I wrote the exact same program in
kernel space (also using RTAI for accurate scheduling of the 1 ms
period).
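Roughly, the kernel variant is the same loop inside an RTAI kernel
task in a module (again a simplified sketch, not the exact module
source; stack size and priority are placeholders):

#include <linux/module.h>
#include <rtai_sched.h>

#define CYCLE_NS 1000000 /* 1 ms */

static RT_TASK cyclic_task;

static void cyclic_fn(long data)
{
    while (1) {
        rt_task_wait_period();
        tick();                    /* same receive/process/send sequence */
        update_and_print_timing();
    }
}

static int __init cyclic_init(void)
{
    RTIME period = nano2count(CYCLE_NS);

    rt_set_oneshot_mode();
    start_rt_timer(0);
    rt_task_init(&cyclic_task, cyclic_fn, 0, 4096, 0, 1, NULL);
    rt_task_make_periodic(&cyclic_task, rt_get_time() + period, period);
    return 0;
}

static void __exit cyclic_exit(void)
{
    rt_task_delete(&cyclic_task);
    stop_rt_timer();
}

module_init(cyclic_init);
module_exit(cyclic_exit);
MODULE_LICENSE("GPL");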
The results are as follows:
Min/max bounds (ns):
t_receive = 1401 | 6357
t_process = 350 | 1348
t_send = 1471 | 4288
Here too the timings oscillate, but they are much more predictable and
bounded by the absolute worst-case values shown below.
Absolute worst case (after the system settles):
t_receive = 7436 (ns) ~= 7 (micro-s)
t_process = 4105 (ns) ~= 4 (micro-s)
t_send = 9101 (ns) ~= 9 (micro-s)
---
Assuming that the worst-case values of the kernel version are the best
the system can achieve (given CPU, memory and network constraints), it
is hard to understand why the userspace version performs so poorly.
Even with a few microseconds of overhead caused by calling into the
ioctl interface of etherlab from userspace, it doesn't explain why we
sometimes miss the deadline entirely (at worst), or why these calls
take up a considerably large chunk of the time slice (most of the time).
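(As far as I understand the userspace library, each ecrt_* call
essentially boils down to a single ioctl() on the master character
device, something like

    ioctl(master->fd, EC_IOCTL_RECEIVE, NULL);

- the exact ioctl names here are from memory - so a few microseconds
of syscall overhead per call would be expected, but not hundreds.)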
I am really stuck for ideas on what it could be :|
I will post my test code if anyone wants to repeat this experiment.
-ravi
--
C-x C-s, C-x C-c