[etherlab-users] etherlab dc sync check

Jun Yuan j.yuan at rtleaders.com
Sat Jan 4 14:37:36 CET 2014


I feel great, Jeroen's log confirms my worry about the system time
offset calculation. I've prepared a patch for it. Now I need someone
to apply my patch, and do the same experiement again. I'm pretty sure
this patch would improve the performance of the DC sync on your
system. Just tell me which version of etherlabmaster you're using, and
I can make a patch for you. If you have any difficulties to apply my
patch to your source of etherlabmaster, I'll help.

OK, let me analyse what happened in Jeroen's experiement.

> [2020582.839193] EtherCAT DEBUG 0-0: Checking system time offset.
> [2020582.843201] EtherCAT DEBUG 0-0: DC 64 bit system time offset calculation: system_time=1388788619730212737 (corrected with 4000000), app_time=1388788619729210397, diff=-1002340
> [2020582.843204] EtherCAT DEBUG 0-0: Setting time offset to 1388787098688224529 (was 1388787098689226869)

The timestamps [2020582.839193] -> [2020582.843201] tells that
Jeroen's got a master EtherCAT-OP thread running in a loop of 4ms.
Every 4ms, the master thread wakes up and does some work. To make it
easier to explain, I'll substract 2020582 seconds from the timestamp
from now on.
I bet the first slave is used as the reference clock.

839193us: "Checking system time offset." means the master prepares a
datagram to read the system time from the first the slave. This
datagram will be sent out later by the next ecrt_master_send() call in
the user loop.
843201us: The master reads the answer from the slave. The first slave
has a 64 bit DC. And the master sees a difference between the master
app time and the first slave's system time of -1002340ns.
843204us: Based on the diff the master calculated, the master turns
the clock of the first slave 1002340ns back.

> [2020582.847211] EtherCAT DEBUG 0-1: Checking system time offset.
> [2020582.851219] EtherCAT DEBUG 0-1: DC 64 bit system time offset calculation: system_time=1388788619738253673 (corrected with 4000000), app_time=1388788619737210345, diff=-1043328
> [2020582.851221] EtherCAT DEBUG 0-1: Setting time offset to 1388787098694672687 (was 1388787098695716015)

The same happens for the second slave. The master sees a diff of
-1043328ns, turns its clock 1043328ns back.

> [2020582.855227] EtherCAT DEBUG 0-2: Checking system time offset.
> [2020582.859235] EtherCAT DEBUG 0-2: DC 64 bit system time offset
> calculation: system_time=1388788619746127955 (corrected with 4000000),
> app_time=1388788619745210316, diff=-917639
> [2020582.859237] EtherCAT DEBUG 0-2: Not touching time offset.

The same for the third slave. The master sees a diff of -917639ns.
Since it's with the tolerance EC_SYSTEM_TIME_TOLERANCE_NS, the master
doesn't change anything.

> [2020582.863244] EtherCAT DEBUG 0-3: Checking system time offset.
> [2020582.867251] EtherCAT DEBUG 0-3: DC 64 bit system time offset
> calculation: system_time=1388788619754132695 (corrected with 4000000),
> app_time=1388788619753210302, diff=-922393
> [2020582.867253] EtherCAT DEBUG 0-3: Not touching time offset.

Same for the slave 4. diff -922393ns.

OK, wait a minute. Why does the master always see a difference of
around -1ms between his app time and the time of all the 4 slaves
above? Is that a just coincident?
If we assume the diffs are correct, it would means that within only
several seconds (the time slot between the stop and the restart of the
master application), there comes a drift of -1ms between the master
and the first 4 slaves. Do we really have such a bad clock which
results such a bad drift effect, 1ms slower in only several seconds?

> [2020582.871260] EtherCAT DEBUG 0-4: Checking system time offset.
> [2020582.875267] EtherCAT DEBUG 0-4: DC 64 bit system time offset
> calculation: system_time=1388788619762180961 (corrected with 4000000),
> app_time=1388788619762210451, diff=29490
> [2020582.875269] EtherCAT DEBUG 0-4: Not touching time offset.

The fifth slave seems to be good. Well, this suprises me a little.

> [2020582.879278] EtherCAT DEBUG 0-5: Checking system time offset.
> [2020582.883286] EtherCAT DEBUG 0-5: DC 64 bit system time offset
> calculation: system_time=1388788619767191997 (corrected with 0),
> app_time=1388788619770210343, diff=3018346
> [2020582.883288] EtherCAT DEBUG 0-5: Setting time offset to
> 1388787098704211274 (was 1388787098701192928)

Hold on, now we have a correction of 0 ns instead of 4000000ns as
above. And the master sees a diff=3018346ns,  that's approx. 3ms

> [2020582.887294] EtherCAT DEBUG 0-6: Checking system time offset.
> [2020582.891302] EtherCAT DEBUG 0-6: DC 64 bit system time offset
> calculation: system_time=1388788619775150970 (corrected with 0),
> app_time=1388788619778210426, diff=3059456
> [2020582.891304] EtherCAT DEBUG 0-6: Setting time offset to
> 1388787098700760295 (was 1388787098697700839)
> [2020582.895311] EtherCAT DEBUG 0-7: Checking system time offset.
> [2020582.899318] EtherCAT DEBUG 0-7: DC 64 bit system time offset
> calculation: system_time=1388788619783154795 (corrected with 0),
> app_time=1388788619786210495, diff=3055700
> [2020582.899321] EtherCAT DEBUG 0-7: Setting time offset to
> 1388787098701077013 (was 1388787098698021313)
> [2020582.903327] EtherCAT DEBUG 0-8: Checking system time offset.
> [2020582.907334] EtherCAT DEBUG 0-8: DC 64 bit system time offset
> calculation: system_time=1388788619791155722 (corrected with 0),
> app_time=1388788619794210387, diff=3054665
> [2020582.907337] EtherCAT DEBUG 0-8: Setting time offset to
> 1388787098702059792 (was 1388787098699005127)
> [2020582.911343] EtherCAT DEBUG 0-9: Checking system time offset.
> [2020582.915351] EtherCAT DEBUG 0-9: DC 64 bit system time offset
> calculation: system_time=1388788619799144106 (corrected with 0),
> app_time=1388788619802210392, diff=3066286
> [2020582.915353] EtherCAT DEBUG 0-9: Setting time offset to
> 1388787098703046442 (was 1388787098699980156)
> [2020582.919360] EtherCAT DEBUG 0-10: Checking system time offset.
> [2020582.923367] EtherCAT DEBUG 0-10: DC 64 bit system time offset
> calculation: system_time=1388788619807217599 (corrected with 0),
> app_time=1388788619810210385, diff=2992786
> [2020582.923369] EtherCAT DEBUG 0-10: Setting time offset to
> 1388787098694401972 (was 1388787098691409186)
> [2020582.927376] EtherCAT DEBUG 0-11: Checking system time offset.
> [2020582.931383] EtherCAT DEBUG 0-11: DC 64 bit system time offset
> calculation: system_time=1388788619815217343 (corrected with 0),
> app_time=1388788619818210352, diff=2993009
> [2020582.931386] EtherCAT DEBUG 0-11: Setting time offset to
> 1388787098694270706 (was 1388787098691277697)
> [2020582.935392] EtherCAT DEBUG 0-12: Checking system time offset.
> [2020582.939400] EtherCAT DEBUG 0-12: DC 64 bit system time offset
> calculation: system_time=1388788619823158753 (corrected with 0),
> app_time=1388788619826210432, diff=3051679
> [2020582.939402] EtherCAT DEBUG 0-12: Setting time offset to
> 1388787098702271332 (was 1388787098699219653)
> [2020582.943409] EtherCAT DEBUG 0-13: Checking system time offset.
> [2020582.947416] EtherCAT DEBUG 0-13: DC 64 bit system time offset
> calculation: system_time=1388788619831157270 (corrected with 0),
> app_time=1388788619834210362, diff=3053092
> [2020582.947418] EtherCAT DEBUG 0-13: Setting time offset to
> 1388787098703045985 (was 1388787098699992893)
> [2020582.951425] EtherCAT DEBUG 0-14: Checking system time offset.
> [2020582.955432] EtherCAT DEBUG 0-14: DC 64 bit system time offset
> calculation: system_time=1388788619839165891 (corrected with 0),
> app_time=1388788619842210326, diff=3044435
> [2020582.955435] EtherCAT DEBUG 0-14: Setting time offset to
> 1388787098703499171 (was 1388787098700454736)
> [2020582.959441] EtherCAT DEBUG 0-15: Checking system time offset.
> [2020582.963449] EtherCAT DEBUG 0-15: DC 64 bit system time offset
> calculation: system_time=1388788619847182087 (corrected with 0),
> app_time=1388788619850210371, diff=3028284
> [2020582.963451] EtherCAT DEBUG 0-15: Setting time offset to
> 1388787098700973845 (was 1388787098697945561)
> [2020582.967458] EtherCAT DEBUG 0-16: Checking system time offset.
> [2020582.971465] EtherCAT DEBUG 0-16: DC 32 bit system time offset
> calculation: system_time=1116891966 (corrected with 0),
> app_time=1388788619858210417, diff=3048243
> [2020582.971467] EtherCAT DEBUG 0-16: Setting time offset to 614131959 (was
> 611083716)

The same, corrected with 0, master sees a diff of approx. 3ms for the
slave No. 7 to 17.

Shall I believe that all the slave 6 to 17 have a same drift of approx
3ms, and the slave 1 to 4 drift to approx -1ms, within only several
seconds?
No, I prefer to believe that the master doesn't calculate the diff right.

> [2020583.075648] EtherCAT DEBUG 0-1: Checking for synchrony.
> [2020583.079655] EtherCAT DEBUG 0-1: 37 ns difference after 4 ms.

The master checks the "system time difference" register at address
0x092c for the second slave, and it is 37ns.  This is the time
difference between the reference clock (the first slave) and the
second slave.
As the master turned both the first and the second slave's clock
approx. 1ms back, (although it did this wrongly),  the end effect is
that the second slave has no big "system time difference" refers to
the first slave.

Jeroen got lucky. If the master checks the 0x092c on the first
slave(the ref clock) in this experiement, I bet it will show "EtherCAT
DEBUG 0-0: approx. 1000000 ns difference after 4 ms."
Jeroen, can you configure your application to use both the first and
the second slave? Or you may configure the master to use the second
slave as the reference clock with the function
ecrt_master_select_reference_clock(). In both way, the master would
check for synchrony on the ref clock. Then we'll have a good chance to
see 1ms or 3ms "difference after 4ms", and the warning "Slave did not
sync after 5000 ms" later.

Thanks to Jeroen's log, I realize I made a small calculation mistake
in my first mail predicting the diff would be typically approx. -1ms
or 2ms. It's actually -1ms or 3ms.

Let me repeat and correct my words in my first mail:

Take an example, we have a master FMS EtherCAT-OP thread running in a
loop of 4ms, and a user realtime application thread running at 1ms.
Let’s define the time the step 1 happens is x ms. And assume that the user loop
runs 0.5ms (which could be actually something around 0 to 1 ms) after
the EtherCAT-OP.

The following would happen in the current etherlabmaster:
Time : Event
x    ms: Step 1, FSM prepares an FPRD datagram to 0x0910
x+0.5ms: Step 2, user loop sets a new app_time; the FPRD datagram gets
sent out, the sending timestamp jiffies is stored in datagram->jiffies_sent;
x+1.5ms: Step 3, user loop sets a new app_time; the datagram is
received, the receiving timestamp jiffies is stored in
datagram->jiffies_received;
x+2.5ms: user loop sets a new app_time;
x+3.5ms: user loop sets a new app_time;
x+4  ms: Step 4, FSM calculate the time offset.

And here is the source code in ec_fsm_master_dc_offset64()

    // correct read system time by elapsed time since read operation
    correction = (u64) (jiffies_since_read * 1000 / HZ) * 1000000;
    system_time += correction;
    time_diff = fsm->slave->master->app_time - system_time;

The jiffies is a counter in Linux kernel which get increased by 1 in a
frequency defined by HZ. Usually we have a 250 Hz linux system, so the 1
jiffies means 4 ms. As jiffies_sent was taken when the master clock is
x+0.5ms, and the current jiffies value is taken at x+4ms. We have a
possibility of 0.5/4 = 12.5% that the jiffies don’t increase itself
during that 3.5ms time, and 87.5% possibility that the jiffies has
been increased by 1. This means the value „correction“ would have a
typical value of 4000000ns, occasionally being 0 ns.

Let’s assume that the slave DC has been perfectly synchronized with
the master app time(with no drift error at all). So the system_time
from the slave equals to
0.5ms(the time the FPRD datagram was sent). With correction added,
system_time = x+4.5ms or x+0.5ms.

The app_time is x+3.5ms at the time of the Step 4.

time_diff = app_time - system_time = (x+3.5) - (x+4.5) or (x+3.5) - (x+0.5) =
-1000000ns, or 3000000ns, depending on the correction.

See, the time_diff should actually be 0, not -1ms or 3ms, as we
assumed before that
the slave DC is perfectly synchronized with the master app time.



More information about the Etherlab-users mailing list