[etherlab-users] DC-Synchronization - Sync signal generation

Tue Apr 8 11:31:32 CEST 2014

Thank you so much for the test. I am sorry for the formatting error. I
should have tested it before sending it out, but I didn't have the chance.

I'm out of the office today, so I'll give you a quick answer without
looking into the code first.

1) Syncing the ref slave to the master.
From your description, I don't see any problem in your code.
Apparently the slave needs more time to synchronize the first time after a
reboot. I think this is because the drift compensation is reset by the
reboot, and the slave loses its last drift estimate relative to the master
clock.
If the slave really does need that long for DC sync to converge after a
reboot, I don't think I can do anything to improve it. I also doubt that
the original EtherLab master can do any better. After all, the drift
compensation algorithm runs on the slave side; the master can only try to
provide the best possible transmission delay estimate and system time
offset.
If it is always the slave that gets rebooted, the difference between my
changes and the original code is not so apparent. You will see the benefit
when it is the master program that gets restarted and the slave has already
been synchronized with the master once before.

Hi Jun,



I tried rebuilding the master with the code from your bundle (merged with
the latest 1.5.2 tip) but I was still getting the 5000ms sync timeouts.
(Although it does seem a lot rarer for it to actually print that error, it
does still seem to take a few seconds to get to OP, which is indicative of
sync taking a while.)



I haven’t tried modifying my user code yet, but the way that it works at
the moment is:

- Outside the cyclic thread, it calls
  ecrt_master_set_send_interval(master, TIMESPEC2NS(cycletime)/1000), sets
  up the slaves, and starts the cyclic thread (no calls to set master time).
- At the top of the cyclic thread, it activates the master and inits the
  first wakeup time (as in the original dc_user example).
- In the cyclic loop, it sleeps until the wakeup time, then receives and
  processes, and:
    o Every loop: ecrt_master_application_time(master, TIMESPEC2NS(time));
    o Every 2nd loop: ecrt_master_sync_reference_clock(master);
    o Every loop: ecrt_master_sync_slave_clocks(master);
- Then queue & send, and repeat loop.
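
The timing arithmetic behind these steps can be sketched roughly as follows.
This is only an illustration: the helper names are hypothetical (they are
not taken from dc_user or from the master), the EtherCAT calls themselves
are left out, and the choice of even cycles for the reference clock sync is
arbitrary. The division by 1000 suggests the send interval is given in
microseconds.

```c
#include <stdint.h>
#include <time.h>

#define NS_PER_SEC 1000000000L

/* Convert a timespec to nanoseconds (the job TIMESPEC2NS does above). */
uint64_t timespec_to_ns(struct timespec t)
{
    return (uint64_t)t.tv_sec * NS_PER_SEC + (uint64_t)t.tv_nsec;
}

/* Value handed to ecrt_master_set_send_interval(): cycle time in ns / 1000. */
uint64_t send_interval_us(struct timespec cycletime)
{
    return timespec_to_ns(cycletime) / 1000;
}

/* Next wakeup = current wakeup + cycle time, keeping tv_nsec below 1 s. */
struct timespec next_wakeup(struct timespec wakeup, struct timespec cycletime)
{
    struct timespec r;
    r.tv_sec = wakeup.tv_sec + cycletime.tv_sec;
    r.tv_nsec = wakeup.tv_nsec + cycletime.tv_nsec;
    if (r.tv_nsec >= NS_PER_SEC) {
        r.tv_sec++;
        r.tv_nsec -= NS_PER_SEC;
    }
    return r;
}

/* Reference clock sync on every 2nd cycle; slave clocks sync every cycle. */
int sync_ref_due(unsigned long cycle)
{
    return (cycle % 2) == 0;
}
```

For a 1 ms cycle, send_interval_us() yields 1000, and the application time
passed in each loop would be timespec_to_ns() of the current time.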



So it’s essentially the same as the dc_user example.  Currently I’m testing
it on a network with only one DC-enabled slave.





Possibly of interest is that the error seems to be related to slave
reboots: if I reboot the slave (resetting its internal clock), then the
next start of the master application will produce the 5000ms timeout.
Subsequent starts of the master app seem to start ok.



If I set it to “debug 1” then on that first run it prints “Sync after 4996
ms: 4293798555 ns” (and the number was decreasing), which looks like
something needs to be signed rather than unsigned (it’s about -1ms).  On
subsequent runs the synchrony seems to be typically around 600ns (sometimes
up to about 2000ns), which is pretty good.  (In rare cases it does the
negative value thing again although with a much smaller magnitude, and it
takes a third or fourth try to “really” lock it in.)
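
The wrap-around can be checked with a minimal sketch in C. It assumes the
value read from the slave is in sign-and-magnitude form (bit 31 as the sign,
the remaining bits as the magnitude in ns), which is how the master code
appears to treat it; the function names are hypothetical and not part of
the master.

```c
#include <stdint.h>

/* Decode a sign-and-magnitude value: bit 31 = sign, bits 0..30 = ns.
 * (Assumption inferred from the 0x80000000 test in the master code.) */
int32_t sync_diff_to_signed(uint32_t raw)
{
    int32_t magnitude = (int32_t)(raw & 0x7fffffffu);
    return (raw & 0x80000000u) ? -magnitude : magnitude;
}

/* What "%u" displays when a negated unsigned magnitude is printed:
 * the negation wraps around modulo 2^32 instead of going negative. */
uint32_t negated_as_unsigned(uint32_t magnitude)
{
    return (uint32_t)(0u - magnitude);
}
```

Under these assumptions, negated_as_unsigned(1168741) gives 4293798555, the
value in the debug output, so the actual difference would have been about
-1.17ms.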



I had a quick look at nearby code and it looks like the number is a
formatting bug; this patch fixes it:

--- a/master/fsm_slave_config.c
+++ b/master/fsm_slave_config.c
@@ -1400,8 +1400,8 @@
             EC_SLAVE_WARN(slave, "Slave did not sync after %lu ms.\n",
                     diff_ms);
         } else {
-            EC_SLAVE_DBG(slave, 1, "Sync after %4lu ms: %10u ns\n",
-                    diff_ms, EC_READ_U32(datagram->data) & 0x80000000 ? -abs_sync_diff: abs_sync_diff);
+            EC_SLAVE_DBG(slave, 1, "Sync after %4lu ms: %10d ns\n",
+                    diff_ms, (EC_READ_U32(datagram->data) & 0x80000000) ? -abs_sync_diff: abs_sync_diff);
             // check synchrony again
             ec_datagram_fprd(datagram, slave->station_address, 0x092c, 4);



However, the underlying problem remains. With the patch, the log shows that
the initial sync difference after 1388ms is -371806ns, and that it gets
worse over time instead of better (ending at close to -1ms after 5s).
Subsequent runs always seem to be better, although sometimes it takes 2-5
runs to get it “right”.



Regards,

Gavin Lambert



*From:* Jun Yuan [mailto:j.yuan at rtleaders.com]
*Sent:* Friday, 4 April 2014 21:00
*To:* Gavin Lambert
*Cc:* etherlab-users at etherlab.org
*Subject:* Re: [etherlab-users] DC-Synchronization - Sync signal generation



Hi,

that's all right. I'm using Xenomai. I just wanted to demonstrate an
alternative way of synchronizing the master clock to the ref slave clock. I
chose the RTAI example so that it can be compared with Graeme Foot's
method.

You can still test the rest without that part. I didn't change the API, so
you don't need to change anything in your code. But:
1) If you call ecrt_master_application_time() outside of the loop, I
recommend removing that call.
2) Only if you use ecrt_master_reference_clock_time(): note that at program
start, before the DC system time offsets have been calculated for each
slave, ecrt_master_reference_clock_time() now reports errno EAGAIN to tell
the user that the ref clock is not ready yet. So it is worth always
checking the return value of ecrt_master_reference_clock_time(), as I did
in my rtai_rtdm_dc example.
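
A check along these lines could look like the sketch below. It is not the
code from the rtai_rtdm_dc example: the enum and the function are
hypothetical, and because it depends on the build whether EAGAIN arrives as
a negative return value or through errno, the sketch accepts both (an
assumption). The caller would pass in the return value of
ecrt_master_reference_clock_time() and the errno captured right after the
call.

```c
#include <errno.h>

/* Hypothetical result categories; not part of the EtherLab API. */
enum ref_clock_state {
    REF_CLOCK_OK,        /* the time value is valid */
    REF_CLOCK_NOT_READY, /* EAGAIN: offsets not calculated yet, retry later */
    REF_CLOCK_ERROR      /* any other failure */
};

/* ret: return value of ecrt_master_reference_clock_time()
 * err: errno captured immediately after the call
 * Assumption: EAGAIN may be reported either as -EAGAIN or via errno. */
enum ref_clock_state classify_ref_clock(int ret, int err)
{
    if (ret == 0)
        return REF_CLOCK_OK;
    if (ret == -EAGAIN || (ret < 0 && err == EAGAIN))
        return REF_CLOCK_NOT_READY;
    return REF_CLOCK_ERROR;
}
```

On REF_CLOCK_NOT_READY the cyclic task would simply skip using the
reference clock time for that cycle and try again in the next one.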



That's it.



Regards,
Jun



On Fri, Apr 4, 2014 at 3:17 AM, Gavin Lambert <gavinl at compacsort.com> wrote:

Hi Jun,



Thanks; I’m having a look at it, but much of it is new to me.  I’m using
PREEMPT_RT so my code is based on the dc_user example, not the RTAI
examples, and I’d probably have to try adapting it before I could test it.



Regards,

Gavin Lambert



*From:* Jun Yuan [mailto:j.yuan at rtleaders.com]
*Sent:* Friday, 4 April 2014 01:46
*To:* Gavin Lambert
*Cc:* etherlab-users at etherlab.org
*Subject:* Re: [etherlab-users] DC-Synchronization - Sync signal generation



Hi Gavin,

your interest is my motivation. I have attached the bundle file.

My changes are based on the newest version, 1.5.2, in the 'stable-1.5'
branch. I first added a new 'rtleaders' branch and made all my changes on
it. So after "$ hg unbundle etherlab_1.5.2_jyuan.hg", don't forget to
switch to the 'rtleaders' branch using "$ hg update rtleaders".

I found a better way of synchronizing the master clock to the ref slave
clock. It is much faster and more stable. I managed to port my C++ code to
C in the rtai_rtdm_dc example today, but I cannot check right now whether
the new code compiles. If you have an RTAI environment, please test whether
it compiles and give me some feedback.

Besides that, there is a more accurate DC time offset calculation. There
should be no more errors like "Slave did not sync after 5000ms". The
accurate time offset estimation saves much time for the DC Sync procedure.
Slaves would have such a small dc diff (several hundred ns maybe) at the
beginning of the dc sync check, that I even changed
EC_SYSTEM_TIME_TOLERANCE_NS from 1000000ns to 1000ns.
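
As a rough illustration of what the smaller tolerance means, here is a
minimal sketch of such a check. It assumes that the master compares the
magnitude of the reported difference (with the sign bit masked off) against
the tolerance; the function and macro names are hypothetical stand-ins for
the master's own code.

```c
#include <stdint.h>

#define TOLERANCE_ORIGINAL_NS 1000000u /* original EC_SYSTEM_TIME_TOLERANCE_NS */
#define TOLERANCE_REDUCED_NS     1000u /* value after the change */

/* Returns 1 if the magnitude of the reported difference is within
 * tolerance. Assumption: bit 31 of the raw value is a sign bit. */
int within_tolerance(uint32_t raw_diff, uint32_t tolerance_ns)
{
    uint32_t abs_diff = raw_diff & 0x7fffffffu;
    return abs_diff <= tolerance_ns;
}
```

With these numbers, a difference of a few hundred ns passes either limit,
while a difference of 2000ns passes only the original 1000000ns limit.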

Postponing the check of master->has_app_time makes the error "No app_time
received up to now, but master already active" go away.

There is also the bugfix for ecrt_master_select_reference_clock() from
Graeme Foot, and some other bug fixes from Jeroen Van den Keybus.



Any feedback is welcome. Have fun testing those changes!



Jun



On Thu, Apr 3, 2014 at 12:13 AM, Gavin Lambert <gavinl at compacsort.com>
wrote:

On 2 April 2014 22:40, quoth Jun Yuan:

> But there is a reason why we all put ecrt_master_application_time()
> outside the loop: we all got burned by the error "No app_time received
> up to now, but master already active.", which is a timing bug in
> Etherlab. I've resolved the problem by changing the code of the Etherlab
> master, which gets rid of the "No app_time" bug. Now I don't need to
> call ecrt_master_application_time() outside the loop any more. I will
> publish the bundle to the mailing list when I have time.

I'd be very interested to see this.  Slave sync timing, "no app time", and
the 5000ms sync timeout have been a recurring bugbear for me.




-- 
Jun Yuan
[Pronunciation: Djün Üän]

Robotics Technology Leaders GmbH
Am Loferfeld 58, D-81249 München
Tel: +49 89 189 0465 24
Fax: +49 89 189 0465 11
mailto: j.yuan at rtleaders.com

Umlaut rule in the Chinese Pinyin romanization: after the initials y, j, q,
and x, u is pronounced as ü, e.g. yu => ü,  ju => dschü,  qu => tschü,
xu => schü.