[etherlab-users] DC-Synchronization - Sync signal generation

Tue Apr 8 04:35:11 CEST 2014

Hi Jun,

 

I tried rebuilding the master with the code from your bundle (merged with the latest 1.5.2 tip) but I was still getting the 5000ms sync timeouts. (The error is now printed much less often, but it still takes a few seconds to get to OP, which suggests that sync is still taking a while.)

 

I haven’t tried modifying my user code yet, but the way that it works at the moment is:

- Outside the cyclic thread, it calls ecrt_master_set_send_interval(master, TIMESPEC2NS(cycletime)/1000), sets up the slaves, and starts the cyclic thread (no calls to set master time).
- At the top of the cyclic thread, it activates the master and inits the first wakeup time (as in the original dc_user example).
- In the cyclic loop, it sleeps until the wakeup time, then receives and processes, and:
    o Every loop: ecrt_master_application_time(master, TIMESPEC2NS(time));
    o Every 2nd loop: ecrt_master_sync_reference_clock(master);
    o Every loop: ecrt_master_sync_slave_clocks(master);
- Then queue & send, and repeat loop.

 

So it’s essentially the same as the dc_user example.  Currently I’m testing it on a network with only one DC-enabled slave.
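As a rough illustration of the timing arithmetic behind the loop described above, here is a minimal sketch in C. It is not the poster's code: send_interval_us() and sync_ref_clock_this_cycle() are hypothetical helper names, and TIMESPEC2NS and timespec_add are written here from scratch rather than copied from dc_user. The master calls appear only as comments, in the order the list gives them.

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L
#define TIMESPEC2NS(T) \
    ((uint64_t)(T).tv_sec * NSEC_PER_SEC + (uint64_t)(T).tv_nsec)

/* Cycle time converted from ns to us, i.e. the value handed to
 * ecrt_master_set_send_interval() in the description above. */
static uint64_t send_interval_us(struct timespec cycletime)
{
    return TIMESPEC2NS(cycletime) / 1000;
}

/* Next absolute wakeup time = previous wakeup time + cycle time. */
static struct timespec timespec_add(struct timespec a, struct timespec b)
{
    struct timespec r;
    r.tv_sec = a.tv_sec + b.tv_sec;
    r.tv_nsec = a.tv_nsec + b.tv_nsec;
    if (r.tv_nsec >= NSEC_PER_SEC) { /* carry into seconds */
        r.tv_sec++;
        r.tv_nsec -= NSEC_PER_SEC;
    }
    return r;
}

/* Hypothetical helper: true on every 2nd cycle (0, 2, 4, ...). */
static int sync_ref_clock_this_cycle(unsigned long cycle)
{
    return (cycle % 2) == 0;
}

/* Order of work in one cycle, per the list above:
 *   sleep until wakeup; wakeup = timespec_add(wakeup, cycletime);
 *   receive and process;
 *   ecrt_master_application_time(master, TIMESPEC2NS(time));
 *   if (sync_ref_clock_this_cycle(n)) ecrt_master_sync_reference_clock(master);
 *   ecrt_master_sync_slave_clocks(master);
 *   queue and send.
 */
```

With a 1 ms cycle time, send_interval_us() gives 1000, and the reference clock sync is requested on every other pass through the loop.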

 

 

Possibly of interest is that the error seems to be related to the slave boots – if I reboot the slave (resetting its internal clock) then the next start of the master application will produce the 5000ms timeout.  Subsequent starts of the master app seem to start ok.

 

If I set it to “debug 1” then on that first run it prints “Sync after 4996 ms: 4293798555 ns” (and the number was decreasing), which looks like something needs to be signed rather than unsigned (it’s about -1ms).  On subsequent runs the synchrony seems to be typically around 600ns (sometimes up to about 2000ns), which is pretty good.  (In rare cases it does the negative value thing again although with a much smaller magnitude, and it takes a third or fourth try to “really” lock it in.)
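The arithmetic behind that reading can be checked directly: 4293798555 is 2^32 - 1168741, so it is what an unsigned format shows for a negated magnitude of 1168741 ns (about -1.17 ms). The sketch below assumes the raw value carries a sign flag in bit 31 and the magnitude in bits 30..0, which is what the masking with 0x80000000 in the master's debug code suggests. Both function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: bit 31 is a sign flag, bits 30..0 are the magnitude in ns. */
static int32_t sync_diff_signed(uint32_t raw)
{
    int32_t magnitude = (int32_t)(raw & 0x7fffffffu);
    return (raw & 0x80000000u) ? -magnitude : magnitude;
}

/* What an unsigned conversion such as "%u" displays when it is handed
 * the negation of an unsigned magnitude: the value wraps modulo 2^32. */
static uint32_t as_printed_unsigned(uint32_t magnitude)
{
    return (uint32_t)(0u - magnitude);
}
```

Here as_printed_unsigned(1168741) yields 4293798555, the number from the log, while sync_diff_signed() on the same magnitude with bit 31 set yields -1168741.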

 

I had a quick look at nearby code and it looks like the number is a formatting bug; this patch fixes it:

--- a/master/fsm_slave_config.c
+++ b/master/fsm_slave_config.c
@@ -1400,8 +1400,8 @@
             EC_SLAVE_WARN(slave, "Slave did not sync after %lu ms.\n",
                     diff_ms);
         } else {
-            EC_SLAVE_DBG(slave, 1, "Sync after %4lu ms: %10u ns\n",
-                    diff_ms, EC_READ_U32(datagram->data) & 0x80000000 ? -abs_sync_diff: abs_sync_diff);
+            EC_SLAVE_DBG(slave, 1, "Sync after %4lu ms: %10d ns\n",
+                    diff_ms, (EC_READ_U32(datagram->data) & 0x80000000) ? -abs_sync_diff: abs_sync_diff);
             // check synchrony again
             ec_datagram_fprd(datagram, slave->station_address, 0x092c, 4);

 

However, the underlying problem remains; the patch only makes the log show that the initial sync difference (after 1388ms) is -371806ns, and that it gets worse over time instead of better (finishing up after 5s at close to -1ms). Subsequent runs always seem to be better, although sometimes it takes 2-5 runs to get it “right”.

 

Regards,

Gavin Lambert

 

From: Jun Yuan [mailto:j.yuan at rtleaders.com] 
Sent: Friday, 4 April 2014 21:00
To: Gavin Lambert
Cc: etherlab-users at etherlab.org
Subject: Re: [etherlab-users] DC-Synchronization - Sync signal generation

 

Hi,

that's all right. I'm using Xenomai. I just wanted to demonstrate an alternative way of synchronizing the master clock to the ref slave clock. I chose the RTAI example so that it can be compared with Graeme Foot's method.

You can still test the rest without that part. I didn't change the API, so you don't need to change anything in your code. But:
1) If you call ecrt_master_application_time() outside of the loop, it is recommended that you remove that call.
2) Only if you use ecrt_master_reference_clock_time(): note that at program start, before the DC system time offsets have been calculated for each slave, ecrt_master_reference_clock_time() now reports errno EAGAIN to tell the user that the ref clock is not ready yet. So it is worth always checking the return value of ecrt_master_reference_clock_time(), as I did in my rtai_rtdm_dc example.
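The return-value check in point 2 could look roughly like the sketch below. Everything in it is a stand-in: fake_ref_clock_time() and struct fake_master are hypothetical and only imitate a function that is "not ready" for its first few calls, and the convention that the error is returned as a negative code (-EAGAIN) is an assumption, since the mail does not say exactly how the patched function reports it.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a master whose ref clock is not ready yet. */
struct fake_master {
    int calls_until_ready; /* number of calls that still fail */
    uint32_t ref_time;     /* value reported once ready */
};

/* Hypothetical stand-in for the patched call. Assumption: failure is
 * reported as a negative error code. */
static int fake_ref_clock_time(struct fake_master *m, uint32_t *time)
{
    if (m->calls_until_ready > 0) {
        m->calls_until_ready--;
        return -EAGAIN; /* ref clock not ready yet */
    }
    *time = m->ref_time;
    return 0;
}

/* Returns 1 and stores the time only when the call succeeded;
 * returns 0 so the caller can skip clock correction this cycle. */
static int try_get_ref_time(struct fake_master *m, uint32_t *out)
{
    uint32_t t;
    if (fake_ref_clock_time(m, &t) != 0)
        return 0; /* never use t when the call failed */
    *out = t;
    return 1;
}
```

The point of the pattern is that the output value is left untouched until the call succeeds, so a cyclic task never feeds an uninitialised reference time into its clock correction.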

 

That's it.

 

Regards,
Jun

 

On Fri, Apr 4, 2014 at 3:17 AM, Gavin Lambert <gavinl at compacsort.com> wrote:

Hi Jun,

 

Thanks; I’m having a look at it, but much of it is new to me.  I’m using PREEMPT_RT so my code is based on the dc_user example, not the RTAI examples, and I’d probably have to try adapting it before I could test it.

 

Regards,

Gavin Lambert

 

From: Jun Yuan [mailto:j.yuan at rtleaders.com] 
Sent: Friday, 4 April 2014 01:46
To: Gavin Lambert
Cc: etherlab-users at etherlab.org
Subject: Re: [etherlab-users] DC-Synchronization - Sync signal generation

 

Hi Gavin,

your interest is my motivation. I have attached the bundle file. 

My changes are based on the newest version 1.5.2 in the 'stable-1.5' branch. I added a new 'rtleaders' branch first and made all my changes on that. So after "$ hg unbundle etherlab_1.5.2_jyuan.hg", don't forget to switch to the 'rtleaders' branch using "$ hg update rtleaders".

I found a better way of synchronizing the master clock to the ref slave clock. It is much faster and more stable. I managed to port my C++ code to C in the rtai_rtdm_dc example today, but I cannot test right now whether the new code compiles. If you have an RTAI environment, please check for me whether it compiles, and give me some feedback.

Besides that, there is a more accurate DC time offset calculation. There should be no more errors like "Slave did not sync after 5000ms". The more accurate time offset estimation saves a lot of time in the DC sync procedure. Slaves should have such a small DC diff (several hundred ns maybe) at the beginning of the DC sync check that I even changed EC_SYSTEM_TIME_TOLERANCE_NS from 1000000ns to 1000ns.

Postponing the check of master->has_app_time makes the error "No app_time received up to now, but master already active" go away.

And there is the bugfix for ecrt_master_select_reference_clock() from Graeme Foot, and some other bug fixes from Jeroen Van den Keybus.

 

Any feedback is welcome. Have fun testing those changes!

 

Jun

 

On Thu, Apr 3, 2014 at 12:13 AM, Gavin Lambert <gavinl at compacsort.com> wrote:

On 2 April 2014 22:40, quoth Jun Yuan:

> But there is a reason why we all put the ecrt_master_application_time() outside
> the loop. Because we all got burned by the error "No app_time received up to
> now, but master already active.", which is a timing bug in Etherlab. I've
> resolved the problem by changing the code of the EtherLab master, which gets
> rid of the "No app_time" bug. Now I don't need to call
> ecrt_master_application_time() outside the loop any more. I will publish the
> bundle to the mailing list when I have time.

I'd be very interested to see this.  Slave sync timing, "no app time", and the 5000ms sync timeout have been a recurring bugbear for me.




-- 
Jun Yuan
[Pronunciation: Djün Üän]

Robotics Technology Leaders GmbH
Am Loferfeld 58, D-81249 München
Tel: +49 89 189 0465 24
Fax: +49 89 189 0465 11
mailto: j.yuan at rtleaders.com

Umlaut rule in the Chinese romanization Pinyin: after the initials y, j, q, and x, u is pronounced as ü, e.g. yu => ü,  ju => dschü,  qu => tschü,  xu => schü.
