[etherlab-users] Calculation of time_diff in ec_fsm_master_dc_offset()

Jun Yuan j.yuan at rtleaders.com
Wed Jan 23 17:36:54 CET 2013


Hello Florian,

I think there might be a bug in the calculation of time_diff between the
master clock and the slave clock in fsm_master.c. My temporary solution is
to comment out the line 'system_time32 += correction;' in the function
ec_fsm_master_dc_offset32() and the line 'system_time += correction;' in
ec_fsm_master_dc_offset64().

Here is my story. I've been testing the etherlab master on Xenomai. After
restarting my master process several times, I noticed something
interesting: the DC sync between the master and the slave always takes a
long time (approx. 5 seconds) on each start, and each time the time
difference between the master and the slave is approx. -4000000 ns:
[16165.453658] EtherCAT DEBUG 0-0: DC 32 bit system time offset calculation: system_time=xxxxxxxxxxx (corrected with 4000000), app_time=xxxxxxxxxx, diff=-3954618

The synchrony check then starts with a difference of roughly that size:
[16165.521874] EtherCAT DEBUG 0-0: Checking for synchrony.
[16165.529891] EtherCAT DEBUG 0-0: Sync after    4 ms:     3941732 ns

This is odd. Since the master had been synchronized with the slave only a
few seconds before the restart, their clocks should not have drifted 4 ms
apart in such a short time. It is also strange that the value is always
around 4000000 ns. So I dug into the master's source code to find the
reason.

I have etherlab master rev 2498:9cdd7669dc0b on the stable-1.5 branch on
Xenomai 2.6.1, and my RT task looks roughly like this:

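/* Called once before the loop so that the master has an app time at all;
   the value itself is never used (see point 1 below). */
ecrt_master_application_time(master, dummy_time);

while (true) { // run loop
        wait_period();
        ecrt_master_receive(master);
        ecrt_domain_process(domain1);
        /* ... */
        ecrt_domain_queue(domain1);
        sync_distributed_clocks(); // sets app time, syncs DC clocks
        ecrt_master_send(master);
}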

In the function sync_distributed_clocks() there are three calls:
    ecrt_master_application_time(master, dc_time_ns);
    ecrt_master_sync_reference_clock(master);
    ecrt_master_sync_slave_clocks(master);

Here is what I've found:

1. I must call ecrt_master_application_time() once somewhere before my run
loop; otherwise I get the error "No app_time received up to now", and
ec_fsm_master_state_dc_read_offset is never executed. The app time passed
to ecrt_master_application_time() at this point does not matter, because it
is not used anywhere. Calling ecrt_master_application_time(master,
dummy_time) simply ensures that master->has_app_time is not 0 in
ec_fsm_master_enter_write_system_times().

2. The first app_time (*dc_time_ns*) passed to
ecrt_master_application_time() inside sync_distributed_clocks() in my run
loop is the one used as slave->master->app_time in
ec_fsm_master_dc_offset.

3. In ec_fsm_master_dc_offset there is a variable 'correction', which
somehow is always equal to my FSM interval (4000000 ns on my master). I
believe this correction is meant to be the time elapsed since the last
read. It is calculated from the variable jiffies_since_read:
    jiffies_since_read = jiffies - datagram->jiffies_sent;
jiffies is the current time, and datagram->jiffies_sent is the time of the
last ecrt_master_send(master) call.

The system_time from the ref_sync_datagram is the time of my reference
slave. The following line calculates the current system_time of the
reference slave clock:
    system_time += correction;

Then time_diff is calculated as follows:
    time_diff = fsm->slave->master->app_time - system_time;

The problem, however, is that fsm->slave->master->app_time is not the
current app_time of the master. It is the app time set in
ecrt_master_application_time() just before ecrt_master_send(master). So the
current app_time of the master should likewise be approximated as
fsm->slave->master->app_time + correction.

So the correct calculation for the current time_diff should be:
    time_diff = fsm->slave->master->app_time + correction - system_time;

Now the question is: why bother adding the correction (the time elapsed
since the read) to system_time at all, when the app_time was set by the
user at about the same time as system_time was read?

After commenting out the line 'system_time += correction;', I now get a
very small time difference between the master and the slave after a
restart, and ec_fsm_slave_config_state_dc_sync_check finishes much faster
because the initial difference is small.

I hope this mail also helps those who get the warning 'Slave did not
sync after 5000 ms'.

Regards,
Jun



diff -r 9cdd7669dc0b master/fsm_master.c
--- a/master/fsm_master.c    Thu Jan 10 17:36:41 2013 +0100
+++ b/master/fsm_master.c    Wed Jan 23 17:09:51 2013 +0100
@@ -976,7 +976,7 @@

     // correct read system time by elapsed time since read operation
     correction = jiffies_since_read * 1000 / HZ * 1000000;
-    system_time32 += correction;
+//    system_time32 += correction;
     time_diff = (u32) slave->master->app_time - system_time32;

     EC_SLAVE_DBG(slave, 1, "DC 32 bit system time offset calculation:"
@@ -1013,7 +1013,7 @@

     // correct read system time by elapsed time since read operation
     correction = (u64) (jiffies_since_read * 1000 / HZ) * 1000000;
-    system_time += correction;
+//    system_time += correction;
     time_diff = fsm->slave->master->app_time - system_time;

     EC_SLAVE_DBG(slave, 1, "DC 64 bit system time offset calculation:"