Hello Florian,<br><br>I think there might be a bug in the calculation of time_diff between the master clock and the slave clock in fsm_master.c. My temporary solution is to comment out the line 'system_time32 += correction;' in the function ec_fsm_master_dc_offset32() and the line 'system_time += correction;' in ec_fsm_master_dc_offset64().<br>
<br>Here is my story. I've been testing the EtherLab master on Xenomai. After restarting my master process several times, I noticed something interesting: the DC sync between the master and the slave takes a long time (approx. 5 seconds) on every start, and each time the time difference between the master and the slave is approx. -4000000 ns. <br>
[16165.453658] EtherCAT DEBUG 0-0: DC 32 bit system time offset calculation: system_time=xxxxxxxxxxx (corrected with 4000000), app_time=xxxxxxxxxx, diff=-3954618<br>
<br>The synchrony check then starts with a difference of about that size.<br>
[16165.521874] EtherCAT DEBUG 0-0: Checking for synchrony.<br>[16165.529891] EtherCAT DEBUG 0-0: Sync after 4 ms: 3941732 ns<br>
<br>This is odd. Since the master had synchronized with the slave just a few seconds before the restart, their clocks should not differ by 4 ms after such a short time. It is also strange that the value is always around 4000000 ns. So I dug into the master's source code to find the reason.<br>
<br>I have etherlab master rev 2498:9cdd7669dc0b@stable-1.5 on Xenomai 2.6.1, and my rt task is something like the following:<br><br>ecrt_master_application_time(master, dummy_time);<br><br>while(true) { // run loop<br> wait_period();<br>
ecrt_master_receive(master);<br> ecrt_domain_process(domain1);<br> ...<br> ecrt_domain_queue(domain1);<br>
sync_distributed_clocks();<br> ecrt_master_send(master);<br>}<br><br>In the function sync_distributed_clocks() there are three calls:<br> ecrt_master_application_time(master, dc_time_ns);<br> ecrt_master_sync_reference_clock(master);<br>
 ecrt_master_sync_slave_clocks(master);<br><br>And here is what I've found:<br><br>1. I must call ecrt_master_application_time() once somewhere before my run loop, otherwise I get the error "No app_time received up to now", and ec_fsm_master_state_dc_read_offset will not be executed. The app time passed to ecrt_master_application_time() at this point is not important; it is not used anywhere. Calling ecrt_master_application_time(master, dummy_time) simply avoids master->has_app_time = 0 in ec_fsm_master_enter_write_system_times().<br>
<br>2. The first app_time <i>dc_time_ns</i> given to the ecrt_master_application_time() within the sync_distributed_clocks() in my run loop will be used as slave->master->app_time in the function ec_fsm_master_dc_offset. <br>
<br>3. In ec_fsm_master_dc_offset, there is a variable 'correction', which is somehow always equal to my fsm interval (4000000 ns on my master). I believe this correction is meant to be the time elapsed since the last read. It is calculated using the variable jiffies_since_read.<br>
<i>jiffies_since_read = jiffies - datagram->jiffies_sent;</i><br>jiffies is the current time, and datagram->jiffies_sent is the time of the last call to ecrt_master_send(master).<br><br>The system_time from the ref_sync_datagram is the time of my ref slave. The following line calculates the current system_time of the ref slave clock.<br>
<i>system_time += correction;</i><br><br>Then the time_diff is calculated as follows:<br><i>time_diff = fsm->slave->master->app_time - system_time;</i><br><br>The problem here, however, is that fsm->slave->master->app_time is not the current app_time of the master. It is the app time set in ecrt_master_application_time() just before ecrt_master_send(master). So the current app_time of the master should also be approximated as<br>
fsm->slave->master->app_time + correction.<br><br>So the correct calculation for the current time_diff should be<br>time_diff = fsm->slave->master->app_time + correction - system_time;<br>
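<br>To make the comparison concrete, here is a minimal sketch in plain C (not the master's code). The function names are made up for illustration, and it assumes that system_time in the proposed formula is the value after 'system_time += correction'. The absolute times in the note below are invented; only the correction of 4000000 ns and the diff of -3954618 ns come from the log line above.<br>

```c
#include <stdint.h>

/* Offset as computed in rev 2498: only the slave's system time is
 * advanced by the elapsed time since the read. */
int64_t time_diff_original(uint64_t app_time, uint64_t system_time_raw,
                           uint64_t correction)
{
    return (int64_t) app_time - (int64_t) (system_time_raw + correction);
}

/* Proposed calculation: advance both clocks by the same elapsed time. */
int64_t time_diff_both_corrected(uint64_t app_time, uint64_t system_time_raw,
                                 uint64_t correction)
{
    return (int64_t) (app_time + correction)
        - (int64_t) (system_time_raw + correction);
}

/* Workaround from the patch: leave both values uncorrected. */
int64_t time_diff_uncorrected(uint64_t app_time, uint64_t system_time_raw)
{
    return (int64_t) app_time - (int64_t) system_time_raw;
}
```

With app_time = 1000045382, a raw system_time of 1000000000 and correction = 4000000, the first function gives -3954618 and the other two give 45382, so under that reading the proposed formula and dropping the correction are the same thing.<br>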
<br>Now the question is: why bother adding the correction (the time elapsed since the read) to the system_time at all, when the app_time was set by the user at about the same time the system_time was read?<br><br>After commenting out the line '<i>system_time += correction;</i>', I now see a very small time diff between the master and the slave after a restart, and ec_fsm_slave_config_state_dc_sync_check finishes much faster because the initial difference is small.<br>
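<br>For reference, the correction expression from the patch context below can be tried out in user space. This is only a sketch: correction_ns and its hz parameter are hypothetical stand-ins for the in-kernel expression and the HZ macro, and the HZ values in the note after the code are assumptions, not a statement about any particular kernel configuration.<br>

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the 64-bit expression from the patch context:
 *   correction = (u64) (jiffies_since_read * 1000 / HZ) * 1000000;
 * hz is a hypothetical parameter replacing the kernel's HZ macro. */
uint64_t correction_ns(unsigned long jiffies_since_read, unsigned long hz)
{
    assert(hz > 0); /* guard against division by zero */

    /* Integer division first yields whole milliseconds,
     * which are then scaled to nanoseconds. */
    return (uint64_t) (jiffies_since_read * 1000 / hz) * 1000000;
}
```

The result is always a whole number of milliseconds: correction_ns(1, 250) and correction_ns(4, 1000) both give 4000000, while correction_ns(1, 300) truncates to 3000000.<br>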
<br>I hope this mail could also help those having the warning 'Slave did not sync after 5000 ms'.<br><br>Regards,<br>Jun<br><br><br><br>diff -r 9cdd7669dc0b master/fsm_master.c<br>--- a/master/fsm_master.c Thu Jan 10 17:36:41 2013 +0100<br>
+++ b/master/fsm_master.c Wed Jan 23 17:09:51 2013 +0100<br>@@ -976,7 +976,7 @@<br> <br> // correct read system time by elapsed time since read operation<br> correction = jiffies_since_read * 1000 / HZ * 1000000;<br>
- system_time32 += correction;<br>+// system_time32 += correction;<br> time_diff = (u32) slave->master->app_time - system_time32;<br> <br> EC_SLAVE_DBG(slave, 1, "DC 32 bit system time offset calculation:"<br>
@@ -1013,7 +1013,7 @@<br> <br> // correct read system time by elapsed time since read operation<br> correction = (u64) (jiffies_since_read * 1000 / HZ) * 1000000;<br>- system_time += correction;<br>+// system_time += correction;<br>
time_diff = fsm->slave->master->app_time - system_time;<br> <br> EC_SLAVE_DBG(slave, 1, "DC 64 bit system time offset calculation:"