<div dir="ltr"><div><div><div>Hey Jun<br><br></div>If you look deep into the documentation, you will notice that it says DC sync is influenced by the number of packets sent. Here are my benchmarks<br>for a 6-slave system at power-up:<br>


<br>

1. 1 ms sent_interval for the op_thread with a 4 ms transmit interval: 33 seconds<br></div><div>2. 500 us sent_interval for the op_thread with a 500 us transmit interval: 6 seconds<br></div><div>3. 100 us sent_interval for the op_thread with a 100 us transmit interval: 2.5 seconds<br>


<br></div><div>I believe the reason is that the accuracy is better with shorter intervals.<br><br><br></div><div><br>

</div></div><div> </div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Jan 1, 2014 at 11:19 PM, Jun Yuan <span dir="ltr"><<a href="mailto:j.yuan@rtleaders.com" target="_blank">j.yuan@rtleaders.com</a>></span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Raz,<br>

<br>

Many people have raised the same kind of question as you did. Some<br>
asked on the mailing list, some wrote to me directly, worried about<br>
warnings like "slave didn't sync after 5 seconds". For the past two<br>
years I kept answering that I didn't know much about the DC sync<br>
mechanism; that by examining register 0x092c one can confirm that the<br>
DCs do get perfectly synchronized in the end anyway; that my customers<br>
could get used to my rule that they must wait several minutes, doing<br>
nothing, until the DCs on the EtherCAT bus get synchronized/converged;<br>
and that maybe it is the slaves' fault that their DCs converge so<br>
slowly.<br>
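For reference, a minimal C sketch of how a value read from register 0x092c could be interpreted. It assumes the usual ESC layout of that register (System Time Difference): bit 31 holds the sign and bits 30..0 the magnitude in ns. The exact sign convention and any convergence threshold should be checked against the ESC datasheet, and the helper names are made up for this illustration.<br>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: turn the raw 32-bit value of register 0x092c
 * (System Time Difference) into signed ns. Assumed encoding:
 * bit 31 = sign, bits 30..0 = magnitude in ns. */
static int64_t dc_time_diff_ns(uint32_t raw)
{
    int64_t magnitude = (int64_t)(raw & 0x7fffffffu);
    return (raw & 0x80000000u) ? -magnitude : magnitude;
}

/* A slave counts as converged once the magnitude is within max_diff_ns,
 * a threshold chosen by the caller. */
static bool dc_converged(uint32_t raw, uint32_t max_diff_ns)
{
    return (raw & 0x7fffffffu) <= max_diff_ns;
}
```

A tool that polls 0x092c periodically and stops once dc_converged() holds would automate the manual check described above.<br>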

<br>

Frankly speaking, I hate my answers; they sound like excuses. So I<br>
decided to fight back, and spent the last two days digging into this<br>
problem.<br>

<br>

The first thing to do was to learn how the DC sync mechanism works. I<br>
don't have any official EtherCAT documents, and would appreciate it if<br>
anyone could send me some of the EtherCAT specifications. On the<br>
internet I did find a paper, "On the Accuracy of the Distributed Clock<br>
Mechanism in EtherCAT", and a presentation, "Accurate Synchronization<br>
of EtherCAT Systems Using Distributed Clocks" by Joseph E Stubbs.<br>
Those two files helped me a lot.<br>

<br>

The other obstacle is that I don't have any EtherCAT slave devices at<br>
hand. Occasionally I get a project to develop an interface for a new<br>
kind of slave using the EtherLab Master. Those slaves usually stay<br>
with me for two to three weeks, after which they are shipped with my<br>
software to our customers. So the chance of having a slave in my<br>
office is about 1/12, not to mention the deadline pressure of those<br>
projects. I still owe Florian an apology: he once asked me to test a<br>
new feature of the master, and I never replied, because I kept waiting<br>
for a slave, expecting the next opportunity to come soon, which didn't<br>
happen. So I lack a testing environment, which may narrow my view of<br>
EtherCAT, and I can't test my ideas myself.<br>

<br>

Alright, here is something I would like to share.<br>

<br>

I. The problem with "No app_time received up to now, but master already active."<br>

I have always got this error if I don't call<br>
ecrt_master_application_time() before my realtime cycle loop. I also<br>
tried passing a garbage value to a first call of this function outside<br>
my loop, and it didn't hurt my system at all. I reported this in my<br>
last mails to the mailing list, and Florian's reply was that I<br>
shouldn't do that. Well, he is right: the app_time of the first call<br>
is saved as app_start_time and later used to calculate the<br>
"remainder" correction of the DC start time. By calling<br>
ecrt_master_application_time() before the cycle loop, we give the<br>
slave a wrong starting point for DC cyclic operation. I think the net<br>
effect is similar to playing with sync0->shift_time, that is, setting<br>
a shift time for DC SYNC0. Although this won't hurt us most of the<br>
time, it is not the right way to do it.<br>
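To make the shift-time analogy concrete, here is a C sketch of a remainder-based alignment of the DC start time. It assumes the start time is rounded up to the next cycle boundary counted from app_start_time and then shifted by shift_time; the function name and parameters are hypothetical, and this is not the master's actual code.<br>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: align a requested DC start time to the cycle grid
 * that begins at app_start_time (the first app_time the master received). */
static uint64_t aligned_dc_start(uint64_t start_time, uint64_t app_start_time,
                                 uint32_t cycle_ns, uint32_t shift_ns)
{
    assert(cycle_ns > 0 && start_time >= app_start_time);
    /* how far start_time lies into the current cycle of that grid */
    uint64_t remainder = (start_time - app_start_time) % cycle_ns;
    /* move to the next cycle boundary, then add the configured shift */
    return start_time + cycle_ns - remainder + shift_ns;
}
```

For example, with a 1 ms cycle and start_time = 5.3 ms, app_start_time = 0 gives 6.0 ms, while an app_start_time that is wrong by 0.25 ms gives 6.25 ms, the same result as app_start_time = 0 with shift_ns = 250000.<br>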

<br>

Where does this warning come from?<br>

When a master application is running, there are two threads in<br>
the system: one is the user realtime cycle loop, the other is the<br>
EtherCAT-OP thread. These two threads, however, are not synchronized<br>
with each other.<br>

<br>

After ecrt_master_activate() has been called, the master runs<br>
ec_master_operation_thread, which repeatedly executes the master's FSM<br>
(finite state machine). The cycle time of the EtherCAT-OP thread on my<br>
machine is 4 ms; my Linux kernel runs at 250 Hz. The function<br>
ec_fsm_master_enter_write_system_times() gets called after several ms,<br>
I guess somewhere around 4 to 8 ms.<br>

<br>

If ecrt_master_application_time() has not been called within that<br>
time, the master has no app_time yet, and the "No app_time" warning<br>
occurs.<br>

<br>

In my case, my realtime thread happens to have a cycle time of 4 ms,<br>
and my loop looks like this:<br>

<br>

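// first some initialization work, which takes 10 ms<br>
while (1) {<br>
    wait_for_4_ms();                                 // placeholder: sleep until the next 4 ms period<br>
    ecrt_master_receive(master);<br>
    ...<br>
    ecrt_master_application_time(master, app_time); // app_time: current time in ns<br>
    ecrt_master_send(master);<br>
}<br>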

<br>

This means that at least 14 ms pass after ecrt_master_activate()<br>
before the first ecrt_master_application_time() call in my loop. So<br>
the chance of getting a "No app_time" warning is quite high.<br>
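The timing argument above as a tiny C check. The 4 to 8 ms after which the EtherCAT-OP thread reaches ec_fsm_master_enter_write_system_times() is only the guess made above, so fsm_deadline_ms is an assumed parameter, and the function names are hypothetical.<br>

```c
#include <stdbool.h>
#include <stdint.h>

/* Earliest time (ms after ecrt_master_activate()) at which the loop above
 * first calls ecrt_master_application_time(): it runs its initialization,
 * then waits one full cycle before the call. */
static uint32_t first_app_time_ms(uint32_t init_ms, uint32_t cycle_ms)
{
    return init_ms + cycle_ms;
}

/* The warning is expected if the FSM reaches its system-time step
 * (fsm_deadline_ms, an estimate) before the first app_time arrives. */
static bool expect_no_app_time_warning(uint32_t init_ms, uint32_t cycle_ms,
                                       uint32_t fsm_deadline_ms)
{
    return first_app_time_ms(init_ms, cycle_ms) > fsm_deadline_ms;
}
```

With init_ms = 10, cycle_ms = 4 and an assumed deadline of 8 ms, the first app_time arrives at 14 ms, which is too late; with a 16 ms cycle it is too late even without any initialization time.<br>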

<br>

To resolve this problem properly, I can offer two options:<br>

<br>

The first option is to change your code: reduce the initialization<br>
time, keeping the interval between ecrt_master_activate() and your<br>
cycle loop as short as possible.<br>

<br>

But what if we have a large cycle time, say 16 ms? Our cycle loop will<br>
wait 16 ms anyway before the first ecrt_master_application_time() call,<br>
which could be too late for the EtherCAT-OP thread. So my second<br>
option is to change the code of the EtherCAT master. The simplest way<br>
to do so is to add a "return;" after the line<br>

            EC_MASTER_WARN(master, "No app_time received up to now,"<br>

                    " but master already active.\n");<br>

in master/fsm_master.c. This would force the master FSM to wait until<br>
it has received an app_time.<br>
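A self-contained model of what the extra "return;" is meant to achieve: the step that writes the system times does nothing until an app_time exists, and is simply retried on the next FSM cycle. The struct and function are hypothetical stand-ins, not code from master/fsm_master.c.<br>

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the master state relevant here. */
struct master_model {
    bool has_app_time;    /* set once application_time() has been called */
    uint64_t app_time;    /* latest application time in ns */
    unsigned fsm_retries; /* FSM cycles spent waiting for an app_time */
};

/* Returns true if the system times could be written in this FSM cycle.
 * Without an app_time it warns and returns early, which is the effect of
 * the proposed "return;"; the FSM then retries on its next cycle. */
static bool write_system_times_step(struct master_model *m)
{
    if (!m->has_app_time) {
        fprintf(stderr, "No app_time received up to now,"
                " but master already active.\n");
        m->fsm_retries++;
        return false;
    }
    /* ... the real FSM would go on to write the system time offsets ... */
    return true;
}
```

In a real patch it might be preferable to print the warning only once, since the FSM may retry many times before the first app_time arrives.<br>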

<br>

Note that I have no way to test this myself. So please change your<br>
EtherLab master code, try it on your system, and give everybody<br>
feedback on whether it works.<br>

<br>

<br>

II. The problem with "Slave did not sync after 5000 ms"<br>

This is a bit more complicated. In short, IMHO, it is the master that<br>
should take responsibility for this problem.<br>

<br>

Concerning the DC sync, there are 3 phases.<br>

Phase 1. Measure the transmission delays t_delay to each slave.<br>

Phase 2. Calculate the system time offset t_offset for each slave.<br>

Phase 3. Drift compensation, where each slave adjusts its local DC to<br>
drive dt = (t_local + t_offset - t_delay) - t_received_system_time<br>
towards 0.<br>
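Phase 3's error term as a one-line C function, using the symbols above (all values in ns). It only restates the formula; how a real ESC filters dt and adjusts its clock is not modeled, and the function name is hypothetical.<br>

```c
#include <stdint.h>

/* dt = (t_local + t_offset - t_delay) - t_received_system_time, all in ns.
 * The slave's drift compensation tries to drive this value to 0. */
static int64_t dc_drift_error_ns(int64_t t_local, int64_t t_offset,
                                 int64_t t_delay, int64_t t_received)
{
    return (t_local + t_offset - t_delay) - t_received;
}
```

For instance, t_local = 1000, t_offset = 500, t_delay = 200 and t_received = 1300 give dt = 0, while a t_offset that is 4 ms too large shows up one-to-one as dt = 4000000 ns.<br>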

<br>

The first phase is executed during the bus scan, in<br>
ec_fsm_master_state_scan_slave() -> ec_master_calc_dc() -><br>
ec_master_calc_transmission_delays() -> ec_slave_calc_port_delays().<br>
It seems that the EtherLab master measures this only once. One could<br>
argue that measuring the transmission delay several times and<br>
averaging would give a better estimate. So far my experience is that<br>
these values don't vary much, and the EtherLab master seems to do fine<br>
here. But I would appreciate it if anyone could do a "bus rescan" many<br>
times on the same EtherCAT bus and check whether the delay_to_next_dc<br>
of the slaves changes much from one scan to the next. If it does, the<br>
EtherLab master source should be changed to take several measurements<br>
instead of only one.<br>
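If the bus is rescanned several times, the delay_to_next_dc values collected for one slave could be summarized as below. The struct and function names are hypothetical, and judging the spread against the mean is just one possible criterion.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of repeated delay measurements for one slave, in ns. */
struct delay_stats {
    uint32_t mean_ns;
    uint32_t spread_ns; /* max - min over all samples */
};

/* Average n delay samples (e.g. delay_to_next_dc from n rescans) and
 * report how far they scatter. */
static struct delay_stats summarize_delays(const uint32_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    uint64_t sum = 0;
    uint32_t min = samples[0], max = samples[0];
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
        if (samples[i] < min)
            min = samples[i];
        if (samples[i] > max)
            max = samples[i];
    }
    struct delay_stats s = { (uint32_t)(sum / n), max - min };
    return s;
}
```

For the samples 100, 104, 96 and 100 ns this gives a mean of 100 ns and a spread of 8 ns.<br>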

<br>

At the beginning of 2013 I ran into a phenomenon, described in my<br>
earlier emails, which I tried to correct but failed in the end. What I<br>
observed a year ago was this: after all the DCs on the bus had reached<br>
a stable state, restarting the master application caused a wrong<br>
change of approx. 4 ms to the system_time_offset of the reference<br>
clock, and later ec_fsm_slave_config_state_dc_sync_check() of the<br>
reference slave showed an error of around 4 ms between the master<br>
clock and the slave clock at the beginning. This clearly shows a<br>
weakness of the current EtherLab master in the second phase: the<br>
calculation of t_offset is not right.<br>

<br>

Since the master gives the slaves a wrong t_offset, the difference<br>
dt = (t_local + t_offset - t_delay) - t_received_system_time used for<br>
drift compensation is too large at the beginning. In my humble<br>
opinion, the EtherLab master may be abusing the drift compensation<br>
mechanism to make up for its failure to calculate the system time<br>
offset t_offset accurately.<br>
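Why a wrong t_offset is so costly can be illustrated with a toy model in C: if a slave can remove at most a fixed amount of dt per received sync datagram, the time to converge grows with the initial error and with the interval between datagrams. The bounded step is a simplifying assumption of this sketch; a real ESC adjusts its clock speed through its own control loop, and the function name is hypothetical.<br>

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: every received sync datagram lets the slave remove at most
 * max_step_ns of its error dt. Returns how many datagrams are needed
 * until |dt| <= tolerance_ns. */
static uint64_t datagrams_to_converge(int64_t dt_ns, int64_t max_step_ns,
                                      int64_t tolerance_ns)
{
    assert(max_step_ns > 0 && tolerance_ns >= 0);
    uint64_t n = 0;
    while (dt_ns > tolerance_ns || dt_ns < -tolerance_ns) {
        int64_t mag = dt_ns > 0 ? dt_ns : -dt_ns;
        int64_t step = mag < max_step_ns ? mag : max_step_ns;
        dt_ns += dt_ns > 0 ? -step : step; /* move dt towards 0 */
        n++;
    }
    return n;
}
```

With an initial error of 4 ms and an assumed 1000 ns per datagram, the model needs 4000 datagrams: about 4 s at one datagram per ms, about 0.4 s at one per 100 us. That points in the same direction as the benchmark at the top of this mail, though the model says nothing about real ESC parameters.<br>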

<br>

What is the matter with the time offset?<br>
Let's have a look at the procedure of the time offset calculation:<br>
1. The master FSM prepares a datagram with ec_datagram_fprd(fsm->datagram, fsm->slave->station_address, 0x0910, 24) to read out the system time of the slave.<br>
2. The user realtime cycle loop sends out the datagram when calling ecrt_master_send().<br>
3. The next ecrt_master_receive() fetches the answer.<br>
4. The master FSM reads the datagram and calculates the time offset.<br>

<br>

Take an example: the master FSM runs in the EtherCAT-OP thread with a<br>
4 ms loop, and the user realtime application thread runs every 1 ms.<br>
Let x ms be the time at which Step 1 happens, and let the user loop<br>
run 0.5 ms after the EtherCAT-OP thread.<br>

<br>

The following would happen:<br>
Time: Event<br>
x ms: Step 1, the FSM prepares an FPRD datagram for 0x0910;<br>
x+0.5 ms: Step 2, the user loop sets a new app_time; the FPRD datagram is sent out, and the sending timestamp in jiffies is stored in datagram->jiffies_sent;<br>
x+1.5 ms: Step 3, the user loop sets a new app_time; the datagram is received, and the receiving timestamp in jiffies is stored in datagram->jiffies_received;<br>
x+2.5 ms: the user loop sets a new app_time;<br>
x+3.5 ms: the user loop sets a new app_time;<br>
x+4 ms: Step 4, the FSM calculates the time offset.<br>

<br>

And here is the corresponding source code in ec_fsm_master_dc_offset64():<br>

<br>

    // correct read system time by elapsed time since read operation<br>

    correction = (u64) (jiffies_since_read * 1000 / HZ) * 1000000;<br>

    system_time += correction;<br>

    time_diff = fsm->slave->master->app_time - system_time;<br>

<br>

jiffies is a counter in the Linux kernel that is incremented HZ times<br>
per second. I have a 250 Hz Linux system, so one jiffy means 4 ms.<br>
jiffies_sent was taken when the master clock was at x+0.5 ms, and the<br>
current jiffies value is taken at x+4 ms. So there is a probability of<br>
(4 - 3.5)/4 = 12.5 % that jiffies does not increase during those<br>
3.5 ms, and of 87.5 % that it has increased by 1. This means the value<br>
"correction" is typically 4000000 ns, and occasionally 0 ns.<br>
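The 12.5 % figure can be checked by enumerating every possible phase of the 4 ms jiffies tick relative to the 3.5 ms window between sending the datagram (x+0.5 ms) and Step 4 (x+4 ms). The sketch works in microseconds, assumes the tick phase is uniformly distributed, and its function names are hypothetical.<br>

```c
#include <assert.h>
#include <stdint.h>

/* Number of jiffies ticks (at phase_us + k * period_us, k >= 0) that fall
 * inside the window [0, window_us). */
static unsigned ticks_in_window(uint32_t phase_us, uint32_t period_us,
                                uint32_t window_us)
{
    assert(period_us > 0);
    unsigned n = 0;
    for (uint32_t t = phase_us; t < window_us; t += period_us)
        n++;
    return n;
}

/* Count the tick phases in [0, period_us) for which no tick hits the
 * window, i.e. for which jiffies would not change at all. */
static uint32_t phases_without_tick(uint32_t period_us, uint32_t window_us)
{
    uint32_t count = 0;
    for (uint32_t phase = 0; phase < period_us; phase++)
        if (ticks_in_window(phase, period_us, window_us) == 0)
            count++;
    return count;
}
```

phases_without_tick(4000, 3500) returns 500, i.e. 500 of 4000 phases, or 12.5 %.<br>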

<br>

Let’s assume that the slave DC has been perfectly synchronized with<br>
the master app time. So the system_time read from the slave equals<br>
x+0.5 ms (the time the FPRD datagram was sent). With the correction<br>
added, system_time = x+4.5 ms or x+0.5 ms.<br>

<br>

At the time of Step 4, app_time is x+3.5 ms.<br>

<br>

time_diff = app_time - system_time = -1000000 ns most of the time,<br>
and 3000000 ns occasionally, depending on the correction.<br>
<br>
But time_diff should actually be 0, not -1 ms or 3 ms, since we<br>
assumed the slave DC is perfectly synchronized with the master app time.<br>
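The arithmetic of this example in C, reusing the correction formula quoted from ec_fsm_master_dc_offset64(). Times are in ns with x taken as 0; the function names are made up for the illustration.<br>

```c
#include <assert.h>
#include <stdint.h>

/* Correction as in the quoted master code: whole jiffies converted to ms,
 * then to ns. With hz = 250, one jiffy is worth 4 ms. */
static uint64_t jiffies_correction_ns(uint64_t jiffies_since_read, uint64_t hz)
{
    assert(hz > 0);
    return (jiffies_since_read * 1000 / hz) * 1000000;
}

/* time_diff = app_time - (read system time + correction) */
static int64_t example_time_diff_ns(uint64_t app_time, uint64_t system_time,
                                    uint64_t jiffies_since_read, uint64_t hz)
{
    uint64_t corrected =
        system_time + jiffies_correction_ns(jiffies_since_read, hz);
    return (int64_t)app_time - (int64_t)corrected;
}
```

example_time_diff_ns(3500000, 500000, 1, 250) gives -1000000 and example_time_diff_ns(3500000, 500000, 0, 250) gives 3000000; neither is 0, although the slave clock was assumed to be perfect.<br>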

<br>

You may argue that a -1 ms error isn’t that much, but this error<br>
typically grows to around -4 ms if the user realtime cycle loop runs<br>
every 4 ms, as in my case a year ago.<br>

<br>

Where does the error in the calculation come from?<br>
There are two reasons:<br>
1. jiffies has a poor resolution of 4 ms on a 250 Hz Linux system.<br>
2. app_time is not the time at which Step 4 is executed.<br>

<br>

While using get_cycles() instead of jiffies could improve the accuracy<br>
of the correction, the fact that app_time is not the current master<br>
system time would still introduce errors into the time offset.<br>

<br>

Why do we need "correction" here at all? Because the app_time in Step<br>
4 is not the app_time at which the slave system time was read.<br>

<br>

The key is to know the app_time at which the FPRD datagram for 0x0910<br>
was sent, and to use that app_time to calculate time_diff, of course<br>
without any further correction.<br>

<br>

I know, this is easier said than done. Right now I have two ideas for the master.<br>
The first idea: add a new member app_time_sent to the ec_datagram_t<br>
struct and record the app_time when each datagram is sent. Then<br>
time_diff = datagram->app_time_sent - system_time(0x0910);<br>
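A rough C sketch of the first idea. The reduced struct, the field app_time_sent and both helpers are hypothetical; the real ec_datagram_t has many more members, and where exactly the master's send path would record the value is left open. The point is only that the offset is computed against the app_time at send time, with no jiffies-based correction.<br>

```c
#include <stdint.h>

/* Hypothetical, heavily reduced datagram with the proposed extra field. */
struct datagram_model {
    uint64_t app_time_sent; /* master app_time when the frame was sent */
    uint64_t system_time;   /* value read back from register 0x0910 */
};

/* Would be called from the send path for each outgoing datagram, with
 * the app_time that is current at that moment. */
static void record_send(struct datagram_model *d, uint64_t current_app_time)
{
    d->app_time_sent = current_app_time;
}

/* Offset calculation in the FSM: compare with the app_time at send time,
 * so it no longer matters when the FSM gets around to Step 4. */
static int64_t time_diff_ns(const struct datagram_model *d)
{
    return (int64_t)d->app_time_sent - (int64_t)d->system_time;
}
```

With the example above, a datagram sent at x+0.5 ms to a perfectly synchronized slave reads back x+0.5 ms, so time_diff_ns() returns 0 regardless of when the FSM reaches Step 4.<br>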

<br>

The second idea is a bit tricky: trigger the calculation from the user<br>
realtime cycle loop, i.e. check the fsm_datagram in<br>
ecrt_master_receive(), or even in ecrt_master_application_time() while<br>
the previous app_time is still available. If it turns out to be an<br>
FPRD 0x0910 datagram, we do the calculation right away using the<br>
previous app_time.<br>

<br>

I think the first idea would be easier to implement.<br>

<br>

<br>

Besides the inaccurate calculation of the time offset, the other issue<br>
in the EtherLab master that bothers me is that the drift compensation<br>
seems to run at the same time as the new system time offsets are<br>
calculated and sent to the slaves, since the drift compensation<br>
happens in the user realtime cycle loop and the t_offset calculation<br>
in the EtherCAT-OP thread. Shouldn’t the offset calculation be done<br>
first, before sending ref_sync_datagram to the reference clock and<br>
sync_datagram to the other slaves? Won’t the slaves' drift<br>
compensation algorithm affect their local DC time (by slowing down or<br>
speeding up the clock), which in turn affects the t_offset<br>
calculation? And since phases 2 and 3 happen simultaneously, won’t the<br>
sudden change of t_offset (which causes a sudden change of dt) disturb<br>
the drift compensation algorithm on the slave?<br>

<br>

I think we may need a boolean, set by the FSM, that tells the user<br>
thread whether phase 2 is done; the user thread would then call<br>
ecrt_master_sync_reference_clock(master) and<br>
ecrt_master_sync_slave_clocks(master) only after the correct system<br>
time offsets have been sent to all slaves.<br>
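A minimal sketch of such a flag in C, using a C11 atomic so one thread can set it and the other can read it without a lock. dc_offsets_done and both functions are hypothetical names, the counter stands in for the two sync calls, and both sides run in one process here; in the real master the flag would have to be exposed through the master's interface to the application.<br>

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set by the master FSM once every slave has received its system time
 * offset (end of phase 2); read by the user realtime loop.
 * Static storage, so it starts out as false. */
static atomic_bool dc_offsets_done;

/* Counts how often the sync calls would have been made. */
static unsigned sync_calls;

/* EtherCAT-OP side: called when phase 2 is complete. */
static void fsm_finished_offsets(void)
{
    atomic_store_explicit(&dc_offsets_done, true, memory_order_release);
}

/* User realtime side: one cycle of the loop. */
static void realtime_cycle(void)
{
    /* ... ecrt_master_receive(), process data, application time ... */
    if (atomic_load_explicit(&dc_offsets_done, memory_order_acquire)) {
        /* here the loop would call ecrt_master_sync_reference_clock(master)
         * and ecrt_master_sync_slave_clocks(master) */
        sync_calls++;
    }
    /* ... ecrt_master_send() ... */
}
```

The release/acquire pair ensures that once the realtime loop sees the flag set, it also sees everything the FSM wrote before setting it.<br>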

<br>

<br>

<br>

Sorry for such a long email; I hope I’ve made my thoughts clear. I<br>
could be wrong in many places, and I’d be very happy if somebody could<br>
change the EtherLab master code as described and test it for me.<br>

<br>

<br>

I wish all of you a Happy New Year!<br>

<br>

Jun<br>

<div><div><br>

On Mon, Dec 30, 2013 at 2:32 PM, Raz <<a href="mailto:raziebe@gmail.com" target="_blank">raziebe@gmail.com</a>> wrote:<br>

> Hey<br>

><br>

> At the moment it takes a long time to calibrate the DC, approx. 5 seconds<br>
> per slave. I am setting up a system that is supposed to control<br>
> over 12 axes, and the calibration takes up to a minute.<br>

><br>

> Is it possible to reduce this time?<br>

><br>

><br>

> --<br>

> <a href="https://sites.google.com/site/ironspeedlinux/" target="_blank">https://sites.google.com/site/ironspeedlinux/</a><br>

><br>

</div></div>> _______________________________________________<br>

> etherlab-users mailing list<br>

> <a href="mailto:etherlab-users@etherlab.org" target="_blank">etherlab-users@etherlab.org</a><br>

> <a href="http://lists.etherlab.org/mailman/listinfo/etherlab-users" target="_blank">http://lists.etherlab.org/mailman/listinfo/etherlab-users</a><br>

><br>

<br>

<br>

<br>

--<br>

Jun Yuan<br>

[Pronunciation: Djün Üän]<br>

<br>

Robotics Technology Leaders GmbH<br>

Am Loferfeld 58, D-81249 München<br>

Tel: <a href="tel:%2B49%2089%20189%200465%2024" value="+4989189046524" target="_blank">+49 89 189 0465 24</a><br>

Mobile: <a href="tel:%2B49%20176%202176%205238" value="+4917621765238" target="_blank">+49 176 2176 5238</a><br>

Fax: <a href="tel:%2B49%2089%20189%200465%2011" value="+4989189046511" target="_blank">+49 89 189 0465 11</a><br>

mailto: <a href="mailto:j.yuan@rtleaders.com" target="_blank">j.yuan@rtleaders.com</a><br>

<br>

Umlaut rule in the Chinese phonetic alphabet Pinyin: after the initials<br>
y, j, q and x, u is pronounced as ü, e.g. yu => ü, ju => dschü,<br>
qu => tschü, xu => schü.<br>

</blockquote></div><br><br clear="all"><br>-- <br><div dir="ltr"><div><a href="https://sites.google.com/site/ironspeedlinux/" target="_blank">https://sites.google.com/site/ironspeedlinux/</a></div></div>

</div></div>