[etherlab-users] DC-Synchronization - Sync signal generation

Wed Apr 9 10:31:05 CEST 2014

Hi Gavin,

On Wed, Apr 9, 2014 at 3:40 AM, Gavin Lambert <gavinl at compacsort.com> wrote:

> On 8 April 2014, quoth Jun Yuan:
> > Thank you so much for the test. I am sorry for the formatting error.
> > I should test it before send it out but I didn't have the chance.
>
> I think this was actually from the original code, not your changes.
>

Well, it is my fault. I tried to log the signed value by changing the
original code
            EC_SLAVE_DBG(slave, 1, "Sync after %4lu ms: %10u ns\n",
                    diff_ms, abs_sync_diff);
to
            EC_SLAVE_DBG(slave, 1, "Sync after %4lu ms: %10u ns\n",
                    diff_ms, EC_READ_U32(datagram->data) & 0x80000000 ?
                    -abs_sync_diff : abs_sync_diff);
and clearly I forgot to change the formatting accordingly :P
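
For reference, here is a minimal standalone sketch of what the signed logging
could look like. It assumes, as the patched line above does, that bit 31 of
the raw value is the sign and the lower 31 bits are the magnitude. The names
sync_diff_signed and format_sync_line are made up for illustration and are
not part of the master code.

```c
#include <stdint.h>
#include <stdio.h>

/* Convert the raw 32-bit value (assumed: bit 31 = sign, bits 30..0 =
 * magnitude) into an ordinary signed integer. */
static int32_t sync_diff_signed(uint32_t raw)
{
    int32_t magnitude = (int32_t)(raw & 0x7fffffffu);

    return (raw & 0x80000000u) ? -magnitude : magnitude;
}

/* Hypothetical helper: build the log line with a signed conversion (%d)
 * instead of the unsigned %u that caused the formatting error. */
static int format_sync_line(char *buf, size_t len, unsigned long diff_ms,
        uint32_t raw)
{
    return snprintf(buf, len, "Sync after %4lu ms: %11d ns",
            diff_ms, (int)sync_diff_signed(raw));
}
```

The field width is 11 rather than 10 so that a leading minus sign still fits
next to a 10-digit magnitude.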

> Fair enough.  Although in that case I wonder why this isn't more of a
> widespread problem (or why it hasn't been solved somehow if it is).  The
> slave isn't doing anything weird, it's just using whatever default timing
> behaviours are built in.  And as I said according to the debug output the
> timing actually seemed to be diverging, which it definitely shouldn't be
> doing.  I wonder if there's some extra register that's supposed to be set
> during initial setup to help it along?  Some way to tell it "the network's
> restarting, forget about slowly adjusting the clock and just step it
> directly to this value".
>

Yes, I also observed that the convergence curve zig-zags. I don't quite
understand why the difference sometimes diverges either. But thank God, it
always converges to zero when we give it some time.

I'm not familiar with the implementation of DC sync on the slaves, but I
think it is affected by some hardware limitations. Our master, on the other
hand, has a clock with more resolution, although it might not be as
accurate, and it can use a more complicated software solution with
control-loop feedback to make the convergence much faster. I also read
somewhere that the slave has only 3 choices to sync its clock: it advances
its clock by 9, 10, or 11 ns every 10 ns. Stability is more important than
convergence speed.
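
To illustrate why that would make convergence slow, here is a toy model of
the 9/10/11 ns rule. It is only a sketch of what I described above, not the
real slave logic: it ignores the time filters, and the function names are
made up.

```c
#include <stdint.h>

/* Toy rule (assumption): every 10 ns tick the local clock advances by
 * 9 ns if it is ahead of the reference, 11 ns if it is behind, and
 * 10 ns if both agree. */
static int64_t tick_increment(int64_t diff_ns)
{
    if (diff_ns > 0)
        return 9;
    if (diff_ns < 0)
        return 11;
    return 10;
}

/* Count the 10 ns ticks needed until the local clock matches the
 * reference, starting from the given offset (local minus reference). */
static uint64_t ticks_until_synced(int64_t initial_diff_ns)
{
    int64_t local = initial_diff_ns;
    int64_t reference = 0;
    uint64_t ticks = 0;

    while (local != reference) {
        local += tick_increment(local - reference);
        reference += 10;
        ticks++;
    }
    return ticks;
}
```

In this model the offset shrinks by only 1 ns per tick, so removing an offset
takes about ten times as long as the offset itself, even before any filtering.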

I think the problem with the reboot is that the register 0x0932 Speed
Counter Diff is reset to zero, and the slave needs to find the proper value
for it again, which takes time. If the master-slave pair will not change, we
could read out the value before the slave reboots and write the old value
back to the register after the boot. But it's not generally a good idea.

>
> There's a note in the slave datasheet about resetting the time filters by
> writing to 0x0930.  I can't see anywhere that this is happening at the
> moment, but I might just be missing it.  If not, maybe it should be doing
> this after updating the reference clock slave's time for the first time?
>

According to the EtherCAT documentation, the register 0x0930 Speed Counter
Start is related to the bandwidth of the drift compensation. And when
0x0930 is set by the user (a write access on 0x0930), the time filter
(0x092c and 0x0932) will be reset to zero.

I don't get what you mean above. Maybe we can set it to a smaller value at
the beginning to make the convergence faster the first time after the
reboot, and change it back later to make it more stable. But changing it
back will reset the time filter, which is bad for us. Well, I don't know.
It seems complicated, and it needs many experiments.

> > 2) sync the master to ref slave
> > sorry about the error message. It is because I let the
> > ecrt_master_reference_clock_time return EAGAIN error when the system
> > time offset has not been calculated yet. I tried to avoid the error
> > message in lib/master.c, but apparently I didn't do it right. I'll
> > fix it ASAP.
>
> That part worked; the issue is that if a reference clock is not explicitly
> selected (eg. ecrt_select_master_reference_clock(master, NULL) or not
> called) then calls to get the time will return ENXIO until later in the
> startup cycle when it actually finds a clock to use.  And this causes the
> user library to generate spam.  (The library error output seems like a
> misfeature to me, especially for functions intended to be called from
> realtime code that probably doesn't want to go near fprintf.  But that's an
> entirely separate topic.)
>

Thanks for pointing out that it's an ENXIO error, not an EAGAIN error. So
it's not my fault. And I couldn't agree more about the realtime issue with
fprintf in the userspace lib. I did remove those calls in my code, but I
haven't committed that to the repository yet.
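
Roughly, the idea is to count such errors in the cyclic code instead of
printing them. The sketch below is only an illustration: the struct and
function names are hypothetical, and it takes a plain positive errno code as
input, because how the code reaches the application (return value or errno)
depends on the build.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical counters, to be inspected outside the realtime loop. */
struct clock_error_stats {
    unsigned long transient; /* clock not available yet, retry later */
    unsigned long fatal;     /* anything else */
};

/* ENXIO (no reference clock found yet) and EAGAIN (time offset not yet
 * calculated, as in my patch) are treated as "not ready yet". */
static bool clock_error_is_transient(int err)
{
    return err == ENXIO || err == EAGAIN;
}

/* Record the result without calling fprintf; returns true when the
 * clock time can be used in this cycle. */
static bool note_clock_result(struct clock_error_stats *stats, int err)
{
    if (err == 0)
        return true;
    if (clock_error_is_transient(err))
        stats->transient++;
    else
        stats->fatal++;
    return false;
}
```

The application can then report the counters once in a while from a
non-realtime context.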

>
> Incidentally, I tried this userland code with the original 1.5.2 master
> code (with modification to lib/master.c to avoid spam) and it produced the
> same results with one slave -- almost instantly locking in the timing.  I
> assume for the differences to become apparent, more slaves are required.
>

Well, could you try it with a 4 ms cycle loop? The difference should become
more apparent as the cycle time gets larger.

Regards,
Jun