[Etherlab-users] DC synchronization demo about etherlab master

Tue May 6 01:00:14 CEST 2025

Hi Circle,

Re #1)

  *   ecrt_master_application_time() stores the PC time in the master.
  *   The ref slave's clock is set to the master's time (plus its transmission delay) on activation.
  *   ecrt_master_sync_slave_clocks() syncs the subsequent slaves' clocks to the ref slave, and its datagram also brings back the ref slave time.
  *   ecrt_master_reference_clock_time() gets the ref slave's time “slaveTime” (minus its transmission delay) from the previous ecrt_master_sync_slave_clocks() call, so it returns the slave time at the moment of the previous send.
  *   “ecMaster->m_dcTime” caches the time passed to ecrt_master_application_time() for the previous send.

So we can compare the difference between “(uint32_t)ecMaster->m_dcTime” and “slaveTime” and if there’s no drift or jitter between the master and slave clocks we should get a value of zero.  The rest of the code is attempting to filter out the jitter and calculate a drift compensation.  Note: we are comparing the lower 32 bits of the times.
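To illustrate, the lower-32-bit comparison can be written so that it survives the 32-bit rollover. This is a minimal sketch and not the application code being discussed. dc_diff_32() is a hypothetical helper name, and it assumes the true difference stays well inside ±2.1 s.

```c
#include <stdint.h>

/* Signed difference between the lower 32 bits of the application (PC) time
 * and the reference slave time. The unsigned subtraction wraps modulo 2^32,
 * so the result stays correct across the ~4.29 s rollover of a 32-bit DC
 * clock as long as the real difference is within +/-2^31 ns. */
static int32_t dc_diff_32(uint64_t app_time_ns, uint32_t slave_time_ns)
{
    uint32_t raw = (uint32_t)app_time_ns - slave_time_ns;

    /* Reinterpret as two's complement (a plain bit cast on GCC/Clang). */
    return (int32_t)raw;
}
```

With no drift or jitter this returns 0. A PC time just past a rollover, compared against a slave time just before it, still gives a small positive value.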

“why are we using pc time - reference time to calculate m_dcDiff (m_dcTime - slaveTime; even though it's named m_dcTime, I think it's still a PC time)?”:
If there’s no drift and the ref slave clock was set to the master clock on activation, then the master and slave times should match. If there’s drift, we need to adjust the master time to compensate. (So it compares the PC clock time to the slave time to work out the drift.)

“However time of SYNC0(0x990) is changing quite regularly”:
If DC is enabled on the slave, the slave increments it by the cycle period every cycle. With a 32-bit DC clock the time value rolls over roughly every 4.29 seconds. With a 64-bit DC clock you see the whole time value. For DC to stay synced (and to enable DC SYNC0), a 32-bit DC clock is enough. Proper timestamping of events needs the 64-bit clocks.
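The rollover figure comes from 2^32 ns = 4 294 967 296 ns, about 4.29 s. Suppose you only have the lower 32 bits of a DC time but know a 64-bit time within roughly ±2.1 s of it, such as the application time of the same cycle. Then the full value can be rebuilt. dc_extend_32_to_64() is a hypothetical helper for illustration, not an EtherLab API.

```c
#include <stdint.h>

/* A 32-bit DC clock counts nanoseconds modulo 2^32 (about 4.29 s). */
#define DC32_ROLLOVER_NS (1ULL << 32)

/* Returns the 64-bit time closest to ref64_ns whose lower 32 bits equal
 * low32_ns. Assumes the real value lies within +/-2^31 ns (~2.1 s) of ref64_ns. */
static uint64_t dc_extend_32_to_64(uint64_t ref64_ns, uint32_t low32_ns)
{
    /* Signed offset from the reference, computed modulo 2^32. */
    int32_t delta = (int32_t)(uint32_t)(low32_ns - (uint32_t)ref64_ns);

    /* Adding the sign-extended offset wraps correctly modulo 2^64. */
    return ref64_ns + (uint64_t)(int64_t)delta;
}
```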

“0x990 is always like xxxxxxxxxx500000 (500000 is sync0_shift), why don't we try to make slaveTime to, like 0-phase-aligned”:
Once DC is set up, SYNC0 (0x990) on the slave just keeps ticking over based on the slave’s clock, and the slave’s clock is synced to the ref slave’s clock. Your application can choose when SYNC0 occurs on the slave relative to the cycle (the sync0 shift), but has no other control over it. However, your application must call ecrt_master_send() once every cycle so that the frames reach the slave before that cycle’s SYNC0 time is triggered. Beyond that, you can choose when to call ecrt_master_send(). In your realtime cycle you choose when to wake up and perform your calculations, and that wakeup event is triggered relative to the PC’s clock. So to wake up relative to the ref slave’s clock, you need to sync your application (PC) time to the ref slave’s time.
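Once your application time follows the ref slave, the SYNC0 events fall on a fixed grid: the start time plus whole cycles. That means the next event can be computed directly, which is one concrete sense of "waking up in relation to the ref slave's clock". next_sync0_ns() is a hypothetical helper. It assumes now_ns is already in the ref slave's timebase (PC time with the drift compensation applied) and that sync0_start_ns is the SYNC0 start time written to 0x990.

```c
#include <stdint.h>

/* Next SYNC0 event at or after now_ns, given the SYNC0 start time and the
 * cycle period. All times are in the ref slave's timebase (ns). */
static uint64_t next_sync0_ns(uint64_t sync0_start_ns, uint64_t cycle_ns,
                              uint64_t now_ns)
{
    uint64_t cycles;

    if (now_ns <= sync0_start_ns)
        return sync0_start_ns;

    /* Round the elapsed time up to a whole number of cycles. */
    cycles = (now_ns - sync0_start_ns + cycle_ns - 1) / cycle_ns;
    return sync0_start_ns + cycles * cycle_ns;
}
```

With a 1 ms cycle and a 0.5 ms shift, every result ends in ...500000, which matches the 0x990 pattern described in the question.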

“BTW, is the write cmd to register 0x990 only sent once at the beginning”:
yes

Re #2)
“I didn't get the "first master diff" in syslog”:
I output all my app messages to the syslog via “rtai_lxrt(BIDX, SIZARG, PRINTK, &arg);”. Printing to stdout is fine too.

“And it's quite big”:
It should be small (within max jitter or so).  If it’s big to start with, it indicates the initial master time is not being set correctly in the ref slave on master activation.  If it becomes big then your drift compensation isn’t working.

“I don't know why you are saying m_dcDiff should be around 0”:
As per above, m_dcDiff is the difference between the master time when the frame was sent and the time at the ref slave. If the initial slave time is set up correctly and there is no drift between the master and slave clocks (or you account for the drift), m_dcDiff should jitter around 0.

Looking at your logs:

  *   Your syslog logs are showing unmatched and corrupted frames.  You need to sort that out first.  Try contact cleaner on the RJ45 / EBus connections.  Also, try higher quality shielded twisted pair cables.  You need to get it to zero comms errors.
  *   You need to get wireshark logs that include the responses. You could use the “ethercat pcap” command if you have patch “features\pcap\0001-pcap-logging.patch”. You could also put a physical switch between your master and the first slave and use another computer as the sniffer (with all protocols disabled on that device’s eth port). Because EtherCAT frames are sent to the broadcast address, you shouldn’t need to do anything special with the switch.
  *   Your crc log is a little weird. There are no CRC or physical errors, but there are a lot of forwarded errors (more than the max count). Maybe there are problems on the master-to-first-slave link.

Graeme.

> From: Circle Fang circlefang at live.com<mailto:circlefang at live.com>
> Sent: Saturday, 3 May 2025 20:10
> To: Graeme Foot Graeme.Foot at touchcut.com<mailto:Graeme.Foot at touchcut.com>
> Cc: etherlab-users at etherlab.org<mailto:etherlab-users at etherlab.org>
> Subject: Re: DC synchronization demo about etherlab master
>
> Hi Graeme,
> Basically I didn't figure out the following two things yet.
>
>   1.  Why are we using PC time - reference time to calculate m_dcDiff (m_dcTime - slaveTime; even though it's named m_dcTime, I think it's still a PC time)? I think they could drift together, e.g. both delayed by hundreds of microseconds, since the reference time (slaveTime) is strongly tied to the PC time of ecrt_master_send() (get m_dcTime, then call ecrt_master_send()). However, the time of SYNC0 (0x990) changes quite regularly, so even if m_dcDiff is small enough, a DC sync error may still occur because both m_dcTime and slaveTime may go beyond the time in 0x990. I'm wondering why we don't try to sync the master time to SYNC0 (probably with a sync0_shift interval). In my test, when I watch the 64 bits of 0x990, it is always sync0_shift-phase-aligned; 0x990 always looks like xxxxxxxxxx500000 (500000 is sync0_shift). Why don't we try to make slaveTime 0-phase-aligned? (It could be a little complicated since slaveTime is 32-bit, and I still haven't figured out how to do this; maybe change ecrt_master_reference_clock_time to 64-bit.) Then the time of ecrt_master_send() and slaveTime would always drift around the 0-phase of SYNC0/0x990 (i.e. the 64-bit slaveTime would always be xxxxxxxxxabcdef, where abcdef is around 0). BTW, is the write command to register 0x990 only sent once at the beginning?
>
>   2.  I didn't get the "first master diff" in syslog, but I do get it on the command line I run the application from. And it's quite big. I am not clear about this, but I think it is OK. The 0-phase-aligned m_dcTimeStart passed into ecrt_master_application_time() at the beginning is used to calculate the real DC start time (which is written to 0x990) about 100 cycles in the future (EC_DC_START_OFFSET = 100 ms, eventually with sync0_shift phase alignment), and our first wakeup time is 50 cycles in the future, even though it's 0-phase-aligned. I don't know why you are saying m_dcDiff should be around 0.
>
> I did 2 tests: the first uses your way, and the second uses my way (I don't know whether it's right, but m_dcDiff always drifts around 0). Please see the attached logs. As for the wireshark logs, only frames sent from the master are captured. In those tests there is no motion task, only checking the received datagrams, adjusting the PC time, and sending PDOs, so the task is quite lightweight, and MSW (Xenomai mode switches) is always 0. However, in wireshark the sending time occasionally gets odd. One earlier test (not recorded) was quite stable: it ran for 24 hours without any errors, even with the motion task running.
>
> The "Failed to get reference clock time" error is something like resource unavailable, because I call this even the first time (before any datagram has been received). Sometimes it also appears for a short time in the middle of a test, but no DC sync error occurs. And it's not easy to reproduce.
>
> In addition, SMI, power saving, and a lot of other features that may affect the realtime task are disabled. A fixed CPU frequency is also set, and /proc/xenomai/stat is OK (no MSW).
>
> It seems my problem has nothing to do with the DC patch issues mentioned before, since I see no difference before and after the patches are applied.
>
> I am sorry this has cost you so much time, and I am really, really grateful.
>
> Best Regards,
> Circle   
>
________________________________
From: Graeme Foot <Graeme.Foot at touchcut.com<mailto:Graeme.Foot at touchcut.com>>
Sent: 2 May 2025 4:21
To: Circle Fang <circlefang at live.com<mailto:circlefang at live.com>>
Cc: etherlab-users at etherlab.org<mailto:etherlab-users at etherlab.org> <etherlab-users at etherlab.org<mailto:etherlab-users at etherlab.org>>
Subject: RE: DC synchronization demo about etherlab master

Hi Circle,

m_dcDiff should jitter around zero. The difference between the previous and current slave time should jitter around your period (e.g. 1ms). The PC clock's total adjustment should drift by a roughly constant amount over time. The rate can speed up or slow down slightly over time due to electronic effects (such as thermal changes).

The master needs to account for the time drift between the ref slave and the PC clock so that it interacts with the fieldbus in the fieldbus’s timeframe. Looking at 0x92C of subsequent slaves won’t help, as that shows them syncing to the ref slave. We are dealing with the master syncing to the ref slave.

If you enable “ethercat debug 1” and start your app you should see in the logging (where main-1 is the ref slave number):

  *   Using slave main-1 as DC reference clock
  *   DEBUG 0-main-1: Checking system time offset.
  *   DEBUG 0-main-1: DC 64 bit system time offset calculation: system_time=50278625750, app_time=42973023400, diff=-7305602350
  *   DEBUG 0-main-1: Setting time offset to 18446744066403949266 (was 0)
  *   first master diff: -609.

The first master diff should be quite small (within the jitter range).
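The numbers in the debug lines above agree with plain 64-bit arithmetic. diff = app_time - system_time = 42973023400 - 50278625750 = -7305602350. Because the old offset was 0, the new offset printed by the master is that negative diff viewed as an unsigned 64-bit value (2^64 - 7305602350). The helpers below only reproduce that arithmetic. Their names are hypothetical, and this is not the master's own code.

```c
#include <stdint.h>

/* diff = app_time - system_time, as in the "DC 64 bit system time offset
 * calculation" debug line. Both inputs are assumed to be below 2^63. */
static int64_t dc_time_diff(uint64_t system_time_ns, uint64_t app_time_ns)
{
    return (int64_t)app_time_ns - (int64_t)system_time_ns;
}

/* A negative diff stored in an unsigned 64-bit field wraps modulo 2^64,
 * which is why the log shows 18446744066403949266 rather than -7305602350. */
static uint64_t dc_offset_as_u64(int64_t diff_ns)
{
    return (uint64_t)diff_ns;
}
```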

You shouldn’t be getting any "Failed to get reference clock time" messages.  What is the error number that is output with it?

I’m starting to think you may be getting comms errors or something. Can you send the kernel log messages (e.g. dmesg / journalctl -k) with “ethercat debug 1” set before startup, and maybe the wireshark logs?

Check for comms error using the “ethercat crc” command, check for unmatched datagram errors in the system logs and potentially check the wireshark logs for mismatched frames around the time of your errors.

Also check that you have the CPU speed stepping (dynamic frequency scaling) turned off in the kernel configuration options.  That can cause problems with the timestamp clock.

If you have an intel CPU you may need to disable the SMI interrupt.

Also, what are the Yaskawa sync errors you are getting? Are they A12 errors? If so, these generally only occur (by default) after three missed PDOs in a row. You don’t generally get those alarms when you only have drift errors. Check the wireshark logs around the time of the error and check the reference clock times to see when the master is sending the frames.

Under RTAI you need the RTDM interface enabled to allow realtime calls from user space into the master’s kernel space. RTAI will let you make non-realtime syscalls, but you lose hard realtime while the syscall is running. You can check for lost hard realtime events with the “cat /proc/rtai/scheduler” command. I don’t know what the Xenomai equivalent is.

Regards,

Graeme.

> From: Circle Fang circlefang at live.com<mailto:circlefang at live.com>
> Sent: Friday, 2 May 2025 05:33
> To: Graeme Foot Graeme.Foot at touchcut.com<mailto:Graeme.Foot at touchcut.com>
> Cc: etherlab-users at etherlab.org<mailto:etherlab-users at etherlab.org>
> Subject: Re: DC synchronization demo about etherlab master
>
> Hi Graeme,
>
> I think I was wrong about options a and b. I should compare the reference clock time "slaveTime" with its ideal values (i.e. initial slaveTime + cycle_counter * cycle_ns). If that difference quickly converges to 0, rather than occasionally showing big weird values for several consecutive cycles (bigger than the "sync error limit" threshold in the slave), that would prove the master is well synced to the reference, right?
>
> I will keep working on this bug.
>
> How do u_appTimeBase and m_dcDiff change in your app? Are there any patterns?
>
> Many thanks again for your help.
> Best Regards,
> Circle

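The check proposed above, comparing slaveTime with initial slaveTime + cycle_counter * cycle_ns, can be sketched with 32-bit modular arithmetic so that it survives the DC clock rollover. slave_time_deviation() and its parameters are hypothetical names used for illustration.

```c
#include <stdint.h>

/* Deviation (ns) of the observed 32-bit ref slave time from an ideal grid of
 * initial_slave_time + cycle_count * cycle_ns. All arithmetic wraps modulo
 * 2^32, matching a 32-bit DC clock. */
static int32_t slave_time_deviation(uint32_t initial_slave_time,
                                    uint64_t cycle_count, uint32_t cycle_ns,
                                    uint32_t slave_time)
{
    uint32_t ideal = initial_slave_time + (uint32_t)(cycle_count * cycle_ns);

    return (int32_t)(slave_time - ideal);
}
```

This measures when the frame passed the ref slave relative to an ideal grid, so it mixes the send-time jitter with any uncompensated drift between the PC and the ref slave.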
________________________________

>From: Circle Fang <circlefang at live.com<mailto:circlefang at live.com>>
>Sent: 1 May 2025 16:22
>To: Graeme Foot <Graeme.Foot at touchcut.com<mailto:Graeme.Foot at touchcut.com>>
>Cc: etherlab-users at etherlab.org<mailto:etherlab-users at etherlab.org> <etherlab-users at etherlab.org<mailto:etherlab-users at etherlab.org>>
>Subject: Re: DC synchronization demo about etherlab master
>
> Hi Graeme,
>
> Where should I look, and what value should I monitor, to check whether my master is well synchronized to the reference slave? I think that value should quickly converge to 0 (or maybe to some constant value). I tried the following 2 options:
> Option a: Initially, I monitored the reference's slaveTime - prev_slaveTime. It converges to 1000000, with a jitter of about +-20 us. I thought this proved that the master is well synced to the reference, since it implies the time of ecrt_master_send() is well aligned, but DC sync errors still occur occasionally when the app runs for several hours.
> Option b: Then I started monitoring m_dcDiff in ecMaster_syncDistClock. The first m_dcDiff is several milliseconds, which I don't understand, so I subtract it as "fixed"; my formula is m_dcDiff = (uint32_t)ecMaster->m_dcTime - slaveTime - fixed. It also eventually converges to 0, with a jitter of several microseconds.
> I think options a and b are essentially the same. Can I use these values to check synchronization, like 0x92C in the other slaves? (BTW, 0x92C is usually no more than 30 nanoseconds.)
>
> Sometimes I get the error "Failed to get reference clock time" from the master, even though no DC sync error occurs at the same time.
>
> Lastly, when I monitor u_appTimeBase, I can see it increasing (or decreasing) monotonically, eventually by about 1 second per day. Is this normal? (J1900 CPU and Xenomai.)
>
> Any ideas or advice are highly appreciated.
> Best Regards,
> Circle

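For scale, a time base that moves by about 1 second per day corresponds to a clock rate difference of roughly 11.6 ppm, or about 11.6 ns per 1 ms cycle. That is the kind of steady drift described in the reply above. The helper names below are hypothetical.

```c
/* Rate difference in parts per million: drift_ns accumulated over interval_ns. */
static double drift_ppm(double drift_ns, double interval_ns)
{
    return drift_ns / interval_ns * 1e6;
}

/* Drift accumulated during one cycle at a given ppm rate. */
static double drift_per_cycle_ns(double ppm, double cycle_ns)
{
    return ppm * 1e-6 * cycle_ns;
}
```

For example, drift_ppm(1e9, 86400e9) gives about 11.57 ppm. drift_per_cycle_ns() of that value with a 1e6 ns cycle gives about 11.57 ns.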