[etherlab-users] Problem with distributed clocks in RTAI: "Slave did not sync after 5000 ms" & "No Sync Error"

Mon Oct 22 09:37:02 CEST 2018

I think I'm dealing with two separate problems here:
1- `abs_sync_diff` doesn't converge.
2- Once the period specified by `EC_DC_SYNC_WAIT_MS` expires, something
disrupts the real-time cycle, which leads to AL status code 0x002D
<https://infosys.beckhoff.com/english.php?content=../content/1033/ethercatsystem/1072494091.html&id=>:
"No Sync Error".
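
A minimal sketch of the check behind problem 1, under stated assumptions: the raw value read from the slave is treated as sign-magnitude (bit 31 is the sign, the lower 31 bits are the difference in ns), and the slave counts as synced once that magnitude drops to the 10-microsecond threshold mentioned in the quoted messages below. The names `abs_sync_diff_ns`, `check_sync` and `SYNC_DIFF_LIMIT_NS` are hypothetical and only illustrate the logic; this is not the master's actual code.

```c
#include <stdint.h>

/* Assumed threshold: the 10 us figure quoted in this thread. */
#define SYNC_DIFF_LIMIT_NS 10000u

typedef enum { SYNC_PENDING, SYNC_OK, SYNC_TIMEOUT } sync_result_t;

/* Hypothetical helper: drop the sign bit (bit 31) of the raw value and
 * keep the magnitude of the time difference in nanoseconds. */
static uint32_t abs_sync_diff_ns(uint32_t raw)
{
    return raw & 0x7fffffffu;
}

/* Hypothetical helper: decide what one poll of the sync check yields.
 * wait_ms plays the role of EC_DC_SYNC_WAIT_MS. */
static sync_result_t check_sync(uint32_t raw, uint32_t elapsed_ms,
                                uint32_t wait_ms)
{
    if (abs_sync_diff_ns(raw) <= SYNC_DIFF_LIMIT_NS)
        return SYNC_OK;       /* difference small enough */
    if (elapsed_ms >= wait_ms)
        return SYNC_TIMEOUT;  /* "Slave did not sync after ... ms" */
    return SYNC_PENDING;      /* keep polling */
}
```

With the values from the quoted log, `check_sync(1156111u, 47800u, 50000u)` is still pending, and `check_sync(1771539607u, 50012u, 50000u)` times out.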
Best,
Mohsen

On Sat, Oct 20, 2018 at 2:08 PM Mohsen Alizadeh Noghani <
m.alizad3h at gmail.com> wrote:

> Also notable is that the userspace code reaches the 10-microsecond
> threshold after 5.2 seconds, despite having half the update rate of the
> RTAI code (1 kHz vs. 2 kHz).
>
> Best,
> Mohsen
>
> On Sat, Oct 20, 2018 at 1:58 PM Mohsen Alizadeh Noghani <
> m.alizad3h at gmail.com> wrote:
>
>> Update: I increased EC_DC_SYNC_WAIT_MS to 50000 (50 seconds) and set the
>> debug level to 1 ("ethercat debug 1"). The closest slave 0 (the reference
>> clock) gets to syncing is after about 48 seconds ("abs_sync_diff" =
>> approximately 1.156 ms).
>> After that point, the value starts to diverge and reaches 1.77 seconds at
>> the 50-second mark.
>> kernel: [14573.717225] EtherCAT DEBUG 0-0: Sync after 47800 ms: 1156111 ns
>> .
>> .
>> .
>> kernel: [14575.919495] EtherCAT DEBUG 0-0: Sync after 49996 ms: 1771539607 ns
>> kernel: [14575.923534] EtherCAT WARNING 0-0: Slave did not sync after 50012 ms.
>> kernel: [14575.923536] EtherCAT DEBUG 0-0: app_start_time=593344410354840000
>> kernel: [14575.923538] EtherCAT DEBUG 0-0:  app_time=593344420279840000
>> kernel: [14575.923539] EtherCAT DEBUG 0-0:  start_time=593344420379840000
>> kernel: [14575.923540] EtherCAT DEBUG 0-0:     cycle_time=500000
>> kernel: [14575.923542] EtherCAT DEBUG 0-0:     shift_time=125000
>> kernel: [14575.923543] EtherCAT DEBUG 0-0:      remainder=0
>> kernel: [14575.923544] EtherCAT DEBUG 0-0: start=593344420380465000
>> kernel: [14575.923545] EtherCAT DEBUG 0-0: Setting DC cyclic operation start time to 593344420380465000.
>> kernel: [14575.928611] EtherCAT DEBUG 0-0: Setting DC AssignActivate to 0x0300.
>> kernel: [14575.941292] EtherCAT 0: Domain 0: Working counter changed to 3/6.
>> kernel: [14576.000500] EtherCAT DEBUG 0-0: Processing register request...
>> kernel: [14576.004685] EtherCAT DEBUG 0-0: Register request successful.
>> kernel: [14576.050335] EtherCAT DEBUG 0-0: Now in SAFEOP.
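
The start-time figures in the log above can be reproduced with a little arithmetic. The sketch below assumes that start_time is app_time plus a fixed 100 ms offset (inferred from the two logged values) and that the cyclic start is the next cycle boundary relative to app_start_time, plus shift_time. The function and macro names are hypothetical; the formula is only claimed to be consistent with the logged numbers, not to be the master's exact implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: offset inferred from the log (start_time - app_time). */
#define START_OFFSET_NS 100000000ULL

/* Phase of start_time within the cycle, counted from app_start_time. */
static uint64_t phase_remainder(uint64_t start_time,
                                uint64_t app_start_time,
                                uint32_t cycle_ns)
{
    assert(cycle_ns > 0); /* avoid division by zero */
    return (start_time - app_start_time) % cycle_ns;
}

/* Cyclic start: move to the next cycle boundary, then add the shift. */
static uint64_t dc_start(uint64_t app_time, uint64_t app_start_time,
                         uint32_t cycle_ns, uint32_t shift_ns)
{
    uint64_t start_time = app_time + START_OFFSET_NS;
    uint64_t rem = phase_remainder(start_time, app_start_time, cycle_ns);
    return start_time + cycle_ns - rem + shift_ns;
}
```

Called with app_time=593344420279840000, app_start_time=593344410354840000, cycle_time=500000 and shift_time=125000, `dc_start` gives 593344420380465000, matching the "start" line. The same figures give app_time - app_start_time = 9925000000 ns (about 9.9 s), which may be worth comparing against the roughly 50 s that the sync check had been running.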
>>
>> - Is there something wrong with the algorithm used for nudging
>> "abs_sync_diff" towards 0?
>> - If so, why does it work perfectly fine in userspace, but not in
>> RTAI?
>>
>> Best,
>> Mohsen
>>
>> On Fri, Oct 19, 2018 at 1:05 PM Mohsen Alizadeh Noghani <
>> m.alizad3h at gmail.com> wrote:
>>
>>> Hello everyone.
>>> I'm using kernel 3.4.6, RTAI 4.0 and IgH Master 1.5.2.
>>> When running a simple RTAI program
>>> <https://github.com/mohse-n/L7N_EtherLab/blob/master/rtai/rtai_sample.c>
>>> that uses distributed clocks (basically the dc_rtai example), I encounter
>>> the following kernel log:
>>>
>>> kernel: [ 1891.643677] EtherCAT 0: Link state of ecm0 changed to UP.
>>> kernel: [ 1891.647798] EtherCAT 0: 2 slave(s) responding on main device.
>>> kernel: [ 1891.647800] EtherCAT 0: Slave states on main device: PREOP.
>>> kernel: [ 1891.647837] EtherCAT 0: Scanning bus.
>>> kernel: [ 1892.083268] EtherCAT 0: Bus scanning completed in 436 ms.
>>> kernel: [ 1892.083271] EtherCAT 0: Using slave 0 as DC reference clock.
>>> kernel: [ 1906.700138] EtherCAT: Requesting master 0...
>>> kernel: [ 1906.700142] EtherCAT: Successfully requested master 0.
>>> kernel: [ 1906.700160] EtherCAT 0: Domain0: Logical address 0x00000000, 24 byte, expected working counter 6.
>>> kernel: [ 1906.700161] EtherCAT 0:   Datagram domain0-0-main: Logical offset 0x00000000, 24 byte, type LRW.
>>> kernel: [ 1906.700185] EtherCAT 0: Master thread exited.
>>> kernel: [ 1906.700187] EtherCAT 0: Starting EtherCAT-OP thread.
>>> kernel: [ 1906.704215] ec_rtai_sample: RT timer started with 3116/3117 ticks.
>>> kernel: [ 1906.704218] ec_rtai_sample: Initialized.
>>> kernel: [ 1911.935059] EtherCAT WARNING 0-0: Slave did not sync after 5000 ms.
>>> kernel: [ 1911.946039] EtherCAT 0: Domain 0: Working counter changed to 3/6.
>>> kernel: [ 1914.070216] EtherCAT ERROR 0-0: Failed to set OP state, slave refused state change (SAFEOP + ERROR).
>>> kernel: [ 1914.073870] EtherCAT ERROR 0-0: AL status message 0x002D: "No Sync Error".
>>> kernel: [ 1914.081189] EtherCAT 0-0: Acknowledged state SAFEOP.
>>> kernel: [ 1919.308375] EtherCAT WARNING 0-1: Slave did not sync after 5000 ms.
>>> kernel: [ 1919.321187] EtherCAT 0: Domain 0: Working counter changed to 6/6.
>>> kernel: [ 1921.449013] EtherCAT ERROR 0-1: Failed to set OP state, slave refused state change (SAFEOP + ERROR).
>>> kernel: [ 1921.452670] EtherCAT ERROR 0-1: AL status message 0x002D: "No Sync Error".
>>> kernel: [ 1921.459991] EtherCAT 0-1: Acknowledged state SAFEOP.
>>> kernel: [ 1921.469158] EtherCAT 0: Slave states on main device: SAFEOP.
>>>
>>> The slaves (servo drives) then raise an alarm related to EtherCAT
>>> communication.
>>> Apparently, the slaves are unable to sync within 5 seconds. But why?
>>> (Note: I have tested the distributed clocks example in userspace and it
>>> works, so I don't think the issue is on the slaves' side.)
>>> Best,
>>> Mohsen
>>>
>>