[etherlab-users] Unmatched/Skipped Datagrams
Gary Grobe
ggrobe at houstonmechatronics.com
Tue Dec 17 23:50:43 CET 2019
A few more questions if I may ...
My understanding of an unmatched datagram is that the master doesn't
recognize the frame it sent from the frame received, not that they're
not making it all the way around, though I'm sure that could very well
be the case, but what I'm looking at is as follows.
- datagram 0x21 is identified as being sent at 17401.923410.
- datagram 0x21 is shown received at 17401.923614.
And at 17401.923646, datagram 0x21 is shown as unmatched at 17401.923647.
[17401.923395] EtherCAT DEBUG 0: ec_master_send_datagrams(device_index = 0)
[17401.923396] EtherCAT DEBUG 0: Adding datagram 0x21
[17401.923397] EtherCAT DEBUG 0: frame size: 46
[17401.923397] EtherCAT DEBUG 0: Sending frame:
[17401.923398] EtherCAT DEBUG: FF FF FF FF FF FF 00 60 E0 7A 63 E0 88 A4
18 10
[17401.923410] EtherCAT DEBUG: 0A 21 00 00 00 00 0C 00 00 00 00 00 00 00
00 00
[17401.923421] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00
[17401.923433] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923501] EtherCAT DEBUG 0: Datagram 00000000965a32dd already
queued (skipping).
[17401.923503] EtherCAT DEBUG 0: ec_master_send_datagrams(device_index = 0)
[17401.923504] EtherCAT DEBUG 0: Adding datagram 0x22
[17401.923505] EtherCAT DEBUG 0: frame size: 46
[17401.923506] EtherCAT DEBUG 0: Sending frame:
[17401.923507] EtherCAT DEBUG: FF FF FF FF FF FF 00 60 E0 7A 63 E0 88 A4
18 10
[17401.923518] EtherCAT DEBUG: 0A 22 00 00 00 00 0C 00 00 00 00 00 00 00
00 00
[17401.923529] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00
[17401.923541] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923600] EtherCAT DEBUG 0: Received frame:
[17401.923602] EtherCAT DEBUG: FF FF FF FF FF FF 02 60 E0 7A 63 E0 88 A4
18 10
[17401.923614] EtherCAT DEBUG: 0A 21 00 00 00 00 0C 00 00 00 00 00 00 00
00 00
[17401.923625] EtherCAT DEBUG: 00 00 00 00 00 00 01 00 00 00 00 00 00 00
00 00
[17401.923637] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923646] EtherCAT DEBUG 0: UNMATCHED datagram:
[17401.923647] EtherCAT DEBUG: 0A 21 00 00 00 00 0C 00 00 00 00 00 00 00
00 00
[17401.923659] EtherCAT DEBUG: 00 00 00 00 00 00 01 00
[17401.923665] EtherCAT DEBUG 0: Received frame:
[17401.923666] EtherCAT DEBUG: FF FF FF FF FF FF 02 60 E0 7A 63 E0 88 A4
18 10
[17401.923678] EtherCAT DEBUG: 0A 22 00 00 00 00 0C 00 00 00 00 00 00 00
00 00
[17401.923689] EtherCAT DEBUG: 00 00 00 00 00 00 01 00 00 00 00 00 00 00
00 00
[17401.923700] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923719] EtherCAT DEBUG 0: ec_master_send_datagrams(device_index = 0)
[17401.923720] EtherCAT DEBUG 0: Adding datagram 0x23
[17401.923721] EtherCAT DEBUG 0: frame size: 46
[17401.923721] EtherCAT DEBUG 0: Sending frame:
[17401.923722] EtherCAT DEBUG: FF FF FF FF FF FF 00 60 E0 7A 63 E0 88 A4
18 10
[17401.923733] EtherCAT DEBUG: 0A 23 00 00 00 00 0C 00 00 00 00 00 00 00
00 00
[17401.923745] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00
[17401.923756] EtherCAT DEBUG: 00 00 00 00 00 00 00 00 00 00 00 00
[17401.923825] EtherCAT DEBUG 0: Datagram 00000000965a32dd already
queued (skipping).
From my reading, the master did send and receive this datagram. Yet I'm
still not quite clear on why this is unmatched. I can only assume that
something (which doesn't show in the log), is different and the master
doesn't like and we're saying it's some sort of network corruption? Am I
mis-interpretting things?
WRT calling ecrt_master_receive() too soon after ecrt_master_send(),
I've timed the calls by using the following before each call.
{
static uint8_t tictoc = 0;
tictoc = !tictoc;
clock_gettime(CLOCK_REALTIME, &recv[tictoc]);
printf("recv[%d]: %lld.%.9ld\n", tictoc, (long long)
recv[tictoc].tv_sec, recv[tictoc].tv_nsec);
ecrt_master_receive(master);
// ...
}
The time diff between ecrt_master_send() and the next iteration of the
task ecrt_master_recv() is common in the following:
1) 7400ns
2) 7304ns
3) 7289ns
recv[0]: 1576621482.827606007
proc[0]: 1576621482.827611921
queue[0]: 1576621482.827618270
send[0]: 1576621482.827622648
after[0]: 1576621482.827627484 <--- 1
recv[1]: 1576621482.827634884 <--- 1
proc[1]: 1576621482.827641278
queue[1]: 1576621482.827647525
send[1]: 1576621482.827652114
after[1]: 1576621482.827656902 <--- 2
recv[0]: 1576621482.827664206 <--- 2
proc[0]: 1576621482.827670504
queue[0]: 1576621482.827676664
send[0]: 1576621482.827681030
after[0]: 1576621482.827685919 <--- 3
recv[1]: 1576621482.827693208 <--- 3
proc[1]: 1576621482.827699553
queue[1]: 1576621482.827706048
send[1]: 1576621482.827710468
after[1]: 1576621482.827715254
This was running at 500Hz with pretty much an empty task doing nothing
other than the calls listed in the OP using the ec_e1000e. With those
numbers, it looks as if there's plenty of time to receive properly. I've
also removed the following 3 lines which I place just before
ecrt_domain_queue(). If I add those back, the number of mismatches triples.
ecrt_master_application_time(master, TIMESPEC2NS(wakeup_time));
ecrt_master_sync_reference_clock(master);
ecrt_master_sync_slave_clocks(master);
Thoughts?
On 12/16/19 4:26 PM, Gavin Lambert wrote:
> Another possible cause is if your cyclic loop timing is off somehow, and you're calling ecrt_master_receive too soon after ecrt_master_send so that the packet hasn't had time to make it back to the master yet.
>
> The timing code you have below looks correct, but bear in mind that the way wakeup_time is incremented is designed to smooth latency jitter but can result in a zero-time sleep if the task is already running behind for some reason. So adding some logging of your wakeup_time vs. current time and other performance metrics could be useful as well.
>
> Also, double check the code you're using to change the process priority; you didn't include that part below. Using SCHED_FIFO requires root permissions but even then there can be restrictions. Verify it's not failing.
>
>
> Gavin Lambert
> Senior Software Developer
>
>
>
>
> COMPAC SORTING EQUIPMENT LTD | 4 Henderson Pl | Onehunga | Auckland 1061 | New Zealand
> Switchboard: +49 2630 96520 | https://www.tomra.com
>
> The information contained in this communication and any attachment is confidential and may be legally privileged. It should only be read by the person(s) to whom it is addressed. If you have received this communication in error, please notify the sender and delete the communication.
> -----Original Message-----
> From: Gavin Lambert
> Sent: Tuesday, 17 December 2019 11:09
> To: Gary Grobe <ggrobe at houstonmechatronics.com>; etherlab-users at etherlab.org
> Subject: Re: [etherlab-users] Unmatched/Skipped Datagrams
>
> There's a 4.19 version of all drivers (except r8169, because reasons), including igb, in the unofficial patchset (https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fsourceforge.net%2Fu%2Fuecasm%2Fetherlab-patches%2Fci%2Fdefault%2Ftree%2F&data=02%7C01%7Cgavin.lambert%40tomra.com%7Cd8107ac890f84bfecff508d782748ed6%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637121309444435878&sdata=zYsu8AzSjFAjmgkqI8MS9xAa0r4Rsowadhwiu0yuel4%3D&reserved=0).
>
> It can be normal to see a small number of unmatched datagrams when you're starting your app up (and it's transitioning from background mode to realtime mode), and when you're replugging the network, but otherwise no, they should never appear. It means the EtherCAT packets aren't getting all the way around the network.
>
> It's possible to watch the flow of packets, either by using one of the built-in debug interfaces or (usually easier) by connecting a small dumb switch between the master and the first slave, and then connecting a third laptop/PC running Wireshark to the switch as well (preferably in such a way that the laptop can't send packets, but it's not critical). This can sometimes give you more hints about what's going on.
>
> Using "ethercat debug 1" and observing the syslog can also help.
>
>
> Gavin Lambert
> Senior Software Developer
>
> -----Original Message-----
> From: Gary Grobe
> Sent: Tuesday, 17 December 2019 10:30
> To: etherlab-users at etherlab.org
> Subject: Re: [etherlab-users] Unmatched/Skipped Datagrams
>
> The ec_igb doesn't seem to be available for 4.19. If nothing else works, I may try to roll back the kernel as a last resort, but according to recent mailings, it would appear that others are running 4.19 LTS w/o issue.
>
> I did just give things a try on new hardware:
> - Linux 4.19.82-rt30 #1 SMP PREEMPT RT x86_64 GNU/Linux
> - IgH EtherCAT master 1.5.2 6258f7d85afb+
> - Intel(R) Core(TM) i7-7700T CPU @ 2.90GHz
> - ec_e1000e using the following NIC:
>
> 00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection I219-LM [8086:156f] (rev 31)
> Subsystem: Intel Corporation Ethernet Connection I219-LM [8086:0000]
> Kernel driver in use: ec_e1000e
> Kernel modules: e1000e
>
> Also replaced cables, but even at 50Hz, I'm still seeing the following with the e1000e.
>
> [11576.944102] EtherCAT WARNING 0: 7 datagrams UNMATCHED!
> [11577.940744] EtherCAT WARNING: Datagram 000000004e9cc6e1
> (domain0-0-main) was SKIPPED 13 times.
> [11577.944111] EtherCAT WARNING 0: 13 datagrams UNMATCHED!
> [11578.944115] EtherCAT WARNING 0: 13 datagrams UNMATCHED!
> [11578.944153] EtherCAT WARNING: Datagram 000000004e9cc6e1
> (domain0-0-main) was SKIPPED 20 times.
> [11579.944090] EtherCAT WARNING 0: 17 datagrams UNMATCHED!
> [11580.940728] EtherCAT WARNING: Datagram 000000004e9cc6e1
> (domain0-0-main) was SKIPPED 25 times.
> [11580.944071] EtherCAT WARNING 0: 9 datagrams UNMATCHED!
> [11581.944077] EtherCAT WARNING 0: 31 datagrams UNMATCHED!
> [11581.944091] EtherCAT WARNING: Datagram 000000004e9cc6e1
> (domain0-0-main) was SKIPPED 32 times.
> [11582.944104] EtherCAT WARNING 0: 4 datagrams UNMATCHED!
> [11583.940692] EtherCAT WARNING: Datagram 000000004e9cc6e1
> (domain0-0-main) was SKIPPED 18 times.
> [11583.944069] EtherCAT WARNING 0: 19 datagrams UNMATCHED!
> [11584.944048] EtherCAT WARNING 0: 3 datagrams UNMATCHED!
> [11584.944179] EtherCAT WARNING: Datagram 000000004e9cc6e1
> (domain0-0-main) was SKIPPED 8 times.
> [11585.944076] EtherCAT WARNING 0: 30 datagrams UNMATCHED!
>
> I've also replaced the controllers w/ a few in-house as well as Elmo motion controllers, but to no avail. Is it common/ok to have a small percentage of these messages to appear?
>
> Any other thoughts?
>
> On 12/15/19 3:55 PM, Gavin Lambert wrote:
>> 1kHz cycle is typically too fast for the generic driver. Try using the ec_igb driver instead, or reduce your cycle to 250 Hz or less.
>>
>> Otherwise, there may be some other issue with your network wiring, or your hardware. Have you tried it on other hardware?
>>
>>
>> Gavin Lambert
>> Senior Software Developer
>>
>>
>>
>>
>> COMPAC SORTING EQUIPMENT LTD | 4 Henderson Pl | Onehunga | Auckland
>> 1061 | New Zealand
>> Switchboard: +49 2630 96520 | https://www.tomra.com
>>
>> The information contained in this communication and any attachment is confidential and may be legally privileged. It should only be read by the person(s) to whom it is addressed. If you have received this communication in error, please notify the sender and delete the communication.
>> -----Original Message-----
>> From: Gary Grobe
>> Sent: Saturday, 14 December 2019 11:19
>> To: etherlab-users at etherlab.org
>> Subject: [etherlab-users] Unmatched/Skipped Datagrams
>>
>> My configuration is:
>> - Linux 4.19.82-rt30 #1 SMP PREEMPT RT x86_64 GNU/Linux
>> - IgH EtherCAT master 1.5.2 6258f7d85afb+
>> - Intel(R) Celeron(R) CPU J1900 @ 1.99GHz
>> - ec_generic based on the following NIC:
>>
>> 03:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
>> Subsystem: Intel Corporation I211 Gigabit Network Connection [8086:0000]
>> Kernel driver in use: igb
>>
>> The app is a user-space client. The following is the cyclic task cut down to a minimum, and yet still shows the issue.
>>
>> {
>> // ...
>>
>> struct sched_param param = { };
>>
>> param.sched_priority = sched_get_priority_max(SCHED_FIFO);
>>
>> if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
>> std::cout << "Warning: Failed to lock memory: " <<
>> strerror(errno) <<
>> std::endl;
>> }
>>
>> stack_prefault();
>>
>> clock_gettime(CLOCK_MONOTONIC, &wakeup_time);
>> wakeup_time.tv_sec += 1;
>> wakeup_time.tv_nsec = 0;
>>
>> while (running) {
>> ret = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &wakeup_time,
>> NULL);
>>
>> if (ret) {
>> std::cerr << "clock_nanosleep(): " << strerror(ret) <<
>> std::endl;
>>
>> break;
>> }
>>
>> task_cycle();
>>
>> wakeup_time.tv_nsec += PERIOD_NS;
>>
>> while (wakeup_time.tv_nsec >= NSEC_PER_SEC) {
>> wakeup_time.tv_nsec -= NSEC_PER_SEC;
>> wakeup_time.tv_sec++;
>> }
>> }
>> }
>>
>> inline void TestChain::task_cycle()
>> {
>> ecrt_master_receive(master);
>> ecrt_domain_process(domain);
>>
>> ecrt_master_application_time(master, TIMESPEC2NS(wakeup_time));
>> ecrt_master_sync_reference_clock(master);
>> ecrt_master_sync_slave_clocks(master);
>>
>> if (counter) {
>> counter--;
>> }
>> else {
>> counter = FREQUENCY;
>> }
>>
>> ecrt_domain_queue(domain);
>> ecrt_master_send(master);
>> }
>>
>> The following messages from dmesg were taken at 1kHz. Doesn't matter if it's doing 10Hz or 1kHz, the UNMATCHED and SKIPPED numbers are quite large. I've only seen 1Hz work w/o issue.
>>
>> [ 4132.457210] EtherCAT 0: Domain 0: 268 working counter changes - now 1/2.
>> [ 4133.373743] EtherCAT WARNING 0: 657 datagrams UNMATCHED!
>> [ 4133.461587] EtherCAT WARNING: Datagram 000000002f1447da
>> (domain0-0-main) was SKIPPED 736 times.
>> [ 4133.461593] EtherCAT 0: Domain 0: 395 working counter changes - now 0/2.
>> [ 4134.374131] EtherCAT WARNING 0: 643 datagrams UNMATCHED!
>> [ 4134.465961] EtherCAT WARNING: Datagram 000000002f1447da
>> (domain0-0-main) was SKIPPED 776 times.
>> [ 4134.465968] EtherCAT 0: Domain 0: 232 working counter changes - now 0/2.
>> [ 4135.374332] EtherCAT WARNING 0: 850 datagrams UNMATCHED!
>> [ 4135.470468] EtherCAT WARNING: Datagram 000000002f1447da
>> (domain0-0-main) was SKIPPED 722 times.
>> [ 4135.470476] EtherCAT 0: Domain 0: 331 working counter changes - now 1/2.
>> [ 4136.374860] EtherCAT WARNING 0: 649 datagrams UNMATCHED!
>> [ 4136.475180] EtherCAT WARNING: Datagram 000000002f1447da
>> (domain0-0-main) was SKIPPED 858 times.
>> [ 4136.475187] EtherCAT 0: Domain 0: 282 working counter changes - now 1/2.
>> [ 4137.375178] EtherCAT WARNING 0: 757 datagrams UNMATCHED!
>> [ 4138.319869] EtherCAT WARNING: Datagram 000000002f1447da
>> (domain0-0-main) was SKIPPED 745 times.
>> [ 4138.319877] EtherCAT 0: Domain 0: 208 working counter changes - now 1/2.
>> [ 4138.375593] EtherCAT WARNING 0: 972 datagrams UNMATCHED!
>> [ 4139.324269] EtherCAT WARNING: Datagram 000000002f1447da
>> (domain0-0-main) was SKIPPED 911 times.
>>
> --
> _The contents of this email message and any attachments are intended solely for the addressee(s). This message (including any attachments) may contain confidential, proprietary, privileged and/or private information. The information is intended to be for the use of the individual or entity designated above. If you are not the intended recipient of this message, please notify the sender immediately, and delete the message and any attachments. _Any disclosure, reproduction, distribution or other use of this message or any attachments by an individual or entity other than the intended recipient is prohibited. __ _______________________________________________
> etherlab-users mailing list
> etherlab-users at etherlab.org
> https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.etherlab.org%2Fmailman%2Flistinfo%2Fetherlab-users&data=02%7C01%7Cgavin.lambert%40tomra.com%7Cd8107ac890f84bfecff508d782748ed6%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637121309444445873&sdata=R2bB6zC6lYaR7mcWdONQRGmrG%2FvZICVl3b0ef6aBXHQ%3D&reserved=0
> _______________________________________________
> etherlab-users mailing list
> etherlab-users at etherlab.org
> https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.etherlab.org%2Fmailman%2Flistinfo%2Fetherlab-users&data=02%7C01%7Cgavin.lambert%40tomra.com%7Cd8107ac890f84bfecff508d782748ed6%7C4308d118edd143008a37cfeba8ad5898%7C0%7C0%7C637121309444445873&sdata=R2bB6zC6lYaR7mcWdONQRGmrG%2FvZICVl3b0ef6aBXHQ%3D&reserved=0
--
_The contents of this email message and any attachments are intended solely
for the addressee(s). This message (including any attachments) may contain
confidential, proprietary, privileged and/or private information. The
information is intended to be for the use of the individual or entity
designated above. If you are not the intended recipient of this message,
please notify the sender immediately, and delete the message and any
attachments. _Any disclosure, reproduction, distribution or other use of
this message or any attachments by an individual or entity other than the
intended recipient is prohibited. __
More information about the Etherlab-users
mailing list