[etherlab-dev] wait_event() causes uninterruptible_sleep

Gavin Lambert gavin.lambert at tomra.com
Wed Jan 29 23:50:57 CET 2020


I'm not entirely sure, but I don't think simply changing that would be safe.

The whole "on interrupt return -EINTR" thing assumes that it's safe to simply make the exact same call again to "resume" the operation.  This is true for the first wait because it's just waiting for the request to be picked up from the queue, and on interrupt it simply dequeues it again.  However, after that there's a race: the request might already have been sent and be waiting for a response, and in that case it's not safe to return -EINTR because the retry might send it a second time, which could cause incorrect behavior of the slave.  (And would probably also confuse the mailbox FSM.)
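
The distinction above can be modelled in plain user-space C. This is only a sketch and not the master's code: the state names, the `request_t` struct, `fsm_send()` and `interrupt_wait()` are all hypothetical, chosen to show why an interrupt is harmless while the request is still queued but not once it has gone out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request states for illustration; not the master's real enum. */
typedef enum { REQ_QUEUED, REQ_SENT, REQ_DONE } req_state_t;

typedef struct {
    req_state_t state;
    bool on_queue;      /* still linked into the pending-request list */
    int times_sent;     /* how often the slave has seen this request */
} request_t;

/* Simulates the master FSM taking the request off the queue and sending it. */
void fsm_send(request_t *req)
{
    req->on_queue = false;
    req->state = REQ_SENT;
    req->times_sent++;
}

/* Simulates a signal arriving while the caller is waiting.
 * Returns -EINTR only when the request can be withdrawn without side effects;
 * returns 0 when the caller has to keep waiting for the response. */
int interrupt_wait(request_t *req)
{
    if (req->state == REQ_QUEUED) {
        req->on_queue = false;   /* dequeue: the slave never saw it */
        return -EINTR;           /* a retry would send it exactly once */
    }
    return 0;                    /* already sent: a retry would send it twice */
}
```

In this model the second branch is the case where an interruptible wait would not be safe, since returning -EINTR there invites the caller to repeat a transfer the slave has already received.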

It might be possible to abort the request on interrupt instead, but that would be annoying as thread signals can cause spurious interrupts.  (And might still end up meaning the slave will receive requests twice, if the app then explicitly retries.)


If you instead explicitly close(masterfd) (aka ecrt_release_master) in your problem case, this should abort all pending requests and wake up the threads - you can see the code that does this in ec_slave_clear and ec_master_clear_slaves.

(The OS will automatically do this when your process actually terminates, but not while you still have a live thread.  So you will have to use an exception/signal handler to intercept the crash in progress.)
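
A minimal sketch of such a handler, using only POSIX calls. It assumes the application can get hold of the master's file descriptor; `g_master_fd`, `close_master_fd()`, `crash_handler()` and `install_crash_handler()` are hypothetical names, and a plain close() is used because it is async-signal-safe. Whether closing the descriptor really wakes the blocked thread is what is expected above and has not been verified here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for sigaction() under strict -std=c99 */
#endif
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical global: descriptor of the opened master device. */
static volatile sig_atomic_t g_master_fd = -1;

/* Closes the master descriptor once; only uses async-signal-safe calls. */
void close_master_fd(void)
{
    int fd = g_master_fd;
    if (fd >= 0) {
        g_master_fd = -1;
        close(fd);
    }
}

/* Crash handler: release the master, then re-raise the signal with its
 * default action so the process still terminates (and can dump core). */
void crash_handler(int sig)
{
    close_master_fd();
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Installs crash_handler for one signal. Returns 0 on success, -1 on error. */
int install_crash_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

The idea would be to store the descriptor in `g_master_fd` at startup and call `install_crash_handler()` for each fatal signal of interest (SIGSEGV, SIGABRT, SIGBUS and so on).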

Another option is to use the non-blocking SDO request APIs instead.  Using these (on the cyclic thread) is better anyway for regular transfers done while the master is activated, as it avoids ping-ponging the master locks between multiple threads, which can increase cycle latency.
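
The polling pattern looks roughly like the sketch below. To keep it self-contained it uses stand-in types rather than the real library: `sdo_request_t`, `rq_state_t`, `request_state()`, `request_read()` and `poll_sdo()` are hypothetical and only mirror the shape of the ecrt.h request calls (ecrt_sdo_request_state(), ecrt_sdo_request_read(), ecrt_sdo_request_data()).

```c
#include <stdint.h>

/* Stand-in for the library's request state; these names are hypothetical. */
typedef enum { RQ_UNUSED, RQ_BUSY, RQ_SUCCESS, RQ_ERROR } rq_state_t;

/* Stand-in for an SDO request object owned by the master. */
typedef struct {
    rq_state_t state;
    uint16_t data;
} sdo_request_t;

/* Stand-ins for "query state" and "trigger an upload". */
rq_state_t request_state(const sdo_request_t *r) { return r->state; }
void request_read(sdo_request_t *r) { r->state = RQ_BUSY; }

/* Called once per cycle from the cyclic thread; never blocks.
 * Returns 1 and stores the value in *out when a new result arrived,
 * otherwise returns 0. */
int poll_sdo(sdo_request_t *r, uint16_t *out)
{
    switch (request_state(r)) {
    case RQ_BUSY:
        return 0;                /* transfer in flight: check again next cycle */
    case RQ_SUCCESS:
        *out = r->data;          /* consume the result ... */
        request_read(r);         /* ... and start the next upload */
        return 1;
    case RQ_UNUSED:
    case RQ_ERROR:
    default:
        request_read(r);         /* (re)start the upload */
        return 0;
    }
}
```

Because the cyclic thread only ever checks a state and returns, there is no thread left sleeping inside the kernel if the application crashes mid-transfer.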


Gavin Lambert
Senior Software Developer


From: Geller, Nir
Sent: Wednesday, 29 January 2020 23:31
To: etherlab-dev at etherlab.org
Subject: [etherlab-dev] wait_event() causes uninterruptible_sleep

Hi There,

We are working with EtherLab's EtherCAT master and have recently encountered a problem related to a non-interruptible wait_event().

The scenario:
A multi-threaded user space app cyclically reads an SDO from an EtherCAT slave.
The user space app then crashes.
All the threads end besides the one that performs the SDO read:

.....
1022  1022 TS       -   0  19   0  0.0 Zl   task_dead                abcde <defunct>
1022  1202 RR       2   -  42   0  0.6 Dl   ecrt_master_sdo_upload   abcde1
.....

This situation interferes with debugging the app, and prevents a core dump from being generated.

In master.c, in ecrt_master_sdo_upload(), I see a call to wait_event_interruptible() followed by a call to wait_event().

After changing wait_event() to wait_event_interruptible() the app can successfully crash, and it is now easier to debug.

Needless to say, we need a core dump to be generated when the app crashes at a customer's site.

The question is what is the reason behind using wait_event() instead of wait_event_interruptible() ?

Is it safe for us to change the code?

Thanks,

Nir.