[etherlab-dev] Problems with Xenomai

Fri Sep 30 14:33:08 CEST 2016

Hi Gavin,

thanks for the answer.

On 09/29/2016 01:21 AM, Gavin Lambert wrote:
> Given that stack trace, and that it works on default but not on 1.5.2, the
> commit that worked around the issue for you was most likely
> https://sourceforge.net/p/etherlabmaster/code/ci/3affe9cd0b66fe55ef8e8060778ef9461a8204a0.
>
> Having said that, the only reason I can think of for this to segfault is
> strerror() returning NULL or an invalid pointer, which suggests that you
> might have a broken or badly configured libc.  If you're building the libc
> yourself, make sure that you're using an up-to-date version and haven't
> excluded the strerror text.
>
> Another possibility is that, if you were concurrently calling strerror() on
> another thread (and your libc doesn't implement strerror in a thread-local
> manner), the shared buffer could have been corrupted.  Most likely another
> patch would be required to resolve this "properly", although one workaround
> is to avoid calling ecrt_* APIs from more than one thread.
>
> Although, since you're linking against RTDM, I suppose it's possible that
> strerror() is coming from there rather than from the libc; I'm not exactly
> sure how RTAI/Xenomai work.  It's also possible that in that context the
> fprintf(stderr) itself is failing -- but this isn't new code, so I would
> have thought the problem would have come up earlier if that were the case.
>
> I'm not sure exactly which commit 1.5.2 is based on, but it will be one of
> the ones in the "stable-1.5" branch.  Everything on "default" is newer than
> that.
My test application has only one Xenomai task (thread), like the xenomai
example, so I don't think this is a concurrency problem unless a thread of
the master itself is involved. My libc is rather old, though (2.13).
Unfortunately, no newer version has been backported to Debian wheezy, and I
don't want to install it from source, since it won't be available in our
working environments anyway. I will leave this matter for now, since it
seems to have been fixed, or at least worked around, already. I needed to
know when this was fixed so that I can add the fix as a patch to our Debian
package of the EtherCAT master.
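For reference, a minimal sketch of the thread-safe alternative to strerror() discussed above, assuming the crash comes from strerror() handing out a shared static buffer: the message is formatted with the XSI-compliant strerror_r() into a caller-owned buffer before it reaches fprintf(stderr, ...). The helpers error_text() and report_error() are hypothetical and are not taken from the EtherCAT master sources.

```c
/* Request the XSI-compliant strerror_r(), which returns int. */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the message for errnum into a caller-owned
 * buffer, so concurrent callers never share strerror()'s static storage. */
const char *error_text(int errnum, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (strerror_r(errnum, buf, len) != 0) {
        /* strerror_r() failed (unknown errnum or buffer too small):
         * fall back to printing the raw error number. */
        snprintf(buf, len, "error %d", errnum);
    }
    return buf;
}

/* Hypothetical perror()-style reporter: capture errno first, because
 * later library calls may overwrite it. */
void report_error(const char *what)
{
    int saved = errno;
    char buf[128];
    fprintf(stderr, "%s: %s\n", what, error_text(saved, buf, sizeof buf));
}
```

It would be called like perror(), directly after a failing call. Note that with _GNU_SOURCE defined, glibc substitutes a GNU variant of strerror_r() that returns char *, so the feature-test macro at the top matters.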

>> #2.)
>> I did some minor tests with the patch queue and got some bad system
>> freezes with the xenomai example. I could locate the patch that seems to
>> cause the system freezes:
>> 0011-Master-locks-to-avoid-corrupted-datagram-queue.patch
>> The only notable thing I could see in the kernel log is that the slaves went
>> back to PREOP. The Xenomai task was still running and hanging at some point
>> in the cycle (I placed an rt_printf in the cycle which should have printed
>> the cycle_counter value every other second).
>> The patch series seems to work if I apply the patches up to
>> 0010-Sdo-directory-now-only-fetched-on-request.patch. Is this reproducible
>> for you?
> I'm not sure about this as I don't use Xenomai myself.  That particular
> patch was authored by Knud Baastrup, so I've added him to the email chain
> directly just in case.  If I recall correctly, he, like me, was using
> PREEMPT_RT, so it's possible that this has not been tested with Xenomai.
>
> Do you have locking on the Xenomai side as well?  Do you call ecrt APIs from
> multiple Xenomai tasks?  I believe the patch assumes that there is no
> external locking between tasks, so you might be running into deadlocks
> depending on the order in which things happen.
>
> Using Linux locks between Xenomai tasks is probably not ideal, but I would
> have expected that it ought to work as this occurs in other places as well.
This problem occurred with the xenomai example (./examples/xenomai in the
master's source code) as well. There is only one Xenomai task and no
explicit locking on the application side. I am new to Xenomai, but as far
as I understand it, Xenomai uses a 'dual kernel' configuration whose
'cobalt core' has higher priority than the normal kernel and does all the
scheduling of realtime tasks (see
https://xenomai.org/start-here/#How_does_Xenomai_deliver_real-time). A
running Xenomai task should therefore keep every task in the normal
(Linux) kernel from executing until it has finished. My guess is that the
task waits indefinitely for a lock on a master component to be released by
another thread in kernel space, which never happens because that thread is
never scheduled while the higher-priority Xenomai task is running.
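The suspected failure mode (a higher-priority task blocking on a lock that only a starved lower-priority context can release) can be illustrated in a generic form. The sketch below is plain POSIX user-space code, not the master's kernel-side locking and not Xenomai's Cobalt API, and whether it matches the hang introduced by patch 0011 is unverified; struct shared_state and rt_cycle() are hypothetical names. It shows one common way to keep a cyclic real-time task from hanging in that situation: try the lock once per cycle and skip the exchange if it is busy, instead of waiting indefinitely.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical state shared between the cyclic real-time task and a
 * lower-priority (non-real-time) context. */
struct shared_state {
    pthread_mutex_t lock;
    unsigned long cycles_skipped; /* assumed to be written only by the cyclic task */
};

/* One cycle of the real-time task. A blocking pthread_mutex_lock() here
 * could wait forever if the lock holder is never scheduled while this
 * task runs; pthread_mutex_trylock() returns EBUSY immediately instead. */
bool rt_cycle(struct shared_state *s)
{
    assert(s != NULL);
    int rc = pthread_mutex_trylock(&s->lock);
    if (rc == EBUSY) {
        s->cycles_skipped++;  /* lock held elsewhere: skip this cycle */
        return false;
    }
    if (rc != 0)
        return false;         /* any other error: also skip this cycle */

    /* ... exchange process data with the shared state here ... */

    pthread_mutex_unlock(&s->lock);
    return true;
}
```

Skipping a cycle trades one missed process-data exchange for bounded latency; counting the skips makes a persistent stall visible instead of leaving the task silently hanging.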

Best regards,
Christoph

________________________________

Helmholtz-Zentrum Berlin für Materialien und Energie GmbH

Mitglied der Hermann von Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V.

Aufsichtsrat: Vorsitzender Dr. Karl Eugen Huthmacher, stv. Vorsitzende Dr. Jutta Koch-Unterseher
Geschäftsführung: Prof. Dr. Anke Rita Kaysser-Pyzalla, Thomas Frederking

Sitz Berlin, AG Charlottenburg, 89 HRB 5583

Postadresse:
Hahn-Meitner-Platz 1
D-14109 Berlin

http://www.helmholtz-berlin.de