[etherlab-users] Testing 1.5 branch: several issues

Wed Jun 29 17:58:27 CEST 2011

Hi Florian,

> > - Also, the problem with the debug interface during RTAI PDO
> >   transfer still exists
> >   (http://lists.etherlab.org/pipermail/etherlab-users/2011/001205.html),
> >   although the behaviour is a little different: It doesn't give a
> >   "Kernel BUG" anymore, but "Default Trap Handler: vector 6: Suspend
> >   RT task f8840880" (and the cyclic task indeed gets suspended). But
> >   the same patch as before fixes it
> >   (ethercat-1.5-debug-disable.patch).
> 
> Im not sure if it makes sense to enable the debug interfaces, if not all
> data are forwarded to it. Ome might get a wrong opinion about a system
> when there are the most frequent frames missing.

That's true. On the other hand, with the code as is, I have the
choice between never using the debug interface at all or risking to
forget to shut it down before starting a RTAI PDO task and crashing
the system which is a little annoying. ;-) With my patch, I can at
least use the debug interface while PDO transfer is not running,
without taking this risk.

Of course, I could disable the debug interface completely while the
RTAI task is running. Then it would be obvious that it's disabled
because there would be no frames visible. But it would still be
safer than it's now.

If you like, I can implement it this way. Actually, it would only be
a change in the demo program itself (put the ec_debug_disable()
calls in init_mod()/cleanup_mod() rather than run()).

Or the master could disable it automatically when
ecrt_master_activate() is called. Then it would be guaranteed --
though I think the explicit ec_debug_disable() call is more
flexible, but of course puts more responsibility on the application
programmer to use it reasonably.

> > - examples/mini/ gives the same syslog error ("scheduling while
> >   atomic") immediately upon "insmod". The error persists if I remove
> >   everything from cyclic_task() except for a single down() and up()
> >   call (but disappears if I remove these statements too).
> > 
> >   I see that the cyclic task is registered with add_timer(). Indeed
> >   according to http://www.makelinux.net/ldd3/chp-7-sect-4, such
> >   timer callbacks run in atomic context (interrupt context in fact)
> >   and therefore "Semaphores also must not be used since they can
> >   sleep."
> > 
> >   A solution might be to also revert the semaphore back to a
> >   spinlock as it was in 1.4.0. However, I'm not sure if any of the
> >   other functions called by cyclic_task() can sleep (or do anything
> >   else that's forbidden in interrupt context). If so, it might be
> >   easier to avoid the timer callback altogether and convert it to a
> >   kernel thread with sleeps or so.
> > 
> >   The same probably applies to examples/tty/ which I didn't test and
> >   possibly to tty/module.c.
> 
> If the timer callback runs in interrupt context, it will be probably the
> best solution tu use a spinlock.

OK. Since I'm not using this code myself (only RTAI processes), I'm
not making this change now. I hope you'll do it in the future.

> > - I wonder whether the use of RTAI semaphores in the master
> >   callbacks in the examples (e.g. examples/rtai/) is safe. (Since my
> >   own application also uses RTAI, in the same style as those
> >   examples, this is an important question for me, not just
> >   hypothetical.) I found this message:
> >   http://blog.gmane.org/gmane.linux.real-time.rtai/month=20020801
> >   "Any nonblocking function can be used from Linux, typically
> >   rt_sem_signal and rt_task_resume". The message is a bit dated, and
> >   he doesn't strictly say that blocking functions cannot be used
> >   from Linux (i.e., non-RTAI kernel modules), but it might be
> >   implied. Therefore the use of rt_sem_wait() in the callbacks might
> >   be problematic.
> >
> >   [...]
> 
> Did you encounter any problems with the RTAI example?

Honestly, I can't tell. I get so many crashes and other failures due
to the other problems mentioned that I don't know if some of them
are due to this also.

I now asked on the RTAI mailing list
(https://mail.rtai.org/pipermail/rtai/2011-June/024614.html).
According to the replies I got, a non-RT task that wants to use all
RTAI functions needs to be initialized with rt_kthread_init() (and
stopped with rt_task_delete()).

In this case, this would be the EoE task which (at least for now) is
the only one that uses the callbacks. At the moment, it's started
with kthread_run() from ec_master_eoe_start() and stopped with
kthread_stop().

One could e.g. let the application set some more callbacks to
provide its own task start/stop functions and one to replace the
kthread_should_stop() call in ec_master_eoe_thread(), e.g., by
providing a function ecrt_master_callbacks_ext() or so which takes
the additional callbacks, keeping the interface of
ecrt_master_callbacks() unchanged.

The master could use the new callbacks when it restarts/stops the
EoE thread in ecrt_master_[de]activate(). It could also use them to
start/stop the master operation thread -- ATM, there seems to be no
need since it doesn't use the callbacks, but AFAIK it doesn't hurt
to start them as (non-RT) RTAI tasks, and it would be more flexible
WRT future extensions.

If you agree, I can try to implement that.

> It is rarely
> modified and uses this mechanism quite for a while. We never saw aqny
> problems concerning the locking mechanism...

Perhaps due to the t_critical check? Anyway, if the RTAI author says
it's not safe this way and recommends to use rt_kthread_init(), I'd
rather do that than take any chances. Locking-related bugs
(particularly in RT code) are not my favourite kind of bugs to
debug, especially if they occur rarely. ;-)

> > - In my application I need the ability to read/write arbitrary SDOs
> >   (from the non-realtime, but kernel-module part of the code, so
> >   going through the cdev would be awkward; however at a time when
> >   the master is already activated and running).
> > 
> >   I thought I could just do an
> >   ecrt_slave_config_create_sdo_request() when needed. I'm not sure
> >   if this function is supposed to be used while the master is
> >   running (I couldn't find a statement that forbids it anyway).
> >   However, there is no corresponding "delete" function, so used-up
> >   SDO requests would accumulate and leak.
> > 
> >   I see three possible solutions:
> > 
> >   - Implement ecrt_slave_config_delete_sdo_request() or such. Is
> >     there more to it than basically doing
> >     "list_del(&req->list, &sc->sdo_requests);" after appropriate
> >     checks that the request is not busy etc.? And if done, would it
> >     be possible/reasonable to use
> >     ecrt_slave_config_create_sdo_request() while the master is
> >     running?
> > 
> >   - Allow changing the index and subindex of an existing request (so
> >     I could create some requests on startup and reuse them for
> >     arbitrary SDOs -- I only need a fixed number of them
> >     simultaneously). This seems to match the TODO list item: "Change
> >     SDO index at runtime for SDO request." Is there more to it than
> >     calling ec_sdo_request_address() (again, after appropriate
> >     checks)?
> 
> No, it's as easy as that. The reason for holding that back was just to
> avoid another interface change in 1.5. Give it a try.
> 
> >   - Implement ecrt_master_sdo_{down,up}load() also for (non-RT)
> >     kernel access. Of course, this could be implemented simply on
> >     top of ecrt_slave_config_create_sdo_request() etc. if either of
> >     the previous two solutions were implemented, but then the master
> >     has to call ecrt_sdo_request_read() etc. (as in read_sdo() in
> >     examples/mini/mini.c), so it would have to know about the SDO
> >     requests which might be a problem in the general case (in my own
> >     application probably not -- I know which SDO requests I have and
> >     can let my master know about them).
> 
> I personally dislike the SDO handler interface. My idea is to drop it
> completely and just use the (blocking) ad-hoc SDO transfer functions.
> SDO transfers are asynchronous anyway, so perhaps its the best way to
> let the user decide , how to cope with the blocking (perhaps spawn a
> special asynchronous thread for this).

This would be fine for me. I use them in a blocking way anyway. So I
won't do the first two items because they apply to the non-blocking
interface.

> For this, also a
> slave-config-based transfer functions had to be implemented.
>
> >     However, I see that this is apparently not required for the way
> >     the cdev does SDO transfers, so I tried to adjust this code for
> >     in-kernel use and implemented ecrt_master_sdo_{down,up}load() in
> >     slave_config.c. I'm not sure if it's intended to be used this
> >     way, and I wonder especially about the usage of
> >     ec_master_find_slave() (apparently the user-space code uses
> >     different ways to identify a slave -- although the tool accepts
> >     alias and position as command-line arguments, it only passes the
> >     position to the ioctl; however, in the ec_slave_config_t struct
> >     used in the kernel we have both alias and position; but since
> >     ec_master_find_slave() accepts alias and position, this might be
> >     alright). The code is not commented etc., since I'm not yet sure
> >     if that's the right way to go
> >     (ethercat-1.5-sdo-up-download.patch).

So that's what you also like, right? If you want, I can comment and
document them, so they can hopefully be included.

What's important for me is the interface, so if I write my
application according to this interface now, I don't want to have to
change it later. So do you agree with this interface, or do you want
any changes?

A minor thing: I'd like to be able to clean up all built files (so
after making some changes I can see exactly what I changed). AFAICS,
"make mrproper" comes closest, but misses a few. With the attached
patch to the Makefile, it will revert it exactly to the downloaded
state after building. (The "find" call is a little nasty, it's
necessary because documentation/ uses different Makefiles etc.)

Regards,
Frank

-- 
Dipl.-Math. Frank Heckenbach <f.heckenbach at fh-soft.de>
Systemprogrammierung, EDV-Beratung
Stubenlohstr. 6, 91052 Erlangen, Deutschland
Tel.: +49-9131-21359
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ethercat-1.5-mrproper.patch
Type: text/x-diff
Size: 469 bytes
Desc: not available
URL: <http://lists.etherlab.org/pipermail/etherlab-users/attachments/20110629/e22bd8f5/attachment-0005.patch>