[etherlab-dev] Meaning of FSM return values and "datagram_used"

Mon Jun 30 09:43:05 CEST 2014

Hi Florian,

In 9cdd7669dc0b, the return value of the protocol-specific FSMs
(fsm_coe_exec, fsm_foe_exec, etc) was apparently changed from returning 0
when the FSM had completed to returning 0 when the datagram was not used
(which could also include when the datagram was still in INIT, QUEUED, and
SENT states).

However most of the places that actually call the FSMs seem to fairly
uniformly assume that returning zero indicates completion of the FSM.  Many
of the FSMs that call fsm_coe in particular themselves still claim to return
zero only on completion of their FSM.  As a result, some operations are not
actually completed.

In particular, I found a case (intermittently but repeatedly reproducible)
where the SDO dictionary scan was in progress, but a particular datagram was
delayed as a result of the master being released (and consequent transfer of
sending responsibility from operation thread to idle thread).  This caused
the scan to be abandoned for that particular slave and then resumed for the
next one.  I had Frank's "coe-lock" patch applied at the time (since this
scenario also demonstrated CoE concurrency), so since the CoE FSM was
abandoned and restarted instead of being run to completion it resulted in a
dangling lock and thus deadlocked shortly thereafter.  (I have a more
detailed analysis of this if you're interested.)

(Note that Frank's original patches are not vulnerable to this as the issue
was introduced between 1.5.0 and 1.5.2.)

As a temporary hack I tried making fsm_coe_exec also return 1 in the case
where the new datagram was not used as a result of the old datagram still
being INIT/QUEUED/SENT (ie. only return 0 when actually finished), and this
appears to have resolved the problem (but presumably not in the "right
way").  I didn't notice anything obviously bad happen as a result of this
change (according to behaviour and logs) but I didn't check everything.

Just for comparison, I've attached the patch I used for the hack.  It's
based on the patch series I posted last week (a subset of Frank's patches
updated for 1.5.2), and the most important change is near the top where it
checks fsm->datagram->state and uses the opposite return value.  Obviously a
complete patch should probably change the other FSMs (eg. FoE) similarly,
but like I said before I'm not sure this is the best way to do it, and also
the CoE FSM is the only one that gets called from so many different places,
so the others are probably less vulnerable.

This was for a separate reason, but both Frank's and Knud's patches have
introduced an INVALID state for datagrams in order to be able to return 1
from FSMs (to indicate non-completion) but not actually send a new datagram,
which again might not be the "right" way to do it but does seem to be a
common need.

Regards,
Gavin Lambert

-------------- next part --------------
A non-text attachment was scrubbed...
Name: coe-finish.patch
Type: application/octet-stream
Size: 1879 bytes
Desc: not available
URL: <http://lists.etherlab.org/pipermail/etherlab-dev/attachments/20140630/7f779d62/attachment-0003.obj>