[etherlab-users] Error reassigning removed PDO

Jun Yuan j.yuan at rtleaders.com
Thu May 29 14:33:21 CEST 2014


Thank you so much. After reading your mail, I finally understand why some
slaves go to the SAFEOP+ERROR state under these circumstances. Yes, I had
exactly the same problem.


On 29 May 2014 11:24, Gavin Lambert <gavinl at compacsort.com> wrote:

> It’s mostly a master problem I think, although some of the worst
> misbehaviour requires particular functionality in the slave (which may be
> rarer).
>
>
>
> The main problem that I’ve personally run into recently (and coded my own
> workaround for, just a few minutes ago) was from this scenario:
>
> 1.       Master starts up, starts doing slave scanning.
>
> 2.       Application starts up, calls ecrt_request_master, which waits
> for slave scanning to complete before returning.
>
> 3.       Application sets up basic configuration and calls
> ecrt_master_activate.
>
> 4.       Slaves wind their way up to OP.
>
> 5.       Meanwhile in the background the master starts reading the CoE
> dictionary and getting entry descriptions to fill in the names.  (This
> takes quite a long time.)
>
> 6.       Application decides something is screwy while this is still
> happening and calls ecrt_master_release and unloads the master module.
>
> 7.       Since the master stops dead when this happens, occasionally it
> has just sent a CoE Info request to a slave but abandoned waiting for the
> response.  The response is still sitting there in the slave’s mailbox.  The
> slaves have dropped back to SAFEOP+ERROR because they’re no longer
> receiving data.
>
> 8.       The master service and application are reloaded.
>
> 9.       The initial scan sees the slaves at >= PREOP so merely
> acknowledges the error and leaves them at SAFEOP, then starts to read
> SM+PDOs.
>
> 10.   When it gets to the slave that had a stale SDO Info response in its
> mailbox (which is still there, because the slave was never sent back to
> INIT), the master gets confused because what it reads is not the SDO 0x1C12
> data response it was expecting (having just sent that request); it aborts
> the request and assumes 0 PDOs in that SM.  Hilarity ensues, as I’ve
> already outlined below.
>
>
>
> (This can also occur if the network is disconnected but not unpowered at
> any time during the CoE dictionary scan, then reconnected later.)
>
>
>
> Note that it’s reasonable for the scan to not reset to INIT, because
> rescans can occur during operation (although having said that, I haven’t
> looked too closely at whether this disrupts anything).  But I think it’s
> definitely a master-side bug that it can’t cope with stale responses –
> that’s just something you always have to expect with mailboxes, especially
> when there are timeouts involved as well.
>
>
>
> My workaround was to change the CoE FSM to check for and discard any stale
> data in the mailbox prior to beginning any CoE operation.  It seemed to
> resolve the above issue in a very basic test, but I’ll hopefully know more
> after a more thorough one tomorrow.
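
A minimal sketch of the workaround described above, written in Python for brevity. It is only a model, not the actual EtherLab CoE FSM code: `FakeSlave`, `sdo_upload` and `demo` are hypothetical names, and the mailbox is simplified to a plain queue of tuples.

```python
from collections import deque


class FakeSlave:
    """Hypothetical stand-in for one slave's outgoing mailbox."""

    def __init__(self, objects):
        self.objects = objects   # {(index, subindex): value}
        self.outbox = deque()    # responses not yet fetched by the master

    def request(self, kind, index, subindex=0):
        # The slave answers every request by queueing one response.
        value = self.objects.get((index, subindex))
        self.outbox.append((kind, index, subindex, value))

    def fetch(self):
        # Return the oldest pending response, or None if the mailbox is empty.
        return self.outbox.popleft() if self.outbox else None


def sdo_upload(slave, index, subindex=0, discard_stale=True):
    """Read one SDO; returns (value, list of discarded stale messages)."""
    discarded = []
    if discard_stale:
        # Workaround from the thread: empty the mailbox before starting.
        while True:
            stale = slave.fetch()
            if stale is None:
                break
            discarded.append(stale)
    slave.request("sdo_upload", index, subindex)
    reply = slave.fetch()
    if reply[:3] != ("sdo_upload", index, subindex):
        # Unexpected message: abort and assume 0, as in step 10 above.
        return 0, discarded
    return reply[3], discarded


def demo(discard_stale):
    slave = FakeSlave({(0x1C12, 0): 2})
    slave.request("sdo_info", 0x1000)   # response abandoned by the old master
    return sdo_upload(slave, 0x1C12, discard_stale=discard_stale)
```

Without the drain, `demo(False)` reads the stale SDO Info reply and reports 0 PDOs; with it, `demo(True)` returns the real count of 2 together with the one discarded message.
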
>
>
>
> It’s not an ideal solution, of course; the underlying problem (which I
> hinted at below, and posted in more detail about several months ago) is
> that the Etherlab code assumes that only one thing is going on in the
> mailboxes at a time, and so only checks them when it’s expecting a response
> and throws its virtual hands up when it finds something other than what it
> wanted.  This is particularly noticeable if a slave sends asynchronous
> notifications, or can process multiple mailbox protocols in parallel (both
> of which are allowed in the standards).  The most common types of these are
> CoE emergencies and EoE.  And woe betide you if the master happens to be
> handling a FoE request when an emergency arrives, or a CoE request when an
> EoE packet arrives, etc.
>
>
>
> Ideally the master should have some sort of central dispatcher which is
> constantly watching mailboxes and handing off incoming data to the protocol
> state machines as they arrive.  Often this can even be done for “free” –
> many slaves provide a dedicated “MBoxState” FMMU that can be used to watch
> for new mailbox messages as part of the regular process datagram, avoiding
> the need to individually poll the slaves.
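
The central dispatcher idea above could look roughly like this. It is a Python sketch under stated assumptions, not a design taken from the EtherLab sources: `MailboxDispatcher` and its methods are hypothetical, the inbox is modelled as a queue of `(type, payload)` pairs, and the type values are the mailbox header protocol numbers as I understand them (worth checking against the specification).

```python
from collections import deque
from enum import IntEnum


class MboxType(IntEnum):
    """Mailbox protocol type values (assumed from the mailbox header)."""
    EOE = 0x02
    COE = 0x03
    FOE = 0x04


class MailboxDispatcher:
    """Routes every incoming mailbox message to its protocol handler."""

    def __init__(self):
        self.handlers = {}    # MboxType -> callable taking the payload
        self.unclaimed = []   # messages nobody registered for

    def register(self, mbox_type, handler):
        self.handlers[mbox_type] = handler

    def poll(self, inbox):
        # Drain the inbox; no protocol has to cope with another's traffic.
        while inbox:
            mbox_type, payload = inbox.popleft()
            handler = self.handlers.get(mbox_type)
            if handler is None:
                self.unclaimed.append((mbox_type, payload))
            else:
                handler(payload)


def demo():
    coe, eoe = [], []
    dispatcher = MailboxDispatcher()
    dispatcher.register(MboxType.COE, coe.append)
    dispatcher.register(MboxType.EOE, eoe.append)
    inbox = deque([
        (MboxType.EOE, b"frame"),          # arrives while CoE is busy
        (MboxType.COE, b"sdo response"),
        (MboxType.FOE, b"data"),           # no handler registered
    ])
    dispatcher.poll(inbox)
    return coe, eoe, dispatcher.unclaimed
```

The point of the design is that an EoE frame arriving in the middle of a CoE transfer is simply handed to the EoE handler instead of aborting the CoE request.
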
>
>
>
> *From:* Jun Yuan [mailto:j.yuan at rtleaders.com]
> *Sent:* Thursday, 29 May 2014 20:40
> *To:* Gavin Lambert
> *Cc:* etherlab-users at etherlab.org
> *Subject:* Re: [etherlab-users] Error reassigning removed PDO
>
>
>
> Hello Gavin,
>
> for that specific part of the CoE transfer problem you mentioned, I may
> have observed the same problem, and I did some analysis on it. This is
> actually a big problem; it makes the master quite unreliable for me. I have
> a temporary fix for it, but I don't know who should be responsible for this
> CoE mailbox bug. Is it the master? Is it the slave? Or is it a design error
> in the EtherCAT standard for the mailbox? I'll write another email to
> elaborate on the problem with the flaky CoE mailbox.
>
> Regards,
> Jun
>
>
>
> On 29 May 2014 09:37, Gavin Lambert <gavinl at compacsort.com> wrote:
>
> Last month, I wrote:
> > TLDR: when reassigning PDOs, why doesn't the master read mappings from
> > the slave via CoE?
> [...]
> > Shouldn't this scenario work?  The PDO is always specified in the SII,
> > even if not presently in PDO Assign, so the master ought to know that it
> > exists.
> > And failing that, it could just try to read the mappings directly from
> > the slave (if CoE is available) when unable to load default mapping from
> > its cache.  (I think part of the problem is that the CoE data appears to
> > be replacing the SII data in the master's PDO cache.)
> >
> > I'm also a little puzzled as to why (if it wants to have a cache of PDO
> > mappings) it seems to limit itself to reading only the currently
> > assigned PDOs during the initial scan, instead of fetching all of them.
> > They shouldn't be hard to find -- they can be identified purely by their
> > index.
>
> There's a further problem with this that I've since discovered: if, during
> the master's scan of the PDO assignment registers, something goes wrong
> with the CoE transfer of 0x1C1x:0, then the master will log an error but
> proceed anyway under the assumption that the slave has 0 PDOs assigned in
> that SM.  If this is not contradicted by the application using
> ecrt_slave_config_pdos (including both assigns and mappings, because it
> read no default mappings), then the master will *write 0 back* to the PDO
> assignment register (if writable) on activate.
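
The failure chain in the paragraph above can be modelled in a few lines of Python. This only restates the behaviour as described in the thread; the function names are hypothetical and nothing here is taken from the master's source.

```python
def scan_assignment(read_count):
    """PDO count for one sync manager as recorded after a scan."""
    try:
        return read_count()
    except IOError:
        # Behaviour described above: log the error, carry on with 0 PDOs.
        return 0


def count_written_on_activate(scanned_count, app_pdos=None):
    """Value written to 0x1C1x:0 when the configuration is applied."""
    if app_pdos is not None:
        return len(app_pdos)   # application config overrides the scan
    return scanned_count       # otherwise the scanned value goes back


def failing_read():
    # Stands in for a CoE transfer that hit an unexpected mailbox message.
    raise IOError("unexpected mailbox response")
```

With a failed read and no application-side PDO configuration, `count_written_on_activate(scan_assignment(failing_read))` is 0, which is the value that ends up in the slave; passing an explicit PDO list is what prevents it.
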
>
> This guarantees that the next scan will not find any PDOs, unless the slave
> reloads the default assignments during INIT (and with my "slave author" hat
> on, all advice I can find says that slaves should not do that, although I
> couldn't find official word).
>
> So basically it all seems to point to applications being unreliable (at
> least for flexible-assignment slaves) unless they use
> ecrt_slave_config_pdos to configure *everything* (including mappings, even
> for fixed-mapping slaves).  Which makes me wonder why it bothers scanning
> for PDO assignments at all.  Doesn't that just waste time if apps have to
> use ecrt_slave_config_pdos anyway?
>
> Given how flaky mailbox handling is in general (as previously mentioned),
> I'm surprised this hasn't come up more often.


-- 
Jun Yuan
[Pronunciation: Djün Üän]

Robotics Technology Leaders GmbH
Am Loferfeld 58, D-81249 München
Tel: +49 89 189 0465 24
Fax: +49 89 189 0465 11
mailto: j.yuan at rtleaders.com

Umlaut rule in the Chinese phonetic script Pinyin: after the initials y, j,
q, and x, u is pronounced as ü, e.g. yu => ü,  ju => dschü,  qu =>
tschü,  xu => schü.