<div dir="ltr">Thank you so much. After reading your mail, I finally understand why some slaves go to the SAFEOP+ERROR state under these circumstances. Yes, I had exactly the same problem.<br></div><div class="gmail_extra"><br><br>
<div class="gmail_quote">On 29 May 2014 11:24, Gavin Lambert <span dir="ltr">&lt;<a href="mailto:gavinl@compacsort.com" target="_blank">gavinl@compacsort.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div link="blue" vlink="purple" lang="EN-NZ"><div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">It’s mostly a master problem I think, although some of the worst misbehaviour requires particular functionality in the slave (which may be rarer).<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The main problem that I’ve personally run into recently (and coded my own workaround for, just a few minutes ago) was from this scenario:<u></u><u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>1.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Master starts up, starts doing slave scanning.<u></u><u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>2.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Application starts up, calls ecrt_request_master, which waits for slave scanning to complete before returning.<u></u><u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>3.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Application sets up basic configuration and calls ecrt_master_activate.<u></u><u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>4.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Slaves wind their way up to OP.<u></u><u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>5.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Meanwhile in the background the master starts reading the CoE dictionary and getting entry descriptions to fill in the names. (This takes quite a long time.)<u></u><u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>6.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Application decides something is screwy while this is still happening and calls ecrt_master_release and unloads the master module.<u></u><u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>7.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Since the master stops dead when this happens, occasionally it has just sent a CoE Info request to a slave but abandoned waiting for the response. The response is still sitting there in the slave’s mailbox. The slaves have dropped back to SAFEOP+ERROR because they’re no longer receiving data.<u></u><u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>8.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The master service and application are reloaded.<u></u><u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>9.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">The initial scan sees the slaves at >= PREOP so merely acknowledges the error and leaves them at SAFEOP, then starts to read SM+PDOs.<u></u><u></u></span></p>
<p><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><span>10.<span style="font:7.0pt "Times New Roman""> </span></span></span><u></u><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">When the scan gets to the slave that had a stale SDO Info response in its mailbox (which is still there, because the slave was never sent back to INIT), the master gets confused because the first thing it reads back is not the SDO 0x1C12 data response it was expecting for the request it had just sent; it aborts the request and assumes 0 PDOs in that SM. Hilarity ensues, as I’ve already outlined below.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">(This can also occur if the network is disconnected but not unpowered at any time during the CoE dictionary scan, then reconnected later.)<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Note that it’s reasonable for the scan to not reset to INIT, because rescans can occur during operation (although having said that, I haven’t looked too closely at whether this disrupts anything). But I think it’s definitely a master-side bug that it can’t cope with stale responses – that’s just something you always have to expect with mailboxes, especially when there are timeouts involved as well.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">My workaround was to change the CoE FSM to check for and discard any stale data in the mailbox prior to beginning any CoE operation. It seemed to resolve the above issue in a very basic test, but I’ll hopefully know more after a more thorough one tomorrow.<u></u><u></u></span></p>
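<p class="MsoNormal">A minimal sketch of the workaround described above, not the actual patch: before starting a CoE operation, fetch and discard whatever is still waiting in the mailbox. The slave mailbox is simulated by a small in-memory queue, and every name here (sim_mailbox_t, sim_mbox_push, sim_mbox_fetch, coe_discard_stale, coe_upload) is hypothetical rather than part of the EtherLab master.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_MBOX_DEPTH 4

/* Hypothetical message kinds; a real CoE header carries more detail. */
typedef enum { SIM_SDO_RESPONSE, SIM_SDO_INFO } sim_service_t;

typedef struct {
    sim_service_t service; /* what kind of response this is */
    uint16_t index;        /* object index the response refers to */
} sim_msg_t;

/* Simulated slave mailbox: unread responses, oldest first. */
typedef struct {
    sim_msg_t pending[SIM_MBOX_DEPTH];
    size_t count;
} sim_mailbox_t;

/* Slave side: queue a response for the master to fetch. */
bool sim_mbox_push(sim_mailbox_t *m, sim_msg_t msg)
{
    if (m->count == SIM_MBOX_DEPTH)
        return false;
    m->pending[m->count++] = msg;
    return true;
}

/* Master side: fetch the oldest unread response, if there is one. */
bool sim_mbox_fetch(sim_mailbox_t *m, sim_msg_t *out)
{
    if (m->count == 0)
        return false;
    *out = m->pending[0];
    m->count--;
    memmove(&m->pending[0], &m->pending[1], m->count * sizeof m->pending[0]);
    return true;
}

/* The workaround: throw away anything left over from an earlier session. */
size_t coe_discard_stale(sim_mailbox_t *m)
{
    sim_msg_t ignored;
    size_t discarded = 0;
    while (sim_mbox_fetch(m, &ignored))
        discarded++;
    return discarded;
}

/* Simulated SDO upload: true only if the first reply read back matches. */
bool coe_upload(sim_mailbox_t *m, uint16_t index, bool drain_first)
{
    sim_msg_t reply = { SIM_SDO_RESPONSE, index };
    sim_msg_t got;

    assert(m != NULL);
    if (drain_first)
        coe_discard_stale(m);
    if (!sim_mbox_push(m, reply)) /* the simulated slave answers the request */
        return false;
    if (!sim_mbox_fetch(m, &got))
        return false;
    return got.service == SIM_SDO_RESPONSE && got.index == index;
}
```

<p class="MsoNormal">With one stale SDO Info response already queued, coe_upload(&amp;m, 0x1C12, false) reads the stale message first and fails, while the same call with drain_first set succeeds. Draining only helps at the start of an operation; it does nothing for messages that arrive while a request is in flight.</p>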
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">It’s not an ideal solution, of course; the underlying problem (which I hinted at below, and posted in more detail about several months ago) is that the Etherlab code assumes that only one thing is going on in the mailboxes at a time, and so only checks them when it’s expecting a response and throws its virtual hands up when it finds something other than what it wanted. This is particularly noticeable if a slave sends asynchronous notifications, or can process multiple mailbox protocols in parallel (both of which are allowed in the standards). The most common types of these are CoE emergencies and EoE. And woe betide you if the master happens to be handling a FoE request when an emergency arrives, or a CoE request when an EoE packet arrives, etc.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Ideally the master should have some sort of central dispatcher which is constantly watching mailboxes and handing off incoming data to the protocol state machines as it arrives. Often this can even be done for “free” – many slaves provide a dedicated “MBoxState” FMMU that can be used to watch for new mailbox messages as part of the regular process datagram, avoiding the need to individually poll the slaves.<u></u><u></u></span></p>
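<p class="MsoNormal">A rough sketch of what such a central dispatcher could look like, under heavy assumptions: each incoming mailbox message is routed by its protocol type to whichever state machine registered for it, instead of being handed to whatever happens to be waiting. All names (mbox_dispatcher_t, mbox_register, mbox_dispatch, proto_fsm_t) are hypothetical, the enum values are illustrative rather than the on-wire type codes, and the payload is a one-byte stand-in.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative protocol tags; NOT the real mailbox header type codes. */
typedef enum { MBOX_EOE, MBOX_COE, MBOX_FOE, MBOX_TYPE_COUNT } mbox_type_t;

typedef struct {
    mbox_type_t type;
    uint8_t payload; /* stand-in for the real message body */
} mbox_msg_t;

typedef void (*mbox_handler_t)(void *ctx, const mbox_msg_t *msg);

/* One handler slot per protocol, plus a count of unclaimed messages. */
typedef struct {
    mbox_handler_t handler[MBOX_TYPE_COUNT];
    void *ctx[MBOX_TYPE_COUNT];
    unsigned dropped;
} mbox_dispatcher_t;

/* A protocol state machine registers interest in one mailbox protocol. */
void mbox_register(mbox_dispatcher_t *d, mbox_type_t type,
                   mbox_handler_t handler, void *ctx)
{
    d->handler[type] = handler;
    d->ctx[type] = ctx;
}

/* Called for every message found in a slave mailbox, expected or not. */
void mbox_dispatch(mbox_dispatcher_t *d, const mbox_msg_t *msg)
{
    if ((unsigned)msg->type >= MBOX_TYPE_COUNT ||
        d->handler[msg->type] == NULL) {
        d->dropped++; /* nobody wants it; count it rather than abort */
        return;
    }
    d->handler[msg->type](d->ctx[msg->type], msg);
}

/* Toy protocol state machine: just records what it was handed. */
typedef struct {
    unsigned received;
    uint8_t last;
} proto_fsm_t;

void proto_fsm_on_message(void *ctx, const mbox_msg_t *msg)
{
    proto_fsm_t *fsm = ctx;
    fsm->received++;
    fsm->last = msg->payload;
}
```

<p class="MsoNormal">The point of the design is that an EoE packet arriving in the middle of a CoE request goes to the EoE handler and leaves the CoE state machine undisturbed; a message with no registered handler is counted and dropped rather than treated as a failed response.</p>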
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div><div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm"><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" lang="EN-US"> Jun Yuan [mailto:<a href="mailto:j.yuan@rtleaders.com" target="_blank">j.yuan@rtleaders.com</a>] <br>
<b>Sent:</b> Thursday, 29 May 2014 20:40<br><b>To:</b> Gavin Lambert<br><b>Cc:</b> <a href="mailto:etherlab-users@etherlab.org" target="_blank">etherlab-users@etherlab.org</a><br><b>Subject:</b> Re: [etherlab-users] Error reassigning removed PDO<u></u><u></u></span></p>
</div></div><div><div class="h5"><p class="MsoNormal"><u></u> <u></u></p><div><div><div><p class="MsoNormal" style="margin-bottom:12.0pt">Hello Gavin,<u></u><u></u></p></div><p class="MsoNormal" style="margin-bottom:12.0pt">
For that specific part of the CoE transfer problem you mentioned, I may have observed the same problem, and I did some analysis on it. This is actually a big problem; it makes the master quite unreliable for me. I have a temporary fix for it, but I don't know who should be responsible for this CoE mailbox bug. Is it the master? Is it the slave? Or is it a design error in the EtherCAT standard for the mailbox? I'll write another email to elaborate on the problem with the flaky CoE mailbox.<u></u><u></u></p>
</div><p class="MsoNormal">Regards,<br>Jun<u></u><u></u></p><div><p class="MsoNormal" style="margin-bottom:12.0pt"><u></u> <u></u></p><div><p class="MsoNormal">On 29 May 2014 09:37, Gavin Lambert &lt;<a href="mailto:gavinl@compacsort.com" target="_blank">gavinl@compacsort.com</a>&gt; wrote:<u></u><u></u></p>
<p class="MsoNormal">Last month, I wrote:<br>> TLDR: when reassigning PDOs, why doesn't the master read mappings from<br>> the slave via CoE?<br>[...]<br>> Shouldn't this scenario work? The PDO is always specified in the SII,<br>
> even if not presently in PDO Assign, so the master ought to know that it<br>> exists.<br>> And failing that, it could just try to read the mappings directly from<br>> the slave (if CoE is available) when unable to load default mapping from<br>
> its cache. (I think part of the problem is that the CoE data appears to<br>> be replacing the SII data in the master's PDO cache.)<br>><br>> I'm also a little puzzled as to why (if it wants to have a cache of PDO<br>
> mappings) it seems to limit itself to reading only the currently<br>> assigned PDOs during the initial scan, instead of fetching all of them.<br>> They shouldn't be hard to find -- they can be identified purely by their<br>
> index.<br><br>There's a further problem with this that I've since discovered: if, during<br>the master's scan of the PDO assignment registers, something goes wrong with<br>the CoE transfer of 0x1C1x:0, then the master will log an error but proceed<br>
anyway under the assumption that the slave has 0 PDOs assigned in that SM.<br>If this is not contradicted by the application using ecrt_slave_config_pdos<br>(including both assigns and mappings, because it read no default mappings),<br>
then the master will *write 0 back* to the PDO assignment register (if<br>writable) on activate.<br><br>This guarantees that the next scan will not find any PDOs, unless the slave<br>reloads the default assignments during INIT (and with my "slave author" hat<br>
on, all advice I can find says that slaves should not do that, although I<br>couldn't find official word).<br><br>So basically it all seems to point to applications being unreliable (at<br>least for flexible-assignment slaves) unless they use ecrt_slave_config_pdos<br>
to configure *everything* (including mappings, even for fixed-mapping<br>slaves). Which makes me wonder why it bothers scanning for PDO assignments<br>at all. Doesn't that just waste time if apps have to use<br>ecrt_slave_config_pdos anyway?<br>
<br>Given how flaky mailbox handling is in general (as previously mentioned),<br>I'm surprised this hasn't come up more often.<u></u><u></u></p>
</div><p class="MsoNormal"><br><br clear="all"><u></u><u></u></p></div></div></div></div></div></div></div></blockquote></div><br><br clear="all"><br>-- <br>Jun Yuan<br>[Pronunciation: Djün Üän]<br><br>Robotics Technology Leaders GmbH<br>
Am Loferfeld 58, D-81249 München<br>Tel: +49 89 189 0465 24<br>Fax: +49 89 189 0465 11<br>mailto: <a href="mailto:j.yuan@rtleaders.com" target="_blank">j.yuan@rtleaders.com</a><br><br>Umlaut rule in the Chinese Pinyin romanization: after the initials y, j, q, and x, u is pronounced as ü, e.g. yu => ü, ju => dschü, qu => tschü, xu => schü.
</div>