[etherlab-dev] Support for multiple mailbox protocols

Thu Jun 26 09:44:41 CEST 2014

Thanks Gavin, see my in-lined comments!

Regards,
Knud Baastrup

-----Original Message-----
From: Gavin Lambert [mailto:gavinl at compacsort.com] 
Sent: 26. juni 2014 04:19
To: Knud Baastrup
Cc: etherlab-dev at etherlab.org
Subject: RE: [etherlab-dev] Support for multiple mailbox protocols

On 26 June 2014, quoth Knud Baastrup:
>> Additionally it doesn't look like you have any protection against 
>> concurrent CoE access (which TBH I'm not entirely sure whether this 
>> occurs, but Frank's patch 27 suggests it does), and I'm definitely 
>> not a fan of allocating/deallocating memory on each mailbox transfer, 
>> which is what it looks like you're doing.
>
> I believe that the check_mbox flag should work ok for concurrent CoE 
> access as well (however, I can as well not see how this can happen?) 
> as the check_mbox flag will ensure only one ongoing read request per 
> slave no matter which mailbox protocol.

The issue, as I understand it, is that both fsm_master and fsm_slave have their own separate fsm_coe instances.  (Several other state machines have references to an fsm_coe but it's always handed down from one or the other of these parents.)  So it's just a question of whether fsm_master and fsm_slave can execute (their CoE related parts) concurrently or not.  Which I'm not entirely certain about from looking at the code, but I should add that after adding Frank's coe-lock patch I have observed cases where it has reported concurrent CoE access.  (I haven't been able to get it to happen in my bench testing but it has occurred in field tests; as a result I'm not sure exactly where it's coming from.)
[Knud] I believe the IgH Master has been designed to ensure sequential CoE access using the master state machine, e.g. slave scanning, slave configuration, SDO requests and retrieval of SDO dictionary is done in separate master states. If this is not working, we should fix the errors instead of implementing a locking mechanism for the sending part. The IgH master has on the other hand not been designed to support multiple mailbox protocols running at the same time and we must therefore either dramatically change the design or implement some sort of locking mechanism for the receiving part (as I attempt with the patch).

If there really is concurrent CoE going on, it's not a good idea to send two CoE requests in parallel to the same slave -- some slaves can cope with that (and send both replies) but some may choke, and the order in which they reply is not guaranteed.  So for one thing, your patch doesn't attempt to control sending, only receiving; this could result in both requests being sent, but the FSM that "wins" the check-lock might not be the one whose answer first arrives.  And the non-atomic check I mentioned before could result in both checks being active at once if they're coming from separate threads (which is less likely than sequentially concurrent access, but if you didn't want to protect against threads you wouldn't have used a lock).
[Knud] My primarily focus have been to solve the issue with CoE and EoE running at the same time in different threads and it works very well in my setup and applies for all the other mailbox protocols as well. I believe that I can solve the non-atomic check issue by using the test-and-set pattern as you suggested and I will include that in an updated patch.

> Each fetch data datagram is already allocating memory corresponding to 
> the mailbox size so allocation memory is already heavily used.

I don't believe so.  Each time it does call ec_datagram_prealloc, yes, but this will only allocate memory if the datagram isn't already large enough; it might take a few calls to fully expand the whole datagram ring buffer but after that it should be able to exchange datagrams of any equal or smaller size without reallocation.  (That's why it's a prealloc, not an alloc.) Conversely your version will always free/realloc on every transfer, which is what I'm objecting to.
[Knud] Sorry, yes you have right :-) I will probably change it to a one time allocation based on the slaves configured mailbox size and its supported mailbox protocols. I will include this in an updated patch.

Regards,
Gavin Lambert