[etherlab-dev] Support for multiple mailbox protocols

Tue Jul 1 03:17:16 CEST 2014

On 30 June 2014, quoth Jun Yuan:
> 1) The master can send max. only one mailbox request in one cycle.

Per slave, yes.

> 2) Before a mailbox request is going to be sent, the master should 
> check the SM0 register 0x800 whether the send mailbox is full or 
> empty, and send the request only if it is empty.

This is not necessary.  If the mailbox is still full then an attempt to write to it will simply fail with WC=0.  As the check and send would have to be in different frames, it's more efficient just to try to send blindly (because the most common case will be that the mailbox is empty), and retry later in case of failure.
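To make the cost argument concrete, here is a rough Python simulation.  This is not Etherlab code: `SimWriteMailbox`, `blind_send` and `check_then_send` are hypothetical names.  It assumes one datagram per slave per cycle and a write mailbox that stays full until a given cycle, and it counts how many datagrams each strategy spends and on which cycle the request is accepted.

```python
class SimWriteMailbox:
    """Hypothetical slave write mailbox (SM0) that stays full until cycle `busy_cycles`."""

    def __init__(self, busy_cycles):
        self.busy_cycles = busy_cycles
        self.accepted = []  # (cycle, data) for each write the slave took

    def is_full(self, cycle):
        return cycle < self.busy_cycles

    def write(self, cycle, data):
        """Simulated mailbox write; returns the working counter."""
        if self.is_full(cycle):
            return 0  # WC=0: slave refused the write
        self.accepted.append((cycle, data))
        return 1


def blind_send(mbx, data, max_cycles=10):
    """Write every cycle until WC=1; returns (accept_cycle, datagrams)."""
    datagrams = 0
    for cycle in range(max_cycles):
        datagrams += 1
        if mbx.write(cycle, data) == 1:
            return cycle, datagrams
        # WC=0: most likely still full (but see the caveats), so retry next cycle
    return None, datagrams


def check_then_send(mbx, data, max_cycles=10):
    """Read the mailbox status in one frame, write in the next; returns (accept_cycle, datagrams)."""
    datagrams = 0
    cycle = 0
    while cycle + 1 < max_cycles:
        datagrams += 1  # status read in this cycle's frame
        if mbx.is_full(cycle):
            cycle += 1
            continue
        datagrams += 1  # the write can only go out in the following frame
        cycle += 1
        if mbx.write(cycle, data) == 1:
            return cycle, datagrams
        cycle += 1
    return None, datagrams
```

With an empty mailbox (the common case), `blind_send` succeeds on cycle 0 using one datagram, while `check_then_send` needs two datagrams and succeeds one cycle later.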

Some care is required with that, though: WC=0 does not always mean that the mailbox is full.  It can also occur if the packet somehow missed the slave.  And the packet could get lost on its way back to the master and time out, in which case you won't know whether the slave received the write or not.

While testing Frank's patches (which include an auto-retry-send mechanism since, as he noted, the higher-level FSMs were inconsistent about whether they retried sends or not) on an unreliable network, I found a case where a send timed out, so the master sent the message again, and then the slave replied twice (which the master wasn't expecting).  This suggests that the slave did receive the first send but the incoming WC=1 response got lost.  (Actually, in this specific case it was merely delayed rather than really lost, but it had the same effect.)  Ultimately everything recovered properly due to higher-level retries, but it did result in some warnings and errors being logged.  (See also the discussion at the bottom.)

(Note that it's similarly possible to blindly issue fetch requests instead of doing the two-part check+fetch that Etherlab currently does.  But there the tradeoff is more dubious: the slave is expected to take some time to generate the response, so several fetches would fail before one succeeds, which makes the "cheaper" check datagram the better choice.  It's also possible to explicitly map an FMMU into a domain to cyclically poll the mailbox state of all slaves, if a realtime loop is running and the slave provides an extra FMMU for this purpose [most CoE slaves do].)
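A back-of-envelope comparison for the fetch side, as a hypothetical Python sketch.  Assumptions: each EtherCAT datagram carries 12 bytes of overhead (10-byte header plus 2-byte working counter), a blind fetch reads the whole read mailbox on every attempt, and a check reads a 1-byte SyncManager status register whose bit 3 is the mailbox-full flag (per the ESC register descriptions as I understand them; adjust `status_size` if the master reads the whole SM1 block instead).

```python
DATAGRAM_OVERHEAD = 12   # 10-byte datagram header + 2-byte working counter
MAILBOX_FULL_BIT = 0x08  # bit 3 of the SM status register (assumed layout)


def mailbox_full(status_byte):
    """Decode the mailbox-full flag from a SyncManager status byte."""
    return bool(status_byte & MAILBOX_FULL_BIT)


def blind_fetch_bytes(failed_polls, mailbox_size):
    """Wire bytes when every poll is a full-size mailbox read."""
    return (failed_polls + 1) * (DATAGRAM_OVERHEAD + mailbox_size)


def check_then_fetch_bytes(failed_polls, mailbox_size, status_size=1):
    """Wire bytes when polling a small status read, then fetching once."""
    checks = (failed_polls + 1) * (DATAGRAM_OVERHEAD + status_size)
    return checks + DATAGRAM_OVERHEAD + mailbox_size
```

With a hypothetical 128-byte mailbox and a response that is ready immediately, the blind fetch is slightly cheaper (140 vs 153 bytes, and one cycle less latency); if the slave needs time for five failed polls first, it is 840 vs 218 bytes.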

> 3) Mailbox service with different protocol can be executed in parallel. 
> That is, the master can send a new mailbox request, while the last 
> conversation in another protocol is not finished yet, given that the 
> send mailbox is already cleared.

I believe so, yes.  I can't rule out the possibility that some slaves might not be able to cope with this, but in general I would expect that any slave prepared to process multiple protocols should be able to keep track of those conversations in parallel, as they're always described as independent FSMs.

> 4) We're not sure if all the slave can support multiple CoE conversations 
> in parallel. So the master should start a new CoE conversation only when 
> the last one is finished. The same applies to SoE/FoE/VoE.

Yes.  I believe that most of the time it would actually work: if a slave processes requests entirely synchronously, then it will start processing the first received request and leave the second either in the mailbox or in its internal queue until it's done with the first, and everything should just work out.  But slaves are allowed to process requests asynchronously (and may choose to do so if, e.g., a request requires configuring onboard hardware), and that's where trouble could start.  Additionally, the way the Etherlab master code is written at the moment means (I think) that the order in which the requests are actually sent is not known at the higher level, and there isn't a way to ensure that a reply is matched up with the correct request when it arrives.

Another consideration is that a single request or response can consist of multiple mailbox exchanges (e.g., if a response is fragmented because it doesn't fit in the mailbox).  I'm not sure whether this is generic, but I do know that SDO Information responses can get fragmented this way, and that it's possible for other responses to get injected in the middle of a fragmented response; that would presumably be a major pain to disambiguate if they didn't use different protocols.

And again, in the standards the various protocols are described as state machines that react to mailbox data in specific ways that imply you shouldn't be trying to concurrently access the same state machine with different requests.

So I don't feel like it's a good idea to attempt sending the same protocol in parallel.
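The rules discussed so far (one send per slave per cycle, at most one active conversation per protocol, different protocols allowed to overlap) could be enforced by a small per-slave scheduler.  This is a hypothetical Python sketch, not how Etherlab is structured; `MailboxScheduler` and its methods are made-up names.  Because each protocol has at most one outstanding request, a reply can be matched to its request by mailbox type alone.

```python
from collections import deque


class MailboxScheduler:
    """Hypothetical per-slave mailbox scheduler (a sketch, not Etherlab's design).

    Assumed usage rules:
    - call pick_next_send() at most once per cycle, and only once the previous
      write has been accepted (send mailbox cleared)
    - at most one active conversation per protocol
    - conversations of different protocols may overlap
    """

    def __init__(self):
        self.pending = {}  # protocol name -> deque of queued requests
        self.active = {}   # protocol name -> request currently in progress

    def submit(self, protocol, request):
        self.pending.setdefault(protocol, deque()).append(request)

    def pick_next_send(self):
        """Return (protocol, request) to write this cycle, or None if nothing may start."""
        # Fixed alphabetical order keeps the sketch deterministic; a real master
        # would probably want round-robin fairness between protocols.
        for protocol in sorted(self.pending):
            queue = self.pending[protocol]
            if queue and protocol not in self.active:
                request = queue.popleft()
                self.active[protocol] = request
                return protocol, request
        return None

    def complete(self, protocol):
        """Mark the active conversation done (all response fragments received)."""
        del self.active[protocol]
```

For example, with two CoE requests and one FoE request queued, the first CoE and the FoE request go out on consecutive cycles, while the second CoE request waits until `complete("CoE")` is called.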

(The one fly in the ointment is that CoE Emergencies technically belong to the CoE protocol, but they're described as a separate state machine, implying that emergencies can arrive at any time, including mid-fragmented-CoE-message.  But I think the Etherlab CoE state machine is already prepared for that.)
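One way to keep an emergency from disturbing a fragmented CoE transfer is to route incoming messages by mailbox type, and for CoE by service, before any protocol FSM sees them.  A hypothetical Python sketch, assuming the usual 6-byte little-endian mailbox header (type in the low nibble of byte 5), mailbox type 0x03 for CoE, and a 2-byte CoE header whose top 4 bits are the service (0x01 = Emergency, 0x08 = SDO Information).  `route`, `dispatch` and `make_frame` are illustrative names only.

```python
import struct

MBX_HEADER = struct.Struct("<HHBB")  # length, address, channel/priority, type/counter
COE_HEADER = struct.Struct("<H")     # CoE number in bits 0-8, service in bits 12-15

MBX_TYPE_NAMES = {0x00: "err", 0x01: "aoe", 0x02: "eoe", 0x03: "coe",
                  0x04: "foe", 0x05: "soe", 0x0F: "voe"}
COE_SERVICE_EMERGENCY = 0x01


def route(frame):
    """Pick the handler for one mailbox message read from the slave."""
    _length, _address, _chan_prio, type_cnt = MBX_HEADER.unpack_from(frame, 0)
    kind = MBX_TYPE_NAMES.get(type_cnt & 0x0F, "other")
    if kind == "coe":
        (coe_hdr,) = COE_HEADER.unpack_from(frame, MBX_HEADER.size)
        if (coe_hdr >> 12) == COE_SERVICE_EMERGENCY:
            return "coe_emergency"  # separate FSM, may arrive at any time
    return kind


def dispatch(frames):
    """Queue messages per handler so an emergency cannot disturb a fragmented CoE transfer."""
    inboxes = {}
    for frame in frames:
        inboxes.setdefault(route(frame), []).append(frame)
    return inboxes


def make_frame(mbx_type, payload, counter=0):
    """Build a mailbox message (illustration/testing helper)."""
    return MBX_HEADER.pack(len(payload), 0, 0, (counter << 4) | mbx_type) + payload
```

An emergency that lands between two SDO Information fragments ends up in its own inbox, and the CoE inbox still holds the two fragments in order.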

> While reading the documentation and another open source ethercat project 
> SOEM, I found there is a mailbox service "counter" besides the service 
> type in the mailbox header. It says, "Counter of the mailbox services 
> (0 is start value, next value after 7 is 1)". I wonder what this 
> counter is used for, how it is implemented in your slave example code, 
> and whether it could be useful for us in the multiple conversation situation.

As I understand it, it's mostly intended as a way to avoid the situation above where repeated requests cause duplicated responses.  The idea is that when the master sends a request it picks a value from 1-7 to put there (according to the spec this should increment with each unique request, but slaves shouldn't be overly picky about it).  If the send gets WC=0, then it can repeat the request *with the same counter*.  If the slave receives two *consecutive* requests with the same counter value, it ignores the second, so the scenario above would have produced only one reply and everyone would have been happy.  Note that the counter is global and not per-protocol.  Also note that this means even sends for different protocols need to be aware of each other at the lower level, in order to set the correct counter and retry if necessary before sending the subsequent request.
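A hypothetical Python sketch of the counter rules described above: the master keeps one counter per slave shared by all protocols, advances 1..7 (wrapping back to 1) for each new request, and reuses the same value on a retry; the simulated slave drops a request whose non-zero counter repeats the previous one.  `MailboxCounter`, `SimSlave` and `send_with_lost_wc` are made-up names, and the simulation replays the lost-WC scenario from earlier.

```python
def next_counter(counter):
    """Master-side counter: 0 is only the start value; after 7 comes 1."""
    return 1 if counter >= 7 else counter + 1


class MailboxCounter:
    """One counter per slave, shared by all protocols (the counter is not per-protocol)."""

    def __init__(self):
        self.value = 0

    def for_new_request(self):
        self.value = next_counter(self.value)
        return self.value

    def for_retry(self):
        return self.value  # a repeated send must reuse the same counter


class SimSlave:
    """Hypothetical slave: ignores a request whose counter repeats the previous one."""

    def __init__(self):
        self.last_counter = None
        self.processed = []

    def receive(self, counter, request):
        if counter != 0 and counter == self.last_counter:
            return  # duplicate of the previous request: no second reply
        self.last_counter = counter
        self.processed.append(request)


def send_with_lost_wc(slave, counter, request):
    """The write reaches the slave but the WC=1 reply is lost, so the master retries."""
    slave.receive(counter, request)
    slave.receive(counter, request)
```

With counter 1 the slave processes the request once; with counter 0 (what Etherlab currently sends) the check is bypassed and it is processed twice.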

When the master uses a counter value of 0 (which Etherlab currently always does) then this is bypassed and all requests are processed.  Similarly when the slave generates responses into the receive mailbox it may either always use 0 for the counter or it may increment 1-7, but this is independent from the send mailbox counter.

So it's not really intended to deal with multiple conversation threads.  There are some other fields in the mailbox header that do look like they're intended for that sort of thing (channel and priority) but currently they're reserved in the spec and not actually implemented AFAIK.
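For reference, a hypothetical Python sketch of where these fields sit in the 6-byte mailbox header.  The layout is my reading of the spec (byte 4: channel in bits 0-5, priority in bits 6-7; byte 5: type in bits 0-3, counter in bits 4-6, bit 7 reserved) and should be checked against ETG.1000 before relying on it; the function names are illustrative.

```python
import struct


def pack_mailbox_header(length, address, mbx_type, counter, channel=0, priority=0):
    """Pack the 6-byte mailbox header, little-endian.

    byte 4: channel in bits 0-5, priority in bits 6-7 (reserved: leave 0)
    byte 5: type in bits 0-3, counter in bits 4-6, bit 7 reserved
    """
    if not 0 <= counter <= 7:
        raise ValueError("counter is a 3-bit field")
    chan_prio = (channel & 0x3F) | ((priority & 0x03) << 6)
    type_cnt = (mbx_type & 0x0F) | (counter << 4)
    return struct.pack("<HHBB", length, address, chan_prio, type_cnt)


def unpack_mailbox_header(header):
    """Split a mailbox header back into its fields."""
    length, address, chan_prio, type_cnt = struct.unpack_from("<HHBB", header)
    return {
        "length": length,
        "address": address,
        "channel": chan_prio & 0x3F,
        "priority": chan_prio >> 6,
        "type": type_cnt & 0x0F,
        "counter": (type_cnt >> 4) & 0x07,
    }
```

A CoE message with 10 bytes of payload and counter 1 packs to the bytes `0a 00 00 00 00 13`.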

There's also a mechanism for getting a slave to repeat a response without re-sending the request (which might have side effects), which could be useful if a check indicated something in the read mailbox but then the subsequent fetch timed out (after actually succeeding at clearing the read mailbox).  This involves a register write to 0x080E, but I haven't looked too closely at the specifics, or how likely slaves are to implement it.
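A hypothetical Python sketch of how that repeat handshake might look.  The bit positions are an assumption taken from the ESC register descriptions as I understand them (bit 1 of the SM1 Activate register 0x080E is the Repeat request the master toggles; bit 1 of the SM1 PDI Control register 0x080F is the Repeat Ack the slave sets to match once the last response is back in the read mailbox), so verify them before use.  The simulated slave reacts instantly; a real master would poll the ack over later cycles and then fetch again.

```python
SM1_ACTIVATE = 0x080E     # bit 1: Repeat request, toggled by the master (assumed)
SM1_PDI_CONTROL = 0x080F  # bit 1: Repeat ack, set by the slave to match (assumed)
REPEAT_BIT = 0x02


class SimSlaveRegisters:
    """Hypothetical slave that re-queues its last response when Repeat is toggled.

    A real slave reacts asynchronously; this one reacts inside write_u8().
    """

    def __init__(self, last_response):
        self.regs = {SM1_ACTIVATE: 0x01, SM1_PDI_CONTROL: 0x00}  # SM1 enabled
        self.last_response = last_response
        self.read_mailbox = None

    def read_u8(self, address):
        return self.regs[address]

    def write_u8(self, address, value):
        old = self.regs[address]
        self.regs[address] = value
        if address == SM1_ACTIVATE and (old ^ value) & REPEAT_BIT:
            # Slave side: put the last response back, then mirror the Repeat bit as Ack.
            self.read_mailbox = self.last_response
            pdi = self.regs[SM1_PDI_CONTROL] & ~REPEAT_BIT
            self.regs[SM1_PDI_CONTROL] = pdi | (value & REPEAT_BIT)


def request_repeat(slave):
    """Toggle Repeat request; return True once Repeat ack matches it."""
    activate = slave.read_u8(SM1_ACTIVATE) ^ REPEAT_BIT  # keep the other bits (e.g. enable)
    slave.write_u8(SM1_ACTIVATE, activate)
    # A real master would poll this over later cycles before fetching again.
    return (slave.read_u8(SM1_PDI_CONTROL) & REPEAT_BIT) == (activate & REPEAT_BIT)
```

Because the request is a toggle rather than a set, each repeat flips the bit again, and the master only needs to compare its own Repeat bit against the slave's Ack bit.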

Regards,
Gavin Lambert