[etherlab-dev] Master re-request race with slave mailboxes

Wed Aug 13 12:40:19 CEST 2014

On 13 August 2014, quoth Frank Heckenbach:
> > In my application, on startup it requests the master and then uses
> > ecrt_master_sdo_upload to fetch certain information from slaves (eg.
> > profile, version, etc), both for diagnostics and to help ensure the
> > config is sane.  While this normally works fine, there can be problems
> > if it occurs too soon after the master service is started or after it
> > was last released.
> 
> Just a quick thought, did you try waiting until the dictionaries are
> completely fetched (cf. my patch #28)?

That does help with half of it (the initial startup after starting the
service), but it doesn't help with the release-rerequest race (because the
dictionaries aren't re-fetched during that time).

Basically, at some point unknown to the app (following a deactivate/release),
the master will pause any in-progress requests, bump the slave back to INIT
and then to PREOP (clearing its mailboxes along the way), and then resume the
in-progress requests.  A resumed request may now be at a point where it
futilely polls the mailbox for a reply that will never come, because the
slave already replied and then erased that reply at the master's request.
(It's a race, so most of the time it gets lucky and the hang only happens
occasionally.)
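
To make that sequence concrete, here is a toy model in plain C.  It is only
a sketch: it does not use the real master code or API, every name in it
(sim_slave_t, sim_run_upload, etc.) is made up for illustration, and the poll
loop is bounded so the example terminates, whereas in the scenario described
the request keeps polling for a reply that never comes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a slave mailbox holding at most one reply. */
typedef struct {
    bool reply_pending;
    int  reply_value;
} sim_slave_t;

/* Hypothetical states of a master-side SDO upload request. */
typedef enum { REQ_IDLE, REQ_SENT, REQ_DONE, REQ_TIMEOUT } sim_req_state_t;

typedef struct {
    sim_req_state_t state;
    int             result;
} sim_request_t;

/* The master sends the request; in this model the slave answers at once
 * by placing its reply in the mailbox. */
void sim_send_request(sim_request_t *req, sim_slave_t *slave, int value)
{
    assert(req->state == REQ_IDLE);
    slave->reply_pending = true;
    slave->reply_value = value;
    req->state = REQ_SENT;
}

/* Models the INIT -> PREOP bounce: the mailbox content is discarded. */
void sim_reset_slave(sim_slave_t *slave)
{
    slave->reply_pending = false;
}

/* The resumed request polls the mailbox. The bound max_polls is an
 * assumption of this sketch so that it terminates. */
sim_req_state_t sim_poll_reply(sim_request_t *req, sim_slave_t *slave,
                               int max_polls)
{
    for (int i = 0; i < max_polls && req->state == REQ_SENT; i++) {
        if (slave->reply_pending) {
            req->result = slave->reply_value;
            slave->reply_pending = false;
            req->state = REQ_DONE;
        }
    }
    if (req->state == REQ_SENT)
        req->state = REQ_TIMEOUT;   /* the reply was erased; it never comes */
    return req->state;
}

/* Runs one upload. If reset_in_between is true, the slave is reset after
 * it has replied but before the paused request resumes polling. */
sim_req_state_t sim_run_upload(bool reset_in_between)
{
    sim_slave_t   slave = { false, 0 };
    sim_request_t req   = { REQ_IDLE, 0 };

    sim_send_request(&req, &slave, 42);
    if (reset_in_between)
        sim_reset_slave(&slave);
    return sim_poll_reply(&req, &slave, 100);
}
```

Calling sim_run_upload(false) ends in REQ_DONE, while sim_run_upload(true)
ends in REQ_TIMEOUT, which is the unlucky ordering described above.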

Though you've reminded me that if the application does wait for dictionaries
to be fetched, then implementing #4 in the master should be sufficient to
solve this.  (It's a slight variation on #4 + #1.)

Regards,
Gavin Lambert