[etherlab-dev] [PATCH] A major patchset update

Gavin Lambert gavinl at compacsort.com
Tue Jun 21 10:43:52 CEST 2016


I might be on a bit of a roll now.  I've had some of these changes planned
for a while, and some of them are pretty major (though I think it's worth
it).

 

*         0026-Add-register-read-write-support.patch:
Renamed the ethercat tool command from reg_readwrite to reg_rdwr, as the
prefix matching was interfering with reg_read.

 

*         0029-Disable-DC-SYNC-before-updating-a-running-slave-s-sy.patch:
Dropped.



*         0029-Avoid-changing-running-slaves-DC-offset.patch:
Added as a replacement for the above.  It does the following:

o   If the System Time Offset and Delay registers for a given slave are
already correct, it will not write to them at all.

o   If it wants to change the System Time Offset register but the slave is
already in SAFEOP or OP, then it won't change it (but will still update the
System Time Delay with the transmission delay).

    *  Modifying the System Time Offset register (0x0920) causes a step
    change in the System Time of the slave, which can cause it to miss the
    next sync time (for a 32-bit slave, it can take 4 seconds to recover,
    and for a 64-bit slave, it might never sync again).

    *  Modifying the System Time Delay register (0x0928) just alters the
    value it uses when the normal time sync datagram circulates (as far as
    I can tell); this is drift compensated so it will gradually drift to
    the correct time instead of stepping straight to it, so shouldn't cause
    the above problem.

    *  Patches 0001 and 0002 both make it more likely for the master to
    want to update the System Time Offset (though they do improve other
    things, and this is good for the initial startup case - just less so
    for the reconfigure-during-OP case).

o   If it updated either the offset or the delay it will also write register
0x0930 to reset the drift filter (this is recommended in the datasheets).
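
As a rough sketch of the decision logic described in the list above (this is
not the patch's actual code; the type and function names are hypothetical,
and treating "offset wants to change" as a reason to rewrite the delay is
just one reading of the description):

```c
#include <stdbool.h>
#include <stdint.h>

/* EtherCAT AL states (standard values); the type name is hypothetical. */
typedef enum { AL_INIT = 1, AL_PREOP = 2, AL_SAFEOP = 4, AL_OP = 8 } al_state_t;

/* Hypothetical result type: which registers the master would write. */
typedef struct {
    bool write_offset;       /* System Time Offset, 0x0920 */
    bool write_delay;        /* System Time Delay, 0x0928 */
    bool reset_drift_filter; /* register 0x0930 */
} dc_update_plan_t;

dc_update_plan_t plan_dc_update(al_state_t state,
        uint64_t cur_offset, uint64_t new_offset,
        uint32_t cur_delay, uint32_t new_delay)
{
    dc_update_plan_t plan = { false, false, false };
    bool offset_differs = (new_offset != cur_offset);
    bool delay_differs = (new_delay != cur_delay);
    bool running = (state == AL_SAFEOP || state == AL_OP);

    /* Both registers already correct: do not touch the slave at all. */
    if (!offset_differs && !delay_differs)
        return plan;

    /* Stepping the clock is only allowed while the slave is not running. */
    plan.write_offset = offset_differs && !running;

    /* The delay is drift compensated, so it can be updated in any state. */
    plan.write_delay = true;

    /* Anything written: also reset the drift filter. */
    plan.reset_drift_filter = true;
    return plan;
}
```

So a slave in OP whose offset is stale gets only the delay write plus the
filter reset, while the same slave in PREOP gets the offset write as well.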


This should now be cleaner and safer than the previous patch, which disabled
and re-enabled sync outputs; that only works if the slave supports and
enables AssignActivate 0x2000, and it might miss some pulses.  It should also
be better for general use, since running slaves now always use drift
compensation to adjust their clocks gradually rather than stepping instantly
(while slaves being configured from PREOP can still step immediately).

The patch now applies to all DC-capable slaves - previously it only affected
slaves that use sync pulse generation (AssignActivate 0x0100), but step
changes can be a problem for slaves that perform DC timestamping or other
things too.

It should also avoid DC timing shifts during operation, especially if you
are using a slave as the reference clock and syncing the master clock to it
rather than the reverse.  (Note that the dc_user example code does do the
reverse and actually uses the master clock as the real reference.  If you're
not sure which is which: calling ecrt_master_reference_clock_time to get the
slave refclock time and using it in the master code is the
slave-as-reference approach, while calling ecrt_master_sync_reference_clock
to send the master clock to the slave refclock is the master-as-reference
approach.  Both approaches should be safe, but the master-as-reference
approach is subject to higher jitter and drift.)
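
For the slave-as-reference approach, the master code needs the difference
between its own time and the slave refclock time.  A minimal sketch,
assuming (as I understand ecrt.h) that ecrt_master_reference_clock_time
yields only the lower 32 bits of the reference clock's system time; the
helper name is hypothetical:

```c
#include <stdint.h>

/* Signed difference in ns: master application time minus slave refclock
 * time.  Only the low 32 bits are compared, so the result is meaningful
 * only while the true difference stays well under about 2.1 seconds.
 * The final cast relies on two's complement conversion, which every
 * platform the master runs on provides. */
int32_t dc_time_diff_ns(uint64_t app_time_ns, uint32_t ref_time_lo32)
{
    return (int32_t) ((uint32_t) app_time_ns - ref_time_lo32);
}
```

The subtraction is done in unsigned arithmetic first so that a 32-bit
wraparound between the two samples still gives the right small difference.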

However, bear in mind that the way I'm using DC, I'm not going to notice
small timing errors.  So I'd appreciate it if someone who is using DC more
extensively (especially with motor slaves, which tend to be picky) could
verify it.

(Just to repeat the background from the original patch: if the network
rescans while the master app is running [e.g. due to a change in the number
of responding slaves], then without any patch the reconfig process is likely
to update the System Time Offset register.  That causes an immediate step
change in the slave's DC clock, which can in turn sometimes result in it
missing pulses and possibly stopping altogether, as mentioned above.)



FWIW, the datasheets also recommend sending a flood of resync datagrams
(about 15,000) after changing the clocks, to make them converge more
quickly.  I haven't done this because, at the default update rate, it would
take quite a while (about a minute, in fact).  But master app code might want
to consider detecting a change to the number of slaves during operation and
calling ecrt_master_sync_slave_clocks more frequently afterwards (if a change
to the number of slaves isn't fatal to your application anyway).  Another
possibility might be to temporarily increase the speed of the drift
adjustment.
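
A sketch of how an application might do that.  Everything here is
hypothetical application-side code, not part of the patchset: the struct,
the function, and the slow-rate divider are made up for illustration, and
only the 15,000 figure comes from the datasheets as quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define BURST_CYCLES 15000u /* resync count quoted from the datasheets */
#define SLOW_DIVIDER 10u    /* assumption: normally sync every 10th cycle */

typedef struct {
    unsigned int last_slave_count; /* slaves responding on the last cycle */
    uint32_t burst_remaining;      /* cycles left in the fast-sync burst */
    uint32_t cycle;                /* cycle counter used for the slow rate */
} sync_sched_t;

/* Returns true if the slave clocks should be synced on this cycle. */
bool should_sync_this_cycle(sync_sched_t *s, unsigned int slaves_responding)
{
    /* A change in the slave count means a rescan/reconfigure is likely. */
    if (slaves_responding != s->last_slave_count) {
        s->last_slave_count = slaves_responding;
        s->burst_remaining = BURST_CYCLES;
    }

    s->cycle++;

    if (s->burst_remaining > 0) {
        s->burst_remaining--;
        return true;               /* burst: sync on every cycle */
    }
    return (s->cycle % SLOW_DIVIDER) == 0;
}
```

The cyclic task would call this with the slaves_responding value from
ecrt_master_state, and call ecrt_master_sync_slave_clocks whenever it
returns true.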

 

*         0042-print-sync-signed.patch:
Corrects the DC sync log message to use signed format, since it can be given
negative values.
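
To illustrate the kind of problem this fixes (this is not the actual log
statement from the master; the helper names are hypothetical), the same
32-bit value prints very differently depending on the format used:

```c
#include <stdint.h>
#include <stdio.h>

/* Formats the value as signed, which is what a time difference needs. */
int format_diff_signed(char *buf, size_t len, uint32_t raw)
{
    return snprintf(buf, len, "%d", (int) (int32_t) raw);
}

/* Formats the same bits as unsigned: small negative differences show up
 * as huge positive numbers. */
int format_diff_unsigned(char *buf, size_t len, uint32_t raw)
{
    return snprintf(buf, len, "%u", (unsigned int) raw);
}
```

For a difference of -100 the signed version gives "-100", while the
unsigned version gives "4294967196".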

 

And now here's the major bundle (I skipped some numbers to show grouping):

 

*         0050-fsm_sii_external-datagram.patch:

*         0051-fsm_change-external-datagram.patch:

*         0052-fsm_slave_config-external-datagram.patch:

*         0053-fsm_slave_scan-external-datagram.patch:
These are prep work for the following patches.  They each take one state
machine and convert it from using a single fixed datagram object provided at
init time for all operations (as is the style of fsm_master), to using a
different one provided each time to the exec function (as is the style of
fsm_slave and several low-level FSMs).
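
To show the difference between the two styles, here is a stripped-down
sketch.  The types and functions are stand-ins invented for illustration
(datagram_t is not ec_datagram_t, and these are not the real FSMs):

```c
#include <stddef.h>

typedef struct { int uses; } datagram_t; /* stand-in for a datagram */

/* Old style: one datagram is bound at init time and used for every step. */
typedef struct { datagram_t *datagram; int step; } fixed_fsm_t;

void fixed_fsm_init(fixed_fsm_t *fsm, datagram_t *datagram)
{
    fsm->datagram = datagram;
    fsm->step = 0;
}

int fixed_fsm_exec(fixed_fsm_t *fsm)
{
    fsm->datagram->uses++;  /* always the same datagram */
    return ++fsm->step < 2; /* nonzero while still running */
}

/* New style: the caller hands in a datagram on every exec call. */
typedef struct ext_fsm ext_fsm_t;
struct ext_fsm {
    void (*state)(ext_fsm_t *, datagram_t *); /* NULL once finished */
};

static void ext_state_end(ext_fsm_t *fsm, datagram_t *datagram)
{
    datagram->uses++;
    fsm->state = NULL;
}

static void ext_state_start(ext_fsm_t *fsm, datagram_t *datagram)
{
    datagram->uses++;
    fsm->state = ext_state_end;
}

void ext_fsm_init(ext_fsm_t *fsm)
{
    fsm->state = ext_state_start;
}

int ext_fsm_exec(ext_fsm_t *fsm, datagram_t *datagram)
{
    if (fsm->state)
        fsm->state(fsm, datagram);
    return fsm->state != NULL; /* nonzero while still running */
}
```

With the second style the caller is free to pass a different datagram on
each call, which is what lets the FSM be driven from a per-slave context.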



*         0054-fsm_slave-handles-all-sdos.patch:
This moves the internal SDO requests and SDO dictionary requests from
fsm_master into fsm_slave.  (The dictionary requests only matter here if you
disable EC_SKIP_SDO_DICT from patch 0010; otherwise they were already
effectively moved.)

This does two important things.  Firstly, it removes the fighting over the
CoE mailbox between the internal and external SDO requests (making the
busy-checking on each side unnecessary).  Secondly, it allows both of these
to occur in the background and in parallel between multiple slaves.



*         0055-fsm_slave_config_scan-to-fsm_slave.patch:
This is a big patch (though not the largest in the set; but I couldn't see a
way to split it up any more without making intermediate states that wouldn't
compile or run), and it's a doozy.

Similar to the previous patch, this moves fsm_slave_scan and
fsm_slave_config from fsm_master to fsm_slave.  This allows slave scanning
and configuration to occur in parallel for multiple slaves.  (Note that
scanning all slaves must complete before configuring any slave can begin.)

This also adds scan_required to ec_slave_info_t; when true the other fields
are unreliable (and should be ignored) as scanning has not yet started or is
still in progress.
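
Application code that reads slave info should therefore check the flag
first.  A small sketch, using a stand-in struct rather than the real
ec_slave_info_t (the vendor_id and product_code fields mirror the real
ones; the type used for scan_required here is an assumption, and the
helper and enum names are hypothetical):

```c
#include <stdint.h>

/* Stand-in for the relevant part of ec_slave_info_t. */
typedef struct {
    uint32_t vendor_id;
    uint32_t product_code;
    uint8_t scan_required; /* nonzero: other fields not yet reliable */
} slave_info_view_t;

typedef enum { IDENT_UNKNOWN, IDENT_MATCH, IDENT_MISMATCH } ident_result_t;

/* Compares the slave identity, but only once scanning has completed. */
ident_result_t check_slave_identity(const slave_info_view_t *info,
        uint32_t vendor_id, uint32_t product_code)
{
    if (info->scan_required)
        return IDENT_UNKNOWN; /* ask again later instead of trusting it */

    if (info->vendor_id == vendor_id && info->product_code == product_code)
        return IDENT_MATCH;

    return IDENT_MISMATCH;
}
```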

The motivating case was a network of about 100 slave devices; while scanning
is fast (under a second, after prior SII patches), the configuration process
to bring the slaves from PREOP to OP took about 80 seconds (and you could
see the lights coming on each slave in sequence).  After the patch it takes
about 20 seconds.

I actually originally intended to only move fsm_slave_config, but the
structure of the code required moving fsm_slave_scan as well.  Logically
they do both belong in the slave FSM anyway.

Note that in this case "parallel" does not mean separate threads - all the
FSMs (master and all slaves) still execute on a single thread.  But it can
now include datagrams for multiple slaves in the same frame.  The existing
throttling mechanism for fsm_slave is used, so it will configure slaves in
chunks, not all at once (so the network won't get overloaded if you have a
large number of slaves, though network usage will be higher than it
previously was).
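
A toy model of that kind of single-threaded, throttled execution.  This is
deliberately simplified and is not how master.c actually schedules the
slave FSMs; the types, the function, and the "steps left" bookkeeping are
all invented for illustration:

```c
#include <stddef.h>

/* Toy slave FSM: just counts how many steps it still needs. */
typedef struct { int steps_left; } slave_fsm_t;

/* Runs one master cycle on a single thread: each unfinished slave FSM
 * advances by one step, but at most max_parallel of them per cycle.
 * Returns how many FSMs ran, i.e. how many slaves would have a datagram
 * in this cycle's frame(s). */
size_t run_cycle(slave_fsm_t *slaves, size_t count, size_t max_parallel)
{
    size_t ran = 0;

    for (size_t i = 0; i < count && ran < max_parallel; i++) {
        if (slaves[i].steps_left > 0) {
            slaves[i].steps_left--;
            ran++;
        }
    }
    return ran;
}
```

With three slaves needing two steps each and a limit of two, the first two
slaves finish after two cycles and the third after two more, instead of the
six cycles a strictly sequential scheme would need.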



*         0056-fsm-exec-simplify.patch:
Now that most of the FSMs execute from fsm_slave, it's not necessary for
them to check the datagram state, as master.c's ec_master_exec_slave_fsms
does this in advance.  This simplifies the FSM exec functions.

 

These have been tested and appear to work as expected, at least with my
networks.  However there are a few caveats:

 

*         The "ready" state exposed in ec_slave_config_state_t and
ec_slave_info_t by one of my earlier patches (integrated into default) is
less useful than it used to be; it now turns on earlier when the slave is
still awaiting scanning and configuration (in fact it will rarely be false
unless you're hammering the slave with requests).  Since this is relatively
new I doubt it will bother anyone, but I'm open to suggestions.



*         Previously, since slaves were scanned and configured in order, you
could check whether the last slave was ready and use this to decide that the
network as a whole has finished configuration (and consequently that things
like SDO requests would be acted on quickly).  Similarly, since the last
slave was the last brought to OP, you could have external equipment detect
that the network is ready by having an output on this slave (that turns off
when not at OP).

While the last slave should still be among the last slaves to be configured,
it may no longer be the actual last slave configured, especially if some
slaves are faster to configure than others.  Having said that, you can still
assume that requests will be acted on quickly for an individual slave if
that slave claims to be ready (unless it hasn't been configured yet) - it's
only making inferences about other slaves that's more problematic now.



*         Network usage will be higher when scanning/configuring than it
previously was (though it's capped, as mentioned above).  The price for
talking to more slaves at once is that frame sizes are larger (or it might
send multiple frames).  There is a chance that this might overflow your
cycle times if you're running at a very high rate.  Up to EC_EXT_RING_SIZE/2
slave FSMs (default 16) can be running in parallel; you may need to tune
this lower in that case.  (Or you can tune it higher, if you have spare time
and you want it to configure more slaves at once.)  Testing was done at 1ms
cycle times on PREEMPT_RT.
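
Regarding the caveat about inferring network readiness from the last
slave: one alternative is to check every slave config rather than just the
last one.  A sketch, using a stand-in for the online and operational bits
of ec_slave_config_state_t (the helper name is hypothetical, and the
application would fill the array by calling ecrt_slave_config_state for
each of its slave configs):

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the relevant bits of ec_slave_config_state_t. */
typedef struct {
    unsigned int online : 1;
    unsigned int operational : 1;
} cfg_state_view_t;

/* True only if every configured slave is online and in OP. */
bool all_slaves_operational(const cfg_state_view_t *states, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!states[i].online || !states[i].operational)
            return false;
    }
    return true;
}
```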

 

Also note that as the numbering implies, they assume that all prior patches
have been applied.  Some of the prior patches have effectively been "baked
in" as this moves their changes from one file to another, so they can't be
trivially reordered.  Each successive patch will compile and run correctly
before the next in the sequence is applied, however.

 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: patches-gavinl-20160621.zip
Type: application/x-zip-compressed
Size: 146514 bytes
Desc: not available
URL: <http://lists.etherlab.org/pipermail/etherlab-dev/attachments/20160621/205f3d1d/attachment-0001.bin>

