[etherlab-dev] [PATCH] A whole lotta patchin' goin' on

Gavin Lambert gavinl at compacsort.com
Wed Mar 11 01:45:16 CET 2015


Hi all,

I've attached my current patch series from stable-1.5.  This fixes various
issues that I've encountered while using the master library.  I've posted
some of these before but it seemed best to repost the full bundle.

Note that this bundle includes a few select patches from Frank Heckenbach's
and Knud Baastrup's previously posted patch series.  There are a few
included patches that I probably haven't tested very thoroughly (notably
anything to do with EoE, since I don't use that myself), and conversely
there are probably patches that I haven't included that are perfectly good
and should get merged to mainline.  This particular bundle mostly uses
Knud's patches for mailbox queuing and out-of-order replies, since they seem
"simpler" -- I have an alternate bundle that uses Frank's patches if you'd
prefer to see that.

The bundle is formatted as an HG MQ patch queue (which also works with quilt
and other similar utilities), so the series file defines the intended
application order.  But the short version is that there are a few
low-numbered patches to be applied first, followed by Frank & Knud's
patches, then the high-numbered gavinl patches at the end.

Each patch has a brief description at the top (intended as a commit
message), but I'll go through them all here as well:

gavinl-0001-deactivate_unmap:
  Deactivating the master from userspace did not release its process data
memory mapping -- only releasing it did.  This means that if a master is
deactivated, reconfigured, and reactivated, it becomes impossible to
"really" release without terminating the application, due to a dangling
handle use.  This patch moves the deallocation to where it belongs.

gavinl-0002-dc_refclk_not_op:
  The reference clock defaults to the first slave on the network.  If this
is not actually included in the application config, then it may fail to
transition to OP when requested, causing syslog spam.  Since I can't think
of any reason why a refclock would not function correctly when left in
PREOP, this patch changes that default behaviour -- however just in case
there's some weird slave out there that needs this, it can be enabled again
via configure.

gavinl-0003-refclk_nxio:
  When using ecrt_master_reference_clock_time in a loop (eg. as part of
synchronising the master to the refclock rather than the reverse), you're
very likely to call it before a reference clock has been selected.  When
you're doing this from a userspace app, this generates a lot of pointless
stderr spam.  This patch removes this, as any app calling this should be
prepared to deal with failure in a sensible way anyway.  (Arguably all the
other stderr prints in this file should be made optional as well, but that's
a separate issue.)

gavinl-0004-abort_slave_config_reg_requests:
  If a slave_config register request is in progress and the corresponding
slave goes offline, the request is aborted but left permanently in BUSY
state.  This patch marks the request with FAILURE/ERROR in this case.

gavinl-0005-abort_detached_requests:
  Similarly slave_config SDO and register requests that are queued for
slaves that are offline will never be started and will simply remain BUSY
forever from the perspective of the application.  This patch aborts (and
marks as FAILURE/ERROR) these requests.

gavinl-0006-dc_sync_vs_sys_time:
  When adjusting the system time (via the delay register, which is a step
adjustment, unlike the normal sync datagram) of a slave that is in OP and
has sync outputs enabled, there is a danger (especially if time is moved
forwards) that the sync outputs will stop (for 64-bit DC slaves this will be
permanent -- with 32-bit DC slaves the outputs may halt for about 4 seconds
and then resume).  This patch disables sync outputs temporarily before
updating the time and then re-enables them afterwards.  The standard
recommends also updating the DC Start Time registers in this case but this
did not seem necessary in my testing -- however it's possible that this is a
slave-specific thing and some slaves may need those registers updated as
well.

Next up are Frank & Knud's patches; these should be identical to those
previously posted, except possibly for a little de-fuzzing.

gavinl-1001-mbox_finish_sm:
  The mailbox state machines (fsm_coe, fsm_foe, fsm_soe) exec method
implementation (the return value) does not match what the parent state
machine was expecting it to mean.  In particular the parent state machine is
expecting a return value of 0 to mean that the state machine has completed,
but the child machine returned 0 in cases when it wasn't finished but it
didn't have a datagram to send that time (because it was waiting on
something else).  This could cause the state machine to be aborted early,
which causes problems with Frank's locking patches in particular (and
probably other things).  This patch makes it return 1 but with an
EC_DATAGRAM_INVALID datagram, so that the parent state machine knows that it
is not finished but there is no datagram to send.

gavinl-1002-foe_read_debug:
  This patch fixes the packet number for FoE read transfers, and improves
error response handling.  It also includes some extra (disabled by default)
debugging output in case you want to trace transfers more closely.

gavinl-1003-reboot:
  This patch adds a command line tool feature to request that a slave
perform a hardware reboot on itself.  It can either reboot individual slaves
or as a broadcast to the entire network.  This does only work if the slave
supports the option, but this is a standard ESC feature so it's likely that
many will.  It requires a new IOCTL number, which by convention should be in
the first group with the other command-line-tool ioctls, but I didn't want
to renumber everything in the patch so it's just added to the end (I assume
that they will get renumbered if this is integrated to mainline).  Also note
that this requires EC_DATAGRAM_INVALID support (from either Frank's or
Knud's patches) because it must be able to guarantee that the master sends
no other datagrams for a certain time after sending the reboot datagrams.

gavinl-1004-reboot_knud:
  This is just a small amendment to the prior patch to make it compile if
the knud-0004 patch (rt mutexes) is applied.

1005:
  There's a hole here because I decided one of the patches needed more
testing before I release it. :)

gavinl-1006-sdo_dict_fast_arrays:
  When reading the SDO dictionary, this recognises arrays and will only read
the description for the first element, since the standard requires that all
subsequent elements be identical apart from a trivial name change.  This may
save a bit of time if a given slave has large arrays, but (particularly
after knud-0010) it's not a major performance improvement, so I consider
this one optional.

gavinl-1007-more_state:
  This just adds a few more internal bits of state to the application
interface, allowing it to know when certain operations are likely to time
out or take a long time.

gavinl-1008-topology_bypass:
gavinl-1009-topology_upstream:
  Collectively these two patches are an attempt to improve behaviour in the
case that redundant links are used in the network, either internal subloops
or a main loop back to redundant master ports.  In either of these cases if
there is not actually a break in the network then packets are not actually
"received" by certain ports on the slaves, and as a result their timestamps
are not updated by the delay measurement datagram.  The first patch marks
these ports as "bypassed" in this case, which means that their timestamps
are not to be trusted even if the port is open.  The second patch uses this
information along with the DC times of the non-bypassed ports to work out
which port is *actually* the upstream (closest to master) port, instead of
prior behaviour that assumed this was always port 0.
  It then uses this to try to more accurately calculate the topology,
however this part is still incomplete and it's still going to get horribly
confused with reversed links (including when redundancy is active following
a break).  As a consequence of some receive timestamps not being usable, the
delay measurement and DC offsets are also wrong in this case, but after this
patch they are at least closer to accurate (they're zero or otherwise
smaller than correct, instead of typically being multiple seconds as before
this patch).  Possibly this could be improved further by assuming some
reasonable default delay instead of zero when the delay is not measurable.
  The motivation for this might be a little complicated, so let me know if
you want more explanation.  But I consider this a net improvement in
behaviour even if it's not perfect.  (I'm not sure if perfect is even
achievable, given limitations of information readable from the network in
reversed-wiring scenarios.)
  Note also that there are still some issues (from before this patch) with
DC synch and the backup link that are still unaddressed.

gavinl-1010-redundant_device:
  This changes several places where a slave's ring position is logged to
also display the link name ("main" or "backup").  This is because when
master-port redundancy was active, the first slave responding on each link
would both have been "0-0" previously; now one is "0-main-0" and the other
is "0-backup-0".  This will still affect the log output even if redundancy
is not enabled.


One final note is that several of these patches alter the ioctl interface.
I've intentionally excluded changes to the EC_IOCTL_VERSION_MAGIC because
depending on how the various patches are applied (all at once or in smaller
doses) this may need to change once or several times.  But if you're testing
out these patches, don't forget to change it -- it helps catch build
incompatibilities.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: patches-gavinl.zip
Type: application/octet-stream
Size: 86070 bytes
Desc: not available
URL: <http://lists.etherlab.org/pipermail/etherlab-dev/attachments/20150311/387287a5/attachment-0001.obj>


More information about the etherlab-dev mailing list