[etherlab-dev] [PATCH] A whole lotta patchin' goin' on

Gavin Lambert gavinl at compacsort.com
Fri Mar 13 04:11:48 CET 2015


Attached two more patches, one of which is the missing one from the prior
message (despite the numbers, they can basically be applied in any order,
but that's the minimum-fuzz locations).

gavinl-1005-quick_op_watchdog:
  When some kind of comms error occurs and a slave watchdogs back to
SAFEOP+ERROR, this patch will detect that the error case was specifically
due to a watchdog timeout and will transition it straight back to OP instead
of going all the way back to PREOP and fully reconfiguring it.  As a side
benefit it also caches the last AL status code from the slave, which could
potentially be made available to the application (although this patch does
not do so).

gavinl-1011-e1000e_watchdog:
  This resolves an issue with the e1000e driver that I mentioned earlier --
when wired for cable redundancy, the second port didn't establish link until
the network broke, causing an unacceptable delay in failover to the
redundant connection.  Turns out the problem was that the port watchdog has
the job of detecting link up/down and the watchdog was not run if the port
was receiving packets, even if it didn't think it had a link.  (With
redundant wiring, it would transmit on the main link and receive back on the
backup link, resetting the backup link's watchdog each time so that it never
ran.)  This patch removes the reset of watchdog on receive, so that the
watchdog runs every 2 seconds regardless.
  I haven't checked the other network drivers to see if they're similarly
afflicted.


On 11 March 2015 13:45, I quoth:
> I've attached my current patch series from stable-1.5.  This fixes various
> issues that I've encountered while using the master library.  I've posted
> some of these before but it seemed best to repost the full bundle.
> 
> Note that this bundle includes a few select patches from Frank
Heckenbach's
> and Knud Baastrup's previously posted patch series.  There are a few
included
> patches that I probably haven't tested very thoroughly (notably anything
to
> do with EoE, since I don't use that myself), and conversely there are
> probably patches that I haven't included that are perfectly good and
should
> get merged to mainline.  This particular bundle mostly uses Knud's patches
> for mailbox queuing and out-of-order replies, since they seem "simpler" --
I
> have an alternate bundle that uses Frank's patches if you'd prefer to see
> that.
> 
> The bundle is formatted as an HG MQ patch queue (which also works with
quilt
> and other similar utilities), so the series file defines the intended
> application order.  But the short version is that there are a few low-
> numbered patches to be applied first, followed by Frank & Knud's patches,
> then the high-numbered gavinl patches at the end.
> 
> Each patch has a brief description at the top (intended as a commit
message),
> but I'll go through them all here as well:
> 
> gavinl-0001-deactivate_unmap:
>   Deactivating the master from userspace did not release its process data
> memory mapping -- only releasing it did.  This means that if a master is
> deactivated, reconfigured, and reactivated, it becomes impossible to
"really"
> release without terminating the application, due to a dangling handle use.
> This patch moves the deallocation to where it belongs.
> 
> gavinl-0002-dc_refclk_not_op:
>   The reference clock defaults to the first slave on the network.  If this
is
> not actually included in the application config, then it may fail to
> transition to OP when requested, causing syslog spam.  Since I can't think
of
> any reason why a refclock would not function correctly when left in PREOP,
> this patch changes that default behaviour -- however just in case there's
> some weird slave out there that needs this, it can be enabled again via
> configure.
> 
> gavinl-0003-refclk_nxio:
>   When using ecrt_master_reference_clock_time in a loop (eg. as part of
> synchronising the master to the refclock rather than the reverse), you're
> very likely to call it before a reference clock has been selected.  When
> you're doing this from a userspace app, this generates a lot of pointless
> stderr spam.  This patch removes this, as any app calling this should be
> prepared to deal with failure in a sensible way anyway.  (Arguably all the
> other stderr prints in this file should be made optional as well, but
that's
> a separate issue.)
> 
> gavinl-0004-abort_slave_config_reg_requests:
>   If a slave_config register request is in progress and the corresponding
> slave goes offline, the request is aborted but left permanently in BUSY
> state.  This patch marks the request with FAILURE/ERROR in this case.
> 
> gavinl-0005-abort_detached_requests:
>   Similarly slave_config SDO and register requests that are queued for
slaves
> that are offline will never be started and will simply remain BUSY forever
> from the perspective of the application.  This patch aborts (and marks as
> FAILURE/ERROR) these requests.
> 
> gavinl-0006-dc_sync_vs_sys_time:
>   When adjusting the system time (via the delay register, which is a step
> adjustment, unlike the normal sync datagram) of a slave that is in OP and
has
> sync outputs enabled, there is a danger (especially if time is moved
> forwards) that the sync outputs will stop (for 64-bit DC slaves this will
be
> permanent -- with 32-bit DC slaves the outputs may halt for about 4
seconds
> and then resume).  This patch disables sync outputs temporarily before
> updating the time and then re-enables them afterwards.  The standard
> recommends also updating the DC Start Time registers in this case but this
> did not seem necessary in my testing -- however it's possible that this is
a
> slave-specific thing and some slaves may need those registers updated as
> well.
> 
> Next up are Frank & Knud's patches; these should be identical to those
> previously posted, except possibly for a little de-fuzzing.
> 
> gavinl-1001-mbox_finish_sm:
>   The mailbox state machines (fsm_coe, fsm_foe, fsm_soe) exec method
> implementation (the return value) does not match what the parent state
> machine was expecting it to mean.  In particular the parent state machine
is
> expecting a return value of 0 to mean that the state machine has
completed,
> but the child machine returned 0 in cases when it wasn't finished but it
> didn't have a datagram to send that time (because it was waiting on
something
> else).  This could cause the state machine to be aborted early, which
causes
> problems with Frank's locking patches in particular (and probably other
> things).  This patch makes it return 1 but with an EC_DATAGRAM_INVALID
> datagram, so that the parent state machine knows that it is not finished
but
> there is no datagram to send.
> 
> gavinl-1002-foe_read_debug:
>   This patch fixes the packet number for FoE read transfers, and improves
> error response handling.  It also includes some extra (disabled by
default)
> debugging output in case you want to trace transfers more closely.
> 
> gavinl-1003-reboot:
>   This patch adds a command line tool feature to request that a slave
perform
> a hardware reboot on itself.  It can either reboot individual slaves or as
a
> broadcast to the entire network.  This does only work if the slave
supports
> the option, but this is a standard ESC feature so it's likely that many
will.
> It requires a new IOCTL number, which by convention should be in the first
> group with the other command-line-tool ioctls, but I didn't want to
renumber
> everything in the patch so it's just added to the end (I assume that they
> will get renumbered if this is integrated to mainline).  Also note that
this
> requires EC_DATAGRAM_INVALID support (from either Frank's or Knud's
patches)
> because it must be able to guarantee that the master sends no other
datagrams
> for a certain time after sending the reboot datagrams.
> 
> gavinl-1004-reboot_knud:
>   This is just a small amendment to the prior patch to make it compile if
the
> knud-0004 patch (rt mutexes) is applied.
> 
> 1005:
>   There's a hole here because I decided one of the patches needed more
> testing before I release it. :)
> 
> gavinl-1006-sdo_dict_fast_arrays:
>   When reading the SDO dictionary, this recognises arrays and will only
read
> the description for the first element, since the standard requires that
all
> subsequent elements be identical apart from a trivial name change.  This
may
> save a bit of time if a given slave has large arrays, but (particularly
after
> knud-0010) it's not a major performance improvement, so I consider this
one
> optional.
> 
> gavinl-1007-more_state:
>   This just adds a few more internal bits of state to the application
> interface, allowing it to know when certain operations are likely to time
out
> or take a long time.
> 
> gavinl-1008-topology_bypass:
> gavinl-1009-topology_upstream:
>   Collectively these two patches are an attempt to improve behaviour in
the
> case that redundant links are used in the network, either internal
subloops
> or a main loop back to redundant master ports.  In either of these cases
if
> there is not actually a break in the network then packets are not actually
> "received" by certain ports on the slaves, and as a result their
timestamps
> are not updated by the delay measurement datagram.  The first patch marks
> these ports as "bypassed" in this case, which means that their timestamps
are
> not to be trusted even if the port is open.  The second patch uses this
> information along with the DC times of the non-bypassed ports to work out
> which port is *actually* the upstream (closest to master) port, instead of
> prior behaviour that assumed this was always port 0.
>   It then uses this to try to more accurately calculate the topology,
however
> this part is still incomplete and it's still going to get horribly
confused
> with reversed links (including when redundancy is active following a
break).
> As a consequence of some receive timestamps not being usable, the delay
> measurement and DC offsets are also wrong in this case, but after this
patch
> they are at least closer to accurate (they're zero or otherwise smaller
than
> correct, instead of typically being multiple seconds as before this
patch).
> Possibly this could be improved further by assuming some reasonable
default
> delay instead of zero when the delay is not measurable.
>   The motivation for this might be a little complicated, so let me know if
> you want more explanation.  But I consider this a net improvement in
> behaviour even if it's not perfect.  (I'm not sure if perfect is even
> achievable, given limitations of information readable from the network in
> reversed-wiring scenarios.)
>   Note also that there are still some issues (from before this patch) with
DC
> synch and the backup link that are still unaddressed.
> 
> gavinl-1010-redundant_device:
>   This changes several places where a slave's ring position is logged to
also
> display the link name ("main" or "backup").  This is because when
master-port
> redundancy was active, the first slave responding on each link would both
> have been "0-0" previously; now one is "0-main-0" and the other is
"0-backup-
> 0".  This will still affect the log output even if redundancy is not
enabled.
> 
> 
> One final note is that several of these patches alter the ioctl interface.
> I've intentionally excluded changes to the EC_IOCTL_VERSION_MAGIC because
> depending on how the various patches are applied (all at once or in
smaller
> doses) this may need to change once or several times.  But if you're
testing
> out these patches, don't forget to change it -- it helps catch build
> incompatibilities.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gavinl-1005-quick_op_watchdog.patch
Type: application/octet-stream
Size: 5253 bytes
Desc: not available
URL: <http://lists.etherlab.org/pipermail/etherlab-dev/attachments/20150313/0b0eb6d7/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gavinl-1011-e1000e_watchdog.patch
Type: application/octet-stream
Size: 12667 bytes
Desc: not available
URL: <http://lists.etherlab.org/pipermail/etherlab-dev/attachments/20150313/0b0eb6d7/attachment-0003.obj>


More information about the etherlab-dev mailing list