[etherlab-dev] Multiple mailbox protocols and other issues

Knud Baastrup kba at deif.com
Wed Sep 17 10:21:45 CEST 2014


Hi Gavin, Florian and others,

I have updated the 06_mailbox.patch that had a major bug that disrupted the usage of the LRD and LRW commands. I have additionally added the 13_domain_lock.patch that solves an issue with concurrent access to the ecrt_domain_queue() and ecrt_domain_process() functions. These functions are likely to be called from multiple application tasks managing different domains and I believe the master should protect itself as the lack of protection can result in a corrupted master datagram queue. The error revealed itself as UNMATCHED datagrams in syslog with indexes no longer present in the master datagram queue.

Mvh. Knud


-----Original Message-----
From: etherlab-dev-bounces at etherlab.org [mailto:etherlab-dev-bounces at etherlab.org] On Behalf Of Knud Baastrup
Sent: 22. august 2014 11:25
To: Gavin Lambert
Cc: etherlab-dev at etherlab.org
Subject: Re: [etherlab-dev] Multiple mailbox protocols and other issues

Hi Gavin, Florian and others,

I have made a few changes in the patches based on your previous comments and as well added a new 12_sdo_directory.patch that I believe will solve the issue with the SDO directory fetching that is very time consuming task during slave scanning. I have attached the patches along with a series files that works well with the hg mq commands.

Patch 12 (12_sdo_directory.patch)
I have added the define EC_SKIP_SDO_DICT that when set will skip the fsm_master SDO directory fetching during slave scanning. I have instead added the SDO directory scanning to the fsm_slave (where it belongs) where a directory fetch can be requested in the same manner as an SDO/register/FoE/SoE request. I have added an ecrt_master_dict_upload in ecrt.h and as well updated the ethercat tool to perform an directory request the first time an "ethercat sdos" command is executed.

Mvh. Knud

-----Original Message-----
From: etherlab-dev-bounces at etherlab.org [mailto:etherlab-dev-bounces at etherlab.org] On Behalf Of Knud Baastrup
Sent: 8. august 2014 11:26
To: Gavin Lambert
Cc: etherlab-dev at etherlab.org
Subject: Re: [etherlab-dev] Multiple mailbox protocols and other issues

Thanks for your feed-back Gavin!

I have in-lined some comments!

Patch 1 to 5 were prepared by a colleague some time back but were at that time not forwarded to the etherlab-dev.

Mvh. Knud

-----Original Message-----
From: Gavin Lambert [mailto:gavinl at compacsort.com]
Sent: 6. august 2014 09:41
To: Knud Baastrup
Cc: 'Florian Pose'; etherlab-dev at etherlab.org
Subject: RE: [etherlab-dev] Multiple mailbox protocols and other issues

On 11 July 2014, quoth Knud Baastrup:
> Patch 2 (ethercat_152_patch_2_foe.patch):
> Wrong datagram used for timeout calculation.

You'll be happy to learn that this patch is not required if you use the latest code; it was fixed in 8bb574da5da2.
[Knud] Great, we use the ec403cf308eb version released 2013-02-12 as baseline and yes I can see that the fix were provided 2 days later in 8bb574da5da2 by 2013-02-14.

> Patch 4 (ethercat_152_patch_4_eoe_mac.patch)
> Change the MAC address for eoe0sY devices to real local administrated 
> MAC addresses based on the NIC part of eth0 and the EoE slaves ring 
> position.

In the line "if (ETH_ALEN >= 5)", shouldn't that be > 5 or >= 6?  Also, if this test fails (which seems unlikely, but then why test if it can't happen?), this will leave the address uninitialized, which seems undesirable.  Maybe it should fall back to the prior code in that case?
[Knud] I agree that it should be either > 5 or >= 6. I will correct and test.

> Patch 5 (ethercat_152_patch_5_priority_inheritance.patch)
> Replaced semaphores with mutexes to utilize priority inheritance and 
> limit impact from lower priority tasks (EtherCAT-EOE) running as 
> sched_other task.

While I like the idea of using RT-mutexes, they do have a minimum kernel version (I'm not sure exactly what that is except that it's somewhere in the 2.6.x series) and currently Etherlab provides drivers for some kernel versions that are before that cutoff, I suspect.  (And do rt_mutexes and RTAI play nicely together or not?)  So this might break compatibility, and so possibly should be a configure option.  Or maybe nobody cares about those older kernel versions any more?  (I don't personally, I'm just wondering if it might be a concern.) [Knud] I agree that it might need to be a configurable option. I do not know how to prepare that in a proper way and will leave it Florian or others to decide.

Also I found a few cases where "down" and "up" hadn't been changed to "rt_mutex_lock" etc.  Not sure if this was the result of a patch application failure or if this was code added since the patches' base version.
[Knud] Probably because I have used ec403cf308eb as base version for the patches.

> Patch 6 (ethercat_152_patch_6_mailbox.patch)
> Alternative solution to Patch 9-10-11 provided by Frank Heckenbach for
> 1.5.0 (that I did not succeed to get up running on 1.5.2).

Did you look at the updated-to-1.5.2 patches that I posted?
[Knud] The ones named ethercat-1.5.0-patches-v2.tar.bz2 ? No I was at that point quite happy with my current alternative implementation.

> In this solution I accept that a mailbox read request (e.g. FP-RD) for 
> a given mailbox protocol can return data from any other mailbox 
> protocol running at the same time.  [...]

In master/master.c near line 1240, on receipt of a datagram you're searching through the slave list to check its mailbox settings.  This code appears unsafe in the case when this search fails (which presumably could occur eg.
if a datagram with a corrupted address arrives or if slave scanning has just started and cleared master->slaves).
[Knud] I agree, I must ensure that the datagram is ignored if no matching station address. In the current patch such a datagram will be managed by the last slave and that were not the intention. I will correct and test.

Also this seems to generate quite a lot of "Await configured mailbox address" spam at debug level 1 even when no mailbox activity is taking place..?
[Knud] I will remove the debug message, it is not that relevant as you will know from other messages if mailbox offset address configuration fails for a slave. I will correct and test.

When compiling with a recent kernel version (3.13) I needed to #include <linux/rtmutex.h> in master/slave.h in order to compile several files (the first is master/fmmu_config.c).  This doesn't appear to be needed in 3.2 however (or by some of the other .c files); I guess the indirect includes have changed?
[Knud] I have been testing with 3.4.37-rt51

> Patch 10 (ethercat_152_patch_10_scan_skip_stats.patch)
> No reason to write output statistics in syslog when issuing a slave
scanning 
> where UNMATCHED datagrams are expected behavior.

I'm not sure I follow this one.  Why are unmatched datagrams expected?
Also, this patch isn't going to stop them printing, just delay it until the scan completes.
[Knud] Unprocessed datagrams will appear as UNMATCHED when you initiate an ethercat rescan but I just need to know that a rescan has taken place (as I already know that this will impact the EtherCAT bus and ongoing datagrams). After the rescan I want indeed to know about UNMATCHED datagrams as these in that case will appears for other reasons not related to the slave scanning.

Otherwise, this patchset seems to work ok.  I've only given it fairly basic testing thus far though.

However there was a bit of fuzz and other conflicts when trying to apply these to the latest HG source, so in the interests of making it easier for Florian and anyone else using the latest source to examine and test these patches, I've attached updated versions.  These have only had the following
changes:
 - de-fuzzed based on 8dd49f6f6d32 (close to stable-1.5 tip).
 - various tabs converted to whitespace and related minor whitespace changes.
   - I noticed some inconsistent newline-brace styles but I left those alone.
 - omitted patch 2 as specified above (it was empty after defuzzing).
 - converted a few extra down/ups in patch 5 as specified above.
 - added #include to master/slave.h in patch 6 as specified above.
[Knud] Thanks :-)

In particular I've basically just included changes required to compile; I haven't tried to fix any of the other possible issues mentioned above.
[Knud] I will apply the remaining changes and send some new patch files based on the ones you updated for me.

I've also included a series file, so in theory it should immediately be MQ-compatible if extracted into the .hg dir.  (You might need to "hg qq -c knud" first.) [Knud] Great, I used some time to understand hg and realize now how great the MQ extension is to manage patches. We use however GIT for our patches, so I will try to apply the same in GIT.

Regards,
Gavin Lambert

_______________________________________________
etherlab-dev mailing list
etherlab-dev at etherlab.org
http://lists.etherlab.org/mailman/listinfo/etherlab-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patches-knud-17092014.zip
Type: application/x-zip-compressed
Size: 33978 bytes
Desc: patches-knud-17092014.zip
URL: <http://lists.etherlab.org/pipermail/etherlab-dev/attachments/20140917/b99135a8/attachment-0001.bin>


More information about the etherlab-dev mailing list