[etherlab-users] r8169 patch - packet timeout boot failures

Tue Dec 3 12:50:39 CET 2013

Just a thought, since you mention booting: is it possible that your driver
is sometimes simply loaded before the master (and so fails to register)?
You mention that you crashed upon boot without the spinlocks; the only way
that should happen is if the driver runs as a regular netdev device
(line 4170), including IRQs. That could also explain why the e1000 has the
same problem.

I suspect that adding the link status check merely causes an extra delay
which could lead to the master being loaded earlier.
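
The load-order hypothesis above can be illustrated with a small userspace
model. This is only a sketch, not kernel code: every name in it
(model_probe, model_master_offer, struct model_nic) is hypothetical, and it
assumes that a patched driver falls back to regular netdev operation with
interrupts whenever the master does not claim the device at probe time.

```c
/* Userspace model of the suspected load-order race. Not kernel code;
 * all identifiers here are hypothetical and exist only for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum nic_mode { MODE_NETDEV, MODE_ECDEV };

struct model_nic {
    enum nic_mode mode;   /* how the driver ended up running */
    bool irq_enabled;     /* interrupts are only used in netdev mode */
};

/* Stand-in for offering the device to the master: the offer can only
 * succeed if the master is already present and wants this device. */
static bool model_master_offer(bool master_ready)
{
    return master_ready;
}

/* Stand-in for the driver's probe routine. */
static void model_probe(struct model_nic *nic, bool master_ready)
{
    assert(nic != NULL);

    if (model_master_offer(master_ready)) {
        /* Claimed by the master: polled operation, no interrupts. */
        nic->mode = MODE_ECDEV;
        nic->irq_enabled = false;
    } else {
        /* Assumed fallback: regular netdev with interrupts, which is
         * where unprotected code could be re-entered. */
        nic->mode = MODE_NETDEV;
        nic->irq_enabled = true;
    }
}
```

In this model, calling model_probe() with master_ready set to false leaves
the device in MODE_NETDEV with interrupts enabled, which would be the only
configuration where the missing spinlock matters.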

J.

2013/12/3 Raz <raziebe at gmail.com>

> All I am doing is trial and error. I do not know the Realtek driver at
> all.
> The spinlocks are needed because the same code is protected by them in the
> original driver code flow. I had a boot lockup in one of my trials without
> them. This patch does not eliminate the problem entirely, but going from
> 100% failures in 10 trials with 6 drives to 1 failure out of 10, I believe
> it is important enough to mail to the community. As for the e1000e, I do
> not know what the problem is; I need to check it and email you.
>
>
>
> On Tue, Dec 3, 2013 at 1:16 PM, Jeroen Van den Keybus <
> jeroen.vandenkeybus at gmail.com> wrote:
>
>> Why the spinlock? This driver instance should never be re-entered.
>>
>> I'm a bit worried that it would complicate the use of e.g. RTAI and
>> Xenomai.
>>
>> How come the e1000 has the same issue?
>>
>> J.
>>
>>
>>
>> 2013/12/3 Raz <raziebe at gmail.com>
>>
>>> The patch below seemed to eliminate the problem. I believe the problem
>>> relates to resetting some registers when link-up is detected.
>>>
>>> diff --git a/local_src/r8169-3.2/r8169.c b/local_src/r8169-3.2/r8169.c
>>> index 6df1793..a483fb5 100644
>>> --- a/local_src/r8169-3.2/r8169.c
>>> +++ b/local_src/r8169-3.2/r8169.c
>>> @@ -1290,6 +1290,9 @@ static void __rtl8169_check_link_status(struct net_device *dev,
>>>
>>>         if (tp->ecdev) {
>>>                 ecdev_set_link(tp->ecdev, tp->link_ok(ioaddr) ? 1 : 0);
>>> +               spin_lock_irqsave(&tp->lock, flags);
>>> +               rtl_link_chg_patch(tp);
>>> +               spin_unlock_irqrestore(&tp->lock, flags);
>>>                 return;
>>>         }
>>>
>>>
>>>
>>> On Tue, Dec 3, 2013 at 11:56 AM, Jeroen Van den Keybus <
>>> jeroen.vandenkeybus at gmail.com> wrote:
>>>
>>>> Perhaps try hooking up a normal eth interface to the drive and see what
>>>> the autoneg comes up with using ethtool. In the past, I have had trouble
>>>> interfacing an FPGA IP core to a PC Ethernet card when the core was hard
>>>> wired to 100M FD instead of advertising this using autoneg. The PC card
>>>> tried to autoneg and then fell back to 100M HD.
>>>>
>>>> You could try testing with an EK1100 in between the PC and the drive.
>>>>
>>>> J.
>>>>
>>>>
>>>> 2013/12/3 Raz <raziebe at gmail.com>
>>>>
>>>>> I do not have ethtool on the EtherCAT device, as it is removed. How
>>>>> can I tell? eth0 is 100 Mbps, but it is my public interface. eth1 is my
>>>>> EtherCAT interface.
>>>>>
>>>>> There is always a link. The first slave is a drive, not an I/O device.
>>>>> This drive runs a Xilinx with the port stack and IP core from Beckhoff.
>>>>> I am now trying to debug the Realtek driver, let's see...
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Dec 3, 2013 at 11:36 AM, Jeroen Van den Keybus <
>>>>> jeroen.vandenkeybus at gmail.com> wrote:
>>>>>
>>>>>> It would be very useful to know whether e.g. the interfaces ended up
>>>>>> in 100M half duplex or so. Is there a link in those cases ? What's the
>>>>>> first EtherCAT station ? Maybe it doesn't handle autoneg properly during
>>>>>> its reset phase ?
>>>>>>
>>>>>> J.
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2013/12/3 Raz <raziebe at gmail.com>
>>>>>>
>>>>>>> Hey,
>>>>>>> The problem happens with the Intel e1000e as well as the Realtek. One
>>>>>>> way to bypass it is to boot the master while the Ethernet-EtherCAT cable
>>>>>>> is disconnected and, once the master claims the interface, connect the
>>>>>>> cable. This appears to work.
>>>>>>> So there is some sort of initialisation error.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Dec 2, 2013 at 11:32 AM, Raz <raziebe at gmail.com> wrote:
>>>>>>>
>>>>>>>> I still do not have a scenario. It "sometimes" happens. The
>>>>>>>> -DRTL8169_DEBUG is something I did not know, so I will check and see. Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Dec 2, 2013 at 11:27 AM, Jeroen Van den Keybus <
>>>>>>>> jeroen.vandenkeybus at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Is there a difference between cold and warm boot ? Does unloading
>>>>>>>>> the ec driver, loading/unloading the stock r8169 driver and then reloading
>>>>>>>>> the ec driver work better ? Same scenario but with Realtek drivers (r8168)
>>>>>>>>> ? Also perhaps compile with -DRTL8169_DEBUG ?
>>>>>>>>>
>>>>>>>>> Just some thoughts.
>>>>>>>>>
>>>>>>>>> J.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2013/12/2 Raz <raziebe at gmail.com>
>>>>>>>>>
>>>>>>>>>> The timeouts happen after the system boots and not while the slaves
>>>>>>>>>> are in OP mode. So my transmit path is irrelevant here, even though a
>>>>>>>>>> transmit happens only from a single thread or through an ioctl (SDO
>>>>>>>>>> reads and so on).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Dec 2, 2013 at 11:01 AM, Jeroen Van den Keybus <
>>>>>>>>>> jeroen.vandenkeybus at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> 1. why do you disable the rtl8169_phy_timer  timer ?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> The rtl8169_phy_timer is regularly polled in ec_poll instead.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> 2. In rtl_hw_start_8168: why do you disable RTL_W16(IntrMask,
>>>>>>>>>>>> tp->intr_event); ?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> The drivers are all non-blocking and interrupt-free. All work
>>>>>>>>>>> that interrupt handlers normally do is done in ec_poll instead.
>>>>>>>>>>>
>>>>>>>>>>> If you cannot send packets anymore, I suspect that you may have
>>>>>>>>>>> overrun the tx queue, i.e. sent a packet before the previous one has been
>>>>>>>>>>> completed. You're also not calling the ethercat transmission functions from
>>>>>>>>>>> different threads, right ?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> thank you
>>>>>>>>>>>> raz
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> https://sites.google.com/site/ironspeedlinux/
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> etherlab-users mailing list
>>>>>>>>>>>> etherlab-users at etherlab.org
>>>>>>>>>>>> http://lists.etherlab.org/mailman/listinfo/etherlab-users