[etherlab-users] r8169 patch - packet timeout boot failures

Tue Dec 3 12:16:54 CET 2013

Why the spinlock? This driver instance should never be re-entered.

I'm a bit worried that it would complicate the use of e.g. RTAI and Xenomai.

How come the e1000 has the same issue?

J.

2013/12/3 Raz <raziebe at gmail.com>

> The patch below seems to have eliminated the problem. I believe the problem
> relates to resetting some registers when link-up is detected.
>
> diff --git a/local_src/r8169-3.2/r8169.c b/local_src/r8169-3.2/r8169.c
> index 6df1793..a483fb5 100644
> --- a/local_src/r8169-3.2/r8169.c
> +++ b/local_src/r8169-3.2/r8169.c
> @@ -1290,6 +1290,9 @@ static void __rtl8169_check_link_status(struct net_device *dev,
>
>         if (tp->ecdev) {
>                 ecdev_set_link(tp->ecdev, tp->link_ok(ioaddr) ? 1 : 0);
> +               spin_lock_irqsave(&tp->lock, flags);
> +               rtl_link_chg_patch(tp);
> +               spin_unlock_irqrestore(&tp->lock, flags);
>                 return;
>         }
>
>
>
> On Tue, Dec 3, 2013 at 11:56 AM, Jeroen Van den Keybus <
> jeroen.vandenkeybus at gmail.com> wrote:
>
>> Perhaps try hooking up a normal eth interface to the drive and see what
>> the autoneg comes up with using ethtool. In the past, I have had trouble
>> interfacing an FPGA IP core to a PC Ethernet card when the core was hard
>> wired to 100M FD instead of advertising this using autoneg. The PC card
>> tried to autoneg and then fell back to 100M HD.
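The fallback described above can be modelled in a few lines. This is only a sketch under stated assumptions: `port_cfg`, `resolve_duplex` and `duplex_mismatch` are hypothetical names, both ends are assumed to run at 100 Mbit/s, and the model covers only the usual behaviour where an autonegotiating port facing a fixed-mode partner detects the speed but has to assume half duplex.

```c
#include <stdbool.h>

/* Duplex modes a port can end up in. */
enum duplex { DUPLEX_HALF, DUPLEX_FULL };

/* Hypothetical description of one end of a 100 Mbit/s link. */
struct port_cfg {
    bool autoneg;      /* true: port takes part in autonegotiation */
    bool full_duplex;  /* forced duplex (autoneg off) or advertised
                          full-duplex ability (autoneg on) */
};

/* Duplex that `self` ends up using when connected to `peer`. */
static enum duplex resolve_duplex(struct port_cfg self, struct port_cfg peer)
{
    /* A hard-wired port simply uses its configured duplex. */
    if (!self.autoneg)
        return self.full_duplex ? DUPLEX_FULL : DUPLEX_HALF;

    /* The peer does not negotiate: the speed can still be detected,
       but the duplex cannot, so half duplex is assumed. */
    if (!peer.autoneg)
        return DUPLEX_HALF;

    /* Both sides negotiate: full duplex only if both advertise it. */
    return (self.full_duplex && peer.full_duplex) ? DUPLEX_FULL : DUPLEX_HALF;
}

/* True when the two ends disagree about the duplex mode. */
static bool duplex_mismatch(struct port_cfg a, struct port_cfg b)
{
    return resolve_duplex(a, b) != resolve_duplex(b, a);
}
```

With a core forced to 100M FD on one side and an autonegotiating PC card on the other, `duplex_mismatch` returns true: the core stays at full duplex while the card settles on half duplex.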
>>
>> You could try testing with an EK1100 in between the PC and the drive.
>>
>> J.
>>
>>
>> 2013/12/3 Raz <raziebe at gmail.com>
>>
>>> I do not have ethtool on the EtherCAT device, since it is removed. How can
>>> I tell? eth0 is 100 Mbps, but it is my public interface. eth1 is my EtherCAT
>>> interface.
>>>
>>> There is always a link. The first slave is a drive, not an I/O device.
>>> This drive is running on Xilinx with a port stack and Beckhoff's IP core.
>>> I am now trying to debug the Realtek driver, let's see...
>>>
>>>
>>>
>>>
>>> On Tue, Dec 3, 2013 at 11:36 AM, Jeroen Van den Keybus <
>>> jeroen.vandenkeybus at gmail.com> wrote:
>>>
>>>> It would be very useful to know whether e.g. the interfaces ended up in
>>>> 100M half duplex or so. Is there a link in those cases ? What's the first
>>>> EtherCAT station ? Maybe it doesn't handle autoneg properly during its
>>>> reset phase ?
>>>>
>>>> J.
>>>>
>>>>
>>>>
>>>> 2013/12/3 Raz <raziebe at gmail.com>
>>>>
>>>>> Hey,
>>>>> The problem happens with the Intel e1000e as well as the Realtek. One way to
>>>>> bypass it is to boot the master while the Ethernet-EtherCAT cable is
>>>>> disconnected and, once the master claims the interface, connect the cable.
>>>>> This appears to work.
>>>>> So there is some sort of initialisation error.
>>>>>
>>>>>
>>>>> On Mon, Dec 2, 2013 at 11:32 AM, Raz <raziebe at gmail.com> wrote:
>>>>>
>>>>>> I still do not have a reproducible scenario; it "sometimes" happens. I did
>>>>>> not know about -DRTL8169_DEBUG, so I will check and see. Thanks.
>>>>>>
>>>>>>
>>>>>> On Mon, Dec 2, 2013 at 11:27 AM, Jeroen Van den Keybus <
>>>>>> jeroen.vandenkeybus at gmail.com> wrote:
>>>>>>
>>>>>>> Is there a difference between cold and warm boot ? Does unloading
>>>>>>> the ec driver, loading/unloading the stock r8169 driver and then reloading
>>>>>>> the ec driver work better ? Same scenario but with Realtek drivers (r8168)
>>>>>>> ? Also perhaps compile with -DRTL8169_DEBUG ?
>>>>>>>
>>>>>>> Just some thoughts.
>>>>>>>
>>>>>>> J.
>>>>>>>
>>>>>>>
>>>>>>> 2013/12/2 Raz <raziebe at gmail.com>
>>>>>>>
>>>>>>>> The timeouts happen after the system boots, not while the slaves
>>>>>>>> are in OP mode. So my transmit path is irrelevant here; in any case, a
>>>>>>>> transmit happens only from a single thread or through an ioctl (SDO reads
>>>>>>>> and so on).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Dec 2, 2013 at 11:01 AM, Jeroen Van den Keybus <
>>>>>>>> jeroen.vandenkeybus at gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 1. Why do you disable the rtl8169_phy_timer timer?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The rtl8169_phy_timer is regularly polled in ec_poll instead.
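Replacing a kernel timer with a check inside a poll routine can be sketched as follows. This is an illustration under assumptions, not the driver's actual code: `polled_timer` and `polled_timer_check` are hypothetical names, the tick counter stands in for something like jiffies, and the interval is arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a timer that is checked from a poll
   routine instead of being armed as a kernel timer. */
struct polled_timer {
    uint32_t last_run;  /* tick at which the handler last ran */
    uint32_t interval;  /* ticks between two handler runs */
    unsigned runs;      /* how many times the handler has run */
};

/* Call on every poll cycle with the current tick count.
   Runs the handler if the interval has elapsed; returns true if it ran. */
static bool polled_timer_check(struct polled_timer *t, uint32_t now)
{
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    if ((uint32_t)(now - t->last_run) < t->interval)
        return false;

    t->last_run = now;
    t->runs++;          /* placeholder for the periodic PHY work */
    return true;
}
```

The handler then runs in the caller's context at a moment the caller chooses, rather than asynchronously from a timer callback.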
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> 2. In rtl_hw_start_8168: why do you disable RTL_W16(IntrMask,
>>>>>>>>>> tp->intr_event);?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> The drivers are all non-blocking and interrupt-free. All work that
>>>>>>>>> interrupt handlers normally do is done in ec_poll instead.
>>>>>>>>>
>>>>>>>>> If you cannot send packets anymore, I suspect that you may have
>>>>>>>>> overrun the tx queue, i.e. sent a packet before the previous one has been
>>>>>>>>> completed. You're also not calling the ethercat transmission functions from
>>>>>>>>> different threads, right ?
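The tx queue overrun mentioned above can be illustrated with a small user-space model. It is a sketch under assumptions: `tx_ring`, `tx_send`, `tx_poll` and `TX_RING_SIZE` are hypothetical, and the `completed` argument simulates the hardware handing descriptors back, which a real driver would read from the descriptors themselves.

```c
#include <stdbool.h>
#include <stddef.h>

#define TX_RING_SIZE 4  /* hypothetical ring depth */

/* Minimal model of a transmit ring that is serviced by polling only. */
struct tx_ring {
    bool   busy[TX_RING_SIZE];  /* descriptor still owned by the hardware */
    size_t head;                /* next descriptor to fill */
    size_t tail;                /* oldest descriptor not yet reclaimed */
};

/* Queue one frame. Returns false when the next descriptor has not been
   reclaimed yet, i.e. the caller is about to overrun the queue. */
static bool tx_send(struct tx_ring *r)
{
    if (r->busy[r->head])
        return false;
    r->busy[r->head] = true;
    r->head = (r->head + 1) % TX_RING_SIZE;
    return true;
}

/* Poll step: reclaim at most `completed` descriptors that the
   (simulated) hardware has finished. Returns the number reclaimed. */
static size_t tx_poll(struct tx_ring *r, size_t completed)
{
    size_t n = 0;

    while (n < completed && r->busy[r->tail]) {
        r->busy[r->tail] = false;
        r->tail = (r->tail + 1) % TX_RING_SIZE;
        n++;
    }
    return n;
}
```

Since no interrupt handler reclaims descriptors in this model, `tx_send` keeps failing once the ring is full until `tx_poll` has run. The model has no locking, so it also assumes a single calling context, which is why sending from several threads would be a problem.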
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> raz
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> https://sites.google.com/site/ironspeedlinux/
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> etherlab-users mailing list
>>>>>>>>>> etherlab-users at etherlab.org
>>>>>>>>>> http://lists.etherlab.org/mailman/listinfo/etherlab-users