Openmoko Bug #2327: wifi sometimes fails to start with "ar6000_activate: Failed to activate -4" when DEBUG_KERNEL/PREEMPT are disabled

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Openmoko Bug #2327: wifi sometimes fails to start with "ar6000_activate: Failed to activate -4" when DEBUG_KERNEL/PREEMPT are disabled

Openmoko Public Trac
#2327: wifi sometimes fails to start with "ar6000_activate: Failed to activate -4"
when DEBUG_KERNEL/PREEMPT are disabled
-----------------------------+----------------------------------------------
 Reporter:  lindi            |          Owner:  openmoko-kernel
     Type:  defect           |         Status:  new            
 Priority:  normal           |      Milestone:                
Component:  System Software  |        Version:                
 Severity:  normal           |       Keywords:                
 Haspatch:  0                |      Blockedby:                
Estimated:                   |    Patchreview:                
 Blocking:                   |   Reproducible:  sometimes      
-----------------------------+----------------------------------------------
 Steps to reproduce:
 1) echo s3c2440-sdi > /sys/bus/platform/drivers/s3c2440-sdi/bind
 2) ip link show

 Expected results:
 2) the wifi interface appears in the list

 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 2: usb: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
 UP qlen 1000
     link/ether 00:1f:11:01:1f:6b brd ff:ff:ff:ff:ff:ff
 33: wlan: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
     link/ether 00:12:cf:8e:ee:09 brd ff:ff:ff:ff:ff:ff

 Actual results:
 2) the wifi interface does not appear in the list

 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 2: usb: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
 UP qlen 1000
     link/ether 00:1f:11:01:1f:6b brd ff:ff:ff:ff:ff:ff

 More info:
 1) This started to happen when I upgraded from andy-tracking
 a3587e4ed77974ad to andy-tracking a15608f241a40b41 and disabled
 CONFIG_DEBUG_KERNEL and CONFIG_PREEMPT.
 2) I have hit the bug now three times and used wifi succesfully at least
 20 times so there is probably some race somewhere that causes it to only
 happen sometimes.
 3) dmesg from working case:

 {{{
 72651.735000] s3c2440-sdi s3c2440-sdi: powered down.
 72660.080000] s3c2440-sdi s3c2440-sdi: host detect has no irq available
 72660.080000] mapped channel 0 to 0
 72660.085000] s3c2440-sdi s3c2440-sdi: powered down.
 72660.090000] s3c2440-sdi s3c2440-sdi: initialisation done.
 72660.100000] s3c2440-sdi s3c2440-sdi: running at 0kHz (requested: 0kHz).
 72660.110000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested:
 195kHz).
 72660.110000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested:
 195kHz).
 72660.125000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested:
 195kHz).
 72660.130000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested:
 195kHz).
 72660.150000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested:
 195kHz).
 72660.160000] mmc1: queuing CIS tuple 0x01 length 3
 72660.175000] mmc1: queuing CIS tuple 0x1a length 5
 72660.180000] mmc1: queuing CIS tuple 0x1b length 8
 72660.185000] s3c2440-sdi s3c2440-sdi: running at 25000kHz (requested:
 25000kHz).
 72660.190000] s3c2440-sdi s3c2440-sdi: running at 25000kHz (requested:
 25000kHz).
 72660.200000] mmc1: queuing CIS tuple 0x80 length 1
 72660.200000] mmc1: queuing CIS tuple 0x81 length 1
 72660.205000] mmc1: queuing CIS tuple 0x82 length 1
 72660.210000] mmc1: new SDIO card at address 0001
 72661.275000] BMI Get Target Info: Exit (ver: 0x20000059 type: 0x1)
 72661.330000] AR6000 Reg Code = 0x40000060
 72661.355000] udev: renamed network interface eth0 to wlan
 }}}

 4) dmesg from non-working case:

 {{{
 72138.450000] s3c2440-sdi s3c2440-sdi: host detect has no irq available
 72138.450000] mapped channel 0 to 0
 72138.450000] s3c2440-sdi s3c2440-sdi: powered down.
 72138.455000] s3c2440-sdi s3c2440-sdi: initialisation done.
 72138.465000] s3c2440-sdi s3c2440-sdi: running at 0kHz (requested: 0kHz).
 72138.475000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested:
 195kHz).
 72138.480000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested:
 195kHz).
 72138.485000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested:
 195kHz).
 72138.510000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested:
 195kHz).
 72138.530000] s3c2440-sdi s3c2440-sdi: running at 196kHz (requested:
 195kHz).
 72138.540000] mmc1: queuing CIS tuple 0x01 length 3
 72138.555000] mmc1: queuing CIS tuple 0x1a length 5
 72138.560000] mmc1: queuing CIS tuple 0x1b length 8
 72138.560000] s3c2440-sdi s3c2440-sdi: running at 25000kHz (requested:
 25000kHz).
 72138.565000] s3c2440-sdi s3c2440-sdi: running at 25000kHz (requested:
 25000kHz).
 72138.570000] mmc1: queuing CIS tuple 0x80 length 1
 72138.575000] mmc1: queuing CIS tuple 0x81 length 1
 72138.580000] mmc1: queuing CIS tuple 0x82 length 1
 72138.585000] mmc1: new SDIO card at address 0001
 72138.600000] sdio_ar6000 mmc1:0001:1: deviceInsertedHandler: -1
 72138.600000] sdio_ar6000 mmc1:0001:1: kthread_stop (ar6000_io): -4
 72138.605000] ar6000_activate: Failed to activate -4
 72138.610000] sdio_ar6000: probe of mmc1:0001:1 failed with error -4
 }}}

--
Ticket URL: <https://docs.openmoko.org/trac/ticket/2327>
docs.openmoko.org <http://docs.openmoko.org/trac/>
openmoko trac
Reply | Threaded
Open this post in threaded view
|

Re: Openmoko Bug #2327: wifi sometimes fails to start with "ar6000_activate: Failed to activate -4" when DEBUG_KERNEL/PREEMPT are disabled

Openmoko Public Trac
#2327: wifi sometimes fails to start with "ar6000_activate: Failed to activate -4"
when DEBUG_KERNEL/PREEMPT are disabled
-----------------------------+----------------------------------------------
 Reporter:  lindi            |          Owner:  openmoko-kernel
     Type:  defect           |         Status:  new            
 Priority:  normal           |      Milestone:                
Component:  System Software  |        Version:                
 Severity:  normal           |       Keywords:                
 Haspatch:  0                |      Blockedby:                
Estimated:                   |    Patchreview:                
 Blocking:                   |   Reproducible:  sometimes      
-----------------------------+----------------------------------------------

Comment(by gena2x):

 I reviewed this problem, first i got as close to cause of problem as
 following:

 The s2c sdi driver is getting syncronous (not from any thread) write
 request from ar6000 driver, but get "error: data timeout" from host sdio.
 Request seem to disable interrupts on device ( __DevDisableInterrupts__
 function). As this request is part of initializations, whole
 initialization fails.

 This happens in process of initialization. enabling any debug messages
 eliminates this error, as they seem add required delay.

 Real case of error is not available on level of as some different errors
 are hidden under 'EILSEQ' Illegal byte sequence. This cause is available
 on lowest level of ar6000  - process_request function which calls sdio
 level functions.

 I saw two ways to fix problem:

 1. for the first request, retry the request if is fails.
 2. add some delay to initalization sequence.

 way (1) look better, in fact, first i added code to process_request to
 retry attempt, and that seem worked, but it turned out that this require
 relatively large patch to find place where to store the state that this
 request is first. Other variant was to retry any write request, which
 fails with same error and print some kernel message to check that we have
 no problems with such requests, and keeping in mind that for some kinds of
 requests return value is not even checked. But is bad solution, as where
 is some other kind of errors where retry is not need.

 way (2) this way was considered as solution at all because of following
 comment in code:
  * - Atheros call deviceInsertedHandler from a thread spawned off the
 probe or
  *   device insertion function. The original explanation for the Atheros
 SDIO
  *   stack said that this is done because a delay is needed to let the
 chip
  *   complete initialization. There is indeed a one second delay in the
 thread.
  *
  *   The Atheros Linux SDIO HIF removes the delay and only retains the
 thread.
  *   Experimentally removing the thread didn't show any conflicts, so
 let's get
  *   rid of it for good.

 So, as we are not willing one second as delay - I added just 10millisecond
 delay, and this works fine. Why 10milliseconds? it seems large enough for
 device (according to tests) and negligible for suspend/resume and normal
 boot. Review and comments are welcome.

 So, I propose simple patch attached to fix problem.  It have advantage
 that it's simple and will not harm anything in any case.

 Until any other solution will be found or review done, i'll include it to
 my kernel build, which is available at:
 http://www.bsdmn.com/openmoko/kernel/optim-stage2/ . This kernel include
 nodebug/nopreempt options.

--
Ticket URL: <https://docs.openmoko.org/trac/ticket/2327#comment:1>
docs.openmoko.org <http://docs.openmoko.org/trac/>
openmoko trac
Reply | Threaded
Open this post in threaded view
|

Re: Openmoko Bug #2327: wifi sometimes fails to start with "ar6000_activate: Failed to activate -4" when DEBUG_KERNEL/PREEMPT are disabled

Openmoko Public Trac
In reply to this post by Openmoko Public Trac
#2327: wifi sometimes fails to start with "ar6000_activate: Failed to activate -4"
when DEBUG_KERNEL/PREEMPT are disabled
-----------------------------+----------------------------------------------
 Reporter:  lindi            |          Owner:  openmoko-kernel
     Type:  defect           |         Status:  new            
 Priority:  normal           |      Milestone:                
Component:  System Software  |        Version:                
 Severity:  normal           |       Keywords:                
 Haspatch:  0                |      Blockedby:                
Estimated:                   |    Patchreview:                
 Blocking:                   |   Reproducible:  sometimes      
-----------------------------+----------------------------------------------

Comment(by lindi):

 I tested delay_patch.diff. After a while I got again to the buggy state
 where wifi is powered on but no interface exists. This time the log does
 not show error messages from kernel but only has dhcp client complaining
 (I did verify later that "ip link" indeed does not show the interface):

 {{{
 kernel: [ 2137.870000] s3c2440-sdi s3c2440-sdi: host detect has no irq
 available
 kernel: [ 2137.875000] mapped channel 0 to 0
 kernel: [ 2137.875000] s3c2440-sdi s3c2440-sdi: powered down.
 kernel: [ 2137.880000] s3c2440-sdi s3c2440-sdi: initialisation done.
 kernel: [ 2137.895000] s3c2440-sdi s3c2440-sdi: running at 0kHz
 (requested: 0kHz).
 kernel: [ 2137.895000] s3c2440-sdi s3c2440-sdi: running at 196kHz
 (requested: 195kHz).
 kernel: [ 2137.905000] s3c2440-sdi s3c2440-sdi: running at 196kHz
 (requested: 195kHz).
 kernel: [ 2137.915000] s3c2440-sdi s3c2440-sdi: running at 196kHz
 (requested: 195kHz).
 kernel: [ 2137.920000] s3c2440-sdi s3c2440-sdi: running at 196kHz
 (requested: 195kHz).
 kernel: [ 2137.945000] s3c2440-sdi s3c2440-sdi: running at 196kHz
 (requested: 195kHz).
 kernel: [ 2137.955000] mmc1: queuing CIS tuple 0x01 length 3
 kernel: [ 2137.975000] mmc1: queuing CIS tuple 0x1a length 5
 kernel: [ 2137.985000] mmc1: queuing CIS tuple 0x1b length 8
 kernel: [ 2137.985000] s3c2440-sdi s3c2440-sdi: running at 25000kHz
 (requested: 25000kHz).
 kernel: [ 2137.990000] s3c2440-sdi s3c2440-sdi: running at 25000kHz
 (requested: 25000kHz).
 kernel: [ 2138.005000] mmc1: queuing CIS tuple 0x80 length 1
 kernel: [ 2138.005000] mmc1: queuing CIS tuple 0x81 length 1
 kernel: [ 2138.010000] mmc1: queuing CIS tuple 0x82 length 1
 kernel: [ 2138.015000] mmc1: new SDIO card at address 0001
 dhclient: Internet Systems Consortium DHCP Client V3.1.3
 dhclient: Copyright 2004-2009 Internet Systems Consortium.
 dhclient: All rights reserved.
 dhclient: For info, please visit https://www.isc.org/software/dhcp/
 dhclient:
 dhclient: Bind socket to interface: No such device
 }}}

 Also I am using andy-tracking a3587e4ed77974ad this time since I'm
 wondering if newer andy-tracking version introduced a TS bug or not.

--
Ticket URL: <https://docs.openmoko.org/trac/ticket/2327#comment:2>
docs.openmoko.org <http://docs.openmoko.org/trac/>
openmoko trac
Reply | Threaded
Open this post in threaded view
|

Re: Openmoko Bug #2327: wifi sometimes fails to start with "ar6000_activate: Failed to activate -4" when DEBUG_KERNEL/PREEMPT are disabled

Openmoko Public Trac
In reply to this post by Openmoko Public Trac
#2327: wifi sometimes fails to start with "ar6000_activate: Failed to activate -4"
when DEBUG_KERNEL/PREEMPT are disabled
-----------------------------+----------------------------------------------
 Reporter:  lindi            |          Owner:  openmoko-kernel
     Type:  defect           |         Status:  new            
 Priority:  normal           |      Milestone:                
Component:  System Software  |        Version:                
 Severity:  normal           |       Keywords:                
 Haspatch:  0                |      Blockedby:                
Estimated:                   |    Patchreview:                
 Blocking:                   |   Reproducible:  sometimes      
-----------------------------+----------------------------------------------

Comment(by lindi):

 gena, can you make the sleep a module option so that people can experiment
 with longer delays and see if it helps?

--
Ticket URL: <https://docs.openmoko.org/trac/ticket/2327#comment:3>
docs.openmoko.org <http://docs.openmoko.org/trac/>
openmoko trac