Status of bug #1024 (periodic signal lost and re-registration)

Dieter Spaar-3
Hello,


Here is an update on the status of bug #1024.

First, some background on why this bug is so hard for us to
solve:


  - we (or better, those who work on the GSM stuff) cannot reproduce
    this bug.


  - OM has only a small part of the GSM firmware as source code.
    Basically it's the AT command interface and some drivers. The rest
    is delivered by TI as binary libraries only, especially the GSM
    protocol stack and Layer 1. So we cannot just look at the
    source code and search for errors.

 
  - To give an impression of the scale involved, here are some C code
    metrics from a comparable GSM firmware:

      Component            Files    Lines      Statements
      GSM protocol stack   700      400,000    127,000
      Layer 1              130      130,000     31,000


  - the actual low-level RF work of decoding the GSM frames is done
    by the DSP in the Calypso (there is an ARM and a DSP core inside).
    The DSP has its code in ROM and OM has no documentation about it.


  - The Calypso chipset has been "end of life" for quite some time;
    there is not much support from TI for it any more.


The above is not meant as an excuse; it is meant to show why it is
rather difficult for us to fix this problem.

What we know so far about #1024:

  - We have some PCO2 traces (PCO2 is an internal TI tool) which show
    that in Idle Mode (the phone is registered to the cell but there is
    no voice or data traffic) the periodic reading of the BCCH (Broadcast
    Control Channel) of the serving cell fails at some point. We don't
    know yet what exactly fails, just that an error flag is set. Once
    this happens, the error no longer goes away; after some timeout it
    almost certainly causes the "signal lost" indication and finally the
    re-registration in the cell.


  - In traces where bug #1024 does not occur, this error flag is set
    only very rarely. And if it is set, it usually goes away within the
    next few readings. The same holds when the "AT%SLEEP" workaround is
    applied: the error flag is almost never set.


  - This periodic reading of the BCCH occurs about every two seconds;
    this is the same with or without #1024 occurring.

 
  - This periodic reading basically works like this: a special
    timer ("special" because it is designed to support the
    GSM frame timing very well) is programmed to wake up the
    chip at the correct time so that the GSM frame of interest
    can be received. Then the chip goes to sleep and waits
    for the timer interrupt. There are two different
    sleep modes, "Big Sleep" and "Deep Sleep".

 
  - #1024 only occurs if "Deep Sleep" mode is active (this is the
    standard behaviour, AT%SLEEP=2 disables it and only "Big Sleep"
    is used). The special thing about "Deep Sleep" mode is that the
    fast oscillator of the Calypso is turned off and it relies on the
    32kHz oscillator only.

 
  - "Deep Sleep" draws less current than "Big Sleep", so disabling
    "Deep Sleep" completely is not a perfect workaround. We have not
    yet measured exactly how much the standby time of the phone is
    affected when "Deep Sleep" is turned off. I assume the effect is
    not negligible.
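
The difference between the two kinds of traces described above can be modelled in a few lines of Python. This is only a sketch: the function name, the list-of-booleans input and the `timeout_readings` threshold are assumptions made for illustration, not values taken from the TI firmware or from PCO2.

```python
from typing import List


def first_signal_lost(error_flags: List[bool], timeout_readings: int = 5) -> int:
    """Return the index of the reading at which "signal lost" would be
    declared, or -1 if the error never persists long enough.

    error_flags[i] is True when the i-th periodic BCCH reading set the
    error flag. timeout_readings is a made-up threshold; the real
    firmware timeout is not known.
    """
    consecutive = 0
    for index, failed in enumerate(error_flags):
        # Count failures in a row; any good reading resets the counter.
        consecutive = consecutive + 1 if failed else 0
        if consecutive >= timeout_readings:
            return index
    return -1


# Healthy trace: the flag is set rarely and clears within a few readings.
healthy = [False, False, True, True, False, False, False, True, False]
# Trace resembling #1024: once set, the flag never clears.
stuck = [False, False, True, True, True, True, True, True]

print(first_signal_lost(healthy))
print(first_signal_lost(stuck))
```

With one reading about every two seconds, the returned index times two gives a rough time in seconds until the hypothetical timeout fires.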

 
There are several open questions:


  - The problem could come from "drifting away" in "Deep Sleep" mode from
    the right point of time to receive the frame. The firmware does some
    adjusting of the 32kHz oscillator, but there are several things which
    could go wrong (software and/or hardware issue).

 
  - We should check the 32kHz oscillator, and especially have a look at
    the 220k resistor R1050. In one of the Calypso docs and in the TI
    reference implementation this resistor is 100k. TI is very picky
    about the 32kHz resonator; they mention quite a lot of things to
    take care of. Is there a reason why we chose 220k?

 
  - Is there a regular pattern to when bug #1024 occurs? For example,
    does it depend on temperature? Or does it depend on the charge
    level of the battery?

 
  - Is there a way to reproduce #1024? Does it only occur with certain
    phones? Or does it depend on the cell where the phone is registered?
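
To get a feeling for the "drifting away" question, here is a small back-of-the-envelope calculation in Python. It is a sketch under stated assumptions: the slow clock is taken to be a nominal 32768 Hz crystal (the post only says "32kHz"), the sleep period is the "about every two seconds" mentioned above, and the ppm values are arbitrary examples, not measurements. The GSM bit period of 48/13 microseconds comes from the GSM timing definitions.

```python
from fractions import Fraction

GSM_BIT_US = Fraction(48, 13)   # GSM bit period in microseconds (~3.69)
SLOW_CLOCK_HZ = 32768           # assumed nominal frequency of the "32kHz" crystal


def tick_us(clock_hz: int = SLOW_CLOCK_HZ) -> Fraction:
    """Duration of one slow-clock tick in microseconds."""
    return Fraction(1_000_000, clock_hz)


def drift_us(sleep_s: float, error_ppm: float) -> float:
    """Timing error in microseconds accumulated over one sleep period
    when the slow clock is off by error_ppm and nothing corrects it."""
    # seconds * (ppm * 1e-6) * (1e6 us per second) simplifies to this:
    return sleep_s * error_ppm


def drift_bits(sleep_s: float, error_ppm: float) -> float:
    """The same timing error expressed in GSM bit periods."""
    return drift_us(sleep_s, error_ppm) / float(GSM_BIT_US)


print("one tick in us:", float(tick_us()))
print("one tick in GSM bits:", float(tick_us() / GSM_BIT_US))
for ppm in (1, 10, 50):  # example error values only
    print(ppm, "ppm:", drift_us(2.0, ppm), "us,",
          round(drift_bits(2.0, ppm), 1), "bits")
```

The point of the numbers is that a single slow-clock tick already spans several GSM bit periods, and an uncorrected error of a few tens of ppm over two seconds amounts to many bit periods, which is why the firmware's adjustment of the oscillator matters.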
 

Please feel free to add your comments and thoughts. We are really trying
to fix this problem, but we need your help: please report as many details
as possible about the circumstances of bug #1024. Thank you very much.

Best regards,
  Dieter

_______________________________________________
hardware mailing list
[hidden email]
http://lists.openmoko.org/mailman/listinfo/hardware

Re: Status of bug #1024 (periodic signal lost and re-registration)

Sargun Dhillon
I understand the difficulty in reproducing #1024. I can reproduce it
very easily (San Jose, CA, USA and Pleasanton, CA, USA; providers:
T-Mobile and AT&T). How much would an RF dump help? I can give
you SSH access to a Freerunner, if you'd like. What can the community do
to help you?

Some possibly cheap ways to gather more data:
- If you have a firmware that can dump the received RF data, we could
deploy it on our Freerunners that are exhibiting the problem and send
you the dumps. This might be very naive, as I don't really understand how
the Calypso chip works. These guys seem to be working on something
similar: http://wiki.thc.org/gsm/opentsm


- Get a USRP(2) and the proper transceiver boards, and we can do the
captures. Obviously there would be a lot of noise, so we'd have to
figure out a way to filter it out. These devices are expensive, so I'm
not volunteering to purchase one, but I would split the cost with a
few other Freerunner owners, and give OM remote access to the USRP and
to a Freerunner in its proximity.

Dieter, I may be babbling, feel free to call me out if I am, but will
any of this help?

On Mon, Dec 22, 2008 at 4:06 AM, Dieter Spaar <[hidden email]> wrote:

> [...]


Re: Status of bug #1024 (periodic signal lost and re-registration)

Dieter Spaar-3
Hello Sargun,

> I understand the difficult in reproducing #1024. I can reproduce it
> very easily (San Jose, CA, USA, Pleasanton, CA, USA: (Providers:
> T-mobile, AT&T)) . How much would an RF dump help? I mean, I can give
> you SSH access to Freerunner, if you'd like. What can the community do
> to help you?
>  

One of the interesting questions is whether there is a "pattern"
to #1024. Do you experience it everywhere or only in certain areas?
And does #1024 occur all the time or only sometimes, maybe depending
on temperature or battery charge level? Do you know other Freerunner
users in your area, and do they also experience #1024?

At the moment I don't think that further traces from the phone would
give us more information. We have to find out why the things we can see
in the traces happen. I don't expect that RF dumps would currently
tell us more either: as it seems right now, the problem probably comes
from the phone and not from faulty or disturbed RF signals from the
base station. This would be different if other phones also lost the
signal, but so far only the OM phones seem to be affected. RF
traces might help at some later point (see below).

The best thing would be a clear description of how to reproduce #1024,
e.g. something like "cool the phone down in the refrigerator and then
you experience #1024" (a very simplified example, of course). Then we
could try to solve the problem (which can of course still be a lot of
effort). At the moment we are somewhat "poking around in the dark". We
could of course try changing something here and there to see whether
it has any influence, but this would require a lot of effort, and not
just on our side: as long as we can't reproduce #1024, we also need
someone to try each change out and report the results.

> For possible cheap ways to gather more data:
> -If you have a firmware that can save out the RF. We could deploy it
> on our Freerunners that are exhibiting the problem, and send up the rf
> save out. This might be very naive, as I don't really understand how
> the Calypso chip works. These guys seem to be working on something
> similar: http://wiki.thc.org/gsm/opentsm
>  

Yes, I know this project very well ;-) The problem is that the software
of this phone is about two years older than the software OM uses.
Additionally, this older version does not yet support the "Deep Sleep"
mode, so it cannot be used as a reference to check what is going on
during "Deep Sleep".

The other problem is the DSP inside the Calypso. It does the basic RF
work, but you won't find much documentation about what this DSP does. It
has its code in ROM; only a few binary-only patches are applied to the
ROM code when the Calypso starts up. The project you reference above
does not have the source code for the DSP either, and those phones
contain an older ROM version.

I will try to find out a bit more about this "error flag" which is set
when #1024 occurs, but the problem here is the missing documentation and
source code, and the DSP with its ROM code. It is not impossible to
find out some more details, but this can take a _lot_ of time. And life
would be easier if we knew how to reproduce #1024 :-)

> -Get a USRP(2), and the proper transceiver boards. We can do the
> captures. Obviously there would be a lot of noise, so we'd have to
> figure out a way to get this out. These devices are expensive, so I'm
> not volunteering to purchase one, but I would split the cost with a
> few other Freerunner owners, and give OM access to the USRP, and a
> Freerunner in its proximity remotely.
>  

I think the current state of making GSM captures with a USRP is not
that perfect. However, a lot of work is going on and I expect some
progress from the upcoming CCC congress in Berlin, where several people
involved in GSM projects will meet
(http://events.ccc.de/congress/2008/wiki/GSM). After the congress I hope
to have a better understanding of whether a USRP could help to find out
more about #1024. Then we can decide if it makes sense to capture GSM
traffic when bug #1024 occurs and _maybe_ even play it back to reproduce
#1024 (I am not sure if I am too optimistic here, or if the OpenBTS
project could help, but let's wait until the congress is over). And of
course recording and playing back only makes sense if we know that it's
the RF signal which causes #1024.

> Dieter, I may be babbling, feel free to call me out if I am, but will
> any of this help?
>  

Your ideas and thoughts are very welcome. The USRP is probably something
we should look at more closely, but right now I think the software side
of GSM capturing is not yet that perfect. And currently we don't know
for sure whether it is really some parameter of the base station RF
signal which triggers #1024. It could be the combination of the signal
from the serving cell's base station and those of its neighbor cells;
then it will be _really_ difficult and probably nearly impossible to
capture the RF signals and play them back, even with more than one USRP.

Best regards,
  Dieter


Re: Status of bug #1024 (periodic signal lost and re-registration)

Sebastian Reichel
In reply to this post by Dieter Spaar-3
Hi,

Since I saw a lot of "Registered to:" messages in zhone recently, I had
a look in .xsession-errors and saw quite a lot of "network status
changed" messages produced by zhone. The distribution I am running is
current Debian.

So I wrote a small fso script [1] which logs the connection status. Here
is the log for the last 3 hours; my Freerunner was on my
desk the whole time:

[18:22:53] code: 26201 cid: DEA8 lac: 430D strength: 79
[18:37:23] code: 26201 cid: 9DF7 lac: 430D strength: 81
[18:45:34] code: 26201 cid: DEA8 lac: 430D strength: 81
[18:53:54] unregistered!
[18:53:57] code: 26201 cid: DEA8 lac: 430D
[18:54:25] unregistered!
[18:54:27] code: 26201 cid: DEA8 lac: 430D
[18:54:56] unregistered!
[18:54:59] code: 26201 cid: DEA8 lac: 430D
[18:55:30] unregistered!
[18:55:33] code: 26201 cid: DEA8 lac: 430D
[18:56:00] unregistered!
[18:56:02] code: 26201 cid: DEA8 lac: 430D
[18:56:31] unregistered!
[18:56:33] code: 26201 cid: DEA8 lac: 430D
[19:43:13] code: 26201 cid: 9DF7 lac: 430D strength: 81
[19:45:10] code: 26201 cid: DEA8 lac: 430D strength: 81
[19:57:18] unregistered!
[19:57:20] code: 26201 cid: DEA8 lac: 430D
[19:57:54] unregistered!
[19:57:56] code: 26201 cid: DEA8 lac: 430D
[19:58:26] unregistered!
[19:58:28] code: 26201 cid: DEA8 lac: 430D
[19:58:59] unregistered!
[19:59:01] code: 26201 cid: DEA8 lac: 430D
[19:59:37] unregistered!
[19:59:40] code: 26201 cid: DEA8 lac: 430D
[20:00:11] unregistered!
[20:00:13] code: 26201 cid: DEA8 lac: 430D
[20:15:18] code: 26201 cid: 3385 lac: 430D strength: 77
[20:27:04] code: 26201 cid: DEA8 lac: 430D strength: 77
[21:00:39] unregistered!
[21:00:41] code: 26201 cid: DEA8 lac: 430D
[21:01:07] unregistered!
[21:01:09] code: 26201 cid: DEA8 lac: 430D
[21:01:34] unregistered!
[21:01:36] code: 26201 cid: DEA8 lac: 430D
[21:02:10] unregistered!
[21:02:12] code: 26201 cid: DEA8 lac: 430D
[21:02:39] unregistered!
[21:02:40] code: 26201 cid: DEA8 lac: 430D
[21:03:43] code: 26201 cid: 9DF7 lac: 430D strength: 83
[21:04:09] code: 26201 cid: DEA8 lac: 430D strength: 83
[21:04:34] unregistered!
[21:04:37] code: 26201 cid: DEA8 lac: 430D

As you can see the switching from one network to another happens
without disconnecting. Apart from that there are approximately six
disconnects in a very short timespan each hour.
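
For anyone who wants to compare their own logs, here is a small Python sketch that groups the "unregistered!" events of a log in the format shown above into bursts and prints the gaps between them. It is not the gsm-status.py script from [1]; the function names and the 120-second burst gap are assumptions chosen for illustration, and timestamps that wrap around midnight are not handled.

```python
import re
from datetime import datetime, timedelta
from typing import List

# Matches lines such as "[18:53:54] unregistered!"
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\] (.*)$")


def unregistered_times(log_text: str) -> List[datetime]:
    """Timestamps of all "unregistered!" lines in the log."""
    times = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group(2) == "unregistered!":
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    return times


def group_bursts(times: List[datetime], max_gap_s: int = 120) -> List[List[datetime]]:
    """Group events into bursts; a gap above max_gap_s (an arbitrary
    choice) starts a new burst."""
    bursts: List[List[datetime]] = []
    for t in times:
        if bursts and (t - bursts[-1][-1]) <= timedelta(seconds=max_gap_s):
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


# A few lines copied from the log above.
SAMPLE = """\
[18:45:34] code: 26201 cid: DEA8 lac: 430D strength: 81
[18:53:54] unregistered!
[18:53:57] code: 26201 cid: DEA8 lac: 430D
[18:54:25] unregistered!
[18:54:27] code: 26201 cid: DEA8 lac: 430D
[18:54:56] unregistered!
[18:54:59] code: 26201 cid: DEA8 lac: 430D
[19:57:18] unregistered!
[19:57:20] code: 26201 cid: DEA8 lac: 430D
"""

for burst in group_bursts(unregistered_times(SAMPLE)):
    gaps = [int((b - a).total_seconds()) for a, b in zip(burst, burst[1:])]
    print(burst[0].strftime("%H:%M:%S"), "events:", len(burst), "gaps (s):", gaps)
```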

I will bring this device with me to 25C3, so you can have a look at
it there. Assuming that it happens in Berlin, too ;)

P.S.: Can somebody @25C3 install the Buzz-Fix [2] on my Freerunner?

[1] http://www.informatik.uni-oldenburg.de/~sebi/debian/gsm-status.py
[2] http://people.openmoko.org/joerg/GSM_EMI_noise/big-C_rework_SOP__DRAFT3__.pdf

-- Sebastian Reichel


Re: Status of bug #1024 (periodic signal lost and re-registration)

Dieter Spaar-3
Hello Sebastian,

> As you can see the switching from one network to another happens
> without disconnecting. Apart from that there are approximately six
> disconnects in a very short timespan each hour.
>  

Thanks for the log. It seems that in all the cases of #1024 I have
seen so far, there is always more than one cell of similar strength
involved, so that switching between the cells could occur. Of course
just switching between two cells is no problem in itself; however,
losing the signal and then re-registering is a problem.

> I will bring this device with me to 25C3, so you can have a look at
> it there. Assuming, that it happens in Berlin, too ;)
>  

Will be interesting to see if we find a similar situation in Berlin.

> P.S.: Can somebody @25C3 install the Buzz-Fix [2] on my Freerunner?
>  

You might ask Joerg, he is the specialist :-)

Best regards,
  Dieter


Re: Status of bug #1024 (periodic signal lost and re-registration)

Flemming Richter Mikkelsen
I don't know if this is related, but it is interesting:

I use GPRS and can stay connected for days without
problems unless I download fast.

I tried to wget a large ogg-file [1] and the GSM modem
dropped the GPRS connection (reset?) after downloading
121k.

I rebooted and tried again.
I did this several times, and the same thing happened each
time: I got exactly the same file size before the reset.

Then I limited the speed (wget --limit-rate=1k) and I could
now download the complete file without any problems.

[1]: wget ftp://ftp.ccc.de/congress/25c3/audio_only/25c3-2997-en-locating_mobile_phones_using_ss7.ogg


Re: Status of bug #1024 (periodic signal lost and re-registration)

Dima Kogan
In reply to this post by Dieter Spaar-3
My phone readily exhibits #1024, so I just spent a bit of time looking
into the stability of the 32kHz oscillator at R1050. I scoped across
C1050, and looked for instabilities in the clock waveform as the phone
repeatedly lost and regained registration. I found none. The waveform
seemed stable at 32.786kHz. I looked at the waveform on a very slow
timescale, looking for dropouts, and on a faster time scale with a long
trigger delay, looking for phase drift. I didn't find any instability
with either method. It's still possible that a very quick instability
got by the scope, and I missed it, but if it's there, it's subtle. I
did not replace the resistor, but I can if you think it would be
useful.

While playing with this, I discovered that issuing AT%SLEEP=2 does not
completely eliminate recamping for me. If the registration was stable,
and I issued AT%SLEEP=2, no recamping would occur. However, if the
phone was very recently recamping, recamping would continue
even after I issued AT%SLEEP=2. In the new setting it would still
recamp a few times, and then settle and become stable. It feels like in
sleep=4, the phone goes in and out of an unstable state, and that
issuing AT%SLEEP=2 doesn't kick the phone out of this unstable state,
but rather doesn't let it ENTER the unstable state.

Another thing I noticed is that when I issue AT%SLEEP=4, the calypso
responds with "EXT: I" and "OK". If registration is stable and I issue
AT%SLEEP=2, I also get "EXT: I" and "OK", and most of the time no
recamping happens (but not always). If recamping was recently
happening, AT%SLEEP=2 only says OK, no "EXT: I". If I issue the command
a second time, I get "EXT: I" also. In this situation, the calypso
seems to always recamp at least once. This is all based on one night's
worth of experimenting, so it's not conclusive, obviously. It may still
be interesting to understand what EXT: I is. It probably doesn't mean
we're currently not in Deep Sleep because that would imply that we're
often NOT in Deep Sleep when the phone is sitting on my desk, doing
nothing (if we're in sleep=4, but registration is stable). Is there a
way to tell if we're CURRENTLY in Deep Sleep? The mickeyterm logs from
my sessions that demonstrate the described behaviour are at

http://secretsauce.net:5050/recamping.log
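
To make observations like these easier to compare across phones, a log of modem replies could be summarized with something like the following Python sketch. It only looks at text that has already been captured; the function name and the returned keys are made up for illustration, and the check for "EXT: I" is a plain substring match on the strings quoted above, not a documented Calypso response format.

```python
from typing import Dict, Iterable


def summarize_response(lines: Iterable[str]) -> Dict[str, bool]:
    """Report whether a captured modem reply contained "EXT: I" and "OK".

    lines is the reply to one AT%SLEEP=<n> command, one string per line.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    return {
        # Substring match, because the exact prefix of the line is an assumption.
        "ext_i": any("EXT: I" in line for line in cleaned),
        "ok": any(line == "OK" for line in cleaned),
    }


# Reply as described for a stable registration.
print(summarize_response(["EXT: I", "OK"]))
# Reply as described shortly after recamping.
print(summarize_response(["OK"]))
```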

I can easily reproduce #1024 most of the time, so I can do experiments
if needed.

Dima

On Mon, 22 Dec 2008 13:06:27 +0100
Dieter Spaar <[hidden email]> wrote:

> [...]


Re: Status of bug #1024 (periodic signal lost and re-registration)

Sargun Dhillon
Oh, yeah, I may have forgotten to send this to the list, but this post
reminds me. I have also experienced #1024 when deep_sleep = never. It
seems that it auto-recovers pretty quickly though. It lasts for
2-3 minutes, recurring every 15-30 seconds. After a few minutes,
the modem goes back to normal like nothing ever happened.


On Wed, Jan 7, 2009 at 6:05 AM, Dima Kogan <[hidden email]> wrote:

> [...]
>
> On Mon, 22 Dec 2008 13:06:27 +0100
> Dieter Spaar <[hidden email]> wrote:
>
>> Hello,
>>
>>
>> here is an update about the status of bug #1024.
>>
>> First some background information why it is so hard for us to
>> solve this bug:
>>
>>
>>   - we (or better, those who work on the GSM stuff) cannot reproduce
>>     this bug.
>>
>>
>>   - OM only has a small part of the GSM firmware as source code.
>>     Basically it's the AT command interface and some drivers. The rest
>>     is delivered by TI as binary libraries only, especially the GSM
>>     protocol stack and Layer 1. So we cannot just have a look at the
>>     source code and search for errors.
>>
>>
>>   - To get an impression of what we talk about, here are some C metric
>>     numbers from a comparable GSM firmware:
>>
>>      GSM Protocol stack:  700 files  400.000 lines   127.000 statements
>>      Layer 1:             130 files  130.000 lines    31.000 statements
>>
>>
>>   - the actual low-level RF work of decoding the GSM frames is done
>>     by the DSP in the Calypso (there is an ARM and a DSP core inside).
>>     The DSP has its code in ROM and OM has no documentation about it.
>>
>>
>>   - The Calypso chipset has been "end of life" for quite some time,
>>     there is not much support from TI for it any more.
>>
>>
>> The above should be no excuse, it should show why it is rather
>> difficult for us to fix this problem.
>>
>> What we know so far about #1024:
>>
>>   - We have some PCO2 traces (PCO2 is an internal TI tool) which show
>>     that in Idle Mode (the phone is registered to the cell but there is
>>     no voice or data traffic) the periodic reading of the BCCH (Broadcast
>>     Control Channel) of the serving cell at some point fails. We don't
>>     know yet what exactly fails, just that an error flag is set. When this
>>     happens, the error no longer goes away and most certainly, after
>>     some timeout, causes the "signal lost" indication and finally the
>>     re-registration in the cell.
>>
>>
>>   - In traces where bug #1024 does not occur, this error flag is only
>>     set very rarely. And if it is set, it usually goes away within the
>>     next few readings. The same holds when the "AT%SLEEP" workaround is
>>     applied: the error flag is almost never set.
>>
>>
>>   - This periodic reading of the BCCH occurs about every two seconds,
>>     there is no difference with or without #1024 occurring.
>>
>>
>>   - This periodic reading basically works like this: A special
>>     timer ("special" because it is designed to support the
>>     GSM frame timing very well) is programmed to wake up the
>>     chip at the correct time so that the GSM frame of interest
>>     can be received. Then the chip starts to sleep and waits
>>     for the interrupt of the timer. There are two different
>>     sleep modes, "Big Sleep" and "Deep Sleep".
>>
>>
>>   - #1024 only occurs if "Deep Sleep" mode is active (this is the
>>     standard behaviour, AT%SLEEP=2 disables it and only "Big Sleep"
>>     is used). The special thing about "Deep Sleep" mode is that the
>>     fast oscillator of the Calypso is turned off and it relies on the
>>     32kHz oscillator only.
>>
>>
>>   - "Big Sleep" draws more current than "Deep Sleep", so it's not a
>>     perfect workaround to disable "Deep Sleep" completely. We have not
>>     yet measured how much the standby time of the phone is affected
>>     if "Deep Sleep" is turned off. I assume that it has an influence
>>     which should not be neglected.
>>
>>
>> There are several open questions:
>>
>>
>>   - The problem could come from "drifting away" in "Deep Sleep" mode
>>     from the right point of time to receive the frame. The firmware does
>>     some adjusting of the 32kHz oscillator, but there are several things
>>     which could go wrong (software and/or hardware issue).
>>
>>
>>   - We should check the 32kHz oscillator, especially have a look at
>>     the 220k resistor R1050. In one of the Calypso docs and in the TI
>>     reference implementation this resistor is 100k. TI is very picky
>>     about the 32kHz resonator; they mention quite a lot of things
>>     to take care of. Is there a reason why we chose 220k ?
>>
>>
>>   - Is there a regular pattern when bug #1024 occurs ? For example
>>     does it depend on temperature ? Or does it depend on the charging
>>     level of the battery ?
>>
>>
>>   - Is there a way to reproduce #1024 ? Does it only occur with
>>     certain phones ? Or does it depend on the cell where the phone is
>>     registered ?
>>
>> Please feel free to add your comments and thoughts, we are really
>> trying to fix this problem but we need your help by reporting as many
>> details as possible about the circumstances for bug #1024. Thank you
>> very much.
>>
>> Best regards,
>>   Dieter

_______________________________________________
hardware mailing list
[hidden email]
http://lists.openmoko.org/mailman/listinfo/hardware

Re: Status of bug #1024 (periodic signal lost and re-registration)

Dieter Spaar-3
In reply to this post by Dima Kogan
Hello Dima,

First of all, thank you very much for spending the time looking at the
problem and reporting the results, I really appreciate it.

> My phone readily exhibits #1024, so I just spent a bit of time looking
> into the stability of the 32KHz oscillator at R1050. I scoped across
> C1050, and looked for instabilities in the clock waveform as the phone
> repeatedly lost and regained registration. I found none. The waveform
> seemed stable at 32.786KHz. I looked at the waveform on a very slow
> timescale, looking for dropouts, and on a faster time scale with a long
> trigger delay, looking for phase drift. I didn't find any instability
> with either method. It's still possible that a very quick instability
> got by the scope, and I missed it, but if it's there, it's subtle. I
> did not replace the resistor, but I can if you think it would be
> useful.
>  

If it's not too much effort for you, it would be great if you could
replace the resistor with a 100k one and see if this influences #1024.
This way we would know for sure whether this resistor is causing the
problems.
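
To get a feeling for why even a small error of the 32kHz clock could
matter here, a back-of-the-envelope calculation may help. This is only a
sketch under assumptions: the ppm values are made up for illustration, the
firmware's own adjustment of the oscillator is ignored, and the function
names are hypothetical. The two-second interval is the BCCH reading period
mentioned in the status update, and 48/13 us is the GSM bit period.

```python
# Rough estimate of the wake-up timing error while only the 32kHz clock runs.
# All ppm values used with these helpers are illustrative assumptions.

NOMINAL_HZ = 32_768.0        # nominal frequency of a 32kHz watch crystal
GSM_BIT_US = 48.0 / 13.0     # GSM bit period, about 3.69 microseconds


def ppm_from_measured(measured_hz: float) -> float:
    """Relative frequency error of a measured clock in parts per million."""
    return (measured_hz - NOMINAL_HZ) / NOMINAL_HZ * 1e6


def drift_us(sleep_s: float, error_ppm: float) -> float:
    """Uncorrected timing error in microseconds after sleep_s seconds.

    1 ppm over 1 second is exactly 1 microsecond.
    """
    return sleep_s * error_ppm


def drift_bits(sleep_s: float, error_ppm: float) -> float:
    """The same timing error expressed in GSM bit periods."""
    return drift_us(sleep_s, error_ppm) / GSM_BIT_US


if __name__ == "__main__":
    for ppm in (1, 5, 20, 100):
        print(f"{ppm:>4} ppm over 2 s: {drift_us(2.0, ppm):6.1f} us, "
              f"{drift_bits(2.0, ppm):5.1f} GSM bit periods")
```

With these assumptions, 20 ppm of uncorrected error over 2 s is already
40 us, or roughly 11 bit periods, while 1 ppm at 32.768kHz corresponds to
only about 0.033 Hz, which is probably below what a scope readout resolves.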

> While playing with this, I discovered that issuing AT%SLEEP=2 does not
> completely eliminate recamping for me. If the registration was stable,
> and I issued AT%SLEEP=2, no recamping would occur. However, if the
> phone was very recently recamping, recamping would continue
> even after I issued AT%SLEEP=2. In the new setting it would still
> recamp a few times, and then settle and become stable. It feels like in
> sleep=4, the phone goes in and out of an unstable state, and that
> issuing AT%SLEEP=2 doesn't kick the phone out of this unstable state,
> but rather doesn't let it ENTER the unstable state.
>  

This is an interesting observation; I have not seen it yet in the traces
I have.

> Another thing I noticed is that when I issue AT%SLEEP=4, the calypso
> responds with "EXT: I" and "OK". If registration is stable and I issue
> AT%SLEEP=2, I also get "EXT: I" and "OK", and most of the time no
> recamping happens (but not always). If recamping was recently
> happening, AT%SLEEP=2 only says OK, no "EXT: I". If I issue the command
> a second time, I get "EXT: I" also. In this situation, the calypso
> seems to always recamp at least once. This is all based on one night's
> worth of experimenting, so it's not conclusive, obviously. It may still
> be interesting to understand what EXT: I is. It probably doesn't mean
> we're currently not in Deep Sleep because that would imply that we're
> often NOT in Deep Sleep when the phone is sitting on my desk, doing
> nothing (if we're in sleep=4, but registration is stable). Is there a
> way to tell if we're CURRENTLY in Deep Sleep?

"EXT: I" is just an indication that some extended AT commands are
executed. There is currently no way to find out if "Deep Sleep" is on.
The only thing which gives some sort of indication is that the first
typed character in the terminal gets lost because it is needed to
wake up the chip.

> The mickeyterm logs from
> my sessions that demonstrate the described behaviour are at
>
> http://secretsauce.net:5050/recamping.log
>
> I can easily reproduce 1024 most of the time, so I can do experiments
> if needed.
>
> Dima
>
>  

Best regards,
  Dieter


Re: Status of bug #1024 (periodic signal lost and re-registration)

Dieter Spaar-3
In reply to this post by Sargun Dhillon
Hello Sargun,

> Oh, yeah, I may have forgotten to send this to list, but this post
> reminds me. I have also experienced #1024 when deep_sleep = never. It
> seems that it auto-recovers pretty quickly though. It also happens for
> 2-3 minutes, (at a cycle between 15-30 seconds). After a few minutes,
> the modem goes back to normal like nothing ever happened.
>
>  

Thank you very much for this information.
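
To make reports like this one easier to compare, the times of the
recamping events could be summarized into episodes. The following is only
a sketch: it assumes you already have the event times in seconds (for
example extracted by hand from a log), the 60 second gap is an arbitrary
choice, and `group_episodes` is a hypothetical helper, not part of any
existing tool.

```python
from typing import List, Tuple


def group_episodes(times_s: List[float],
                   max_gap_s: float = 60.0) -> List[Tuple[float, float, int]]:
    """Group recamp timestamps (seconds) into episodes.

    A new episode starts whenever the gap to the previous event is larger
    than max_gap_s. Each episode is returned as (start, end, event_count).
    """
    episodes: List[Tuple[float, float, int]] = []
    for t in sorted(times_s):
        if episodes and t - episodes[-1][1] <= max_gap_s:
            start, _, count = episodes[-1]
            episodes[-1] = (start, t, count + 1)   # extend current episode
        else:
            episodes.append((t, t, 1))             # open a new episode
    return episodes


if __name__ == "__main__":
    # Made-up example: events every 20-25 s for two minutes, then a long pause.
    events = [0, 20, 45, 70, 95, 120, 3600, 3625]
    for start, end, count in group_episodes(events):
        print(f"episode from {start} s to {end} s: {count} recamps")
```

Reporting the episode length and the number of recamps per episode,
together with the sleep setting, would make it easier to see whether there
is a regular pattern.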

Just a question to the people experiencing bug #1024: is there anyone
with a GTA01 who can reproduce #1024? "Deep Sleep" was introduced with
"Moko3", so it would be interesting to know if a GTA01 with "Moko3" or
newer also shows #1024.

Best regards,
  Dieter


Re: Status of bug #1024 (periodic signal lost and re-registration)

Werner Almesberger
Dieter Spaar wrote:
> Just a question to the people experiencing bug #1024: Anyone with a GTA01
> who can reproduce #1024 ? "Deep Sleep" was introduced with "Moko3",
> so it would be interesting to know if a GTA01 with "Moko3" or newer also
> shows #1024.

Dieter, maybe you want to repost this on "devel" or even "community".
It seems that not a lot of people are reading "hardware". (Just checked
the number of subscriptions: hardware 670, kernel 692, support 1016,
devel 1262, community 2323, announce 11258.)

- Werner


Re: Status of bug #1024 (periodic signal lost and re-registration)

Michael 'Mickey' Lauer
In reply to this post by Dieter Spaar-3
On Wednesday, 07.01.2009, at 16:45 +0100, Dieter Spaar wrote:
> > If recamping was recently
> > happening, AT%SLEEP=2 only says OK, no "EXT: I".

You're misinterpreting this. If you previously issued %SLEEP=4, then
-- depending on whether the modem is in deep sleep or not -- your
%SLEEP=2 is treated as a 'wakeup', to which the modem answers with a
simple 'OK'.

> "EXT: I" is just an indication that some extended AT commands are
> executed. There is currently no way to find out if "Deep Sleep" is on.
> The only thing which gives some sort of indication is that the first
> typed character in the terminal got lost because it is needed to
> wakeup the chip.

Actually the whole transmission, not just the first character.
mickeyterm is in line mode by default, not character mode. (That's what I
need for this nifty readline support :))
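
The wake-up behaviour described above could be handled in a script roughly
like this. It is a minimal sketch, assuming that a reply without "EXT: I"
means the transmission was only used to wake the modem and that repeating
the command is harmless. `send_with_wakeup` and `FakeModem` are
hypothetical names; the fake modem only mimics the pattern reported in
this thread and is not real Calypso behaviour.

```python
from typing import Callable, List


def send_with_wakeup(send: Callable[[str], List[str]],
                     command: str,
                     marker: str = "EXT: I",
                     max_tries: int = 2) -> List[str]:
    """Send an AT command line and resend it if the reply lacks `marker`.

    `send` transmits one line and returns the reply lines. A reply without
    the marker is taken to mean that the line only woke the modem up.
    """
    reply: List[str] = []
    for _ in range(max_tries):
        reply = send(command)
        if any(marker in line for line in reply):
            break                      # command was actually processed
    return reply


class FakeModem:
    """Stand-in for testing: swallows one whole line while 'asleep'."""

    def __init__(self, asleep: bool) -> None:
        self.asleep = asleep
        self.received: List[str] = []

    def __call__(self, line: str) -> List[str]:
        self.received.append(line)
        if self.asleep:
            self.asleep = False        # the line is consumed as a wake-up
            return ["OK"]
        return ["EXT: I", "OK"]


if __name__ == "__main__":
    modem = FakeModem(asleep=True)
    print(send_with_wakeup(modem, "AT%SLEEP=2"), len(modem.received))
```

In a real setup `send` would wrap whatever serial access you use; that
part is left out on purpose.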

:M:




Re: Status of bug #1024 (periodic signal lost and re-registration)

Michael 'Mickey' Lauer
In reply to this post by Dieter Spaar-3
Unfortunately I no longer have a GTA01 with such an old firmware (and
I'm not really keen to downgrade either), but memory says that our 01
prototypes (GSM firmware moko0) didn't suffer from 1024.

:M:



Re: Status of bug #1024 (periodic signal lost and re-registration)

Werner Almesberger
Michael 'Mickey' Lauer wrote:
> Unfortunately I no longer have a GTA01 with such an old firmware (and
> I'm not really keen to downgrade either),

moko3 appears to be the version where the "deep sleep" was
first introduced. So any newer GSM firmware should have it too.

> but memory says that our 01
> prototypes (GSM firmware moko0) didn't suffer from 1024.

moko0 wouldn't have the deep sleep, so such an ancient GTA01 not
suffering from the #1024 problem is consistent with our findings so
far.

By the way, if you give it a try and in case you find that #1024
does not happen on a GTA01, please also mention which hardware
revision it is.

- Werner
