Hi, list.
Here is my next rant about FreeRunner slowness :) Today's topic is the FreeRunner's most famous beast: the glamo. I want to show that it is much better than some people think. I also propose several significant optimizations: using DMA and changing the CPU<->glamo bus timings. You will also find an analysis of high-resolution software video decoding.

1. --->>> Glamo bus speed

The most famous problem with our beast is the legendary, mythical slowness of the CPU<->glamo bus. Let's measure the actual bus speed and fix it.

1.1. --->>> Current situation

The old statement is that glamo should have a 7Mb/s transfer speed. It's not easy to find out how this number was calculated. I hope someone who did glamo speed measurements can comment on this mail if I am wrong somewhere.

1.2. --->>> Theory

The glamo bus speed is actually limited by the settings of the memory controller in our CPU. It can be calculated with the formula (HCLK/TWORD)*WORDSIZE, where HCLK is the memory bus frequency, 100MHz under normal conditions (without overclocking); WORDSIZE is 2 bytes; and TWORD is the wait period, which by default is TWORD=4+4+4 bus clocks. So, with the default settings, the CPU<->glamo bus speed is (100*10^6/(4+4+4))*2/1024^2 Mb/s = 15.8Mb/s. This speed may or may not also be influenced by the nWAIT state of the CPU; we'll measure the actual practical speed in section 1.3.

Together with Thomas White, I reviewed the memory controller settings in the s3c and found that the 4+4+4 setting does not seem reasonable. According to Thomas's analysis of the timings in the glamo documentation, it should be 2+4+2, which is 33% fewer clocks than the default and gives us (100*10^6/(2+4+2))*2/1024^2 = 23.8Mb/s. As you can see, both numbers are much higher than 7Mb/s.

1.3. --->>> Synthetic bus speed measurements

I used a simple tool to measure the actual CPU<->glamo bus transfer speed. The tool opens the framebuffer device and runs memcpy or memset over a whole video frame, so its precision is +-1 frame.
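A small Python sketch of the arithmetic behind sections 1.2 and 1.3 (not the measurement tool itself): the theoretical ceiling (HCLK/TWORD)*WORDSIZE, and the conversion of a count of full 640x480 frames per second at 2 bytes per pixel into a transfer speed. As in the post, "Mb/s" means 1024^2 bytes per second; the function names are illustrative only.

```python
MIB = 1024 ** 2  # the post divides by 1024^2, so "Mb/s" here means MiB/s


def theoretical_bus_speed(hclk_hz: float, tword_clocks: int, word_bytes: int = 2) -> float:
    """(HCLK / TWORD) * WORDSIZE, expressed in MiB/s."""
    return hclk_hz / tword_clocks * word_bytes / MIB


def frames_to_speed(frames_per_second: float, width: int = 640, height: int = 480,
                    bytes_per_pixel: int = 2) -> float:
    """Throughput implied by writing that many full frames per second, in MiB/s."""
    return width * height * bytes_per_pixel * frames_per_second / MIB


if __name__ == "__main__":
    for label, tword in (("4+4+4", 4 + 4 + 4), ("2+4+2", 2 + 4 + 2)):
        print(f"{label}: {theoretical_bus_speed(100e6, tword):.2f} MiB/s")
    # Bytes moved per full 640x480 frame at 2 bytes per pixel
    print(f"one frame: {640 * 480 * 2} bytes")
```

It prints 15.89 and 23.84 MiB/s for the two settings; the post truncates the first to 15.8.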
I measured how many frames each method (memcpy or memset) can write in 1 second for each s3c memory controller setting (the default 4+4+4 or the better 2+4+2), and used the formula (640*480*2)*nr_frame to get the transfer speed. Of course, memcpy is always slower than memset, as the CPU first needs to fetch a 4-byte burst of data from main memory and only then send it to glamo, but memcpy is the most common operation with glamo (in both video and MMC transfers). After the actual measurements, I got the following table:

          theory (see 1.2)   memset     memcpy
  4+4+4   15.8Mb/s           12Mb/s     10.5Mb/s
  2+4+2   23.8Mb/s           17.5Mb/s   14.0Mb/s

As you can see, both the default and the new settings are very far from 7Mb/s. You can also see that glamo can really do 14Mb/s, which is 22 full-screen 640*480 frames per second. One can also notice that changing the settings increases memcpy throughput by 33%.

1.4. --->>> Profiling a real application: mpeg2 video decoding with mplayer

To check the effect of the timing change, I did thorough profiling of different mplayer versions decoding a 480x640 mpeg2 video at 5fps. This frame rate was used to avoid any overruns or frame drops. I tested the default (aka slow) and fast timing settings with -vo x11 and -vo fbdev. Results with different vo's are not directly comparable, because a different libc happened to be used, so only (slow/x11 vs fast/x11) and (slow/fbdev vs fast/fbdev) are comparable.

The profiling setup is:

opcontrol --start; mplayer ... ; opcontrol --stop

First, I recorded the CPU usage values reported by mplayer:

        fbdev    x11
  slow  35/31    40/35
  fast  31/22    42/26

Here is the comparison of slow/fast fbdev:
www.bsdmn.com/openmoko/glamo/oprofile/mplayerfbdev.png

And the comparison of slow/fast x11:
www.bsdmn.com/openmoko/glamo/oprofile/mplayerx11.png

As you can see, the CPU usage of the libc memcpy call decreased from 25.2% to 17.6% for fbdev (I think most of these memcpys are copies to glamo). In the x11 version, Xorg's memcpy usage decreased from 19.1% to 13.2%.
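As a rough, unofficial cross-check, the drop in memcpy's share of samples can be set against the 33% reduction in bus clocks per access (4+4+4 to 2+4+2). This Python sketch only does that arithmetic with the numbers reported above; since oprofile reports shares of samples rather than absolute time, and memcpy also spends time reading main memory, the figures are not expected to match exactly. The helper name is illustrative.

```python
def relative_drop(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after` (0.25 means 25% lower)."""
    return (before - after) / before


# Bus clocks per access: default 4+4+4 versus the proposed 2+4+2.
expected = relative_drop(4 + 4 + 4, 2 + 4 + 2)

# memcpy share of oprofile samples, slow -> fast timings, as reported in 1.4.
observed = {
    "fbdev (libc memcpy)": relative_drop(25.2, 17.6),
    "x11 (Xorg memcpy)": relative_drop(19.1, 13.2),
}

if __name__ == "__main__":
    print(f"fewer bus clocks per access: {expected:.1%}")
    for name, drop in observed.items():
        print(f"{name}: memcpy share down {drop:.1%}")
```

It prints 33.3% for the clock reduction and 30.2% and 30.9% for the two memcpy shares.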
In both cases, the glibc memcpy function is the most time-consuming operation. Generally speaking about software video decoding, the next biggest CPU consumer is the yuv2rgb function. I checked the mplayer code and found that this function does not seem to be optimized specifically for ARM, so it might be possible to do something about it. Overall, the CPU usage in 5fps mpeg2 decoding breaks down as follows: 17% memcpy (probably to glamo), 9% yuv2rgb, 3% mpeg_decode_slice, 41% idle, 11% oprofile, 1.6% I/O, plus others, where 100% is all CPU.

You may find the full profile results at:
www.bsdmn.com/openmoko/glamo/oprofile/profiles

1.5. --->>> Open questions

Actually, according to the glamo documentation, the timing should be set to 1+4+2. But this does not seem to work, and we didn't find out why. So this question is open for further investigation.

1.6. --->>> How to test/use it

I have used these settings by default for several days with qtmokoV24 and with Debian on uSD.

I prepared a u-boot that sets the new timings by default:
www.bsdmn.com/openmoko/glamo/242/u-boot_glamo242.udfu

You may check the current timings value with the following tool:
www.bsdmn.com/openmoko/glamo/timings/memwrite

To check, run (1207959560 is 0x48000008 in decimal):
# ./memwrite 1207959560

For the new timings you should get:
Old value: addr[48000008]=0x80 0x13 0x00 0x00

I'm expecting some reports from users: did it work for you flawlessly?

2. --->>> DMA transfers

2.1. --->>> Theory and current situation

Currently, both the MMC and glamo X drivers use memcpy for memory transfers. This means two things:
a. the CPU is 100% busy during the transfer;
b. the CPU may schedule another task instead of the transfer.
An older investigation of DMA transfers concluded 'slightly slower on small transfers' and 'small benefit on large transfers'.

2.2. --->>> Synthetic testing

I wrote a proof-of-concept kernel module that does DMA transfers directly from userspace to glamo memory. Contrary to previous investigations, I found that a DMA transfer works at exactly the same speed as a normal memcpy to the framebuffer. Also, a DMA transfer does not 'block' the memory bus.
I ran lmbench under 100% continuous DMA transfer to glamo; it was significantly slower, but the system worked. (All tests were done with the default timings.)

Conclusion -> implementing DMA for MMC and glamo may speed up the system, and it may also reduce energy consumption (as the CPU may sleep while a DMA transfer is active).

Gennady.
_______________________________________________
Openmoko community mailing list
[hidden email]
http://lists.openmoko.org/mailman/listinfo/community
|
Gennady Kupava <[hidden email]> writes:

> I tested default(AKA slow) and fast timing settings for -vo x11 and -vo
> fbdev. The settings with different vo's are not directly comparable, as
> different libc used occasionally, so only (slow/x11 and fast/x11) and
> (slow/fbdev and fast/fbdev) are comparable.
>
> The profiling setup is opcontrol --start; mplayer ... ; opcontrol --stop

Which kernel? The branch and commit hash plus the config file would be important for reproducing these results. Also, which version of mplayer? Can you put the video file online too, so that others can test? Did you encode it yourself? (If yes, how? :-))
|
In reply to this post by Gennady Kupava
On Saturday, 17 July 2010 22:42:19, Gennady Kupava wrote:
> Hi, list.
>
> Here is my next rant to freerunner slowness :)

As an FR user since the beginning, I find it really amazing to see how much it has improved since the 2007.2 and 2008.x/2009.x distros, in usability, prettiness and speed (especially thanks to Thomas's work on xorg glamo, your findings on kernel speedup and CPU/memory clocking, and the FSO port to Vala).

Thank you very much to you and Thomas for not throwing in the towel, and of course, thank you to all the people behind FSO, e, SHR, Debian on FR, qtmoko, Hackable:1...

Our "Free mobile" dream is closer than ever! :o)
|
In reply to this post by Gennady Kupava
2010/7/17 Gennady Kupava <[hidden email]>:
> Here is my next rant to freerunner slowness :)

A great way to rant :) The glamo timing fix seems to work great here so far.

Regarding the older thread, I finally also tested it with the 500/83 CPU/memclock setting (533/88 hung during boot when most of the system was already running), and it also worked fine. However, CPU overclocking probably increases power consumption, while the glamo boost is probably a pretty pure win overall, especially considering that glamo is the biggest bottleneck in the device anyway.

> Expecting some reports from users, did it work for you flawlessly?

So far, yes.

> 2. --->>> DMA transfers
...
> Conclusion -> implementing dma for mmc and glamo may provide speedup to
> system, also it may reduce emergy consumption (as cpu may sleep while
> dma transfer is active)

That's interesting as well, since it would again be an overall win in all aspects.

Thanks a lot for the work so far; you are the heroes making the FreeRunner project all the more interesting and unique :)

-Timo
|
In reply to this post by Gennady Kupava
On Sun, 18 Jul 2010 00:42:19 +0400
Gennady Kupava <[hidden email]> (GK) wrote:
>1.6. --->>> How to test/use it
>
>I used this settings by default for several days with qtmokoV24 and
>with debian on usd.
>
>I prepared u-boot to set new timings by default:
>www.bsdmn.com/openmoko/glamo/242/u-boot_glamo242.udfu

Sounds great! Is there a qi version that sets up these parameters? Running the system from uSD is much easier with qi...

thank you
Petr
|
In reply to this post by Gennady Kupava
On Saturday, 17 July 2010 22:42:19, Gennady Kupava wrote:
> Hi, list.
>
> Here is my next rant to freerunner slowness :)
> You should get for new timings:
> Old value:
> addr[48000008]=0x80 0x13 0x00 0x00
>
> Expecting some reports from users, did it work for you flawlessly?

root@om-gta02 ~/Cosas/Soft/utils # ./memwrite 1207959560
pagesize = 4096
basepage=48000000, baseoff = 00000008, destaddr = 48000008
Old value: addr[88142008]=0x80 0x13 0x00 0x00

It seems to work OK for me; tested on SHR-U and qtmoko.

I'm currently using shr-u, will report any problem...
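The four bytes printed by memwrite above are the register at 0x48000008 read in memory order; on the little-endian s3c they form the 32-bit value 0x00001380. The Python sketch below decodes it, assuming the BANKCONn field layout and cycle encodings from the S3C2440A user manual (where 0x48000008 is BANKCON1) also apply to the S3C2442 in the FreeRunner, and assuming the "2+4+2" in this thread refers to the Tcos, Tacc and Tcoh fields. decode_bankcon is a hypothetical helper, not part of memwrite.

```python
# Clock counts per field encoding for the s3c memory controller's BANKCONn
# registers, as given in the S3C2440A user manual (assumed to hold for the
# S3C2442 in the FreeRunner).
TWO_BIT_CLOCKS = (0, 1, 2, 4)             # Tacs, Tcos, Tcoh, Tcah
TACC_CLOCKS = (1, 2, 3, 4, 6, 8, 10, 14)  # Tacc


def decode_bankcon(raw: bytes) -> dict:
    """Decode the 4 bytes printed by memwrite (little-endian) into bus clocks."""
    value = int.from_bytes(raw, "little")
    return {
        "Tacs": TWO_BIT_CLOCKS[(value >> 13) & 0b11],  # address set-up before nGCS
        "Tcos": TWO_BIT_CLOCKS[(value >> 11) & 0b11],  # chip select set-up before nOE
        "Tacc": TACC_CLOCKS[(value >> 8) & 0b111],     # access cycle
        "Tcoh": TWO_BIT_CLOCKS[(value >> 6) & 0b11],   # chip select hold after nOE
        "Tcah": TWO_BIT_CLOCKS[(value >> 4) & 0b11],   # address hold after nGCS
    }


if __name__ == "__main__":
    t = decode_bankcon(bytes([0x80, 0x13, 0x00, 0x00]))
    print(t)
    print(f'Tcos+Tacc+Tcoh = {t["Tcos"]}+{t["Tacc"]}+{t["Tcoh"]}')
```

Under these assumptions the reported value decodes to Tcos=2, Tacc=4 and Tcoh=2 clocks, which matches the 2+4+2 setting described in the original post.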
|
On Sunday, 18 July 2010 18:17:07, David Garabana Barro wrote:
> On Saturday, 17 July 2010 22:42:19, Gennady Kupava wrote:
> I'm currently using shr-u, will report any problem...

I had a WSOD while shutting down, but not yet during normal use. As WSODs are back again with the new kernel (at least when rotating the screen), I don't know whether it's related to the glamo timings or it was a kernel problem.

I made some tests with neon and neolight:

neolight strobe @ 80 ms: although you can "see" the filling, it's noticeably faster @ 242.

  CPU use   @ 444   @ 242
  xorg      60      44
  python    10      7
  idle      30      47

neon continuously scrolling a big (1379x1916) photo: scrolling is visibly smoother.

  CPU use   @ 444   @ 242
  python    50      55
  xorg      44      40
  idle      0       0
|
In reply to this post by Gennady Kupava
Wow, you're doing an awesome job here!! I've tried it on my QtMoko V24 running from NAND and it works great: the interface is noticeably faster, and so is a more intensive program like Navit. Everything seems to work as it should; suspending and resuming is fine.
As Timo Jyrinki already mentioned, this is just a pure performance gain, superb! I am, however, very much interested in a stable overclock as well :-) (as suspend and resume don't work on QtMoko v24, as reported in the other thread). If there is anything I (we) can help you with or test for you, please let me know.

Cheers,
Tha_Man
|
In reply to this post by Gennady Kupava
On Saturday, 17 July 2010 22:42:19, Gennady Kupava wrote:
> Hi, list.
>
> Expecting some reports from users, did it work for you flawlessly?

One small problem I found: when you get a WSOD (rotating the screen), if you just reboot, you get a WSOD on every following reboot. You MUST halt the device to get it back to normal.

I suppose something is not correctly initialized in glamo with the modified u-boot.
|
In reply to this post by Petr Vanek
>>I prepared u-boot to set new timings by default:
>>www.bsdmn.com/openmoko/glamo/242/u-boot_glamo242.udfu
>
>
>sounds great! is there a qi version setting up these parameters?
>running system from uSD is much easier with qi...

No qi? Too bad :(

Petr
|
On Monday, 19. July 2010 21:04:54 Petr Vanek wrote:
> >>I prepared u-boot to set new timings by default:
> >>www.bsdmn.com/openmoko/glamo/242/u-boot_glamo242.udfu
> >
> >
> >sounds great! is there a qi version setting up these parameters?
> >running system from uSD is much easier with qi...
>
> no qi? too bad :(
>
> Petr

I would also like to test this with qi - any chance of getting it?

Christian
|
I've applied the glamo patch to the qi-bootloader I use for the QtMoko installer-images. You can download the qi-bootloader here [1].
It replaces the qi.img file in the installer image, so you can reflash with this new bootloader. It is, of course, also possible to flash this new bootloader using dfu-util.

Ghislain
BaseTrend - openmobile.nl

[1] http://www.openmobile.nl/pages/downloads.php#qiglamo
|
>I've applied the glamo patch to the qi-bootloader I use for the QtMoko
>installer-images. You can download the qi-bootloader here [1].
>It replaces the qi.img file in the installer image so you can reflash
>with this new bootloader.
>It is, of course, also possible to flash this new bootloader using
>dfu-util
>
>Ghislain
>http://www.basetrend.nl BaseTrend - http://www.openmobile.nl
>openmobile.nl
>
>[1] http://www.openmobile.nl/pages/downloads.php#qiglamo

Sounds great, it seems to be this one [1]; testing now.

Thanks a million,
Petr

1. http://www.openmobile.nl/modules/download_gallery/dlc.php?file=53
|
In reply to this post by ghislain
On Tue, Jul 20, 2010 at 09:17, ghislain <[hidden email]> wrote:
>
> I've applied the glamo patch to the qi-bootloader I use for the QtMoko
> installer-images. You can download the qi-bootloader here [1].
> It replaces the qi.img file in the installer image so you can reflash with
> this new bootloader.
> It is, of course, also possible to flash this new bootloader using dfu-util
>
> Ghislain
> http://www.basetrend.nl BaseTrend - http://www.openmobile.nl openmobile.nl
>
> [1] http://www.openmobile.nl/pages/downloads.php#qiglamo

Could you please also give us a patch for that? I would like to compile some overclocked Qi's with this Glamo tweak.

--
Sebastian Krzyszkowiak
dos
|
In reply to this post by Gennady Kupava
Sounds very nice!
I tried it on SHR-u: it booted to a beautiful "openmoko" logo, then nothing. When I tried to boot with u-boot, a single "nokernel found" message was shown... So here's a question: is it only for QtMoko?
|
In reply to this post by ghislain
On 20. juli 2010 09:17, ghislain wrote:
>
> I've applied the glamo patch to the qi-bootloader I use for the QtMoko
> installer-images. You can download the qi-bootloader here [1].
> It replaces the qi.img file in the installer image so you can reflash with
> this new bootloader.
> It is, of course, also possible to flash this new bootloader using dfu-util

I tried this. I downloaded qi-s3c2442-master_c38b062a609f1442.udfu and flashed it with dfu-util. Command:

dfu-util -a u-boot -R -D qi-s3c2442master_c38b062a609f1442.udfu

On reboot, the screen went white, and nothing more happened.

Unfortunately, there seems to be no way of fixing it. I can download other qi versions and flash them, but none works. The screen just goes black. For some versions, the vibrator buzzes every 10s or so.

Booting through NOR doesn't work, probably because the kernel is bigger than 2M. So qi is needed, but it seems impossible to get working.

Am I doing this wrong? One of my qi files is the one that worked before, but it doesn't work now. Is there any hope of using this device again?

Helge Hafting
|
In reply to this post by AstHrO FR-59
On Tuesday 20 July 2010 14:37:49 Thomas HOCEDEZ wrote:
> I tried on SHR-u, booted on a beautiful "openmoko" logo, then nothing.
> When I tried to boot on u-boot, a single" nokernel found" message is
> shown ...
> So here's a question : Is it only for QTmoko ?

The SHR kernel is > 2MB, and u-boot can have problems loading such a kernel. You either need to adjust the u-boot environment or try qi, which does not have this limit.

Regards
Radek
|
Radek Polak <[hidden email]> writes:

> SHR kernel is > 2MB and uboot can have problems loading such kernel. You
> either need to adjust uboot env or try qi which does not have this limit.

Why would u-boot have problems with > 2MB? Isn't it just that people have configured their u-boot to load only 2MB?

-Timo
|
In reply to this post by Helge Hafting
On Tuesday, 20 July 2010, Helge Hafting wrote:
> I tried this. Downloaded qi-s3c2442-master_c38b062a609f1442.udfu,
> flashed it with dfu-util. Command:
> dfu-util -a u-boot -R -D qi-s3c2442master_c38b062a609f1442.udfu
>
> On reboot, the screen went white, and nothing more happened.

Same here, but behind the white screen everything worked well, so I could shut down via ssh, flash the original qi and reboot....

--
Lars
Lubarsky's Law of Cybernetic Entomology: There's always one more bug.
