Reported by Nikolaus Schaller, Apr 4, 2018
The reboot command restarts MLO and U-Boot, but the kernel does not start again. Maybe some DPLL is not shut down, or has a problem when it is still running.
Comment 1 by Nikolaus Schaller, Apr 4, 2018
Comment 2 by Nikolaus Schaller, Apr 4, 2018
Comment 3 by daveshah, Jun 18, 2020
Is this still a problem on the Pyra? Reboot works fine on my uEVM using latest Letux U-boot and letux-kernel 5.6.y.
Comment 4 by Nikolaus Schaller, Jun 18, 2020
AFAIK yes. There is some unknown difference to the uEVM which always reboots fine. It may (or may not) have something to do with the LCD/DSI setup which is not available on the uEVM.
Comment 5 by daveshah, Jul 23, 2020
I can reproduce this on my Pyra, both from a software reboot command and by doing a forced reset using Power+L2.
Comment 6 by daveshah, Jul 23, 2020
For reference, the full output for 5.6.y: https://dev.pyra-handheld.com/snippets/765
Comment 7 by Nikolaus Schaller, Jul 23, 2020
Yes, I can confirm almost the same log after "reboot" for letux-5.8-rc5.

For completeness: the problem is very old (at least 2016) and I have found an excerpt of a boot log from back then:

[ 1.504410] clock: dpll_abe_ck failed transition to 'locked'
[ 2.803634] clock: dpll_abe_ck failed transition to 'locked'
[ 4.102881] clock: dpll_abe_ck failed transition to 'locked'
[ 5.402054] clock: dpll_abe_ck failed transition to 'locked'
[ 5.404696] ------------[ cut here ]------------
[ 5.404713] WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:679 clk_disable+0x34/0x40
[ 5.404720] Modules linked in:
[ 5.404736] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc3-letux+ #50
[ 5.404743] Hardware name: Generic OMAP5 (Flattened Device Tree)
[ 5.404765] [<c02277dc>] (unwind_backtrace) from [<c0223f7c>] (show_stack+0x10/0x14)
[ 5.404781] [<c0223f7c>] (show_stack) from [<c0517f24>] (dump_stack+0x88/0xc0)
[ 5.404798] [<c0517f24>] (dump_stack) from [<c02497a0>] (__warn+0xc0/0xec)
[ 5.404813] [<c02497a0>] (__warn) from [<c02497e8>] (warn_slowpath_null+0x1c/0x20)
[ 5.404826] [<c02497e8>] (warn_slowpath_null) from [<c06e605c>] (clk_disable+0x34/0x40)
[ 5.404842] [<c06e605c>] (clk_disable) from [<c0235e4c>] (_disable_clocks+0x18/0x70)
[ 5.404856] [<c0235e4c>] (_disable_clocks) from [<c0237050>] (_enable.part.15+0x20c/0x248)
[ 5.404869] [<c0237050>] (_enable.part.15) from [<c0e0ec7c>] (_setup+0xc0/0x200)
[ 5.404881] [<c0e0ec7c>] (_setup) from [<c02375dc>] (omap_hwmod_for_each+0x28/0x58)
[ 5.404894] [<c02375dc>] (omap_hwmod_for_each) from [<c0e0efac>] (__omap_hwmod_setup_all+0x30/0x40)
[ 5.404908] [<c0e0efac>] (__omap_hwmod_setup_all) from [<c02018c0>] (do_one_initcall+0x100/0x1b8)
[ 5.404920] [<c02018c0>] (do_one_initcall) from [<c0e00d28>] (do_basic_setup+0x98/0xd4)
[ 5.404931] [<c0e00d28>] (do_basic_setup) from [<c0e00de8>] (kernel_init_freeable+0x84/0x124)
[ 5.404946] [<c0e00de8>] (kernel_init_freeable) from [<c0803ff0>] (kernel_init+0x8/0x110)
[ 5.404960] [<c0803ff0>] (kernel_init) from [<c0220708>] (ret_from_fork+0x14/0x2c)
[ 5.405021] ---[ end trace fafe8ae97cceeb76 ]---
[ 5.405032] omap_hwmod: dmic: _wait_target_ready failed: -16
[ 5.405046] omap_hwmod: dmic: cannot be enabled for reset (3)
[ 6.731358] clock: dpll_abe_ck failed transition to 'locked'
[ 8.030578] clock: dpll_abe_ck failed transition to 'locked'
[

This was still hwmod based code (4.9?) while the newer one uses sysc. What is common is that there is something wrong with the abe_ck. But it is not clear why the uEVM reboots successfully. ABE & twl6040 etc. are connected differently.

What I also dug out are notes from some tests I ran in 2018. According to them, it may be related to timer8 (backlight PWM), which also uses and enables the abe clock (!). The uEVM does not use timer8, so that may be a significant difference. I.e. it might be a bug in enabling/disabling the abe clock for abe and timer8 which breaks when doing a reboot. Maybe there is some "locked" bit in the abe_clk control which is not unlocked after a reboot.

Unfortunately my tests did not come to a final conclusion or fix. The initial starting point would be

root@letux:~# cat <<END >/etc/modprobe.d/pwm.conf
> blacklist pwm_bl
> blacklist pwm_omap_dmtimer
> END
root@letux:~# reboot

and my notes said that it then reboots fine (without backlight of course).

A quick test with letux-5.8-rc5 shows the same symptom:
- reboot hangs with the abe locked error and
- blacklisting pwm makes it succeed, although there is a 10 second pause between "Starting Kernel..." and the start of the log. And there is still the "clock: dpll_abe_ck failed transition to 'locked'" message.

Some random ideas:
* is the abe_clk already locked on reboot, so that the driver just misses the "transition to 'locked'" it waits for?
* does this make some initialization code path fail, until some code that assumes something is always initialized runs into the problems?
* maybe it can be checked by identifying the code that prints the message "failed transition to 'locked'" and making it print the locked state bits before and after significant operations
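
The last idea above could be prototyped with a small decoder for the raw register values. The following is only a minimal sketch in C, assuming an OMAP4-style layout in which the DPLL_EN field sits in bits 2:0 of CM_CLKMODE_DPLL_ABE and the lock status is bit 0 of CM_IDLEST_DPLL_ABE. The register names, bit positions and mode encodings are assumptions that should be checked against the OMAP5 TRM, and all helper names are hypothetical. In the kernel, the two values would have to be read at the place where the "failed transition" message is printed (probably the TI DPLL clock code); that part is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMED layout, verify against the OMAP5 TRM:
 *   CM_CLKMODE_DPLL_ABE[2:0] = DPLL_EN (requested mode)
 *   CM_IDLEST_DPLL_ABE[0]    = ST_DPLL_CLK (1 = locked) */
#define DPLL_EN_MASK     0x7u
#define DPLL_EN_LOCK     0x7u
#define ST_DPLL_CLK_MASK 0x1u

/* Hypothetical helper: name of the requested mode (encodings assumed). */
static const char *dpll_en_name(uint32_t clkmode)
{
    switch (clkmode & DPLL_EN_MASK) {
    case 0x4: return "mn-bypass";
    case 0x5: return "lp-bypass";
    case 0x6: return "fr-bypass";
    case 0x7: return "lock";
    default:  return "other";
    }
}

/* 1 if the status register reports the DPLL as locked. */
static int dpll_is_locked(uint32_t idlest)
{
    return (idlest & ST_DPLL_CLK_MASK) != 0;
}

/* 1 if lock was requested but the status bit does not (yet) show it. */
static int dpll_lock_pending(uint32_t clkmode, uint32_t idlest)
{
    return (clkmode & DPLL_EN_MASK) == DPLL_EN_LOCK && !dpll_is_locked(idlest);
}

/* Format one debug line, e.g. for "before" and "after" an operation. */
static int dpll_describe(char *buf, size_t len, const char *tag,
                         uint32_t clkmode, uint32_t idlest)
{
    assert(buf != NULL && tag != NULL);
    return snprintf(buf, len, "%s: mode=%s locked=%d pending=%d", tag,
                    dpll_en_name(clkmode), dpll_is_locked(idlest),
                    dpll_lock_pending(clkmode, idlest));
}
```

Calling dpll_describe() once before and once after each significant clock operation would show whether the DPLL is already reported as locked when the kernel starts after a reboot, or whether a lock request never completes.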
Comment 8 by Nikolaus Schaller, Jul 23, 2020
A "not" is missing: "ABE & twl6040 etc. are connected differently." => "ABE & twl6040 etc. are NOT connected differently."
Comment 9 by daveshah, Jul 23, 2020
Moving a bit closer to a solution.

Looking around at various OMAP5 code, I found https://gitlab.com/linux-omap4-dev/omapboot/-/blob/kexec_support/arch/omap5/clock.c#L335

My understanding is that both CM_CLKSEL_ABE_PLL_REF and CM_CLKSEL_WKUPAON should be set to 1 in our environment. But only CM_CLKSEL_ABE_PLL_REF was being set to 1; CM_CLKSEL_WKUPAON was 0.

I wrote a very hacky patch to set CM_CLKSEL_WKUPAON to 1 as part of clock init (some ioremap and iowrite32 fudging) and reboot now seems stable.

I need to decouple this from a few other hacks in my tree, but all going well I should have a proper patch for this soon.
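
For readers unfamiliar with this kind of fix, the register write described above boils down to a read-modify-write that sets one bit. The sketch below is not the actual patch. It models the register as a plain 32-bit variable so that it can be compiled and tried in user space. The helper name is hypothetical, and the assumption that the relevant field is bit 0 follows only from "set to 1" in the comment. In the kernel, the pointer would come from ioremap() on the register's physical address (deliberately not given here) and the accesses would go through ioread32()/iowrite32().

```c
#include <stdint.h>

/* ASSUMPTION: the field that must be 1 is bit 0 of the register. */
#define CLKSEL_BIT0 0x1u

/* Hypothetical helper: set bit 0 if it is clear, leave all other bits
 * untouched, and return 1 if the bit reads back as set afterwards. */
static int clksel_set_bit0(volatile uint32_t *reg)
{
    uint32_t val = *reg;

    if (!(val & CLKSEL_BIT0))
        *reg = val | CLKSEL_BIT0;

    return (*reg & CLKSEL_BIT0) != 0;
}
```

Reading the value back after the write is a cheap way to notice a write that did not take effect, for example because the register is not accessible at that point.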
Comment 10 by daveshah, Jul 23, 2020
This patch shows what needs to be changed and does result in a stable reboot every time: https://dev.pyra-handheld.com/snippets/770

Now I need to work out whether there is some existing code that's supposed to set this bit but is failing, or whether it needs to be added somewhere.
Comment 11 by Nikolaus Schaller, Jul 23, 2020
Wow, cool! So quick to find a solution :) Please can you write to linux-omap and Tony Lindgren for further discussion?
Comment 12 by Nikolaus Schaller, Jul 24, 2020
I have tested the hack and it is like magic: "reboot" works :)

This also makes it easier to test (bisect) kernel variants fully automatically, because I can now install a new kernel binary through the USB gadget driver to the SD card and reboot through ssh. That is a very powerful and helpful tool (I have used it many times on other boards), but it was never possible to apply it to the Pyra.

Let's see what Tony (or other omap developers) will suggest. Probably they are currently focussing on v5.9-rc1.
Comment 13 by Nikolaus Schaller, Oct 27, 2020
Still works with v5.9.y. Can be closed, although it needs an upstream solution.