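run_init_process hangs, any help appreciated!!
While porting the Linux 2.6.17 kernel to the PNX8950 (MIPS architecture), the init program hangs.
The root filesystem is a CPIO archive built from BusyBox 1.2.1 and compiled into the kernel.
The cross toolchain was built with Buildroot and uses the uClibc library.
The board boots as far as "Freeing unused kernel memory: 972k freed" and then hangs.
Tracing the Linux 2.6.17 kernel startup code: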
static int init(void * unused)
{
    /* ... */
    if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0)    /* <-- works fine */
        printk(KERN_WARNING "Warning: unable to open an initial console.\n");
    /* ... */
    if (execute_command) {
        /* <-- hangs here. execute_command is "/init"; when the filesystem
         * was built with BusyBox, init was linked to /sbin/init. */
        run_init_process(execute_command);
        printk(KERN_WARNING "Failed to execute %s. Attempting "
               "defaults...\n", execute_command);
    }
    run_init_process("/sbin/init");
    run_init_process("/etc/init");
    run_init_process("/bin/init");
    run_init_process("/bin/sh");
    panic("No init found. Try passing init= option to kernel.");
}
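The fallback chain above works because an exec call only returns when it fails. A minimal user-space sketch of the same pattern, assuming plain POSIX execv() rather than the kernel's internal execve path (the helper name try_init_candidates is hypothetical, not a kernel function):

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: try each candidate path in order.
 * execv() only returns on failure, so falling out of the loop means
 * that no candidate could be executed.
 * Returns -1; errno holds the error from the last failed execv().
 */
int try_init_candidates(const char *const paths[], size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        char *const argv[] = { (char *)paths[i], NULL };

        execv(paths[i], argv);
        /* Still here: this candidate failed, move on to the next one. */
    }
    return -1;
}
```

In the case described in this post the first candidate never returns at all, which is why none of the fallbacks (and not even the "Failed to execute" message) is reached.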
Tracing further, the call chain is:
run_init_process->execve->sys_execve->do_execve->search_binary_handler->load_elf_binary
The hang is inside load_elf_binary. Closer analysis shows that it hangs in the padzero call. With debug code added, the relevant fragment is:
static int load_elf_binary(struct linux_binprm * bprm, struct pt_regs * regs)
{
    /* ... */
    loc->elf_ex.e_entry += load_bias;
    elf_bss += load_bias;
    elf_brk += load_bias;
    start_code += load_bias;
    end_code += load_bias;
    start_data += load_bias;
    end_data += load_bias;
    printk(KERN_WARNING "loc->elf_ex.e_entry :0x%8x\n", loc->elf_ex.e_entry);
    printk(KERN_WARNING "load_bias: 0x%8x\n", load_bias);
    printk(KERN_WARNING "elf_bss: 0x%8x\n", elf_bss);
    printk(KERN_WARNING "elf_brk: 0x%8x\n", elf_brk);
    printk(KERN_WARNING "start_code: 0x%8x\n", start_code);
    printk(KERN_WARNING "end_code: 0x%8x\n", end_code);
    printk(KERN_WARNING "start_data: 0x%8x\n", start_data);
    printk(KERN_WARNING "end_data: 0x%8x\n", end_data);
    /* Calling set_brk effectively mmaps the pages that we need
     * for the bss and break sections. We must do this before
     * mapping in the interpreter, to make sure it doesn't wind
     * up getting placed where the bss needs to go.
     */
    retval = set_brk(elf_bss, elf_brk);    /* <-- mmap succeeds */
    printk(KERN_WARNING "retval: 0x%8x\n", retval);
    if (retval) {
        send_sig(SIGKILL, current, 0);
        goto out_free_dentry;
    }
    if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {    /* <-- dies in the padzero call */
        send_sig(SIGSEGV, current, 0);
        retval = -EFAULT; /* Nobody gets to see this, but.. */
        goto out_free_dentry;
    }
    /* ... */
The printk calls are the added debug code. The output during the run is:
loc->elf_ex.e_entry :0x 400100
load_bias: 0x 0
elf_bss: 0x 4ed7e0
elf_brk: 0x 507f00
start_code: 0x 400000
end_code: 0x 4a9bf0
start_data: 0x 4ea000
end_data: 0x 4ed7e0
retval: 0x 0
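From the printed elf_bss value, the number of bytes padzero would try to clear can be worked out by hand. The sketch below assumes 4 KiB pages (so ELF_MIN_ALIGN is 0x1000; the kernel for this board may be configured differently) and mirrors the arithmetic of padzero in fs/binfmt_elf.c as I understand it: zero from elf_bss up to the next page boundary. The names padzero_len, print_padzero_range and SKETCH_PAGE_SIZE are made up for illustration.

```c
#include <stdio.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 0x1000UL

/* Bytes that would be cleared: from elf_bss up to the next page boundary. */
unsigned long padzero_len(unsigned long elf_bss)
{
    unsigned long offset = elf_bss & (SKETCH_PAGE_SIZE - 1);

    return offset ? SKETCH_PAGE_SIZE - offset : 0;
}

/* Print the half-open address range that would be cleared. */
void print_padzero_range(unsigned long elf_bss)
{
    unsigned long len = padzero_len(elf_bss);

    printf("clear [0x%lx, 0x%lx), 0x%lx bytes\n",
           elf_bss, elf_bss + len, len);
}
```

For elf_bss = 0x4ed7e0 this gives 0x820 bytes, which agrees with the Size value printed by the debug version of __clear_user further down, so the 4 KiB assumption looks plausible.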
Tracing further into padzero, the hang was located at the assembly code in __clear_user:
static inline __kernel_size_t
__clear_user(void __user *addr, __kernel_size_t size)
{
    __kernel_size_t res;
    printk(KERN_WARNING "In Function: %s\n", __FUNCTION__); //debug by ljq
    printk(KERN_WARNING "Address_Start: 0x%8x Size: 0x%8x\n", (int)addr, (int)size);
    might_sleep();
#if 0
    /* Original implementation: call __bzero with a0=addr, a1=0, a2=size. */
    __asm__ __volatile__(
        "move\t$4, %1\n\t"
        "move\t$5, $0\n\t"
        "move\t$6, %2\n\t"
        __MODULE_JAL(__bzero)
        "move\t%0, $6"
        : "=r" (res)
        : "r" (addr), "r" (size)
        : "$4", "$5", "$6", __UA_t0, __UA_t1, "$31");
#else
    /* Debug replacement: plain byte-wise zeroing. Unlike __bzero, these
     * stores have no exception-table fixup for faulting accesses. */
    {
        char *xs = addr;
        while (size--)
            *xs++ = 0;
    }
    res = 0;
#endif
    printk(KERN_WARNING "res: %d\n", (int)res);
    printk(KERN_WARNING "Exit Function: %s\n", __FUNCTION__);
    return res;
}
I rewrote __clear_user to keep the same functionality (I am not sure whether rewriting it this way is valid) and added debug output. The output is:
In Function: debug__clear_user
Address_Start: 0x 4ed7e0 Size: 0x 820
It still crashes.
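The analysis points to the crash happening on the access to address 0x4ed7e0. On MIPS this address lies in the kuseg segment, i.e. the user-mode address range;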
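For reference, the segment an address belongs to can be checked against the standard MIPS32 virtual address map. The sketch below assumes that standard 32-bit layout applies to this core; the function name mips32_segment is made up for illustration.

```c
#include <stdint.h>

/* Classify a 32-bit virtual address by the standard MIPS32 segments. */
const char *mips32_segment(uint32_t vaddr)
{
    if (vaddr < 0x80000000u)
        return "kuseg";    /* user space, TLB-mapped */
    if (vaddr < 0xA0000000u)
        return "kseg0";    /* kernel, unmapped, cached */
    if (vaddr < 0xC0000000u)
        return "kseg1";    /* kernel, unmapped, uncached */
    return "kseg2";        /* kernel, TLB-mapped (includes kseg3) */
}
```

Under this map 0x4ed7e0 falls in kuseg, so the access is translated through the TLB and depends on the page tables of the new process being set up correctly.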
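What puzzles me is this:
after set_brk has successfully mmapped the pages in load_elf_binary, the address should be directly accessible and should not crash. Any help would be greatly appreciated; I have been stuck on this problem for a long time.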
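A point that may be worth checking here: as far as I can tell from fs/binfmt_elf.c of this kernel generation, set_brk rounds both of its arguments up to a page boundary before calling do_brk. If so, the page that contains elf_bss itself is not part of the set_brk mapping; it is the last page of the file-backed data segment that was mapped earlier, and the first write from padzero goes through the page-fault path for that mapping. The sketch below only redoes the address arithmetic, assuming 4 KiB pages; all names in it are hypothetical.

```c
/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 0x1000UL

struct brk_range {
    unsigned long start;    /* inclusive */
    unsigned long end;      /* exclusive */
};

/* Round an address up to the next page boundary. */
unsigned long page_align_up(unsigned long addr)
{
    return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Range that a set_brk(elf_bss, elf_brk) call would cover after alignment. */
struct brk_range set_brk_range(unsigned long elf_bss, unsigned long elf_brk)
{
    struct brk_range r;

    r.start = page_align_up(elf_bss);
    r.end = page_align_up(elf_brk);
    return r;
}

/* Non-zero if addr lies inside the half-open range r. */
int addr_in_range(struct brk_range r, unsigned long addr)
{
    return addr >= r.start && addr < r.end;
}
```

With the values printed above (elf_bss = 0x4ed7e0, elf_brk = 0x507f00) the aligned range is [0x4ee000, 0x508000), and 0x4ed7e0 lies just below it. A successful set_brk therefore says nothing about whether the page at 0x4ed000 can be faulted in.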
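I have tried the following experiments myself, none of which solved the problem:
1. Building BusyBox statically (it was dynamically linked before);
2. Disabling the __clear_user code completely: load_elf_binary then returns successfully, but run_init_process still does not start a shell (/bin/sh); the serial port works fine at this point;
3. Replacing BusyBox's init with a simple Hello World program: it prints "Hello World", but the system panics right afterwards;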
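On experiment 3, one possible explanation (only a guess, since the panic message is not shown here): if the Hello World program returns from main, PID 1 exits, and the kernel panics whenever init exits. A test init should therefore never return. A minimal sketch of the two pieces such a program needs, using only POSIX calls; the function names are made up for illustration:

```c
#include <string.h>
#include <unistd.h>

/* Write the greeting to fd; returns 0 on success, -1 on a short or failed write. */
int write_greeting(int fd)
{
    static const char msg[] = "Hello World\n";
    size_t len = strlen(msg);

    return write(fd, msg, len) == (ssize_t)len ? 0 : -1;
}

/* Never returns: sleep until a signal arrives, then sleep again. */
void idle_forever(void)
{
    for (;;)
        pause();
}
```

A test init built this way would have main call write_greeting(1) and then idle_forever() instead of returning. If the board still panics with such a program, the exit of init can be ruled out as the cause.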