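A problem with multi-function devices attached over PCIe
I've run into a very tricky problem on the project I'm currently working on. I don't have much experience, so I would appreciate any advice from the moderators.
An FC card and SAS disks are attached below a PCIe switch. When PCIe is set to Gen1, there is no problem.
When it is set to Gen2, PCIe reports errors, including a data link error, and then the CPU dies as well. The output is as follows: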
# irq 128: nobody cared (try booting with the "irqpoll" option)
Call Trace:
[<ffffffff80112f7c>] dump_stack+0x8/0x34
[<ffffffff8034402c>] __report_bad_irq+0x3c/0xd8
[<ffffffff8034424c>] note_interrupt+0x184/0x250
[<ffffffff80344f40>] handle_level_irq+0x138/0x170
[<ffffffff802e2164>] do_IRQ+0x2c/0x40
[<ffffffff80118d88>] plat_irq_dispatch+0x70/0xb8
[<ffffffff80100988>] ret_from_irq+0x0/0x4
[<ffffffff80342588>] handle_IRQ_event+0x40/0x190
[<ffffffff80344eb4>] handle_level_irq+0xac/0x170
[<ffffffff802e2164>] do_IRQ+0x2c/0x40
[<ffffffff80118d88>] plat_irq_dispatch+0x70/0xb8
[<ffffffff80100988>] ret_from_irq+0x0/0x4
[<ffffffff80311f04>] __do_softirq+0x7c/0x1a0
[<ffffffff80312098>] do_softirq+0x70/0x78
[<ffffffff80118d68>] plat_irq_dispatch+0x50/0xb8
[<ffffffff80100988>] ret_from_irq+0x0/0x4
[<ffffffffc00a6878>] sgv_pool_alloc+0x1f8/0xde0 [scst]
[<ffffffffc0088cbc>] scst_alloc_space+0xfc/0x300 [scst]
[<ffffffffc006e0c8>] scst_prepare_space+0x238/0x7e0 [scst]
[<ffffffffc00760e8>] scst_process_active_cmd+0x7b8/0xbb8 [scst]
[<ffffffffc00765d4>] scst_do_job_active+0xec/0x228 [scst]
[<ffffffffc0076ab0>] scst_cmd_thread+0x258/0x5f8 [scst]
[<ffffffff803255a0>] kthread+0x88/0x90
[<ffffffff802e2ca8>] kernel_thread_helper+0x10/0x18
handlers:
[<ffffffffc01328a8>] (_base_interrupt+0x0/0x4f8 [mpt2sas])
Disabling IRQ #128
ERROR PEMX_INT_SUM(0)[SE]: System Error, RC Mode Only.
(cfg_sys_err_rc)
ERROR PEMX_DBG_INFO(0)[RTLPLLE]: Received TLP has link layer error
pedc_radm_trgt1_dllp_abort & pedc__radm_trgt1_eot
ERROR PEMX_DBG_INFO(0)[RCEMRC]: Received Correctable Error Message (RC Mode only)
pedc_radm_correctable_err
ERROR PEMX_DBG_INFO(0)[ACTO]: A Completion Timeout Occured
pedc_radm_cpl_timeout
ERROR PEMX_DBG_INFO(0)[RACUR]: Received a completion with UR status
radm_rcvd_cpl_ur
Data bus error, epc == ffffffffc0168350, ra == ffffffffc01a1424
Data bus error, epc == ffffffffc0168350, ra == ffffffffc01a1424
Oops[#1]:
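I currently suspect a signal quality problem on the link, but surely that shouldn't cause the CPU to die? I don't know where to start analyzing this.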
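
One possible starting point, offered only as a sketch: check what speed and width each link (root port to switch, switch to FC, switch to SAS) actually trained to, and which error bits each device has latched. The bit layouts below are those of the standard PCI Express Link Status register (capability offset 0x12) and Device Status register (capability offset 0x0A). The function names and the sample values are hypothetical; the raw 16-bit values are assumed to come from something like the LnkSta and DevSta lines of `lspci -vv`, or from a raw config-space read on the board.

```python
# Sketch: decode raw PCIe Link Status / Device Status register values.
# Function names and sample inputs are illustrative assumptions.

LINK_SPEEDS = {
    1: "2.5 GT/s (Gen1)",
    2: "5.0 GT/s (Gen2)",
    3: "8.0 GT/s (Gen3)",
}


def decode_link_status(value):
    """Decode the 16-bit Link Status register (PCIe capability + 0x12)."""
    speed_code = value & 0xF                  # bits 3:0, Current Link Speed
    width = (value >> 4) & 0x3F               # bits 9:4, Negotiated Link Width
    return {
        "speed": LINK_SPEEDS.get(speed_code, "unknown code %d" % speed_code),
        "width": width,
        "training": bool(value & (1 << 11)),        # bit 11, Link Training
        "dll_link_active": bool(value & (1 << 13)),  # bit 13, Data Link Layer Link Active
    }


def decode_device_status(value):
    """Decode the error bits of the Device Status register (PCIe capability + 0x0A)."""
    return {
        "correctable_error": bool(value & 0x1),    # bit 0
        "non_fatal_error": bool(value & 0x2),      # bit 1
        "fatal_error": bool(value & 0x4),          # bit 2
        "unsupported_request": bool(value & 0x8),  # bit 3
    }


if __name__ == "__main__":
    # Made-up sample values, not taken from the failing system.
    print(decode_link_status(0x1042))
    print(decode_device_status(0x0009))
```

If a link that was forced to Gen2 reports Gen1 here, or a narrower width than expected, or correctable errors keep accumulating on one particular port, that would at least narrow down which segment of the topology to look at for signal integrity.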