Sybase: frequent mass disconnections, clients cannot connect, error 1608 reported. What is the cause?

karis007 2014-09-24 01:04:41
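First, the environment: AIX + Sybase 15.5.
Background: a few days ago Sybase hung. I had no choice but to reboot the system, and afterwards I found that one of the two cache batteries in the AIX disk enclosure was dead and that the disk array had not been mounted. After repairing the volume errors I mounted the disk enclosure successfully, and the database started.
Now the problem. Database reads and writes have been slow for the past few days, for the reason given above. The real problem is that the database occasionally drops a large number of connections and cannot be reached, yet after a short wait connections work again. Error 1608 appears roughly once every half hour, along with a large number of "Cannot read/send, host process disconnected" and "nopen: accept, Software caused connection abort" messages.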

Locks and the number of user connections do not appear to be the problem either.

Here is the log:


00:09:00000:00294:2014/09/24 10:21:30.59 kernel Cannot read, host process disconnected: 2396 spid: 294
00:07:00000:00569:2014/09/24 10:22:00.09 kernel Cannot read, host process disconnected: 5640 spid: 569
00:07:00000:00553:2014/09/24 10:26:34.18 kernel Cannot read, host process disconnected: 3048 spid: 553
00:08:00000:00068:2014/09/24 10:30:32.84 kernel Cannot read, host process disconnected: 1516 spid: 68
00:10:00000:00193:2014/09/24 10:31:04.27 kernel Cannot send, host process disconnected: 4420 suid: 7
00:07:00000:00514:2014/09/24 10:35:11.61 kernel Cannot read, host process disconnected: 2440 spid: 514
00:07:00000:00470:2014/09/24 10:35:14.25 kernel Cannot read, host process disconnected: 6112 spid: 470
00:00:00000:00000:2014/09/24 10:36:25.70 kernel secleanup: time to live expired on engine 1
00:00:00000:00012:2014/09/24 10:36:25.70 kernel Terminating the listener with protocol tcp, host ibm_p550, port 10000 because the listener execution context is located on engine 1, which is not responding.
00:00:00000:00012:2014/09/24 10:36:25.70 kernel ************************************
00:00:00000:00012:2014/09/24 10:36:25.70 server SQL Text: [no text]
00:00:00000:00012:2014/09/24 10:36:25.70 kernel curdb = 0 tempdb = 2 pstat = 0x200 p2stat = 0x100000
00:00:00000:00012:2014/09/24 10:36:25.70 kernel p3stat = 0x800 p4stat = 0x0 p5stat = 0x0 p6stat = 0x0 p7stat = 0x10000
00:00:00000:00012:2014/09/24 10:36:25.70 kernel lasterror = 0 preverror = 0 transtate = 1
00:00:00000:00012:2014/09/24 10:36:25.70 kernel curcmd = 0 program =
00:00:00000:00012:2014/09/24 10:36:25.70 kernel extended error information: hostname: login:
00:00:00000:00012:2014/09/24 10:36:25.70 kernel pc: 0x0000000168b10df0 ()
00:00:00000:00012:2014/09/24 10:36:25.70 kernel pc: 0x00000001002e1694 upyield()
00:00:00000:00012:2014/09/24 10:36:25.70 kernel pc: 0x000000010056aa2c upsetaffinity__fdpr_4()
00:00:00000:00012:2014/09/24 10:36:25.70 kernel pc: 0x0000000101767804 listener_checkagain__fdpr_1()
00:00:00000:00012:2014/09/24 10:36:25.70 kernel pc: 0x000000010049551c listener__fdpr_1()
00:00:00000:00012:2014/09/24 10:36:25.70 kernel end of stack trace, spid 277, kpid 94962132, suid 0
00:00:00000:00012:2014/09/24 10:36:25.70 kernel Started a new listener task with protocol tcp, host ibm_p550, port 10000.
00:00:00000:00118:2014/09/24 10:36:25.70 kernel nopen: accept, Software caused connection abort
00:00:00000:00118:2014/09/24 10:36:25.70 kernel nopen: accept, Software caused connection abort
00:00:00000:00118:2014/09/24 10:36:25.70 kernel nopen: accept, Software caused connection abort
00:00:00000:00118:2014/09/24 10:36:25.70 kernel nopen: accept, Software caused connection abort
00:00:00000:00118:2014/09/24 10:36:25.70 kernel nopen: accept, Software caused connection abort
00:00:00000:00118:2014/09/24 10:36:25.70 kernel nopen: accept, Software caused connection abort
00:00:00000:00118:2014/09/24 10:36:25.70 kernel nopen: accept, Software caused connection abort
00:00:00000:00118:2014/09/24 10:36:25.70 kernel nopen: accept, Software caused connection abort
00:00:00000:00118:2014/09/24 10:36:25.71 kernel nopen: accept, Software caused connection abort
00:02:00000:00045:2014/09/24 10:36:25.74 kernel Cannot read, host process disconnected: E420-PW8-THINK 131576 spid: 45
00:04:00000:00225:2014/09/24 10:36:25.74 kernel Cannot read, host process disconnected: E420-PW8-THINK 131576 spid: 225
00:01:00000:00439:2014/09/24 10:40:09.17 kernel Cannot send, host process disconnected: 3636 suid: 7
00:01:00000:00439:2014/09/24 10:40:09.17 server Error: 1608, Severity: 18, State: 4
00:01:00000:00439:2014/09/24 10:40:09.17 server A client process exited abnormally, or a network error was encountered. Unless other errors occurred, continue processing normally.
00:01:00000:00439:2014/09/24 10:40:09.17 kernel extended error information: hostname: login: ayd_user
00:11:00000:00398:2014/09/24 10:40:09.17 kernel Cannot send, host process disconnected: 2568 suid: 7
00:00:00000:00292:2014/09/24 10:42:41.21 kernel Cannot read, host process disconnected: USER-20140516DP 5580 spid: 292
00:07:00000:00209:2014/09/24 10:42:41.21 kernel Cannot read, host process disconnected: USER-20140516DP 5580 spid: 209
00:07:00000:00093:2014/09/24 10:42:41.21 kernel Cannot read, host process disconnected: USER-20140516DP 5580 spid: 93
00:09:00000:00251:2014/09/24 10:43:09.74 kernel Cannot read, host process disconnected: HWHCUDVIAFWAPB9 3120 spid: 251
00:00:00000:00000:2014/09/24 10:46:25.70 kernel secleanup: time to live expired on engine 8
00:07:00000:00365:2014/09/24 10:50:12.40 kernel Cannot read, host process disconnected: ZOULIHUA 568 spid: 365
00:11:00000:00151:2014/09/24 10:51:52.29 kernel Cannot read, host process disconnected: 2184 spid: 151
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.72 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
………………
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:10:00000:00118:2014/09/24 11:05:50.73 kernel nopen: accept, Software caused connection abort
00:00:00000:00417:2014/09/24 11:05:50.73 kernel Cannot send, host process disconnected: 844 suid: 7
00:00:00000:00417:2014/09/24 11:05:50.73 server Error: 1608, Severity: 18, State: 4
00:00:00000:00417:2014/09/24 11:05:50.73 server A client process exited abnormally, or a network error was encountered. Unless other errors occurred, continue processing normally.
00:00:00000:00417:2014/09/24 11:05:50.73 kernel extended error information: hostname: login: ayd_user
00:10:00000:00284:2014/09/24 11:05:50.88 kernel Cannot send, host process disconnected: 3600 suid: 7
00:10:00000:00125:2014/09/24 11:05:50.88 kernel Cannot send, host process disconnected: 3584 suid: 7
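To see how the events in a log like this cluster in time, the lines can be grouped by minute and by message type. The following is only a minimal Python sketch, not a Sybase tool: the regular expression is inferred from the lines posted above, and the category labels (client_disconnect, accept_abort, and so on) are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Line layout inferred from the log excerpt above (an assumption):
# four colon-separated numeric fields, a timestamp, a source, then the message.
LINE_RE = re.compile(
    r"^(?P<prefix>\d{2}:\d{2}:\d{5}:\d{5}):"
    r"(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2})\.\d{2} "
    r"(?P<source>\w+)\s+(?P<msg>.*)$"
)

# Hypothetical labels for the message types seen in this thread.
PATTERNS = {
    "client_disconnect": re.compile(r"Cannot (read|send), host process disconnected"),
    "accept_abort": re.compile(r"nopen: accept, Software caused connection abort"),
    "engine_timeout": re.compile(r"secleanup: time to live expired on engine \d+"),
    "listener_restart": re.compile(r"Started a new listener task"),
    "error_1608": re.compile(r"Error: 1608,"),
}


def summarize(lines):
    """Count recognized messages per (minute, label); skip everything else."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        minute = match.group("date") + " " + match.group("time")[:5]
        for label, pattern in PATTERNS.items():
            if pattern.search(match.group("msg")):
                counts[(minute, label)] += 1
                break
    return counts


# A few lines copied from the log above.
SAMPLE = """\
00:07:00000:00470:2014/09/24 10:35:14.25 kernel Cannot read, host process disconnected: 6112 spid: 470
00:00:00000:00000:2014/09/24 10:36:25.70 kernel secleanup: time to live expired on engine 1
00:00:00000:00012:2014/09/24 10:36:25.70 kernel Started a new listener task with protocol tcp, host ibm_p550, port 10000.
00:00:00000:00118:2014/09/24 10:36:25.70 kernel nopen: accept, Software caused connection abort
00:00:00000:00118:2014/09/24 10:36:25.70 kernel nopen: accept, Software caused connection abort
00:01:00000:00439:2014/09/24 10:40:09.17 server Error: 1608, Severity: 18, State: 4
""".splitlines()

if __name__ == "__main__":
    for (minute, label), n in sorted(summarize(SAMPLE).items()):
        print(minute, label, n)
```

Run against the full error log, a summary like this makes it easier to check whether the bursts of "nopen: accept" lines always follow a "secleanup: time to live expired" line, as they do at 10:36:25 above.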
karis007 2014-10-08
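Solved. I am posting the fix here so that anyone who hits this later knows what to do. As described in my question, the cache battery was dead, which made the I/O bottleneck much worse; disk reads and writes stalled completely, and that caused the mass disconnections. The fix was simple: replace the battery with a new one.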
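One rough way to spot this kind of storage stall is to time small synchronous writes on the affected filesystem. On many arrays a dead cache battery disables write-back caching, which makes each flushed write much slower; whether that is exactly what happened here is an assumption. The Python sketch below is illustrative only: the function name fsync_latencies and its parameters are hypothetical, and it writes to an ordinary temporary file, not to the Sybase devices.

```python
import os
import tempfile
import time


def fsync_latencies(directory, writes=50, block_size=8192):
    """Time small flushed writes in `directory`; return seconds per write."""
    payload = b"\0" * block_size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the write down to the storage layer
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


if __name__ == "__main__":
    # tempfile.gettempdir() is a placeholder; point this at a directory
    # on the storage you actually want to test.
    samples = sorted(fsync_latencies(tempfile.gettempdir()))
    print("median ms:", 1000 * samples[len(samples) // 2])
    print("max ms:", 1000 * samples[-1])
```

Comparing the numbers before and after a hardware repair, or against another volume on the same host, gives a quick sanity check; it is not a substitute for the operating system's own I/O statistics.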
karis007 2014-09-25
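Quoting andkylee's reply (#1):
secleanup: time to live expired on engine 1
00:00:00000:00012:2014/09/24 10:36:25.70 kernel Terminating the listener with protocol tcp, host ibm_p550, port 10000 because the listener execution context is located on engine 1, which is not responding.
Is there a problem with the network card?
I also think it is a network problem. This error is reported every day shortly before 11 a.m. and causes the disconnections; after a while things return to normal. If the network card is at fault, how do I fix it? I am not familiar with AIX.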
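Quoting karis007's reply (#2):
Quoting andkylee's reply (#1):
secleanup: time to live expired on engine 1
00:00:00000:00012:2014/09/24 10:36:25.70 kernel Terminating the listener with protocol tcp, host ibm_p550, port 10000 because the listener execution context is located on engine 1, which is not responding.
Is there a problem with the network card?
I also think it is a network problem. This error is reported every day shortly before 11 a.m. and causes the disconnections; after a while things return to normal. If the network card is at fault, how do I fix it? I am not familiar with AIX.
Ask the IBM people.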
secleanup: time to live expired on engine 1
00:00:00000:00012:2014/09/24 10:36:25.70 kernel Terminating the listener with protocol tcp, host ibm_p550, port 10000 because the listener execution context is located on engine 1, which is not responding.
Is there a problem with the network card?
