Oracle RAC: one instance went down
Environment: Oracle 11.2.0.4 RAC on Red Hat 6.6
On September 13 I found that one of the instances had gone down.
Alert log:
Errors in file /oracle/app/oracle/diag/rdbms/bssoradb/bssoradb2/trace/bssoradb2_lmon_13671.trc (incident=432089):
ORA-29740: evicted by instance number 1, group incarnation 46
Incident details in: /oracle/app/oracle/diag/rdbms/bssoradb/bssoradb2/incident/incdir_432089/bssoradb2_lmon_13671_i432089.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Errors in file /oracle/app/oracle/diag/rdbms/bssoradb/bssoradb2/trace/bssoradb2_lmon_13671.trc:
ORA-29740: evicted by instance number 1, group incarnation 46
LMON (ospid: 13671): terminating the instance due to error 29740
Wed Sep 13 18:39:49 2017
System state dump requested by (instance=2, osid=13671 (LMON)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/oracle/diag/rdbms/bssoradb/bssoradb2/trace/bssoradb2_diag_13661_20170913183949.trc
Instance terminated by LMON, pid = 13671
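To pull the eviction details out of a longer alert log, something like the following might help. It is only a minimal sketch: the regular expression is modelled on the ORA-29740 line quoted above, and `find_evictions` and `sample` are hypothetical names used for illustration, not part of any Oracle tool.

```python
import re

# Pattern modelled on the ORA-29740 line in the alert log excerpt above.
EVICTION_RE = re.compile(
    r"ORA-29740: evicted by instance number (\d+), group incarnation (\d+)"
)

def find_evictions(alert_text):
    """Return unique (evicting_instance, incarnation) pairs in order of first appearance."""
    seen = []
    for match in EVICTION_RE.finditer(alert_text):
        pair = (int(match.group(1)), int(match.group(2)))
        if pair not in seen:
            seen.append(pair)
    return seen

# Three lines copied from the alert log above (the ORA-29740 line appears twice there).
sample = """\
ORA-29740: evicted by instance number 1, group incarnation 46
LMON (ospid: 13671): terminating the instance due to error 29740
ORA-29740: evicted by instance number 1, group incarnation 46
"""

print(find_evictions(sample))
```

For this excerpt the only pair reported is instance 1 with group incarnation 46, which matches what the alert log says: instance 2 was evicted by instance 1.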
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The corresponding trace file /oracle/app/oracle/diag/rdbms/bssoradb/bssoradb2/trace/bssoradb2_lmon_13671.trc:
*** 2017-09-13 18:39:44.093
kjxgrrcfgchk: Initiating reconfig, reason=3
kjxgrrcfgchk: COMM rcfg - Disk Vote Required
kjfmReceiverHealthCB_CheckAll: Recievers are healthy.
2017-09-13 18:39:44.093242 : kjxgrnetchk: start 0x43f2b35f, end 0x43f369a9
2017-09-13 18:39:44.093265 : kjxgrnetchk: Network Validation wait: 46 sec
2017-09-13 18:39:44.093282 : kjxgrnetchk: Sending comm check req to inst 1
kjxgrrcfgchk: prev pstate 6 mapsz 512
kjxgrrcfgchk: new bmp: 1 2
kjxgrrcfgchk: work bmp: 1 2
kjxgrrcfgchk: rr bmp: 1 2
*** 2017-09-13 18:39:44.093
kjxgmrcfg: Reconfiguration started, type 3
CGS/IMR TIMEOUTS:
CSS recovery timeout = 31 sec (Total CSS waittime = 65)
IMR Reconfig timeout = 75 sec
CGS rcfg timeout = 85 sec
kjxgmcs: Setting state to 44 0.
kjxgrs0h: disable CGS timeout
2017-09-13 18:39:44.537303 : kjxgrcomerr: Suppressed nested communications reconfig: instance 1 (44,44)
kjxgrnetval: all instances have acknowledged
kjxgrrcfgchk: NETVAL: reconfig bitmap chksum 0x317eab92 cnt 2 master 2
SelectVoteMethod: member information
Inst 1, st 0x0017, es 0x0002, cap 0x0
Inst 2, st 0x0107, es 0x0000, cap 0x3
SelectVoteMethod: num mounted 1, unmounted 0
SelectVoteMethod: mounted capatility: nonblocking blocking
SelectVoteMethod: num unmounted nb 0 b 0
SelectVoteMethod: total insts nb 1 b 1
SelectVoteMethod: final capability nonblocking
kjxgrpropmsg: SSVOTE: Master indicates Disk Voting required
2017-09-13 18:39:44.589141 : kjxgrDiskVote: nonblocking method is chosen
2017-09-13 18:39:44.640605 : kjxgrDiskVote: start the disk vote w/ seqno 45
2017-09-13 18:39:44.640658 : kjxgrDiskVote: timeout in 56 sec
Last valid bitmap: 1 2
2017-09-13 18:39:44.640725 : kjxgrDiskVote: active members status:
Inst 1, st 0x0017, es 0x0002, cap 0x0
Inst 2, st 0x0107, es 0x0000, cap 0x3
2017-09-13 18:39:44.692219 : kjxgrDiskVote: voted w/ seq 45 and map: 2
LR trace: *** @ 1-LR1: 2017-09-13 18:39:44.744 01
- rcnt-idx 2
LR trace: *** @ 2-LR2: 2017-09-13 18:39:44.744 00
- rcnt-idx 2
*** 2017-09-13 18:39:44.745
kjxgrf_rr_lock: done - ret = 1 hist 0x12e
2017-09-13 18:39:44.745313 : kjxgrDiskVote: RR lock-get failed w/ status 1
2017-09-13 18:39:44.745350 : kjxgrDiskVote: RR update instance is 1
2017-09-13 18:39:44.849246 : kjxgrDiskVote: detected an inconsistent membership by inst 1 at seq 46
2017-09-13 18:39:44.900729 : kjxgrDiskVote: wait 0 sec for membership resolution
2017-09-13 18:39:44.900806 : kjxgrDiskVote: new membership is from inst 1
2017-09-13 18:39:44.900828 : kjxgrDiskVote: bitmap: 1
2017-09-13 18:39:44.952331 : kjxgrdtrt: Evicted by inst 1, seq (46, 46)
IMR state information
Inst 2, thread 2, state 0x4:124c, flags 0x12ca9:0x0001
RR seq commit 44 cur 46
Propstate 4 prv 3 pending 0
rcfg rsn 3, rcfg time 1139979104, mem ct 2
master 2, master rcfg time 1139979104
evicted memcnt 0, starttm 0 chkcnt 0
system load 0 (normal)
nonblocking disk voting method
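The timestamps in the LMON trace show how quickly the reconfiguration ended in an eviction. As a rough sketch, the timestamped lines can be turned into a timeline like this. The line layout (`timestamp : function: message`) is assumed from the excerpt above, and `parse_events` is a hypothetical helper, not an Oracle utility.

```python
import re
from datetime import datetime

# Layout assumed from the trace above: "<timestamp> : <function>: <message>".
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) : (\w+): (.*)$"
)

def parse_events(trace_text):
    """Return a list of (datetime, function_name, message) for timestamped lines."""
    events = []
    for line in trace_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((stamp, match.group(2), match.group(3)))
    return events

# Three lines copied from the LMON trace above.
sample = """\
2017-09-13 18:39:44.093282 : kjxgrnetchk: Sending comm check req to inst 1
2017-09-13 18:39:44.640605 : kjxgrDiskVote: start the disk vote w/ seqno 45
2017-09-13 18:39:44.952331 : kjxgrdtrt: Evicted by inst 1, seq (46, 46)
"""

events = parse_events(sample)
elapsed = (events[-1][0] - events[0][0]).total_seconds()
print(f"{len(events)} events, {elapsed:.6f} s from first to last")
```

In this excerpt the comm check request and the eviction are less than one second apart, so the disk vote resolved almost immediately rather than running to its 56-second timeout.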
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[root@bssdb2 ~]# ps -ef|grep grid
root 7804 7628 0 15:43 pts/0 00:00:00 grep grid
root 19240 1 5 Sep20 ? 1-23:44:07 /oracle/app/grid/product/11.2.0/bin/osysmond.bin
grid 19285 1 0 Sep20 ? 03:04:18 /oracle/app/grid/product/11.2.0/bin/ocssd.bin
root 22794 1 0 Oct24 ? 00:06:35 /oracle/app/grid/product/11.2.0/bin/ohasd.bin reboot
root 22887 1 0 Oct24 ? 00:05:20 /oracle/app/grid/product/11.2.0/bin/orarootagent.bin
root 22890 1 0 Oct24 ? 00:01:22 /oracle/app/grid/product/11.2.0/bin/cssdagent
root 22892 1 0 Oct24 ? 00:01:22 /oracle/app/grid/product/11.2.0/bin/cssdmonitor
grid 23000 1 0 Oct24 ? 00:01:10 /oracle/app/grid/product/11.2.0/bin/oraagent.bin
grid 23011 1 0 Oct24 ? 00:00:14 /oracle/app/grid/product/11.2.0/bin/mdnsd.bin
grid 23035 1 0 Oct24 ? 00:01:05 /oracle/app/grid/product/11.2.0/bin/gpnpd.bin
grid 23051 1 0 Oct24 ? 00:05:13 /oracle/app/grid/product/11.2.0/bin/gipcd.bin
root 23140 1 0 Oct24 ? 00:04:34 /oracle/app/grid/product/11.2.0/bin/octssd.bin reboot
grid 23195 1 0 Oct24 ? 00:03:16 /oracle/app/grid/product/11.2.0/bin/evmd.bin
---------------------------------------
ip addr
link/ether 40:f2:e9:de:53:ac brd ff:ff:ff:ff:ff:ff
inet 192.168.111.104/24 brd 192.168.113.255 scope global bond1 # Oracle heartbeat (interconnect) IP
inet 169.254.49.109/16 brd 169.254.255.255 scope global bond1:1 # Will this IP affect Oracle RAC?
inet6 fe80::42f2:e9ff:fede:53ac/64 scope link
valid_lft forever preferred_lft forever
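A quick check of the two addresses on bond1 can be sketched with Python's standard `ipaddress` module. The variable names are only illustrative. The 169.254.49.109 address falls in the IPv4 link-local range, which Grid Infrastructure 11.2.0.2 and later normally uses for HAIP (the ora.cluster_interconnect.haip resource in the crsctl output further down), so it is probably expected on this interface. That should still be confirmed on the actual system.

```python
import ipaddress

# Addresses copied from the `ip addr` output above.
heartbeat = ipaddress.ip_interface("192.168.111.104/24")
second = ipaddress.ip_interface("169.254.49.109/16")

# 169.254.0.0/16 is the IPv4 link-local range.
print(second.ip.is_link_local)

# Broadcast address implied by the heartbeat address and its /24 prefix.
print(heartbeat.network.broadcast_address)
```

The second print gives 192.168.111.255, while the `ip addr` output shows `brd 192.168.113.255` for the same address. That mismatch may be worth checking in the bond1 network configuration.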
The database is down now and I cannot bring it back up. Any help would be appreciated.
ASMCMD> lsdg
ASMCMD-8102: no connection to Oracle ASM; command requires Oracle ASM to run
[oracle@bssdb2 ~]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[root@bssdb2 ~]# /oracle/app/grid/product/11.2.0/bin/crsctl stop crs
CRS-2796: The command may not proceed when Cluster Ready Services is not running
CRS-4687: Shutdown command has completed with errors.
CRS-4000: Command Stop failed, or completed with errors.
[root@bssdb2 ~]# /oracle/app/grid/product/11.2.0/bin/crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
[grid@bssdb2 ~]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
1 ONLINE OFFLINE
ora.cluster_interconnect.haip
1 ONLINE ONLINE bssdb2
ora.crf
1 ONLINE ONLINE bssdb2
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE ONLINE bssdb2
ora.cssdmonitor
1 ONLINE ONLINE bssdb2
ora.ctssd
1 ONLINE ONLINE bssdb2 OBSERVER
ora.diskmon
1 OFFLINE OFFLINE
ora.drivers.acfs
1 ONLINE ONLINE bssdb2
ora.evmd
1 ONLINE INTERMEDIATE bssdb2
ora.gipcd
1 ONLINE ONLINE bssdb2
ora.gpnpd
1 ONLINE ONLINE bssdb2
ora.mdnsd
1 ONLINE ONLINE bssdb2
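To spot the problem resources in this listing at a glance, a small parser could be used. This is a sketch that assumes the two-line layout shown above (a resource name line followed by an instance line with TARGET and STATE). `find_unhealthy` is a hypothetical helper, not a crsctl feature.

```python
def find_unhealthy(output):
    """Return (name, target, state) for resources whose TARGET is ONLINE but STATE is not."""
    unhealthy = []
    name = None
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("ora."):
            # Resource name line, e.g. "ora.asm".
            name = fields[0]
        elif name and fields[0].isdigit() and len(fields) >= 3:
            # Instance line: "<n> <TARGET> <STATE> [server] [details]".
            target, state = fields[1], fields[2]
            if target == "ONLINE" and state != "ONLINE":
                unhealthy.append((name, target, state))
    return unhealthy

# A subset of the `crsctl stat res -t -init` output above.
sample = """\
NAME TARGET STATE SERVER STATE_DETAILS
ora.asm
1 ONLINE OFFLINE
ora.crsd
1 ONLINE OFFLINE
ora.cssd
1 ONLINE ONLINE bssdb2
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE INTERMEDIATE bssdb2
"""

for name, target, state in find_unhealthy(sample):
    print(f"{name}: target={target} state={state}")
```

On the output above this flags ora.asm and ora.crsd (both OFFLINE) and ora.evmd (INTERMEDIATE). ora.diskmon is skipped because its target is OFFLINE as well.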
[grid@bssdb2 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3324
Available space (kbytes) : 258796
ID : 1806684292
Device/File Name : +OCR
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user