ORA-01410 INVALID ROWID: a problem with no solution


2014-08-05 09:51:44

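Last month, within a span of two weeks, the same database and the same user hit INVALID ROWID twice.
Rebuilding the index and flushing the buffer cache resolved it.
But I cannot find the cause, and I have no idea how to prevent it.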
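For reference, the two remedies mentioned above could look roughly like the following. This is a minimal sketch, not the exact commands that were run: the index name `ccps1.pk_example` is a hypothetical placeholder, and `REBUILD ONLINE` assumes Enterprise Edition.

```sql
-- On the standby (connected as SYSDBA): discard cached buffers so that
-- blocks are re-read from disk on the next access.
ALTER SYSTEM FLUSH BUFFER_CACHE;

-- On the primary: rebuild the affected index. The rebuild reaches the
-- physical standby through normal redo apply.
-- ccps1.pk_example is a hypothetical name, not the index from the incident.
ALTER INDEX ccps1.pk_example REBUILD ONLINE;
```

Flushing the buffer cache forces physical reads afterwards, so on a busy system it is probably best done in a quiet period.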

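The standby is an Oracle 11g Active Data Guard physical standby, version 11.2.0.1, opened READ ONLY.
Both the primary and the standby run on VMware 5.1 virtual machines.
None of the virtual disk modes (independent, persistent, nonpersistent) is selected, and I do not know what the default is! Could writes be cached?
The physical host is an IBM PC server with a RAID 5 array.
Direct path and asynchronous I/O are not enabled on the database.
The operating system is Red Hat Linux 5.3.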

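ORA-1410 is raised when, during a logical read of a data block, the kcbz_check_objd_typ function detects an OBJD inconsistency, that is, seg.obj and diskobj do not match. From 10g onward, kcbz_check_objd_typ is responsible for verifying whether the objd on the block mismatches, and it raises ORA-1410 if it does.
The main possible causes of an objd mismatch are:
1. Lost write. A lost write leaves the data block not properly formatted for its current object, so the block's checksum is correct but the block is inconsistent with the data dictionary. Lost writes can also result from incomplete copies made by disk or volume group mirroring software.
2. Some DDL operations, such as Exchange Partition, cause block-level inconsistency: the same data block is used by two objects, and either object may overwrite the problem block when it is used. In practice this case may itself be caused by a lost write.
3. The note Summary Of Bugs Containing ORA 1410 (Doc ID 422771.1) describes the main bugs that cause ORA-1410. Among them, BUG 4592596 (Corruption (ORA-1410) from multi-table insert with direct load) and BUG 3868753 (Concurrent export / INSERT of ASSM segment can fail with ORA-1410 / ORA-8103) both involve a direct path / parallel INSERT into a table causing later SELECTs on that table to fail with ORA-1410. This suggests that direct path / parallel insert has a small chance of triggering ORA-1410, whereas a conventional insert does not.
4. The objd mismatch may also exist only in the copy of the block in the Oracle buffer cache, while the block on disk is intact. This is usually caused by a bug in the Oracle buffer layer, and flushing the buffer cache generally resolves it.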
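To tell cause 4 apart from the others, one could compare the object id carried by the cached copy of a block with what the data dictionary says about that block. The queries below are only a sketch: they assume the file and block number of the failing block are already known (for example from a trace file), `:file_no` and `:block_no` are bind placeholders, and the second query handles non-partitioned segments only.

```sql
-- 1. Which data object id does the cached copy of the block carry?
SELECT bh.file#, bh.block#, bh.status, bh.objd AS cached_objd
FROM   v$bh bh
WHERE  bh.file#  = :file_no
AND    bh.block# = :block_no;

-- 2. Which segment owns that block according to the dictionary,
--    and what is its current data object id?
SELECT e.owner, e.segment_name, e.segment_type, o.data_object_id
FROM   dba_extents e
JOIN   dba_objects o
       ON  o.owner       = e.owner
       AND o.object_name = e.segment_name
       AND o.object_type = e.segment_type
WHERE  e.file_id = :file_no
AND    :block_no BETWEEN e.block_id AND e.block_id + e.blocks - 1;
```

If `cached_objd` differs from `data_object_id` and the difference disappears after a buffer cache flush, that would point toward a stale buffer rather than damage on disk.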

Some bugs related to ORA-1410 are listed below:

Bug 5637976
Abstract: ORA-8103/ORA-1410 from concurrent INSERT / export on ASSM tables
This occurs in 10gR2 when there are concurrent inserts and direct path exports. The newly created/updated blocks are not being flushed to disk, so the export is getting a stale version of the block from disk.
Fixed in 10.2.0.4 and 11.1.0.6

Unpublished Bug 4592596
Abstract: Corruption (ORA-1410) from multi-table insert with direct load
This error occurs if a SQL plan is compiled for a parallel run with a Degree of Parallelism (DOP) > 1, but at the time of running, due to lack of resources, it runs serial. Then the problem of invalid rowid will happen.
Fixed in 10.2.0.4 and 11.1.0.6

Bug 5596325
Abstract: Text query gives wrong results or fails with ORA-1410 ORA-29903
If CONTAINS queries return ORA-1410: invalid rowid errors, and there are more than 200,000,000 documents in the index, then you may have encountered this bug.
Fixed in 10.2.0.4 and 11.1.0.6

Unpublished Bug 6444339
Abstract: TRUNCATE/PURGE DOES NOT CLEAN DEPENDENCIES PROPERLY.
DDL statements to an object were not invalidating all dependencies, so a stale rowid could remain in cache and produce an ORA-1410 if used.
Fixed in 11.2 and 10.2.0.5

Bug 8740993
Abstract: ORA-1410 OCCURRED ON ADG STANDBY DATABASE DURING TABLE SCAN.
This bug applies to standby databases and occurs when the standby is re-applying DDL for table drops/truncates/shrinks. The buffer cache is not being updated for the new object numbers.
Fixed in 12.1, 11.2.0.2
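Since each entry above is tied to specific releases, a quick way to see which ones can apply is to check the exact version and the role of the instance where the error appears. A small sketch using standard dynamic views:

```sql
-- Exact release of the instance (compare with the "Fixed in" lines above).
SELECT banner FROM v$version;

-- Role and open mode. On an Active Data Guard standby this is expected to show
-- PHYSICAL STANDBY together with a read-only open mode.
SELECT database_role, open_mode FROM v$database;
```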

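My symptoms match cause 4 and part of Bug 8740993 ("when the standby is re-applying DDL for table drops/truncates/shrinks").
That cause seems unlikely, though. No such operation was performed during those two weeks; before that, I had only dropped a user.

Finally, the alert log does not contain a single ORA- keyword!

Enabling the parameters does not seem to have had any effect either.

Setting the following three parameters on both the primary and the standby did not seem to help: invalid ROWID was reported again, this time as an in-memory mismatch on a table.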
ALTER SYSTEM set DB_BLOCK_CHECKSUM=full scope=both;
ALTER SYSTEM set DB_BLOCK_CHECKING=full scope=both;
ALTER SYSTEM set DB_LOST_WRITE_PROTECT=TYPICAL scope=both;
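
To confirm that the three settings are actually in effect on each side, one could run a check like the one below on both the primary and the standby. This is a sketch, not something from the original post.

```sql
-- Run on the primary and on the standby; all three values should match
-- what was set with ALTER SYSTEM above.
SELECT name, value
FROM   v$parameter
WHERE  name IN ('db_block_checksum',
                'db_block_checking',
                'db_lost_write_protect');
```

As far as I know, with DB_LOST_WRITE_PROTECT enabled on both sides a lost write detected during redo apply on the standby is reported as ORA-00752, so the absence of that error would hint that a lost write is not the cause here.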

13 replies


客家族_Shark曾_小凡仙 2014-08-07

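Quoting wildwave's reply #12:

You scared me; I thought 12c was already in production. You are on 11.2.0.1, and this is fixed in 11.2.0.2.

You know how it is: whatever version a company runs in production, it keeps running. My previous employer was on 10g and the current one is on 11g; once things are stable, nobody upgrades a major version. At most we take downtime to apply a patch. Sorry, I misread the English below: I read 12.1.0.1 as 11.2.0.1.0. Fixed: This issue is fixed in •12.1.0.1 (Base Release) •11.2.0.2 (Server Patch Set)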

小灰狼W 2014-08-06

You scared me; I thought 12c was already in production. You are on 11.2.0.1, and this is fixed in 11.2.0.2.

客家族_Shark曾_小凡仙 2014-08-06

Quoting wildwave's reply #10:

Bug 8740993 ORA-1410 / ORA-8103 on ADG STANDBY during table scan after DROP/TRUNCATE/SHRINK in PRIMARY [...] Fixed: This issue is fixed in •12.1.0.1 (Base Release) •11.2.0.2 (Server Patch Set) •11.1.0.7.8 Database Patch Set Update •11.1.0.7 Patch 40 on Windows Platforms [...]

Thanks, moderator! So going by "Fixed: This issue is fixed in •12.1.0.1 (Base Release)", my version has already been fixed.

客家族_Shark曾_小凡仙 2014-08-05

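Quoting wildwave's reply #1:

I corrected the error code in your title... Did this error occur on the standby, or on the primary? Was it triggered by a query on the standby and then resolved by flushing the standby's buffer cache and rebuilding the index on the primary?

It occurred on the standby. I rebuilt the index on the primary and flushed the buffer cache on the standby.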

小灰狼W 2014-08-05

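I corrected the error code in your title... Did this error occur on the standby, or on the primary? Was it triggered by a query on the standby and then resolved by flushing the standby's buffer cache and rebuilding the index on the primary?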

小灰狼W 2014-08-05

Bug 8740993 ORA-1410 / ORA-8103 on ADG STANDBY during table scan after DROP/TRUNCATE/SHRINK in PRIMARY

This note gives a brief overview of bug 8740993. The content was last updated on: 28-JUN-2013

Affects:
Product (Component): Oracle Server (Rdbms)
Range of versions believed to be affected: Versions >= 11.1 but BELOW 12.1
Versions confirmed as being affected:
•11.2.0.1
•11.1.0.7
Platforms affected: Generic (all / most platforms affected)

Fixed: This issue is fixed in
•12.1.0.1 (Base Release)
•11.2.0.2 (Server Patch Set)
•11.1.0.7.8 Database Patch Set Update
•11.1.0.7 Patch 40 on Windows Platforms

Symptoms:
•Error May Occur
•ORA-1410 / ORA-8103

Related To:
•Active Dataguard (ADG)
•Space Management (Locally Managed Tablespaces)
•Physical Standby Database / Dataguard
•Truncate
•_query_on_physical

Description
ORA-1410 during full table scan on Active Dataguard (ADG) STANDBY database during table scan after DROP/TRUNCATE/SHRINK in PRIMARY database

Here is the scenario:
(1) Standby queries table T1, which brings buffer X into the buffer cache.
(2) Primary drops/truncates/shrinks table T1.
(3) Standby applies the redo to drop/truncates/shrinks T1 as well.
(4) DROP case: Primary creates a new table T2 and inserts into T2, which uses the space that was previously used by T1. TRUNCATE/SHRINK case: Primary inserts more rows into T1, which uses the space that was previously used by T1.
(5) Standby applies the redo in step (4).
(6) A query is run to scan T2/T1, which finds buffer X in the buffer cache (provided that buffer X has not yet aged out). It leads to ORA-1410.
Note that ORA-1410 is intermittent.

Workaround
Flush the buffer cache and retry the query

This may also produce an ORA-8103 error when:
- compatible less than 11.0
- a full table scan query fails with: ORA-08103: object no longer exists
- the ORA-8103 error is 100% repeatable (so long as the table does not change)
- the fix of bug 7650993 is installed but it does not help
- the table is stored in an ASSM tablespace ie the following
- a dump of the segment space metadata shows there is at least one unformatted block under the HiHWM.

Workaround
In version 11.2 set the static hidden parameter "_query_on_physical=FALSE" in the standby init.ora and open the standby read only. The previously failing standby queries should then return the correct results.
Note that setting "_query_on_physical=FALSE" disables the ADG option and startup is not allowed if MRP is running. If startup is attempted while MRP is running, ORA-16669 is produced (instance cannot be opened because the Active Data Guard option is disabled).
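
If someone did decide to try the second workaround from the note, the setting might be made as sketched below. This assumes the standby uses an spfile (the note itself speaks of init.ora, where the equivalent would be a plain `_query_on_physical=FALSE` line). Hidden parameters are normally changed only on the advice of Oracle Support.

```sql
-- On the standby only. The parameter is static, so SCOPE = SPFILE is needed
-- and the value takes effect at the next instance startup.
-- Per the note, this disables the Active Data Guard option.
ALTER SYSTEM SET "_query_on_physical" = FALSE SCOPE = SPFILE;
```

The first workaround (flush the buffer cache and retry) is far less invasive, which is presumably why it is listed first.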

客家族_Shark曾_小凡仙 2014-08-05

Quoting wildwave's reply #8:

If it is this bug, then according to the note's description, once the buffer cache has been flushed the problem should no longer have anything to do with the dropped user. If rebuilding the index helped, then as reply #4 suggested, check whether any related operation was performed on the index.

To clarify: it happened twice, on two different tables belonging to the same user! The first was on July 22, with select count(*) from ccps_xxxx; the problem turned out to be in the primary key index, so I rebuilt that index on the primary. The second was on August 2, in a statement with LEFT JOIN table; that table has no indexes, so I flushed the buffer cache on the standby. Following the bug's hint, I checked whether DROP/TRUNCATE/SHRINK had been run against the table:

OWNER	OBJECT_NAME	OBJECT_TYPE	LAST_DDL_TIME
CCPS1	CCPS_MAXMIND_OUTPUTS	TABLE	2014-07-30 11:00:44
CCPS1	CCPS_CREDITINFO	TABLE	2014-07-30 10:58:01
CCPS1	IX_CI_EMAIL	INDEX	2014-07-30 10:57:59
CCPS1	IX_CI_DATETIME	INDEX	2014-07-30 10:57:59
CCPS1	PK_CCPS_CREDITINFO	INDEX	2014-07-30 10:57:05
CCPS1	CCPS_MER_TEL_VALIDATION_SEQ	SEQUENCE	2014-07-29 17:14:25
CCPS1	IX_TV_MER_NO	INDEX	2014-07-29 17:12:18
CCPS1	IX_TV_GW_NO	INDEX	2014-07-29 17:12:18
CCPS1	PK_CCPS_MER_TEL_VALIDATION	INDEX	2014-07-29 17:12:18
CCPS1	CCPS_MER_TEL_VALIDATION	TABLE	2014-07-29 17:12:18
CCPS1	CCPS_INTERFACE_PARAMNAME_SEQ	SEQUENCE	2014-07-23 15:17:59
CCPS1	PK_CCPS_INTERFACE_PARAMNAME	INDEX	2014-07-23 15:17:58
CCPS1	CCPS_INTERFACE_PARAMNAME	TABLE	2014-07-23 15:17:58
CCPS1	SYS_SMS	TABLE	2014-07-22 21:32:00
CCPS1	SYS_SMS_SEQ	SEQUENCE	2014-07-22 21:31:12
CCPS1	IX_TR_BANKORDERNO	INDEX	2014-07-03 16:56:38
CCPS1	CCPS_TRADERECORD	TABLE	2014-07-03 16:56:38

I do not have a Metalink account, so I cannot read the detailed bug description.

小灰狼W 2014-08-05

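If it is this bug, then according to the note's description, once the buffer cache has been flushed the problem should no longer have anything to do with the dropped user. If rebuilding the index helped, then as reply #4 suggested, check whether any related operation was performed on the index.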

客家族_Shark曾_小凡仙 2014-08-05

OWNER	OBJECT_NAME	OBJECT_TYPE	LAST_DDL_TIME
CCPS1	CCPS_MAXMIND_OUTPUTS	TABLE	2014-07-30 11:00:44
CCPS1	CCPS_CREDITINFO	TABLE	2014-07-30 10:58:01
CCPS1	IX_CI_EMAIL	INDEX	2014-07-30 10:57:59
CCPS1	IX_CI_DATETIME	INDEX	2014-07-30 10:57:59
CCPS1	PK_CCPS_CREDITINFO	INDEX	2014-07-30 10:57:05
CCPS1	CCPS_MER_TEL_VALIDATION_SEQ	SEQUENCE	2014-07-29 17:14:25
CCPS1	IX_TV_MER_NO	INDEX	2014-07-29 17:12:18
CCPS1	IX_TV_GW_NO	INDEX	2014-07-29 17:12:18
CCPS1	PK_CCPS_MER_TEL_VALIDATION	INDEX	2014-07-29 17:12:18
CCPS1	CCPS_MER_TEL_VALIDATION	TABLE	2014-07-29 17:12:18
CCPS1	CCPS_INTERFACE_PARAMNAME_SEQ	SEQUENCE	2014-07-23 15:17:59
CCPS1	PK_CCPS_INTERFACE_PARAMNAME	INDEX	2014-07-23 15:17:58
CCPS1	CCPS_INTERFACE_PARAMNAME	TABLE	2014-07-23 15:17:58
CCPS1	SYS_SMS	TABLE	2014-07-22 21:32:00
CCPS1	SYS_SMS_SEQ	SEQUENCE	2014-07-22 21:31:12
CCPS1	IX_TR_BANKORDERNO	INDEX	2014-07-03 16:56:38
CCPS1	CCPS_TRADERECORD	TABLE	2014-07-03 16:56:38

客家族_Shark曾_小凡仙 2014-08-05

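Quoting hyee's reply #4:

The cases of this error that I have run into were mostly related to index rebuilds. For example, while a SQL statement is still executing, the index it is using finishes a rebuild online, and this error is raised. You could first query on the standby in descending order of last_ddl_time to see whether new DDL caused the error; if so, it can perhaps be handled at the application level.

Thanks for the reminder. The first occurrence was on July 23, after which I rebuilt the index on the primary. It happened again a week later! I flushed the buffer cache that time; I wonder whether the cause can still be found?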

客家族_Shark曾_小凡仙 2014-08-05

Quoting wildwave's reply #3:

This looks very close to Bug 8740993. Here is the scenario: (1) Standby queries table T1, which brings buffer X into the buffer cache. [...] (6) A query is run to scan T2/T1, which finds buffer X in the buffer cache (provided that buffer X has not yet aged out). It leads to ORA-1410. Check whether a drop/truncate/shrink took place. 11.2.0.1 has quite a few bugs; if you can, upgrade to 11.2.0.3.

Yes, very close! This database has been running for a long time. It held 6 schemas, and recently one user with a high transaction volume was migrated away. A few weeks later I ran DROP USER CCPS on that user. The ORA error occurs in CCPS1, while another user, CCPS2, has had no problems. Surely my DROP USER did not cause this?

hyee 2014-08-05

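The cases of this error that I have run into were mostly related to index rebuilds. For example, while a SQL statement is still executing, the index it is using finishes a rebuild online, and this error is raised. You could first query on the standby in descending order of last_ddl_time to see whether new DDL caused the error; if so, it can perhaps be handled at the application level.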
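
The check suggested here could be written roughly as follows. It is a sketch that assumes the affected schema is CCPS1 (the schema named elsewhere in this thread) and that the session can read DBA_OBJECTS.

```sql
-- Most recent DDL first; run on the standby (read-only queries are fine there).
SELECT owner,
       object_name,
       object_type,
       TO_CHAR(last_ddl_time, 'YYYY-MM-DD HH24:MI:SS') AS last_ddl_time
FROM   dba_objects
WHERE  owner = 'CCPS1'
ORDER  BY last_ddl_time DESC;
```

Using TO_CHAR with an explicit format mask keeps the timestamps readable regardless of the client's date settings.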

小灰狼W 2014-08-05

This looks very close to Bug 8740993.

Here is the scenario:
(1) Standby queries table T1, which brings buffer X into the buffer cache.
(2) Primary drops/truncates/shrinks table T1.
(3) Standby applies the redo to drop/truncates/shrinks T1 as well.
(4) DROP case: Primary creates a new table T2 and inserts into T2, which uses the space that was previously used by T1. TRUNCATE/SHRINK case: Primary inserts more rows into T1, which uses the space that was previously used by T1.
(5) Standby applies the redo in step (4).
(6) A query is run to scan T2/T1, which finds buffer X in the buffer cache (provided that buffer X has not yet aged out). It leads to ORA-1410.

Check whether a drop/truncate/shrink took place. 11.2.0.1 has quite a few bugs; if you can, upgrade to 11.2.0.3.