innodb 快照原理解惑。。跪求高手解答·

沉沦 2010-10-25 03:04:25

引言：

InnoDB 的MVCC工作原理：

通过为每一行记录添加两个额外的隐藏的值来实现MVCC，这两个值一个记录这行数据何时被创建，另外一个记录这行数据何时过期（或者被删除）。
但是InnoDB并不存储这些事件发生时的实际时间，相反它只存储这些事件发生时的系统版本号。
这是一个随着事务的创建而不断增长的数字。每个事务在事务开始时会记录它自己的系统版本号。
每个查询必须去检查每行数据的版本号与事务的版本号是否相同。让我们来看看当隔离级别是REPEATABLE READ时这种策略是如何应用到特定的操作的：

SELECT InnoDB必须每行数据来保证它符合两个条件：

1、InnoDB必须找到一个行的版本，它至少要和事务的版本一样老(也即它的版本号不大于事务的版本号)。
这保证了不管是事务开始之前，或者事务创建时，或者修改了这行数据的时候，这行数据是存在的。

2、这行数据的删除版本必须是未定义的或者比事务版本要大。这可以保证在事务开始之前这行数据没有被删除。

符合这两个条件的行可能会被当作查询结果而返回。

INSERT：InnoDB为这个新行记录当前的系统版本号。

DELETE：InnoDB将当前的系统版本号设置为这一行的删除ID。

UPDATE：InnoDB会写一个这行数据的新拷贝，这个拷贝的版本为当前的系统版本号。它同时也会将这个版本号写到旧行的删除版本里。

==============================================================================================================



-- 我对红色字体做个实验：

CREATE TABLE xxx (

  id varchar(20) default NULL

) ENGINE=InnoDB DEFAULT CHARSET=utf8;



-- 表内容

+------+

| id   |

+------+

| 1    |                    -- 这里隐藏了行的事务版本号，这里我例如设置为 10

| 2    |                    -- 这里隐藏了行的事务版本号，这里我例如设置为 11

+------+



-- 数据操作 



 +---------------------------------+-------------------------------------+

 |  session 1                      |   session                           |

 +----------------------------- ---+-------------------------------------+

 |                                 |                                     |

 | start transaction 我假设这个事务|                                     |

 |                   的版本号为 12 |                                     |  

 +---------------------------------+-------------------------------------+

 |                                 |                                     |

 |                                 | start transaction   我假设这个事务为|

 |                                 |                     版本号为 13     |

 +---------------------------------+-------------------------------------+                    +------+

 |                                 |                                     |                    | id   |

 |                                 |    select * from xxx                | session 2的结果==> +------+

 |                                 | -- 查处所有小于等于13版本号的行     |                    |  1   | 

 |                                 |                                     |                    +------+

 |                                 |                                     |                    |  2   |

 |                                 |                                     |                    +------+ 

 |                                 |                                     |                

 +---------------------------------+-------------------------------------+                   

 |                                 |                                     |                    

 |  insert into xxx values('3')    |                                     |                   

 |  -- 输入一个版本号为 12的行     |                                     |

 +---------------------------------+-------------------------------------+

 |                                 |                                     | 

 |    commit;                      |                                     |

 |                                 |                                     |

 +---------------------------------+-------------------------------------+

 |                                 |     select * from xxx               |   

 |                                 |                                     |                     +------+ 

 |                                 | 主要问题在这：                      |                     | id   |

 |                                 | session 1 插入了一个版本号为 12的的 | session 2的结果 ：  +------+

 |                                 | 行,引用引言的红色字体部分可以得到， |                     |  1   | 

 |                                 | session 2 的事务版本号为 13是大于12 |                     +------+

 |                                 | 的，这时，因该可以到的 id = 3的值。 |                     |  2   |

 |                                 |   但为什么却没有得到呢？            |                     +------+ 

 +---------------------------------+-------------------------------------+



                                                ↑

                                                 |

                                                 |

                                             问题在这

...全文

235 19 打赏收藏转发到动态举报

写回复

用AI写文章

19 条回复

切换为时间正序

请发表友善的回复…

发表回复

gao1738 2012-02-06

打赏
举报

http://www.ningoo.net/html/tag/mysql

ACMAIN_CHM 2010-10-26

打赏
举报

下面是手册中的例子, 和你自己测试的例子基本类似相同。注意这句话 If the transaction isolation level is REPEATABLE READ (the default level), all consistent reads within the same transaction read the snapshot established by the first such read in that transaction. You can get a fresher snapshot for your queries by committing the current transaction and after that issuing new queries.

[Quote=MySQL 5.1 Reference Manual]
13.6.8.2. Consistent Nonlocking Reads
A consistent read means that InnoDB uses multi-versioning to present to a query a snapshot of the database at a point in time. The query sees the changes made by transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query sees the changes made by earlier statements within the same transaction. This exception causes the following anomaly: If you update some rows in a table, a SELECT will see the latest version of the updated rows, but it might also see older versions of any rows. If other sessions simultaneously update the same table, the anomaly means that you may see the table in a state that never existed in the database.

If the transaction isolation level is REPEATABLE READ (the default level), all consistent reads within the same transaction read the snapshot established by the first such read in that transaction. You can get a fresher snapshot for your queries by committing the current transaction and after that issuing new queries.

With READ COMMITTED isolation level, each consistent read within a transaction sets and reads its own fresh snapshot.

Consistent read is the default mode in which InnoDB processes SELECT statements in READ COMMITTED and REPEATABLE READ isolation levels. A consistent read does not set any locks on the tables it accesses, and therefore other sessions are free to modify those tables at the same time a consistent read is being performed on the table.

Suppose that you are running in the default REPEATABLE READ isolation level. When you issue a consistent read (that is, an ordinary SELECT statement), InnoDB gives your transaction a timepoint according to which your query sees the database. If another transaction deletes a row and commits after your timepoint was assigned, you do not see the row as having been deleted. Inserts and updates are treated similarly.

You can advance your timepoint by committing your transaction and then doing another SELECT.

This is called multi-versioned concurrency control.

In the following example, session A sees the row inserted by B only when B has committed the insert and A has committed as well, so that the timepoint is advanced past the commit of B.

Session A Session B

SET autocommit=0; SET autocommit=0;
time
| SELECT * FROM t;
| empty set
| INSERT INTO t VALUES (1, 2);
|
v SELECT * FROM t;
empty set
COMMIT;

SELECT * FROM t;
empty set

COMMIT;

SELECT * FROM t;
---------------------
| 1 | 2 |
---------------------
1 row in set

[/Quote]

ACMAIN_CHM 2010-10-26

打赏
举报

[Quote=MySQL 5.1 Reference Manual]13.6.9. InnoDB Multi-Versioning
InnoDB is a multi-versioned storage engine: it keeps information about old versions of changed rows, to support transactional features such as concurrency and rollback. This information is stored in the tablespace in a data structure called a rollback segment (after an analogous data structure in Oracle). InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.
Internal Details of Multi-Versioning
Internally, InnoDB adds three fields to each row stored in the database. A 6-byte DB_TRX_ID field indicates the transaction identifier for the last transaction that inserted or updated the row. Also, a deletion is treated internally as an update where a special bit in the row is set to mark it as deleted. Each row also contains a 7-byte DB_ROLL_PTR field called the roll pointer. The roll pointer points to an undo log record written to the rollback segment. If the row was updated, the undo log record contains the information necessary to rebuild the content of the row before it was updated. A 6-byte DB_ROW_ID field contains a row ID that increases monotonically as new rows are inserted. If InnoDB generates a clustered index automatically, the index contains row ID values. Otherwise, the DB_ROW_ID column does not appear in any index.
Undo logs in the rollback segment are divided into insert and update undo logs. Insert undo logs are needed only in transaction rollback and can be discarded as soon as the transaction commits. Update undo logs are used also in consistent reads, but they can be discarded only after there is no transaction present for which InnoDB has assigned a snapshot that in a consistent read could need the information in the update undo log to build an earlier version of a database row.
Guidelines for Managing Rollback Segments
Commit your transactions regularly, including those transactions that issue only consistent reads. Otherwise, InnoDB cannot discard data from the update undo logs, and the rollback segment may grow too big, filling up your tablespace.
The physical size of an undo log record in the rollback segment is typically smaller than the corresponding inserted or updated row. You can use this information to calculate the space needed for your rollback segment.
In the InnoDB multi-versioning scheme, a row is not physically removed from the database immediately when you delete it with an SQL statement. InnoDB only physically removes the corresponding row and its index records when it discards the update undo log record written for the deletion. This removal operation is called a purge, and it is quite fast, usually taking the same order of time as the SQL statement that did the deletion.
If you insert and delete rows in smallish batches at about the same rate in the table, the purge thread can start to lag behind and the table can grow bigger and bigger because of all the “dead” rows, making everything disk-bound and very slow In such a case, throttle new row operations, and allocate more resources to the purge thread by tuning the innodb_max_purge_lag system variable. See Section 13.6.3, “InnoDB Startup Options and System Variables” for more information.
[/Quote]
首先，不知道你的 “通过为每一行记录添加两个额外的隐藏的值来实现MVCC” 不知道从哪儿来的。从MYSQL的文档说明中可以了解到 InnoDB adds three fields ，它加了三个，分别是DB_TRX_ID、DB_ROLL_PTR、DB_ROW_ID。作用分别为事务标志、回滚指针、记录行标志。

zuoxingyu 2010-10-26

打赏
举报

此贴必火。

zuoxingyu 2010-10-26

打赏
举报

[Quote=引用 15 楼 ymhtt 的回复:]
如果是read-committed 就能看到已经提交的数据了
[/Quote]

这个隔离级别下，不是会出现幻读么？
第一次读得到2条记录，再读一次得到3条记录了。

yuwenbao 2010-10-26

打赏
举报

呵呵，楼主看的是高性能mysql，不过高性能mysql，在介绍innodb时就已经说了，建议去看mysql 的关于innodb的手册，它会让你对自己的理解感到吃惊和意外，因为innodb的实现是比较特别的。

ymhtt 2010-10-26

打赏
举报

如果是read-committed 就能看到已经提交的数据了

ymhtt 2010-10-26

打赏
举报

REPEATABLE READ就是这样的吧

沉沦 2010-10-25

打赏
举报

[Quote=引用 10 楼 zuoxingyu 的回复:]
狼头哥，我硬是没看懂。
可以用汉字大概解释下么？感谢啊。等你几天了来了这么长一段洋文，欲哭无泪。
[/Quote]

就是，就是。狼头大哥，貌似这丢洋文，不是我要的答案。。麻烦狼头大哥，看看我的问题。。。

我的试验，我觉得于你回答的有矛盾··

zuoxingyu 2010-10-25

打赏
举报

狼头哥，我硬是没看懂。
可以用汉字大概解释下么？感谢啊。等你几天了来了这么长一段洋文，欲哭无泪。

ACMAIN_CHM 2010-10-25

打赏
举报

希望下面的图有助于理解。

This graphic shows a SELECT statement at the top with "SCN 10023" written underneath it. Under the statement is a column of blocks, each with a different SCN: 10021, 10021, 10024 (this block is shaded), 10008, 10024 (this block is shaded), 10011, 10021. An arrow labeled "Scan Path" goes from the SELECT statement through the column to the end, but the arrow veers out to the right of the shaded blocks and goes through blocks that hover to the right of the column: one with SCN 10006 and another with SCN 10021. A circle labeled Undo Segment points to these two blocks.

As the database retrieves data blocks on behalf of a query, the database ensures that the data in each block reflects the contents of the block when the query began. The database rolls back changes to the block as needed to reconstruct the block to the point in time the query started processing.

The database uses a mechanism called an SCN to guarantee the order of transactions. As the SELECT statement enters the execution phase, the database determines the SCN recorded at the time the query began executing. In Figure 9-1, this SCN is 10023. Each query in the transaction must return committed data with respect to SCN 10023.

In Figure 9-1, blocks with SCNs after 10023 indicate changed data, as shown by the two blocks with SCN 10024. The SELECT statement requires a version of the block that is consistent with committed changes. The database copies current data blocks to a new buffer and applies undo data to reconstruct previous versions of the blocks. These reconstructed data blocks are called consistent read (CR) clones.

In Figure 9-1, the database creates two CR clones: one block consistent to SCN 10006 and the other block consistent to SCN 10021. The database returns the reconstructed data for the query. In this way, Oracle Database prevents dirty reads.

hxs0117144 2010-10-25

打赏
举报

ruv wv kx

沉沦 2010-10-25

打赏
举报

[Quote=引用 6 楼 zuoxingyu 的回复:]
1、InnoDB必须找到一个行的版本，它至少要和事务的版本一样老

你举的例子的行的版本号，同一个SELECT取到的所有行的版本号应该都是一样的，比如是10，事务版本号也是10.

然后session 1 插入一条新纪录，新纪录的版本号增加为11，但是session 2的版本号还是10，新纪录的版本号比session 2里面的行版本号要新，所以查询不出来。

我觉得是这么理解的。
[/Quote]

我觉得session 2的版本号是12 ，他在session 1后开事务~~

还有哪位高手来解答哈。。谢了。

zuoxingyu 2010-10-25

打赏
举报

1、InnoDB必须找到一个行的版本，它至少要和事务的版本一样老

你举的例子的行的版本号，同一个SELECT取到的所有行的版本号应该都是一样的，比如是10，事务版本号也是10.

然后session 1 插入一条新纪录，新纪录的版本号增加为11，但是session 2的版本号还是10，新纪录的版本号比session 2里面的行版本号要新，所以查询不出来。

我觉得是这么理解的。

zuoxingyu 2010-10-25

打赏
举报



mysql> SELECT @@TX_ISOLATION;

+-----------------+

| @@TX_ISOLATION  |

+-----------------+

| REPEATABLE-READ |

+-----------------+

1 row in set (0.00 sec)



mysql>

MYSQL默认的事务隔离级别是可重复读级别，你的实验很好的证明了这一点。

沉沦 2010-10-25

打赏
举报

看我的实验，证明，行的版本号小于事务的版本号，为什么显示不出来呢···

沉沦 2010-10-25

打赏
举报

[Quote=引用 2 楼 zuoxingyu 的回复:]
session 2的事务没有提交啊。
在整个事务内部，看到的数据都是一致的。不管外面是怎么改变的。
[/Quote]

zuoxingyu 大哥，问题就在这里啊。。。

为什么整个事务的内部是一样的呢？

我想原因在于，他用MVCC快照了一下吧。

这个快照并不是复制数据，是因为MVCC，建立事务版本号吧。

应该行的版本号大于事务的版本号，这个事务将不会显示这个行的结果吧。。

我的理解，与书上说的有矛盾，忘zuoxingyu 大哥指教。。

zuoxingyu 2010-10-25

打赏
举报

session 2的事务没有提交啊。
在整个事务内部，看到的数据都是一致的。不管外面是怎么改变的。

MVCC保存了某一时刻数据的一个快照。意思就是无论事物运行了多久，它们都能看到一致的数据。也就是说在相同的时间下，不同的事物看相同表的数据是不同的。如果你从来没有这方面的经验，可能说这些有点令人困惑。但是在以后这个会很容易理解和熟悉的。