解析mdf文件的问题

null0zero 2010-12-23 02:46:16

小弟想做一个解析mdf文件的工作，主要就是根据sqlserver数据库文件解析出其中的表结构以及各个表之间的关系，不知道能不能做的到，有知道的大哥指点一下吧？谢谢了 :)

...全文

436 15 打赏收藏转发到动态举报

写回复

用AI写文章

15 条回复

切换为时间正序

请发表友善的回复…

发表回复

Sybase数据库恢复 2010-12-26

打赏
举报

如果你说的是一个健康的数据库的话， sql server management studio是万能的。

null0zero 2010-12-23

打赏
举报

[Quote=引用 12 楼 andkylee 的回复:]

你看过sqlite的源代码，应该知道如果两眼摸黑研究sqlserver内部物理存储结构的难度。好在还有sqlserver internals这种内部资料可供参考。
如果没有这种资料的话，一个字“难”。
这么说，有时候一个字节具体代表什么意思，都得摸索好久。这是我在研究sybase数据库物理存储结构的时候的体会。
[/Quote]
恩，是的。sqlite好歹还有源代码可以参考，有书籍可以参考，大体的设计框架都可以知道，这样子看代码也是一点点的啃

看来想直接解析数据库的物理文件存储格式这条路是走不通了。。。

那我要解析出mdf文件的相关格式，我要怎么做呢？貌似可以用sqlserver的数据库管理工具生成相应的sql脚本，这样子的话，数据库的表结构是可以有了。目前正在下载sqlserver数据库，准备动手重新试验一下

那数据库表中的具体对应数据呢？这个需要怎么获得呢？

麻烦你给指点一下吧？谢谢了 :)

Sybase数据库恢复 2010-12-23

打赏
举报

你看过sqlite的源代码，应该知道如果两眼摸黑研究sqlserver内部物理存储结构的难度。好在还有sqlserver internals这种内部资料可供参考。
如果没有这种资料的话，一个字“难”。
这么说，有时候一个字节具体代表什么意思，都得摸索好久。这是我在研究sybase数据库物理存储结构的时候的体会。

Sybase数据库恢复 2010-12-23

打赏
举报

[Quote=引用 10 楼 wpclub 的回复:]

引用 9 楼 dawugui 的回复:

引用 8 楼 wpclub 的回复:
额，谢谢楼上各位了

想做这个最初起因就是一直在看sqlite的代码，sqlite的数据库文件中的相关格式都是给定的，那么直接sqlite的数据库文件对应的位置表示什么意思，就可以知道相关的数据库表结构等等

所以就有了想解析一下mdf文件的目的，只可惜现在水平太次了，没法去做，呵呵
没用sql……
[/Quote]

不可能有这种工具的。
要是sqlserver的mdf和oracle的dbf互相转化，那么哥们你发财了，是两个工具一个比sqlserver-internals更牛，一个比DUL更牛。

null0zero 2010-12-23

打赏
举报

[Quote=引用 9 楼 dawugui 的回复:]

引用 8 楼 wpclub 的回复:
额，谢谢楼上各位了

想做这个最初起因就是一直在看sqlite的代码，sqlite的数据库文件中的相关格式都是给定的，那么直接sqlite的数据库文件对应的位置表示什么意思，就可以知道相关的数据库表结构等等

所以就有了想解析一下mdf文件的目的，只可惜现在水平太次了，没法去做，呵呵
没用sqlite,帮顶.
[/Quote]

对了，那我想问一下，我记得以前有工具，可以对几种不同数据库的格式文件进行转换，比如将sqlserver数据库的mdf文件转换为oracle的数据库文件格式等等，这种软件要做的话，具体原理是怎么实现的呢？
ls的再给指点一下吧？

dawugui 2010-12-23

打赏
举报

[Quote=引用 8 楼 wpclub 的回复:]
额，谢谢楼上各位了

想做这个最初起因就是一直在看sqlite的代码，sqlite的数据库文件中的相关格式都是给定的，那么直接sqlite的数据库文件对应的位置表示什么意思，就可以知道相关的数据库表结构等等

所以就有了想解析一下mdf文件的目的，只可惜现在水平太次了，没法去做，呵呵
[/Quote]没用sqlite,帮顶.

null0zero 2010-12-23

打赏
举报

额，谢谢楼上各位了

想做这个最初起因就是一直在看sqlite的代码，sqlite的数据库文件中的相关格式都是给定的，那么直接sqlite的数据库文件对应的位置表示什么意思，就可以知道相关的数据库表结构等等

所以就有了想解析一下mdf文件的目的，只可惜现在水平太次了，没法去做，呵呵

dawugui 2010-12-23

打赏
举报

这是以前某人写的:

SQLserver数据文件(MDF)的页面文件头结构剖析

先执行一下以下SQL语句，我的测试环境为SQL2005

SQL code dbcc traceon(3604)
go
dbcc page(master,1,0,2)

可以看到MDF文件的一些物理结构信息，其中包括重要的头96个字节。也就是第一个页面的文件头。
........
PAGE HEADER:

Page @0x03FA0000
m_pageId = (1:0) m_headerVersion = 1 m_type = 15
m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x8
m_objId (AllocUnitId.idObj) = 99 m_indexId (AllocUnitId.idInd) = 0 Metadata: AllocUnitId = 6488064
Metadata: PartitionId = 0 Metadata: IndexId = 0 Metadata: ObjectId = 99
m_prevPage = (0:0) m_nextPage = (0:0) pminlen = 0
m_slotCnt = 1 m_freeCnt = 7937 m_freeData = 3059
m_reservedCnt = 0 m_lsn = (149:448:1) m_xactReserved = 0
m_xdesId = (0:0) m_ghostRecCnt = 0 m_tornBits = -1073741694
........
DATA:

Memory Dump @0x62FEC000
62FEC000: 010f0000 08000000 00000000 00000000 ?................
62FEC010: 00000000 00000100 63000000 011ff30b ?........c.......
62FEC020: 00000000 01000000 95000000 c0010000 ?................
62FEC030: 01000000 00000000 00000000 820000c0 ?................
62FEC040: 00000000 00000000 00000000 00000000 ?................
62FEC050: 00000000 00000000 00000000 00000000 ?................

以上蓝色的文字就是文件头的一些信息。如果这些信息损坏将会造成严重的后果。

经过简单的逐个字节分析，中间借助了windows计算器和c#的BitConverter.GetBytes函数。
得出了如下文件结构图，其中每行4个字节，一共分析了文件头的前64个字节。

(0133--o_sqlHeader.jpg)

在数据库的头96个字节中第0x40开始直道0x5F应该都是0。
我发现只有测试页的m_pageId 的冒号前面的数字不为1时才在0x40到0x5f写入数据。但是具体代表什么还没有看出来。

姑且认为数据库第一个页面的0x00-0x3f就如上图所示，0x40-0x5f都为0（不正确的话请纠正一下）

这张图有什么用呢，如果你理解了上述参数的意义，用二进制编辑器打开一个头文件损坏的mdf文件就有可能恢复这个已经损坏的数据库。

偶不是dba也不是专业恢复数据的，只是个普通的开发人员，怎么恢复还请有经验人士补充一下。

有情提醒，这些东西非常危险，请不要随意测试，最好找一个没用的数据库来研究。

这里补充一下参数的意义

HTML codem_pageId
This identifies the file number the page is part of and the position within the file. (1:143) means page 143 in file 1.
m_headerVersion
This is the page header version. Since version 7.0 this value has always been 1.
m_typea
This is the page type. The values you're likely to see are:
1 - data page. This holds data records in a heap or clustered index leaf-level.
2 - index page. This holds index records in the upper levels of a clustered index and all levels of non-clustered indexes.
3 - text mix page. A text page that holds small chunks of LOB values plus internal parts of text tree. These can be shared between LOB values in the same partition of an index or heap.
4 - text tree page. A text page that holds large chunks of LOB values from a single column value.
7 - sort page. A page that stores intermediate results during a sort operation.
8 - GAM page. Holds global allocation information about extents in a GAM interval (every data file is split into 4GB chunks - the number of extents that can be represented in a bitmap on a single database page). Basically whether an extent is allocated or not. GAM = Global Allocation Map. The first one is page 2 in each file. More on these in a later post.
9 - SGAM page. Holds global allocation information about extents in a GAM interval. Basically whether an extent is available for allocating mixed-pages. SGAM = Shared GAM. the first one is page 3 in each file. More on these in a later post.
10 - IAM page. Holds allocation information about which extents within a GAM interval are allocated to an index or allocation unit, in SQL Server 2000 and 2005 respectively. IAM = Index Allocation Map. More on these in a later post.
11 - PFS page. Holds allocation and free space information about pages within a PFS interval (every data file is also split into approx 64MB chunks - the number of pages that can be represented in a byte-map on a single database page. PFS = Page Free Space. The first one is page 1 in each file. More on these in a later post.
13 - boot page. Holds information about the database. There's only one of these in the database. It's page 9 in file 1.
15 - file header page. Holds information about the file. There's one per file and it's page 0 in the file.
16 - diff map page. Holds information about which extents in a GAM interval have changed since the last full or differential backup. The first one is page 6 in each file.
17 - ML map page. Holds information about which extents in a GAM interval have changed while in bulk-logged mode since the last backup. This is what allows you to switch to bulk-logged mode for bulk-loads and index rebuilds without worrying about breaking a backup chain. The first one is page 7 in each file.
m_typeFlagBits
This is mostly unused. For data and index pages it will always be 4. For all other pages it will always be 0 - except PFS pages. If a PFS page has m_typeFlagBits of 1, that means that at least one of the pages in the PFS interval mapped by the PFS page has at least one ghost record.
m_level
This is the level that the page is part of in the b-tree.
Levels are numbered from 0 at the leaf-level and increase to the single-page root level (i.e. the top of the b-tree).
In SQL Server 2000, the leaf level of a clustered index (with data pages) was level 0, and the next level up (with index pages) was also level 0. The level then increased to the root. So to determine whether a page was truly at the leaf level in SQL Server 2000, you need to look at the m_type as well as the m_level.
For all page types apart from index pages, the level is always 0.
m_flagBits
This stores a number of different flags that describe the page. For example, 0x200 means that the page has a page checksum on it (as our example page does) and 0x100 means the page has torn-page protection on it.
Some bits are no longer used in SQL Server 2005.
m_objId
m_indexId
In SQL Server 2000, these identified the actual relational object and index IDs to which the page is allocated. In SQL Server 2005 this is no longer the case. The allocation metadata totally changed so these instead identify what's called the allocation unit that the page belongs to (I'll do another post that describes these later today).
m_prevPage
m_nextPage
These are pointers to the previous and next pages at this level of the b-tree and store 6-byte page IDs.
The pages in each level of an index are joined in a doubly-linked list according to the logical order (as defined by the index keys) of the index. The pointers do not necessarily point to the immediately adjacent physical pages in the file (because of fragmentation).
The pages on the left-hand side of a b-tree level will have the m_prevPage pointer be NULL, and those on the right-hand side will have the m_nextPage be NULL.
In a heap, or if an index only has a single page, these pointers will both be NULL for all pages.
pminlen
This is the size of the fixed-length portion of the records on the page.
m_slotCnt
This is the count of records on the page.
m_freeCnt
This is the number of bytes of free space in the page.
m_freeData
This is the offset from the start of the page to the first byte after the end of the last record on the page. It doesn't matter if there is free space nearer to the start of the page.
m_reservedCnt
This is the number of bytes of free space that has been reserved by active transactions that freed up space on the page. It prevents the free space from being used up and allows the transactions to roll-back correctly. There's a very complicated algorithm for changing this value.
m_lsn
This is the Log Sequence Number of the last log record that changed the page.
m_xactReserved
This is the amount that was last added to the m_reservedCnt field.
m_xdesId
This is the internal ID of the most recent transaction that added to the m_reservedCnt field.
m_ghostRecCnt
The is the count of ghost records on the page.
m_tornBits
This holds either the page checksum or the bits that were displaced by the torn-page protection bits - depending on what form of page protection is turnde on for the database.

王向飞 2010-12-23

打赏
举报

除非你进入企业管理器用SQL查，至今无人破解，或许你能

dawugui 2010-12-23

打赏
举报

如果仅仅用mdf文件来做,是做不到的.

billpu 2010-12-23

打赏
举报

单mdf...几乎是不可能的

飘零一叶 2010-12-23

打赏
举报

这个真不知道，问问微软吧，他们肯定行

dawugui 2010-12-23

打赏
举报

[Quote=引用楼主 wpclub 的回复:]
小弟想做一个解析mdf文件的工作，主要就是根据sqlserver数据库文件解析出其中的表结构以及各个表之间的关系，不知道能不能做的到，有知道的大哥指点一下吧？谢谢了 :)
[/Quote]
各个表之间的关系用powerbuilder这个软件去看.

dawugui 2010-12-23

打赏
举报

[Quote=引用楼主 wpclub 的回复:]
小弟想做一个解析mdf文件的工作，主要就是根据sqlserver数据库文件解析出其中的表结构以及各个表之间的关系，不知道能不能做的到，有知道的大哥指点一下吧？谢谢了 :)
[/Quote]各个表结构如下:
--查询出各（某）表字段的属性



--sql server 2000

SELECT 

    表名       = case when a.colorder=1 then d.name else '' end,

    表说明     = case when a.colorder=1 then isnull(f.value,'') else '' end,

    字段序号   = a.colorder,

    字段名     = a.name,

    标识       = case when COLUMNPROPERTY( a.id,a.name,'IsIdentity')=1 then '√'else '' end,

    主键       = case when exists(SELECT 1 FROM sysobjects where xtype='PK' and parent_obj=a.id and name in (

                     SELECT name FROM sysindexes WHERE indid in(

                        SELECT indid FROM sysindexkeys WHERE id = a.id AND colid=a.colid))) then '√' else '' end,

    类型       = b.name,

    占用字节数 = a.length,

    长度       = COLUMNPROPERTY(a.id,a.name,'PRECISION'),

    小数位数   = isnull(COLUMNPROPERTY(a.id,a.name,'Scale'),0),

    允许空     = case when a.isnullable=1 then '√'else '' end,

    默认值     = isnull(e.text,''),

    字段说明   = isnull(g.[value],'')

FROM 

    syscolumns a

left join 

    systypes b 

on 

    a.xusertype=b.xusertype

inner join 

    sysobjects d 

on 

    a.id=d.id  and d.xtype='U' and  d.name<>'dtproperties'

left join 

    syscomments e 

on 

    a.cdefault=e.id

left join 

    sysproperties g 

on 

    a.id=g.id and a.colid=g.smallid  

left join 

    sysproperties f 

on 

    d.id=f.id and f.smallid=0

where 

    d.name='要查询的表'    --如果只查询指定表,加上此条件

order by 

    a.id,a.colorder



/*

表名    表说明 字段序号 字段名    标识 主键    类型    占用字节数 长度 小数位数 允许空  默认值      字段说明 

------- ----- -------  -------- ---- ------- ------ ------- --------------- ------ ---------- ---------- 

authors       1        au_id          √     id      11     11      0                          

              2        au_lname              varchar 40     40      0                          

              3        au_fname              varchar 20     20      0                          

              4        phone                 char    12     12      0              ('UNKNOWN') 

              5        address               varchar 40     40      0       √                 

              6        city                  varchar 20     20      0       √                 

              7        state                 char    2      2       0       √                 

              8        zip                   char    5      5       0       √                 

              9        contract              bit     1      1       0                          

（所影响的行数为 9 行）

*/



--sql server 2005

-- 1. 表结构信息查询 

-- ========================================================================

-- 表结构信息查询

-- 邹建 2005.08(引用请保留此信息)

-- ========================================================================

SELECT 

    TableName=CASE WHEN C.column_id=1 THEN O.name ELSE N'' END,

    TableDesc=ISNULL(CASE WHEN C.column_id=1 THEN PTB.[value] END,N''),

    Column_id=C.column_id,

    ColumnName=C.name,

    PrimaryKey=ISNULL(IDX.PrimaryKey,N''),

    [IDENTITY]=CASE WHEN C.is_identity=1 THEN N'√'ELSE N'' END,

    Computed=CASE WHEN C.is_computed=1 THEN N'√'ELSE N'' END,

    Type=T.name,

    Length=C.max_length,

    Precision=C.precision,

    Scale=C.scale,

    NullAble=CASE WHEN C.is_nullable=1 THEN N'√'ELSE N'' END,

    [Default]=ISNULL(D.definition,N''),

    ColumnDesc=ISNULL(PFD.[value],N''),

    IndexName=ISNULL(IDX.IndexName,N''),

    IndexSort=ISNULL(IDX.Sort,N''),

    Create_Date=O.Create_Date,

    Modify_Date=O.Modify_date

FROM sys.columns C

    INNER JOIN sys.objects O

        ON C.[object_id]=O.[object_id]

            AND O.type='U'

            AND O.is_ms_shipped=0

    INNER JOIN sys.types T

        ON C.user_type_id=T.user_type_id

    LEFT JOIN sys.default_constraints D

        ON C.[object_id]=D.parent_object_id

            AND C.column_id=D.parent_column_id

            AND C.default_object_id=D.[object_id]

    LEFT JOIN sys.extended_properties PFD

        ON PFD.class=1 

            AND C.[object_id]=PFD.major_id 

            AND C.column_id=PFD.minor_id

--             AND PFD.name='Caption'  -- 字段说明对应的描述名称(一个字段可以添加多个不同name的描述)

    LEFT JOIN sys.extended_properties PTB

        ON PTB.class=1 

            AND PTB.minor_id=0 

            AND C.[object_id]=PTB.major_id

--             AND PFD.name='Caption'  -- 表说明对应的描述名称(一个表可以添加多个不同name的描述) 

    LEFT JOIN                       -- 索引及主键信息

    (

        SELECT 

            IDXC.[object_id],

            IDXC.column_id,

            Sort=CASE INDEXKEY_PROPERTY(IDXC.[object_id],IDXC.index_id,IDXC.index_column_id,'IsDescending')

                WHEN 1 THEN 'DESC' WHEN 0 THEN 'ASC' ELSE '' END,

            PrimaryKey=CASE WHEN IDX.is_primary_key=1 THEN N'√'ELSE N'' END,

            IndexName=IDX.Name

        FROM sys.indexes IDX

        INNER JOIN sys.index_columns IDXC

            ON IDX.[object_id]=IDXC.[object_id]

                AND IDX.index_id=IDXC.index_id

        LEFT JOIN sys.key_constraints KC

            ON IDX.[object_id]=KC.[parent_object_id]

                AND IDX.index_id=KC.unique_index_id

        INNER JOIN  -- 对于一个列包含多个索引的情况,只显示第1个索引信息

        (

            SELECT [object_id], Column_id, index_id=MIN(index_id)

            FROM sys.index_columns

            GROUP BY [object_id], Column_id

        ) IDXCUQ

            ON IDXC.[object_id]=IDXCUQ.[object_id]

                AND IDXC.Column_id=IDXCUQ.Column_id

                AND IDXC.index_id=IDXCUQ.index_id

    ) IDX

        ON C.[object_id]=IDX.[object_id]

            AND C.column_id=IDX.column_id 

-- WHERE O.name=N'要查询的表'       -- 如果只查询指定表,加上此条件

ORDER BY O.name,C.column_id 



-- 2. 索引及主键信息 

-- ========================================================================

-- 索引及主键信息

-- 邹建 2005.08(引用请保留此信息)

-- ========================================================================

SELECT 

    TableId=O.[object_id],

    TableName=O.Name,

    IndexId=ISNULL(KC.[object_id],IDX.index_id),

    IndexName=IDX.Name,

    IndexType=ISNULL(KC.type_desc,'Index'),

    Index_Column_id=IDXC.index_column_id,

    ColumnID=C.Column_id,

    ColumnName=C.Name,

    Sort=CASE INDEXKEY_PROPERTY(IDXC.[object_id],IDXC.index_id,IDXC.index_column_id,'IsDescending')

        WHEN 1 THEN 'DESC' WHEN 0 THEN 'ASC' ELSE '' END,

    PrimaryKey=CASE WHEN IDX.is_primary_key=1 THEN N'√'ELSE N'' END,

    [UQIQUE]=CASE WHEN IDX.is_unique=1 THEN N'√'ELSE N'' END,

    Ignore_dup_key=CASE WHEN IDX.ignore_dup_key=1 THEN N'√'ELSE N'' END,

    Disabled=CASE WHEN IDX.is_disabled=1 THEN N'√'ELSE N'' END,

    Fill_factor=IDX.fill_factor,

    Padded=CASE WHEN IDX.is_padded=1 THEN N'√'ELSE N'' END

FROM sys.indexes IDX

    INNER JOIN sys.index_columns IDXC

        ON IDX.[object_id]=IDXC.[object_id]

            AND IDX.index_id=IDXC.index_id

    LEFT JOIN sys.key_constraints KC

        ON IDX.[object_id]=KC.[parent_object_id]

            AND IDX.index_id=KC.unique_index_id

    INNER JOIN sys.objects O

        ON O.[object_id]=IDX.[object_id]

    INNER JOIN sys.columns C

        ON O.[object_id]=C.[object_id]

            AND O.type='U'

            AND O.is_ms_shipped=0

            AND IDXC.Column_id=C.Column_id

--    INNER JOIN  -- 对于一个列包含多个索引的情况,只显示第1个索引信息

--    (

--        SELECT [object_id], Column_id, index_id=MIN(index_id)

--        FROM sys.index_columns

--        GROUP BY [object_id], Column_id

--    ) IDXCUQ

--        ON IDXC.[object_id]=IDXCUQ.[object_id]

--            AND IDXC.Column_id=IDXCUQ.Column_id