What does 'Table does not indexed' mean?
天剑68
2000-02-28 10:52:00
In my Delphi program I reference an ODBC data source configured for MS Visual FoxPro, with the type set to 'free table directory'. I place a Table component in the program, point it at one of the tables, and open the table at run time. After the program has run a number of times, it reports 'Table does not indexed'. When I go back to edit the source and simply toggle the Table's Active property to True, the same error appears. What is going on?
4 replies
天剑68
2000-03-08
I've found the problem: it was the ODBC version!
kxy
2000-02-29
The table has no index.
天剑68
2000-02-29
The table does have indexes. And the data source is not configured through the BDE.
Tine
2000-02-29
No primary key has been created. Check whether your BDE configuration is correct.
Microsoft Internal Training Material - SQL Performance Tuning (5)
Module 6: Troubleshooting Query Performance

Contents
- Overview
- Lesson 1: Index Concepts
- Lesson 2: Concepts – Statistics
- Lesson 3: Concepts – Query Optimization
- Lesson 4: Information Collection and Analysis
- Lesson 5: Formulating and Implementing Resolution

Overview
At the end of this module, you will be able to:
- Describe the different types of indexes and how indexes can be used to improve performance.
- Describe what statistics are used for and how they can help in optimizing query performance.
- Describe how queries are optimized.
- Analyze the information collected from various tools.
- Formulate resolutions to query performance problems.

Lesson 1: Index Concepts
Indexes are the most useful tool for improving query performance. Without a useful index, Microsoft® SQL Server™ must search every row on every page in the table to find the rows to return. With a multi-table query, SQL Server must sometimes search a table multiple times, so each page is scanned much more than once. Having useful indexes speeds up finding individual rows in a table, as well as finding the matching rows needed to join two tables.

What You Will Learn
After completing this lesson, you will be able to:
- Understand the structure of SQL Server indexes.
- Describe how SQL Server uses indexes to find rows.
- Describe how fillfactor can impact the performance of data retrieval and insertion.
- Describe the different types of fragmentation that can occur within an index.

Recommended Reading
- Chapter 8: "Indexes", Inside SQL Server 2000 by Kalen Delaney
- Chapter 11: "Batches, Stored Procedures and Functions", Inside SQL Server 2000 by Kalen Delaney

Finding Rows without Indexes
With No Indexes, A Table Must Be Scanned
SQL Server keeps track of which pages belong to a table or index by using IAM pages. If there is no clustered index, there is a sysindexes row for the table with an indid value of 0, and that row will keep track of the address of the first IAM for the table. The IAM is a giant bitmap, and every 1 bit indicates that the corresponding extent belongs to the table. The IAM allows SQL Server to do efficient prefetching of the table's extents, but every row still must be examined.
General Index Structure
All SQL Server Indexes Are Organized As B-Trees
Indexes in SQL Server store their information using standard B-trees. A B-tree provides fast access to data by searching on a key value of the index. B-trees cluster records with similar keys. The B stands for balanced, and balancing the tree is a core feature of a B-tree's usefulness. The trees are managed, and branches are grafted as necessary, so that navigating down the tree to find a value and locate a specific record takes only a few page accesses. Because the trees are balanced, finding any record requires about the same amount of resources, and retrieval speed is consistent because the index has the same depth throughout.

Clustered and Nonclustered Indexes
Both Index Types Have Many Common Features
An index consists of a tree with a root from which the navigation begins, possible intermediate index levels, and bottom-level leaf pages. You use the index to find the correct leaf page. The number of levels in an index will vary depending on the number of rows in the table and the size of the key column or columns for the index. If you create an index using a large key, fewer entries will fit on a page, so more pages (and possibly more levels) will be needed for the index. On a qualified select, update, or delete, the correct leaf page will be the lowest page of the tree in which one or more rows with the specified key or keys reside. A qualified operation is one that affects only specific rows that satisfy the conditions of a WHERE clause, as opposed to accessing the whole table.

An index can have multiple node levels
An index page above the leaf is called a node page. Each index row in node pages contains an index key (or set of keys for a composite index) and a pointer to a page at the next level for which the first key value is the same as the key value in the current index row.

Leaf level contains all key values
In any index, whether clustered or nonclustered, the leaf level contains every key value, in key sequence. In SQL Server 2000, the sequence can be either ascending or descending.

The sysindexes table contains all sizing, location, and distribution information
Any information about the size of indexes or tables is stored in sysindexes. The only source of any storage location information is the sysindexes table, which keeps track of the address of the root page for every index, and the first IAM page for the index or table. There is also a column for the first page of the table, but this is not guaranteed to be reliable. SQL Server can find all pages belonging to an index or table by examining the IAM pages. Sysindexes contains a pointer to the first IAM page, and each IAM page contains a pointer to the next one.
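As a quick illustration (my sketch, not part of the original material), you can inspect this metadata directly; the sysindexes column names below are from SQL Server 2000, and Orders is simply a sample table from Northwind:

    USE Northwind
    -- indid 0 = heap, indid 1 = clustered index, indid > 1 = nonclustered.
    -- root holds the root page address, FirstIAM the first IAM page address,
    -- and dpages the leaf/data page count that the optimizer reads.
    SELECT name, indid, root, FirstIAM, dpages
    FROM sysindexes
    WHERE id = OBJECT_ID('Orders')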
The Difference between Clustered and Nonclustered Indexes
The main difference between the two types of indexes is how much information is stored at the leaf. The leaf levels of both types of indexes contain all the key values in order, but they also contain other information.

Clustered Indexes
The Leaf Level of a Clustered Index Is the Data
The leaf level of a clustered index contains the data pages, not just the index keys. Another way to say this is that the data itself is part of the clustered index. A clustered index keeps the data in a table ordered around the key. The data pages in the table are kept in a doubly linked list called the page chain. The order of pages in the page chain, and the order of rows on the data pages, is the order of the index key or keys. Deciding which key to cluster on is an important performance consideration. When the index is traversed to the leaf level, the data itself has been retrieved, not simply pointed to.

Uniqueness Is Maintained In Key Values
In SQL Server 2000, all clustered indexes are unique. If you build a clustered index without specifying the unique keyword, SQL Server forces uniqueness by adding a uniqueifier to the rows when necessary. This uniqueifier is a 4-byte value added as an additional sort key to only the rows that have duplicates of their primary sort key. You can see this extra value if you use DBCC PAGE to look at the actual index rows; see the section on index internals.
Finding Rows in a Clustered Index
The Leaf Level of a Clustered Index Contains the Data
A clustered index is like a telephone directory in which all of the rows for customers with the same last name are clustered together in the same part of the book. Just as the organization of a telephone directory makes it easy for a person to search, SQL Server quickly searches a table with a clustered index. Because a clustered index determines the sequence in which rows are stored in a table, there can be only one clustered index for a table at a time.

Performance Considerations
Keeping your clustered key value small increases the number of index rows that can be placed on an index page and decreases the number of levels that must be traversed. This minimizes I/O. As we'll see, the clustered key is duplicated in every nonclustered index row, so keeping your clustered key small will allow you to have more index rows fit per page in all your indexes.

Note: The query corresponding to the slide is:

    SELECT lastname, firstname FROM member WHERE lastname = 'Ota'
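To make the small-key point concrete (a minimal sketch I am adding; the member table comes from the slide queries, and the index name is invented), a narrow clustered key could be declared like this:

    -- A narrow clustered key: every nonclustered index row will carry
    -- this value as its bookmark, so keeping it small keeps them all small.
    CREATE UNIQUE CLUSTERED INDEX member_ident ON member (member_no)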
Nonclustered Indexes
The Leaf Level of a Nonclustered Index Contains a Bookmark
A nonclustered index is like the index of a textbook. The data is stored in one place and the index is stored in another. Pointers indicate the storage location of the indexed items in the underlying table. In a nonclustered index, the leaf level contains each index key, plus a bookmark that tells SQL Server where to find the data row corresponding to the key in the index. A bookmark can take one of two forms:
- If the table has a clustered index, the bookmark is the clustered index key for the corresponding data row. This clustered key can be multiple columns if the clustered index is composite, or if it is defined to be non-unique.
- If the table is a heap (in other words, it has no clustered index), the bookmark is a RID, which is an actual row locator in the form File#:Page#:Slot#.

Finding Rows with a NC Index on a Heap
Nonclustered Indexes Are Very Efficient When Searching For A Single Row
After the nonclustered key at the leaf level of the index is found, only one more page access is needed to find the data row. Searching for a single row using a nonclustered index is almost as efficient as searching for a single row in a clustered index. However, if we are searching for multiple rows, such as duplicate values, or keys in a range, anything more than a small number of rows will make the nonclustered index search very inefficient.

Note: The query corresponding to the slide is:

    SELECT lastname, firstname FROM member WHERE lastname BETWEEN 'Master' AND 'Rudd'

Finding Rows with a NC Index on a Clustered Table
A Clustered Key Is Used as the Bookmark for All Nonclustered Indexes
If the table has a clustered index, all columns of the clustered key will be duplicated in the nonclustered index leaf rows, unless there is overlap between the clustered and nonclustered key. For example, if the clustered index is on (lastname, firstname) and a nonclustered index is on firstname, the firstname value will not be duplicated in the nonclustered index leaf rows.

Note: The query corresponding to the slide is:

    SELECT lastname, firstname, phone FROM member WHERE firstname = 'Mike'
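A minimal sketch of the index layout this example assumes (the CREATE INDEX statements are mine, not the deck's; table and column names are taken from the slide queries):

    -- Clustered key (lastname, firstname): these columns become the
    -- bookmark carried in every nonclustered leaf row.
    CREATE CLUSTERED INDEX member_name ON member (lastname, firstname)

    -- Nonclustered index on firstname: because firstname overlaps the
    -- clustered key, only lastname has to be added to each leaf row.
    CREATE NONCLUSTERED INDEX member_first ON member (firstname)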
Covering Indexes
A Covering Index Provides the Fastest Data Access
A covering index contains ALL the fields accessed in the query. Normally, only the columns in the WHERE clause are helpful in determining useful indexes, but for a covering index, all columns must be included. If all columns needed for the query are in the index, SQL Server never needs to access the data pages. If even one column in the query is not part of the index, the data rows must be accessed.

The leaf level of an index is the only level that contains every key value, or set of key values. For a clustered index, the leaf level is the data itself, so in reality, a clustered index ALWAYS covers any query. Nevertheless, for most of our optimization discussions, we only consider nonclustered indexes. Scanning the leaf level of a nonclustered index is almost always faster than scanning a clustered index, so covering indexes are particularly valuable when we need ALL the key values of a particular nonclustered index.

Example: Select an aggregate value of a column with a clustered index. Suppose we have a nonclustered index on price; this query is covered:

    SELECT avg(price) FROM titles

Since the clustered key is included in every nonclustered index row, the clustered key can be included in the covering. Suppose you have a nonclustered index on price and a clustered index on title_id; then this query is covered:

    SELECT title_id, price FROM titles WHERE price BETWEEN 10 AND 20

Performance Considerations
In general, you do want to keep your indexes narrow. However, if you have a critical query that just is not giving you satisfactory performance no matter what you do, you should consider creating an index to cover it, or adding one or two extra columns to an existing index, so that the query will be covered. The leaf level of a nonclustered index is like a 'mini' clustered index, so you can have most of the benefits of clustering, even if there already is another clustered index on the table. The tradeoffs to adding more, wider indexes for covering queries are the added disk space, and more overhead for updating those columns that are now part of the index.

Bug
In general, SQL Server will detect when a query is covered, and detect the possible covering indexes. However, in some cases, you must force SQL Server to use a covering index by including a WHERE clause, even if the WHERE clause will return ALL the rows in the table.
This is SHILOH bug #352079. Steps to reproduce:

1. Make a copy of the Orders table from Northwind:

    USE Northwind
    CREATE TABLE [NewOrders] (
        [OrderID] [int] NOT NULL ,
        [CustomerID] [nchar] (5) NULL ,
        [EmployeeID] [int] NULL ,
        [OrderDate] [datetime] NULL ,
        [RequiredDate] [datetime] NULL ,
        [ShippedDate] [datetime] NULL ,
        [ShipVia] [int] NULL ,
        [Freight] [money] NULL ,
        [ShipName] [nvarchar] (40) NULL ,
        [ShipAddress] [nvarchar] (60) NULL ,
        [ShipCity] [nvarchar] (15) NULL ,
        [ShipRegion] [nvarchar] (15) NULL ,
        [ShipPostalCode] [nvarchar] (10) NULL ,
        [ShipCountry] [nvarchar] (15) NULL
    )
    INSERT into NewOrders SELECT * FROM Orders

2. Build a nonclustered index on OrderDate:

    create index dateindex on neworders(orderdate)

3. Test the query by looking at the query plan:

    select orderdate from NewOrders

The index is being scanned, as expected.

4. Build an index on OrderID:

    create index orderid_index on neworders(orderID)

5. Test the query again by looking at the query plan:

    select orderdate from NewOrders

Now the TABLE is being scanned, instead of the original index!
Index Intersection
Multiple Indexes Can Be Used On A Single Table
In versions prior to SQL Server 7, only one index could be used for any table to process any single query. The only exception was a query involving an OR. In current SQL Server versions, multiple nonclustered indexes can each be accessed, retrieving a set of keys with bookmarks, and then the result sets can be joined on the common bookmarks. The optimizer weighs the cost of performing the unindexed join on the intermediate result sets against the cost of only using one index and then scanning the entire result set from that single index.

Fillfactor and Performance
Creating an Index with a Low Fillfactor Delays Page Splits when Inserting
DBCC SHOWCONTIG will show you a low value for "Avg. Page Density" when a low fillfactor has been specified. This is good for inserts and updates, because it will delay the need to split pages to make room for new rows. It can be bad for scans, because fewer rows will be on each page, and more pages must be read to access the same amount of data. However, this cost will be minimal if the scan density value is good.
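For illustration (my sketch; the index and table names reuse the member example), the fillfactor is specified when the index is created or rebuilt:

    -- Leave each leaf page roughly 70 percent full, so inserts have
    -- room to land before a page split becomes necessary.
    CREATE NONCLUSTERED INDEX member_last ON member (lastname)
    WITH FILLFACTOR = 70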
Index Reorganization
DBCC SHOWCONTIG Provides Lots of Information
Here's some sample output from running a basic DBCC SHOWCONTIG on the Order Details table in the Northwind database:

    DBCC SHOWCONTIG scanning 'Order Details' table...
    Table: 'Order Details' (325576198); index ID: 1, database ID: 6
    TABLE level scan performed.
    - Pages Scanned................................: 9
    - Extents Scanned..............................: 6
    - Extent Switches..............................: 5
    - Avg. Pages per Extent........................: 1.5
    - Scan Density [Best Count:Actual Count].......: 33.33% [2:6]
    - Logical Scan Fragmentation ..................: 0.00%
    - Extent Scan Fragmentation ...................: 16.67%
    - Avg. Bytes Free per Page.....................: 673.2
    - Avg. Page Density (full).....................: 91.68%

By default, DBCC SHOWCONTIG scans the page chain at the leaf level of the specified index and keeps track of the following values:
- Average number of bytes free on each page (Avg. Bytes Free per Page)
- Number of pages accessed (Pages Scanned)
- Number of extents accessed (Extents Scanned)
- Number of times a page had a lower page number than the previous page in the scan (this value for out-of-order pages is not displayed, but is used for additional computations)
- Number of times a page in the scan was on a different extent than the previous page in the scan (Extent Switches)

SQL Server also keeps track of all the extents that have been accessed, and then it determines how many gaps are in the used extents. An extent is identified by the page number of its first page. So, if extents 8, 16, 24, 32, and 40 make up an index, there are no gaps. If the extents are 8, 16, 24, and 40, there is one gap. The value in DBCC SHOWCONTIG's output called Extent Scan Fragmentation is computed by dividing the number of gaps by the number of extents, so in this example the Extent Scan Fragmentation is ¼, or 25 percent. A table using extents 8, 24, 40, and 56 has three gaps, and its Extent Scan Fragmentation is ¾, or 75 percent. The maximum number of gaps is the number of extents minus 1, so Extent Scan Fragmentation can never be 100 percent.

The value in DBCC SHOWCONTIG's output called Logical Scan Fragmentation is computed by dividing the number of out-of-order pages by the number of pages in the table. This value is meaningless in a heap. You can use either the Extent Scan Fragmentation value or the Logical Scan Fragmentation value to determine the general level of fragmentation in a table. The lower the value, the less fragmentation there is. Alternatively, you can use the value called Scan Density, which is computed by dividing the optimum number of extent switches by the actual number of extent switches. A high value means that there is little fragmentation. Scan Density is not valid if the table spans multiple files; therefore, it is less useful than the other values.

SQL Server 2000 allows online defragmentation
You can choose from several methods for removing fragmentation from an index.
You could rebuild the index and have SQL Server allocate all new contiguous pages for you. To rebuild the index, you can use a simple DROP INDEX and CREATE INDEX combination, but in many cases using these commands is less than optimal. In particular, if the index is supporting a constraint, you cannot use the DROP INDEX command. Alternatively, you can use DBCC DBREINDEX, which can rebuild all the indexes on a table in one operation, or you can use the drop_existing clause along with CREATE INDEX. The drawback of these methods is that the table is unavailable while SQL Server is rebuilding the index. When you are rebuilding only nonclustered indexes, SQL Server takes a shared lock on the table, which means that users cannot make modifications, but other processes can SELECT from the table. Of course, those SELECT queries cannot take advantage of the index you are rebuilding, so they might not perform as well as they would otherwise. If you are rebuilding a clustered index, SQL Server takes an exclusive lock and does not allow access to the table, so your data is temporarily unavailable.
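A sketch of the two rebuild options just mentioned (my example; the table and index names are the NewOrders ones from the bug repro above):

    -- Rebuild every index on the table in one operation;
    -- '' means all indexes, 90 is the new fillfactor.
    DBCC DBREINDEX ('Northwind.dbo.NewOrders', '', 90)

    -- Or rebuild a single index in place without dropping it first.
    CREATE INDEX dateindex ON neworders (orderdate)
    WITH DROP_EXISTING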
SQL Server 2000 lets you defragment an index without completely rebuilding it. DBCC INDEXDEFRAG reorders the leaf-level pages into physical order as well as logical order, but using only the pages that are already allocated to the leaf level. This command does an in-place ordering, which is similar to a sorting technique called bubble sort (you might be familiar with this technique if you've studied and compared various sorting algorithms). In-place ordering can reduce logical fragmentation to 2 percent or less, making an ordered scan through the leaf level much faster.

DBCC INDEXDEFRAG also compacts the pages of an index, based on the original fillfactor. The pages will not always end up with the original fillfactor, but SQL Server uses that value as a goal. The defragmentation process attempts to leave at least enough space for one average-size row on each page. In addition, if SQL Server cannot obtain a lock on a page during the compaction phase of DBCC INDEXDEFRAG, it skips the page and does not return to it. Any empty pages created as a result of compaction are removed.

The algorithm SQL Server 2000 uses for DBCC INDEXDEFRAG finds the next physical page in a file belonging to the index's leaf level and the next logical page in the leaf level to swap it with. To find the next physical page, the algorithm scans the IAM pages belonging to that index. In a database spanning multiple files, in which a table or index has pages on more than one file, SQL Server handles pages on different files separately. SQL Server finds the next logical page by scanning the index's leaf level. After each page move, SQL Server drops all locks and saves the last key on the last page it moved. The next iteration of the algorithm uses the last key to find the next logical page. This process lets other users update the table and index while DBCC INDEXDEFRAG is running.

Let us look at an example in which an index's leaf level consists of the following pages in the following logical order:

    47 22 83 32 12 90 64

The first key is on page 47, and the last key is on page 64. SQL Server would have to scan the pages in this order to retrieve the data in sorted order. As its first step, DBCC INDEXDEFRAG would find the first physical page, 12, and the first logical page, 47. It would then swap the pages, using a temporary buffer as a holding area. After the first swap, the leaf level would look like this:

    12 22 83 32 47 90 64

The next physical page is 22, which is also the next logical page, so no work would be necessary. DBCC INDEXDEFRAG would then swap the next physical page, 32, with the next logical page, 83:

    12 22 32 83 47 90 64

After the next swap of 47 with 83, the leaf level would look like this:

    12 22 32 47 83 90 64

Then, the defragmentation process would swap 64 with 83:

    12 22 32 47 64 90 83

and 83 with 90:

    12 22 32 47 64 83 90

At the end of the DBCC INDEXDEFRAG operation, the pages in the table or index are not contiguous, but their logical order matches their physical order. Now, if the pages were accessed from disk in sorted order, the head would need to move in only one direction. Keep in mind that DBCC INDEXDEFRAG uses only pages that are already part of the index's leaf level; it allocates no new pages. In addition, defragmenting a large table can take quite a while, and you will get a report every 5 minutes about the estimated percentage completed. However, except for the locks on the pages being switched, this command needs no additional locks. All the table's other pages and indexes are fully available for your applications to use during the defragmentation process.
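A minimal invocation sketch (mine; the database, table, and index names are assumed from the earlier examples):

    -- Defragment the leaf level of one index online;
    -- the arguments are database, table, and index.
    DBCC INDEXDEFRAG (Northwind, NewOrders, dateindex)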
If you must completely rebuild an index because you want a new fillfactor, or if simple defragmentation is not enough because you want to remove all fragmentation from your indexes, another SQL Server 2000 improvement makes index rebuilding less of an imposition on the rest of the system. SQL Server 2000 lets you create an index in parallel, that is, using multiple processors, which drastically reduces the time necessary to perform the rebuild. The algorithm SQL Server 2000 uses allows near-linear scaling with the number of processors you use for the rebuild, so four processors will take only one-fourth the time that one processor requires to rebuild an index. System availability increases because the length of time that a table is unavailable decreases. Note that only SQL Server 2000 Enterprise Edition supports parallel index creation.
Indexes on Views and Computed Columns
Building an Index Gives the Data Physical Existence
Normally, views are only logical, and the rows comprising the view's data are not generated until the view is accessed. The values for computed columns are typically not stored anywhere in the database; only the definition for the computation is stored, and the computation is redone every time a computed column is accessed. The first index on a view must be a clustered index, so that the leaf level can hold all the actual rows that make up the view. Once that clustered index has been built, and the view's data is now physical, additional (nonclustered) indexes can be built. An index on a computed column can be nonclustered, because all we need to store is the index key values.

Common Prerequisites for Indexed Views and Indexes on Computed Columns
In order for SQL Server to create and use these special indexes, you must have the seven SET options correctly specified:
- ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER, ANSI_NULLS, ANSI_PADDING, and ANSI_WARNINGS must all be ON.
- NUMERIC_ROUNDABORT must be OFF.

Only deterministic expressions can be used in the definition of indexed views or indexes on computed columns. See the BOL for the list of deterministic functions and expressions. Property functions are available to check if a column or view meets the requirements and is indexable:

    SELECT OBJECTPROPERTY (object_id, 'IsIndexable')
    SELECT COLUMNPROPERTY (object_id, column_name, 'IsIndexable')

Schema Binding Guarantees That Object Definition Won't Change
A view can only be indexed if it has been built with schema binding.
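As a sketch of what those prerequisites look like in a session (my example; these are simply the SET options listed above, issued before creating or using such an index):

    -- These settings must be in effect when the view and its index are
    -- created, and when the indexed view is used by queries.
    SET ARITHABORT ON
    SET CONCAT_NULL_YIELDS_NULL ON
    SET QUOTED_IDENTIFIER ON
    SET ANSI_NULLS ON
    SET ANSI_PADDING ON
    SET ANSI_WARNINGS ON
    SET NUMERIC_ROUNDABORT OFF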
The SQL Server Optimizer Determines If the Indexed View Can Be Used
The query must request a subset of the data contained in the view. The ability of the optimizer to use the indexed view even if the view is not directly referenced is available only in SQL Server 2000 Enterprise Edition. In Standard Edition, you can create indexed views, and you can select directly from them, but the optimizer will not choose to use them if they are not directly referenced.

Examples of Indexed Views
The best candidates for improvement by indexed views are queries performing aggregations and joins. We will explain how useful indexed views may be created for these two major groups of queries. The considerations are also valid for queries and indexed views using both joins and aggregations.
    -- Example:
    USE Northwind
    -- Identify 5 products with overall biggest discount total.
    -- This may be expressed for example by two different queries:

    -- Q1.
    select TOP 5 ProductID,
           SUM(UnitPrice*Quantity) - SUM(UnitPrice*Quantity*(1.00-Discount)) Rebate
    from [order details]
    group by ProductID
    order by Rebate desc

    -- Q2.
    select TOP 5 ProductID,
           SUM(UnitPrice*Quantity*Discount) Rebate
    from [order details]
    group by ProductID
    order by Rebate desc

The following indexed view will be used to execute Q1:

    create view Vdiscount1 with schemabinding
    as
    select SUM(UnitPrice*Quantity) SumPrice,
           SUM(UnitPrice*Quantity*(1.00-Discount)) SumDiscountPrice,
           COUNT_BIG(*) Count,
           ProductID
    from dbo.[order details]
    group by ProductID

    create unique clustered index VDiscountInd on Vdiscount1 (ProductID)

However, it will not be used by Q2, because the indexed view does not contain the SUM(UnitPrice*Quantity*Discount) aggregate. We can construct another indexed view:

    create view Vdiscount2 with schemabinding
    as
    select SUM(UnitPrice*Quantity) SumPrice,
           SUM(UnitPrice*Quantity*(1.00-Discount)) SumDiscountPrice,
           SUM(UnitPrice*Quantity*Discount) SumDiscountPrice2,
           COUNT_BIG(*) Count,
           ProductID
    from dbo.[order details]
    group by ProductID

    create unique clustered index VDiscountInd on Vdiscount2 (ProductID)

This view may be used by both Q1 and Q2. Observe that the indexed view Vdiscount2 will have the same number of rows and only one more column compared to Vdiscount1, and it may be used by more queries. In general, try to design indexed views that may be used by more queries.

The following query asks for the orders with the largest total discount:

    -- Q3.
    select TOP 3 OrderID,
           SUM(UnitPrice*Quantity*Discount) OrderRebate
    from dbo.[order details]
    group by OrderID

Q3 can use neither of the Vdiscount views, because the column OrderID is not included in the view definition. To address this variation of the discount analysis query, we may create a different indexed view, similar to the query itself. An attempt to generalize the previous indexed view Vdiscount2 so that all three queries Q1, Q2, and Q3 can take advantage of a single indexed view would require a view with both OrderID and ProductID as grouping columns. Because the OrderID, ProductID combination is unique in the original order details table, the resulting view would have as many rows as the original table, and we would see no savings in using such a view compared to using the original table.
Consider the size of the resulting indexed view. In the case of pure aggregation, the indexed view may provide no significant performance gains if its size is close to the size of the original table.

Complex aggregates (STDEV, VARIANCE, AVG) cannot participate in the indexed view definition. However, SQL Server may use an indexed view to execute a query containing an AVG aggregate. A query containing STDEV or VARIANCE cannot use an indexed view to pre-compute these values.

The next example shows a query producing the average price for a particular product:

    -- Q4.
    select ProductName, od.ProductID,
           AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
           SUM(od.Quantity) Units
    from [order details] od, Products p
    where od.ProductID = p.ProductID
    group by ProductName, od.ProductID

This is an example of an indexed view that will be considered by SQL Server to answer Q4:

    create view v3 with schemabinding
    as
    select od.ProductID,
           SUM(od.UnitPrice*(1.00-Discount)) Price,
           COUNT_BIG(*) Count,
           SUM(od.Quantity) Units
    from dbo.[order details] od
    group by od.ProductID
    go
    create UNIQUE CLUSTERED index iv3 on v3 (ProductID)
    go

Observe that the view definition does not contain the table Products. The indexed view does not need to contain all tables used in the query that uses the indexed view.

In addition, the following query (the same as Q4 above, only with one additional search condition) will use the same indexed view. Observe that the added predicate references only columns from tables not present in the v3 view definition.

    -- Q5.
    select ProductName, od.ProductID,
           AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
           SUM(od.Quantity) Units
    from [order details] od, Products p
    where od.ProductID = p.ProductID
      and p.ProductName like '%tofu%'
    group by ProductName, od.ProductID

The following query cannot use the indexed view, because the added search condition od.UnitPrice > 10 contains a column from a table in the view definition, and that column is neither a grouping column nor does the predicate appear in the view definition.

    -- Q6.
    select ProductName, od.ProductID,
           AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
           SUM(od.Quantity) Units
    from [order details] od, Products p
    where od.ProductID = p.ProductID
      and od.UnitPrice > 10
    group by ProductName, od.ProductID

To contrast the Q6 case, the following query will use the indexed view v3, since the added predicate is on the grouping column of the view v3.

    -- Q7.
    select ProductName, od.ProductID,
           AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
           SUM(od.Quantity) Units
    from [order details] od, Products p
    where od.ProductID = p.ProductID
      and od.ProductID in (1, 2, 13, 41)
    group by ProductName, od.ProductID

The previous query Q6 will use the following indexed view V4:

    create view V4 with schemabinding
    as
    select ProductName, od.ProductID,
           SUM(od.UnitPrice*(1.00-Discount)) AvgPrice,
           SUM(od.Quantity) Units,
           COUNT_BIG(*) Count
    from dbo.[order details] od, dbo.Products p
    where od.ProductID = p.ProductID
      and od.UnitPrice > 10
    group by ProductName, od.ProductID

    create unique clustered index VDiscountInd on V4 (ProductName, ProductID)

The same index on the view V4 will also be used for a query where a join to the table Orders is added, for example:

    -- Q8.
    select ProductName, od.ProductID,
           AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
           SUM(od.Quantity) Units
    from dbo.[order details] od, dbo.Products p, dbo.Orders o
    where od.ProductID = p.ProductID
      and o.OrderID = od.OrderID
      and od.UnitPrice > 10
    group by ProductName, od.ProductID

We will show several modifications of the query Q8 and explain why such modifications cannot use the above view V4.

    -- Q8a.
    select ProductName, od.ProductID,
           AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
           SUM(od.Quantity) Units
    from dbo.[order details] od, dbo.Products p, dbo.Orders o
    where od.ProductID = p.ProductID
      and o.OrderID = od.OrderID
      and od.UnitPrice > 25
    group by ProductName, od.ProductID

Q8a cannot use the indexed view because of the WHERE clause mismatch. Observe that the table Orders does not participate in the indexed view V4 definition. In spite of that, adding a predicate on this table will disallow using the indexed view, because the added predicate may eliminate additional rows participating in the aggregates, as is shown in Q8b:

    -- Q8b.
    select ProductName, od.ProductID,
           AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
           SUM(od.Quantity) Units
    from dbo.[order details] od, dbo.Products p, dbo.Orders o
    where od.ProductID = p.ProductID
      and o.OrderID = od.OrderID
      and od.UnitPrice > 10
      and o.OrderDate > '01/01/1998'
    group by ProductName, od.ProductID
Locking and Indexes
In General, You Should Let SQL Server Control the Locking within Indexes
The stored procedure sp_indexoption lets you manually control the unit of locking within an index. It also lets you disallow page locks or row locks within an index. Since these options are available only for indexes, there is no way to control the locking within the data pages of a heap. (But remember that if a table has a clustered index, the data pages are part of the index and are affected by the sp_indexoption setting.) The index options are set for each table or index individually. Two options, AllowRowLocks and AllowPageLocks, are both set to TRUE initially for every table and index. If both of these options are set to FALSE for a table, only full table locks are allowed.

As described in Module 4, SQL Server determines at runtime whether to initially lock rows, pages, or the entire table. The locking of rows (or keys) is heavily favored. The type of locking chosen is based on the number of rows and pages to be scanned, the number of rows on a page, the isolation level in effect, the update activity going on, the number of users on the system needing memory for their own purposes, and so on. SAP databases frequently use sp_indexoption to reduce deadlocks.

Setting vs. Querying
In SQL Server 2000, the procedure sp_indexoption should only be used for setting an index option. To query an option, use the INDEXPROPERTY function.
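A short sketch of the set-then-query pattern just described (my example; the member table and member_first index are assumed names from Lesson 1):

    -- Disallow row locks on one index; SQL Server will then use
    -- page or table locks for it.
    EXEC sp_indexoption 'member.member_first', 'AllowRowLocks', FALSE

    -- Query the resulting state with INDEXPROPERTY, not sp_indexoption.
    SELECT INDEXPROPERTY(OBJECT_ID('member'), 'member_first',
                         'IsRowLockDisallowed')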
Lesson 2: Concepts – Statistics
Statistics are the most important tool that the SQL Server query optimizer has to determine the ideal execution plan for a query. Statistics that are out of date or nonexistent seriously jeopardize query performance. SQL Server 2000 computes and stores statistics in a completely different format than all earlier versions of SQL Server. One of the improvements is an increased ability to determine which values are out of the normal range in terms of the number of occurrences. The new statistics maintenance routines are particularly good at determining when a key value has a very unusual skew of data.

What You Will Learn
After completing this lesson, you will be able to:
- Define terms related to statistics collected by SQL Server.
- Describe how statistics are maintained by SQL Server.
- Discuss the autostats feature of SQL Server.
- Describe how statistics are used in query optimization.

Recommended Reading
- Statistics Used by the Query Optimizer in Microsoft SQL Server 2000: http://msdn.microsoft.com/library/techart/statquery.htm

Definitions
Cardinality
The cardinality means how many unique values exist in the data.

Density
For each index and set of column statistics, SQL Server keeps track of details about the uniqueness (or density) of the data values encountered, which provides a measure of how selective the index is. A unique index, of course, has the lowest density; by definition, each index entry can point to only one row. A unique index has a density value of 1/number of rows in the table. Density values range from 0 through 1. Highly selective indexes have density values of 0.10 or lower. For example, a unique index on a table with 8345 rows has a density of 0.00012 (1/8345). If a nonunique nonclustered index has a density of 0.2165 on the same table, each index key can be expected to point to about 1807 rows (0.2165 × 8345). This is probably not selective enough to be more efficient than just scanning the table, so this index is probably not useful. Because driving the query from a nonclustered index means that the pages must be retrieved in index order, an estimated 1807 data page accesses (or logical reads) are needed if there is no clustered index on the table and the leaf level of the index contains the actual RID of the desired data row. The only time a data page doesn't need to be reaccessed is when the occasional coincidence occurs in which two adjacent index entries happen to point to the same data page.

In general, you can think of density as the average number of duplicates. We can also talk about the term 'join density', which applies to the average number of duplicates in the foreign key column. This would answer the question: in this one-to-many relationship, how many is 'many'?

Selectivity
In general, selectivity applies to a particular data value referenced in a WHERE clause. High selectivity means that only a small percentage of the rows satisfy the WHERE clause filter, and low selectivity means that many rows will satisfy the filter. For example, in an employees table, the column employee_id is probably very selective, and the column gender is probably not very selective at all.
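As a rough way to eyeball density yourself (my sketch, not from the original material; member and lastname are assumed names from earlier examples):

    -- Approximate density of a column: 1 / (number of distinct values).
    -- Lower is more selective.
    SELECT 1.0 / COUNT(DISTINCT lastname) AS approx_density
    FROM member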
Statistics
Statistics are a histogram consisting of an even sampling of values for a column or for an index key (or the first column of the key for a composite index) based on the current data. The histogram is stored in the statblob field of the sysindexes table, which is of type image. (Remember that image data is actually stored in structures separate from the data row itself. The data row merely contains a pointer to the image data. For simplicity's sake, we'll talk about the index statistics as being stored in the image field called statblob.) To fully estimate the usefulness of an index, the optimizer also needs to know the number of pages in the table or index; this information is stored in the dpages column of sysindexes.

During the second phase of query optimization, index selection, the query optimizer determines whether an index exists for a column in your WHERE clause, assesses the index's usefulness by determining the selectivity of the clause (that is, how many rows will be returned), and estimates the cost of finding the qualifying rows. Statistics for a single-column index consist of one histogram and one density value. The multicolumn statistics for one set of columns in a composite index consist of one histogram for the first column in the index and density values for each prefix combination of columns (including the first column alone). The fact that density information is kept for all columns helps the optimizer decide how useful the index is for joins. Suppose, for example, that an index is composed of three key fields. The density on the first column might be 0.50, which is not too useful. However, as you look at more key columns in the index, the number of rows pointed to is fewer than (or in the worst case, the same as) for the first column, so the density value goes down. If you are looking at both the first and second columns, the density might be 0.25, which is somewhat better. Moreover, if you examine three columns, the density might be 0.03, which is highly selective. It does not make sense to refer to the density of only the second column. The lead column density is always needed.

Statistics Maintenance
Statistics Information Tracks the Distribution of Key Values
SQL Server statistics is basically a histogram that contains up to 200 values of a given key column. In addition to the histogram, the statblob field contains the following information:
- The time of the last statistics collection
- The number of rows used to produce the histogram and density information
- The average key length
- Densities for other combinations of columns

In the statblob column, up to 200 sample values are stored; the range of key values between each sample value is called a step. The sample value is the endpoint of the range. Three values are stored along with each step: a value called EQ_ROWS, which is the number of rows that have a value equal to that sample value; a value called RANGE_ROWS, which specifies how many other values are inside the range (between two adjacent sample values); and the number of distinct values, or RANGE_DENSITY, of the range.

DBCC SHOW_STATISTICS
The DBCC SHOW_STATISTICS output shows us the first two of these three values, but not the range density. The RANGE_DENSITY is instead used to compute two additional values:
- DISTINCT_RANGE_ROWS: the number of distinct rows inside this range (not counting the RANGE_HI_KEY value itself), computed as 1/RANGE_DENSITY.
- AVG_RANGE_ROWS: the average number of rows per distinct value, computed as RANGE_DENSITY * RANGE_ROWS.
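A minimal invocation sketch (mine; the table and statistics names are assumed from the member examples):

    -- Shows the density information plus the histogram steps:
    -- RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS, AVG_RANGE_ROWS.
    DBCC SHOW_STATISTICS ('member', 'member_first')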
In addition to statistics on indexes, SQL Server can also keep track of statistics on columns with no indexes. Knowing the density, or the likelihood of a particular value occurring, can help the optimizer determine an optimum processing strategy, even if SQL Server can't use an index to actually locate the values.

Statistics on Columns
Column statistics can be useful for two main purposes:
- When the SQL Server optimizer is determining the optimal join order, it frequently is best to have the smaller input processed first. By 'input' we mean the table after all filters in the WHERE clause have been applied. Even if there is no useful index on a column in the WHERE clause, statistics could tell us that only a few rows will qualify, and that the resulting input will be very small.
- The SQL Server query optimizer can use column statistics on non-initial columns in a composite nonclustered index to determine if scanning the leaf level to obtain the bookmarks will be an efficient processing strategy. For example, in the member table in the credit database, the firstname column is almost unique. Suppose we have a nonclustered index on (lastname, firstname), and we issue this query:

    select * from member where firstname = 'MPRO'

In this case, statistics on the firstname column would indicate very few rows satisfying this condition, so the optimizer will choose to scan the nonclustered index, since it is smaller than the clustered index (the table). The small number of bookmarks will then be followed to retrieve the actual data.
Manually Updating Statistics
You can also manually force statistics to be updated in one of two ways. You can run the UPDATE STATISTICS command on a table or on one specific index or column statistics, or you can execute the procedure sp_updatestats, which runs UPDATE STATISTICS against all user-defined tables in the current database.

You can create statistics on unindexed columns using the CREATE STATISTICS command or by executing sp_createstats, which creates single-column statistics for all eligible columns for all user tables in the current database. This includes all columns except computed columns and columns of the ntext, text, or image datatypes, and columns that already have statistics or are the first column of an index.
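A short sketch of these commands (my example; the member table and city column are assumed names):

    -- Refresh all statistics on one table, sampling every row.
    UPDATE STATISTICS member WITH FULLSCAN

    -- Refresh statistics for every user table in the current database.
    EXEC sp_updatestats

    -- Create column statistics on an unindexed column.
    CREATE STATISTICS city_stats ON member (city)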
Autostats
By Default, SQL Server Will Update Statistics on Any Index or Column as Needed
Every database is created with the database options auto create statistics and auto update statistics set to true, but you can turn either one off. You can also turn off automatic updating of statistics for a specific table in one of two ways:
- UPDATE STATISTICS: In addition to updating the statistics, the option WITH NORECOMPUTE indicates that the statistics should not be automatically recomputed in the future. Running UPDATE STATISTICS again without the WITH NORECOMPUTE option enables automatic updates.
- sp_autostats: This procedure sets or unsets a flag for a table to indicate that statistics should or should not be updated automatically. You can also use this procedure with only the table name to find out whether the table is set to automatically have its index statistics updated.
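For illustration (my sketch; member is an assumed table name):

    -- Update statistics once and suppress future automatic updates.
    UPDATE STATISTICS member WITH NORECOMPUTE

    -- Turn automatic updates back on for the table...
    EXEC sp_autostats 'member', 'ON'

    -- ...or, called with only the table name, report the current setting.
    EXEC sp_autostats 'member'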
Note, however, that setting the database option auto update statistics to FALSE overrides any individual table settings. In other words, no automatic updating of statistics takes place. This is not a recommended practice unless thorough testing has shown you that you do not need the automatic updates or that the performance overhead is more than you can afford.

Trace Flags
- Trace flag 205 reports recompiles due to autostats.
- Trace flag 8721 writes information to the errorlog when autostats has been run.

For more information, see the following Knowledge Base article: Q195565 "INF: How SQL Server 7.0 Autostats Work."

Statistics and Performance
The Performance Penalty of NOT Having Up-To-Date Statistics Far Outweighs the Benefit of Avoiding Automatic Updating
Autostats should be turned off only after thorough testing shows it to be necessary. Because autostats only forces a recompile after a certain number or percentage of rows has been changed, you do not have to make any adjustments for a read-only database.
Lesson 3: Concepts – Query Optimization
What You Will Learn
After completing this lesson, you will be able to:
- Describe the phases of query optimization.
- Discuss how SQL Server estimates the selectivity of indexes and columns, and how this estimate is used in query optimization.

Recommended Reading
- Chapter 15: "The Query Processor", Inside SQL Server 2000 by Kalen Delaney
- Chapter 16: "Query Tuning", Inside SQL Server 2000 by Kalen Delaney
- Whitepaper about SQL Server Query Processor Architecture by Hal Berenson and Kalen Delaney: http://msdn.microsoft.com/library/backgrnd/html/sqlquerproc.htm

Phases of Query Optimization
Query Optimization Involves Several Phases

Trivial Plan Optimization
Optimization itself goes through several steps. The first step is something called trivial plan optimization. The whole idea of trivial plan optimization is that cost-based optimization is a bit expensive to run. The optimizer can try a great many possible variations trying to find the cheapest plan. If SQL Server knows that there is only one really viable plan for a query, it can avoid a lot of work. A prime example is a query that consists of an INSERT with a VALUES clause. There is only one possible plan. Another example is a SELECT where all the columns are in a unique covering index, and that index is the only one that is usable. There is no other index that has that set of columns in it. These two examples are cases where SQL Server should just generate the plan and not try to find something better. The trivial plan optimizer finds the really obvious plans, which are typically very inexpensive. In fact, all the plans that get through the autoparameterization template result in plans that the trivial plan optimizer can find. Between those two mechanisms, the plans that are simple tend to be weeded out earlier in the process and do not pay a lot of the compilation cost. This is a good thing, because the number of potential plans in 7.0 went up astronomically as SQL Server added hash joins, merge joins, and index intersections to its list of processing techniques.

Simplification and Statistics Loading
If a plan is not found by the trivial plan optimizer, SQL Server can perform some simplifications, usually thought of as syntactic transformations of the query itself, looking for commutative properties and operations that can be rearranged. SQL Server can do constant folding, and other operations that do not require looking at the cost or analyzing what indexes are available, but that can result in a more efficient query. SQL Server then loads up the metadata, including the statistics information on the indexes, and then the optimizer goes through a series of phases of cost-based optimization.

Cost-Based Optimization Phases
The cost-based optimizer is designed as a set of transformation rules that try various permutations of indexes and join strategies. Because of the number of potential plans in SQL Server 7.0 and SQL Server 2000, if the optimizer just ran through all the combinations and produced a plan, the optimization process would take a very long time to run. Therefore, optimization is broken up into phases. Each phase is a set of rules. After each phase is run, the cost of any resulting plan is examined, and if SQL Server determines that the plan is cheap enough, that plan is kept and executed. If the plan is not cheap enough, the optimizer runs the next phase, which is another set of rules. In the vast majority of cases, a good plan will be found in the preliminary phases. Typically, if the plan that a query would have had in SQL Server 6.5 is also the optimal plan in SQL Server 7.0 and SQL Server 2000, the plan will tend to be found either by the trivial plan optimizer or by the first phase of the cost-based optimizer. The rules were intentionally organized to try to make that be true. The plan will probably consist of using a single index and using nested loops. However, every once in a while, because of lack of statistical information, or some other nuance, the optimizer will have to proceed with the later phases of optimization. Sometimes this is because there is a real possibility that the optimizer could find a better plan. When a plan is found, it becomes the optimizer's output, and then SQL Server goes through all the caching mechanisms that we have already discussed in Module 5.

Full Optimization
At some point, the optimizer determines that it has gone through enough preliminary phases, and it reverts to a phase called full optimization. If the optimizer goes through all the preliminary phases and still has not found a cheap plan, it examines the cost for the plan that it has so far. If the cost is above the threshold, the optimizer goes into the full optimization phase. This threshold is configurable, as the configuration option 'cost threshold for parallelism'. The full optimization phase assumes that this plan should be run in parallel. If the machine is very busy, the plan will end up running in serial, but the optimizer has a goal to produce a good parallel plan. If the cost is below the threshold (or on a single-processor machine), the full optimization phase just uses a brute force method to find a serial plan.
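For reference (my sketch; 5 is just an example value), the threshold mentioned above is exposed through sp_configure:

    -- 'cost threshold for parallelism' is an advanced option.
    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    EXEC sp_configure 'cost threshold for parallelism', 5
    RECONFIGURE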
Selectivity Estimation
Selectivity Is One of the Most Important Pieces of Information
One of the most important things the optimizer needs to know is the number of rows from any table that will meet all the conditions in the query. If there are no restrictions on a table, and all the rows will be needed, the optimizer can determine the number of rows from the sysindexes table. This number is not absolutely guaranteed to be accurate, but it is the number the optimizer uses. If there is a filter on the table in a WHERE clause, the optimizer needs statistics information. Indexes automatically maintain statistics, and the optimizer will use these values to determine the usefulness of the index. If there is no index on the column involved in the filter, then column statistics can be used or generated.

Optimizing Search Arguments
In General, the Filters in the WHERE Clause Determine Which Indexes Will Be Useful
If an indexed column is referenced in a search argument (SARG), the optimizer will analyze the cost of using that index. A SARG has one of these forms:

    column operator value
    value operator column

The operator must be one of =, >, >=, <, <=. The value can be a constant, an operation, or a variable. Some functions also will be treated as SARGs.
These queries have SARGs, and a nonclustered index on firstname will be used in most cases:

    select * from member where firstname < 'AKKG'
    select * from member where firstname = substring('HAAKGALSFJA', 2, 5)
    select * from member where firstname = 'AA' + 'KG'

    declare @name char(4)
    set @name = 'AKKG'
    select * from member where firstname < @name

Not all functions can be used in SARGs:

    select * from charge where charge_amt < 2*2
    select * from charge where charge_amt < sqrt(16)

Compare these queries to ones using = instead of <. With =, the optimizer can use the density information to come up with a good row estimate, even if it's not going to actually perform the function's calculations.

A filter with a variable is usually a SARG. The issue is: can the optimizer come up with useful costing information? A filter with a variable is not a SARG if the variable is of a different datatype, and the column must be converted to the variable's datatype. For more information, see the following Knowledge Base article: Q198625.

    Use credit
    go
    CREATE TABLE [member2] (
        [member_no] [smallint] NOT NULL ,
        [lastname] [shortstring] NOT NULL ,
        [firstname] [shortstring] NOT NULL ,
        [middleinitial] [letter] NULL ,
        [street] [shortstring] NOT NULL ,
        [city] [shortstring] NOT NULL ,
        [state_prov] [statecode] NOT NULL ,
        [country] [countrycode] NOT NULL ,
        [mail_code] [mailcode] NOT NULL
    )
    GO
    insert into member2
    select member_no, lastname, firstname, middleinitial,
           street, city, state_prov, country, mail_code
    from member

    alter table member2
    add constraint pk_member2
    primary key clustered (lastname, member_no, firstname, country)

    declare @id int
    set @id = 47
    update member2
    set city = city + ' City', state_prov = state_prov + ' State'
    where lastname = 'Barr'
      and member_no = @id
      and firstname = 'URQYJBFVRRPWKVW'
      and country = 'USA'

These queries don't have SARGs, and a table scan will be done:

    select * from member where substring(lastname, 1, 2) = 'BA'

Some non-SARGs can be converted:

    select * from member where lastname like 'ba%'

In some cases, you can rewrite your query to turn a non-SARG into a SARG; for example, you can rewrite the substring query above as the LIKE query that follows it.
Join Order and Types of Joins
Join Order and Strategy Is Determined By the Optimizer
The execution plan output will display the join order from top to bottom; i.e., the table listed on top is the first one accessed in a join. You can override the optimizer's join order decision in two ways:
- OPTION (FORCE ORDER) applies to one query.
- SET FORCEPLAN ON applies to the entire session, until set OFF.

If either of these options is used, the join order is determined by the order in which the tables are listed in the query's FROM clause, and no optimization of the join order is done. Forcing the join order may force a particular join strategy. For example, in most outer join operations, the outer table is processed first, and a nested loops join is done. However, if you force the inner table to be accessed first, a merge join will need to be done. Compare the query plan for this query with and without the FORCE ORDER hint:

    select * from titles right join publishers
    on titles.pub_id = publishers.pub_id
    -- OPTION (FORCE ORDER)

Nested Loop Join
A nested iteration is when the query optimizer constructs a set of nested loops, and the result set grows as it progresses through the rows. The query optimizer performs the following steps:
1. Finds a row from the first table.
2. Uses that row to scan the next table.
3. Uses the result of the previous table to scan the next table.

Evaluating Join Combinations
The query optimizer automatically evaluates at least four or more possible join combinations, even if those combinations are not specified in the join predicate. You do not have to add redundant clauses. The query optimizer balances the cost and uses statistics to determine the number of join combinations that it evaluates. Evaluating every possible join combination is inefficient and costly.

Evaluating Cost of Query Performance
When the query optimizer performs a nested join, you should be aware that certain costs are incurred. Nested loop joins are far superior to both merge joins and hash joins when executing small transactions, such as those affecting only a small set of rows. The query optimizer:
- Uses nested loop joins if the outer input is quite small and the inner input is indexed and quite large.
- Uses the smaller input as the outer table.
- Requires that a useful index exist on the join predicate for the inner table.
- Always uses a nested loop join strategy if the join operation uses an operator other than an equality operator.

Merge Joins
The columns of the join conditions are used as inputs to process a merge join. SQL Server performs the following steps when using a merge join strategy:
1. Gets the first input values from each input set.
2. Compares input values.
3. Performs a merge algorithm.
   • If the input values are equal, the rows are returned.
   • If the input values are not equal, the lower value is discarded, and the next input value from that input is used for the next comparison.
4. Repeats the process until all of the rows from one of the input sets have been processed.
5. Evaluates any remaining search conditions in the query and returns only rows that qualify.

Note: Only one pass per input is done. The merge join operation ends after all of the input values of one input have been evaluated. The remaining values from the other input are not processed.
Merge Joins
The columns of the join conditions are used as inputs to process a merge join. SQL Server performs the following steps when using a merge join strategy:
1. Gets the first input values from each input set.
2. Compares input values.
3. Performs a merge algorithm.
• If the input values are equal, the rows are returned.
• If the input values are not equal, the lower value is discarded, and the next input value from that input is used for the next comparison.
4. Repeats the process until all of the rows from one of the input sets have been processed.
5. Evaluates any remaining search conditions in the query and returns only rows that qualify.
Note Only one pass per input is done. The merge join operation ends after all of the input values of one input have been evaluated. The remaining values from the other input are not processed.
Requires That Joined Columns Are Sorted
If you execute a query with join operations, and the joined columns are in sorted order, the query optimizer processes the query by using a merge join strategy. A merge join is very efficient because the columns are already sorted, and it requires fewer page I/Os.
Evaluates Sorted Values
For the query optimizer to use the merge join, the inputs must be sorted. The query optimizer evaluates sorted values in the following order:
1. Uses an existing index tree (most typical). The query optimizer can use the index tree from a clustered index or a covered nonclustered index.
2. Leverages sort operations that the GROUP BY, ORDER BY, and CUBE clauses use. The sorting operation only has to be performed once.
3. Performs its own sort operation, in which case a SORT operator is displayed when graphically viewing the execution plan. The query optimizer does this very rarely.
Performance Considerations
Consider the following facts about the query optimizer's use of the merge join:
SQL Server performs a merge join for all types of join operations (except cross join or full join operations), including UNION operations.
A merge join operation may be a one-to-one, one-to-many, or many-to-many operation. If the merge join is a many-to-many operation, SQL Server uses a temporary table to store the rows. If duplicate values from each input exist, one of the inputs rewinds to the start of the duplicates as each duplicate value from the other input is processed.
Query performance for a merge join is very fast, but the cost can be high if the query optimizer must perform its own sort operation. If the data volume is large and the desired data can be obtained presorted from existing Balanced-Tree (B-Tree) indexes, merge join is often the fastest join algorithm.
A merge join is typically used if the two join inputs have a large amount of data and are sorted on their join columns (for example, if the join inputs were obtained by scanning sorted indexes).
Merge join operations can only be performed with an equality operator in the join predicate.
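The same comparison works for merge joins; a minimal sketch on the same member and charge tables (if the inputs are not already sorted on member_no, the plan will show the optimizer's own SORT operator, which is the expensive case described above):

-- force a merge strategy for every join in this query
select m.member_no
from member m
join charge c on m.member_no = c.member_no
OPTION (MERGE JOIN)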
Hash Joins
Hashing is a strategy for dividing data into equal sets of a manageable size based on a given property or characteristic. The grouped data can then be used to determine whether a particular data item matches an existing value.
Note Duplicate data or ranges of data are not useful for hash joins because the data is not organized together or in order.
When a Hash Join Is Used
The query optimizer uses a hash join option when it estimates that it is more efficient than processing queries by using a nested loop or merge join. It typically uses a hash join when an index does not exist or when existing indexes are not useful.
Assigns a Build and Probe Input
The query optimizer assigns a build and probe input. If the query optimizer incorrectly assigns the build and probe input (this may occur because of imprecise density estimates), it reverses them dynamically. The ability to change input roles dynamically is called role reversal.
Build input consists of the column values from the table with the lowest number of rows. Build input creates a hash table in memory to store these values. The hash bucket is a storage place in the hash table in which each row of the build input is inserted. Rows from one of the join tables are placed into the hash bucket where the hash key value of the row matches the hash key value of the bucket. Hash buckets are stored as a linked list and only contain the columns that are needed for the query. A hash table contains hash buckets. The hash table is created from the build input.
Probe input consists of the column values from the table with the most rows. Probe input is what the build input checks to find a match in the hash buckets.
Note The query optimizer uses column or index statistics to help determine which input is the smaller of the two.
Processing a Hash Join
The following list is a simplified description of how the query optimizer processes a hash join. It is not intended to be comprehensive because the algorithm is very complex. SQL Server:
1. Reads the probe input. Each probe input is processed one row at a time.
2. Performs the hash algorithm against each probe input and generates a hash key value.
3. Finds the hash bucket that matches the hash key value.
4. Accesses the hash bucket and looks for the matching row.
5. Returns the row if a match is found.
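As with the other strategies, a hint makes the hash join visible in the plan; a minimal sketch on the same two tables:

-- force a hash strategy; the smaller input should appear as the build input
select m.member_no
from member m
join charge c on m.member_no = c.member_no
OPTION (HASH JOIN)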
Performance Considerations
Consider the following facts about the hash joins that the query optimizer uses:
Similar to merge joins, a hash join is very efficient, because it uses hash buckets, which are like a dynamic index but with less overhead for combining rows.
Hash joins can be performed for all types of join operations (except cross join operations), including UNION and DIFFERENCE operations.
A hash operator can remove duplicates and group data, such as SUM(salary) GROUP BY department. The query optimizer uses only one input for both the build and probe roles.
If join inputs are large and are of similar size, the performance of a hash join operation is similar to a merge join with prior sorting. However, if the size of the join inputs is significantly different, the performance of a hash join is often much faster.
Hash joins can process large, unsorted, non-indexed inputs efficiently. Hash joins are useful in complex queries because the intermediate results:
• Are not indexed (unless explicitly saved to disk and then indexed).
• Are often not sorted for the next operation in the execution plan.
The query optimizer can identify incorrect estimates and make corrections dynamically to process the query more efficiently.
A hash join reduces the need for database denormalization. Denormalization is typically used to achieve better performance by reducing join operations, despite redundancy risks such as inconsistent updates. Hash joins give you the option to vertically partition your data as part of your physical database design. Vertical partitioning represents groups of columns from a single table in separate files or indexes.
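The duplicate-removal and grouping behavior mentioned above shows up as a hash aggregate in the plan of a simple grouped query; a minimal sketch, assuming a hypothetical employee table with department and salary columns:

-- employee is a hypothetical table used only for illustration
select department, SUM(salary) AS total_salary
from employee
group by department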
Subquery Performance
Joins Are Not Inherently Better Than Subqueries
Here is an example showing three different ways to update a table, using a second table for lookup purposes. The first uses a JOIN with the update, the second uses a regular subquery introduced with IN, and the third uses a correlated subquery. All three yield nearly identical performance.
Note Performance comparisons cannot be made based on I/Os alone. With HASHING and MERGING techniques, the number of reads may be the same for two queries, yet one may take a lot longer and use more memory resources. Also, always be sure to monitor statistics time.
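You can make that comparison by turning on the time and I/O statistics before running the three versions that follow; a minimal sketch:

SET STATISTICS TIME ON
SET STATISTICS IO ON
-- run each of the three UPDATE solutions below, one at a time,
-- and compare CPU time, elapsed time, and logical reads
SET STATISTICS IO OFF
SET STATISTICS TIME OFF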
Suppose you want to add a 5 percent discount to order items in the Order Details table for which the supplier is Exotic Liquids, whose supplierid is 1.

-- JOIN solution
BEGIN TRAN
UPDATE OD
SET discount = discount + 0.05
FROM [Order Details] AS OD
JOIN Products AS P ON OD.productid = P.productid
WHERE supplierid = 1
ROLLBACK TRAN

-- Regular subquery solution
BEGIN TRAN
UPDATE [Order Details]
SET discount = discount + 0.05
WHERE productid IN
    (SELECT productid
     FROM Products
     WHERE supplierid = 1)
ROLLBACK TRAN

-- Correlated subquery solution
BEGIN TRAN
UPDATE [Order Details]
SET discount = discount + 0.05
WHERE EXISTS
    (SELECT supplierid
     FROM Products
     WHERE [Order Details].productid = Products.productid
     AND supplierid = 1)
ROLLBACK TRAN

Internally, Your Join May Be Rewritten
SQL Server's query processor has many different ways of resolving your JOIN expressions. Subqueries may be converted to a JOIN with an implied DISTINCT, which may result in a logical operator of SEMI JOIN. Compare the plans of the first two queries:

USE credit

select member_no
from member
where member_no in (select member_no from charge)

select distinct m.member_no
from member m
join charge c on m.member_no = c.member_no

The second query uses a HASH MATCH as the final step to remove the duplicates; the first query only had to do a semi join. For these queries, although the I/O values are the same, the first query (with the subquery) runs much faster (almost twice as fast). Another similar looking join is