About the error "Procedure has not been executed or has no results"

freeliu 2007-01-24 01:57:15
I have two questions at the moment:
1. When my program calls a certain stored procedure repeatedly in a loop, the error in the title occurs. What causes it?
2. Do I need to close the procedure after every call to the stored procedure?

Environment: PB + SQL Server 2000
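(For readers unfamiliar with PowerBuilder embedded SQL: the calling pattern described above looks roughly like the sketch below. This is only a minimal illustration; the procedure name dbo.usp_update_data, its parameter, and the loop bounds are hypothetical placeholders, not the poster's actual code.)

// Hypothetical PowerScript sketch: an update-only stored procedure
// executed once per loop iteration
long ll_i, ll_id

DECLARE upd_proc PROCEDURE FOR dbo.usp_update_data
    @id = :ll_id
USING SQLCA;

FOR ll_i = 1 TO 100
    ll_id = ll_i
    // the host variable :ll_id is read each time EXECUTE runs
    EXECUTE upd_proc;

    IF SQLCA.SQLCode < 0 THEN
        MessageBox("SQL error", SQLCA.SQLErrText)
        EXIT
    END IF

    // Question 2 above asks whether a "CLOSE upd_proc;" belongs here;
    // see the replies below.
NEXT

COMMIT USING SQLCA;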
9 replies
freeliu 2007-01-25
Are there no other suggestions?
The problem still is not solved.
freeliu 2007-01-24
Bumping this thread.
proglovercn 2007-01-24
My guess is that something in the stored procedure code was not fully thought through.
Try executing it a few more times in Query Analyzer, or with a few different parameter values.
Check whether the stored procedure uses a cursor, or whether it has an IF statement whose branches do not cover every case.
------ The above is just my personal opinion, for reference only.
freeliu 2007-01-24
To WangZWang (先来):
When calling from PB, the stored procedure can work either with or without a returned result set.
Of course, the program handles the two cases somewhat differently.
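(To illustrate the difference mentioned above, here is a rough PowerScript sketch. The procedure and variable names usp_get_name, usp_update_data, ll_id and ls_name are made up for the example; the exact pattern depends on your database interface.)

long   ll_id
string ls_name
ll_id = 1

// Procedure that RETURNS a result set (hypothetical usp_get_name):
// EXECUTE, then FETCH the rows, then CLOSE.
DECLARE get_proc PROCEDURE FOR dbo.usp_get_name @id = :ll_id USING SQLCA;
EXECUTE get_proc;
FETCH get_proc INTO :ls_name;
CLOSE get_proc;

// Procedure that returns NO result set (hypothetical usp_update_data):
// EXECUTE, check the return code, and COMMIT; there is nothing to FETCH.
DECLARE upd_proc PROCEDURE FOR dbo.usp_update_data @id = :ll_id USING SQLCA;
EXECUTE upd_proc;
IF SQLCA.SQLCode < 0 THEN MessageBox("SQL error", SQLCA.SQLErrText)
COMMIT USING SQLCA;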
freeliu 2007-01-24
I don't quite understand what you mean by "an error caused by server performance".
This is a system error message.
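(One way to narrow down where the text comes from, in reply to the question about whether it is a system error or something returned by the procedure: dump the Transaction object's fields right after the failing call. This is an illustrative snippet using the standard SQLCA properties, not code from the original thread; SQLDBCode carries the native database error number, if the DBMS raised one, while SQLErrText is the message text as PB received it.)

// Hypothetical diagnostic snippet, placed right after the failing EXECUTE
IF SQLCA.SQLCode < 0 THEN
    MessageBox("SQL error", &
        "SQLCode = "   + String(SQLCA.SQLCode)   + "~r~n" + &
        "SQLDBCode = " + String(SQLCA.SQLDBCode) + "~r~n" + &
        SQLCA.SQLErrText)
END IF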
WangZWang 2007-01-24
Then the PB component you are using to call the stored procedure requires a result set to be returned, but your stored procedure does not provide one. This is an error in how the front-end development language is being used.
zlp321002 2007-01-24
1. Check the server's performance. Could the error be caused by database performance problems?
2. Is this message a system error, or information returned by the stored procedure?
freeliu 2007-01-24
To WangZWang (先来):
My stored procedure does not return any results at all; its purpose is to update data.
The stored procedure runs without problems in Query Analyzer.
And if the loop in my code executes only once, there is no problem either.
WangZWang 2007-01-24
This error usually comes down to two things:
1. First, confirm whether the stored procedure itself is correct.
2. The component used to call the stored procedure expects a result set to be returned, but the stored procedure does not return one.

After calling a stored procedure, you do not need to CLOSE the procedure.
Oracle SQL Developer, v1.5.0.54.40 Release Notes 完整版下载:http://www.oracle.com/technology/global/cn/software/products/sql/index.html 1. Known Issues 1.1 General - Print prints only one page that is a truncation of the current tab. - Can't invoke SQL*Plus on Windows 2003. - The menu item, and right-click off a Connection node, for invoking SQL*Plus does not work with connections whose passwords are not persisted. 1.2 Connections - Cannot connect to remote database as OPS$ account. 1.3 Browse - If connected as sys with sysdba role, Types node displays built in data-types (e.g. BLOB, DATE, DECIMAL, etc.) If clicked on, will only see "create or replace". 1.4 Creating and Modifying Objects - Editing Triggers - If you have comments before the 'BEGIN' they will be lost if you edit. You will see when you click edit that they will not be there. To preserve them, they need to be below the BEGIN or you will need to edit via the SQL Worksheet. 1.5 Table > Data - Tables > Your_Table > Data - PageUp and PageDown buttons not working correctly if cursor is in the rownum column. 1.6 Export - Cannot export if result set contains duplicate column names. 2. Workarounds 2.1 To disable Code Insight Run SQL Developer from a command line using the following statement: Windows : sqldeveloper -J-Dsdev.insight=false Linux or Mac: Run sh sqldeveloper -J-Dsdev.insight=false or edit sqldeveloper.conf and add "AddVMOption -J-Dsdev.insight=false" 2.2 If DDL tab is null for all objects in a Connection Your dbms_metadata might be loaded incorrectly. If this statement fails when executed in a SQL Worksheet against the Connection select dbms_metadata.get_ddl('TABLE',table_name , user ) from user_tables; You need to reload $ORACLE_HOME/rdbms/admin/catmeta.sql 2.3 If Snippets are not accessible You may have not done a clean install. SQL Developer needs to be installed into a clean directory, not over a previous release. 3. Accessibility Issues The following is a li
Contents Overview 1 Lesson 1: Concepts – Locks and Lock Manager 3 Lesson 2: Concepts – Batch and Transaction 31 Lesson 3: Concepts – Locks and Applications 51 Lesson 4: Information Collection and Analysis 63 Lesson 5: Concepts – Formulating and Implementing Resolution 81 Module 4: Troubleshooting Locking and Blocking Overview At the end of this module, you will be able to:  Discuss how lock manager uses lock mode, lock resources, and lock compatibility to achieve transaction isolation.  Describe the various transaction types and how transactions differ from batches.  Describe how to troubleshoot blocking and locking issues.  Analyze the output of blocking scripts and Microsoft® SQL Server™ Profiler to troubleshoot locking and blocking issues.  Formulate hypothesis to resolve locking and blocking issues. Lesson 1: Concepts – Locks and Lock Manager This lesson outlines some of the common causes that contribute to the perception of a slow server. What You Will Learn After completing this lesson, you will be able to:  Describe locking architecture used by SQL Server.  Identify the various lock modes used by SQL Server.  Discuss lock compatibility and concurrent access.  Identify different types of lock resources.  Discuss dynamic locking and lock escalation.  Differentiate locks, latches, and other SQL Server internal “locking” mechanism such as spinlocks and other synchronization objects. Recommended Reading  Chapter 14 “Locking”, Inside SQL Server 2000 by Kalen Delaney  SOX000821700049 – SQL 7.0 How to interpret lock resource Ids  SOX000925700237 – TITLE: Lock escalation in SQL 7.0  SOX001109700040 – INF: Queries with PREFETCH in the plan hold lock until the end of transaction Locking Concepts Delivery Tip Prior to delivering this material, test the class to see if they fully understand the different isolation levels. If the class is not confident in their understanding, review appendix A04_Locking and its accompanying PowerPoint® file. Transactions in SQL Server provide the ACID properties: Atomicity A transaction either commits or aborts. If a transaction commits, all of its effects remain. If it aborts, all of its effects are undone. It is an “all or nothing” operation. Consistency An application should maintain the consistency of a database. For example, if you defer constraint checking, it is your responsibility to ensure that the database is consistent. Isolation Concurrent transactions are isolated from the updates of other incomplete transactions. These updates do not constitute a consistent state. This property is often called serializability. For example, a second transaction traversing the doubly linked list mentioned above would see the list before or after the insert, but it will see only complete changes. Durability After a transaction commits, its effects will persist even if there are system failures. Consistency and isolation are the most important in describing SQL Server’s locking model. It is up to the application to define what consistency means, and isolation in some form is needed to achieve consistent results. SQL Server uses locking to achieve isolation. Definition of Dependency: A set of transactions can run concurrently if their outputs are disjoint from the union of one another’s input and output sets. For example, if T1 writes some object that is in T2’s input or output set, there is a dependency between T1 and T2. Bad Dependencies These include lost updates, dirty reads, non-repeatable reads, and phantoms. 
ANSI SQL Isolation Levels An isolation level determines the degree to which data is isolated for use by one process and guarded against interference from other processes. Prior to SQL Server 7.0, REPEATABLE READ and SERIALIZABLE isolation levels were synonymous. There was no way to prevent non-repeatable reads while not preventing phantoms. By default, SQL Server 2000 operates at an isolation level of READ COMMITTED. To make use of either more or less strict isolation levels in applications, locking can be customized for an entire session by setting the isolation level of the session with the SET TRANSACTION ISOLATION LEVEL statement. To determine the transaction isolation level currently set, use the DBCC USEROPTIONS statement, for example: USE pubs GO SET TRANSACTION ISOLATION LEVEL REPEATABLE READ GO DBCC USEROPTIONS GO Multigranular Locking Multigranular Locking In our example, if one transaction (T1) holds an exclusive lock at the table level, and another transaction (T2) holds an exclusive lock at the row level, each of the transactions believe they have exclusive access to the resource. In this scenario, since T1 believes it locks the entire table, it might inadvertently make changes to the same row that T2 thought it has locked exclusively. In a multigranular locking environment, there must be a way to effectively overcome this scenario. Intent lock is the answer to this problem. Intent Lock Intent Lock is the term used to mean placing a marker in a higher-level lock queue. The type of intent lock can also be called the multigranular lock mode. An intent lock indicates that SQL Server wants to acquire a shared (S) lock or exclusive (X) lock on some of the resources lower down in the hierarchy. For example, a shared intent lock placed at the table level means that a transaction intends on placing shared (S) locks on pages or rows within that table. Setting an intent lock at the table level prevents another transaction from subsequently acquiring an exclusive (X) lock on the table containing that page. Intent locks improve performance because SQL Server examines intent locks only at the table level to determine whether a transaction can safely acquire a lock on that table. This removes the requirement to examine every row or page lock on the table to determine whether a transaction can lock the entire table. Lock Mode The code shown in the slide represents how the lock mode is stored internally. You can see these codes by querying the master.dbo.spt_values table: SELECT * FROM master.dbo.spt_values WHERE type = N'L' However, the req_mode column of master.dbo.syslockinfo has lock mode code that is one less than the code values shown here. For example, value of req_mode = 3 represents the Shared lock mode rather than the Schema Modification lock mode. Lock Compatibility These locks can apply at any coarser level of granularity. If a row is locked, SQL Server will apply intent locks at both the page and the table level. If a page is locked, SQL Server will apply an intent lock at the table level. SIX locks imply that we have shared access to a resource and we have also placed X locks at a lower level in the hierarchy. SQL Server never asks for SIX locks directly, they are always the result of a conversion. For example, suppose a transaction scanned a page using an S lock and then subsequently decided to perform a row level update. The row would obtain an X lock, but now the page would require an IX lock. The resultant mode on the page would be SIX. 
Another type of table lock is a schema stability lock (Sch-S) and is compatible with all table locks except the schema modification lock (Sch-M). The schema modification lock (Sch-M) is incompatible with all table locks. Locking Resources Delivery Tip Note the differences between Key and Key Range locks. Key Range locks will be covered in a couple of slides. SQL Server can lock these resources: Item Description DB A database. File A database file Index An entire index of a table. Table An entire table, including all data and indexes. Extent A contiguous group of data pages or index pages. Page An 8-KB data page or index page. Key Row lock within an index. Key-range A key-range. Used to lock ranges between records in a table to prevent phantom insertions or deletions into a set of records. Ensures serializable transactions. RID A Row Identifier. Used to individually lock a single row within a table. Application A lock resource defined by an application. The lock manager knows nothing about the resource format. It simply compares the 'strings' representing the lock resources to determine whether it has found a match. If a match is found, it knows that resource is already locked. Some of the resources have “sub-resources.” The followings are sub-resources displayed by the sp_lock output: Database Lock Sub-Resources: Full Database Lock (default) [BULK-OP-DB] – Bulk Operation Lock for Database [BULK-OP-LOG] – Bulk Operation Lock for Log Table Lock Sub-Resources: Full Table Lock (default) [UPD-STATS] – Update statistics Lock [COMPILE] – Compile Lock Index Lock sub-Resources: Full Index Lock (default) [INDEX_ID] – Index ID Lock [INDEX_NAME] – Index Name Lock [BULK_ALLOC] – Bulk Allocation Lock [DEFRAG] – Defragmentation Lock For more information, see also… SOX000821700049 SQL 7.0 How to interpret lock resource Ids Lock Resource Block The resource type has the following resource block format: Resource Type (Code) Content DB (2) Data 1: sub-resource; Data 2: 0; Data 3: 0 File (3) Data 1: File ID; Data 2: 0; Data 3: 0 Index (4) Data 1: Object ID; Data 2: sub-resource; Data 3: Index ID Table (5) Data 1: Object ID; Data 2: sub-resource; Data 3: 0. Page (6) Data 1: Page Number; Data 3: 0. Key (7) Data 1: Object ID; Data 2: Index ID; Data 3: Hashed Key Extent (8) Data 1: Extent ID; Data 3: 0. RID (9) Data 1: RID; Data 3: 0. Application (10) Data 1: Application resource name The rsc_bin column of master..syslockinfo contains the resource block in hexadecimal format. For an example of how to decode value from this column using the information above, let us assume we have the following value: 0x000705001F83D775010002014F0BEC4E With byte swapping within each field, this can be decoded as: Byte 0: Flag – 0x00 Byte 1: Resource Type – 0x07 (Key) Byte 2-3: DBID – 0x0005 Byte 4-7: ObjectID – 0x 75D7831F (1977058079) Byte 8-9: IndexID – 0x0001 Byte 10-16: Hash Key value – 0x 02014F0BEC4E For more information about how to decode this value, see also… Inside SQL Server 2000, pages 803 and 806. Key Range Locking Key Range Locking To support SERIALIZABLE transaction semantics, SQL Server needs to lock sets of rows specified by a predicate, such as WHERE salary BETWEEN 30000 AND 50000 SQL Server needs to lock data that does not exist! If no rows satisfy the WHERE condition the first time the range is scanned, no rows should be returned on any subsequent scans. Key range locks are similar to row locks on index keys (whether clustered or not). The locks are placed on individual keys rather than at the node level. 
The hash value consists of all the key components and the locator. So, for a nonclustered index over a heap, where columns c1 and c2 where indexed, the hash would contain contributions from c1, c2 and the RID. A key range lock applied to a particular key means that all keys between the value locked and the next value would be locked for all data modification. Key range locks can lock a slightly larger range than that implied by the WHERE clause. Suppose the following select was executed in a transaction with isolation level SERIALIZABLE: SELECT * FROM members WHERE first_name between ‘Al’ and ‘Carl’ If 'Al', 'Bob', and 'Dave' are index keys in the table, the first two of these would acquire key range locks. Although this would prevent anyone from inserting either 'Alex' or 'Ben', it would also prevent someone from inserting 'Dan', which is not within the range of the WHERE clause. Prior to SQL Server 7.0, page locking was used to prevent phantoms by locking the entire set of pages on which the phantom would exist. This can be too conservative. Key Range locking lets SQL Server lock only a much more restrictive area of the table. Impact Key-range locking ensures that these scenarios are SERIALIZABLE:  Range scan query  Singleton fetch of nonexistent row  Delete operation  Insert operation However, the following conditions must be satisfied before key-range locking can occur:  The transaction-isolation level must be set to SERIALIZABLE.  The operation performed on the data must use an index range access. Range locking is activated only when query processing (such as the optimizer) chooses an index path to access the data. Key Range Lock Mode Again, the req_mode column of master.dbo.syslockinfo has lock mode code that is one less than the code values shown here. Dynamic Locking When modifying individual rows, SQL Server typically would take row locks to maximize concurrency (for example, OLTP, order-entry application). When scanning larger volumes of data, it would be more appropriate to take page or table locks to minimize the cost of acquiring locks (for example, DSS, data warehouse, reporting). Locking Decision The decision about which unit to lock is made dynamically, taking many factors into account, including other activity on the system. For example, if there are multiple transactions currently accessing a table, SQL Server will tend to favor row locking more so than it otherwise would. It may mean the difference between scanning the table now and paying a bit more in locking cost, or having to wait to acquire a more coarse lock. A preliminary locking decision is made during query optimization, but that decision can be adjusted when the query is actually executed. Lock Escalation When the lock count for the transaction exceeds and is a multiple of ESCALATION_THRESHOLD (1250), the Lock Manager attempts to escalate. For example, when a transaction acquired 1250 locks, lock manager will try to escalate. The number of locks held may continue to increase after the escalation attempt (for example, because new tables are accessed, or the previous lock escalation attempts failed due to incompatible locks held by another spid). If the lock count for this transaction reaches 2500 (1250 * 2), Lock Manager will attempt escalation again. The Lock Manager looks at the lock memory it is using and if it is more than 40 percent of SQL Server’s allocated buffer pool memory, it tries to find a scan (SDES) where no escalation has already been performed. 
It then repeats the search operation until all scans have been escalated or until the memory used drops under the MEMORY_LOAD_ESCALATION_THRESHOLD (40%) value. If lock escalation is not possible or fails to significantly reduce lock memory footprint, SQL Server can continue to acquire locks until the total lock memory reaches 60 percent of the buffer pool (MAX_LOCK_RESOURCE_MEMORY_PERCENTAGE=60). Lock escalation may be also done when a single scan (SDES) holds more than LOCK_ESCALATION_THRESHOLD (765) locks. There is no lock escalation on temporary tables or system tables. Trace Flag 1211 disables lock escalation. Important Do not relay this to the customer without careful consideration. Lock escalation is a necessary feature, not something to be avoided completely. Trace flags are global and disabling lock escalation could lead to out of memory situations, extremely poor performing queries, or other problems. Lock escalation tracing can be seen using the Profiler or with the general locking trace flag, -T1200. However, Trace Flag 1200 shows all lock activity so it should not be usable on a production system. For more information, see also… SOX000925700237 “TITLE: SQL 7.0 Lock escalation in SQL 7.0” Lock Timeout Application Lock Timeout An application can set lock timeout for a session with the SET option: SET LOCK_TIMEOUT N where N is a number of milliseconds. A value of -1 means that there will be no timeout, which is equivalent to the version 6.5 behavior. A value of 0 means that there will be no waiting; if a process finds a resource locked, it will generate error message 1222 and continue with the next statement. The current value of LOCK_TIMEOUT is stored in the global variable @@lock_timeout. Note After a lock timeout any transaction containing the statement, is rolled back or canceled by SQL Server 2000 (bug#352640 was filed). This behavior is different from that of SQL Server 7.0. With SQL Server 7.0, the application must have an error handler that can trap error 1222 and if an application does not trap the error, it can proceed unaware that an individual statement within a transaction has been canceled, and errors can occur because statements later in the transaction may depend on the statement that was never executed. Bug#352640 is fixed in hotfix build 8.00.266 whereby a lock timeout will only Internal Lock Timeout At time, internal operations within SQL Server will attempt to acquire locks via lock manager. Typically, these lock requests are issued with “no waiting.” For example, the ghost record processing might try to clean up rows on a particular page, and before it can do that, it needs to lock the page. Thus, the ghost record manager will request a page lock with no wait so that if it cannot lock the page, it will just move on to other pages; it can always come back to this page later. If you look at SQL Profiler Lock: Timeout events, internal lock timeout typically have a duration value of zero. Lock Duration Lock Mode and Transaction Isolation Level For REPEATABLE READ transaction isolation level, update locks are held until data is read and processed, unless promoted to exclusive locks. "Data is processed" means that we have decided whether the row in question matched the search criteria; if not then the update lock is released, otherwise, we get an exclusive lock and make the modification. 
Consider the following query: use northwind go dbcc traceon(3604, 1200, 1211) -- turn on lock tracing -- and disable escalation go set transaction isolation level repeatable read begin tran update dbo.[order details] set discount = convert (real, discount) where discount = 0.0 exec sp_lock Update locks are promoted to exclusive locks when there is a match; otherwise, the update lock is released. The sp_lock output verifies that the SPID does not hold any update locks or shared locks at the end of the query. Lock escalation is turned off so that exclusive table lock is not held at the end. Warning Do not use trace flag 1200 in a production environment because it produces a lot of output and slows down the server. Trace flag 1211 should not be used unless you have done extensive study to make sure it helps with performance. These trace flags are used here for illustration and learning purposes only. Lock Ownership Most of the locking discussion in this lesson relates to locks owned by “transactions.” In addition to transaction, cursor and session can be owners of locks and they both affect how long locks are held. For every row that is fetched, when SCROLL_LOCKS option is used, regardless of the state of a transaction, a cursor lock is held until the next row is fetched or when the cursor is closed. Locks owned by session are outside the scope of a transaction. The duration of these locks are bounded by the connection and the process will continue to hold these locks until the process disconnects. A typical lock owned by session is the database (DB) lock. Locking – Read Committed Scan Under read committed isolation level, when database pages are scanned, shared locks are held when the page is read and processed. The shared locks are released “behind” the scan and allow other transactions to update rows. It is important to note that the shared lock currently acquired will not be released until shared lock for the next page is successfully acquired (this is commonly know as “crabbing”). If the same pages are scanned again, rows may be modified or deleted by other transactions. Locking – Repeatable Read Scan Under repeatable read isolation level, when database pages are scanned, shared locks are held when the page is read and processed. SQL Server continues to hold these shared locks, thus preventing other transactions to update rows. If the same pages are scanned again, previously scanned rows will not change but new rows may be added by other transactions. Locking – Serializable Read Scan Under serializable read isolation level, when database pages are scanned, shared locks are held not only on rows but also on scanned key range. SQL Server continues to hold these shared locks until the end of transaction. Because key range locks are held, not only will this prevent other transactions from modifying the rows, no new rows can be inserted. Prefetch and Isolation Level Prefetch and Locking Behavior The prefetch feature is available for use with SQL Server 7.0 and SQL Server 2000. When searching for data using a nonclustered index, the index is searched for a particular value. When that value is found, the index points to the disk address. The traditional approach would be to immediately issue an I/O for that row, given the disk address. The result is one synchronous I/O per row and, at most, one disk at a time working to evaluate the query. This does not take advantage of striped disk sets. The prefetch feature takes a different approach. 
It continues looking for more record pointers in the nonclustered index. When it has collected a number of them, it provides the storage engine with prefetch hints. These hints tell the storage engine that the query processor will need these particular records soon. The storage engine can now issue several I/Os simultaneously, taking advantage of striped disk sets to execute multiple operations simultaneously. For example, if the engine is scanning a nonclustered index to determine which rows qualify but will eventually need to visit the data page as well to access columns that are not in the index, it may decide to submit asynchronous page read requests for a group of qualifying rows. The prefetch data pages are then revisited later to avoid waiting for each individual page read to complete in a serial fashion. This data access path requires that a lock be held between the prefetch request and the row lookup to stabilize the row on the page so it is not to be moved by a page split or clustered key update. For our example, the isolation level of the query is escalated to REPEATABLE READ, overriding the transaction isolation level. With SQL Server 7.0 and SQL Server 2000, portions of a transaction can execute at a different transaction isolation level than the entire transaction itself. This is implemented as lock classes. Lock classes are used to control lock lifetime when portions of a transaction need to execute at a stricter isolation level than the underlying transaction. Unfortunately, in SQL Server 7.0 and SQL Server 2000, the lock class is created at the topmost operator of the query and hence released only at the end of the query. Currently there is no support to release the lock (lock class) after the row has been discarded or fetched by the filter or join operator. This is because isolation level can be set at the query level via a lock class, but no lower. Because of this, locks acquired during the query will not be released until the query completes. If prefetch is occurring you may see a single SPID that holds hundreds of Shared KEY or PAG locks even though the connection’s isolation level is READ COMMITTED. Isolation level can be determined from DBCC PSS output. For details about this behavior see “SOX001109700040 INF: Queries with PREFETCH in the plan hold lock until the end of transaction”. Other Locking Mechanism Lock manager does not manage latches and spinlocks. Latches Latches are internal mechanisms used to protect pages while doing operations such as placing a row physically on a page, compressing space on a page, or retrieving rows from a page. Latches can roughly be divided into I/O latches and non-I/O latches. If you see a high number of non-I/O related latches, SQL Server is usually doing a large number of hash or sort operations in tempdb. You can monitor latch activities via DBCC SQLPERF(‘WAITSTATS’) command. Spinlock A spinlock is an internal data structure that is used to protect vital information that is shared within SQL Server. On a multi-processor machine, when SQL Server tries to access a particular resource protected by a spinlock, it must first acquire the spinlock. If it fails, it executes a loop that will check to see if the lock is available and if not, decrements a counter. If the counter reaches zero, it yields the processor to another thread and goes into a “sleep” (wait) state for a pre-determined amount of time. When it wakes, hopefully, the lock is free and available. 
If not, the loop starts again and it is terminated only when the lock is acquired. The reason for implementing a spinlock is that it is probably less costly to “spin” for a short time rather than yielding the processor. Yielding the processor will force an expensive context switch where:  The old thread’s state must be saved  The new thread’s state must be reloaded  The data stored in the L1 and L2 cache are useless to the processor On a single-processor computer, the loop is not useful because no other thread can be running and thus, no one can release the spinlock for the currently executing thread to acquire. In this situation, the thread yields the processor immediately. Lesson 2: Concepts – Batch and Transaction This lesson outlines some of the common causes that contribute to the perception of a slow server. What You Will Learn After completing this lesson, you will be able to:  Review batch processing and error checking.  Review explicit, implicit and autocommit transactions and transaction nesting level.  Discuss how commit and rollback transaction done in stored procedure and trigger affects transaction nesting level.  Discuss various transaction isolation level and their impact on locking.  Discuss the difference between aborting a statement, a transaction, and a batch.  Describe how @@error, @@transcount, and @@rowcount can be used for error checking and handling. Recommended Reading  Charter 12 “Transactions and Triggers”, Inside SQL Server 2000 by Kalen Delaney Batch Definition SQL Profiler Statements and Batches To help further your understanding of what is a batch and what is a statement, you can use SQL Profiler to study the definition of batch and statement.  Try This: Using SQL Profiler to Analyze Batch 1. Log on to a server with Query Analyzer 2. Startup the SQL Profiler against the same server 3. Start a trace using the “StandardSQLProfiler” template 4. Execute the following using Query Analyzer: SELECT @@VERSION SELECT @@SPID The ‘SQL:BatchCompleted’ event is captured by the trace. It shows both the statements as a single batch. 5. Now execute the following using Query Analyzer {call sp_who()} What shows up? The ‘RPC:Completed’ with the sp_who information. RPC is simply another entry point to the SQL Server to call stored procedures with native data types. This allows one to avoid parsing. The ‘RPC:Completed’ event should be considered the same as a batch for the purposes of this discussion. Stop the current trace and start a new trace using the “SQLProfilerTSQL_SPs” template. Issue the same command as outlines in step 5 above. Looking at the output, not only can you see the batch markers but each statement as executed within the batch. Autocommit, Explicit, and Implicit Transaction Autocommit Transaction Mode (Default) Autocommit mode is the default transaction management mode of SQL Server. Every Transact-SQL statement, whether it is a standalone statement or part of a batch, is committed or rolled back when it completes. If a statement completes successfully, it is committed; if it encounters any error, it is rolled back. A SQL Server connection operates in autocommit mode whenever this default mode has not been overridden by either explicit or implicit transactions. Autocommit mode is also the default mode for ADO, OLE DB, ODBC, and DB-Library. A SQL Server connection operates in autocommit mode until a BEGIN TRANSACTION statement starts an explicit transaction, or implicit transaction mode is set on. 
When the explicit transaction is committed or rolled back, or when implicit transaction mode is turned off, SQL Server returns to autocommit mode. Explicit Transaction Mode An explicit transaction is a transaction that starts with a BEGIN TRANSACTION statement. An explicit transaction can contain one or more statements and must be terminated by either a COMMIT TRANSACTION or a ROLLBACK TRANSACTION statement. Implicit Transaction Mode SQL Server can automatically or, more precisely, implicitly start a transaction for you if a SET IMPLICIT_TRANSACTIONS ON statement is run or if the implicit transaction option is turned on globally by running sp_configure ‘user options’ 2. (Actually, the bit mask 0x2 must be turned on for the user option so you might have to perform an ‘OR’ operation with the existing user option value.) See SQL Server 2000 Books Online on how to turn on implicit transaction under ODBC and OLE DB (acdata.chm::/ac_8_md_06_2g6r.htm). Transaction Nesting Explicit transactions can be nested. Committing inner transactions is ignored by SQL Server other than to decrements @@TRANCOUNT. The transaction is either committed or rolled back based on the action taken at the end of the outermost transaction. If the outer transaction is committed, the inner nested transactions are also committed. If the outer transaction is rolled back, then all inner transactions are also rolled back, regardless of whether the inner transactions were individually committed. Each call to COMMIT TRANSACTION applies to the last executed BEGIN TRANSACTION. If the BEGIN TRANSACTION statements are nested, then a COMMIT statement applies only to the last nested transaction, which is the innermost transaction. Even if a COMMIT TRANSACTION transaction_name statement within a nested transaction refers to the transaction name of the outer transaction, the commit applies only to the innermost transaction. If a ROLLBACK TRANSACTION statement without a transaction_name parameter is executed at any level of a set of nested transaction, it rolls back all the nested transactions, including the outermost transaction. The @@TRANCOUNT function records the current transaction nesting level. Each BEGIN TRANSACTION statement increments @@TRANCOUNT by one. Each COMMIT TRANSACTION statement decrements @@TRANCOUNT by one. A ROLLBACK TRANSACTION statement that does not have a transaction name rolls back all nested transactions and decrements @@TRANCOUNT to 0. A ROLLBACK TRANSACTION that uses the transaction name of the outermost transaction in a set of nested transactions rolls back all the nested transactions and decrements @@TRANCOUNT to 0. When you are unsure if you are already in a transaction, SELECT @@TRANCOUNT to determine whether it is 1 or more. If @@TRANCOUNT is 0 you are not in a transaction. You can also find the transaction nesting level by checking the sysprocess.open_tran column. See SQL Server 2000 Books Online topic “Nesting Transactions” (acdata.chm::/ac_8_md_06_66nq.htm) for more information. Statement, Transaction, and Batch Abort One batch can have many statements and one transaction can have multiple statements, also. One transaction can span multiple batches and one batch can have multiple transactions. Statement Abort Currently executing statement is aborted. This can be a bit confusing when you start talking about statements in a trigger or stored procedure. 
Let us look closely at the following trigger: CREATE TRIGGER TRG8134 ON TBL8134 AFTER INSERT AS BEGIN SELECT 1/0 SELECT 'Next command in trigger' END To fire the INSERT trigger, the batch could be as simple as ‘INSERT INTO TBL8134 VALUES(1)’. However, the trigger contains two statements that must be executed as part of the batch to satisfy the clients insert request. When the ‘SELECT 1/0’ causes the divide by zero error, a statement abort is issued for the ‘SELECT 1/0’ statement. Batch and Transaction Abort On SQL Server 2000 (and SQL Server 7.0) whenever a non-informational error is encountered in a trigger, the statement abort is promoted to a batch and transactional abort. Thus, in the example the statement abort for ‘select 1/0’ promotion results in an entire batch abort. No further statements in the trigger or batch will be executed and a rollback is issued. On SQL Server 6.5, the statement aborts immediately and results in a transaction abort. However, the rest of the statements within the trigger are executed. This trigger could return ‘Next command in trigger’ as a result set. Once the trigger completes the batch abort promotion takes effect. Conversely, submitting a similar set of statements in a standalone batch can result in different behavior. SELECT 1/0 SELECT 'Next command in batch' Not considering the set option possibilities, a divide by zero error generally results in a statement abort. Since it is not in a trigger, the promotion to a batch abort is avoided and subsequent SELECT statement can execute. The programmer should add an “if @@ERROR” check immediately after the ‘select 1/0’ to T-SQL execution to control the flow correctly. Aborting and Set Options ARITHABORT If SET ARITHABORT is ON, these error conditions cause the query or batch to terminate. If the errors occur in a transaction, the transaction is rolled back. If SET ARITHABORT is OFF and one of these errors occurs, a warning message is displayed, and NULL is assigned to the result of the arithmetic operation. When an INSERT, DELETE, or UPDATE statement encounters an arithmetic error (overflow, divide-by-zero, or a domain error) during expression evaluation when SET ARITHABORT is OFF, SQL Server inserts or updates a NULL value. If the target column is not nullable, the insert or update action fails and the user receives an error. XACT_ABORT When SET XACT_ABORT is ON, if a Transact-SQL statement raises a run-time error, the entire transaction is terminated and rolled back. When OFF, only the Transact-SQL statement that raised the error is rolled back and the transaction continues processing. Compile errors, such as syntax errors, are not affected by SET XACT_ABORT. For example: CREATE TABLE t1 (a int PRIMARY KEY) CREATE TABLE t2 (a int REFERENCES t1(a)) GO INSERT INTO t1 VALUES (1) INSERT INTO t1 VALUES (3) INSERT INTO t1 VALUES (4) INSERT INTO t1 VALUES (6) GO SET XACT_ABORT OFF GO BEGIN TRAN INSERT INTO t2 VALUES (1) INSERT INTO t2 VALUES (2) /* Foreign key error */ INSERT INTO t2 VALUES (3) COMMIT TRAN SELECT 'Continue running batch 1...' GO SET XACT_ABORT ON GO BEGIN TRAN INSERT INTO t2 VALUES (4) INSERT INTO t2 VALUES (5) /* Foreign key error */ INSERT INTO t2 VALUES (6) COMMIT TRAN SELECT 'Continue running batch 2...' GO /* Select shows only keys 1 and 3 added. Key 2 insert failed and was rolled back, but XACT_ABORT was OFF and rest of transaction succeeded. Key 5 insert error with XACT_ABORT ON caused all of the second transaction to roll back. Also note that 'Continue running batch 2...' 
is not Returned to indicate that the batch is aborted. */ SELECT * FROM t2 GO DROP TABLE t2 DROP TABLE t1 GO Compile and Run-time Errors Compile Errors Compile errors are encountered during syntax checks, security checks, and other general operations to prepare the batch for execution. These errors can prevent the optimization of the query and thus lead to immediate abort. The statement is not run and the batch is aborted. The transaction state is generally left untouched. For example, assume there are four statements in a particular batch. If the third statement has a syntax error, none of the statements in the batch is executed. Optimization Errors Optimization errors would include rare situations where the statement encounters a problem when attempting to build an optimal execution plan. Example: “too many tables referenced in the query” error is reported because a “work table” was added to the plan. Runtime Errors Runtime errors are those that are encountered during the execution of the query. Consider the following batch: SELECT * FROM pubs.dbo.titles UPDATE pubs.dbo.authors SET au_lname = au_lname SELECT * FROM foo UPDATE pubs.dbo.authors SET au_lname = au_lname If you run the above statements in a batch, the first two statements will be executed, the third statement will fail because table foo does not exist, and the batch will terminate. Deferred Name Resolution is the feature that allows this batch to start executing before resolving the object foo. This feature allows SQL Server to delay object resolution and place a “placeholder” in the query’s execution. The object referenced by the placeholder is resolved until the query is executed. In our example, the execution of the statement “SELECT * FROM foo” will trigger another compile process to resolve the name again. This time, error message 208 is returned. Error: 208, Level 16, State 1, Line 1 Invalid object name 'foo'. Message 208 can be encountered as a runtime or compile error depending on whether the Deferred Name Resolution feature is available. In SQL Server 6.5 this would be considered a compile error and on SQL Server 2000 (and SQL Server7.0) as a runtime error due to Deferred Name Resolution. In the following example, if a trigger referenced authors2, the error is detected as SQL Server attempts to execute the trigger. However, under SQL Server 6.5 the create trigger statement fails because authors2 does not exist at compile time. When errors are encountered in a trigger, generally, the statement, batch, and transaction are aborted. You should be able to observe this by running the following script in pubs database: Create table tblTest(iID int) go create trigger trgInsert on tblTest for INSERT as begin select * from authors select * from authors2 select * from titles end go begin tran select 'Before' insert into tblTest values(1) select 'After' go select @@TRANCOUNT go When run in a batch, the statement and the batch are aborted but the transaction remains active. The follow script illustrates this: begin tran select 'Before' select * from authors2 select 'After' go select @@TRANCOUNT go One other factor in a compile versus runtime error is implicit data type conversions. If you were to run the following statements on SQL Server 6.5 and SQL Server 2000 (and SQL Server 7.0): create table tblData(dtData datetime) go select 1 insert into tblData values(12/13/99) go On SQL Server 6.5, you get an error before execution of the batch begins so no statements are executed and the batch is aborted. 
Error: 206, Level 16, State 2, Line 2 Operand type clash: int is incompatible with datetime On SQL Server 2000, you get the default value (1900-01-01 00:00:00.000) inserted into the table. SQL Server 2000 implicit data type conversion treats this as integer division. The integer division of 12/13/99 is 0, so the default date and time value is inserted, no error returned. To correct the problem on either version is to wrap the date string with quotes. See Bug #56118 (sqlbug_70) for more details about this situation. Another example of a runtime error is a 605 message. Error: 605 Attempt to fetch logical page %S_PGID in database '%.*ls' belongs to object '%.*ls', not to object '%.*ls'. A 605 error is always a runtime error. However, depending on the transaction isolation level, (e.g. using the NOLOCK lock hint), established by the SPID the handling of the error can vary. Specifically, a 605 error is considered an ACCESS error. Errors associated with buffer and page access are found in the 600 series of errors. When the error is encountered, the isolation level of the SPID is examined to determine proper handling based on information or fatal error level. Transaction Error Checking Not all errors cause transactions to automatically rollback. Although it is difficult to determine exactly which errors will rollback transactions and which errors will not, the main idea here is that programmers must perform error checking and handle errors appropriately. Error Handling Raiserror Details Raiserror seems to be a source of confusion but is really rather simple. Raiserror with severity levels of 20 or higher will terminate the connection. Of course, when the connection is terminated a full rollback of any open transaction will immediately be instantiated by the SQL Server (except distributed transaction with DTC involved). Severity levels lower than 20 will simply result in the error message being returned to the client. They do not affect the transaction scope of the connection. Consider the following batch: use pubs begin tran update authors set au_lname = 'smith' raiserror ('This is bad', 19, 1) with log select @@trancount With severity set at 19, the 'select @@trancount' will be executed after the raiserror statement and will return a value of 1. If severity is changed to 20, then the select statement will not run and the connection is broken. Important Error handling must occur not only in T-SQL batches and stored procedures, but also in application program code. Transactions and Triggers (1 of 2) Basic behavior assumes the implicit transactions setting is set to OFF. This behavior makes it possible to identify business logic errors in a trigger, raise an error, rollback the action, and add an audit table entry. Logically, the insert to the audit table cannot take place before the ROLLBACK action and you would not want to build in the audit table insert into every applications error handler that violated the business rule of the trigger. For more information, see also… SQL Server 2000 Books Online topic “Rollbacks in stored procedure and triggers“ (acdata.chm::/ac_8_md_06_4qcz.htm) IMPLICIT_TRANSACTIONS ON Behavior The behavior of firing other triggers on the same table can be tricky. Say you added a trigger that checks the CODE field. Read only versions of the rows contain the code ‘RO’ and read/write versions use ‘RW.’ Whenever someone tries to delete a row with a code ‘RO’ the trigger issues the rollback and logs an audit table entry. 
However, you also have a second trigger that is responsible for cascading delete operations. One client could issue the delete without implicit transactions on and only the current trigger would execute and then terminate the batch. However, a second client with implicit transactions on could issue the same delete and the secondary trigger would fire. You end up with a situation in which the cascading delete operations can take place (are committed) but the initial row remains in the table because of the rollback operation. None of the delete operations should be allowed but because the transaction scope was restarted because of the implicit transactions setting, they did. Transactions and Triggers (2 of 2) It is extremely difficult to determine the execution state of a trigger when using explicit rollback statements in combination with implicit transactions. The RETURN statement is not allowed to return a value. The only way I have found to set the @@ERROR is using a ‘raiserror’ as the last execution statement in the last trigger to execute. If you modify the example, this following RAISERROR statement will set @@ERROR to 50000: CREATE TRIGGER trgTest on tblTest for INSERT AS BEGIN ROLLBACK INSERT INTO tblAudit VALUES (1) RAISERROR('This is bad', 14,1) END However, this value does not carry over to a secondary trigger for the same table. If you raise an error at the end of the first trigger and then look at @@ERROR in the secondary trigger the @@ERROR remains 0. Carrying Forward an Active/Open Transaction It is possible to exit from a trigger and carry forward an open transaction by issuing a BEGIN TRAN or by setting implicit transaction on and doing INSERT, UPDATE, or DELETE. Warning It is never recommended that a trigger call BEGIN TRANSACTION. By doing this you increment the transaction count. Invalid code logic, not calling commit transaction, can lead to a situation where the transaction count remains elevated upon exit of the trigger. Transaction Count The behavior is better explained by understanding how the server works. It does not matter whether you are in a transaction, when a modification takes place the transaction count is incremented. So, in the simplest form, during the processing of an insert the transaction count is 1. On completion of the insert, the server will commit (and thus decrement the transaction count). If the commit identifies the transaction count has returned to 0, the actual commit processing is completed. Issuing a commit when the transaction count is greater than 1 simply decrements the nested transaction counter. Thus, when we enter a trigger, the transaction count is 1. At the completion of the trigger, the transaction count will be 0 due to the commit issued at the end of the modification statement (insert). In our example, if the connection was already in a transaction and called the second INSERT, since implicit transaction is ON, the transaction count in the trigger will be 2 as long as the ROLLBACK is not executed. At the end of the insert, the commit is again issued to decrement the transaction reference count to 1. However, the value does not return to 0 so the transaction remains open/active. Subsequent triggers are only fired if the transaction count at the end of the trigger remains greater than or equal to 1. The key to continuation of secondary triggers and the batch is the transaction count at the end of a trigger execution. 
If the trigger that performs a rollback has done an explicit begin transaction or uses implicit transactions, subsequent triggers and the batch will continue. If the transaction count is not 1 or greater, subsequent triggers and the batch will not execute. Warning Forcing the transaction count after issuing a rollback is dangerous because you can easily loose track of your transaction nesting level. When performing an explicit rollback in a trigger, you should immediately issue a return statement to maintain consistent behavior between a connection with and without implicit transaction settings. This will force the trigger(s) and batch to terminate immediately. One of the methods of dealing with this issue is to run ‘SET IMPLICIT_TRANSACTIONS OFF’ as the first statement of any trigger. Other methods may entails checking @@TRANCOUNT at the end of the trigger and continue to COMMIT the transaction as long as @@TRANCOUNT is greater than 1. Examples The following examples are based on this table: create table tbl50000Insert (iID int NOT NULL) go Note If more than one trigger is used, to guarantee the trigger firing sequence, the sp_settriggerorder command should be used. This command is omitted in these examples to simplify the complexity of the statements. First Example In the first example, the second trigger was never fired and the batch, starting with the insert statement, was aborted. Thus, the print statement was never issued. print('Trigger issues rollback - cancels batch') go create trigger trg50000Insert on tbl50000Insert for INSERT as begin select 'Inserted', * from inserted rollback tran select 'End of trigger', @@TRANCOUNT as 'TRANCOUNT' end go create trigger trg50000Insert2 on tbl50000Insert for INSERT as begin select 'In Trigger2' select 'Trigger 2 Inserted', * from inserted end go insert into tbl50000Insert values(1) print('---------------------- In same batch') select * from tbl50000Insert go -- Cleanup drop trigger trg50000Insert drop trigger trg50000Insert2 go delete from tbl50000Insert Second Example The next example shows that since a new transaction is started, the second trigger will be fired and the print statement in the batch will be executed. Note that the insert is rolled back. print('Trigger issues rollback - increases tran count to continue batch') go create trigger trg50000Insert on tbl50000Insert for INSERT as begin select 'Inserted', * from inserted rollback tran begin tran end go create trigger trg50000Insert2 on tbl50000Insert for INSERT as begin select 'In Trigger2' select 'Trigger 2 Inserted', * from inserted end go insert into tbl50000Insert values(2) print('---------------------- In same batch') select * from tbl50000Insert go -- Cleanup drop trigger trg50000Insert drop trigger trg50000Insert2 go delete from tbl50000Insert Third Example In the third example, the raiserror statement is used to set the @@ERROR value and the BEGIN TRAN statement is used in the trigger to allow the batch to continue to run. 
print('Trigger issues rollback - uses raiserror to set @@ERROR') go create trigger trg50000Insert on tbl50000Insert for INSERT as begin select 'Inserted', * from inserted rollback tran begin tran -- Increase @@trancount to allow -- batch to continue select @@trancount as ‘Trancount’ raiserror('This is from the trigger', 14,1) end go insert into tbl50000Insert values(3) select @@ERROR as 'ERROR', @@TRANCOUNT as 'Trancount' go -- Cleanup drop trigger trg50000Insert go delete from tbl50000Insert Fourth Example For the fourth example, a second trigger is added to illustrate the fact that @@ERROR value set in the first trigger will not be seen in the second trigger nor will it show up in the batch after the second trigger is fired. print('Trigger issues rollback - uses raiserror to set @@ERROR, not seen in second trigger and cleared in batch') go create trigger trg50000Insert on tbl50000Insert for INSERT as begin select 'Inserted', * from inserted rollback begin tran -- Increase @@trancount to -- allow batch to continue select @@TRANCOUNT as 'Trancount' raiserror('This is from the trigger', 14,1) end go create trigger trg50000Insert2 on tbl50000Insert for INSERT as begin select @@ERROR as 'ERROR', @@TRANCOUNT as 'Trancount' end go insert into tbl50000Insert values(4) select @@ERROR as 'ERROR', @@TRANCOUNT as 'Trancount' go -- Cleanup drop trigger trg50000Insert drop trigger trg50000Insert2 go delete from tbl50000Insert Lesson 3: Concepts – Locks and Applications This lesson outlines some of the common causes that contribute to the perception of a slow server. What You Will Learn After completing this lesson, you will be able to:  Explain how lock hints are used and their impact.  Discuss the effect on locking when an application uses Microsoft Transaction Server.  Identify the different kinds of deadlocks including distributed deadlock. Recommended Reading  Charter 14 “Locking”, Inside SQL Server 2000 by Kalen Delaney  Charter 16 “Query Tuning”, Inside SQL Server 2000 by Kalen Delaney Q239753 – Deadlock Situation Not Detected by SQL Server Q288752 – Blocked SPID Not Participating in Deadlock May Incorrectly be Chosen as victim Locking Hints UPDLOCK If update locks are used instead of shared locks while reading a table, the locks are held until the end of the statement or transaction. UPDLOCK has the advantage of allowing you to read data (without blocking other readers) and update it later with the assurance that the data has not changed since you last read it. READPAST READPAST is an optimizer hint for use with SELECT statements. When this hint is used, SQL Server will read past locked rows. For example, assume table T1 contains a single integer column with the values of 1, 2, 3, 4, and 5. If transaction A changes the value of 3 to 8 but has not yet committed, a SELECT * FROM T1 (READPAST) yields values 1, 2, 4, 5. Tip READPAST only applies to transactions operating at READ COMMITTED isolation and only reads past row-level locks. This lock hint can be used to implement a work queue on a SQL Server table. For example, assume there are many external work requests being thrown into a table and they should be serviced in approximate insertion order but they do not have to be completely FIFO. If you have 4 worker threads consuming work items from the queue they could each pick up a record using read past locking and then delete the entry from the queue and commit when they're done. If they fail, they could rollback, leaving the entry on the queue for the next worker thread to pick up. 
Caution The READPAST hint is not compatible with HOLDLOCK.  Try This: Using Locking Hints 1. Open a Query Window and connect to the pubs database. 2. Execute the following statements (--Conn 1 is optional to help you keep track of each connection): BEGIN TRANSACTION -- Conn 1 UPDATE titles SET price = price * 0.9 WHERE title_id = 'BU1032' 3. Open a second connection and execute the following statements: SELECT @@lock_timeout -- Conn 2 GO SELECT * FROM titles SELECT * FROM authors 4. Open a third connection and execute the following statements: SET LOCK_TIMEOUT 0 -- Conn 3 SELECT * FROM titles SELECT * FROM authors 5. Open a fourth connection and execute the following statement: SELECT * FROM titles (READPAST) -- Conn 4 WHERE title_ID < 'C' SELECT * FROM authors How many records were returned? 3 6. Open a fifth connection and execute the following statement: SELECT * FROM titles (NOLOCK) -- Conn 5 WHERE title_ID 0 the lock manager also checks for deadlocks every time a SPID gets blocked. So a single deadlock will trigger 20 seconds of more immediate deadlock detection, but if no additional deadlocks occur in that 20 seconds, the lock manager no longer checks for deadlocks at each block and detection again only happens every 5 seconds. Although normally not needed, you may use trace flag -T1205 to trace the deadlock detection process. Note Please note the distinction between application lock and other locks’ deadlock detection. For application lock, we do not rollback the transaction of the deadlock victim but simply return a -3 to sp_getapplock, which the application needs to handle itself. Deadlock Resolution How is a deadlock resolved? SQL Server picks one of the connections as a deadlock victim. The victim is chosen based on either which is the least expensive transaction (calculated using the number and size of the log records) to roll back or in which process “SET DEADLOCK_PRIORITY LOW” is specified. The victim’s transaction is rolled back, held locks are released, and SQL Server sends error 1205 to the victim’s client application to notify it that it was chosen as a victim. The other process can then obtain access to the resource it was waiting on and continue. Error 1205: Your transaction (process ID #%d) was deadlocked with another process and has been chosen as the deadlock victim. Rerun your transaction. Symptoms of deadlocking Error 1205 usually is not written to the SQL Server errorlog. Unfortunately, you cannot use sp_altermessage to cause 1205 to be written to the errorlog. If the client application does not capture and display error 1205, some of the symptoms of deadlock occurring are:  Clients complain of mysteriously canceled queries when using certain features of an application.  May be accompanied by excessive blocking. Lock contention increases the chances that a deadlock will occur. Triggers and Deadlock Triggers promote the deadlock priority of the SPID for the life of the trigger execution when the DEADLOCK PRIORITY is not set to low. When a statement in a trigger causes a deadlock to occur, the SPID executing the trigger is given preferential treatment and will not become the victim. Warning Bug 235794 is filed against SQL Server 2000 where a blocked SPID that is not a participant of a deadlock may incorrectly be chosen as a deadlock victim if the SPID is blocked by one of the deadlock participants and the SPID has the least amount of transaction logging. 
See KB article Q288752: “Blocked Spid Not Participating in Deadlock May Incorrectly be Chosen as victim” for more information. Distributed Deadlock – Scenario 1 Distributed Deadlocks The term distributed deadlock is ambiguous. There are many types of distributed deadlocks. Scenario 1 Client application opens connection A, begins a transaction, acquires some locks, opens connection B, connection B gets blocked by A but the application is designed to not commit A’s transaction until B completes. Note SQL Server has no way of knowing that connection A is somehow dependent on B – they are two distinct connections with two distinct transactions. This situation is discussed in scenario #4 in “Q224453 INF: Understanding and Resolving SQL Server 7.0 Blocking Problems”. Distributed Deadlock – Scenario 2 Scenario 2 Distributed deadlock involving bound connections. Two connections can be bound into a single transaction context with sp_getbindtoken/sp_bindsession or via DTC. Spid 60 enlists in a transaction with spid 61. A third spid 62 is blocked by spid 60, but spid 61 is blocked by spid 62. Because they are doing work in the same transaction, spid 60 cannot commit until spid 61 finishes his work, but spid 61 is blocked by 62 who is blocked by 60. This scenario is described in article “Q239753 - Deadlock Situation Not Detected by SQL Server.” Note SQL Server 6.5 and 7.0 do not detect this deadlock. The SQL Server 2000 deadlock detection algorithm has been enhanced to detect this type of distributed deadlock. The diagram in the slide illustrates this situation. Resources locked by a spid are below that spid (in a box). Arrows indicate blocking and are drawn from the blocked spid to the resource that the spid requires. A circle represents a transaction; spids in the same transaction are shown in the same circle. Distributed Deadlock – Scenario 3 Scenario 3 Distributed deadlock involving linked servers or server-to-server RPC. Spid 60 on Server 1 executes a stored procedure on Server 2 via linked server. This stored procedure does a loopback linked server query against a table on Server 1, and this connection is blocked by a lock held by Spid 60. Note No version of SQL Server is currently designed to detect this distributed deadlock. Lesson 4: Information Collection and Analysis This lesson outlines some of the common causes that contribute to the perception of a slow server. What You Will Learn After completing this lesson, you will be able to:  Identify specific information needed for troubleshooting issues.  Locate and collect information needed for troubleshooting issues.  Analyze output of DBCC Inputbuffer, DBCC PSS, and DBCC Page commands.  Review information collected from master.dbo.sysprocesses table.  Review information collected from master.dbo.syslockinfo table.  Review output of sp_who, sp_who2, sp_lock.  Analyze Profiler log for query usage pattern.  Review output of trace flags to help troubleshoot deadlocks. Recommended Reading Q244455 - INF: Definition of Sysprocesses Waittype and Lastwaittype Fields Q244456 - INF: Description of DBCC PSS Command for SQL Server 7.0 Q271509 - INF: How to Monitor SQL Server 2000 Blocking Q251004 - How to Monitor SQL Server 7.0 Blocking Q224453 - Understanding and Resolving SQL Server 7.0 Blocking Problem Q282749 – BUG: Deadlock information reported with SQL Server 2000 Profiler Locking and Blocking  Try This: Examine Blocked Processes 1. Open a Query Window and connect to the pubs database. 
Execute the following statements:
BEGIN TRAN -- connection 1
UPDATE titles SET price = price + 1
2. Open another connection and execute the following statement:
SELECT * FROM titles -- connection 2
3. Open a third connection and execute sp_who; note the process id (spid) of the blocked process. (Connection 3)
4. In the same connection, execute the following:
SELECT spid, cmd, waittype FROM master..sysprocesses
WHERE waittype <> 0 -- connection 3
5. Do not close any of the connections!
What was the wait type of the blocked process?
 Try This: Look at locks held
Assumes all your connections are still open from the previous exercise.
• Execute sp_lock -- Connection 3
What locks is the process from the previous example holding?
Make sure you run ROLLBACK TRAN in Connection 1 to clean up your transaction.
Collecting Information
See Module 2 for more about how to gather this information using various tools.
Recognizing Blocking Problems
How to Recognize Blocking Problems
 Users complain about poor performance at a certain time of day, or after a certain number of users connect.
 SELECT * FROM sysprocesses or sp_who2 shows non-zero values in the blocked or BlkBy column.
 More severe blocking incidents will have long blocking chains or large sysprocesses.waittime values for blocked spids.
 Possible blocking chains can be explored by querying master..sysprocesses directly, as in the sketch that follows.
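The query below is a minimal sketch (not part of the module's scripts) that pairs each blocked spid with the spid directly blocking it, using only documented columns of master..sysprocesses:
-- List blocked spids together with the spid that is directly blocking them
select p1.spid as blocked_spid,
       p1.waittime as wait_ms,
       p1.lastwaittype,
       p2.spid as blocking_spid,
       p2.cmd as blocking_cmd,
       p2.loginame as blocking_login
from master..sysprocesses p1
join master..sysprocesses p2 on p1.blocked = p2.spid
where p1.blocked <> 0
Running DBCC INPUTBUFFER on the blocking spid then shows the last statement that connection submitted.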
Contents Overview 1 Lesson 1: Index Concepts 3 Lesson 2: Concepts – Statistics 29 Lesson 3: Concepts – Query Optimization 37 Lesson 4: Information Collection and Analysis 61 Lesson 5: Formulating and Implementing Resolution 75 Module 6: Troubleshooting Query Performance Overview At the end of this module, you will be able to:  Describe the different types of indexes and how indexes can be used to improve performance.  Describe what statistics are used for and how they can help in optimizing query performance.  Describe how queries are optimized.  Analyze the information collected from various tools.  Formulate resolution to query performance problems. Lesson 1: Index Concepts Indexes are the most useful tool for improving query performance. Without a useful index, Microsoft® SQL Server™ must search every row on every page in table to find the rows to return. With a multitable query, SQL Server must sometimes search a table multiple times so each page is scanned much more than once. Having useful indexes speeds up finding individual rows in a table, as well as finding the matching rows needed to join two tables. What You Will Learn After completing this lesson, you will be able to:  Understand the structure of SQL Server indexes.  Describe how SQL Server uses indexes to find rows.  Describe how fillfactor can impact the performance of data retrieval and insertion.  Describe the different types of fragmentation that can occur within an index. Recommended Reading  Chapter 8: “Indexes”, Inside SQL Server 2000 by Kalen Delaney  Chapter 11: “Batches, Stored Procedures and Functions”, Inside SQL Server 2000 by Kalen Delaney Finding Rows without Indexes With No Indexes, A Table Must Be Scanned SQL Server keeps track of which pages belong to a table or index by using IAM pages. If there is no clustered index, there is a sysindexes row for the table with an indid value of 0, and that row will keep track of the address of the first IAM for the table. The IAM is a giant bitmap, and every 1 bit indicates that the corresponding extent belongs to the table. The IAM allows SQL Server to do efficient prefetching of the table’s extents, but every row still must be examined. General Index Structure All SQL Server Indexes Are Organized As B-Trees Indexes in SQL Server store their information using standard B-trees. A B-tree provides fast access to data by searching on a key value of the index. B-trees cluster records with similar keys. The B stands for balanced, and balancing the tree is a core feature of a B-tree’s usefulness. The trees are managed, and branches are grafted as necessary, so that navigating down the tree to find a value and locate a specific record takes only a few page accesses. Because the trees are balanced, finding any record requires about the same amount of resources, and retrieval speed is consistent because the index has the same depth throughout. Clustered and Nonclustered Indexes Both Index Types Have Many Common Features An index consists of a tree with a root from which the navigation begins, possible intermediate index levels, and bottom-level leaf pages. You use the index to find the correct leaf page. The number of levels in an index will vary depending on the number of rows in the table and the size of the key column or columns for the index. If you create an index using a large key, fewer entries will fit on a page, so more pages (and possibly more levels) will be needed for the index. 
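As a quick illustration of the structures just described, the sysindexes information can be inspected directly. The member table is from the credit sample database used later in this module, and the index name is a made-up example:
-- One sysindexes row per heap or index: indid 0 = heap, 1 = clustered index,
-- 2 through 250 = nonclustered indexes; dpages and rowcnt give rough sizes
select name, indid, dpages, rowcnt
from sysindexes
where id = object_id('member')
-- Number of levels in a particular index ('ix_member_lastname' is hypothetical)
select indexproperty(object_id('member'), 'ix_member_lastname', 'IndexDepth') as IndexDepth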
On a qualified select, update, or delete, the correct leaf page will be the lowest page of the tree in which one or more rows with the specified key or keys reside. A qualified operation is one that affects only specific rows that satisfy the conditions of a WHERE clause, as opposed to accessing the whole table.
An index can have multiple node levels
An index page above the leaf is called a node page. Each index row in node pages contains an index key (or set of keys for a composite index) and a pointer to a page at the next level for which the first key value is the same as the key value in the current index row.
Leaf Level contains all key values
In any index, whether clustered or nonclustered, the leaf level contains every key value, in key sequence. In SQL Server 2000, the sequence can be either ascending or descending.
The sysindexes table contains all sizing, location and distribution information
Any information about the size of indexes or tables is stored in sysindexes. The only source of any storage location information is the sysindexes table, which keeps track of the address of the root page for every index, and the first IAM page for the index or table. There is also a column for the first page of the table, but this is not guaranteed to be reliable. SQL Server can find all pages belonging to an index or table by examining the IAM pages. Sysindexes contains a pointer to the first IAM page, and each IAM page contains a pointer to the next one.
The Difference between Clustered and Nonclustered Indexes
The main difference between the two types of indexes is how much information is stored at the leaf. The leaf levels of both types of indexes contain all the key values in order, but they also contain other information.
Clustered Indexes
The Leaf Level of a Clustered Index Is the Data
The leaf level of a clustered index contains the data pages, not just the index keys. Another way to say this is that the data itself is part of the clustered index. A clustered index keeps the data in a table ordered around the key. The data pages in the table are kept in a doubly linked list called the page chain. The order of pages in the page chain, and the order of rows on the data pages, is the order of the index key or keys. Deciding which key to cluster on is an important performance consideration. When the index is traversed to the leaf level, the data itself has been retrieved, not simply pointed to.
Uniqueness Is Maintained In Key Values
In SQL Server 2000, all clustered indexes are unique. If you build a clustered index without specifying the unique keyword, SQL Server forces uniqueness by adding a uniqueifier to the rows when necessary. This uniqueifier is a 4-byte value added as an additional sort key to only the rows that have duplicates of their primary sort key. You can see this extra value if you use DBCC PAGE to look at the actual index rows (see the section on index internals).
Finding Rows in a Clustered Index
The Leaf Level of a Clustered Index Contains the Data
A clustered index is like a telephone directory in which all of the rows for customers with the same last name are clustered together in the same part of the book. Just as the organization of a telephone directory makes it easy for a person to search, SQL Server quickly searches a table with a clustered index. Because a clustered index determines the sequence in which rows are stored in a table, there can only be one clustered index for a table at a time.
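To make this concrete, here is a hedged example. The member table is from the credit sample database used elsewhere in this module; the index name is invented, and the statement assumes member does not already have a clustered index:
-- Cluster the member table on lastname; the data pages themselves become
-- the leaf level of this index and are kept in lastname order
create clustered index ci_member_lastname on member (lastname)
-- A qualified select can now traverse the B-tree straight to the data pages
select lastname, firstname
from member
where lastname = 'Ota'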
Performance Considerations
Keeping your clustered key value small increases the number of index rows that can be placed on an index page and decreases the number of levels that must be traversed. This minimizes I/O. As we'll see, the clustered key is duplicated in every nonclustered index row, so keeping your clustered key small will allow more index rows to fit per page in all your indexes.
Note The query corresponding to the slide is:
SELECT lastname, firstname FROM member WHERE lastname = 'Ota'
Nonclustered Indexes
The Leaf Level of a Nonclustered Index Contains a Bookmark
A nonclustered index is like the index of a textbook. The data is stored in one place and the index is stored in another. Pointers indicate the storage location of the indexed items in the underlying table. In a nonclustered index, the leaf level contains each index key, plus a bookmark that tells SQL Server where to find the data row corresponding to the key in the index. A bookmark can take one of two forms:
 If the table has a clustered index, the bookmark is the clustered index key for the corresponding data row. This clustered key can consist of multiple columns if the clustered index is composite, and it also includes the uniqueifier if the clustered index is defined to be non-unique.
 If the table is a heap (in other words, it has no clustered index), the bookmark is a RID, which is an actual row locator in the form File#:Page#:Slot#.
Finding Rows with a NC Index on a Heap
Nonclustered Indexes Are Very Efficient When Searching For A Single Row
After the nonclustered key at the leaf level of the index is found, only one more page access is needed to find the data row. Searching for a single row using a nonclustered index is almost as efficient as searching for a single row in a clustered index. However, if we are searching for multiple rows, such as duplicate values, or keys in a range, anything more than a small number of rows will make the nonclustered index search very inefficient.
Note The query corresponding to the slide is:
SELECT lastname, firstname FROM member WHERE lastname BETWEEN 'Master' AND 'Rudd'
Finding Rows with a NC Index on a Clustered Table
A Clustered Key Is Used as the Bookmark for All Nonclustered Indexes
If the table has a clustered index, all columns of the clustered key will be duplicated in the nonclustered index leaf rows, unless there is overlap between the clustered and nonclustered key. For example, if the clustered index is on (lastname, firstname) and a nonclustered index is on firstname, the firstname value will not be duplicated in the nonclustered index leaf rows.
Note The query corresponding to the slide is:
SELECT lastname, firstname, phone FROM member WHERE firstname = 'Mike'
Covering Indexes
A Covering Index Provides the Fastest Data Access
A covering index contains ALL the fields accessed in the query. Normally, only the columns in the WHERE clause are helpful in determining useful indexes, but for a covering index, all columns must be included. If all columns needed for the query are in the index, SQL Server never needs to access the data pages. If even one column in the query is not part of the index, the data rows must be accessed. The leaf level of an index is the only level that contains every key value, or set of key values. For a clustered index, the leaf level is the data itself, so in reality, a clustered index ALWAYS covers any query. Nevertheless, for most of our optimization discussions, we only consider nonclustered indexes.
Scanning the leaf level of a nonclustered index is almost always faster than scanning a clustered index, so covering indexes are particularly valuable when we need ALL the key values of a particular nonclustered index.
Example: Select an aggregate value of a column that has a nonclustered index. Suppose we have a nonclustered index on price; this query is covered:
SELECT avg(price) from titles
Since the clustered key is included in every nonclustered index row, the clustered key can be included in the covering. Suppose you have a nonclustered index on price and a clustered index on title_id; then this query is covered:
SELECT title_id, price FROM titles WHERE price between 10 and 20
Performance Considerations
In general, you do want to keep your indexes narrow. However, if you have a critical query that just is not giving you satisfactory performance no matter what you do, you should consider creating an index to cover it, or adding one or two extra columns to an existing index, so that the query will be covered. The leaf level of a nonclustered index is like a 'mini' clustered index, so you can have most of the benefits of clustering, even if there already is another clustered index on the table. The tradeoff of adding more and wider indexes to cover queries is the added disk space, and more overhead for updating those columns that are now part of the index.
Bug
In general, SQL Server will detect when a query is covered, and detect the possible covering indexes. However, in some cases, you must force SQL Server to use a covering index by including a WHERE clause, even if the WHERE clause will return ALL the rows in the table. This is SHILOH bug #352079.
Steps to reproduce
1. Make a copy of the Orders table from Northwind:
USE Northwind
CREATE TABLE [NewOrders] (
    [OrderID] [int] NOT NULL ,
    [CustomerID] [nchar] (5) NULL ,
    [EmployeeID] [int] NULL ,
    [OrderDate] [datetime] NULL ,
    [RequiredDate] [datetime] NULL ,
    [ShippedDate] [datetime] NULL ,
    [ShipVia] [int] NULL ,
    [Freight] [money] NULL ,
    [ShipName] [nvarchar] (40) NULL,
    [ShipAddress] [nvarchar] (60) NULL,
    [ShipCity] [nvarchar] (15) NULL,
    [ShipRegion] [nvarchar] (15) NULL,
    [ShipPostalCode] [nvarchar] (10) NULL,
    [ShipCountry] [nvarchar] (15) NULL )
INSERT into NewOrders SELECT * FROM Orders
2. Build a nonclustered index on OrderDate:
create index dateindex on neworders(orderdate)
3. Test the query by looking at the query plan:
select orderdate from NewOrders
The index is being scanned, as expected.
4. Build an index on OrderID:
create index orderid_index on neworders(orderID)
5. Test the query by looking at the query plan:
select orderdate from NewOrders
Now the TABLE is being scanned, instead of the original index!
Index Intersection
Multiple Indexes Can Be Used On A Single Table
In versions prior to SQL Server 7, only one index could be used for any table to process any single query. The only exception was a query involving an OR. In current SQL Server versions, multiple nonclustered indexes can each be accessed, retrieving a set of keys with bookmarks, and then the result sets can be joined on the common bookmarks. The optimizer weighs the cost of performing the unindexed join on the intermediate result sets against the cost of only using one index and then scanning the entire result set from that single index.
Fillfactor and Performance
Creating an Index with a Low Fillfactor Delays Page Splits when Inserting
DBCC SHOWCONTIG will show you a low value for "Avg. Page Density" when a low fillfactor has been specified.
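As a hedged illustration (the index name is invented and 70 is an arbitrary choice), an index can be built to leave roughly 30 percent of each leaf page empty, and DBCC SHOWCONTIG then reports the lower page density:
-- Leave each leaf page about 70 percent full when the index is built
create index nc_member_firstname on member (firstname) with fillfactor = 70
-- "Avg. Page Density (full)" in the output will reflect the lower fill
dbcc showcontig ('member', 'nc_member_firstname')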
This is good for inserts and updates, because it will delay the need to split pages to make room for new rows. It can be bad for scans, because fewer rows will be on each page, and more pages must be read to access the same amount of data. However, this cost will be minimal if the scan density value is good. Index Reorganization DBCC SHOWCONTIG Provides Lots of Information Here’s some sample output from running a basic DBCC SHOWCONTIG on the order details table in the Northwind database: DBCC SHOWCONTIG scanning 'Order Details' table... Table: 'Order Details' (325576198); index ID: 1, database ID:6 TABLE level scan performed. - Pages Scanned................................: 9 - Extents Scanned..............................: 6 - Extent Switches..............................: 5 - Avg. Pages per Extent........................: 1.5 - Scan Density [Best Count:Actual Count].......: 33.33% [2:6] - Logical Scan Fragmentation ..................: 0.00% - Extent Scan Fragmentation ...................: 16.67% - Avg. Bytes Free per Page.....................: 673.2 - Avg. Page Density (full).....................: 91.68% By default, DBCC SHOWCONTIG scans the page chain at the leaf level of the specified index and keeps track of the following values:  Average number of bytes free on each page (Avg. Bytes Free per Page)  Number of pages accessed (Pages scanned)  Number of extents accessed (Extents scanned)  Number of times a page had a lower page number than the previous page in the scan (This value for Out of order pages is not displayed, but is used for additional computations.)  Number of times a page in the scan was on a different extent than the previous page in the scan (Extent switches) SQL Server also keeps track of all the extents that have been accessed, and then it determines how many gaps are in the used extents. An extent is identified by the page number of its first page. So, if extents 8, 16, 24, 32, and 40 make up an index, there are no gaps. If the extents are 8, 16, 24, and 40, there is one gap. The value in DBCC SHOWCONTIG’s output called Extent Scan Fragmentation is computed by dividing the number of gaps by the number of extents, so in this example the Extent Scan Fragmentation is ¼, or 25 percent. A table using extents 8, 24, 40, and 56 has three gaps, and its Extent Scan Fragmentation is ¾, or 75 percent. The maximum number of gaps is the number of extents - 1, so Extent Scan Fragmentation can never be 100 percent. The value in DBCC SHOWCONTIG’s output called Logical Scan Fragmentation is computed by dividing the number of Out of order pages by the number of pages in the table. This value is meaningless in a heap. You can use either the Extent Scan Fragmentation value or the Logical Scan Fragmentation value to determine the general level of fragmentation in a table. The lower the value, the less fragmentation there is. Alternatively, you can use the value called Scan Density, which is computed by dividing the optimum number of extent switches by the actual number of extent switches. A high value means that there is little fragmentation. Scan Density is not valid if the table spans multiple files; therefore, it is less useful than the other values. SQL Server 2000 allows online defragmentation You can choose from several methods for removing fragmentation from an index. You could rebuild the index and have SQL Server allocate all new contiguous pages for you. 
To rebuild the index, you can use a simple DROP INDEX and CREATE INDEX combination, but in many cases using these commands is less than optimal. In particular, if the index is supporting a constraint, you cannot use the DROP INDEX command. Alternatively, you can use DBCC DBREINDEX, which can rebuild all the indexes on a table in one operation, or you can use the drop_existing clause along with CREATE INDEX. The drawback of these methods is that the table is unavailable while SQL Server is rebuilding the index. When you are rebuilding only nonclustered indexes, SQL Server takes a shared lock on the table, which means that users cannot make modifications, but other processes can SELECT from the table. Of course, those SELECT queries cannot take advantage of the index you are rebuilding, so they might not perform as well as they would otherwise. If you are rebuilding a clustered index, SQL Server takes an exclusive lock and does not allow access to the table, so your data is temporarily unavailable. SQL Server 2000 lets you defragment an index without completely rebuilding it. DBCC INDEXDEFRAG reorders the leaf-level pages into physical order as well as logical order, but using only the pages that are already allocated to the leaf level. This command does an in-place ordering, which is similar to a sorting technique called bubble sort (you might be familiar with this technique if you've studied and compared various sorting algorithms). In-place ordering can reduce logical fragmentation to 2 percent or less, making an ordered scan through the leaf level much faster. DBCC INDEXDEFRAG also compacts the pages of an index, based on the original fillfactor. The pages will not always end up with the original fillfactor, but SQL Server uses that value as a goal. The defragmentation process attempts to leave at least enough space for one average-size row on each page. In addition, if SQL Server cannot obtain a lock on a page during the compaction phase of DBCC INDEXDEFRAG, it skips the page and does not return to it. Any empty pages created as a result of compaction are removed. The algorithm SQL Server 2000 uses for DBCC INDEXDEFRAG finds the next physical page in a file belonging to the index's leaf level and the next logical page in the leaf level to swap it with. To find the next physical page, the algorithm scans the IAM pages belonging to that index. In a database spanning multiple files, in which a table or index has pages on more than one file, SQL Server handles pages on different files separately. SQL Server finds the next logical page by scanning the index's leaf level. After each page move, SQL Server drops all locks and saves the last key on the last page it moved. The next iteration of the algorithm uses the last key to find the next logical page. This process lets other users update the table and index while DBCC INDEXDEFRAG is running. Let us look at an example in which an index's leaf level consists of the following pages in the following logical order: 47 22 83 32 12 90 64 The first key is on page 47, and the last key is on page 64. SQL Server would have to scan the pages in this order to retrieve the data in sorted order. As its first step, DBCC INDEXDEFRAG would find the first physical page, 12, and the first logical page, 47. It would then swap the pages, using a temporary buffer as a holding area. After the first swap, the leaf level would look like this: 12 22 83 32 47 90 64 The next physical page is 22, which is also the next logical page, so no work would be necessary. 
DBCC INDEXDEFRAG would then swap the next physical page, 32, with the next logical page, 83:
12 22 32 83 47 90 64
After the next swap of 47 with 83, the leaf level would look like this:
12 22 32 47 83 90 64
Then, the defragmentation process would swap 64 with 83:
12 22 32 47 64 90 83
and 83 with 90:
12 22 32 47 64 83 90
At the end of the DBCC INDEXDEFRAG operation, the pages in the table or index are not contiguous, but their logical order matches their physical order. Now, if the pages were accessed from disk in sorted order, the head would need to move in only one direction. Keep in mind that DBCC INDEXDEFRAG uses only pages that are already part of the index's leaf level; it allocates no new pages. In addition, defragmenting a large table can take quite a while, and you will get a report every 5 minutes about the estimated percentage completed. However, except for the locks on the pages being switched, this command needs no additional locks. All the table's other pages and indexes are fully available for your applications to use during the defragmentation process.
If you must completely rebuild an index because you want a new fillfactor, or if simple defragmentation is not enough because you want to remove all fragmentation from your indexes, another SQL Server 2000 improvement makes index rebuilding less of an imposition on the rest of the system. SQL Server 2000 lets you create an index in parallel—that is, using multiple processors—which drastically reduces the time necessary to perform the rebuild. The algorithm SQL Server 2000 uses allows near-linear scaling with the number of processors you use for the rebuild, so four processors will take only one-fourth the time that one processor requires to rebuild an index. System availability increases because the length of time that a table is unavailable decreases. Note that only the SQL Server 2000 Enterprise Edition supports parallel index creation.
Indexes on Views and Computed Columns
Building an Index Gives the Data Physical Existence
Normally, views are only logical and the rows comprising the view's data are not generated until the view is accessed. The values for computed columns are typically not stored anywhere in the database; only the definition for the computation is stored and the computation is redone every time a computed column is accessed. The first index on a view must be a clustered index, so that the leaf level can hold all the actual rows that make up the view. Once that clustered index has been built, and the view's data is now physical, additional (nonclustered) indexes can be built. An index on a computed column can be nonclustered, because all we need to store is the index key values.
Common Prerequisites for Indexed Views and Indexes on Computed Columns
In order for SQL Server to create or use these special indexes, you must have the seven SET options correctly specified: ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER, ANSI_NULLS, ANSI_PADDING, and ANSI_WARNINGS must all be ON; NUMERIC_ROUNDABORT must be OFF. Only deterministic expressions can be used in the definition of Indexed Views or indexes on Computed Columns. See the BOL for the list of deterministic functions and expressions. Property functions are available to check if a column or view meets the requirements and is indexable.
SELECT OBJECTPROPERTY (Object_id, ‘IsIndexable’) SELECT COLUMNPROPERTY (Object_id, column_name , ‘IsIndexable’ ) Schema Binding Guarantees That Object Definition Won’t Change A view can only be indexed if it has been built with schema binding. The SQL Server Optimizer Determines If the Indexed View Can Be Used The query must request a subset of the data contained in the view. The ability of the optimizer to use the indexed view even if the view is not directly referenced is available only in SQL Server 2000 Enterprise Edition. In Standard edition, you can create indexed views, and you can select directly from them, but the optimizer will not choose to use them if they are not directly referenced. Examples of Indexed Views: The best candidates for improvement by indexed views are queries performing aggregations and joins. We will explain how the useful indexed views may be created for these two major groups of queries. The considerations are valid also for queries and indexed views using both joins and aggregations. -- Example: USE Northwind -- Identify 5 products with overall biggest discount total. -- This may be expressed for example by two different queries: -- Q1. select TOP 5 ProductID, SUM(UnitPrice*Quantity)- SUM(UnitPrice*Quantity*(1.00-Discount)) Rebate from [order details] group by ProductID order by Rebate desc --Q2. select TOP 5 ProductID, SUM(UnitPrice*Quantity*Discount) Rebate from [order details] group by ProductID order by Rebate desc --The following indexed view will be used to execute Q1. create view Vdiscount1 with schemabinding as select SUM(UnitPrice*Quantity) SumPrice, SUM(UnitPrice*Quantity*(1.00-Discount)) SumDiscountPrice, COUNT_BIG(*) Count, ProductID from dbo.[order details] group By ProductID create unique clustered index VDiscountInd on Vdiscount1 (ProductID) However, it will not be used by the Q2 because the indexed view does not contain the SUM(UnitPrice*Quantity*Discount) aggregate. We can construct another indexed view create view Vdiscount2 with schemabinding as select SUM(UnitPrice*Quantity) SumPrice, SUM(UnitPrice*Quantity*(1.00-Discount)) SumDiscountPrice, SUM(UnitPrice*Quantity*Discount) SumDiscoutPrice2, COUNT_BIG(*) Count, ProductID from dbo.[order details] group By ProductID create unique clustered index VDiscountInd on Vdiscount2 (ProductID) This view may be used by both Q1 and Q2. Observe that the indexed view Vdiscount2 will have the same number of rows and only one more column compared to Vdiscount1, and it may be used by more queries. In general, try to design indexed views that may be used by more queries. The following query asking for the order with the largest total discount -- Q3. select TOP 3 OrderID, SUM(UnitPrice*Quantity*Discount) OrderRebate from dbo.[order details] group By OrderID Q3 can use neither of the Vdiscount views because the column OrderID is not included in the view definition. To address this variation of the discount analysis query we may create a different indexed view, similar to the query itself. An attempt to generalize the previous indexed view Vdiscount2 so that all three queries Q1, Q2, and Q3 can take advantage of a single indexed view would require a view with both OrderID and ProductID as grouping columns. Because the OrderID, ProductID combination is unique in the original order details table the resulting view would have as many rows as the original table and we would see no savings in using such view compared to using the original table. Consider the size of the resulting indexed view. 
In the case of pure aggregation, the indexed view may provide no significant performance gains if its size is close to the size of the original table. Complex aggregates (STDEV, VARIANCE, AVG) cannot participate in the index view definition. However, SQL Server may use an indexed view to execute a query containing AVG aggregate. Query containing STDEV or VARIANCE cannot use indexed view to pre-compute these values. The next example shows a query producing the average price for a particular product -- Q4. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from [order details] od, Products p where od.ProductID=p.ProductID group by ProductName, od.ProductID This is an example of indexed view that will be considered by the SQL Server to answer the Q4 create view v3 with schemabinding as select od.ProductID, SUM(od.UnitPrice*(1.00-Discount)) Price, COUNT_BIG(*) Count, SUM(od.Quantity) Units from dbo.[order details] od group by od.ProductID go create UNIQUE CLUSTERED index iv3 on v3 (ProductID) go Observe that the view definition does not contain the table Products. The indexed view does not need to contain all tables used in the query that uses the indexed view. In addition, the following query (same as above Q4 only with one additional search condition) will use the same indexed view. Observe that the added predicate references only columns from tables not present in the v3 view definition. -- Q5. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from [order details] od, Products p where od.ProductID=p.ProductID and p.ProductName like '%tofu%' group by ProductName, od.ProductID The following query cannot use the indexed view because the added search condition od.UnitPrice>10 contains a column from the table in the view definition and the column is neither grouping column nor the predicate appears in the view definition. -- Q6. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from [order details] od, Products p where od.ProductID=p.ProductID and od.UnitPrice>10 group by ProductName, od.ProductID To contrast the Q6 case, the following query will use the indexed view v3 since the added predicate is on the grouping column of the view v3. -- Q7. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from [order details] od, Products p where od.ProductID=p.ProductID and od.ProductID in (1,2,13,41) group by ProductName, od.ProductID -- The previous query Q6 will use the following indexed view V4: create view V4 with schemabinding as select ProductName, od.ProductID, SUM(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units, COUNT_BIG(*) Count from dbo.[order details] od, dbo.Products p where od.ProductID=p.ProductID and od.UnitPrice>10 group by ProductName, od.ProductID create unique clustered index VDiscountInd on V4 (ProductName, ProductID) The same index on the view V4 will be used also for a query where a join to the table Orders is added, for example -- Q8. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from dbo.[order details] od, dbo.Products p, dbo.Orders o where od.ProductID=p.ProductID and o.OrderID=od.OrderID and od.UnitPrice>10 group by ProductName, od.ProductID We will show several modifications of the query Q8 and explain why such modifications cannot use the above view V4. -- Q8a. 
select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from dbo.[order details] od, dbo.Products p, dbo.Orders o where od.ProductID=p.ProductID and o.OrderID=od.OrderID and od.UnitPrice>25 group by ProductName, od.ProductID 8a cannot use the indexed view because of the where clause mismatch. Observe that table Orders does not participate in the indexed view V4 definition. In spite of that, adding a predicate on this table will disallow using the indexed view because the added predicate may eliminate additional rows participating in the aggregates as it is shown in Q8b. -- Q8b. select ProductName, od.ProductID, AVG(od.UnitPrice*(1.00-Discount)) AvgPrice, SUM(od.Quantity) Units from dbo.[order details] od, dbo.Products p, dbo.Orders o where od.ProductID=p.ProductID and o.OrderID=od.OrderID and od.UnitPrice>10 and o.OrderDate>'01/01/1998' group by ProductName, od.ProductID Locking and Indexes In General, You Should Let SQL Server Control the Locking within Indexes The stored procedure sp_indexoption lets you manually control the unit of locking within an index. It also lets you disallow page locks or row locks within an index. Since these options are available only for indexes, there is no way to control the locking within the data pages of a heap. (But remember that if a table has a clustered index, the data pages are part of the index and are affected by the sp_indexoption setting.) The index options are set for each table or index individually. Two options, Allow Rowlocks and AllowPageLocks, are both set to TRUE initially for every table and index. If both of these options are set to FALSE for a table, only full table locks are allowed. As described in Module 4, SQL Server determines at runtime whether to initially lock rows, pages, or the entire table. The locking of rows (or keys) is heavily favored. The type of locking chosen is based on the number of rows and pages to be scanned, the number of rows on a page, the isolation level in effect, the update activity going on, the number of users on the system needing memory for their own purposes, and so on. SAP databases frequently use sp_indexoption to reduce deadlocks Setting vs. Querying In SQL Server 2000, the procedure sp_indexoption should only be used for setting an index option. To query an option, use the INDEXPROPERTY function. Lesson 2: Concepts – Statistics Statistics are the most important tool that the SQL Server query optimizer has to determine the ideal execution plan for a query. Statistics that are out of date or nonexistent seriously jeopardize query performance. SQL Server 2000 computes and stores statistics in a completely different format that all earlier versions of SQL Server. One of the improvements is an increased ability to determine which values are out of the normal range in terms of the number of occurrences. The new statistics maintenance routines are particularly good at determining when a key value has a very unusual skew of data. What You Will Learn After completing this lesson, you will be able to:  Define terms related to statistics collected by SQL Server.  Describe how statistics are maintained by SQL Server.  Discuss the autostats feature of SQL Server.  Describe how statistics are used in query optimization. Recommended Reading  Statistics Used by the Query Optimizer in Microsoft SQL Server 2000 http://msdn.microsoft.com/library/techart/statquery.htm Definitions Cardinality The cardinality means how many unique values exist in the data. 
Density For each index and set of column statistics, SQL Server keeps track of details about the uniqueness (or density) of the data values encountered, which provides a measure of how selective the index is. A unique index, of course, has the lowest density —by definition, each index entry can point to only one row. A unique index has a density value of 1/number of rows in the table. Density values range from 0 through 1. Highly selective indexes have density values of 0.10 or lower. For example, a unique index on a table with 8345 rows has a density of 0.00012 (1/8345). If a nonunique nonclustered index has a density of 0.2165 on the same table, each index key can be expected to point to about 1807 rows (0.2165 × 8345). This is probably not selective enough to be more efficient than just scanning the table, so this index is probably not useful. Because driving the query from a nonclustered index means that the pages must be retrieved in index order, an estimated 1807 data page accesses (or logical reads) are needed if there is no clustered index on the table and the leaf level of the index contains the actual RID of the desired data row. The only time a data page doesn’t need to be reaccessed is when the occasional coincidence occurs in which two adjacent index entries happen to point to the same data page. In general, you can think of density as the average number of duplicates. We can also talk about the term ‘join density’, which applies to the average number of duplicates in the foreign key column. This would answer the question: in this one-to-many relationship, how many is ‘many’? Selectivity In general selectivity applies to a particular data value referenced in a WHERE clause. High selectivity means that only a small percentage of the rows satisfy the WHERE clause filter, and a low selectivity means that many rows will satisfy the filter. For example, in an employees table, the column employee_id is probably very selective, and the column gender is probably not very selective at all. Statistics Statistics are a histogram consisting of an even sampling of values for a column or for an index key (or the first column of the key for a composite index) based on the current data. The histogram is stored in the statblob field of the sysindexes table, which is of type image. (Remember that image data is actually stored in structures separate from the data row itself. The data row merely contains a pointer to the image data. For simplicity’s sake, we’ll talk about the index statistics as being stored in the image field called statblob.) To fully estimate the usefulness of an index, the optimizer also needs to know the number of pages in the table or index; this information is stored in the dpages column of sysindexes. During the second phase of query optimization, index selection, the query optimizer determines whether an index exists for a columns in your WHERE clause, assesses the index’s usefulness by determining the selectivity of the clause (that is, how many rows will be returned), and estimates the cost of finding the qualifying rows. Statistics for a single column index consist of one histogram and one density value. The multicolumn statistics for one set of columns in a composite index consist of one histogram for the first column in the index and density values for each prefix combination of columns (including the first column alone). The fact that density information is kept for all columns helps the optimizer decide how useful the index is for joins. 
Suppose, for example, that an index is composed of three key fields. The density on the first column might be 0.50, which is not too useful. However, as you look at more key columns in the index, the number of rows pointed to is fewer than (or in the worst case, the same as) the first column, so the density value goes down. If you are looking at both the first and second columns, the density might be 0.25, which is somewhat better. Moreover, if you examine three columns, the density might be 0.03, which is highly selective. It does not make sense to refer to the density of only the second column. The lead column density is always needed.
Statistics Maintenance
Statistics Information Tracks the Distribution of Key Values
SQL Server statistics are basically a histogram that contains up to 200 values of a given key column. In addition to the histogram, the statblob field contains the following information:
 The time of the last statistics collection
 The number of rows used to produce the histogram and density information
 The average key length
 Densities for other combinations of columns
In the statblob column, up to 200 sample values are stored; the range of key values between each sample value is called a step. The sample value is the endpoint of the range. Three values are stored along with each step: a value called EQ_ROWS, which is the number of rows that have a value equal to that sample value; a value called RANGE_ROWS, which specifies how many other values are inside the range (between two adjacent sample values); and the number of distinct values, or RANGE_DENSITY, of the range.
DBCC SHOW_STATISTICS
The DBCC SHOW_STATISTICS output shows us the first two of these three values, but not the range density. The RANGE_DENSITY is instead used to compute two additional values:
 DISTINCT_RANGE_ROWS—the number of distinct rows inside this range (not counting the RANGE_HI_KEY value itself). This is computed as 1/RANGE_DENSITY.
 AVG_RANGE_ROWS—the average number of rows per distinct value, computed as RANGE_DENSITY * RANGE_ROWS.
In addition to statistics on indexes, SQL Server can also keep track of statistics on columns with no indexes. Knowing the density, or the likelihood of a particular value occurring, can help the optimizer determine an optimum processing strategy, even if SQL Server can't use an index to actually locate the values.
Statistics on Columns
Column statistics can be useful for two main purposes:
 When the SQL Server optimizer is determining the optimal join order, it frequently is best to have the smaller input processed first. By 'input' we mean the table after all filters in the WHERE clause have been applied. Even if there is no useful index on a column in the WHERE clause, statistics could tell us that only a few rows will qualify, and thus the resulting input will be very small.
 The SQL Server query optimizer can use column statistics on non-initial columns in a composite nonclustered index to determine if scanning the leaf level to obtain the bookmarks will be an efficient processing strategy. For example, in the member table in the credit database, the first name column is almost unique. Suppose we have a nonclustered index on (lastname, firstname), and we issue this query:
select * from member where firstname = 'MPRO'
In this case, statistics on the firstname column would indicate very few rows satisfying this condition, so the optimizer will choose to scan the nonclustered index, since it is smaller than the clustered index (the table).
The small number of bookmarks will then be followed to retrieve the actual data. Manually Updating Statistics You can also manually force statistics to be updated in one of two ways. You can run the UPDATE STATISTICS command on a table or on one specific index or column statistics, or you can also execute the procedure sp_updatestats, which runs UPDATE STATISTICS against all user-defined tables in the current database. You can create statistics on unindexed columns using the CREATE STATISTICS command or by executing sp_createstats, which creates single-column statistics for all eligible columns for all user tables in the current database. This includes all columns except computed columns and columns of the ntext, text, or image datatypes, and columns that already have statistics or are the first column of an index. Autostats By Default SQL Server Will Update Statistics on Any Index or Column as Needed Every database is created with the database options auto create statistics and auto update statistics set to true, but you can turn either one off. You can also turn off automatic updating of statistics for a specific table in one of two ways:  UPDATE STATISTICS In addition to updating the statistics, the option WITH NORECOMPUTE indicates that the statistics should not be automatically recomputed in the future. Running UPDATE STATISTICS again without the WITH NORECOMPUTE option enables automatic updates.  sp_autostats This procedure sets or unsets a flag for a table to indicate that statistics should or should not be updated automatically. You can also use this procedure with only the table name to find out whether the table is set to automatically have its index statistics updated. ' However, setting the database option auto update statistics to FALSE overrides any individual table settings. In other words, no automatic updating of statistics takes place. This is not a recommended practice unless thorough testing has shown you that you do not need the automatic updates or that the performance overhead is more than you can afford. Trace Flags Trace flag 205 – reports recompile due to autostats. Trace flag 8721 – writes information to the errorlog when AutoStats has been run. For more information, see the following Knowledge Base article: Q195565 “INF: How SQL Server 7.0 Autostats Work.” Statistics and Performance The Performance Penalty of NOT Having Up-To-Date Statistics Far Outweighs the Benefit of Avoiding Automatic Updating Autostats should be turned off only after thorough testing shows it to be necessary. Because autostats only forces a recompile after a certain number or percentage of rows has been changed, you do not have to make any adjustments for a read-only database. Lesson 3: Concepts – Query Optimization What You Will Learn After completing this lesson, you will be able to:  Describe the phases of query optimization.  Discuss how SQL Server estimates the selectivity of indexes and column and how this estimate is used in query optimization. Recommended Reading  Chapter 15: “The Query Processor”, Inside SQL Server 2000 by Kalen Delaney  Chapter 16: “Query Tuning”, Inside SQL Server 2000 by Kalen Delaney  Whitepaper about SQL Server Query Processor Architecture by Hal Berenson and Kalen Delaney http://msdn.microsoft.com/library/backgrnd/html/sqlquerproc.htm Phases of Query Optimization Query Optimization Involves several phases Trivial Plan Optimization Optimization itself goes through several steps. The first step is something called Trivial Plan Optimization. 
The whole idea of trivial plan optimization is that cost based optimization is a bit expensive to run. The optimizer can try a great many possible variations trying to find the cheapest plan. If SQL Server knows that there is only one really viable plan for a query, it could avoid a lot of work. A prime example is a query that consists of an INSERT with a VALUES clause. There is only one possible plan. Another example is a SELECT where all the columns are in a unique covering index, and that index is the only one that is useable. There is no other index that has that set of columns in it. These two examples are cases where SQL Server should just generate the plan and not try to find something better. The trivial plan optimizer finds the really obvious plans, which are typically very inexpensive. In fact, all the plans that get through the autoparameterization template result in plans that the trivial plan optimizer can find. Between those two mechanisms, the plans that are simple tend to be weeded out earlier in the process and do not pay a lot of the compilation cost. This is a good thing, because the number of potential plans in 7.0 went up astronomically as SQL Server added hash joins, merge joins and index intersections, to its list of processing techniques. Simplification and Statistics Loading If a plan is not found by the trivial plan optimizer, SQL Server can perform some simplifications, usually thought of as syntactic transformations of the query itself, looking for commutative properties and operations that can be rearranged. SQL Server can do constant folding, and other operations that do not require looking at the cost or analyzing what indexes are, but that can result in a more efficient query. SQL Server then loads up the metadata including the statistics information on the indexes, and then the optimizer goes through a series of phases of cost based optimization. Cost Based Optimization Phases The cost based optimizer is designed as a set of transformation rules that try various permutations of indexes and join strategies. Because of the number of potential plans in SQL Server 7.0 and SQL Server 2000, if the optimizer just ran through all the combinations and produced a plan, the optimization process would take a very long time to run. Therefore, optimization is broken up into phases. Each phase is a set of rules. After each phase is run, the cost of any resulting plan is examined, and if SQL Server determines that the plan is cheap enough, that plan is kept and executed. If the plan is not cheap enough, the optimizer runs the next phase, which is another set of rules. In the vast majority of cases, a good plan will be found in the preliminary phases. Typically, if the plan that a query would have had in SQL Server 6.5 is also the optimal plan in SQL Server 7.0 and SQL Server 2000, the plan will tend to be found either by the trivial plan optimizer or by the first phase of the cost based optimizer. The rules were intentionally organized to try to make that be true. The plan will probably consist of using a single index and using nested loops. However, every once in a while, because of lack of statistical information, or some other nuance, the optimizer will have to proceed with the later phases of optimization. Sometimes this is because there is a real possibility that the optimizer could find a better plan. When a plan is found, it becomes the optimizer’s output, and then SQL Server goes through all the caching mechanisms that we have already discussed in Module 5. 
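One way to see which plan the optimizer ultimately produced is to display the plan without executing the statement. The query below is only an illustrative choice; any statement can be substituted:
-- SHOWPLAN_TEXT must be the only statement in its batch
SET SHOWPLAN_TEXT ON
GO
-- The estimated plan is returned instead of the result set
SELECT lastname, firstname FROM member WHERE lastname = 'Ota'
GO
SET SHOWPLAN_TEXT OFF
GO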
Full Optimization
At some point, the optimizer determines that it has gone through enough preliminary phases, and it reverts to a phase called full optimization. If the optimizer goes through all the preliminary phases, and still has not found a cheap plan, it examines the cost for the plan that it has so far. If the cost is above the threshold, the optimizer goes into a phase called full optimization. This threshold is configurable, as the configuration option 'cost threshold for parallelism'. The full optimization phase assumes that this plan should be run in parallel. If the machine is very busy, the plan will end up running in serial, but the optimizer's goal is to produce a good parallel plan. If the cost is below the threshold (or this is a single-processor machine), the full optimization phase just uses a brute force method to find a serial plan.
Selectivity Estimation
Selectivity Is One of The Most Important Pieces of Information
One of the most important things the optimizer needs to know is the number of rows from any table that will meet all the conditions in the query. If there are no restrictions on a table, and all the rows will be needed, the optimizer can determine the number of rows from the sysindexes table. This number is not absolutely guaranteed to be accurate, but it is the number the optimizer uses. If there is a filter on the table in a WHERE clause, the optimizer needs statistics information. Indexes automatically maintain statistics, and the optimizer will use these values to determine the usefulness of the index. If there is no index on the column involved in the filter, then column statistics can be used or generated.
Optimizing Search Arguments
In General, the Filters in the WHERE Clause Determine Which Indexes Will Be Useful
If an indexed column is referenced in a Search Argument (SARG), the optimizer will analyze the cost of using that index. A SARG has one of the forms:
 column operator value
 value operator column
The operator must be one of =, >, >=, <, <=. The value can be a constant, an operation, or a variable. Some functions also will be treated as SARGs.
These queries have SARGs, and a nonclustered index on firstname will be used in most cases:
select * from member where firstname < 'AKKG'
select * from member where firstname = substring('HAAKGALSFJA', 2,5)
select * from member where firstname = 'AA' + 'KG'
declare @name char(4)
set @name = 'AKKG'
select * from member where firstname < @name
Not all functions can be used in SARGs.
select * from charge where charge_amt < 2*2
select * from charge where charge_amt < sqrt(16)
Compare these queries to ones using = instead of <. With =, the optimizer can use the density information to come up with a good row estimate, even if it's not going to actually perform the function's calculations.
A filter with a variable is usually a SARG. The issue is whether the optimizer can come up with useful costing information.
A filter with a variable is not a SARG if the variable is of a different datatype and the column must be converted to the variable's datatype. For more information, see Knowledge Base article Q198625.
Use credit
go
CREATE TABLE [member2] (
  [member_no] [smallint] NOT NULL ,
  [lastname] [shortstring] NOT NULL ,
  [firstname] [shortstring] NOT NULL ,
  [middleinitial] [letter] NULL ,
  [street] [shortstring] NOT NULL ,
  [city] [shortstring] NOT NULL ,
  [state_prov] [statecode] NOT NULL ,
  [country] [countrycode] NOT NULL ,
  [mail_code] [mailcode] NOT NULL
)
GO
insert into member2
select member_no, lastname, firstname, middleinitial, street, city, state_prov, country, mail_code
from member
alter table member2 add constraint pk_member2 primary key clustered
(lastname, member_no, firstname, country)
declare @id int
set @id = 47
update member2
set city = city + ' City', state_prov = state_prov + ' State'
where lastname = 'Barr'
and member_no = @id
and firstname = 'URQYJBFVRRPWKVW'
and country = 'USA'
These queries don't have SARGs, and a table scan will be done:
select * from member where substring(lastname, 1,2) = 'BA'
Some non-SARGs can be converted:
select * from member where lastname like 'ba%'
In some cases, you can rewrite your query to turn a non-SARG into a SARG; for example, the substring query above can be rewritten as the LIKE query that follows it.
Join Order and Types of Joins
Join Order and Strategy Is Determined By the Optimizer
The execution plan output displays the join order from top to bottom; that is, the table listed on top is the first one accessed in the join. You can override the optimizer's join order decision in two ways:
  • OPTION (FORCE ORDER) applies to one query
  • SET FORCEPLAN ON applies to the entire session, until set OFF
If either of these options is used, the join order is determined by the order in which the tables are listed in the query's FROM clause, and no join order optimization is done. Forcing the join order may also force a particular join strategy. For example, in most outer join operations, the outer table is processed first and a nested loops join is done. However, if you force the inner table to be accessed first, a merge join will need to be done. Compare the query plan for this query with and without the FORCE ORDER hint (a runnable comparison is sketched after this section):
select * from titles
right join publishers on titles.pub_id = publishers.pub_id
-- OPTION (FORCE ORDER)
Nested Loop Join
A nested iteration is when the query optimizer constructs a set of nested loops, and the result set grows as it progresses through the rows. The query optimizer performs the following steps:
1. Finds a row from the first table.
2. Uses that row to scan the next table.
3. Uses the result of the previous table to scan the next table.
Evaluating Join Combinations
The query optimizer automatically evaluates at least four or more possible join combinations, even if those combinations are not specified in the join predicate. You do not have to add redundant clauses. The query optimizer balances the cost and uses statistics to determine the number of join combinations that it evaluates. Evaluating every possible join combination is inefficient and costly.
Evaluating Cost of Query Performance
When the query optimizer performs a nested loop join, you should be aware that certain costs are incurred. Nested loop joins are far superior to both merge joins and hash joins when executing small transactions, such as those affecting only a small set of rows.
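Before looking at each join strategy in detail, the join order hints described above can be tried directly. This sketch assumes the pubs sample database (titles and publishers), as in the example above; the plan shapes you get will depend on the indexes and statistics on your server.

USE pubs
GO
SET SHOWPLAN_TEXT ON
GO
-- Optimizer chooses the join order and strategy.
SELECT * FROM titles
RIGHT JOIN publishers ON titles.pub_id = publishers.pub_id
-- Join order forced to follow the FROM clause; the join strategy may change as a result.
SELECT * FROM titles
RIGHT JOIN publishers ON titles.pub_id = publishers.pub_id
OPTION (FORCE ORDER)
GO
SET SHOWPLAN_TEXT OFF
GO
-- The session-level alternative is SET FORCEPLAN ON ... SET FORCEPLAN OFF around the query.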
The query optimizer:
  • Uses nested loop joins if the outer input is quite small and the inner input is indexed and quite large.
  • Uses the smaller input as the outer table.
  • Requires that a useful index exist on the join predicate for the inner table.
  • Always uses a nested loop join strategy if the join operation uses an operator other than an equality operator.
Merge Joins
The columns of the join conditions are used as inputs to process a merge join. SQL Server performs the following steps when using a merge join strategy:
1. Gets the first input values from each input set.
2. Compares input values.
3. Performs a merge algorithm.
   • If the input values are equal, the rows are returned.
   • If the input values are not equal, the lower value is discarded, and the next input value from that input is used for the next comparison.
4. Repeats the process until all of the rows from one of the input sets have been processed.
5. Evaluates any remaining search conditions in the query and returns only rows that qualify.
Note Only one pass per input is done. The merge join operation ends after all of the input values of one input have been evaluated. The remaining values from the other input are not processed.
Requires That Joined Columns Are Sorted
If you execute a query with join operations, and the joined columns are in sorted order, the query optimizer processes the query by using a merge join strategy. A merge join is very efficient because the columns are already sorted, and it requires less page I/O.
Evaluates Sorted Values
For the query optimizer to use the merge join, the inputs must be sorted. The query optimizer evaluates sorted values in the following order:
1. Uses an existing index tree (most typical). The query optimizer can use the index tree from a clustered index or a covered nonclustered index.
2. Leverages sort operations that the GROUP BY, ORDER BY, and CUBE clauses use. The sorting operation only has to be performed once.
3. Performs its own sort operation in which a SORT operator is displayed when graphically viewing the execution plan. The query optimizer does this very rarely.
Performance Considerations
Consider the following facts about the query optimizer's use of the merge join:
  • SQL Server performs a merge join for all types of join operations (except cross join or full join operations), including UNION operations.
  • A merge join operation may be a one-to-one, one-to-many, or many-to-many operation. If the merge join is a many-to-many operation, SQL Server uses a temporary table to store the rows. If duplicate values from each input exist, one of the inputs rewinds to the start of the duplicates as each duplicate value from the other input is processed.
  • Query performance for a merge join is very fast, but the cost can be high if the query optimizer must perform its own sort operation. If the data volume is large and the desired data can be obtained presorted from existing Balanced-Tree (B-Tree) indexes, merge join is often the fastest join algorithm.
  • A merge join is typically used if the two join inputs have a large amount of data and are sorted on their join columns (for example, if the join inputs were obtained by scanning sorted indexes).
  • Merge join operations can only be performed with an equality operator in the join predicate.
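To compare the optimizer's own choice with a merge join, a join hint can be used for experimentation. This is a sketch against the member and charge tables of the sample credit database; hints like this are for investigation only, since they override the optimizer's costing.

USE credit
GO
SET STATISTICS PROFILE ON
GO
-- Let the optimizer choose the join strategy.
SELECT m.member_no, c.charge_amt
FROM member m
JOIN charge c ON m.member_no = c.member_no
-- Force a merge join for comparison; if an input is not already sorted on
-- member_no, the plan will show an extra SORT operator feeding the merge.
SELECT m.member_no, c.charge_amt
FROM member m
INNER MERGE JOIN charge c ON m.member_no = c.member_no
GO
SET STATISTICS PROFILE OFF
GO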
Hash Joins
Hashing is a strategy for dividing data into equal sets of a manageable size based on a given property or characteristic. The grouped data can then be used to determine whether a particular data item matches an existing value.
Note Duplicate data or ranges of data are not useful for hash joins because the data is not organized together or in order.
When a Hash Join Is Used
The query optimizer uses a hash join option when it estimates that it is more efficient than processing queries by using a nested loop or merge join. It typically uses a hash join when an index does not exist or when existing indexes are not useful.
Assigns a Build and Probe Input
The query optimizer assigns a build and probe input. If the query optimizer incorrectly assigns the build and probe input (this may occur because of imprecise density estimates), it reverses them dynamically. The ability to change input roles dynamically is called role reversal.
Build input consists of the column values from the table with the lowest number of rows. Build input creates a hash table in memory to store these values. The hash bucket is a storage place in the hash table in which each row of the build input is inserted. Rows from one of the join tables are placed into the hash bucket where the hash key value of the row matches the hash key value of the bucket. Hash buckets are stored as a linked list and only contain the columns that are needed for the query. A hash table contains hash buckets. The hash table is created from the build input.
Probe input consists of the column values from the table with the most rows. Probe input is what the build input checks to find a match in the hash buckets.
Note The query optimizer uses column or index statistics to help determine which input is the smaller of the two.
Processing a Hash Join
The following list is a simplified description of how the query optimizer processes a hash join. It is not intended to be comprehensive because the algorithm is very complex. SQL Server:
1. Reads the probe input. Each probe input is processed one row at a time.
2. Performs the hash algorithm against each probe input and generates a hash key value.
3. Finds the hash bucket that matches the hash key value.
4. Accesses the hash bucket and looks for the matching row.
5. Returns the row if a match is found.
Performance Considerations
Consider the following facts about the hash joins that the query optimizer uses:
  • Similar to merge joins, a hash join is very efficient, because it uses hash buckets, which are like a dynamic index but with less overhead for combining rows.
  • Hash joins can be performed for all types of join operations (except cross join operations), including UNION and DIFFERENCE operations.
  • A hash operator can remove duplicates and group data, such as SUM (salary) GROUP BY department. The query optimizer uses only one input for both the build and probe roles.
  • If join inputs are large and are of similar size, the performance of a hash join operation is similar to a merge join with prior sorting. However, if the size of the join inputs is significantly different, the performance of a hash join is often much faster.
  • Hash joins can process large, unsorted, non-indexed inputs efficiently. Hash joins are useful in complex queries because the intermediate results:
    • Are not indexed (unless explicitly saved to disk and then indexed).
    • Are often not sorted for the next operation in the execution plan.
  • The query optimizer can identify incorrect estimates and make corrections dynamically to process the query more efficiently.
  • A hash join reduces the need for database denormalization. Denormalization is typically used to achieve better performance by reducing join operations, despite the drawbacks of redundancy, such as inconsistent updates. Hash joins give you the option to vertically partition your data as part of your physical database design. Vertical partitioning represents groups of columns from a single table in separate files or indexes.
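As with the merge join above, a hint can be used to compare the optimizer's chosen strategy against a forced hash join. Again this assumes the credit sample tables; in real tuning work such hints are normally used only to investigate a plan, not left in production code.

USE credit
GO
SET SHOWPLAN_TEXT ON
GO
-- Optimizer's choice.
SELECT m.member_no, c.charge_amt
FROM member m
JOIN charge c ON m.member_no = c.member_no
-- Forced hash join for comparison; the smaller input should normally be chosen
-- as the build input and the larger one as the probe input.
SELECT m.member_no, c.charge_amt
FROM member m
INNER HASH JOIN charge c ON m.member_no = c.member_no
GO
SET SHOWPLAN_TEXT OFF
GO
-- OPTION (HASH JOIN) at the end of the query applies the hint to all joins in the query.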
Subquery Performance
Joins Are Not Inherently Better Than Subqueries
Here is an example showing three different ways to update a table, using a second table for lookup purposes. The first uses a JOIN with the update, the second uses a regular subquery introduced with IN, and the third uses a correlated subquery. All three yield nearly identical performance.
Note Note that performance comparisons cannot just be made based on I/Os. With HASHING and MERGING techniques, the number of reads may be the same for two queries, yet one may take a lot longer and use more memory resources. Also, always be sure to monitor statistics time.
Suppose you want to add a 5 percent discount to order items in the Order Details table for which the supplier is Exotic Liquids, whose supplierid is 1.
-- JOIN solution
BEGIN TRAN
UPDATE OD SET discount = discount + 0.05
FROM [Order Details] AS OD
JOIN Products AS P ON OD.productid = P.productid
WHERE supplierid = 1
ROLLBACK TRAN
-- Regular subquery solution
BEGIN TRAN
UPDATE [Order Details]
SET discount = discount + 0.05
WHERE productid IN (SELECT productid FROM Products WHERE supplierid = 1)
ROLLBACK TRAN
-- Correlated subquery solution
BEGIN TRAN
UPDATE [Order Details]
SET discount = discount + 0.05
WHERE EXISTS (SELECT supplierid FROM Products
WHERE [Order Details].productid = Products.productid AND supplierid = 1)
ROLLBACK TRAN
Internally, Your Join May Be Rewritten
SQL Server's query processor has many different ways of resolving your JOIN expressions. Subqueries may be converted to a JOIN with an implied distinct, which may result in a logical operator of SEMI JOIN. Compare the plans of the first two queries:
USE credit
select member_no from member
where member_no in (select member_no from charge)
select distinct m.member_no
from member m
join charge c on m.member_no = c.member_no
The second query uses a HASH MATCH as the final step to remove the duplicates. The first query only had to do a semi join. For these queries, although the I/O values are the same, the first query (with the subquery) runs much faster (almost twice as fast). Another similar looking join is
A project model for the FreeBSD Project Niklas Saers Copyright © 2002-2005 Niklas Saers [ Split HTML / Single HTML ] Table of Contents Foreword 1 Overview 2 Definitions 2.1. Activity 2.2. Process 2.3. Hat 2.4. Outcome 2.5. FreeBSD 3 Organisational structure 4 Methodology model 4.1. Development model 4.2. Release branches 4.3. Model summary 5 Hats 5.1. General Hats 5.1.1. Contributor 5.1.2. Committer 5.1.3. Core Team 5.1.4. Maintainership 5.2. Official Hats 5.2.1. Documentation project manager 5.2.2. CVSup Mirror Site Coordinator 5.2.3. Internationalisation 5.2.4. Postmaster 5.2.5. Quality Assurance 5.2.6. Release Coordination 5.2.7. Public Relations & Corporate Liaison 5.2.8. Security Officer 5.2.9. Source Repository Manager 5.2.10. Election Manager 5.2.11. Web site Management 5.2.12. Ports Manager 5.2.13. Standards 5.2.14. Core Secretary 5.2.15. GNATS Administrator 5.2.16. Bugmeister 5.2.17. Donations Liaison Officer 5.2.18. Admin 5.3. Process dependent hats 5.3.1. Report originator 5.3.2. Bugbuster 5.3.3. Mentor 5.3.4. Vendor 5.3.5. Reviewers 5.3.6. CVSup Mirror Site Admin 6 Processes 6.1. Adding new and removing old committers 6.2. Adding/Removing an official CVSup Mirror 6.3. Committing code 6.4. Core election 6.5. Development of new features 6.6. Maintenance 6.7. Problem reporting 6.8. Reacting to misbehaviour 6.9. Release engineering 7 Tools 7.1. Concurrent Versions System (CVS) 7.2. CVSup 7.3. GNATS 7.4. Mailman 7.5. Perforce 7.6. Pretty Good Privacy 7.7. Secure Shell 8 Sub-projects 8.1. The Ports Subproject 8.2. The FreeBSD Documentation Project References List of Figures 3-1. The FreeBSD Project's structure 3-2. The FreeBSD Project's structure with committers in categories 4-1. Jørgenssen's model for change integration 4-2. The FreeBSD release tree 4-3. The overall development model 5-1. Overview of official hats 6-1. Process summary: adding a new committer 6-2. Process summary: removing a committer 6-3. Process summary: adding a CVSup mirror 6-4. Process summary: A committer commits code 6-5. Process summary: A contributor commits code 6-6. Process summary: Core elections 6-7. Jørgenssen's model for change integration 6-8. Process summary: problem reporting 6-9. Process summary: release engineering 8-1. Number of ports added between 1996 and 2005 Foreword Up until now, the FreeBSD project has released a number of described techniques to do different parts of work. However, a project model summarising how the project is structured is needed because of the increasing amount of project members. [1] This paper will provide such a project model and is donated to the FreeBSD Documentation project where it can evolve together with the project so that it can at any point in time reflect the way the project works. It is based on [Saers, 2003]. I would like to thank the following people for taking the time to explain things that were unclear to me and for proofreading the document. Andrey A. Chernov Bruce A. Mah Dag-Erling Smørgrav Giorgos Keramidas Ingvil Hovig Jesper Holck John Baldwin John Polstra Kirk McKusick Mark Linimon Marleen Devos Niels Jørgenssen Nik Clayton Poul-Henning Kamp Simon L. Nielsen Chapter 1 Overview A project model is a means to reduce the communications overhead in a project. As shown by [Brooks, 1995], increasing the number of project participants increases the communication in the project exponentionally. FreeBSD has during the past few year increased both its mass of active users and committers, and the communication in the project has risen accordingly. 
This project model will serve to reduce this overhead by providing an up-to-date description of the project. During the Core elections in 2002, Mark Murray stated “I am opposed to a long rule-book, as that satisfies lawyer-tendencies, and is counter to the technocentricity that the project so badly needs.” [FreeBSD, 2002B]. This project model is not meant to be a tool to justify creating impositions for developers, but as a tool to facilitate coordination. It is meant as a description of the project, with an overview of how the different processes are executed. It is an introduction to how the FreeBSD project works. The FreeBSD project model will be described as of July 1st, 2004. It is based on the Niels Jørgensen's paper [Jørgensen, 2001], FreeBSD's official documents, discussions on FreeBSD mailing lists and interviews with developers. After providing definitions of terms used, this document will outline the organisational structure (including role descriptions and communication lines), discuss the methodology model and after presenting the tools used for process control, it will present the defined processes. Finally it will outline major sub-projects of the FreeBSD project. [FreeBSD, 2002A, Section 1.2 and 1.3] give the vision and the architectural guidelines for the project. The vision is “To produce the best UNIX-like operating system package possible, with due respect to the original software tools ideology as well as usability, performance and stability.” The architectural guidelines help determine whether a problem that someone wants to be solved is within the scope of the project Chapter 2 Definitions 2.1. Activity An “activity” is an element of work performed during the course of a project [PMI, 2000]. It has an output and leads towards an outcome. Such an output can either be an input to another activity or a part of the process' delivery. 2.2. Process A “process” is a series of activities that lead towards a particular outcome. A process can consist of one or more sub-processes. An example of a process is software design. 2.3. Hat A “hat” is synonymous with role. A hat has certain responsibilities in a process and for the process outcome. The hat executes activities. It is well defined what issues the hat should be contacted about by the project members and people outside the project. 2.4. Outcome An “outcome” is the final output of the process. This is synonymous with deliverable, that is defined as “any measurable, tangible, verifiable outcome, result or item that must be produced to complete a project or part of a project. Often used more narrowly in reference to an external deliverable, which is a deliverable that is subject to approval by the project sponsor or customer” by [PMI, 2000]. Examples of outcomes are a piece of software, a decision made or a report written. 2.5. FreeBSD When saying “FreeBSD” we will mean the BSD derivative UNIX-like operating system FreeBSD, whereas when saying “the FreeBSD Project” we will mean the project organisation. Chapter 3 Organisational structure While no-one takes ownership of FreeBSD, the FreeBSD organisation is divided into core, committers and contributors and is part of the FreeBSD community that lives around it. Figure 3-1. The FreeBSD Project's structure Number of committers has been determined by going through CVS logs from January 1st, 2004 to December 31st, 2004 and contributors by going through the list of contributions and problem reports. 
The main resource in the FreeBSD community is its developers: the committers and contributors. It is with their contributions that the project can move forward. Regular developers are referred to as contributors. As by January 1st, 2003, there are an estimated 5500 contributors on the project. Committers are developers with the privilege of being able to commit changes. These are usually the most active developers who are willing to spend their time not only integrating their own code but integrating code submitted by the developers who do not have this privilege. They are also the developers who elect the core team, and they have access to closed discussions. The project can be grouped into four distinct separate parts, and most developers will focus their involvement in one part of FreeBSD. The four parts are kernel development, userland development, ports and documentation. When referring to the base system, both kernel and userland is meant. This split changes our triangle to look like this: Figure 3-2. The FreeBSD Project's structure with committers in categories Number of committers per area has been determined by going through CVS logs from January 1st, 2004 to December 31st, 2004. Note that many committers work in multiple areas, making the total number higher than the real number of committers. The total number of committers at that time was 269. Committers fall into three groups: committers who are only concerned with one area of the project (for instance file systems), committers who are involved only with one sub-project and committers who commit to different parts of the code, including sub-projects. Because some committers work on different parts, the total number in the committers section of the triangle is higher than in the above triangle. The kernel is the main building block of FreeBSD. While the userland applications are protected against faults in other userland applications, the entire system is vulnerable to errors in the kernel. This, combined with the vast amount of dependencies in the kernel and that it is not easy to see all the consequences of a kernel change, demands developers with a relative full understanding of the kernel. Multiple development efforts in the kernel also requires a closer coordination than userland applications do. The core utilities, known as userland, provide the interface that identifies FreeBSD, both user interface, shared libraries and external interfaces to connecting clients. Currently, 162 people are involved in userland development and maintenance, many being maintainers for their own part of the code. Maintainership will be discussed in the Maintainership section. Documentation is handled by The FreeBSD Documentation Project and includes all documents surrounding the FreeBSD project, including the web pages. There were during 2004 101 people making commits to the FreeBSD Documentation Project. Ports is the collection of meta-data that is needed to make software packages build correctly on FreeBSD. An example of a port is the port for the web-browser Mozilla. It contains information about where to fetch the source, what patches to apply and how, and how the package should be installed on the system. This allows automated tools to fetch, build and install the package. As of this writing, there are more than 12600 ports available. [2] , ranging from web servers to games, programming languages and most of the application types that are in use on modern computers. Ports will be discussed further in the section The Ports Subproject. 
Chapter 4 Methodology model 4.1. Development model There is no defined model for how people write code in FreeBSD. However, Niels Jørgenssen has suggested a model of how written code is integrated into the project. Figure 4-1. Jørgenssen's model for change integration The “development release” is the FreeBSD-CURRENT ("-CURRENT") branch and the “production release” is the FreeBSD-STABLE branch ("-STABLE") [Jørgensen, 2001]. This is a model for one change, and shows that after coding, developers seek community review and try integrating it with their own systems. After integrating the change into the development release, called FreeBSD-CURRENT, it is tested by many users and developers in the FreeBSD community. After it has gone through enough testing, it is merged into the production release, called FreeBSD-STABLE. Unless each stage is finished successfully, the developer needs to go back and make modifications in the code and restart the process. To integrate a change with either -CURRENT or -STABLE is called making a commit. Jørgensen found that most FreeBSD developers work individually, meaning that this model is used in parallel by many developers on the different ongoing development efforts. A developer can also be working on multiple changes, so that while he is waiting for review or people to test one or more of his changes, he may be writing another change. As each commit represents an increment, this is a massively incremental model. The commits are in fact so frequent that during one year [3] , 85427 commits were made, making a daily average of 233 commits. Within the “code” bracket in Jørgensen's figure, each programmer has his own working style and follows his own development models. The bracket could very well have been called “development” as it includes requirements gathering and analysis, system and detailed design, implementation and verification. However, the only output from these stages is the source code or system documentation. From a stepwise model's perspective (such as the waterfall model), the other brackets can be seen as further verification and system integration. This system integration is also important to see if a change is accepted by the community. Up until the code is committed, the developer is free to choose how much to communicate about it to the rest of the project. In order for -CURRENT to work as a buffer (so that bright ideas that had some undiscovered drawbacks can be backed out) the minimum time a commit should be in -CURRENT before merging it to -STABLE is 3 days. Such a merge is referred to as an MFC (Merge From Current). It is important to notice the word “change”. Most commits do not contain radical new features, but are maintenance updates. The only exceptions from this model are security fixes and changes to features that are deprecated in the -CURRENT branch. In these cases, changes can be committed directly to the -STABLE branch. In addition to many people working on the project, there are many related projects to the FreeBSD Project. These are either projects developing brand new features, sub-projects or projects whose outcome is incorporated into FreeBSD [4]. These projects fit into the FreeBSD Project just like regular development efforts: they produce code that is integrated with the FreeBSD Project. However, some of them (like Ports and Documentation) have the privilege of being applicable to both branches or commit directly to both -CURRENT and -STABLE. 
There are no standards for how design should be done, nor is design collected in a centralised repository. The main design is that of 4.4BSD. [5] As design is a part of the “Code” bracket in Jørgenssen's model, it is up to every developer or sub-project how this should be done. Even if the design were stored in a central repository, the output from the design stages would be of limited use, as the differences in methodologies would make them poorly interoperable, if at all. For the overall design of the project, the project relies on the sub-projects to negotiate fitting interfaces between each other rather than dictating interfacing.
4.2. Release branches
The releases of FreeBSD are best illustrated by a tree with many branches, where each major branch represents a major version. Minor versions are represented by branches of the major branches. In the following release tree, arrows that follow one another in a particular direction represent a branch. Boxes with full lines and diamonds represent official releases. Boxes with dotted lines represent the development branch at that time. Security branches are represented by ovals. Diamonds differ from boxes in that they represent a fork, meaning a place where a branch splits into two branches where one of the branches becomes a sub-branch. For example, at 4.0-RELEASE the 4.0-CURRENT branch split into 4-STABLE and 5.0-CURRENT. At 4.5-RELEASE, the branch forked off a security branch called RELENG_4_5.
Figure 4-2. The FreeBSD release tree
The latest -CURRENT version is always referred to as -CURRENT, while the latest -STABLE release is always referred to as -STABLE. In this figure, -STABLE refers to 4-STABLE while -CURRENT refers to 5.0-CURRENT following 5.0-RELEASE. [FreeBSD, 2002E]
A “major release” is always made from the -CURRENT branch. However, the -CURRENT branch does not need to fork at that point in time, but can focus on stabilising. An example of this is that following 3.0-RELEASE, 3.1-RELEASE was also a continuation of the -CURRENT branch, and -CURRENT did not become a true development branch until this version was released and the 3-STABLE branch was forked. When -CURRENT returns to being a development branch, it can only be followed by a major release. 5-STABLE is predicted to be forked off 5.0-CURRENT at around 5.3-RELEASE. It is not until 5-STABLE is forked that the development branch will be branded 6.0-CURRENT.
A “minor release” is made from the -CURRENT branch following a major release, or from the -STABLE branch. Following and including 4.3-RELEASE [6], when a minor release has been made, it becomes a “security branch”. This is meant for organisations that do not want to follow the -STABLE branch and the potential new/changed features it offers, but instead require an absolutely stable environment, only updating to implement security updates. [7]
Each update to a security branch is called a “patchlevel”. For every security enhancement that is done, the patchlevel number is increased, making it easy for people tracking the branch to see what security enhancements they have implemented. In cases where there have been especially serious security flaws, an entire new release can be made from a security branch. An example of this is 4.6.2-RELEASE.
4.3. Model summary
To summarise, the development model of FreeBSD can be seen as the following tree:
Figure 4-3. The overall development model
The tree of the FreeBSD development with ongoing development efforts and continuous integration.
The tree symbolises the release versions, with major versions spawning new main branches and minor versions being versions of the main branch. The top branch is the -CURRENT branch where all new development is integrated, and the -STABLE branch is the branch directly below it. Clouds of development efforts hang over the project, where developers use the development models they see fit. The product of their work is then integrated into -CURRENT, where it undergoes parallel debugging and is finally merged from -CURRENT into -STABLE. Security fixes are merged from -STABLE to the security branches.
Chapter 5 Hats
Many committers have a special area of responsibility. These roles are called hats [Losh, 2002]. These hats can be either project roles, such as public relations officer, or maintainer for a certain area of the code. Because this is a project where people give voluntarily of their spare time, people with assigned hats are not always available. They must therefore appoint a deputy that can perform the hat's role in his or her absence. The other option is to have the role held by a group.
Many of these hats are not formalised. Formalised hats have a charter stating the exact purpose of the hat along with its privileges and responsibilities. The writing of such charters is a new part of the project and has thus yet to be completed for all hats. These hat descriptions are not such a formalisation, but rather a summary of the role with links to the charter where available and contact addresses.
5.1. General Hats
5.1.1. Contributor
A Contributor contributes to the FreeBSD project either as a developer, as an author, by sending problem reports, or in other ways contributing to the progress of the project. A contributor has no special privileges in the FreeBSD project. [FreeBSD, 2002F]
5.1.2. Committer
A person who has the required privileges to add his code or documentation to the repository. A committer has made a commit within the past 12 months. [FreeBSD, 2000A] An active committer is a committer who has made an average of one commit per month during that time.
It is worth noting that there are no technical barriers to prevent someone, once having gained commit privileges to the main project or a sub-project, from making commits in parts of that project's source the committer did not specifically get permission to modify. However, when wanting to make modifications to parts a committer has not been involved in before, he/she should read the logs to see what has happened in this area before, and also read the MAINTAINER file to see if the maintainer of this part has any special requests on how changes in the code should be made.
5.1.3. Core Team
The core team is elected by the committers from the pool of committers and serves as the board of directors of the FreeBSD project. It promotes active contributors to committers, assigns people to well-defined hats, and is the final arbiter of decisions involving which way the project should be heading. As of July 1st, 2004, core consisted of 9 members. Elections are held every two years.
5.1.4. Maintainership
Maintainership means that the maintainer is responsible for what is allowed to go into that area of the code and has the final say should disagreements over the code occur. This involves proactive work aimed at stimulating contributions and reactive work in reviewing commits. With the FreeBSD source comes the MAINTAINERS file that contains a one-line summary of how each maintainer would like contributions to be made.
Having this notice and contact information enables developers to focus on the development effort rather than being stuck in a slow correspondence should the maintainer be unavailable for some time. If the maintainer is unavailable for an unreasonably long period of time, and other people do a significant amount of work, maintainership may be switched without the maintainer's approval. This is based on the stance that maintainership should be demonstrated, not declared. Maintainership of a particular piece of code is a hat that is not held as a group. 5.2. Official Hats The official hats in the FreeBSD Project are hats that are more or less formalised and mainly administrative roles. They have the authority and responsibility for their area. The following illustration shows the responsibility lines. After this follows a description of each hat, including who it is held by. Figure 5-1. Overview of official hats All boxes consist of groups of committers, except for the dotted boxes where the holders are not necessarily committers. The flattened circles are sub-projects and consist of both committers and non-committers of the main project. 5.2.1. Documentation project manager The FreeBSD Documentation Project architect is responsible for defining and following up documentation goals for the committers in the Documentation project. Hat held by: The DocEng team . The DocEng Charter. 5.2.2. CVSup Mirror Site Coordinator The CVSup Mirror Site Coordinator coordinates all the CVSup Mirror Site Admins to ensure that they are distributing current versions of the software, that they have the capacity to update themselves when major updates are in progress, and making it easy for the general public to find their closest CVSup mirror. Hat currently held by: John Polstra . 5.2.3. Internationalisation The Internationalisation hat is responsible for coordinating the localisation efforts of the FreeBSD kernel and userland utilities. The translation effort are coordinated by The FreeBSD Documentation Project. The Internationalisation hat should suggest and promote standards and guidelines for writing and maintaining the software in a fashion that makes it easier to translate. Hat currently available. 5.2.4. Postmaster The Postmaster is responsible for mail being correctly delivered to the committers' email address. He is also responsible for ensuring that the mailing lists work and should take measures against possible disruptions of mail such as having troll-, spam- and virus-filters. Hat currently held by: David Wolfskill . 5.2.5. Quality Assurance The responsibilities of this role are to manage the quality assurance measures. Hat currently held by: Robert Watson . 5.2.6. Release Coordination The responsibilities of the Release Engineering Team are Setting, publishing and following a release schedule for official releases Documenting and formalising release engineering procedures Creation and maintenance of code branches Coordinating with the Ports and Documentation teams to have an updated set of packages and documentation released with the new releases Coordinating with the Security team so that pending releases are not affected by recently disclosed vulnerabilities. Further information about the development process is available in the release engineering section. Hat held by: the Release Engineering team , currently headed by Murray Stokely . The Release Engineering Charter. 5.2.7. 
Public Relations & Corporate Liaison The Public Relations & Corporate Liaison's responsibilities are: Making press statements when happenings that are important to the FreeBSD Project happen. Being the official contact person for corporations that are working close with the FreeBSD Project. Take steps to promote FreeBSD within both the Open Source community and the corporate world. Handle the “freebsd-advocacy” mailing list. This hat is currently not occupied. 5.2.8. Security Officer The Security Officer's main responsibility is to coordinate information exchange with others in the security community and in the FreeBSD project. The Security Officer is also responsible for taking action when security problems are reported and promoting proactive development behaviour when it comes to security. Because of the fear that information about vulnerabilities may leak out to people with malicious intent before a patch is available, only the Security Officer, consisting of an officer, a deputy and two Core team members, receive sensitive information about security issues. However, to create or implement a patch, the Security Officer has the Security Officer Team to help do the work. Hat held by: the Security Officer , currently headed by Colin Percival . The Security Officer and The Security Officer Team's charter. 5.2.9. Source Repository Manager The Source Repository Manager is the only one who is allowed to directly modify the repository without using the CVS tool. It is his/her responsibility to ensure that technical problems that arise in the repository are resolved quickly. The source repository manager has the authority to back out commits if this is necessary to resolve a CVS technical problem. Hat held by: the Source Repository Manager , currently headed by Peter Wemm . 5.2.10. Election Manager The Election Manager is responsible for the Core election process. The manager is responsible for running and maintaining the election system, and is the final authority should minor unforseen events happen in the election process. Major unforseen events have to be discussed with the Core team Hat held only during elections. 5.2.11. Web site Management The Web site Management hat is responsible for coordinating the rollout of updated web pages on mirrors around the world, for the overall structure of the primary web site and the system it is running upon. The management needs to coordinate the content with The FreeBSD Documentation Project and acts as maintainer for the “www” tree. Hat held by: the FreeBSD Webmasters . 5.2.12. Ports Manager The Ports Manager acts as a liaison between The Ports Subproject and the core project, and all requests from the project should go to the ports manager. Hat held by: the Ports Management Team , 5.2.13. Standards The Standards hat is responsible for ensuring that FreeBSD complies with the standards it is committed to , keeping up to date on the development of these standards and notifying FreeBSD developers of important changes that allows them to take a proactive role and decrease the time between a standards update and FreeBSD's compliancy. Hat currently held by: Garrett Wollman . 5.2.14. Core Secretary The Core Secretary's main responsibility is to write drafts to and publish the final Core Reports. The secretary also keeps the core agenda, thus ensuring that no balls are dropped unresolved. Hat currently held by: Joel Dahl . 5.2.15. 
GNATS Administrator The GNATS Administrator is responsible for ensuring that the maintenance database is in working order, that the entries are correctly categorised and that there are no invalid entries. Hat currently held by: Ceri Davies and Mark Linimon . 5.2.16. Bugmeister The Bugmeister is the person in charge of the problem report group. Hat currently held by: Ceri Davies and Mark Linimon . 5.2.17. Donations Liaison Officer The task of the donations liason officer is to match the developers with needs with people or organisations willing to make a donation. The Donations Liason Charter is available here Hat held by: the Donations Liaison Office , currently headed by Michael W. Lucas . 5.2.18. Admin (Also called “FreeBSD Cluster Admin”) The admin team consists of the people responsible for administrating the computers that the project relies on for its distributed work and communication to be synchronised. It consists mainly of those people who have physical access to the servers. Hat held by: the Admin team , currently headed by Mark Murray 5.3. Process dependent hats 5.3.1. Report originator The person originally responsible for filing a Problem Report. 5.3.2. Bugbuster A person who will either find the right person to solve the problem, or close the PR if it is a duplicate or otherwise not an interesting one. 5.3.3. Mentor A mentor is a committer who takes it upon him/her to introduce a new committer to the project, both in terms of ensuring the new committers setup is valid, that the new committer knows the available tools required in his/her work and that the new committer knows what is expected of him/her in terms of behaviour. 5.3.4. Vendor The person(s) or organisation whom external code comes from and whom patches are sent to. 5.3.5. Reviewers People on the mailing list where the request for review is posted. 5.3.6. CVSup Mirror Site Admin A CVSup Mirror Site Admin has accesses to a server that he/she uses to mirror the CVS repository. The admin works with the CVSup Mirror Site Coordinator to ensure the site remains up-to-date and is following the general policy of official mirror sites. Chapter 6 Processes The following section will describe the defined project processes. Issues that are not handled by these processes happen on an ad-hoc basis based on what has been customary to do in similar cases. 6.1. Adding new and removing old committers The Core team has the responsibility of giving and removing commit privileges to contributors. This can only be done through a vote on the core mailing list. The ports and documentation sub-projects can give commit privileges to people working on these projects, but have to date not removed such privileges. Normally a contributor is recommended to core by a committer. For contributors or outsiders to contact core asking to be a committer is not well thought of and is usually rejected. If the area of particular interest for the developer potentially overlaps with other committers' area of maintainership, the opinion of those maintainers is sought. However, it is frequently this committer that recommends the developer. When a contributor is given committer status, he is assigned a mentor. The committer who recommended the new committer will, in the general case, take it upon himself to be the new committers mentor. 
When a contributor is given his commit bit, a PGP-signed email is sent from either Core Secretary, Ports Manager or nik@freebsd.org to both admins@freebsd.org, the assigned mentor, the new committer and core confirming the approval of a new account. The mentor then gathers a password line, SSH 2 public key and PGP key from the new committer and sends them to Admin. When the new account is created, the mentor activates the commit bit and guides the new committer through the rest of the initial process. Figure 6-1. Process summary: adding a new committer When a contributor sends a piece of code, the receiving committer may choose to recommend that the contributor is given commit privileges. If he recommends this to core, they will vote on this recommendation. If they vote in favour, a mentor is assigned the new committer and the new committer has to email his details to the administrators for an account to be created. After this, the new committer is all set to make his first commit. By tradition, this is by adding his name to the committers list. Recall that a committer is considered to be someone who has committed code during the past 12 months. However, it is not until after 18 months of inactivity have passed that commit privileges are eligible to be revoked. [FreeBSD, 2002H] There are, however, no automatic procedures for doing this. For reactions concerning commit privileges not triggered by time, see section 1.5.8. Figure 6-2. Process summary: removing a committer When Core decides to clean up the committers list, they check who has not made a commit for the past 18 months. Committers who have not done so have their commit bits revoked. It is also possible for committers to request that their commit bit be retired if for some reason they are no longer going to be actively committing to the project. In this case, it can also be restored at a later time by core, should the committer ask. Roles in this process: Core team Contributor Committer Maintainership Mentor [FreeBSD, 2000A] [FreeBSD, 2002H] [FreeBSD, 2002I] 6.2. Adding/Removing an official CVSup Mirror A CVSup mirror is a replica of the official CVSup master that contains all the up-to-date source code for all the branches in the FreeBSD project, ports and documentation. Adding an official CVSup mirror starts with the potential CVSup Mirror Site Admin installing the “cvsup-mirror” package. Having done this and updated the source code with a mirror site, he now runs a fairly recent unofficial CVSup mirror. Deciding he has a stable environment, the processing power, the network capacity and the storage capacity to run an official mirror, he mails the CVSup Mirror Site Coordinator who decides whether the mirror should become an official mirror or not. In making this decision, the CVSup Mirror Site Coordinator has to determine whether that geographical area needs another mirror site, if the mirror administrator has the skills to run it reliably, if the network bandwidth is adequate and if the master server has the capacity to server another mirror. If CVSup Mirror Site Coordinator decides that the mirror should become an official mirror, he obtains an authentication key from the mirror admin that he installs so the mirror admin can update the mirror from the master server. Figure 6-3. Process summary: adding a CVSup mirror When a CVSup mirror administrator of an unofficial mirror offers to become an official mirror site, the CVSup coordinator decides if another mirror is needed and if there is sufficient capacity to accommodate it. 
If so, an authorisation key is requested and the mirror is given access to the main distribution site and added to the list of official mirrors. Tools used in this process: CVSup SSH 2 Hats involved in this process: CVSup Mirror Site Coordinator CVSup Mirror Site Admin 6.3. Committing code The committing of new or modified code is one of the most frequent processes in the FreeBSD project and will usually happen many times a day. Committing of code can only be done by a “committer”. Committers commit either code written by themselves, code submitted to them or code submitted through a problem report. When code is written by the developer that is non-trivial, he should seek a code review from the community. This is done by sending mail to the relevant list asking for review. Before submitting the code for review, he should ensure it compiles correctly with the entire tree and that all relevant tests run. This is called “pre-commit test”. When contributed code is received, it should be reviewed by the committer and tested the same way. When a change is committed to a part of the source that has been contributed from an outside Vendor, the maintainer should ensure that the patch is contributed back to the vendor. This is in line with the open source philosophy and makes it easier to stay in sync with outside projects as the patches do not have to be reapplied every time a new release is made. After the code has been available for review and no further changes are necessary, the code is committed into the development branch, -CURRENT. If the change applies for the -STABLE branch or the other branches as well, a “Merge From Current” ("MFC") countdown is set by the committer. After the number of days the committer chose when setting the MFC have passed, an email will automatically be sent to the committer reminding him to commit it to the -STABLE branch (and possibly security branches as well). Only security critical changes should be merged to security branches. Delaying the commit to -STABLE and other branches allows for “parallel debugging” where the committed code is tested on a wide range of configurations. This makes changes to -STABLE to contain fewer faults and thus giving the branch its name. Figure 6-4. Process summary: A committer commits code When a committer has written a piece of code and wants to commit it, he first needs to determine if it is trivial enough to go in without prior review or if it should first be reviewed by the developer community. If the code is trivial or has been reviewed and the committer is not the maintainer, he should consult the maintainer before proceeding. If the code is contributed by an outside vendor, the maintainer should create a patch that is sent back to the vendor. The code is then committed and the deployed by the users. Should they find problems with the code, this will be reported and the committer can go back to writing a patch. If a vendor is affected, he can choose to implement or ignore the patch. Figure 6-5. Process summary: A contributor commits code The difference when a contributor makes a code contribution is that he submits the code through the send-pr program. This report is picked up by the maintainer who reviews the code and commits it. Hats included in this process are: Committer Contributor Vendor Reviewer [FreeBSD, 2001] [Jørgensen, 2001] 6.4. Core election Core elections are held at least every two years. [8] Nine core members are elected. New elections are held if the number of core members drops below seven. 
New elections can also be held should at least 1/3 of the active committers demand this. When an election is to take place, core announces this at least 6 weeks in advance, and appoints an election manager to run the elections. Only committers can be elected into core. The candidates need to submit their candidacy at least one week before the election starts, but can refine their statements until the voting starts. They are presented in the candidates list. When writing their election statements, the candidates must answer a few standard questions submitted by the election manager. During elections, the rule that a committer must have committed during the 12 past months is followed strictly. Only these committers are eligible to vote. When voting, the committer may vote once in support of up to nine nominees. The voting is done over a period of four weeks with reminders being posted on “developers” mailing list that is available to all committers. The election results are released one week after the election ends, and the new core team takes office one week after the results have been posted. Should there be a voting tie, this will be resolved by the new, unambiguously elected core members. Votes and candidate statements are archived, but the archives are not publicly available. Figure 6-6. Process summary: Core elections Core announces the election and selects an election manager. He prepares the elections, and when ready, candidates can announce their candidacies through submitting their statements. The committers then vote. After the vote is over, the election results are announced and the new core team takes office. Hats in core elections are: Core team Committer Election Manager [FreeBSD, 2000A] [FreeBSD, 2002B] [FreeBSD, 2002G] 6.5. Development of new features Within the project there are sub-projects that are working on new features. These projects are generally done by one person [Jørgensen, 2001]. Every project is free to organise development as it sees fit. However, when the project is merged to the -CURRENT branch it must follow the project guidelines. When the code has been well tested in the -CURRENT branch and deemed stable enough and relevant to the -STABLE branch, it is merged to the -STABLE branch. The requirements of the project are given by developer wishes, requests from the community in terms of direct requests by mail, Problem Reports, commercial funding for the development of features, or contributions by the scientific community. The wishes that come within the responsibility of a developer are given to that developer who prioritises his time between the request and his wishes. A common way to do this is maintain a TODO-list maintained by the project. Items that do not come within someone's responsibility are collected on TODO-lists unless someone volunteers to take the responsibility. All requests, their distribution and follow-up are handled by the GNATS tool. Requirements analysis happens in two ways. The requests that come in are discussed on mailing lists, both within the main project and in the sub-project that the request belongs to or is spawned by the request. Furthermore, individual developers on the sub-project will evaluate the feasibility of the requests and determine the prioritisation between them. Other than archives of the discussions that have taken place, no outcome is created by this phase that is merged into the main project. 
As the requests are prioritised by the individual developers on the basis of doing what they find interesting, necessary, or are funded to do, there is no overall strategy or prioritisation of which requests to regard as requirements, nor any follow-up of their correct implementation. However, most developers have some shared vision of which issues are more important, and they can ask for guidelines from the release engineering team.
The verification phase of the project is two-fold. Before committing code to the -CURRENT branch, developers request that their code be reviewed by their peers. This review is for the most part done by functional testing, but code review is also important. When the code is committed to the branch, broader functional testing happens, which may trigger further code review and debugging should the code not behave as expected. This second form of verification may be regarded as structural verification. Although the sub-projects themselves may write formal tests such as unit tests, these are usually not collected by the main project and are usually removed before the code is committed to the current branch. [9]
6.6. Maintenance
It is an advantage to the project to have, for each area of the source, at least one person that knows this area well. Some parts of the code have designated maintainers. Others have de-facto maintainers, and some parts of the system do not have maintainers. The maintainer is usually a person from the sub-project that wrote and integrated the code, or someone who has ported it from the platform it was written for. [10] The maintainer's job is to make sure the code is in sync with the project the code comes from if it is contributed code, and to apply patches submitted by the community or write fixes to issues that are discovered.
The main bulk of work that is put into the FreeBSD project is maintenance. [Jørgensen, 2001] has made a figure showing the life cycle of changes.
Figure 6-7. Jørgenssen's model for change integration
Here “development release” refers to the -CURRENT branch while “production release” refers to the -STABLE branch. The “pre-commit test” is the functional testing by peer developers when asked to do so, or trying out the code to determine the status of the sub-project. “Parallel debugging” is the functional testing that can trigger more review, and debugging when the code is included in the -CURRENT branch.
As of this writing, there were 269 committers in the project. When they commit a change to a branch, that constitutes a new release. It is very common for users in the community to track a particular branch. The immediate existence of a new release makes the changes widely available right away and allows for rapid feedback from the community. This also gives the community the response time they expect on issues that are of importance to them. This makes the community more engaged, and thus allows for more and better feedback that again spurs more maintenance and ultimately should create a better product.
Before making changes to code in parts of the tree that have a history unknown to the committer, the committer is required to read the commit logs to see why certain features are implemented the way they are, in order not to make mistakes that have previously either been thought through or resolved.
6.7. Problem reporting
FreeBSD comes with a problem reporting tool called “send-pr” that is a part of the GNATS package. All users and developers are encouraged to use this tool for reporting problems in software they do not maintain.
Problems include bug reports, feature requests, features that should be enhanced, and notices of new versions of external software that is included in the project. Problem reports are sent to an email address and inserted into the GNATS maintenance database. A Bugbuster classifies the problem and sends it to the correct group or maintainer within the project.

After someone has taken responsibility for the report, the report is analysed. This analysis includes verifying the problem and working out a solution for it. Feedback is often required from the report originator or even from the FreeBSD community. Once a patch for the problem is made, the originator may be asked to try it out. Finally, the working patch is integrated into the project and documented if applicable. It then goes through the regular maintenance cycle as described in the maintenance section (6.6).

These are the states a problem report can be in: open, analyzed, feedback, patched, suspended and closed. The suspended state is used when further progress is not possible due to a lack of information, or when the task would require so much work that nobody is working on it at the moment.

Figure 6-8. Process summary: problem reporting

A problem is reported by the report originator. It is then classified by a bugbuster and handed to the correct maintainer. The maintainer verifies the problem and discusses it with the originator until he has enough information to create a working patch. This patch is then committed and the problem report is closed.

The roles included in this process are:

Report originator
Maintainership
Bugbuster

[FreeBSD, 2002C] [FreeBSD, 2002D]

6.8. Reacting to misbehaviour

[FreeBSD, 2001] has a number of rules that committers should follow. However, it happens that these rules are broken. The following rules exist in order to be able to react to misbehaviour. They specify which actions result in a suspension of the committer's commit privileges, and for how long:

Committing during code freezes without the approval of the Release Engineering team - 2 days
Committing to a security branch without approval - 2 days
Commit wars - 5 days to all participating parties
Impolite or inappropriate behaviour - 5 days

[Lehey, 2002]

For the suspensions to be effective, any single core member can implement a suspension before discussing it on the "core" mailing list. Repeat offenders can, with a 2/3 vote by core, receive harsher penalties, including permanent removal of commit privileges. (However, the latter is always viewed as a last resort, due to its inherent tendency to create controversy.) All suspensions are posted to the "developers" mailing list, a list available to committers only.

It is important to note that a committer cannot be suspended for making technical errors; all penalties come from breaking social etiquette.

Hats involved in this process:

Core team
Committer
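The suspension rules above amount to a small lookup table. The sketch below restates them in Python purely for illustration; in practice they are applied by core-team judgement rather than by code, and the rule keys are my own shorthand.

    # Illustrative only: the suspension schedule from section 6.8 as a table.
    # Keys are my own shorthand; the actual rules are applied by the core team.
    SUSPENSION_DAYS = {
        "commit_during_code_freeze": 2,
        "commit_to_security_branch_without_approval": 2,
        "commit_war": 5,
        "impolite_or_inappropriate_behaviour": 5,
    }

    def suspension_for(offence: str, repeat_offender: bool = False) -> str:
        days = SUSPENSION_DAYS[offence]
        if repeat_offender:
            # Harsher penalties require a 2/3 vote by core; the length is not fixed.
            return f"at least {days} days, subject to a 2/3 core vote"
        return f"{days} days"

    print(suspension_for("commit_war"))                        # 5 days
    print(suspension_for("commit_war", repeat_offender=True))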
6.9. Release engineering

The FreeBSD project has a Release Engineering team with a principal release engineer who is responsible for creating releases of FreeBSD that can be brought out to the user community via the net or sold in retail outlets. Since FreeBSD is available on multiple platforms and releases for the different architectures are made available at the same time, the team has one person in charge of each architecture. There are also roles in the team responsible for coordinating quality assurance efforts, building a package set, and keeping the set of documents up to date. When referring to the release engineer, a representative of the release engineering team is meant.

When a release is coming, the FreeBSD project changes shape somewhat. A release schedule is made, containing feature- and code-freezes, interim releases and the final release. A feature-freeze means no new features may be committed to the branch without the release engineer's explicit consent. A code-freeze means no changes to the code (such as bug fixes) may be committed without the release engineer's explicit consent. This feature- and code-freeze is known as stabilising. During the release process, the release engineer has full authority to revert to older versions of code and thus "back out" changes should he find that the changes are not suitable for inclusion in the release.

There are three different kinds of releases:

.0 releases are the first release of a major version. They are branched off the -CURRENT branch and have a significantly longer release engineering cycle due to the unstable nature of the -CURRENT branch.

.X releases are releases of the -STABLE branch. They are scheduled to come out every 4 months.

.X.Y releases are security releases that follow the .X branch. They come out only when sufficient security fixes have been merged since the last release on that branch. New features are rarely included, and the security team is far more involved in these than in regular releases.

For releases of the -STABLE branch, the release process starts 45 days before the anticipated release date. During the first phase, the first 15 days, the developers merge the changes they have had in -CURRENT that they want in the release into the release branch. When this period is over, the code enters a 15-day code freeze in which only bug fixes, documentation updates, security-related fixes and minor device driver changes are allowed. These changes must be approved by the release engineer in advance. At the beginning of the last 15-day period, a release candidate is created for widespread testing. Updates are less likely to be allowed during this period, except for important bug fixes and security updates. In this final period, all releases are considered release candidates. At the end of the release process, a release is created with the new version number, including binary distributions on web sites and the creation of CD-ROM images. However, the release is not considered "really released" until a PGP-signed message stating exactly that is sent to the mailing list freebsd-announce; anything labelled as a "release" before that may well be in-process and subject to change before the PGP-signed message is sent. [11]

The releases of the -CURRENT branch (that is, all releases that end with ".0") are very similar, but with twice as long a time frame. The process starts 8 weeks prior to the release with the announcement of the release time line. Two weeks into the release process, the feature freeze is initiated and performance tweaks should be kept to a minimum. Four weeks prior to the release, an official beta version is made available. Two weeks prior to the release, the code is officially branched into a new version. This version is given release candidate status, and as with the release engineering of -STABLE, the code freeze of the release candidate is hardened. However, development on the main development branch can continue. Other than these differences, the release engineering processes are alike.
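The 45-day -STABLE schedule described above is straightforward date arithmetic: 15 days of merging, a 15-day code freeze, and 15 days of release candidates counted back from the target date. A minimal sketch, with phase labels of my own choosing:

    # Sketch of the -STABLE release schedule arithmetic from section 6.9.
    # Only the 15/15/15-day split comes from the text; phase labels are mine.
    from datetime import date, timedelta

    def stable_release_schedule(release_date: date) -> dict:
        start = release_date - timedelta(days=45)
        return {
            "merge window opens": start,
            "code freeze begins": start + timedelta(days=15),
            "first release candidate": start + timedelta(days=30),
            "anticipated release": release_date,
        }

    for phase, day in stable_release_schedule(date(2003, 1, 15)).items():
        print(f"{phase}: {day}")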
.0 releases go into their own branch and are aimed mainly at early adopters. The branch then goes through a period of stabilisation, and it is not until the Release Engineering Team decides that the demands on stability have been satisfied that the branch becomes -STABLE and -CURRENT targets the next major version. While this has for the most part happened around the .1 version, it is not a requirement.

Most releases are made when a given date arrives that is deemed to leave enough time since the previous release. A target is set for having major releases every 18 months and minor releases every 4 months. The user community has made it very clear that security and stability cannot be sacrificed to self-imposed deadlines and target release dates. To keep schedule slips from becoming too long with regard to security and stability issues, extra discipline is required when committing changes to -STABLE.

Figure 6-9. Process summary: release engineering

These are the stages in the release engineering process. Multiple release candidates may be created until the release is deemed stable enough to be released. [FreeBSD, 2002E]

Chapter 7. Tools

The major tools supporting the development process are CVS, CVSup, Perforce, GNATS, Mailman and OpenSSH. Except for CVSup, these are externally developed tools. These tools are commonly used in the open source world.

7.1. Concurrent Versions System (CVS)

Concurrent Versions System, or simply "CVS", is a system for handling multiple versions of text files and tracking who committed which changes and why. A project lives within a "repository", and different versions are considered different "branches".

7.2. CVSup

CVSup is a software package for distributing and updating collections of files across a network. It consists of a client program, cvsup, and a server program, cvsupd. The package is tailored specifically for distributing CVS repositories, and by taking advantage of CVS's properties, it performs updates much faster than traditional systems.

7.3. GNATS

GNATS is a maintenance database consisting of a set of tools to track bugs at a central site. It supports the bug tracking process for sending and handling bugs as well as for querying and updating the database and editing bug reports. The project uses one of its many client interfaces, "send-pr", to send "Problem Reports" by email to the project's central GNATS server. The committers also have web and command-line clients available.

7.4. Mailman

Mailman is a program that automates the management of mailing lists. The FreeBSD Project uses it to run 16 general lists, 60 technical lists, 4 limited lists and 5 lists with CVS commit logs. It is also used for many mailing lists set up and used by other people and projects in the FreeBSD community. General lists are for the general public, technical lists are mainly for the development of specific areas of interest, and closed lists are for internal communication not intended for the general public. The majority of all the communication in the project goes through these 85 lists [FreeBSD, 2003A, Appendix C].

7.5. Perforce

Perforce is a commercial software configuration management system developed by Perforce Systems that is available on over 50 operating systems. It is a collection of clients built around the Perforce server, which contains the central file repository and tracks the operations done upon it. The clients are used both for accessing the repository and for administering its configuration.
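Looking back at CVS (section 7.1) for a moment, the day-to-day interaction with the repository boils down to a handful of commands: check out a module, update it, read the logs, and commit with a message explaining why. The sketch below drives those standard CVS commands from Python; the repository path and module names are hypothetical, and this is not how the project itself scripts its work.

    # Illustrative only: basic CVS operations from section 7.1, driven via
    # subprocess. The repository path and module names are hypothetical.
    import subprocess

    CVSROOT = "/path/to/repository"   # hypothetical repository location

    def cvs(*args: str) -> None:
        """Run a cvs command against the repository and echo it."""
        cmd = ["cvs", "-d", CVSROOT, *args]
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # A typical committer round-trip: fetch a module, sync it, inspect its
    # history, and commit a change with a log message explaining why.
    cvs("checkout", "some/module")
    cvs("update", "-Pd", "some/module")
    cvs("log", "some/module/file.c")
    cvs("commit", "-m", "Fix an off-by-one error.", "some/module/file.c")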
7.6. Pretty Good Privacy

Pretty Good Privacy, better known as PGP, is a cryptosystem using a public key architecture that allows people to digitally sign and/or encrypt information in order to ensure secure communication between two parties. A signature is used when sending information out to many recipients, enabling them to verify that the information has not been tampered with before they received it. In the FreeBSD Project this is the primary means of ensuring that information has been written by the person who claims to have written it, and has not been altered in transit.

7.7. Secure Shell

Secure Shell is a standard for securely logging into a remote system and for executing commands on the remote system. It also allows other connections, called tunnels, to be established and protected between the two involved systems. The standard exists in two primary versions, and only version two is used in the FreeBSD Project. The most common implementation of the standard is OpenSSH, which is part of the project's main distribution. Since its source is updated more often than FreeBSD releases, the latest version is also available in the ports tree.

Chapter 8. Sub-projects

Sub-projects are formed to reduce the amount of communication needed to coordinate the group of developers. When a problem area is sufficiently isolated, most communication stays within the group focusing on the problem, requiring less communication with the other groups than if the group were not isolated.

8.1. The Ports Subproject

A "port" is a set of meta-data and patches that are needed to fetch, compile and correctly install an external piece of software on a FreeBSD system. The number of ports has grown at a tremendous rate, as shown by the following figure.

Figure 8-1. Number of ports added between 1996 and 2005

Figure 8-1 is taken from the FreeBSD web site. It shows the number of ports available to FreeBSD in the period 1995 to 2005. The curve appears to have grown exponentially at first, and then, since the middle of 2001, linearly.

As the external software described by a port is often under continued development, the amount of work required to maintain the ports is already large, and increasing. This has led to the ports part of the FreeBSD project gaining a more empowered structure, and it is more and more becoming a sub-project of the FreeBSD project. Ports has its own core team with the Ports Manager as its leader, and this team can appoint committers without FreeBSD Core's approval. Unlike in the FreeBSD Project, where a lot of maintenance is frequently rewarded with a commit bit, the ports sub-project contains many active maintainers who are not committers.

Unlike the main project, the ports tree is not branched. Every release of FreeBSD follows the current ports collection and thus has available updated information on where to find programs and how to build them. This, however, means that a port that depends on the base system may need variations depending on which version of FreeBSD it runs on. With an unbranched ports repository it is not possible to guarantee that any port will run on anything other than -CURRENT and -STABLE, in particular older, minor releases. There is neither the infrastructure nor the volunteer time needed to guarantee this.

For efficiency of communication, teams depending on Ports, such as the release engineering team, have their own ports liaisons.
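To make the notion of a port slightly more concrete, the sketch below models the kind of meta-data a port carries: name, version, where to fetch the software, who maintains it, and which local patches to apply. Real ports are expressed as Makefiles within the ports framework; this class is purely a conceptual illustration, and every value shown is hypothetical.

    # Illustrative only: the kind of meta-data a FreeBSD port carries.
    # Real ports are Makefile-based; this is just a conceptual model.
    from dataclasses import dataclass, field

    @dataclass
    class Port:
        name: str                      # hypothetical port name
        version: str                   # upstream version the port tracks
        categories: list[str]          # e.g. ["sysutils"]
        master_sites: list[str]        # where to fetch the upstream sources
        maintainer: str                # the (possibly non-committer) maintainer
        comment: str                   # one-line description
        patches: list[str] = field(default_factory=list)  # local patches to apply

    example = Port(
        name="example-tool",
        version="1.2.3",
        categories=["sysutils"],
        master_sites=["http://example.org/dist/"],
        maintainer="someone@example.org",
        comment="A hypothetical utility used to illustrate port meta-data",
        patches=["patch-Makefile"],
    )
    print(example.comment)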
8.2. The FreeBSD Documentation Project

The FreeBSD Documentation project was started in January 1995. From the initial group of a project leader, four team leaders and 16 members, the project now counts a total of 44 committers. The documentation mailing list has just under 300 members, indicating that there is quite a large community around it.

The goal of the Documentation project is to provide good and useful documentation of the FreeBSD project, making it easier for new users to get familiar with the system and detailing advanced features for existing users. The main tasks in the Documentation project are to work on current projects in the "FreeBSD Documentation Set" and to translate the documentation into other languages.

Like the FreeBSD Project itself, the documentation is split into the same branches. This is done so that there is always an up-to-date version of the documentation for each version of the system. Only documentation errors are corrected in the security branches. Like the ports sub-project, the Documentation project can appoint documentation committers without FreeBSD Core's approval. [FreeBSD, 2003B]

The Documentation project has a primer. It is used both to introduce new project members to the standard tools and syntaxes and as a reference when working on the project.

References

[Brooks, 1995] Frederick P. Brooks, 1975, 1995, 0201835959, Addison-Wesley Pub Co, The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition (2nd Edition).
[Saers, 2003] Niklas Saers, 2003, A project model for the FreeBSD Project: Candidatus Scientiarum thesis.
[Jørgensen, 2001] Niels Jørgensen, 2001, Putting it All in the Trunk: Incremental Software Development in the FreeBSD Open Source Project.
[PMI, 2000] Project Management Institute, 1996, 2000, 1-880410-23-0, Project Management Institute, Pennsylvania, PMBOK Guide: A Guide to the Project Management Body of Knowledge, 2000 Edition.
[FreeBSD, 2000A] 2002, Core Bylaws.
[FreeBSD, 2002A] 2002, FreeBSD Developer's Handbook.
[FreeBSD, 2002B] 2002, Core team election 2002.
[Losh, 2002] Warner Losh, 2002, Working with Hats.
[FreeBSD, 2002C] Dag-Erling Smørgrav and Hiten Pandya, 2002, The FreeBSD Documentation Project, Problem Report Handling Guidelines.
[FreeBSD, 2002D] Dag-Erling Smørgrav, 2002, The FreeBSD Documentation Project, Writing FreeBSD Problem Reports.
[FreeBSD, 2001] 2001, The FreeBSD Documentation Project, Committers Guide.
[FreeBSD, 2002E] Murray Stokely, 2002, The FreeBSD Documentation Project, FreeBSD Release Engineering.
[FreeBSD, 2003A] The FreeBSD Documentation Project, FreeBSD Handbook.
[FreeBSD, 2002F] 2002, The FreeBSD Documentation Project, Contributors to FreeBSD.
[FreeBSD, 2002G] 2002, The FreeBSD Project, Core team elections 2002.
[FreeBSD, 2002H] 2002, The FreeBSD Project, Commit Bit Expiration Policy: 2002/04/06 15:35:30.
[FreeBSD, 2002I] 2002, The FreeBSD Project, New Account Creation Procedure: 2002/08/19 17:11:27.
[FreeBSD, 2003B] 2002, The FreeBSD Documentation Project, FreeBSD DocEng Team Charter: 2003/03/16 12:17.
[Lehey, 2002] Greg Lehey, 2002, Two years in the trenches: The evolution of a software project.

Notes

[1] This goes hand-in-hand with Brooks' law that "adding another person to a late project will make it later", since it will increase the communication needs [Brooks, 1995]. A project model is a tool to reduce the communication needs.
[2] Statistics are generated by counting the number of entries in the file fetched by portsdb as of April 1st, 2005. portsdb is part of the port sysutils/portupgrade.
[3] The period from January 1st, 2004 to December 31st, 2004 was examined to find this number.
[4] For instance, the development of the Bluetooth stack started as a sub-project until it was deemed stable enough to be merged into the -CURRENT branch. It is now a part of the core FreeBSD system.
[5] According to Kirk McKusick, after 20 years of developing UNIX operating systems, the interfaces are for the most part figured out, so there is no need for much design. However, new applications of the system and new hardware lead to some implementations being more beneficial than those that used to be preferred. One example is the introduction of web browsing, which made the normal TCP/IP connection a short burst of data rather than a steady stream over a longer period of time.
[6] The first release this actually happened for was 4.5-RELEASE, but security branches were at the same time created for 4.3-RELEASE and 4.4-RELEASE.
[7] There is a terminology overlap with respect to the word "stable", which leads to some confusion. The -STABLE branch is still a development branch, whose goal is to be useful for most people. If it is never acceptable for a system to get changes that are not announced at the time it is deployed, that system should run a security branch.
[8] The first Core election was held in September 2000.
[9] More and more tests are, however, performed when building the system
