What does the ORDER BY in "SELECT LastName, EmployeeID FROM Employees ORDER BY LastName" mean?

cnaspx 2003-10-13 04:24:18
I know this statement reads LastName and EmployeeID from the Employees table, but what is the ORDER BY LastName at the end for?
ALong_Yue 2003-10-13
It sorts the results by LastName.
mzyp2002 2003-10-13
Sorts the rows by LastName.
yyf_321 2003-10-13
It sorts the results by LastName.
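To make the replies concrete, here is a minimal sketch (assuming the Northwind-style Employees table from the question). ORDER BY only changes the order in which rows come back, not which rows come back; ascending (ASC) is the default, and DESC reverses it:

SELECT LastName, EmployeeID
FROM Employees
ORDER BY LastName        -- A to Z (ascending is the default)

SELECT LastName, EmployeeID
FROM Employees
ORDER BY LastName DESC   -- Z to A

Without an ORDER BY clause, the database makes no guarantee about the order of the result rows.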
Join operations

Applicable scenarios: tables can be related one-to-one, one-to-many, or many-to-many, and these operators use those relationships to query across multiple tables. The Join family consists of Join (join query), SelectMany (one-to-many selection), and GroupJoin (grouped join). The Join extension method performs an inner join on the elements of two sequences whose keys match.

SelectMany

A query expression is translated into SelectMany when two conditions are met: (1) the query contains no join or into clause, and (2) an EntitySet appears in it. The relationship types (one-to-one, one-to-many, many-to-many) are illustrated below.

1. One-to-many (1 to Many):

var q =
    from c in db.Customers
    from o in c.Orders
    where c.City == "London"
    select o;

Description: Customers and Orders are in a one-to-many relationship, i.e. Orders appears in the Customers class as an EntitySet, so the second from clause filters over c.Orders rather than db.Orders. This example uses foreign-key navigation in the from clause to select all orders of customers in London.

var q =
    from p in db.Products
    where p.Supplier.Country == "USA" && p.UnitsInStock == 0
    select p;

Description: the condition p.Supplier.Country indirectly joins to the Supplier table. This example uses foreign-key navigation in the where clause to filter for out-of-stock products whose supplier is in the USA. The generated SQL is:

SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN [dbo].[Suppliers] AS [t1] ON [t1].[SupplierID] = [t0].[SupplierID]
WHERE ([t1].[Country] = @p0) AND ([t0].[UnitsInStock] = @p1)
-- @p0: Input NVarChar (Size = 3; Prec = 0; Scale = 0) [USA]
-- @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [0]

2. Many-to-many (Many to Many):

var q =
    from e in db.Employees
    from et in e.EmployeeTerritories
    where e.City == "Seattle"
    select new
    {
        e.FirstName,
        e.LastName,
        et.Territory.TerritoryDescription
    };

Note: a many-to-many relationship generally involves three tables (possibly only two, if one of them is self-referencing). This statement involves the Employees, EmployeeTerritories, and Territories tables, related as 1:M:1; Employees and Territories have no direct relationship.

[Figure: sample code for the LINQ to SQL Join and Order By statements]

Description: this example uses foreign-key navigation in the from clause to filter for employees in Seattle, while listing their territories. The generated SQL is:

SELECT [t0].[FirstName], [t0].[LastName], [t2].[TerritoryDescription]
FROM [dbo].[Employees] AS [t0]
CROSS JOIN [dbo].[EmployeeTerritories] AS [t1]
INNER JOIN [dbo].[Territories] AS [t2] ON [t2].[TerritoryID] = [t1].[TerritoryID]
WHERE ([t0].[City] = @p0) AND ([t1].[EmployeeID] = [t0].[EmployeeID])
-- @p0: Input NVarChar (Siz
Common MySQL commands

1. Export an entire database:
mysqldump -u username -p --default-character-set=latin1 dbname > dumpfile (the database's default character set is latin1)
mysqldump -u wcnc -p smgp_apps_wcnc > wcnc.sql

2. Export a single table:
mysqldump -u username -p dbname tablename > dumpfile
mysqldump -u wcnc -p smgp_apps_wcnc users > wcnc_users.sql

3. Export a database's structure only:
mysqldump -u wcnc -p -d --add-drop-table smgp_apps_wcnc > d:wcnc_db.sql
(-d means no data; --add-drop-table adds a DROP TABLE before every CREATE statement)

4. Import a database:
A. The usual way is the source command. Enter the MySQL console, e.g. mysql -u root -p, select the database with mysql> use dbname, then run source with the script file (the .sql file used here) as its argument:
mysql> source wcnc_db.sql
B. Use the mysql command with input redirection:
mysql -u username -p dbname < filename.sql
C. Connect with the database preselected:
mysql -u username -p -D dbname

To quit MySQL: quit or exit

II. Database operations
1. Create a database. Command: create database. For example, to create a database named xhkdb:
mysql> create database xhkdb;
2. Show all databases. Command: show databases (note the trailing s):
mysql> show databases;
3. Drop a database. Command: drop database. For example, to drop the database named xhkdb:
mysql> drop database xhkdb;
4. Select a database. Command: use. For example, if the xhkdb database exists, switch to it:
mysql> use xhkdb;
The prompt answers: Database changed
5. Show the database currently in use:
mysql> select database();
6. List the tables the current database contains:
mysql> show tables; (note the trailing s)

III. Table operations (select a database first)
1. Create a table. Command: create table tablename (column definitions [,..]);
mysql> create table MyClass(
 > id int(4) not null primary key auto_increment,
 > name char(20) not null,
 > sex int(4) not null default '0',
 > degree double(16,2));
2. Show a table's structure. Command: desc tablename, or show columns from tablename:
mysql> DESCRIBE MyClass;
mysql> desc MyClass;
mysql> show columns from MyClass;
3. Drop a table. Command: drop table. For example, to drop the table named MyClass:
mysql> drop table MyClass;
4. Insert data. Command: insert into tablename [(column [,..])] values (value1)[, (valueN)]
For example, insert three records into MyClass: number 1, named Tom, score 96.45; number 2, named Joan, score 82.99; number 3, named Wang, score 96.5:
mysql> insert into MyClass values(1,'Tom',96.45),(2,'Joan',82.99),(3,'Wang',96.5);
5. Query data:
1) All rows. Command: select <columns> from <table> where <condition>. To see all the data in MyClass:
mysql> select * from MyClass;
2) The first few rows. To see the first 2 rows of MyClass:
mysql> select * from MyClass order by id limit 0,2;
or:
mysql> select * from MyClass limit 0,2;
6. Delete rows. Command: delete from tablename where expression. To delete the record numbered 1 from MyClass:
mysql> delete from MyClass where id=1;
7. Update rows: update tablename set column=newvalue,... where condition
mysql> update MyClass set name='Mary' where id=1;
8. Add a column. Command: alter table tablename add column type options;. For example, add a passtest column of type int(4) with default 0 to MyClass:
mysql> alter table MyClass add passtest int(4) default '0'
9. Rename a table. Command: rename table oldname to newname;. To rename MyClass to YouClass:
mysql> rename table MyClass to YouClass;

Updating column contents:
update tablename set column = newcontent
update tablename set column = replace(column,'old','new');
To prepend 4 spaces to every article:
update article set content=concat('  ',content);

Column types:
1. INT[(M)]: normal-size integer.
2. DOUBLE[(M,D)] [ZEROFILL]: normal-size (double-precision) floating-point number.
3. DATE: date type; the supported range is 1000-01-01 to 9999-12-31. MySQL displays DATE values in YYYY-MM-DD format, but you may assign them with strings or numbers.
4. CHAR(M): fixed-length string; on storage it is always right-padded with spaces to the specified length.
5. BLOB/TEXT: maximum length 65535 (2^16-1) characters.
6. VARCHAR: variable-length string.

Importing database tables:
(1) Create the .sql file.
(2) Create the database first, e.g. auction: c:mysqlbin>mysqladmin -u root -p create auction; you will be prompted for the password, after which the database is created.
(3) Import the auction.sql file: c:mysqlbin>mysql -u root -p auction < auction.sql

Granting privileges:
grant select,insert,delete,create,drop
on *.* (or test.*/user.*/..)
to username@localhost
identified by 'password';
For example, to create a new account that can access the database, proceed as follows:
mysql> grant usage
 -> ON test.*
 -> TO testuser@localhost;
Query OK, 0 rows affected (0.15 sec)
This creates a new user named testuser, who can connect only from localhost and only to the test database. Next, specify which operations testuser may perform:
mysql> GRANT select, insert, delete, update
 -> ON test.*
 -> TO testuser@localhost;
Query OK, 0 rows affected (0.00 sec)
This lets testuser run SELECT, INSERT, DELETE, and UPDATE queries on every table in the test database. Finally, end the session and quit the MySQL client:
mysql> exit
Bye
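To complement the GRANT examples above, privileges can also be taken back; a minimal sketch, assuming the testuser account created above:

REVOKE delete ON test.* FROM testuser@localhost;
-- testuser keeps its remaining privileges (select, insert, update) on test.*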
1. Use the SHOW statement to find out what databases currently exist on the server:
mysql> SHOW DATABASES;
2. Create a database MYSQLDATA:
mysql> Create DATABASE MYSQLDATA;
3. Select the database you created:
mysql> USE MYSQLDATA; (when the prompt answers Database changed, the operation succeeded)
4. See what tables the current database contains:
mysql> SHOW TABLES;
5. Create a table:
mysql> Create TABLE MYTABLE (name VARCHAR(20), sex CHAR(1));
6. Show the table's structure:
mysql> DESCRIBE MYTABLE;
7. Insert a record into the table:
mysql> insert into MYTABLE values ("hyq","M");
8. Load data into the table from a text file (e.g. D:/mysql.txt):
mysql> LOAD DATA LOCAL INFILE "D:/mysql.txt" INTO TABLE MYTABLE;
9. Import a .sql file (e.g. D:/mysql.sql):
mysql> use database;
mysql> source d:/mysql.sql;
10. Drop the table:
mysql> drop TABLE MYTABLE;
11. Empty the table:
mysql> delete from MYTABLE;
12. Update data in the table:
mysql> update MYTABLE set sex="f" where name='hyq';

The following MySQL administration notes were found on the web (source: http://www1.xjtusky.com/article/htmldata/2004_12/3/57/article_1060_1.html):

On Windows, MySQL runs as a service; before using it, make sure the service has been started (if not, start it with net start mysql). On Linux, start it with "/etc/rc.d/init.d/mysqld start"; note that whoever starts it needs administrator privileges.

A freshly installed MySQL contains a root account with an empty password and an anonymous account, which is a serious security hole. For any important application you should tighten security as far as possible, which here means deleting the anonymous account and setting a password for root:

use mysql;
delete from User where User="";
update User set Password=PASSWORD('newpassword') where User='root';

To restrict which hosts a user may log in from, update that user's Host field in the User table. After making these changes, restart the database service. You can then log in with commands like:

mysql -uroot -p;
mysql -uroot -pnewpassword;
mysql mydb -uroot -p;
mysql mydb -uroot -pnewpassword;

These parameters are only the most common ones; see the documentation for details. Here mydb is the name of the database to log in to.

In development and in production, applications should not connect as root. Although root is convenient for testing, it creates a serious security risk and does nothing to improve your administration skills. Give each application user exactly the database privileges it needs; for example, a user that only inserts data should not be granted the privilege to delete it. MySQL manages users through the User table, and there are two common ways to add a user: insert the appropriate row into the User table and set the corresponding privileges, or create a user with certain privileges via the GRANT command. Common GRANT usages:

grant all on mydb.* to NewUserName@HostName identified by "password";
grant usage on *.* to NewUserName@HostName identified by "password";
grant select,insert,update on mydb.* to NewUserName@HostName identified by "password";
grant update,delete on mydb.TestTable to NewUserName@HostName identified by "password";

To let the user manage the privileges granted on those objects, append WITH GRANT OPTION to the GRANT statement. For users added by inserting into the User table, the Password field must be set with the PASSWORD() function so the password is stored encrypted and cannot be read by prying eyes. Accounts that are no longer used should be removed, and privileges broader than a user needs should be revoked promptly, either by updating the User table or with the REVOKE statement.

The following explanation of common privileges comes from other material (www.cn-java.com):

Global administrative privileges:
FILE: read and write files on the MySQL server.
PROCESS: show or kill server threads belonging to other users.
RELOAD: reload the access control tables, flush the logs, and so on.
SHUTDOWN: shut down the MySQL service.

Database/table/column privileges:
Alter: modify existing tables (e.g. add or drop columns) and indexes.
Create: create new databases or tables.
Delete: delete records from tables.
Drop: drop tables or databases.
INDEX: create or drop indexes.
Insert: add records to tables.
Select: read and search records in tables.
Update: modify existing records in tables.

Special privileges:
ALL: allowed to do anything (like root).
USAGE: allowed only to log in; nothing else.

---------------------

Common MySQL commands

Many people have MySQL installed but do not know how to use it. This article walks through common MySQL commands: connecting to MySQL, changing passwords, adding users, and so on.

I. Connecting to MySQL
Format: mysql -h host -u username -p password
1. Example 1: connect to MySQL on the local machine. Open a DOS window, change to the mysqlbin directory, and type mysql -uroot -p; you will be prompted for the password. On a fresh install the superuser root has no password, so just press Enter and you are in. The MySQL prompt is: mysql>
2. Example 2: connect to MySQL on a remote host. Suppose the remote host's IP is 110.110.110.110, the username is root, and the password is abcd123. Type:
mysql -h110.110.110.110 -uroot -pabcd123
(Note: no space is needed between -u and root; the same goes for the other options.)
3. Quit MySQL: exit (press Enter)

II. Changing passwords
Format: mysqladmin -u username -p oldpassword password newpassword
1. Example 1: give root the password ab12. In DOS, change to the mysqlbin directory and type:
mysqladmin -uroot -password ab12
Note: since root starts out with no password, the -p oldpassword part can be omitted.
2. Example 2: change root's password again, to djg345:
mysqladmin -uroot -pab12 password djg345

Common MySQL commands (part 2)

I. Tips
1. If you press Enter and notice you forgot the trailing semicolon, you do not have to retype the command; just type a semicolon and press Enter. In other words, you can split a complete command across several lines and finish it with a semicolon.
2. You can use the up and down arrow keys to recall previous commands (though an old MySQL version I once used did not support this; I am currently using mysql-3.23.27-beta-win).

II. Display commands
1. Show the database list:
show databases;
Initially there are only two databases: mysql and test. The mysql database is important: it holds MySQL's system information, and changing passwords or adding users really means operating on it.
2. Show the tables in a database:
use mysql; // open the database (this will feel familiar if you have used FOXBASE)
show tables;
3. Show a table's structure:
describe tablename;
4. Create a database:
create database dbname;
5. Create a table:
use dbname;
create table tablename (column definitions);
6. Drop a database or a table:
drop database dbname;
drop table tablename;
7. Empty a table of its records:
delete from tablename;
8. Show the records in a table:
select * from tablename;

III. A full example: create a database and a table, then insert data
drop database if exists school; // drop SCHOOL if it exists
create database school; // create the database SCHOOL
use school; // open the database SCHOOL
create table teacher // create the table TEACHER
(
 id int(3) auto_increment not null primary key,
 name char(10) not null,
 address varchar(50) default '深圳',
 year date
); // end of table definition
// now insert rows
insert into teacher values('','glchengang','深圳一中','1976-10-10');
insert into teacher values('','jack','深圳一中','1975-12-23');
Notes on the table definition: (1) ID is a numeric field of length 3 (int(3)) that increments automatically with every record (auto_increment), cannot be null (not null), and is the primary key (primary key). (2) NAME is a character field of length 10. (3) ADDRESS is a character field of length 50 with default value 深圳 (what the difference between varchar and char is will have to wait for a later article). (4) YEAR is a date field.
You can type the commands above at the mysql prompt, but that is inconvenient for debugging. Instead, write them verbatim into a text file, say school.sql, copy it to c:\, change to the directory \mysql\bin in DOS, and type:
mysql -uroot -p密码 < c:\school.sql
To back up the database school to a text file, run:
mysqldump -uroot -p密码 school > school.bbb
school.bbb is a text file (any file name will do); open it and you will find something new.

I. The complete syntax of the SELECT statement:
SELECT [ALL|DISTINCT|DISTINCTROW|TOP]
{*|table.*|[table.]field1 [AS alias1][, [table.]field2 [AS alias2][,...]]}
FROM tableexpression[,...] [IN externaldatabase]
[WHERE...]
[GROUP BY...]
[HAVING...]
[ORDER BY...]
[WITH OWNERACCESS OPTION]
Parts in square brackets ([]) are optional; for parts in braces ({}) you must choose exactly one of the alternatives.

1. The FROM clause
The FROM clause specifies where the fields in the SELECT statement come from. It is followed by one or more expressions separated by commas, where each expression can be a single table name, a saved query, or a compound result produced by INNER JOIN, LEFT JOIN, or RIGHT JOIN. If a table or query is stored in an external database, give its full path after the IN clause.
Example: the following SQL statement returns all customers that have orders:
SELECT OrderID, Customers.CustomerID
FROM Orders, Customers
WHERE Orders.CustomerID = Customers.CustomerID

2. The ALL, DISTINCT, DISTINCTROW, and TOP predicates
(1) ALL returns every record satisfying the SQL statement's conditions; it is the default when no predicate is given.
Example: SELECT ALL FirstName, LastName FROM Employees
(2) DISTINCT: if several records hold identical data in the selected fields, only one is returned.
(3) DISTINCTROW: if there are duplicate records, only one is returned.
(4) TOP returns the first N records of the query. It can also return a percentage of the records, using a TOP N PERCENT clause (where N is the percentage).
Example: return the top 5% of orders by order value:
SELECT TOP 5 PERCENT *
FROM [Order Details]
ORDER BY UnitPrice*Quantity*(1-Discount) DESC

3. Aliasing fields with the AS clause
If you want a new heading for a returned column, or a computation or summary over fields produces a new value that you want shown as a new column, use the AS reserved word.
Example: return the FirstName field under the alias NickName:
SELECT FirstName AS NickName, LastName, City FROM Employees
Example: return a new column showing inventory value:
SELECT ProductName, UnitPrice, UnitsInStock, UnitPrice*UnitsInStock AS valueInStock FROM Products

II. The WHERE clause specifies the query condition
1. Comparison operators:
=  equal to
>  greater than
<  less than
>= greater than or equal to
<= less than or equal to
<> not equal to
!> not greater than
!< not less than
Example: orders placed after 1 Jan 96:
WHERE OrderDate > #1/1/96#
which can also be written as:
WHERE OrderDate > Datevalue('1/1/96')
Use NOT to negate an expression.
Example: view the orders placed on or after 1 Jan 96:
WHERE Not OrderDate <= #1/1/96#

Tables can be joined by listing them in FROM and equating their fields in WHERE, as in the Orders/Customers example above. Another method is the INNER JOIN syntax unique to Microsoft Jet SQL:
FROM table1 INNER JOIN table2 ON table1.field1 comparison table2.field2
where comparison is one of the comparison operators used in the WHERE clause above.
SELECT FirstName, LastName, OrderID, CustomerID, OrderDate
FROM Employees INNER JOIN Orders ON Employees.EmployeeID = Orders.EmployeeID
Note: INNER JOIN cannot join fields of the Memo, OLE Object, Single, or Double data types.
Joining multiple ON clauses in one JOIN statement; syntax:
SELECT fields
FROM table1 INNER JOIN table2
ON table1.field1 compopr table2.field1
AND ON table1.field2 compopr table2.field2
OR ON table1.field3 compopr table2.field3
or alternatively:
SELECT fields
FROM table1 INNER JOIN
(table2 INNER JOIN [( ]table3 [INNER JOIN] [( ]tablex [INNER JOIN]
ON table1.field1 compopr table2.field1
ON table1.field2 compopr table2.field2
ON table1.field3 compopr table2.field3
An outer join returns more records: unmatched records are kept in the result, and all records from one side are returned whether or not matching records exist on the other.
FROM table1 [LEFT|RIGHT] JOIN table2 ON table1.field1 comparison table2.field2
A LEFT JOIN builds an outer join in which the table on the left side of the expression shows all of its data.
Example: return all products, whether or not they have been ordered:
SELECT ProductName, OrderID
FROM Products LEFT JOIN Orders ON Products.ProductID = Orders.ProductID
A RIGHT JOIN differs from a LEFT JOIN in that it returns all records from the right-side table, whether or not the left-side table has matching records.
Example: to examine customer information and count the customer distribution by region, use a right join so that customer information is returned even for regions with no customers.
Null values do not match each other; only through an outer join can you test whether a field of one of the joined tables contains nulls.
SELECT * FROM table1 LEFT JOIN table2 ON table1.a = table2.c

1. Using the Iif function in a join query to display nulls as 0
Iif expression: Iif(IsNull(Amount), 0, Amount)
Example: return a flag indicating whether an order is above or below ¥50:
Iif([Amount]>50, 'Big order', 'Small order')

V. Grouping and summarizing query results
In SQL syntax, the GROUP BY and HAVING clauses summarize data. GROUP BY specifies which fields to group by; once the records are grouped, HAVING filters those groups.
GROUP BY clause syntax:
SELECT fieldlist
FROM table
WHERE criteria
[GROUP BY groupfieldlist [HAVING groupcriteria]]
Notes: the Microsoft Jet database engine cannot group on Memo or OLE Object fields. Null values in GROUP BY fields are grouped but not omitted; however, Nulls are not evaluated in any SQL aggregate function. GROUP BY may list at most ten fields, with sorting priority running left to right.
Example: among the employees of the 'WA' region, group by title and find the titles held by more than one employee:
SELECT Title, Count(Title) as Total
FROM Employees
WHERE Region = 'WA'
GROUP BY Title
HAVING Count(Title) > 1

Aggregate functions in Jet SQL:
SUM()    sum
AVG()    average
COUNT()  number of non-null values of an expression
COUNT(*) number of records
MAX      maximum
MIN      minimum
VAR      variance
STDEV    standard deviation
FIRST    first value
LAST     last value

VI. Creating parameter queries with the PARAMETERS declaration
PARAMETERS declaration syntax:
PARAMETERS name datatype[, name datatype[, ...]]
where name is the parameter's identifier, by which the parameter is referenced, and datatype is its data type. The PARAMETERS declaration must precede any other statement.
Example:
PARAMETERS [Low price] Currency, [Beginning date] datetime
SELECT OrderID, OrderAmount
FROM Orders
WHERE OrderAmount > [Low price] AND OrderDate >= [Beginning date]

VII. Action queries
A so-called action query is an operational query that manipulates the database quickly and efficiently. It starts from a select query, picks out the data satisfying the conditions, and then processes that data in bulk. Action queries include update queries, delete queries, append queries, and make-table queries.

1. Update queries
The UPDATE clause changes data in one or more tables, and can change several fields at once.
Syntax: UPDATE table SET newvalue WHERE criteria
Example: increase UK customers' order amounts by 10% and their freight by 3%:
UPDATE Orders
SET OrderAmount = OrderAmount*1.1, Freight = Freight*1.03
WHERE ShipCountry = 'UK'

2. Delete queries
The DELETE clause lets users remove large amounts of obsolete or redundant data.
Note: a delete query deletes entire records.
Syntax: DELETE [table.*] FROM sourcetable WHERE criteria
Example: delete all orders from before 1994:
DELETE * FROM Orders WHERE OrderDate < #94-1-1#

3. Append queries
The INSERT clause appends a record or a group of records to the end of one or more tables. The INTO clause names the table that receives the new records; the VALUES keyword gives the values the new record contains.
Syntax: INSERT INTO target (field1, field2, ...) VALUES (value1, value2, ...)
Example: add an employee:
INSERT INTO Employees (FirstName, LastName, Title)
VALUES ('Harry', 'Washington', 'Trainee')

4. Make-table queries
A make-table query copies all the records satisfying the conditions into a new table in one step, usually to back up records or to serve as the basis of a report.
The SELECT INTO clause creates a make-table query. Syntax:
SELECT field1, field2, ... INTO newtable [IN externaldatabase] FROM source WHERE criteria
Example: make an archive copy of the orders:
SELECT * INTO OrdersArchive FROM Orders

VIII. Union queries
The UNION operation merges the results of several queries into one result set.
General syntax: [TABLE] query1 UNION [ALL] query2 UNION ...
Example: return the names and cities of all suppliers and customers in Brazil:
SELECT CompanyName, City FROM Suppliers WHERE Country = 'Brazil'
UNION
SELECT CompanyName, City FROM Customers WHERE Country = 'Brazil'
Notes: by default, UNION does not return duplicate records; add the ALL option to show all records. The queries in a UNION must have the same number of fields, but the fields' data types need not match. Each member query may use a GROUP BY or HAVING clause for grouping; to show the returned data in a specific order, put an ORDER BY clause at the end of the last query.

IX. Crosstab queries
A crosstab query computes sums, averages, counts, or other aggregates over data grouped by two kinds of information: one displayed down the left side of the table, the other across its top.
Microsoft Jet SQL creates crosstab queries with the TRANSFORM statement. Syntax:
TRANSFORM aggfunction
SELECT statement
GROUP BY clause
PIVOT pivotfield [IN (value1[, value2[, ...]])]
where aggfunction is an SQL aggregate function, the SELECT statement picks the fields used as row headings, and GROUP BY does the grouping. Pivotfield is the field or expression used to create the column headings in the result set; the optional IN clause restricts its values, and value stands for a fixed value used as a column heading.
Example: show the number of orders taken by each employee in each quarter of 1996:
TRANSFORM Count(OrderID)
SELECT FirstName & ' ' & LastName AS FullName
FROM Employees INNER JOIN Orders ON Employees.EmployeeID = Orders.EmployeeID
WHERE DatePart("yyyy", OrderDate) = '1996'
GROUP BY FirstName & ' ' & LastName
ORDER BY FirstName & ' ' & LastName
PIVOT DatePart("q", OrderDate) & '季度'

X. Subqueries
A subquery can be thought of as a nested query; it is a SELECT statement inside another statement.
1. Comparing an expression's value with the single values returned by a subquery
Syntax: expression comparison [ANY|ALL|SOME] (subquery)
ANY and SOME are synonyms, used with the comparison operators (=, <, >, <>, <=, >=), and they return a Boolean True or False. ANY compares the expression one by one with the series of values returned by the subquery; as soon as one comparison yields True, the ANY test returns True (i.e. the WHERE clause is satisfied) and the expression's current record enters the main query's result. The ALL test requires every comparison between the expression and the subquery's values to yield True before it returns True.
Example: the main query returns every product whose unit price is higher than that of any product sold at a discount greater than 25%:
SELECT * FROM Products
WHERE UnitPrice > ANY (SELECT UnitPrice FROM [Order Details] WHERE Discount > 0.25)
2. Checking whether an expression's value matches one of a set of values returned by a subquery
Syntax: [NOT] IN (subquery)
Example: return the products whose inventory value is at least 1000:
SELECT ProductName FROM Products
WHERE ProductID IN (SELECT ProductID FROM [Order Details] WHERE UnitPrice*Quantity >= 1000)
3. Testing whether a subquery returns any records
Syntax: [NOT] EXISTS (subquery)
Example: use EXISTS to retrieve the UK customers that have orders:
SELECT CompanyName, ContactName
FROM Customers
WHERE Country = 'UK' AND EXISTS
(SELECT * FROM Orders WHERE Customers.CustomerID = Orders.CustomerID)
Contents
Overview
Lesson 1: Index Concepts
Lesson 2: Concepts – Statistics
Lesson 3: Concepts – Query Optimization
Lesson 4: Information Collection and Analysis
Lesson 5: Formulating and Implementing Resolution

Module 6: Troubleshooting Query Performance

Overview
At the end of this module, you will be able to:
- Describe the different types of indexes and how indexes can be used to improve performance.
- Describe what statistics are used for and how they can help in optimizing query performance.
- Describe how queries are optimized.
- Analyze the information collected from various tools.
- Formulate resolutions to query performance problems.

Lesson 1: Index Concepts
Indexes are the most useful tool for improving query performance. Without a useful index, Microsoft® SQL Server™ must search every row on every page in a table to find the rows to return. With a multitable query, SQL Server must sometimes search a table multiple times, so each page is scanned far more than once. Having useful indexes speeds up finding individual rows in a table, as well as finding the matching rows needed to join two tables.

What You Will Learn
After completing this lesson, you will be able to:
- Understand the structure of SQL Server indexes.
- Describe how SQL Server uses indexes to find rows.
- Describe how fillfactor can impact the performance of data retrieval and insertion.
- Describe the different types of fragmentation that can occur within an index.

Recommended Reading
- Chapter 8: "Indexes", Inside SQL Server 2000 by Kalen Delaney
- Chapter 11: "Batches, Stored Procedures and Functions", Inside SQL Server 2000 by Kalen Delaney

Finding Rows without Indexes
With no indexes, a table must be scanned. SQL Server keeps track of which pages belong to a table or index by using IAM pages. If there is no clustered index, there is a sysindexes row for the table with an indid value of 0, and that row keeps track of the address of the first IAM for the table. The IAM is a giant bitmap, and every 1 bit indicates that the corresponding extent belongs to the table. The IAM lets SQL Server prefetch the table's extents efficiently, but every row still must be examined.

General Index Structure
All SQL Server indexes are organized as B-trees. Indexes in SQL Server store their information using standard B-trees. A B-tree provides fast access to data by searching on a key value of the index. B-trees cluster records with similar keys. The B stands for balanced, and balancing the tree is a core feature of a B-tree's usefulness. The trees are managed, and branches are grafted as necessary, so that navigating down the tree to find a value and locate a specific record takes only a few page accesses. Because the trees are balanced, finding any record requires about the same amount of resources, and retrieval speed is consistent because the index has the same depth throughout.

Clustered and Nonclustered Indexes
Both index types have many common features. An index consists of a tree with a root from which the navigation begins, possible intermediate index levels, and bottom-level leaf pages. You use the index to find the correct leaf page. The number of levels in an index will vary depending on the number of rows in the table and the size of the key column or columns for the index. If you create an index using a large key, fewer entries will fit on a page, so more pages (and possibly more levels) will be needed for the index.
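The number of levels can be checked directly with the INDEXPROPERTY function; a minimal sketch (the member table and member_lastname index here are hypothetical, anticipating the examples below):

SELECT INDEXPROPERTY(OBJECT_ID('member'), 'member_lastname', 'IndexDepth') AS IndexLevels
-- returns the number of levels from the root page down to the leaf level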
On a qualified select, update, or delete, the correct leaf page will be the lowest page of the tree in which one or more rows with the specified key or keys reside. A qualified operation is one that affects only specific rows that satisfy the conditions of a WHERE clause, as opposed to accessing the whole table.

An index can have multiple node levels. An index page above the leaf is called a node page. Each index row in node pages contains an index key (or set of keys for a composite index) and a pointer to a page at the next level for which the first key value is the same as the key value in the current index row.

The leaf level contains all key values. In any index, whether clustered or nonclustered, the leaf level contains every key value, in key sequence. In SQL Server 2000, the sequence can be either ascending or descending.

The sysindexes table contains all sizing, location, and distribution information. Any information about the size of indexes or tables is stored in sysindexes. The only source of any storage location information is the sysindexes table, which keeps track of the address of the root page for every index, and the first IAM page for the index or table. There is also a column for the first page of the table, but this is not guaranteed to be reliable. SQL Server can find all pages belonging to an index or table by examining the IAM pages. Sysindexes contains a pointer to the first IAM page, and each IAM page contains a pointer to the next one.

The Difference between Clustered and Nonclustered Indexes
The main difference between the two types of indexes is how much information is stored at the leaf. The leaf levels of both types of indexes contain all the key values in order, but they also contain other information.

Clustered Indexes
The leaf level of a clustered index is the data. The leaf level of a clustered index contains the data pages, not just the index keys. Another way to say this is that the data itself is part of the clustered index. A clustered index keeps the data in a table ordered around the key. The data pages in the table are kept in a doubly linked list called the page chain. The order of pages in the page chain, and the order of rows on the data pages, is the order of the index key or keys. Deciding which key to cluster on is an important performance consideration. When the index is traversed to the leaf level, the data itself has been retrieved, not simply pointed to.

Uniqueness is maintained in key values. In SQL Server 2000, all clustered indexes are unique. If you build a clustered index without specifying the unique keyword, SQL Server forces uniqueness by adding a uniqueifier to the rows when necessary. This uniqueifier is a 4-byte value added as an additional sort key to only the rows that have duplicates of their primary sort key. You can see this extra value if you use DBCC PAGE to look at the actual index rows.

Finding Rows in a Clustered Index
The leaf level of a clustered index contains the data. A clustered index is like a telephone directory in which all of the rows for customers with the same last name are clustered together in the same part of the book. Just as the organization of a telephone directory makes it easy for a person to search, SQL Server quickly searches a table with a clustered index. Because a clustered index determines the sequence in which rows are stored in a table, there can only be one clustered index for a table at a time.
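A minimal sketch of creating such an index; the member table appears in the slide queries below, while the index name is hypothetical:

CREATE CLUSTERED INDEX member_lastname ON member(lastname)
-- the data pages are now kept in lastname order; rows with duplicate
-- last names receive the hidden 4-byte uniqueifier described above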
Performance Considerations
Keeping your clustered key value small increases the number of index rows that can be placed on an index page and decreases the number of levels that must be traversed. This minimizes I/O. As we'll see, the clustered key is duplicated in every nonclustered index row, so keeping your clustered key small will allow more index rows to fit per page in all your indexes.
Note: the query corresponding to the slide is:
SELECT lastname, firstname FROM member WHERE lastname = 'Ota'

Nonclustered Indexes
The leaf level of a nonclustered index contains a bookmark. A nonclustered index is like the index of a textbook. The data is stored in one place and the index is stored in another. Pointers indicate the storage location of the indexed items in the underlying table. In a nonclustered index, the leaf level contains each index key, plus a bookmark that tells SQL Server where to find the data row corresponding to the key in the index. A bookmark can take one of two forms:
- If the table has a clustered index, the bookmark is the clustered index key for the corresponding data row. This clustered key can span multiple columns if the clustered index is composite, or is defined to be non-unique.
- If the table is a heap (in other words, it has no clustered index), the bookmark is a RID, which is an actual row locator in the form File#:Page#:Slot#.

Finding Rows with a NC Index on a Heap
Nonclustered indexes are very efficient when searching for a single row. After the nonclustered key at the leaf level of the index is found, only one more page access is needed to find the data row. Searching for a single row using a nonclustered index is almost as efficient as searching for a single row in a clustered index. However, if we are searching for multiple rows, such as duplicate values, or keys in a range, anything more than a small number of rows will make the nonclustered index search very inefficient.
Note: the query corresponding to the slide is:
SELECT lastname, firstname FROM member WHERE lastname BETWEEN 'Master' AND 'Rudd'

Finding Rows with a NC Index on a Clustered Table
A clustered key is used as the bookmark for all nonclustered indexes. If the table has a clustered index, all columns of the clustered key will be duplicated in the nonclustered index leaf rows, unless there is overlap between the clustered and nonclustered key. For example, if the clustered index is on (lastname, firstname) and a nonclustered index is on firstname, the firstname value will not be duplicated in the nonclustered index leaf rows.
Note: the query corresponding to the slide is:
SELECT lastname, firstname, phone FROM member WHERE firstname = 'Mike'
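A sketch of the bookmark mechanism (the index name is hypothetical; member's clustered index is assumed to be on lastname, as in the sketch above):

CREATE INDEX member_firstname ON member(firstname)
-- each leaf row stores a firstname value plus the clustered key (lastname)
-- as its bookmark; the slide query seeks on firstname, then fetches phone
-- through the clustered index:
SELECT lastname, firstname, phone FROM member WHERE firstname = 'Mike'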
Covering Indexes
A covering index provides the fastest data access. A covering index contains ALL the fields accessed in the query. Normally, only the columns in the WHERE clause are helpful in determining useful indexes, but for a covering index, all columns must be included. If all columns needed for the query are in the index, SQL Server never needs to access the data pages. If even one column in the query is not part of the index, the data rows must be accessed.
The leaf level of an index is the only level that contains every key value, or set of key values. For a clustered index, the leaf level is the data itself, so in reality, a clustered index ALWAYS covers any query. Nevertheless, for most of our optimization discussions, we only consider nonclustered indexes.
Scanning the leaf level of a nonclustered index is almost always faster than scanning a clustered index, so covering indexes are particularly valuable when we need ALL the key values of a particular nonclustered index.
Example: select an aggregate value of a column with a clustered index. Suppose we have a nonclustered index on price; then this query is covered:
SELECT avg(price) FROM titles
Since the clustered key is included in every nonclustered index row, the clustered key can take part in the covering. Suppose you have a nonclustered index on price and a clustered index on title_id; then this query is covered:
SELECT title_id, price FROM titles WHERE price BETWEEN 10 AND 20

Performance Considerations
In general, you do want to keep your indexes narrow. However, if you have a critical query that just is not giving you satisfactory performance no matter what you do, you should consider creating an index to cover it, or adding one or two extra columns to an existing index, so that the query will be covered. The leaf level of a nonclustered index is like a 'mini' clustered index, so you can have most of the benefits of clustering even if there already is another clustered index on the table. The tradeoff of adding more, wider indexes for covering queries is the added disk space, and more overhead for updating those columns that are now part of the index.

Bug
In general, SQL Server will detect when a query is covered, and detect the possible covering indexes. However, in some cases, you must force SQL Server to use a covering index by including a WHERE clause, even if the WHERE clause will return ALL the rows in the table. This is SHILOH bug #352079.
Steps to reproduce:
1. Make a copy of the Orders table from Northwind:
USE Northwind
CREATE TABLE [NewOrders] (
 [OrderID] [int] NOT NULL,
 [CustomerID] [nchar] (5) NULL,
 [EmployeeID] [int] NULL,
 [OrderDate] [datetime] NULL,
 [RequiredDate] [datetime] NULL,
 [ShippedDate] [datetime] NULL,
 [ShipVia] [int] NULL,
 [Freight] [money] NULL,
 [ShipName] [nvarchar] (40) NULL,
 [ShipAddress] [nvarchar] (60) NULL,
 [ShipCity] [nvarchar] (15) NULL,
 [ShipRegion] [nvarchar] (15) NULL,
 [ShipPostalCode] [nvarchar] (10) NULL,
 [ShipCountry] [nvarchar] (15) NULL
)
INSERT INTO NewOrders SELECT * FROM Orders
2. Build a nonclustered index on OrderDate:
CREATE INDEX dateindex ON NewOrders(OrderDate)
3. Test the query by looking at the query plan:
SELECT OrderDate FROM NewOrders
The index is being scanned, as expected.
4. Build an index on OrderID:
CREATE INDEX orderid_index ON NewOrders(OrderID)
5. Test the query again by looking at the query plan:
SELECT OrderDate FROM NewOrders
Now the TABLE is being scanned, instead of the original index!

Index Intersection
Multiple indexes can be used on a single table. In versions prior to SQL Server 7, only one index could be used for any table to process any single query; the only exception was a query involving an OR. In current SQL Server versions, multiple nonclustered indexes can each be accessed, retrieving a set of keys with bookmarks, and then the result sets can be joined on the common bookmarks. The optimizer weighs the cost of performing the unindexed join on the intermediate result sets against the cost of using only one index and then scanning the entire result set from that single index.
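A sketch of what index intersection enables (the Employees table is from Northwind; the two single-column indexes are hypothetical):

CREATE INDEX emp_city ON Employees(City)
CREATE INDEX emp_lastname ON Employees(LastName)
-- the optimizer may seek both indexes and join the two bookmark sets,
-- rather than using one index and filtering the remaining predicate:
SELECT EmployeeID FROM Employees WHERE City = 'Seattle' AND LastName = 'Fuller'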
Fillfactor and Performance
Creating an index with a low fillfactor delays page splits when inserting. DBCC SHOWCONTIG will show you a low value for "Avg. Page Density" when a low fillfactor has been specified. This is good for inserts and updates, because it delays the need to split pages to make room for new rows. It can be bad for scans, because fewer rows will be on each page, and more pages must be read to access the same amount of data. However, this cost will be minimal if the scan density value is good.

Index Reorganization
DBCC SHOWCONTIG provides lots of information. Here is some sample output from running a basic DBCC SHOWCONTIG on the Order Details table in the Northwind database:

DBCC SHOWCONTIG scanning 'Order Details' table...
Table: 'Order Details' (325576198); index ID: 1, database ID: 6
TABLE level scan performed.
- Pages Scanned................................: 9
- Extents Scanned..............................: 6
- Extent Switches..............................: 5
- Avg. Pages per Extent........................: 1.5
- Scan Density [Best Count:Actual Count].......: 33.33% [2:6]
- Logical Scan Fragmentation ..................: 0.00%
- Extent Scan Fragmentation ...................: 16.67%
- Avg. Bytes Free per Page.....................: 673.2
- Avg. Page Density (full).....................: 91.68%

By default, DBCC SHOWCONTIG scans the page chain at the leaf level of the specified index and keeps track of the following values:
- Average number of bytes free on each page (Avg. Bytes Free per Page)
- Number of pages accessed (Pages Scanned)
- Number of extents accessed (Extents Scanned)
- Number of times a page had a lower page number than the previous page in the scan (this value for out-of-order pages is not displayed, but is used for additional computations)
- Number of times a page in the scan was on a different extent than the previous page in the scan (Extent Switches)

SQL Server also keeps track of all the extents that have been accessed, and then it determines how many gaps are in the used extents. An extent is identified by the page number of its first page. So, if extents 8, 16, 24, 32, and 40 make up an index, there are no gaps. If the extents are 8, 16, 24, and 40, there is one gap. The value in DBCC SHOWCONTIG's output called Extent Scan Fragmentation is computed by dividing the number of gaps by the number of extents, so in this example the Extent Scan Fragmentation is ¼, or 25 percent. A table using extents 8, 24, 40, and 56 has three gaps, and its Extent Scan Fragmentation is ¾, or 75 percent. The maximum number of gaps is the number of extents minus 1, so Extent Scan Fragmentation can never be 100 percent.

The value in DBCC SHOWCONTIG's output called Logical Scan Fragmentation is computed by dividing the number of out-of-order pages by the number of pages in the table. This value is meaningless in a heap.

You can use either the Extent Scan Fragmentation value or the Logical Scan Fragmentation value to determine the general level of fragmentation in a table. The lower the value, the less fragmentation there is. Alternatively, you can use the value called Scan Density, which is computed by dividing the optimum number of extent switches by the actual number of extent switches. A high value means that there is little fragmentation. Scan Density is not valid if the table spans multiple files; therefore, it is less useful than the other values.
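To see these numbers respond to fillfactor, an index can be built with an explicit value and inspected immediately; a minimal sketch (NewOrders comes from the covering-index example above; the index name is hypothetical):

CREATE INDEX shipcity_ix ON NewOrders(ShipCity) WITH FILLFACTOR = 70
DBCC SHOWCONTIG ('NewOrders', 'shipcity_ix')
-- Avg. Page Density should come out near 70% right after the build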
SQL Server 2000 allows online defragmentation. You can choose from several methods for removing fragmentation from an index. You could rebuild the index and have SQL Server allocate all new contiguous pages for you. To rebuild the index, you can use a simple DROP INDEX and CREATE INDEX combination, but in many cases using these commands is less than optimal. In particular, if the index is supporting a constraint, you cannot use the DROP INDEX command. Alternatively, you can use DBCC DBREINDEX, which can rebuild all the indexes on a table in one operation, or you can use the drop_existing clause along with CREATE INDEX.

The drawback of these methods is that the table is unavailable while SQL Server is rebuilding the index. When you are rebuilding only nonclustered indexes, SQL Server takes a shared lock on the table, which means that users cannot make modifications, but other processes can SELECT from the table. Of course, those SELECT queries cannot take advantage of the index you are rebuilding, so they might not perform as well as they would otherwise. If you are rebuilding a clustered index, SQL Server takes an exclusive lock and does not allow access to the table, so your data is temporarily unavailable.

SQL Server 2000 lets you defragment an index without completely rebuilding it. DBCC INDEXDEFRAG reorders the leaf-level pages into physical order as well as logical order, but using only the pages that are already allocated to the leaf level. This command does an in-place ordering, which is similar to a sorting technique called bubble sort (you might be familiar with this technique if you've studied and compared various sorting algorithms). In-place ordering can reduce logical fragmentation to 2 percent or less, making an ordered scan through the leaf level much faster.

DBCC INDEXDEFRAG also compacts the pages of an index, based on the original fillfactor. The pages will not always end up with the original fillfactor, but SQL Server uses that value as a goal. The defragmentation process attempts to leave at least enough space for one average-size row on each page. In addition, if SQL Server cannot obtain a lock on a page during the compaction phase of DBCC INDEXDEFRAG, it skips the page and does not return to it. Any empty pages created as a result of compaction are removed.

The algorithm SQL Server 2000 uses for DBCC INDEXDEFRAG finds the next physical page in a file belonging to the index's leaf level and the next logical page in the leaf level to swap it with. To find the next physical page, the algorithm scans the IAM pages belonging to that index. In a database spanning multiple files, in which a table or index has pages on more than one file, SQL Server handles pages on different files separately. SQL Server finds the next logical page by scanning the index's leaf level. After each page move, SQL Server drops all locks and saves the last key on the last page it moved. The next iteration of the algorithm uses the last key to find the next logical page. This process lets other users update the table and index while DBCC INDEXDEFRAG is running.

Let us look at an example in which an index's leaf level consists of the following pages in the following logical order:
47 22 83 32 12 90 64
The first key is on page 47, and the last key is on page 64. SQL Server would have to scan the pages in this order to retrieve the data in sorted order. As its first step, DBCC INDEXDEFRAG would find the first physical page, 12, and the first logical page, 47. It would then swap the pages, using a temporary buffer as a holding area. After the first swap, the leaf level would look like this:
12 22 83 32 47 90 64
The next physical page is 22, which is also the next logical page, so no work would be necessary.
DBCC INDEXDEFRAG would then swap the next physical page, 32, with the next logical page, 83:
12 22 32 83 47 90 64
After the next swap of 47 with 83, the leaf level would look like this:
12 22 32 47 83 90 64
Then, the defragmentation process would swap 64 with 83:
12 22 32 47 64 90 83
and 83 with 90:
12 22 32 47 64 83 90
At the end of the DBCC INDEXDEFRAG operation, the pages in the table or index are not contiguous, but their logical order matches their physical order. Now, if the pages were accessed from disk in sorted order, the head would need to move in only one direction. Keep in mind that DBCC INDEXDEFRAG uses only pages that are already part of the index's leaf level; it allocates no new pages. In addition, defragmenting a large table can take quite a while, and you will get a report every 5 minutes about the estimated percentage completed. However, except for the locks on the pages being switched, this command needs no additional locks. All the table's other pages and indexes are fully available for your applications to use during the defragmentation process.

If you must completely rebuild an index because you want a new fillfactor, or if simple defragmentation is not enough because you want to remove all fragmentation from your indexes, another SQL Server 2000 improvement makes index rebuilding less of an imposition on the rest of the system. SQL Server 2000 lets you create an index in parallel (that is, using multiple processors), which drastically reduces the time necessary to perform the rebuild. The algorithm SQL Server 2000 uses allows near-linear scaling with the number of processors you use for the rebuild, so four processors will take only one-fourth the time that one processor requires to rebuild an index. System availability increases because the length of time that a table is unavailable decreases. Note that only the SQL Server 2000 Enterprise Edition supports parallel index creation.
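As a sketch of the two approaches side by side (the database, table, and index names come from the earlier examples; the fillfactor value is arbitrary):

DBCC INDEXDEFRAG (Northwind, NewOrders, dateindex)
-- online: reorders and compacts the existing leaf pages in place
DBCC DBREINDEX ('Northwind.dbo.NewOrders', 'dateindex', 70)
-- offline: full rebuild on fresh contiguous pages, with a new fillfactor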
Indexes on Views and Computed Columns
Building an index gives the data physical existence. Normally, views are only logical: the rows comprising the view's data are not generated until the view is accessed. The values for computed columns are typically not stored anywhere in the database; only the definition for the computation is stored, and the computation is redone every time a computed column is accessed. The first index on a view must be a clustered index, so that the leaf level can hold all the actual rows that make up the view. Once that clustered index has been built, and the view's data is now physical, additional (nonclustered) indexes can be built. An index on a computed column can be nonclustered, because all we need to store is the index key values.

Common prerequisites for indexed views and indexes on computed columns: in order for SQL Server to create or use these special indexes, you must have the seven SET options correctly specified. ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER, ANSI_NULLS, ANSI_PADDING, and ANSI_WARNINGS must all be ON; NUMERIC_ROUNDABORT must be OFF. Only deterministic expressions can be used in the definition of indexed views or indexes on computed columns; see the BOL for the list of deterministic functions and expressions. Property functions are available to check whether a column or view meets the requirements and is indexable:

SELECT OBJECTPROPERTY (object_id, 'IsIndexable')
SELECT COLUMNPROPERTY (object_id, column_name, 'IsIndexable')

Schema binding guarantees that the object definition won't change. A view can only be indexed if it has been built with schema binding.

The SQL Server optimizer determines whether the indexed view can be used. The query must request a subset of the data contained in the view. The ability of the optimizer to use the indexed view even if the view is not directly referenced is available only in SQL Server 2000 Enterprise Edition. In Standard Edition, you can create indexed views, and you can select directly from them, but the optimizer will not choose to use them if they are not directly referenced.

Examples of indexed views: the best candidates for improvement by indexed views are queries performing aggregations and joins. We will explain how useful indexed views may be created for these two major groups of queries. The considerations are also valid for queries and indexed views using both joins and aggregations.

-- Example:
USE Northwind
-- Identify the 5 products with the overall biggest discount total.
-- This may be expressed, for example, by two different queries:
-- Q1.
select TOP 5 ProductID, SUM(UnitPrice*Quantity) - SUM(UnitPrice*Quantity*(1.00-Discount)) Rebate
from [order details]
group by ProductID
order by Rebate desc
-- Q2.
select TOP 5 ProductID, SUM(UnitPrice*Quantity*Discount) Rebate
from [order details]
group by ProductID
order by Rebate desc
-- The following indexed view will be used to execute Q1.
create view Vdiscount1 with schemabinding
as
select SUM(UnitPrice*Quantity) SumPrice, SUM(UnitPrice*Quantity*(1.00-Discount)) SumDiscountPrice, COUNT_BIG(*) Count, ProductID
from dbo.[order details]
group by ProductID
create unique clustered index VDiscountInd on Vdiscount1 (ProductID)
However, it will not be used by Q2, because the indexed view does not contain the SUM(UnitPrice*Quantity*Discount) aggregate. We can construct another indexed view:
create view Vdiscount2 with schemabinding
as
select SUM(UnitPrice*Quantity) SumPrice, SUM(UnitPrice*Quantity*(1.00-Discount)) SumDiscountPrice, SUM(UnitPrice*Quantity*Discount) SumDiscountPrice2, COUNT_BIG(*) Count, ProductID
from dbo.[order details]
group by ProductID
create unique clustered index VDiscountInd on Vdiscount2 (ProductID)
This view may be used by both Q1 and Q2. Observe that the indexed view Vdiscount2 will have the same number of rows and only one more column compared to Vdiscount1, and it may be used by more queries. In general, try to design indexed views that may be used by more queries.

The following query, asking for the orders with the largest total discount:
-- Q3.
select TOP 3 OrderID, SUM(UnitPrice*Quantity*Discount) OrderRebate
from dbo.[order details]
group by OrderID
can use neither of the Vdiscount views, because the column OrderID is not included in the view definition. To address this variation of the discount analysis query, we may create a different indexed view, similar to the query itself. An attempt to generalize the previous indexed view Vdiscount2 so that all three queries Q1, Q2, and Q3 can take advantage of a single indexed view would require a view with both OrderID and ProductID as grouping columns. Because the OrderID, ProductID combination is unique in the original order details table, the resulting view would have as many rows as the original table, and we would see no savings in using such a view compared to using the original table. Consider the size of the resulting indexed view.
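One way to check that size is sp_spaceused, which (an assumption here) reports storage for an indexed view just as it does for a table:

EXEC sp_spaceused 'Vdiscount2'
EXEC sp_spaceused '[Order Details]'
-- if the view's reserved size approaches the base table's, the view buys little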
In the case of pure aggregation, the indexed view may provide no significant performance gains if its size is close to the size of the original table.
Complex aggregates (STDEV, VARIANCE, AVG) cannot participate in an indexed view definition. However, SQL Server may use an indexed view to execute a query containing an AVG aggregate; a query containing STDEV or VARIANCE cannot use an indexed view to pre-compute these values. The next example shows a query producing the average price for a particular product:
-- Q4.
select ProductName, od.ProductID,
    AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
    SUM(od.Quantity) Units
from [order details] od, Products p
where od.ProductID = p.ProductID
group by ProductName, od.ProductID
This is an example of an indexed view that will be considered by SQL Server to answer Q4:
create view v3 with schemabinding
as
select od.ProductID,
    SUM(od.UnitPrice*(1.00-Discount)) Price,
    COUNT_BIG(*) Count,
    SUM(od.Quantity) Units
from dbo.[order details] od
group by od.ProductID
go
create UNIQUE CLUSTERED index iv3 on v3 (ProductID)
go
The AVG in Q4 can be derived from the view's SUM and COUNT_BIG columns. Observe that the view definition does not contain the table Products. The indexed view does not need to contain all the tables used in the query that uses the indexed view.
In addition, the following query (the same as Q4 above, only with one additional search condition) will use the same indexed view. Observe that the added predicate references only columns from tables not present in the v3 view definition.
-- Q5.
select ProductName, od.ProductID,
    AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
    SUM(od.Quantity) Units
from [order details] od, Products p
where od.ProductID = p.ProductID
    and p.ProductName like '%tofu%'
group by ProductName, od.ProductID
The following query cannot use the indexed view, because the added search condition od.UnitPrice>10 contains a column from a table in the view definition, and that column is neither a grouping column nor does the predicate appear in the view definition.
-- Q6.
select ProductName, od.ProductID,
    AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
    SUM(od.Quantity) Units
from [order details] od, Products p
where od.ProductID = p.ProductID
    and od.UnitPrice > 10
group by ProductName, od.ProductID
In contrast to the Q6 case, the following query will use the indexed view v3, since the added predicate is on the grouping column of the view v3.
-- Q7.
select ProductName, od.ProductID,
    AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
    SUM(od.Quantity) Units
from [order details] od, Products p
where od.ProductID = p.ProductID
    and od.ProductID in (1,2,13,41)
group by ProductName, od.ProductID
-- The previous query Q6 will instead use the following indexed view V4:
create view V4 with schemabinding
as
select ProductName, od.ProductID,
    SUM(od.UnitPrice*(1.00-Discount)) AvgPrice,
    SUM(od.Quantity) Units,
    COUNT_BIG(*) Count
from dbo.[order details] od, dbo.Products p
where od.ProductID = p.ProductID
    and od.UnitPrice > 10
group by ProductName, od.ProductID
go
create unique clustered index VDiscountInd on V4 (ProductName, ProductID)
The same index on the view V4 will also be used for a query where a join to the table Orders is added, for example:
-- Q8.
select ProductName, od.ProductID,
    AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
    SUM(od.Quantity) Units
from dbo.[order details] od, dbo.Products p, dbo.Orders o
where od.ProductID = p.ProductID
    and o.OrderID = od.OrderID
    and od.UnitPrice > 10
group by ProductName, od.ProductID
We will show several modifications of the query Q8 and explain why such modifications cannot use the above view V4.
-- Q8a.
select ProductName, od.ProductID,
    AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
    SUM(od.Quantity) Units
from dbo.[order details] od, dbo.Products p, dbo.Orders o
where od.ProductID = p.ProductID
    and o.OrderID = od.OrderID
    and od.UnitPrice > 25
group by ProductName, od.ProductID
Q8a cannot use the indexed view because of the WHERE clause mismatch (od.UnitPrice > 25 instead of > 10). Observe that the table Orders does not participate in the indexed view V4 definition. In spite of that, adding a predicate on this table will disallow using the indexed view, because the added predicate may eliminate additional rows participating in the aggregates, as is shown in Q8b.
-- Q8b.
select ProductName, od.ProductID,
    AVG(od.UnitPrice*(1.00-Discount)) AvgPrice,
    SUM(od.Quantity) Units
from dbo.[order details] od, dbo.Products p, dbo.Orders o
where od.ProductID = p.ProductID
    and o.OrderID = od.OrderID
    and od.UnitPrice > 10
    and o.OrderDate > '01/01/1998'
group by ProductName, od.ProductID
Locking and Indexes
In General, You Should Let SQL Server Control the Locking within Indexes
The stored procedure sp_indexoption lets you manually control the unit of locking within an index. It also lets you disallow page locks or row locks within an index. Since these options are available only for indexes, there is no way to control the locking within the data pages of a heap. (But remember that if a table has a clustered index, the data pages are part of the index and are affected by the sp_indexoption setting.) The index options are set for each table or index individually. Two options, AllowRowLocks and AllowPageLocks, are both set to TRUE initially for every table and index. If both of these options are set to FALSE for a table, only full table locks are allowed.
As described in Module 4, SQL Server determines at runtime whether to initially lock rows, pages, or the entire table. The locking of rows (or keys) is heavily favored. The type of locking chosen is based on the number of rows and pages to be scanned, the number of rows on a page, the isolation level in effect, the update activity going on, the number of users on the system needing memory for their own purposes, and so on. SAP databases frequently use sp_indexoption to reduce deadlocks.
Setting vs. Querying
In SQL Server 2000, the procedure sp_indexoption should be used only for setting an index option. To query an option, use the INDEXPROPERTY function.
Lesson 2: Concepts – Statistics
Statistics are the most important tool that the SQL Server query optimizer has for determining the ideal execution plan for a query. Statistics that are out of date or nonexistent seriously jeopardize query performance. SQL Server 2000 computes and stores statistics in a completely different format than all earlier versions of SQL Server. One of the improvements is an increased ability to determine which values are out of the normal range in terms of the number of occurrences. The new statistics maintenance routines are particularly good at determining when a key value has a very unusual skew of data.
What You Will Learn
After completing this lesson, you will be able to:
 Define terms related to statistics collected by SQL Server.
 Describe how statistics are maintained by SQL Server.
 Discuss the autostats feature of SQL Server.
 Describe how statistics are used in query optimization.
Recommended Reading
 Statistics Used by the Query Optimizer in Microsoft SQL Server 2000: http://msdn.microsoft.com/library/techart/statquery.htm
Definitions
Cardinality
Cardinality means how many unique values exist in the data.
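You can get a rough feel for a column's cardinality directly with COUNT(DISTINCT). A sketch, assuming the member table of the credit sample database that later examples in this lesson use:

-- Cardinality of the lastname column, with the total row count for comparison
select count(distinct lastname) as UniqueLastnames,
       count(*) as TotalRows
from member

The higher the cardinality relative to the total row count, the more selective an index on that column can be, which is what the next definition, density, quantifies.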
Density
For each index and set of column statistics, SQL Server keeps track of details about the uniqueness (or density) of the data values encountered, which provides a measure of how selective the index is. A unique index, of course, has the lowest density—by definition, each index entry can point to only one row. A unique index has a density value of 1/(number of rows in the table). Density values range from 0 through 1. Highly selective indexes have density values of 0.10 or lower. For example, a unique index on a table with 8345 rows has a density of 0.00012 (1/8345). If a nonunique nonclustered index has a density of 0.2165 on the same table, each index key can be expected to point to about 1807 rows (0.2165 × 8345). This is probably not selective enough to be more efficient than just scanning the table, so this index is probably not useful. Because driving the query from a nonclustered index means that the pages must be retrieved in index order, an estimated 1807 data page accesses (or logical reads) are needed if there is no clustered index on the table and the leaf level of the index contains the actual RID of the desired data row. The only time a data page doesn't need to be reaccessed is the occasional coincidence in which two adjacent index entries happen to point to the same data page.
In general, you can think of density as the average number of duplicates. We can also talk about 'join density', which applies to the average number of duplicates in the foreign key column. This answers the question: in this one-to-many relationship, how many is 'many'?
Selectivity
In general, selectivity applies to a particular data value referenced in a WHERE clause. High selectivity means that only a small percentage of the rows satisfy the WHERE clause filter; low selectivity means that many rows will satisfy the filter. For example, in an employees table, the column employee_id is probably very selective, and the column gender is probably not very selective at all.
Statistics
Statistics are a histogram consisting of an even sampling of values for a column or for an index key (or the first column of the key for a composite index) based on the current data. The histogram is stored in the statblob field of the sysindexes table, which is of type image. (Remember that image data is actually stored in structures separate from the data row itself; the data row merely contains a pointer to the image data. For simplicity's sake, we'll talk about the index statistics as being stored in the image field called statblob.) To fully estimate the usefulness of an index, the optimizer also needs to know the number of pages in the table or index; this information is stored in the dpages column of sysindexes.
During the second phase of query optimization, index selection, the query optimizer determines whether an index exists for a column in your WHERE clause, assesses the index's usefulness by determining the selectivity of the clause (that is, how many rows will be returned), and estimates the cost of finding the qualifying rows. Statistics for a single-column index consist of one histogram and one density value. The multicolumn statistics for one set of columns in a composite index consist of one histogram for the first column in the index and density values for each prefix combination of columns (including the first column alone). The fact that density information is kept for all columns helps the optimizer decide how useful the index is for joins.
Suppose, for example, that an index is composed of three key fields. The density on the first column might be 0.50, which is not too useful. But as you look at more key columns in the index, the number of rows pointed to is fewer than (or in the worst case, the same as) the number pointed to by the first column alone, so the density value goes down. If you are looking at both the first and second columns, the density might be 0.25, which is somewhat better. And if you examine all three columns, the density might be 0.03, which is highly selective. It does not make sense to refer to the density of only the second column; the lead column density is always needed.
Statistics Maintenance
Statistics Information Tracks the Distribution of Key Values
A SQL Server statistic is basically a histogram that contains up to 200 values of a given key column. In addition to the histogram, the statblob field contains the following information:
 The time of the last statistics collection
 The number of rows used to produce the histogram and density information
 The average key length
 Densities for other combinations of columns
In the statblob column, up to 200 sample values are stored; the range of key values between two adjacent sample values is called a step. The sample value is the endpoint of the range. Three values are stored along with each step: a value called EQ_ROWS, which is the number of rows that have a value equal to that sample value; a value called RANGE_ROWS, which specifies how many other rows fall inside the range (between two adjacent sample values); and the number of distinct values, or RANGE_DENSITY, of the range.
DBCC SHOW_STATISTICS
The DBCC SHOW_STATISTICS output shows us the first two of these three values, but not the range density. The RANGE_DENSITY is instead used to compute two additional values:
 DISTINCT_RANGE_ROWS—the number of distinct rows inside this range (not counting the RANGE_HI_KEY value itself), computed as 1/RANGE_DENSITY.
 AVG_RANGE_ROWS—the average number of rows per distinct value, computed as RANGE_DENSITY * RANGE_ROWS.
In addition to statistics on indexes, SQL Server can also keep track of statistics on columns with no indexes. Knowing the density, or the likelihood of a particular value occurring, can help the optimizer determine an optimum processing strategy, even if SQL Server can't use an index to actually locate the values.
Statistics on Columns
Column statistics can be useful for two main purposes:
 When the SQL Server optimizer is determining the optimal join order, it is frequently best to have the smaller input processed first. By 'input' we mean the table after all filters in the WHERE clause have been applied. Even if there is no useful index on a column in the WHERE clause, statistics could tell us that only a few rows will qualify, and therefore that the resulting input will be very small.
 The SQL Server query optimizer can use column statistics on non-initial columns in a composite nonclustered index to determine whether scanning the leaf level to obtain the bookmarks will be an efficient processing strategy. For example, in the member table in the credit database, the firstname column is almost unique. Suppose we have a nonclustered index on (lastname, firstname), and we issue this query:
select * from member where firstname = 'MPRO'
In this case, statistics on the firstname column would indicate very few rows satisfying this condition, so the optimizer will choose to scan the nonclustered index, since it is smaller than the clustered index (the table).
The small number of bookmarks will then be followed to retrieve the actual data.
Manually Updating Statistics
You can also manually force statistics to be updated, in one of two ways. You can run the UPDATE STATISTICS command on a table or on one specific index or set of column statistics, or you can execute the procedure sp_updatestats, which runs UPDATE STATISTICS against all user-defined tables in the current database.
You can create statistics on unindexed columns using the CREATE STATISTICS command or by executing sp_createstats, which creates single-column statistics for all eligible columns for all user tables in the current database. This includes all columns except computed columns, columns of the ntext, text, or image datatypes, and columns that already have statistics or are the first column of an index.
Autostats
By Default SQL Server Will Update Statistics on Any Index or Column as Needed
Every database is created with the database options auto create statistics and auto update statistics set to true, but you can turn either one off. You can also turn off automatic updating of statistics for a specific table in one of two ways:
 UPDATE STATISTICS: In addition to updating the statistics, the option WITH NORECOMPUTE indicates that the statistics should not be automatically recomputed in the future. Running UPDATE STATISTICS again without the WITH NORECOMPUTE option re-enables automatic updates.
 sp_autostats: This procedure sets or unsets a flag for a table to indicate that statistics should or should not be updated automatically. You can also call this procedure with only the table name to find out whether the table is set to have its index statistics updated automatically.
However, setting the database option auto update statistics to FALSE overrides any individual table settings; in other words, no automatic updating of statistics takes place. This is not a recommended practice unless thorough testing has shown you that you do not need the automatic updates or that the performance overhead is more than you can afford.
Trace Flags
Trace flag 205 – reports a recompile due to autostats.
Trace flag 8721 – writes information to the errorlog when autostats has been run.
For more information, see the following Knowledge Base article: Q195565 "INF: How SQL Server 7.0 Autostats Work."
Statistics and Performance
The Performance Penalty of NOT Having Up-To-Date Statistics Far Outweighs the Benefit of Avoiding Automatic Updating
Autostats should be turned off only after thorough testing shows it to be necessary. Because autostats only forces a recompile after a certain number or percentage of rows has been changed, you do not have to make any adjustments for a read-only database.
Lesson 3: Concepts – Query Optimization
What You Will Learn
After completing this lesson, you will be able to:
 Describe the phases of query optimization.
 Discuss how SQL Server estimates the selectivity of indexes and columns, and how this estimate is used in query optimization.
Recommended Reading
 Chapter 15: "The Query Processor", Inside SQL Server 2000 by Kalen Delaney
 Chapter 16: "Query Tuning", Inside SQL Server 2000 by Kalen Delaney
 Whitepaper about the SQL Server Query Processor Architecture by Hal Berenson and Kalen Delaney: http://msdn.microsoft.com/library/backgrnd/html/sqlquerproc.htm
Phases of Query Optimization
Query Optimization Involves Several Phases
Trivial Plan Optimization
Optimization itself goes through several steps. The first step is something called trivial plan optimization.
The whole idea of trivial plan optimization is that cost-based optimization is a bit expensive to run. The optimizer can try a great many possible variations looking for the cheapest plan. If SQL Server knows that there is only one really viable plan for a query, it can avoid a lot of that work. A prime example is a query that consists of an INSERT with a VALUES clause: there is only one possible plan. Another example is a SELECT where all the columns are in a unique covering index and that index is the only one that is usable; there is no other index that has that set of columns in it. These are cases where SQL Server should just generate the plan and not try to find something better. The trivial plan optimizer finds the really obvious plans, which are typically very inexpensive. In fact, all the plans that get through the autoparameterization template result in plans that the trivial plan optimizer can find. Between those two mechanisms, simple plans tend to be weeded out early in the process and do not pay much of the compilation cost. This is a good thing, because the number of potential plans went up astronomically in SQL Server 7.0 as hash joins, merge joins, and index intersections were added to the list of processing techniques.
Simplification and Statistics Loading
If a plan is not found by the trivial plan optimizer, SQL Server can perform some simplifications, usually thought of as syntactic transformations of the query itself, looking for commutative properties and operations that can be rearranged. SQL Server can do constant folding and other operations that do not require looking at cost or index information but that can result in a more efficient query. SQL Server then loads the metadata, including the statistics information on the indexes, and the optimizer goes through a series of phases of cost-based optimization.
Cost-Based Optimization Phases
The cost-based optimizer is designed as a set of transformation rules that try various permutations of indexes and join strategies. Because of the number of potential plans in SQL Server 7.0 and SQL Server 2000, if the optimizer simply ran through all the combinations and then produced a plan, the optimization process would take a very long time. Therefore, optimization is broken up into phases, each of which is a set of rules. After each phase is run, the cost of any resulting plan is examined, and if SQL Server determines that the plan is cheap enough, that plan is kept and executed. If the plan is not cheap enough, the optimizer runs the next phase, which is another set of rules. In the vast majority of cases, a good plan will be found in the preliminary phases. Typically, if the plan that a query would have had in SQL Server 6.5 is also the optimal plan in SQL Server 7.0 and SQL Server 2000, the plan will tend to be found either by the trivial plan optimizer or by the first phase of the cost-based optimizer; the rules were intentionally organized to try to make that true. Such a plan will probably consist of using a single index and nested loops. However, every once in a while, because of a lack of statistical information or some other nuance, the optimizer will have to proceed to the later phases of optimization. Sometimes this is because there is a real possibility that the optimizer could find a better plan. When a plan is found, it becomes the optimizer's output, and then SQL Server goes through all the caching mechanisms that we have already discussed in Module 5.
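To tie the phases together: a statement like the following sketch never reaches the cost-based phases at all, because an INSERT with a VALUES clause admits only one plan (the table is Northwind's Shippers; the values are made up for illustration, and the single-plan claim is the text's—comparing SHOWPLAN output or compile times is the easiest way to see it):

-- Only one viable plan: insert the row
INSERT INTO Shippers (CompanyName, Phone)
VALUES (N'Speedy Express 2', N'(503) 555-0100')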
Full Optimization
At some point, the optimizer determines that it has gone through enough preliminary phases, and it reverts to a phase called full optimization. If the optimizer goes through all the preliminary phases and still has not found a cheap plan, it examines the cost of the plan that it has so far. If that cost is above the threshold, the optimizer goes into the full optimization phase. This threshold is configurable, as the configuration option 'cost threshold for parallelism'. The full optimization phase assumes that the plan should be run in parallel. If the machine is very busy, the plan will end up running in serial, but the optimizer's goal is to produce a good parallel plan. If the cost is below the threshold (or the machine has only a single processor), the full optimization phase just uses a brute-force method to find a serial plan.
Selectivity Estimation
Selectivity Is One of the Most Important Pieces of Information
One of the most important things the optimizer needs to know is the number of rows from any table that will meet all the conditions in the query. If there are no restrictions on a table and all the rows will be needed, the optimizer can determine the number of rows from the sysindexes table. This number is not absolutely guaranteed to be accurate, but it is the number the optimizer uses. If there is a filter on the table in a WHERE clause, the optimizer needs statistics information. Indexes automatically maintain statistics, and the optimizer will use these values to determine the usefulness of the index. If there is no index on the column involved in the filter, then column statistics can be used or generated.
Optimizing Search Arguments
In General, the Filters in the WHERE Clause Determine Which Indexes Will Be Useful
If an indexed column is referenced in a Search Argument (SARG), the optimizer will analyze the cost of using that index. A SARG has one of the forms:
 column operator value
 value operator column
The operator must be one of =, >, >=, <, <=. The value can be a constant, an operation, or a variable. Some functions also will be treated as SARGs.
These queries have SARGs, and a nonclustered index on firstname will be used in most cases:
select * from member where firstname < 'AKKG'
select * from member where firstname = substring('HAAKGALSFJA', 2, 5)
select * from member where firstname = 'AA' + 'KG'
declare @name char(4)
set @name = 'AKKG'
select * from member where firstname < @name
Not all functions can be used in SARGs:
select * from charge where charge_amt < 2*2
select * from charge where charge_amt < sqrt(16)
Compare these queries to ones using = instead of <. With =, the optimizer can use the density information to come up with a good row estimate, even if it is not going to actually perform the function's calculations.
A filter with a variable is usually a SARG
The issue is: can the optimizer come up with useful costing information?
A filter with a variable is not a SARG if the variable is of a different datatype and the column must be converted to the variable's datatype. For more information, see Knowledge Base article Q198625.
Use credit
go
CREATE TABLE [member2] (
    [member_no] [smallint] NOT NULL ,
    [lastname] [shortstring] NOT NULL ,
    [firstname] [shortstring] NOT NULL ,
    [middleinitial] [letter] NULL ,
    [street] [shortstring] NOT NULL ,
    [city] [shortstring] NOT NULL ,
    [state_prov] [statecode] NOT NULL ,
    [country] [countrycode] NOT NULL ,
    [mail_code] [mailcode] NOT NULL
)
GO
insert into member2
select member_no, lastname, firstname, middleinitial,
    street, city, state_prov, country, mail_code
from member
alter table member2 add constraint pk_member2
    primary key clustered (lastname, member_no, firstname, country)
declare @id int
set @id = 47
update member2
set city = city + ' City', state_prov = state_prov + ' State'
where lastname = 'Barr'
    and member_no = @id
    and firstname = 'URQYJBFVRRPWKVW'
    and country = 'USA'
Note that member_no is a smallint while the variable @id is declared as an int, so the column must be converted for the comparison.
This query doesn't have a SARG, and a table scan will be done:
select * from member where substring(lastname, 1, 2) = 'BA'
Some non-SARGs can be converted:
select * from member where lastname like 'ba%'
In some cases, you can rewrite your query to turn a non-SARG into a SARG; for example, you can rewrite the substring query above as the LIKE query that follows it.
Join Order and Types of Joins
Join Order and Strategy Is Determined by the Optimizer
The execution plan output displays the join order from top to bottom; i.e., the table listed on top is the first one accessed in a join. You can override the optimizer's join order decision in two ways:
 OPTION (FORCE ORDER) applies to one query
 SET FORCEPLAN ON applies to the entire session, until it is set OFF
If either of these options is used, the join order is determined by the order in which the tables are listed in the query's FROM clause, and no optimization of the join order is done. Forcing the join order may also force a particular join strategy. For example, in most outer join operations, the outer table is processed first, and a nested loops join is done. However, if you force the inner table to be accessed first, a merge join will need to be done. Compare the query plan for this query with and without the FORCE ORDER hint:
select * from titles right join publishers
    on titles.pub_id = publishers.pub_id
-- OPTION (FORCE ORDER)
Nested Loop Join
A nested iteration is when the query optimizer constructs a set of nested loops, and the result set grows as it progresses through the rows. The query optimizer performs the following steps:
1. Finds a row from the first table.
2. Uses that row to scan the next table.
3. Uses the result of the previous table to scan the next table.
Evaluating Join Combinations
The query optimizer automatically evaluates at least four possible join combinations, even if those combinations are not specified in the join predicate. You do not have to add redundant clauses. The query optimizer balances the cost and uses statistics to determine the number of join combinations that it evaluates; evaluating every possible join combination is inefficient and costly.
Evaluating the Cost of Query Performance
When the query optimizer performs a nested loop join, you should be aware that certain costs are incurred. Nested loop joins are far superior to both merge joins and hash joins when executing small transactions, such as those affecting only a small set of rows.
The query optimizer:
 Uses nested loop joins if the outer input is quite small and the inner input is indexed and quite large.
 Uses the smaller input as the outer table.
 Requires that a useful index exist on the join predicate for the inner table.
 Always uses a nested loop join strategy if the join operation uses an operator other than an equality operator.
Merge Joins
The columns of the join conditions are used as inputs to process a merge join. SQL Server performs the following steps when using a merge join strategy:
1. Gets the first input values from each input set.
2. Compares the input values.
3. Performs the merge algorithm.
• If the input values are equal, the rows are returned.
• If the input values are not equal, the lower value is discarded, and the next input value from that input is used for the next comparison.
4. Repeats the process until all of the rows from one of the input sets have been processed.
5. Evaluates any remaining search conditions in the query and returns only rows that qualify.
Note Only one pass per input is done. The merge join operation ends after all of the input values of one input have been evaluated. The remaining values from the other input are not processed.
Requires That Joined Columns Are Sorted
If you execute a query with join operations, and the joined columns are in sorted order, the query optimizer processes the query by using a merge join strategy. A merge join is very efficient because the columns are already sorted, and it requires fewer page I/Os.
Evaluates Sorted Values
For the query optimizer to use the merge join, the inputs must be sorted. The query optimizer obtains sorted values in the following order of preference:
1. Uses an existing index tree (most typical). The query optimizer can use the index tree from a clustered index or a covered nonclustered index.
2. Leverages sort operations that the GROUP BY, ORDER BY, and CUBE clauses use. The sorting operation only has to be performed once.
3. Performs its own sort operation, in which case a SORT operator is displayed when graphically viewing the execution plan. The query optimizer does this very rarely.
Performance Considerations
Consider the following facts about the query optimizer's use of the merge join:
 SQL Server performs a merge join for all types of join operations (except cross join or full join operations), including UNION operations.
 A merge join operation may be a one-to-one, one-to-many, or many-to-many operation. If the merge join is a many-to-many operation, SQL Server uses a temporary table to store the rows. If duplicate values from each input exist, one of the inputs rewinds to the start of the duplicates as each duplicate value from the other input is processed.
 Query performance for a merge join is very fast, but the cost can be high if the query optimizer must perform its own sort operation. If the data volume is large and the desired data can be obtained presorted from existing Balanced-Tree (B-Tree) indexes, merge join is often the fastest join algorithm.
 A merge join is typically used if the two join inputs have a large amount of data and are sorted on their join columns (for example, if the join inputs were obtained by scanning sorted indexes).
 Merge join operations can only be performed with an equality operator in the join predicate.
Hash Joins
Hashing is a strategy for dividing data into equal sets of a manageable size based on a given property or characteristic. The grouped data can then be used to determine whether a particular data item matches an existing value.
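To see these strategies side by side, you can force each one with a query hint and compare the graphical plans. A sketch against the credit database tables used earlier (OPTION applies the hint to every join in the query; forcing a strategy here is for observation, not a tuning recommendation):

select m.member_no, c.charge_amt
from member m join charge c on m.member_no = c.member_no
option (hash join)   -- also try option (loop join) and option (merge join)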
Note Duplicate data or ranges of data are not useful for hash joins, because the data is not organized together or in order.
When a Hash Join Is Used
The query optimizer uses a hash join option when it estimates that it is more efficient than processing queries by using a nested loop or merge join. It typically uses a hash join when an index does not exist or when existing indexes are not useful.
Assigns a Build and Probe Input
The query optimizer assigns a build and a probe input. If the query optimizer incorrectly assigns the build and probe inputs (this may occur because of imprecise density estimates), it reverses them dynamically. The ability to change input roles dynamically is called role reversal.
The build input consists of the column values from the table with the lower number of rows. The build input is used to create a hash table in memory to store these values. A hash bucket is a storage place in the hash table into which each row of the build input is inserted; rows are placed into the hash bucket whose hash key value matches the hash key value of the row. Hash buckets are stored as a linked list and contain only the columns that are needed for the query. A hash table contains hash buckets, and the hash table is created from the build input.
The probe input consists of the column values from the table with the most rows. The probe input rows are checked against the hash buckets built from the build input to find matches.
Note The query optimizer uses column or index statistics to help determine which input is the smaller of the two.
Processing a Hash Join
The following list is a simplified description of how the query optimizer processes a hash join. It is not intended to be comprehensive, because the algorithm is very complex. SQL Server:
1. Reads the probe input. Each probe input row is processed one at a time.
2. Performs the hash algorithm against each probe input row and generates a hash key value.
3. Finds the hash bucket that matches the hash key value.
4. Accesses the hash bucket and looks for the matching row.
5. Returns the row if a match is found.
Performance Considerations
Consider the following facts about the hash joins that the query optimizer uses:
 Similar to merge joins, a hash join is very efficient, because it uses hash buckets, which are like a dynamic index but with less overhead for combining rows.
 Hash joins can be performed for all types of join operations (except cross join operations), including UNION and DIFFERENCE operations.
 A hash operator can remove duplicates and group data, such as SUM(salary) GROUP BY department. In that case, the query optimizer uses only one input for both the build and probe roles.
 If the join inputs are large and of similar size, the performance of a hash join operation is similar to that of a merge join with prior sorting. However, if the sizes of the join inputs are significantly different, the performance of a hash join is often much faster.
 Hash joins can process large, unsorted, non-indexed inputs efficiently. Hash joins are useful in complex queries because the intermediate results:
• Are not indexed (unless explicitly saved to disk and then indexed).
• Are often not sorted for the next operation in the execution plan.
 The query optimizer can identify incorrect estimates and make corrections dynamically to process the query more efficiently.
 A hash join reduces the need for database denormalization.
Denormalization is typically used to achieve better performance by reducing join operations, in spite of the dangers of redundancy, such as inconsistent updates. Hash joins give you the option to vertically partition your data as part of your physical database design. Vertical partitioning represents groups of columns from a single table in separate files or indexes.
Subquery Performance
Joins Are Not Inherently Better Than Subqueries
Here is an example showing three different ways to update a table, using a second table for lookup purposes. The first uses a JOIN with the update, the second uses a regular subquery introduced with IN, and the third uses a correlated subquery. All three yield nearly identical performance.
Note Performance comparisons cannot be made based on I/Os alone. With hashing and merging techniques, the number of reads may be the same for two queries, yet one may take a lot longer and use more memory resources. Also, always be sure to monitor STATISTICS TIME.
Suppose you want to add a 5 percent discount to order items in the Order Details table for which the supplier is Exotic Liquids, whose supplierid is 1.
-- JOIN solution
BEGIN TRAN
UPDATE OD
SET discount = discount + 0.05
FROM [Order Details] AS OD
JOIN Products AS P ON OD.productid = P.productid
WHERE supplierid = 1
ROLLBACK TRAN
-- Regular subquery solution
BEGIN TRAN
UPDATE [Order Details]
SET discount = discount + 0.05
WHERE productid IN
    (SELECT productid
     FROM Products
     WHERE supplierid = 1)
ROLLBACK TRAN
-- Correlated subquery solution
BEGIN TRAN
UPDATE [Order Details]
SET discount = discount + 0.05
WHERE EXISTS
    (SELECT supplierid
     FROM Products
     WHERE [Order Details].productid = Products.productid
       AND supplierid = 1)
ROLLBACK TRAN
Internally, Your Join May Be Rewritten
SQL Server's query processor has many different ways of resolving your JOIN expressions. Subqueries may be converted to a JOIN with an implied DISTINCT, which may result in a logical operator of SEMI JOIN. Compare the plans of the first two queries:
USE credit
select member_no from member
where member_no in (select member_no from charge)
select distinct m.member_no
from member m join charge c on m.member_no = c.member_no
The second query uses a HASH MATCH as the final step to remove the duplicates. The first query only had to do a semi join. For these queries, although the I/O values are the same, the first query (with the subquery) runs much faster (almost twice as fast). Another similar looking join is