An Article by Stepanov, Father of the STL: Chinese Translation with Annotations

myan 2001-01-04 06:45:00
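As part of the draft C++ standard, STL provides a framework for building generic, highly
reusable algorithms and data structures
Alexander Stepanov
Byte magazine, October 1995 issue
Abridged Chinese translation by myan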
************************************************************************************************
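[About the author] Alexander Stepanov, the father of the STL, is a member of the technical staff
at SGI, where he works on C++ libraries and tools. He previously worked at HP Labs, Bell Labs,
Polytechnic University in Brooklyn, New York, and the R&D division of General Electric.
His research areas include generic software libraries, storage systems, operating systems,
compilers, and path-planning algorithms for robots.
email: stepanov@mti.sgi.com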
************************************************************************************************
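/* Translator's note:
* ISO/ANSI C++ was standardized in early 1998. The development and application of STL is
* currently the most active and most promising direction in the evolution of C++. In the OOP
* arena C++ is clearly no longer a match for newcomers such as Java and C#, but with the strong
* support of STL, C++ remains the language of choice for efficient, flexible generic
* programming. As the idea of genericity behind STL keeps developing and keeps permeating one
* field after another, it may eventually bring about yet another shift in programming
* paradigms. Mr. Hou Junjie (侯俊杰) has compared this situation to the first stirrings of OOP
* ten years ago, and the comparison is not without merit. This article was published by
* Stepanov, the father of STL, in Byte magazine in October 1995. It includes a fair amount of
* introduction to specific STL techniques; since that material is not very comprehensive, it
* has been abridged in translation. The passages on STL's design philosophy and
* characteristics are translated in full.
*
*/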

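Every programming language needs a variety of data structures, such as vectors, lists, and
associative arrays. Programmers also need basic algorithms for these data structures, for
operations such as sorting, searching, and copying. Unfortunately, for a long time C++ did not
provide a good standard set of data structures.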
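/* Translator's note:
* A vector is a contiguously stored linear list that grows its storage automatically; a list
* is a linked list; an associative array is a collection indexed by key. Associative arrays
* are widely used in practice, which is why Larry Wall, in creating ...
*/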

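But at last this problem has been remedied. The Standard Template Library (STL), a framework
of data structures (called containers in STL) and algorithms, has been accepted as part of the
draft C++ standard.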

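In the short time since STL was born, it has attracted many emotional and conflicting
assessments. On one hand, Bjarne Stroustrup (the father of C++), for example, calls STL a
"large, systematic, clean, formally sound, comprehensible, elegant, and efficient framework."
On the other hand, Pamela Seymour of Leiden University writes: "STL looks like the machine
language macro library of an anally retentive assembly language programmer."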

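1. Goal: Generality + Efficiency

STL is not an attempt to torment people with yet another standard. It was not designed by a
committee, or for one. It is the result of 15 years of my research into generic programming,
carried out in different places, in different languages, and with different collaborators. I
pursued this research with one firm goal: to find a way of writing algorithms in the most
general manner, such that their abstractness does not compromise performance.

By "the most general manner" I mean that the algorithm works on every data structure for which
it makes sense. For example, a linear search written in the most general way should work on any
data structure in which it can look at data, move to the next element, and stop at the end of
the search range. Such an algorithm then applies to arrays, singly linked lists, doubly linked
lists, files, and even binary trees. It should also work on part of a data structure: you might
want to apply it to half of a list, or to the set of array elements spaced n apart (also called
a stride).

And what do I mean by "does not compromise performance"? Put differently, how do you know
whether a generic algorithm is efficient? I call a generic algorithm relatively efficient if it
is as efficient as a nongeneric algorithm written in the same language, and absolutely
efficient if it is as efficient as a nongeneric algorithm written in assembly language.

For many years I tried to achieve relative efficiency in high-level languages (such as Ada and
Scheme), and failed. My generic versions of even very simple algorithms could not compete with
the languages' built-in ones. With C++, however, I not only achieved relative efficiency at
last but also came close to the more ambitious goal of absolute efficiency. To verify this, I
spent countless hours studying the assembly code produced by different compilers on different
architectures.

I found that efficiency and generality are not fundamentally at odds. In fact, the opposite is
probably closer to the truth: if a component is not efficient enough, it is usually because it
is not abstract enough. That is because efficiency and abstraction both call for a clean,
orthogonal design. Mathematics shows a similar phenomenon: the more abstract a proof, the more
concise and elegant it is.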
/* Translator's note:
* "Orthogonal" is rendered as 正交化 in the Chinese text; its precise meaning is explained below.
*/

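2. Orthogonal Component Space

Over the past 25 years, every revolution in programming methodology has tried to reduce all
programs to a single concept. Functional programming, for example, treats everything as a
function; every notion of state, address, and side effect is banned. Then, with the birth of
OOP, functions became the pariah, and everything became an object with state.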
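/* Translator's note:
* In assembly language, the wealth of program state (produced by instructions), object
* addresses, and instruction side effects can be exploited to design ingenious and efficient
* algorithms. High-level languages usually leave these unused, which makes them less efficient
* than assembly. C comes close to assembly in efficiency precisely because C can be translated
* fairly easily into efficient assembly code and can, to some extent, exploit state, addresses,
* and side effects.
*/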

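STL is deeply influenced by both functional programming and OOP. But it is not a
single-paradigm library; it is a library for general-purpose programming on von Neumann
computers.

STL is based on an orthogonal decomposition of component space. For example, an array and a
binary search should not be bundled together. The two are very different things. An array is a
component that holds data: a data structure. A binary search is a component that performs a
computation on some data structure: an algorithm. As long as a data structure provides a
suitable access method, binary search can be applied to it. Only by recognizing the essential
difference between arrays and binary searches can efficiency and elegance be achieved together.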
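/* Translator's note:
* The author holds that the OOP practice of binding generic algorithms to data structures does
* nothing for the efficiency or elegance of a solution, so he boldly proposes to respect the
* difference between the two and to let global generic algorithms (the counterpart of
* functions in C) operate directly on data structures. Algorithms break through the
* encapsulation of the data structure and can act on many kinds of data structures; this is
* orthogonal decomposition. Although this breaks the so-called encapsulation of data
* structures, it makes the fullest use of the available information and side effects, and so
* achieves both simplicity and efficiency.
*/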


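3. Iterators

The key concept for understanding STL is the iterator, a generalized pointer that glues
algorithms and data structures together. STL ignores the academic dogma that regards pointers
as evil, and in that respect may look like a step backward. STL treats pointers as the
cornerstone of its design instead of straining to hide them behind things that look like value
variables. The decision to make pointers respectable again rests on a simple fact: most things
in programming serve to identify the location of data, just as pointers do. Internet addresses,
SCSI addresses, and file handles, for example, all work like pointers.

Suppose your task is to print the list of productive employees. All the employees' names are
stored in a vector (in STL, a vector is a one-dimensional dynamic array), and you use the STL
algorithm remove_copy_if() to print the list.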
An STL Implementation of remove_copy_if()
template <class InputIterator, class OutputIterator, class Predicate>
OutputIterator remove_copy_if(InputIterator first, InputIterator last,
                              OutputIterator result, Predicate pred) {
    while (first != last) {
        // Copy only the elements that do NOT satisfy the predicate.
        if (!pred(*first)) *result++ = *first;
        ++first;
    }
    return result;
}
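The function begin() returns the position of the first element in the container, and end()
returns the position just past the last element. STL requires that, for every container, the
number of positions an iterator can point to be exactly one more than the number of elements.
The STL component ostream_iterator provides an iterator-style way of writing output to the
screen.

This point matters: if you later decide to store the employees' names in a linked list
instead, you change nothing but one variable declaration. The remove_copy_if() algorithm works
on vectors, lists, deques, sets, and even user-defined container classes, provided those
classes supply STL-style iterators. The algorithm also works on ordinary C arrays.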

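4. Iterator Categories

STL divides iterators into five categories: input, output, forward, bidirectional, and
random-access. One of my important discoveries is that hundreds of practical algorithms can be
described in terms of these abstract categories.

STL specifies a set of valid expressions for each iterator category and places bounds on the
complexity of those expressions. Algorithms described in terms of these abstract interfaces
are guaranteed to give users efficient performance.

Different algorithms need different iterators to do different work on different data
structures. STL uses a novel language technique to select the appropriate algorithm at compile
time according to the iterator category.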

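5. Generic Algorithms

In the remove_copy_if() example just shown, the code looks like ordinary C; only the function
declaration differs. I have found that even C programmers who do not know C++ find programming
with STL easy, because they are already familiar with the basic idioms. And because all the
iterator categories are abstracted from pointers, they are guaranteed to have very efficient
implementations.

In computer science it is very important to build abstractions on efficient models. In other
words, I am confident that remove_copy_if() is efficient because it produces good code when
applied to plain C arrays. In fact, if you call remove_copy_if() with an STL function object
rather than a function pointer, you can get efficiency that is usually attainable only with
hand-written assembly code.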
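/* Translator's note:
* A function object is an object of a class that overloads operator(). The author points out
* here that function objects can yield more efficient code than ordinary function pointers,
* which deserves our attention.
*/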

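6. The Future

I hope that STL will be the beginning of a process of developing large, systematic catalogs of
highly parameterized software components. The ANSI/ISO C++ committee saw the promise of STL
and provided a channel through which working programmers can use generic programming.

I would like to take this opportunity to call for an industry-wide consortium to develop new
generic components. No single company can accumulate enough algorithmic expertise to
accomplish this. Making all the fundamental algorithms and data structures public and
inexpensive is in everyone's interest.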
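/* Translator's note:
* Short as it is, this passage is a statement of direction. We can imagine that future large
* systems will be assembled from a number of concrete components, each instantiated from an
* abstract, generic component by something like filling in a form with the actual parameter
* types, and all of them extremely efficient. When that happens, the landscape of the software
* industry will again change beyond recognition.
*/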

*****************************************************************************************************

Part of the draft C++ standard, STL provides the framework for building generic, highly
reusable algorithms and data structures
Alexander Stepanov
In every programming language, there's a need for various data structures, such as vectors,
lists, and associative arrays. Programmers also need fundamental algorithms -- for sorting,
searching, and copying -- defined for the data structures. It has long been lamented that C++
doesn't provide a good set of standard data structures.

But at last this problem has been remedied. The Standard Template Library is a framework of
data structures (called containers in STL) and algorithms accepted as part of the draft C++
standard. A reference implementation of STL has been put into the public domain by
Hewlett-Packard (it can be downloaded from butler.hpl.hp.com), and a growing number of
commercial vendors are now shipping STL.

In the short time since its release, STL has generated many emotional -- and conflicting --
assessments. On one hand, for example, Bjarne Stroustrup of Bell Laboratories calls it a
"large, systematic, clean, formally sound, comprehensible, elegant, and efficient framework."
On the other hand, Pamela Seymour of Leiden University writes that "STL looks like the
machine language macro library of an anally retentive assembly language programmer."

Goal: Generality + Efficiency

STL is not an attempt to impose yet another standard on a suffering humanity. And it was not
designed by or for a committee. It is the result of over 15 years of research in generic
programming that I've done in different places, with different collaborators, and in
different programming languages. I did this research with a concrete goal in mind: to find a
way to write algorithms in the most general way, but in such a way that their abstractness
would not impose any performance penalty.

What do I mean by "in the most general way"? Simply that an algorithm works on all data types
for which it makes sense. For example, a linear-search algorithm is written in the most
general way if it can search any data structure for which the operations of looking at data,
going to the next data element, and indicating the end of the search range are defined. So,
it should work for an array, a singly linked list, a doubly linked list, a file, and even a
binary tree.

An algorithm should also work for portions of such structures. For example, you might want to
search half a list or sum the set of elements in an array that are n spaces apart (i.e., a
stride).
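
As a rough illustration of the paragraphs above, the following sketch writes a linear search
against nothing but the three operations the article lists: look at an element, move to the
next one, and detect the end of the range. The name linear_search and the sample data are
assumptions made for this example (the standard library's counterpart is std::find).

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical name for illustration; the standard library offers std::find.
// Needs only three operations: compare iterators, dereference, and advance.
template <class InputIterator, class T>
InputIterator linear_search(InputIterator first, InputIterator last, const T& value) {
    while (first != last && !(*first == value)) ++first;
    return first;  // equals last when the value is absent
}

void linear_search_demo() {
    int arr[] = {3, 5, 7, 9};
    std::vector<int> vec(arr, arr + 4);
    std::list<int> lst(arr, arr + 4);

    // The same template searches an array, a vector, and a linked list.
    assert(linear_search(arr, arr + 4, 7) == arr + 2);
    assert(linear_search(vec.begin(), vec.end(), 7) != vec.end());
    assert(linear_search(lst.begin(), lst.end(), 7) != lst.end());

    // A portion of a structure: only the first half of the list, {3, 5}.
    std::list<int>::iterator mid = lst.begin();
    std::advance(mid, 2);
    assert(linear_search(lst.begin(), mid, 7) == mid);  // 7 is not in the first half
}
```

Because the range is passed as a pair of positions rather than as a container, searching half
a list needs no extra code, only a different end position.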

What do I mean when I say that an algorithm does not "impose any performance penalty"? In
other words, how do you know that a generic algorithm is efficient? An algorithm is called
relatively efficient if it's as efficient as a nongeneric version written in the same
language, and it's called absolutely efficient if it's as efficient as a nongeneric assembly
language version.

For many years, I tried to achieve relative efficiency in more advanced languages (e.g., Ada
and Scheme) but failed. My generic versions of even simple algorithms were not able to
compete with built-in primitives. But in C++ I was finally able to not only accomplish
relative efficiency but come very close to the more ambitious goal of absolute efficiency. To
verify this, I spent countless hours looking at the assembly code generated by different
compilers on different architectures.

I found that efficiency and generality were not mutually exclusive. In fact, quite the
reverse is true. If a component is not efficient enough, it usually means that it's not
abstract enough. This is because efficiency and abstractness both require a clean, orthogonal
design. A similar phenomenon occurs in mathematics: Making a proof more abstract makes it
more concise and elegant.

Orthogonal Component Space

The past 25 years have seen attempts to revolutionize programming by reducing all programs to
a single conceptual primitive. Functional programming, for example, made everything into a
function; the notions of states, addresses, and side effects were taboo. Then, with the
advent of object-oriented programming (OOP), functions became taboo; everything became an
object (with a state).

STL is heavily influenced by both functional programming and OOP. But it's not a
single-paradigm library; rather, it's a library for general-purpose programming of von
Neumann computers.

STL is based on an orthogonal decomposition of component space. For example, an array and a
binary search should not be reduced to a single, fundamental notion. The two are quite
different. An array is a data structure -- a component that holds data. A binary search is an
algorithm -- a component that performs a computation on data stored in a data structure. As
long as a data structure provides an adequate access method, you can use the binary-search
algorithm on it. Only by respecting the fundamental differences of arrays and binary searches
can efficiency and elegance be simultaneously achieved.
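
One way to picture this separation in present-day C++ is to run a single binary-search routine
over several unrelated containers. The sketch below relies on std::lower_bound from the
standard library; index_of and the sample values are hypothetical names chosen for this
example and do not appear in the article.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// index_of is a hypothetical helper: position of value in a sorted range, or -1.
template <class RandomAccessIterator, class T>
long index_of(RandomAccessIterator first, RandomAccessIterator last, const T& value) {
    RandomAccessIterator it = std::lower_bound(first, last, value);  // binary search
    if (it != last && !(value < *it)) return static_cast<long>(it - first);
    return -1;
}

void binary_search_demo() {
    int arr[] = {1, 3, 5, 7, 9};  // the data must already be sorted
    std::vector<int> vec(arr, arr + 5);
    std::deque<int> deq(arr, arr + 5);

    // The algorithm knows nothing about which data structure holds the data.
    assert(index_of(arr, arr + 5, 7) == 3);
    assert(index_of(vec.begin(), vec.end(), 7) == 3);
    assert(index_of(deq.begin(), deq.end(), 7) == 3);
    assert(index_of(deq.begin(), deq.end(), 4) == -1);  // absent value
}
```

The array, the vector, and the deque share no base class; the only thing they have in common
is that their iterators support the access operations the search needs.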

Iterators

The key to STL is the notion of iterators, which are generalized pointers that provide a
glue for connecting algorithms and data structures. STL is indeed retrograde in its disregard
of the current academic dogma suggesting that pointers are evil. Instead of hiding pointers
behind value semantics, it makes them the corner-stone of the design. The decision to bring
pointers back into the realm of respectability was based on a simple fact: Most things in
programming resemble pointers in that they identify a location of data. For instance,
Internet addresses, SCSI addresses, and file descriptors all function as pointers.

Consider the task of printing a list of productive employees (see the listing "Printing Names
of Productive Employees"). The employees' names are stored in a vector, an STL version of a
one-dimensional dynamic array. To print the names of productive employees, you use the STL
function remove_copy_if(), which scans the range of elements from its first argument up to,
but not including, its second argument and copies those that do not satisfy a predicate (its
fourth argument) into positions starting from its third argument. (For most people, the code
is clearer than the explanation.) The functions begin() and end() return iterators pointing
to the first element and past the last element in the vector, respectively. (STL requires
that for every container, the number of valid iterators pointing to it is one greater than
the number of elements in the container.) The STL component ostream_iterator provides an
iterator-like interface to an output stream.

It's important to note that if you later decide to put employees' names in a list instead of
in a vector, you do not have to change anything except the declaration of the variable all.
The remove_copy_if() function works for vectors, lists, deques, and sets (which are all STL
components), as well as for any user-defined container that provides STL-conforming
iterators. It also works for regular C arrays.
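
The claim about swapping containers can be checked with a self-contained variant of the
employee example. Everything the article leaves out is an assumption here: the Employee
struct, the Title enumeration, the output operator, and the sample staff are invented for
illustration, and a string stream stands in for cout so that the result can be inspected.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical Employee type; the article does not show its definition.
enum Title { engineer, manager };

struct Employee {
    std::string name;
    Title title;
};

// ostream_iterator<Employee> writes each element with this operator.
std::ostream& operator<<(std::ostream& os, const Employee& e) {
    return os << e.name << '\n';
}

bool is_manager(const Employee& x) { return x.title == manager; }

// Collects the names of all non-managers in any container with STL-style iterators.
template <class Container>
std::string productive_names(const Container& all) {
    std::ostringstream out;  // stands in for cout so the result can be checked
    std::remove_copy_if(all.begin(), all.end(),
                        std::ostream_iterator<Employee>(out), is_manager);
    return out.str();
}

void employees_demo() {
    const Employee staff[] = {{"Ann", engineer}, {"Bob", manager}, {"Cy", engineer}};

    std::vector<Employee> in_vector(staff, staff + 3);
    std::list<Employee> in_list(staff, staff + 3);  // only the declaration differs

    assert(productive_names(in_vector) == "Ann\nCy\n");
    assert(productive_names(in_list) == "Ann\nCy\n");
}
```

The call to remove_copy_if() is identical for both containers; begin() and end() are the only
points of contact between the algorithm and the data structure.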

Iterator Categories

STL classifies iterators into five categories: input, output, forward, bidirectional, and
random-access. These iterator categories are sets of requirements for operations that are
supported by concrete iterator types. An important experimental discovery I made was that
hundreds of different practical algorithms can be written in terms of these abstract
categories.

STL specifies a set of valid expressions for each category's iterators, as well as precise
semantics for each iterator's usage. For example, given that i is a value of a type that
belongs to a bidirectional iterator category, if ++i is defined, then --(++i) == i. STL
also prescribes certain complexity requirements for these expressions. Users are thereby
guaranteed that algorithms written in terms of these abstract interfaces will work
effectively.

Different algorithms require different kinds of iterators, and different algorithms are
needed to perform different operations on different data structures. STL uses a novel
language technique that selects the right algorithm at compile time, depending on the
iterator category.
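
The article does not name the technique. A common way to get this effect in standard C++ is
tag dispatching, sketched here under that assumption: std::iterator_traits and the category
tag types come from the standard library and may differ from the mechanism used in the 1995
implementation, while steps_between is a hypothetical function written for this example.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// General version: walks the range one step at a time, linear in its length.
template <class InputIterator>
long steps_between(InputIterator first, InputIterator last, std::input_iterator_tag) {
    long n = 0;
    for (; first != last; ++first) ++n;
    return n;
}

// Random-access version: a single subtraction, constant time.
template <class RandomAccessIterator>
long steps_between(RandomAccessIterator first, RandomAccessIterator last,
                   std::random_access_iterator_tag) {
    return static_cast<long>(last - first);
}

// Front end: the iterator's category tag selects an overload at compile time.
template <class Iterator>
long steps_between(Iterator first, Iterator last) {
    typedef typename std::iterator_traits<Iterator>::iterator_category category;
    return steps_between(first, last, category());
}

void dispatch_demo() {
    int arr[] = {1, 2, 3, 4};
    std::vector<int> vec(arr, arr + 4);
    std::list<int> lst(arr, arr + 4);

    assert(steps_between(vec.begin(), vec.end()) == 4);  // random-access overload
    assert(steps_between(lst.begin(), lst.end()) == 4);  // stepping overload
    assert(steps_between(arr, arr + 4) == 4);            // plain pointers are random-access
}
```

The tag argument is an empty object used only for overload resolution, so the choice of
algorithm is settled by the compiler and should add no run-time cost.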

Generic Algorithms

The listing "An STL Implementation of remove_copy_if()" illustrates how STL deals with
iterators. What's most striking is the fact that it looks just like regular C code; only the
signature is different. In fact, I've found that C programmers find it quite easy to start
programming in STL even when they don't know C++, because the underlying idioms are already
familiar to them. The fact that all the iterator categories are abstracted from pointers
ensures that there is an efficient implementation for them.

In computer science it's important to base abstractions on efficient models. In other words,
I believe that remove_copy_if() is efficient because it generates good code when used with
plain C arrays. In fact, if you use remove_copy_if() with STL function objects rather than
with pointers to functions, as I did in the listing "Printing Names of Productive Employees,"
you can obtain code that is often just as efficient as hand-written assembly code.
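
To make the distinction concrete, here is a small sketch that passes the same predicate to
remove_copy_if() once as a pointer to a function and once as a function object. The names
is_negative, IsNegative, LessThan, and copy_unless are hypothetical and exist only for this
example. Whether a particular compiler inlines either form depends on the compiler and its
optimization settings, so the sketch shows the two call styles rather than measuring them.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Plain function: the algorithm receives a pointer and calls through it.
bool is_negative(int x) { return x < 0; }

// Function object: a class with operator(). The predicate's type names the exact
// function to call, which makes the call a straightforward candidate for inlining.
struct IsNegative {
    bool operator()(int x) const { return x < 0; }
};

// Function objects can also carry state, which a bare function pointer cannot.
struct LessThan {
    int limit;
    explicit LessThan(int l) : limit(l) {}
    bool operator()(int x) const { return x < limit; }
};

// Copies every element of in that does not satisfy pred.
template <class Predicate>
std::vector<int> copy_unless(const std::vector<int>& in, Predicate pred) {
    std::vector<int> out;
    std::remove_copy_if(in.begin(), in.end(), std::back_inserter(out), pred);
    return out;
}

void function_object_demo() {
    int arr[] = {4, -1, 7, -3, 0};
    std::vector<int> v(arr, arr + 5);

    assert(copy_unless(v, is_negative) == copy_unless(v, IsNegative()));
    assert(copy_unless(v, IsNegative()).size() == 3);  // keeps 4, 7, 0
    assert(copy_unless(v, LessThan(4)).size() == 2);   // keeps 4, 7
}
```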

The Future

It is my hope that STL will prove to be the beginning of a long process of developing
systematic catalogs of highly parameterized software components. The ANSI/ISO C++ standard
committee saw the promise of STL and provided a conduit through which generic programming
could reach working programmers.

I'd like to use this opportunity to advocate the creation of an industrywide consortium for
developing new generic components. No single company can accumulate the algorithmic expertise
that is needed for such an activity. And it is in everybody's interest that all the
fundamental algorithms and data structures be universally and inexpensively available.



--------------------------------------------------------------------------------

ACKNOWLEDGMENTS

I would like to express my gratitude to Bjarne Stroustrup, Ross Towle, Jim Dehnert, Suresh
Srinivas, Mary Loomis, and Larry Rosler for reviewing this article.


--------------------------------------------------------------------------------

Printing Names of Productive Employees
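vector<Employee> all;

// True for employees whose title is manager (a constant assumed to be defined elsewhere).
bool is_manager(const Employee& x) {
    return x.title == manager;
}
...
// Copy every employee who is not a manager to standard output.
remove_copy_if(
    all.begin(),
    all.end(),
    ostream_iterator<Employee>(cout),
    is_manager);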






--------------------------------------------------------------------------------

An STL Implementation of remove_copy_if()
template <class InputIterator, class OutputIterator, class Predicate>
OutputIterator remove_copy_if(InputIterator first, InputIterator last,
                              OutputIterator result, Predicate pred) {
    while (first != last) {
        // Copy only the elements that do NOT satisfy the predicate.
        if (!pred(*first)) *result++ = *first;
        ++first;
    }
    return result;
}






--------------------------------------------------------------------------------
Alexander Stepanov is a member of the technical staff at Silicon Graphics, Inc. (Mountain
View, CA), where he works on C++ libraries and tools. Prior to joining SGI, he worked for HP
Labs, AT&T Bell Labs, Polytechnic University (Brooklyn, NY), and GE R&D. He has worked in
such areas as generic software libraries, storage systems, OSes, compilers, and path-planning
algorithms for robots. He can be contacted on the Internet at stepanov@mti.sgi.com or
on BIX c/o "editors."




tibco 2001-01-09
Nice, thanks myan!!
Virtual 2001-01-05
This is good stuff.
wao 2001-01-05
Great article. One of the main reasons I gave up Delphi was STL.