A small but rather interesting question about vector.

孩皮妞野 2003-01-23 08:31:55

I came across this question in a CUJ Q&A column. Before reading the original, think about it for a moment.

Suppose we have this class:
class C{
public:
    C():str(0){}
    C(char const* src):str(new char[strlen(src)+1])
    {
        strcpy(str,src);
    }
    C(const C& c) :str(new char[strlen(c.str)+1])
    {
        strcpy(str,c.str);
    }
    ~C(){delete str;}
private:
    char * str;
};

Now I want to store objects of this class in a std::vector<C>. Is there a problem?


JoshuaLi 2003-03-11

Following along to learn.

KennyYuan 2003-01-29

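Nice topic. I'll join the discussion when I have time, heh.

Someone online came up with a new approach based on move_t and sent me an article and code, but I haven't had time to look at it yet. I should have time after the holiday, heh. I think all of this could be gathered into a single topical study.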

SatanLi1982 2003-01-29

see

shornmao 2003-01-29

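Haven't you already read the CUJ article? That move constructor article solves exactly this problem from the library side.

But why didn't you provide operator=? That is a design mistake. Under the current standard, providing operator= is required in this situation.

If you think operator= causes unnecessary copying, you are right, but don't attribute the error in the example to that. Once you provide operator=, the error no longer occurs.

Only then can we start discussing efficiency.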

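Why doesn't the language provide it? Of course it would be best if it did. Hasn't Alexandrescu already hinted at it? A temporary keyword is very conspicuous: the object is temporary, so fine, I don't need the redundant copy. But since ISO C++ was approved in 1998, the standard cannot be revised at any time under ISO's working procedures; revisions follow a five-year cycle. By convention, then, this year the C++ standard should be open for revision, which means every change request has to be submitted and has to pass the committee's discussion before it can be adopted.
Proposing language-level support here has no effect, and until the standard is revised the problem can only be solved in the library. That is why Alexandrescu's mojo is worth using.
Finally, this request will not necessarily be accepted: for implementation reasons, detecting temporary objects may be difficult, and adding new keywords and operators to C++ is not as easy as you imagine. C++ already has plenty of keywords.
There is a little story you may or may not know. C++ originally had no typename keyword; template type parameters borrowed the class keyword, precisely to keep the number of keywords as small as possible. Later, because code had to refer from the outside to types declared (typedef) inside a class template, the new typename keyword had to be introduced; otherwise the parser cannot tell whether a name denotes a type or an object.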
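
A minimal sketch of the ambiguity described above. The function name count_elements is made up for illustration. Inside the template, Container::const_iterator depends on the template parameter, so the parser cannot know it names a type unless typename says so.

```cpp
#include <cstddef>
#include <vector>

// count_elements is a hypothetical name used only for this illustration.
template <typename Container>
std::size_t count_elements(const Container& c) {
    std::size_t n = 0;
    // Container::const_iterator is a dependent name; without "typename"
    // the compiler must assume it does not name a type.
    for (typename Container::const_iterator it = c.begin(); it != c.end(); ++it) {
        ++n;
    }
    return n;
}
```
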
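So there has always been strong resistance to adding a keyword to C++ for a requirement that a library can meet simply.
Nor can everything be turned into move construction: in some situations the original value must be preserved, so that is not possible either.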

孩皮妞野 2003-01-28

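Neither of you read my posts carefully.

I can say without false modesty that my skill at choosing containers is not necessarily worse than yours.
vector::erase is only an example that leads into the problem. This kind of redundant copy construction and assignment draws a lot of complaints in the C++ community, including from some C++ columnists.

Destructive assignment would be implemented by default and would not need to be overloaded for every class; a user-defined version is needed only when the class has pointers that point back at it. So where exactly would the burden on C++ users come from?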

topikachu 2003-01-28

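Seconding merlinran (天行者).
Exactly. I said at the very beginning:
"Taken strictly on its own terms, those are all the problems this program has. But if someone did this in real work, I would spank them :)
If you know you need to operate on the front, why use vector at all??"

After all, the STL offers better containers to choose from. The situation arises not from a shortcoming of the language but from the programmer (or even from the initial system design). So remember the sage's advice: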
item 1: choose your containers with care.

merlinran 2003-01-28

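Introducing a new language element just to accommodate this vector scenario costs more than it gains. It effectively requires every class that might be stored in a container to implement destructive-assignment semantics, that is, to implement one extra function (an overloaded operator is a function too). Worse still, it would drag programmers into a quagmire.

vector was simply not designed to be used this way; this inefficient practice should be avoided through program structure.

Even with memcpy(), copying many memory blocks is time-consuming. And for contiguous storage such as vector, the trailing block cannot be moved in one shot, so a destructive assignment still has to be performed on every object.

The approach pays off noticeably only when the elements have pointer members, where it substitutes a shallow copy for a deep copy. The alternative is to reference-count the pointer members.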
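
The reference-counting alternative can be sketched as follows. This is only an illustration: it uses std::shared_ptr, which entered the standard library in C++11, well after this thread, and the names SharedStr and owners are invented here. Copies share one buffer, so copying or assigning an element only adjusts a counter.

```cpp
#include <memory>
#include <string>

// Hypothetical class: a string whose copies share one immutable buffer.
class SharedStr {
public:
    explicit SharedStr(char const* src)
        : str(std::make_shared<std::string>(src)) {}
    // The compiler-generated copy ctor, operator= and dtor are correct here:
    // they only increment or decrement the shared reference count.
    long owners() const { return str.use_count(); }
    char const* c_str() const { return str->c_str(); }
private:
    std::shared_ptr<const std::string> str;
};
```

The trade-off is that the shared buffer has to be treated as read-only, or copied before being modified.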

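As for operator=, the Big Three rule says that if you define any one of the copy ctor, copy assignment operator, and dtor, you should implement the other two as well. If there are no pointer members or other resources that need explicit release, the dtor is not required, but the other two must still come as a pair. Violate this rule and even the simple case pointed out by topikachu (皮皮) goes wrong.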
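
A minimal sketch of the thread's class C with the missing copy assignment operator added, so that all of the Big Three are present. The copy-and-swap technique, the private dup helper, the c_str accessor, and the erase_front_and_join function are my own assumptions for illustration, not something proposed in the thread.

```cpp
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Sketch of the thread's class C with all of the Big Three defined.
class C {
public:
    C() : str(nullptr) {}
    C(char const* src) : str(dup(src)) {}
    C(const C& c) : str(dup(c.str)) {}
    // Copy-and-swap: rhs is already a private copy, so swapping buffers
    // gives *this the new value and lets rhs free the old one.
    C& operator=(C rhs) {
        std::swap(str, rhs.str);
        return *this;
    }
    ~C() { delete[] str; }
    // Hypothetical accessor, added only so the result can be inspected.
    char const* c_str() const { return str ? str : ""; }
private:
    // Duplicates a C string; tolerates the null used by the default ctor.
    static char* dup(char const* src) {
        if (!src) return nullptr;
        char* p = new char[std::strlen(src) + 1];
        std::strcpy(p, src);
        return p;
    }
    char* str;
};

// Hypothetical helper: erase the first of three elements, join what is left.
std::string erase_front_and_join() {
    std::vector<C> v;
    v.push_back(C("a"));
    v.push_back(C("b"));
    v.push_back(C("c"));
    v.erase(v.begin());
    return std::string(v[0].c_str()) + v[1].c_str();
}
```

With operator= in place, the assignments performed by erase release the overwritten buffers, so nothing leaks and nothing is freed twice.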

The following code and explanation are quoted from the "ANSI-ISO C++ Professional Programmer's Handbook" (a good book in my opinion, worth reading, and available online):
class Year
{
private:
    int y;
    bool cached; //has the object been cached?
public:
    //...
    Year(int y);
    Year(const Year& other) //cached should not be copied
    {
        y = other.getYear();
    }
    Year& operator =(const Year&other) //cached should not be copied
    {
        y = other.getYear();
        return *this;
    }
    int getYear() const { return y; }
};//no destructor required for class Year

Class Year does not allocate memory from the free store, nor does it acquire any other resources during its construction. A destructor is therefore unnecessary. However, the class needs a user-defined copy constructor and assignment operator to ensure that the value of the member that is cached is not copied because it is calculated for every individual object separately.
The example may not fit perfectly, but it does reveal the possibility.

孩皮妞野 2003-01-27

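An example of a user-defined := (hypothetical syntax).

struct Parent;

struct Child{
    char buff[1024*1024];
    Parent * pParent;
    Child * pNextSibling;
};

struct Parent{
    Child * pFirstChild;

    // Ctor, AddChild, etc

    // Precondition: *this is raw memory of size sizeof(Parent);
    // it is not constructed, so it need not be destructed.
    // Post: rhs is invalid, *this is bitwise equal to rhs.
    Parent& operator :=(__destructible Parent& rhs){
        // for every child of rhs (only the first is shown):
        rhs.pFirstChild->pParent = this;
        // the next line could also be generated by the compiler, which helps optimization
        memcpy(this, &rhs, sizeof(Parent));
        return *this;
    }
    // Compare with the corresponding =
    Parent& operator =(const Parent& rhs){
        if(this == &rhs) return *this;
        // destroy all children of *this
        // construct a copy of every child in rhs and set the corresponding pointers
        return *this;
    }
};

The work saved here is measured in tons, and destructive-assignment semantics of this kind are needed quite often.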

孩皮妞野 2003-01-27

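1. The problem can indeed be solved with traits, but that is inconvenient: for safety the default has to be "not movable by memcpy", whereas in practice the vast majority of class objects are movable by memcpy. [The ones that cannot be memcpy'd are classes whose objects contain pointers back to this or this+n.]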

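[QUOTE]
As for your wanting both semantics at once, I think not only can the compiler not do it, even ordinary programmers would get confused.
Consider the same statement
foo=bar;
which in some cases is a full copy and in others must be destructive. By what standard do we decide which semantics to invoke?[/QUOTE]
2. What if the language introduced a move-assignment operator, for example :=?

Given a destructive assign, say :=, erase could be written like this: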

iterator erase (iterator position)
{
    _Destroy(position); // destroy the object at position
    for(iterator tmp=position, tmp2=position; ++tmp2!=end(); ++tmp,++tmp2)
        *tmp := *tmp2;
    return position;
}

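For a class that does not define :=, := means a bitwise copy; only classes for which a bitwise copy is not valid need to define :=.

This saves at least one destruction and one construction per object. [Assignment effectively does both.]

As far as I know, quite a few compilers can optimize a simple contiguous memory copy into memcpy. The generic code would then be no less efficient than the equivalent hand-written C code.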
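
For comparison, standard C++ later gained something close to this. Since C++11, the three-argument std::move algorithm in <algorithm> shifts elements with move assignment, which for a type such as std::string typically transfers the buffer instead of copying it. The sketch below is only an illustration under that assumption, not the erase of any particular library, and erase_by_move is a made-up name. Unlike the := proposed above, a moved-from object remains valid and is still destroyed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: remove v[pos] by move-assigning the tail forward.
// Precondition: pos < v.size().
template <typename T>
void erase_by_move(std::vector<T>& v, std::size_t pos) {
    // Three-argument std::move from <algorithm>: move-assigns
    // [pos + 1, end) onto the range starting at pos.
    std::move(v.begin() + pos + 1, v.end(), v.begin() + pos);
    // The last slot now holds a moved-from object; destroy it.
    v.pop_back();
}
```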

孩皮妞野 2003-01-27

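On further thought, perhaps introducing a __destructive keyword would solve all of these problems.

The := operator would then be entirely redundant: the call can be resolved from the __destructive property of rhs.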
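
Resolving on a property of rhs resembles what C++11 later standardized as rvalue references: overload resolution picks a different operator= when rhs is an rvalue. A minimal sketch, assuming a C++11 compiler; the class name Text and its data accessor are invented here. One difference from the proposal: the moved-from object stays valid and its destructor still runs.

```cpp
#include <cstring>
#include <utility>

// Hypothetical class owning a heap buffer, with copy and move overloads.
class Text {
public:
    explicit Text(char const* src) : str(clone(src)) {}
    Text(const Text& rhs) : str(clone(rhs.str)) {}
    Text(Text&& rhs) noexcept : str(rhs.str) { rhs.str = nullptr; }
    // Chosen when rhs is an lvalue: make an independent deep copy.
    Text& operator=(const Text& rhs) {
        if (this != &rhs) {
            char* fresh = clone(rhs.str);
            delete[] str;
            str = fresh;
        }
        return *this;
    }
    // Chosen when rhs is an rvalue: take over its buffer, no allocation.
    Text& operator=(Text&& rhs) noexcept {
        if (this != &rhs) {
            delete[] str;
            str = rhs.str;
            rhs.str = nullptr;
        }
        return *this;
    }
    ~Text() { delete[] str; }
    char const* data() const { return str; }
private:
    // Duplicates a C string; a null source yields a null buffer.
    static char* clone(char const* src) {
        if (!src) return nullptr;
        char* p = new char[std::strlen(src) + 1];
        std::strcpy(p, src);
        return p;
    }
    char* str;
};
```

With these overloads, b = a makes a copy, while b = std::move(a) transfers the buffer and leaves a holding nothing.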

earthharp 2003-01-27

This is long; I'll read it when I have time.

hfqian 2003-01-27

I admire the depth of C++ used in the code.

topikachu 2003-01-26

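First, erase ends up calling copy, at least in the several STL implementations I have looked at, so in the end it comes down to the efficiency of copy.

As for your wanting both semantics at once, I think not only can the compiler not do it, even ordinary programmers would get confused.
Consider the same statement
foo=bar;
which in some cases is a full copy and in others must be destructive. By what standard do we decide which semantics to invoke? Heh.

Whether an assignment is non-trivial, in other words whether memcpy may be used, cannot be determined within the STL specification itself, unless the compiler is powerful enough, such as N32 or N64, which can tell at compile time whether a class is a POD. Otherwise the programmer has to specify it by hand, which takes a certain technique; in the SGI STL that technique is __type_traits.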
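
The __type_traits technique mentioned above can be approximated with the standard <type_traits> header, which arrived in C++11. A minimal sketch under that assumption: close_gap and shift_left are made-up names, and this is not SGI's actual code. Tag dispatch selects memmove when the element type is trivially copyable and element-wise assignment otherwise.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Trivially copyable elements: one memmove (the ranges overlap, so not memcpy).
template <typename T>
void shift_left(T* hole, std::size_t count, std::true_type) {
    std::memmove(hole, hole + 1, count * sizeof(T));
}

// Everything else: assign element by element through operator=.
template <typename T>
void shift_left(T* hole, std::size_t count, std::false_type) {
    for (std::size_t i = 0; i < count; ++i) {
        hole[i] = hole[i + 1];
    }
}

// Hypothetical helper: slide the `count` elements that follow `hole`
// one slot toward the front, choosing the strategy at compile time.
template <typename T>
void close_gap(T* hole, std::size_t count) {
    shift_left(hole, count, typename std::is_trivially_copyable<T>::type());
}
```

The choice costs nothing at run time, because the overload is selected during compilation from the element type alone.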

孩皮妞野 2003-01-25

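What I actually need here is to have both kinds of assign (and copy construct) semantics at once.

One is the usual kind, which makes an independent copy; the other is destructive.

For vector<>::erase(), even for classes with a non-trivial assignment, in most cases the operation still can [and should] be specialized to use memcpy.

Take the example class above: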
class C{
public:
    C():str(0){}
    C(char const* src):str(new char[strlen(src)+1])
    {
        strcpy(str,src);
    }
    C(const C& c) :str(new char[strlen(c.str)+1])
    {
        strcpy(str,c.str);
    }
    ~C(){delete str;}
private:
    char * str;
};

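It is perfectly possible to destroy the object being erased and then use memcpy to shift the following objects forward to fill the gap.
The only case I can think of where a plain memcpy does not work is when the objects it points to contain pointers back to it. That is uncommon, but it cannot be ignored.

I doubt any current STL implementation does vector<>::erase() this well.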

孩皮妞野 2003-01-24

Well written; I'll read it closely another day.

孩皮妞野 2003-01-23

class C{
public:
    ...
    ~C(){delete[] str;} // correction
private:
    char * str;
};

孩皮妞野 2003-01-23

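At first glance I too found it bizarre that std::vector<?>::erase does not guarantee a call to the destructor of the erased object. I suspected an unreasonable VC++ implementation, so I ran the following experiment in C++Builder.
Here is my test code in C++Builder 6.0: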

#include <iostream>
#include <vector>

class C{
    int i;
public:
    C(int i=0) : i(i){}
    ~C(){ std::cout<<i<<" is destroyed"<<std::endl;}
    C& operator = (C const& rhs){
        /*if( &rhs == this) return *this;*/
        std::cout<<"assigning "<< rhs.i<<" to "<<i<<std::endl;
        i = rhs.i;
        return *this;
    }
};

/*======================================================================*/

#pragma argsused
int main(int argc, char* argv[])
{
    std::vector<C> vc;
    for(unsigned i=0; i<5; i++){
        vc.push_back(C(i));
    }
    std::cout<<"***************************"<<std::endl;
    vc.erase(vc.begin()+1);
    std::cout<<"***************************"<<std::endl;

    return 0;
}
Output:
0 is destroyed
0 is destroyed
1 is destroyed
0 is destroyed
1 is destroyed
2 is destroyed
3 is destroyed
0 is destroyed
1 is destroyed
2 is destroyed
3 is destroyed
4 is destroyed
***************************
assigning 2 to 1
assigning 3 to 2
assigning 4 to 3
4 is destroyed
***************************

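Only after seeing the results did I realize that implementing erase this way is entirely reasonable. So how do we make sure the erased object is cleaned up correctly? Right: operator=.

I recall reading somewhere that a class with a default ctor, copy ctor, and operator= is called a "nice" class. Now I see why.

So for a class whose objects will be stored in a vector, if it owns resources that must be released, a destructor is not enough; you must also implement a matching operator=.

This also shows that vector::erase is fairly expensive. The equivalent C program would never be written this way. It seems vector::erase ought to treat element types differently: for most types, destroying the current object and memcpy-ing the rest forward is sufficient.

For efficiency, I feel there is a need for a destructive assignment: an "=" after which, semantically, rhs is destroyed as well (no longer valid, not to be referenced again, and not to be destructed again).

In the latest issue of CUJ, Andrei Alexandrescu has an article on a similar subject (Generic<Programming>: Move Constructors, http://www.cuj.com/experts/2102/alexandr.htm?topic=experts), but it is still not ideal. It would be best to meet this need at the language level.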

孩皮妞野 2003-01-23

Here is the original text of the question:
Q. Hi Bobby,

I wonder if you could shed some light as to why vector::erase is not required to call the destructor of the item being erased? I have recently run into a resource leak caused by this problem. I should mention that I am speaking to a vector<CWidget> and not to a vector<CWidget *>.

The problem I had was a widget with a member variable that was a string class. The string class dynamically allocates memory as needed, and the memory is cleaned up in the destructor. If my Widget destructor had been called on vector::erase, the resource would not have leaked!

This bug was “discovered” after untold hours of trying to fix a production system! Our current strategy is to avoid vectors whenever an erase may be called or to use vectors of pointers and take control of the destruction ourselves.

I have enclosed some sample code that demonstrates the problem. By the way, I am using Visual C++ 6 with the latest service packs.

Thanks for your time.

— Ron Jacobs (avid CUJ reader)

Here is the sample code:
Listing 1: vector erasure
#include <stdio.h>
#include <vector>
using namespace std;

bool tracing = false;

class X
{
public:
    X(int value) : value(value)
    {
    }
    X &operator=(X const &that)
    {
        if (tracing)
            printf("assign %d over %d\n", that.value, value);
        value = that.value;
        return *this;
    }
    ~X()
    {
        if (tracing)
            printf("destruct %d\n", value);
    }
private:
    int value;
};

int main()
{
    vector<X> c;
    c.push_back(1);
    c.push_back(2);
    c.push_back(3);
    printf("during erase...\n");
    tracing = true;
    c.erase(c.begin());
    printf("\nduring container destruction...\n");
}

Here is the answer; no need to rush to read it:
You’ve run into a distressingly common misconception of how vectors work. If it’s any consolation, the problem has nothing to do with Visual C++.

What you are seeing is correct vector behavior, or at least allowable vector behavior. When you erase a single element at index I from a vector of length N, the C++ Standard guarantees that:

1. The element type’s destructor will be called once.
2. The element type’s assignment operator will be called N-(I+1) times, that is, once for every element following the one being erased.

The Standard does not say which elements will have the destructor and assignment operator called upon them. You are assuming that when you erase the element at a given index I (where I is 0 in your example), that precise element will have its destructor called. The Standard makes no such guarantee; indeed, based on how vectors are implemented, the Standard couldn’t make this guarantee.

Remember, a vector is contiguous from its first element. When you erase an element, any trailing elements are shifted toward the front to fill the hole. The final element slot in the vector is thus rendered logically empty and can be destroyed.

The program in Listing 1 erases the first element of a three-element vector. When I run that program on several compilers, I see this event sequence:

1. The second element is copied over the first.
2. The third element is copied over the second.
3. The third element is destroyed.

Compare that sequence to what you want:

1. The first element is destroyed.
2. The second element is copied over the first.
3. The third element is copied over the second.

When it comes time for the vector itself to be destroyed, you are left with three elements: the two “good” ones and the third unused one. That makes for four total element-destructor calls: one during the erasure, plus three during the vector destruction. If you followed this pattern to its conclusion by continually erasing the front element, you’d end up with six destructor calls to purge a three-element vector.

To get the behavior you want, you should try a list. As an experiment, substitute list for vector in my test program and then compare the results.
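
The suggested experiment can be sketched as below. The class X follows Listing 1, but instead of printf it records events in a log so that vector and list can be compared side by side. The names g_log and trace_erase_front are invented for this sketch, and std::to_string assumes C++11. On common implementations the vector run records the two assignments followed by the destruction of the last element, while the list run records only the destruction of the first element; the Standard itself fixes only the counts, as the answer explains.

```cpp
#include <list>
#include <string>
#include <vector>

// Event log shared by all X objects; null means "not tracing".
static std::vector<std::string>* g_log = nullptr;

class X {
public:
    X(int value) : value(value) {}
    X(const X& that) : value(that.value) {}
    X& operator=(const X& that) {
        if (g_log) {
            g_log->push_back("assign " + std::to_string(that.value) +
                             " over " + std::to_string(value));
        }
        value = that.value;
        return *this;
    }
    ~X() {
        if (g_log) g_log->push_back("destruct " + std::to_string(value));
    }
private:
    int value;
};

// Hypothetical helper: erase the front element of a three-element container
// and return only the events observed during the erase itself.
template <typename Container>
std::vector<std::string> trace_erase_front() {
    Container c;
    c.push_back(1);
    c.push_back(2);
    c.push_back(3);
    std::vector<std::string> events;
    g_log = &events;      // start tracing
    c.erase(c.begin());
    g_log = nullptr;      // stop tracing before c itself is destroyed
    return events;
}
```

Calling trace_erase_front<std::vector<X> >() and trace_erase_front<std::list<X> >() and comparing the two logs shows the difference the answer describes.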