Counting the 1 bits in the binary representation of an integer

expert2000 2005-03-31 04:40:47

I'm reading a NASM assembly book, and there is a piece of code on page 63 that I can't follow.

int count_bits(unsigned int x)
{
    static unsigned int mask[] = { 0x55555555,
                                   0x33333333,
                                   0x0F0F0F0F,
                                   0x00FF00FF,
                                   0x0000FFFF };
    int i;
    int shift; /* number of positions to shift to right */

    for (i = 0, shift = 1; i < 5; i++, shift = 2)
        x = (x & mask[i]) + ((x >> shift) & mask[i]);
    return x;
}

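The code is meant to count the number of 1s in the binary representation of an integer.
Could someone explain how this produces the correct result? What is the principle behind it?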

junguo 2005-04-01

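The program the original poster gave also contains an error:
for ( i=0, shift =1; i < 5; i++, shift = 2 )
x = (x & mask[i ]) + ( ( x >> shift) & mask[i ] );

shift = 2 should be shift *= 2; so that the shift doubles on each pass: 1, 2, 4, 8, 16.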
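
With that correction applied, the function could look like the sketch below. The name count_bits_fixed and the use of uint32_t are my own choices for illustration (the book's listing uses unsigned int and assumes it is 32 bits wide).

```c
#include <stdint.h>

/* Counts the 1 bits in a 32-bit value with the five masks from the post.
   Assumption: shift is doubled on every pass (shift *= 2). */
int count_bits_fixed(uint32_t x)
{
    static const uint32_t mask[] = { 0x55555555u, 0x33333333u, 0x0F0F0F0Fu,
                                     0x00FF00FFu, 0x0000FFFFu };
    int i;
    int shift; /* width of the fields being merged: 1, 2, 4, 8, 16 */

    for (i = 0, shift = 1; i < 5; i++, shift *= 2)
        x = (x & mask[i]) + ((x >> shift) & mask[i]);
    return (int)x;
}
```

Calling count_bits_fixed(0x98a5b1a3) should give 15, which matches counting the 1s of that value by hand.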

junguo 2005-04-01

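Once everything is written in binary, it becomes much easier to understand!

First, take some number, say 1100 0100 0010 0001, and count the 1s in it. (I use 16 bits instead of 32 to keep the analysis simple.)

In the first pass the mask is 0x55555555, which for our 16 bits is 0101 0101 0101 0101, so this step computes:

  x          1100 0100 0010 0001        x >> 1     0110 0010 0001 0000
  & mask     0101 0101 0101 0101        & mask     0101 0101 0101 0101
             -------------------                   -------------------
             0100 0100 0000 0001                   0100 0000 0001 0000

  0100 0100 0000 0001 + 0100 0000 0001 0000 = 1000 0100 0001 0001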

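For this first step, think in units of 4 bits. Bits 1-2, 5-6, 9-10 and 13-14 (counting from the right, starting at 1) now hold the number of 1s among the low two bits of their 4-bit unit. A 4-bit example makes this clear. The low two bits can be 01, 10 or 11, that is 0001, 0010 or 0011:

  x       x & 0101    (x >> 1) & 0101    sum
  0001    0001        0000               0001
  0010    0000        0001               0001
  0011    0001        0001               0010

So the purpose of this step is: the first AND records whether the first bit is 1; after the right shift, the second AND records whether the second bit is 1; adding the two gives the number of 1s in the first two bits. The table shows three cases but only two results, 1 or 2 (0010); when both bits are 0 the result is of course 0. In the same way, bits 3 and 4 now hold the number of 1s in bits 3 and 4 of the original.

This gives the sequence 10 00 01 00 00 01 00 01, where each pair of bits is the count of 1s in that pair of the original number. What remains is to add these counts together.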

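The second pass computes the sum within each group of 4 bits.
Look again:

  x          10 00 01 00 00 01 00 01        x >> 2     00 10 00 01 00 00 01 00
  & mask     00 11 00 11 00 11 00 11        & mask     00 11 00 11 00 11 00 11
             -----------------------                   -----------------------
             00 00 00 00 00 01 00 01                   00 10 00 01 00 00 00 00

  sum = 00 10 00 01 00 01 00 01

Again working in units of 4 bits: the first operand ANDs with 0011, keeping the low pair and clearing the high pair; the second operand first shifts right by two so that the high pair lands in the low position, then keeps it with the same mask. Because the high pair and the low pair each hold their own count of 1s, after the addition each 4-bit group holds the number of 1s it originally contained.
So now we have the sequence 0010 0001 0001 0001. Each 4-bit group is the number of 1s in the corresponding 4 bits of the original number. The next step adds these four numbers together; the analysis is the same as above.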
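
To follow the 16-bit walkthrough on a machine, here is a minimal sketch. The function count_bits16_steps and its steps parameter are hypothetical, introduced only to expose the intermediate values; the 16-bit masks are the 32-bit ones cut in half.

```c
#include <stdint.h>

/* Hypothetical helper: runs the first `steps` passes (0..4) on a 16-bit
   value and returns the intermediate result. */
uint16_t count_bits16_steps(uint16_t x, int steps)
{
    static const uint16_t mask[] = { 0x5555u, 0x3333u, 0x0F0Fu, 0x00FFu };
    int i;
    int shift;

    for (i = 0, shift = 1; i < steps && i < 4; i++, shift *= 2)
        x = (uint16_t)((x & mask[i]) + ((x >> shift) & mask[i]));
    return x;
}
```

For the example 1100 0100 0010 0001 (0xC421), one pass should give 0x8411 (1000 0100 0001 0001), two passes 0x2111 (0010 0001 0001 0001), and all four passes 5.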

寻开心 2005-03-31

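Explained this way it sounds complicated; once you construct an example it is easy to understand.

For what liixixi() described, the tables are constructed like this:

Table 1: 1 3 5 7
Table 2: 2 3 6 7
Table 3: 4 5 6 7
Table 4: 8 .....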

arrowcy 2005-03-31

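When I saw this sentence:
There is yet another clever method of counting the bits that are on in data.
I thought there was a new method, but it turns out to be the same one as in the original poster's program. This text explains it quite thoroughly.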

pcboyxhy 2005-03-31

However it is dressed up, it comes down to the same thing:
the most basic kind of numerical algorithm.

lalalalala 2005-03-31

That program simply splits a series of operations into 5 similar stages and carries them out in a loop.

expert2000 2005-03-31

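Sigh. My skills are weak and my English is poor, so studying is exhausting.

Luckily there are experts who contribute selflessly and give me hope and confidence.

Well, the sun has gone down; I'll come back tomorrow to close the thread. ^!^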

expert2000 2005-03-31

There is yet another clever method of counting the bits that are on in
data. This method literally adds the one’s and zero’s of the data together.
This sum must equal the number of one’s in the data. For example, consider
counting the one’s in a byte stored in a variable named data. The first step
is to perform the following operation:
data = (data & 0x55) + ((data >> 1) & 0x55);
What does this do? The hex constant 0x55 is 01010101 in binary. In the
first operand of the addition, data is AND’ed with this, bits at the odd
bit positions are pulled out. The second operand ((data >> 1) & 0x55)
first moves all the bits at the even positions to an odd position and uses
the same mask to pull out these same bits. Now, the first operand contains
the odd bits and the second operand the even bits of data. When these
two operands are added together, the even and odd bits of data are added
together. For example, if data is 10110011, then:
      data & 01010101             00 01 00 01
    + (data >> 1) & 01010101    + 01 01 00 01
                                  -----------
                                  01 10 00 10

The addition on the right shows the actual bits added together. The
bits of the byte are divided into four 2-bit fields to show that actually there
are four independent additions being performed. Since the most these sums
can be is two, there is no possibility that the sum will overflow its field and
corrupt one of the other field’s sums.
Of course, the total number of bits have not been computed yet. However,
the same technique that was used above can be used to compute the
total in a series of similar steps. The next step would be:
data = (data & 0x33) + ((data >> 2) & 0x33);
Continuing the above example (remember that data now is 01100010):
      data & 00110011             0010 0010
    + (data >> 2) & 00110011    + 0001 0000
                                  ---------
                                  0011 0010

Now there are two 4-bit fields to that are independently added.
The next step is to add these two bit sums together to form the final
result:
data = (data & 0x0F) + ((data >> 4) & 0x0F);
Using the example above (with data equal to 00110010):
      data & 00001111             00000010
    + (data >> 4) & 00001111    + 00000011
                                  --------
                                  00000101

Now data is 5 which is the correct result.
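
The three byte-sized steps quoted above can be put together as a small function. This is only a sketch: the name count_bits8 is mine, not the book's.

```c
#include <stdint.h>

/* Counts the 1 bits in a byte using the three steps from the quoted text. */
uint8_t count_bits8(uint8_t data)
{
    data = (uint8_t)((data & 0x55) + ((data >> 1) & 0x55)); /* four 2-bit sums */
    data = (uint8_t)((data & 0x33) + ((data >> 2) & 0x33)); /* two 4-bit sums  */
    data = (uint8_t)((data & 0x0F) + ((data >> 4) & 0x0F)); /* one 8-bit sum   */
    return data;
}
```

With the example byte 10110011 (0xB3), count_bits8 should return 5, the same value the text arrives at.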
The above is the original text from the book; below is the book's own explanation of the function code:
An implementation
of this method that counts the bits in a double word. It uses a for
loop to compute the sum. It would be faster to unroll the loop; however, the
loop makes it clearer how the method generalizes to different sizes of data.
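
The quoted remark about unrolling the loop could be realized roughly as follows. This is a sketch under my own naming (count_bits_unrolled) and with uint32_t standing in for the double word; the book itself does not show this version in the text quoted here.

```c
#include <stdint.h>

/* Same five steps as the loop version, written out one after another. */
int count_bits_unrolled(uint32_t x)
{
    x = (x & 0x55555555u) + ((x >> 1)  & 0x55555555u); /* 16 two-bit sums  */
    x = (x & 0x33333333u) + ((x >> 2)  & 0x33333333u); /* 8 four-bit sums  */
    x = (x & 0x0F0F0F0Fu) + ((x >> 4)  & 0x0F0F0F0Fu); /* 4 eight-bit sums */
    x = (x & 0x00FF00FFu) + ((x >> 8)  & 0x00FF00FFu); /* 2 sixteen-bit sums */
    x = (x & 0x0000FFFFu) + ((x >> 16) & 0x0000FFFFu); /* final 32-bit sum */
    return (int)x;
}
```

Whether this is actually faster than the loop depends on the compiler, which may well unroll a fixed five-pass loop by itself.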

expert2000 2005-03-31

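The guess by arrowcy (长弓手) is correct, and the explanation by happy__888 ([顾问团]寻开心) is even better. I wonder how anyone came up with this method in the first place, especially those constants 0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF and 0x0000FFFF.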
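
One way to see where these constants come from: the mask for pass i is a run of 2^i ones followed by 2^i zeros, repeated across the word. The sketch below builds them from that rule; make_mask is a hypothetical helper of mine, not something from the book.

```c
#include <stdint.h>

/* Hypothetical helper: builds the mask for pass i (0..4) of a 32-bit count.
   The pattern is `width` ones, then `width` zeros, starting at bit 0,
   where width = 2^i. */
uint32_t make_mask(int i)
{
    uint32_t width = (uint32_t)1 << i;            /* 1, 2, 4, 8, 16 */
    uint32_t run = ((uint32_t)1 << width) - 1u;   /* `width` ones   */
    uint32_t m = 0;
    uint32_t pos;

    for (pos = 0; pos < 32; pos += 2 * width)
        m |= run << pos;
    return m;
}
```

For i = 0 through 4 this should reproduce the five constants in the listing, in order.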

arrowcy 2005-03-31

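Continuing in the same way: after the third pass, if you take x in units of 8 bits, the four "8-bit numbers" add up to the number of 1s; after the fourth pass, the two "16-bit numbers" add up to the number of 1s; and after the last pass the single "32-bit number" is the number of 1s, so at that point no further addition is needed.
That's how it works.
I don't know whether the original poster can follow what I'm saying; this is rather hard to put clearly.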
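
The claim that the fields always add up to the number of 1s can be checked mechanically. In this sketch run_steps and sum_fields are hypothetical helpers written for the check; they are not part of the original program.

```c
#include <stdint.h>

/* Hypothetical helper: runs the first `steps` passes (0..5) of the loop. */
uint32_t run_steps(uint32_t x, int steps)
{
    static const uint32_t mask[] = { 0x55555555u, 0x33333333u, 0x0F0F0F0Fu,
                                     0x00FF00FFu, 0x0000FFFFu };
    int i;
    int shift;

    for (i = 0, shift = 1; i < steps && i < 5; i++, shift *= 2)
        x = (x & mask[i]) + ((x >> shift) & mask[i]);
    return x;
}

/* Hypothetical helper: cuts x into fields of `width` bits
   (1, 2, 4, 8, 16 or 32) and adds the fields up as ordinary numbers. */
unsigned sum_fields(uint32_t x, int width)
{
    uint32_t field_mask = (width >= 32) ? 0xFFFFFFFFu
                                        : (((uint32_t)1 << width) - 1u);
    unsigned total = 0;
    int pos;

    for (pos = 0; pos < 32; pos += width)
        total += (unsigned)((x >> pos) & field_mask);
    return total;
}
```

After k passes the fields are 2^k bits wide, so sum_fields(run_steps(x, k), 1 << k) should be the same for every k from 0 to 5; for x = 0x98a5b1a3 that value is 15.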

arrowcy 2005-03-31

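At the start of the second pass, every two bits of x represent one number, and adding up all those numbers gives the total count of 1s.
So the second pass ANDs x with a mask of the form 00110011... (0x33333333). The effect is that the first "2-bit number" of x is added to the second, the third to the fourth, and so on.
At the end of this pass, x is made up of 4-bit units. Read each 4-bit unit as a decimal number and you get 8 numbers in total; those 8 numbers add up to the count of 1s.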

arrowcy 2005-03-31

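In the first pass, after x is ANDed with mask[0], every odd-numbered bit of x (counting from 1 at the lowest bit) that is 1 leaves a 1 in the same position of the result.
After x is shifted right by one and ANDed with mask[0], every even-numbered bit of x that is 1 leaves a 1 in the odd position just below it.
Adding these two results gives a value in which, starting from the lowest bit, each pair of bits tells you how many 1s the original x had in that pair.
One 1 gives 01, two 1s give 10, and none gives 00.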
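
Restricted to a single pair of bits, the first pass reduces to the tiny function below. It is only an illustration; pair_count is a name I made up.

```c
/* Applies the first pass to one 2-bit field (values 0..3):
   low bit plus high bit, i.e. the number of 1s in the pair. */
unsigned pair_count(unsigned two_bits)
{
    return (two_bits & 1u) + ((two_bits >> 1) & 1u);
}
```

The four possible inputs 00, 01, 10 and 11 map to 0, 1, 1 and 2. The largest result is 2, which still fits in two bits, so one pair can never carry into its neighbour.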

寻开心 2005-03-31

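to liixixi() ( )
This problem has nothing to do with the tables you mentioned. Those tables are simple: the first table stands for 1, the second for 2, the third for 4 and the fourth for 8, which covers 16 numbers (five tables cover 1 to 31; a number in none of them is 0).
Decompose each number from 1 to 15 into a sum of 1, 2, 4 and 8.
After decomposing, put the number into every table whose value appears in the sum.
For example 6 = 2 + 4 goes into the second and third tables, and 7 = 1 + 2 + 4 goes into the first, second and third tables.
That way, as soon as you are told which tables a number appears in, you can work it out by doing the addition in reverse.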
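
The table trick is just the binary expansion of the number, as the following sketch shows. Both function names are hypothetical, and table indices here start at 0 for the table that stands for 1.

```c
/* Returns 1 if n belongs in table k, where table 0 stands for 1,
   table 1 for 2, table 2 for 4 and table 3 for 8. */
int in_table(unsigned n, int k)
{
    return (int)((n >> k) & 1u);
}

/* Recovers the number from four yes/no answers (1 = "it is in that table"). */
unsigned recover_from_tables(int t1, int t2, int t4, int t8)
{
    return (t1 ? 1u : 0u) + (t2 ? 2u : 0u) + (t4 ? 4u : 0u) + (t8 ? 8u : 0u);
}
```

For instance 6 is in the tables for 2 and 4 only, so recover_from_tables(0, 1, 1, 0) gives back 6.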

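The algorithm the original poster gave does not work like that.
How to put it: it is more like a divide-in-half approach.
Masking with 0x55555555 picks out the even bits of x (counting from bit 0).
Shifting right by one and masking with 0x55555555 picks out the odd bits of x.
Adding them gives the number of 1s in each pair of adjacent bits. Note: this addition can carry, but it never goes beyond 2 bits.

After this step the 32-bit x can be viewed as a structure made of 16 units of 2 bits each, where each 2-bit unit is the number of 1s in the corresponding pair of adjacent bits.

Then 0x33333333 selects exactly the 8 even-position units among those 16 2-bit units.
Shifting right by 2 and masking with 0x33333333 again selects the 8 odd-position units.
Adding them, understood the same way as above,
gives a structure of 8 units of 4 bits each, where each unit is the number of 1s in the corresponding 4 bits of the original x.
....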

arrowcy 2005-03-31

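Pass 1: the last two bits of the resulting x give the number of 1s in the last two bits of the original x (this is easy to see).
Pass 2: the last four bits of the resulting x give the number of 1s in the last four bits of the original x (a guess).
...
Pass n: the last 2^n bits of the resulting x give the number of 1s in the last 2^n bits of the original x (a guess).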
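
The guess can be tested against a naive bit count. This is a sketch with names of my own (naive_low_count, conjecture_holds); passing the test for some inputs is evidence, not a proof.

```c
#include <stdint.h>

/* Naive reference: counts the 1s among the lowest `nbits` bits of x. */
int naive_low_count(uint32_t x, int nbits)
{
    int i;
    int c = 0;

    for (i = 0; i < nbits; i++)
        c += (int)((x >> i) & 1u);
    return c;
}

/* Returns 1 if, after every pass n = 1..5, the low 2^n bits of x equal
   the number of 1s in the low 2^n bits of the original value. */
int conjecture_holds(uint32_t x)
{
    static const uint32_t mask[] = { 0x55555555u, 0x33333333u, 0x0F0F0F0Fu,
                                     0x00FF00FFu, 0x0000FFFFu };
    uint32_t orig = x;
    int n;
    int shift;

    for (n = 1, shift = 1; n <= 5; n++, shift *= 2) {
        int width = 2 * shift; /* 2^n */
        uint32_t low;

        x = (x & mask[n - 1]) + ((x >> shift) & mask[n - 1]);
        low = (width == 32) ? x : (x & (((uint32_t)1 << width) - 1u));
        if ((int)low != naive_low_count(orig, width))
            return 0;
    }
    return 1;
}
```

Looping conjecture_holds over a range of inputs is a cheap way to gain confidence in the pattern before working through the argument by hand.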

arrowcy 2005-03-31

There was a small problem inside that for loop; change its body to this:
show_bin(x);
printf("&");
show_bin(mask[i]);
printf("=");
show_bin(x&mask[i]);
show_bin(x>>shift);
printf("&");
show_bin(mask[i]);
printf("=");
show_bin((x>>shift)&mask[i]);
x = (x & mask[i ]) + ( ( x >> shift) & mask[i ] );
printf("sum=");
show_bin(x);
printf("-----------------------------------------\n");

expert2000 2005-03-31

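to healer_kx(天降甘草)
I can see the mathematical pattern, but I don't know what it is useful for.

to liixixi()
What I want to ask now is: how did he work out my birthday from 5 slips of paper?
And you learned binary in elementary school???

to arrowcy(长弓手)
Yes, it should indeed be shift *= 2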

arrowcy 2005-03-31

This is rather hard to analyze; the following program may help with that:
#include <stdio.h>
#include <stdlib.h>

/* Prints x in binary (right-aligned in 32 columns) and in hex. */
void show_bin(unsigned int x)
{
    char tt[33];
    _itoa(x, tt, 2); /* _itoa is a Microsoft C runtime extension, not standard C */
    printf("\t%32s\t0x%8X\n", tt, x);
}

int count_bits(unsigned int x)
{
    static unsigned int mask[] = { 0x55555555,
                                   0x33333333,
                                   0x0F0F0F0F,
                                   0x00FF00FF,
                                   0x0000FFFF };
    int i;
    int shift;

    for (i = 0, shift = 1; i < 5; i++, shift *= 2)
    {
        show_bin(x);
        printf("&");
        show_bin(mask[i]);
        printf("=");
        show_bin(x & mask[i]);
        show_bin(x);
        printf("&");
        show_bin(x >> shift);
        printf("=");
        show_bin((x >> shift) & mask[i]);
        x = (x & mask[i]) + ((x >> shift) & mask[i]);
        printf("sum=");
        show_bin(x);
        printf("-----------------------------------------\n");
    }
    return x;
}

int main(void)
{
    unsigned int a = 0x98a5b1a3;
    printf("%d\n", count_bits(a));
    return 0;
}

lalalalala 2005-03-31