
pinkfIoyd 2013-03-08 02:27:12

#include <assert.h>
#include <stddef.h>

void *mymemcpy(void *dest, const void *src, int size)
{
    char *dptr = (char *)dest;
    const char *sptr = (const char *)src;
    int i = 0;

    /* Both pointers must be non-NULL. Overlapping regions are not handled. */
    assert(dptr != NULL && sptr != NULL);
    /* Copy one byte at a time, from low addresses to high addresses. */
    for (i = 0; i < size; i++)
        *dptr++ = *sptr++;
    return dest;
}


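[Account deleted] 2013-03-08
Byte-by-byte copying is simple, but it is inefficient. Take a 32-bit machine as an example: first align the memory to a 4-byte boundary, then copy in dwords (4 bytes at a time), and use single-byte copies only for whatever is left over. That is the fastest approach.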
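A minimal sketch of that idea in C, assuming non-overlapping buffers. The name copy_aligned4 is made up for illustration and is not a library function. It copies leading bytes until the destination is 4-byte aligned, then 4 bytes per iteration, then the trailing bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: forward copy that aligns dest to 4 bytes first.
   Assumes the two buffers do not overlap. */
void *copy_aligned4(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert(dest != NULL && src != NULL);

    /* 1. Leading bytes, until dest sits on a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }
    /* 2. Bulk copy, 4 bytes per iteration. The source may still be
          unaligned, so the word is moved with fixed-size memcpy calls,
          which compilers typically reduce to plain 32-bit moves. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 4;
        s += 4;
        n -= 4;
    }
    /* 3. Trailing bytes (at most 3). */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dest;
}
```

Only the destination is aligned here. Whether this actually beats a plain byte loop depends on the compiler and the CPU, so measure before relying on it.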
赵4老师 2013-03-08
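Quoting Mr_warm's reply (#19):
Do I have to install VS2010? I can't follow this much assembly! What does this code mean?
Computer organization → DOS commands → assembly language → C (not including C++), coding conventions → data structures, compiler principles, operating systems → computer networks, database principles, regular expressions → other languages (including C++), architecture…… Advice for anyone learning to program: Reading something a thousand times is no match for doing it once by hand! Reading a thousand lines in a book is no match for typing one line yourself! Typing a thousand lines is no match for single-stepping through one line! Single-stepping through a thousand lines of source is no match for single-stepping through one line of the corresponding assembly!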
pinkfIoyd 2013-03-08
Quoting mujiok2003's reply (#22):
Because memcpy does not handle overlapping memory. If you need to handle that case, use the other function, memmove.
mujiok2003 2013-03-08
Quoting Mr_warm's reply (#15):
Because memcpy does not handle overlapping memory. If you need to handle that case, use the other function, memmove.
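A small illustration of a case where memmove is the right call. The helper name shift_right_by_one is made up for this example. Source and destination overlap, which is exactly the situation memcpy leaves undefined.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: shift the first n bytes of buf right by one.
   The caller must make sure buf has room for at least n + 1 bytes. */
void shift_right_by_one(char *buf, size_t n)
{
    assert(buf != NULL);
    /* Source [buf, buf + n) and destination [buf + 1, buf + 1 + n)
       overlap, so memmove is required. Calling memcpy here would be
       undefined behavior. */
    memmove(buf + 1, buf, n);
}
```

For example, with char s[] = "abcdef", calling shift_right_by_one(s, 5) leaves s holding "aabcde".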
pinkfIoyd 2013-03-08
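Quoting bravery36's reply (#20):
I'm not sure about gcc, but the MSDN page for VC++ states it clearly: If the source and destination overlap, the behavior of memcpy is undefined. Use memmove to handle overlapping regions. In other words, memcpy does not consider overlap; that is left to the programmer. The advantage of this is, again, ……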
bravery36 2013-03-08
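I'm not sure about gcc, but the MSDN page for VC++ states it clearly: If the source and destination overlap, the behavior of memcpy is undefined. Use memmove to handle overlapping regions. In other words, memcpy does not consider overlap; that is left to the programmer. The advantage of this is, again, efficiency.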
pinkfIoyd 2013-03-08
Quoting zhao4zhong1's reply (#16):
First go to http://www.microsoft.com/visualstudio/chs/downloads#d-2010-express, expand Visual C++ 2010 Express, choose 'Simplified Chinese' as the language, and click Install Now. Then refer to C:\Program Files\Microsoft Visual Studio 10.0\VC\crt\src\intel\me……
赵4老师 2013-03-08
; Copy down to avoid propagation in overlapping buffers.
        align   @WordSize
        lea     esi,[esi+ecx-4] ;U - point to 4 bytes before src buffer end
        lea     edi,[edi+ecx-4] ;V - point to 4 bytes before dest buffer end
; See if the destination start is dword aligned

        test    edi,11b         ;U - test if dword aligned
        jnz     short CopyLeadDown ;V - if not, jump

        shr     ecx,2           ;U - shift down to dword count
        and     edx,11b         ;V - trailing byte count

        cmp     ecx,8           ;U - test if small enough for unwind copy
        jb      short CopyUnwindDown ;V - if so, then jump

        std                     ;N - set direction flag
        rep     movsd           ;N - move all of our dwords
        cld                     ;N - clear direction flag back

        jmp     dword ptr TrailDownVec[edx*4] ;N - process trailing bytes

        align   @WordSize
        neg     ecx             ;U - negate dword count for table merging
                                ;V - spare

        jmp     dword ptr UnwindDownVec[ecx*4+28] ;N - unwind copy

        align   @WordSize

        mov     eax,edi         ;U - get destination offset
        mov     edx,11b         ;V - prepare for mask

        cmp     ecx,4           ;U - check for really short string
        jb      short ByteCopyDown ;V - branch to just copy bytes

        and     eax,11b         ;U - get offset within first dword
        sub     ecx,eax         ;U - to update size after lead copied

        jmp     dword ptr LeadDownVec[eax*4-4] ;N - process leading bytes

        align   @WordSize
        jmp     dword ptr TrailDownVec[ecx*4] ;N - process just bytes

        align   @WordSize
LeadDownVec     dd      LeadDown1, LeadDown2, LeadDown3

        align   @WordSize
        mov     al,[esi+3]      ;U - load first byte
        and     edx,ecx         ;V - trailing byte count

        mov     [edi+3],al      ;U - write out first byte
        sub     esi,1           ;V - point to last src dword

        shr     ecx,2           ;U - shift down to dword count
        sub     edi,1           ;V - point to last dest dword

        cmp     ecx,8           ;U - test if small enough for unwind copy
        jb      short CopyUnwindDown ;V - if so, then jump

        std                     ;N - set direction flag
        rep     movsd           ;N - move all of our dwords
        cld                     ;N - clear direction flag

        jmp     dword ptr TrailDownVec[edx*4] ;N - process trailing bytes

        align   @WordSize
        mov     al,[esi+3]      ;U - load first byte
        and     edx,ecx         ;V - trailing byte count

        mov     [edi+3],al      ;U - write out first byte
        mov     al,[esi+2]      ;V - get second byte from source

        shr     ecx,2           ;U - shift down to dword count
        mov     [edi+2],al      ;V - write second byte to destination

        sub     esi,2           ;U - point to last src dword
        sub     edi,2           ;V - point to last dest dword

        cmp     ecx,8           ;U - test if small enough for unwind copy
        jb      short CopyUnwindDown ;V - if so, then jump

        std                     ;N - set direction flag
        rep     movsd           ;N - move all of our dwords
        cld                     ;N - clear direction flag

        jmp     dword ptr TrailDownVec[edx*4] ;N - process trailing bytes

        align   @WordSize
        mov     al,[esi+3]      ;U - load first byte
        and     edx,ecx         ;V - trailing byte count

        mov     [edi+3],al      ;U - write out first byte
        mov     al,[esi+2]      ;V - get second byte from source

        mov     [edi+2],al      ;U - write second byte to destination
        mov     al,[esi+1]      ;V - get third byte from source

        shr     ecx,2           ;U - shift down to dword count
        mov     [edi+1],al      ;V - write third byte to destination

        sub     esi,3           ;U - point to last src dword
        sub     edi,3           ;V - point to last dest dword

        cmp     ecx,8           ;U - test if small enough for unwind copy
        jb      CopyUnwindDown  ;V - if so, then jump

        std                     ;N - set direction flag
        rep     movsd           ;N - move all of our dwords
        cld                     ;N - clear direction flag

        jmp     dword ptr TrailDownVec[edx*4] ;N - process trailing bytes


        align   @WordSize
UnwindDownVec   dd      UnwindDown7, UnwindDown6, UnwindDown5, UnwindDown4
                dd      UnwindDown3, UnwindDown2, UnwindDown1, UnwindDown0

        mov     eax,[esi+ecx*4+28] ;U - get dword from source
                                   ;V - spare
        mov     [edi+ecx*4+28],eax ;U - put dword into destination
        mov     eax,[esi+ecx*4+24] ;U(entry)/V(not) - get dword from source
                                   ;V(entry) - spare
        mov     [edi+ecx*4+24],eax ;U - put dword into destination
        mov     eax,[esi+ecx*4+20] ;U(entry)/V(not) - get dword from source
                                   ;V(entry) - spare
        mov     [edi+ecx*4+20],eax ;U - put dword into destination
        mov     eax,[esi+ecx*4+16] ;U(entry)/V(not) - get dword from source
                                   ;V(entry) - spare
        mov     [edi+ecx*4+16],eax ;U - put dword into destination
        mov     eax,[esi+ecx*4+12] ;U(entry)/V(not) - get dword from source
                                   ;V(entry) - spare
        mov     [edi+ecx*4+12],eax ;U - put dword into destination
        mov     eax,[esi+ecx*4+8] ;U(entry)/V(not) - get dword from source
                                   ;V(entry) - spare
        mov     [edi+ecx*4+8],eax ;U - put dword into destination
        mov     eax,[esi+ecx*4+4] ;U(entry)/V(not) - get dword from source
                                  ;V(entry) - spare
        mov     [edi+ecx*4+4],eax ;U - put dword into destination

        lea     eax,[ecx*4]     ;V - compute update for pointer

        add     esi,eax         ;U - update source pointer
        add     edi,eax         ;V - update destination pointer
        jmp     dword ptr TrailDownVec[edx*4] ;N - process trailing bytes


        align   @WordSize
TrailDownVec    dd      TrailDown0, TrailDown1, TrailDown2, TrailDown3

        align   @WordSize
        mov     eax,[dst]       ;U - return pointer to destination
                                ;V - spare
        pop     esi             ;U - restore esi
        pop     edi             ;V - restore edi

        align   @WordSize
        mov     al,[esi+3]      ;U - get byte from source
                                ;V - spare
        mov     [edi+3],al      ;U - put byte in destination
        mov     eax,[dst]       ;V - return pointer to destination
        pop     esi             ;U - restore esi
        pop     edi             ;V - restore edi

        align   @WordSize
        mov     al,[esi+3]      ;U - get first byte from source
                                ;V - spare
        mov     [edi+3],al      ;U - put first byte into destination
        mov     al,[esi+2]      ;V - get second byte from source
        mov     [edi+2],al      ;U - put second byte into destination
        mov     eax,[dst]       ;V - return pointer to destination
        pop     esi             ;U - restore esi
        pop     edi             ;V - restore edi

        align   @WordSize
        mov     al,[esi+3]      ;U - get first byte from source
                                ;V - spare
        mov     [edi+3],al      ;U - put first byte into destination
        mov     al,[esi+2]      ;V - get second byte from source
        mov     [edi+2],al      ;U - put second byte into destination
        mov     al,[esi+1]      ;V - get third byte from source
        mov     [edi+1],al      ;U - put third byte into destination
        mov     eax,[dst]       ;V - return pointer to destination
        pop     esi             ;U - restore esi
        pop     edi             ;V - restore edi

_MEM_   endp

赵4老师 2013-03-08
; The algorithm for forward moves is to align the destination to a dword
; boundary and so we can move dwords with an aligned destination.  This
; occurs in 3 steps.
;   - move x = ((4 - Dest & 3) & 3) bytes
;   - move y = ((L-x) >> 2) dwords
;   - move (L - x - y*4) bytes

        test    edi,11b         ;U - destination dword aligned?
        jnz     short CopyLeadUp ;V - if we are not dword aligned already, align

        shr     ecx,2           ;U - shift down to dword count
        and     edx,11b         ;V - trailing byte count

        cmp     ecx,8           ;U - test if small enough for unwind copy
        jb      short CopyUnwindUp ;V - if so, then jump

        rep     movsd           ;N - move all of our dwords

        jmp     dword ptr TrailUpVec[edx*4] ;N - process trailing bytes

; Code to do optimal memory copies for non-dword-aligned destinations.

; The following length check is done for two reasons:
;    1. to ensure that the actual move length is greater than any possible
;       alignment move, and
;    2. to skip the multiple move logic for small moves where it would
;       be faster to move the bytes with one instruction.

        align   @WordSize

        mov     eax,edi         ;U - get destination offset
        mov     edx,11b         ;V - prepare for mask

        sub     ecx,4           ;U - check for really short string - sub for adjust
        jb      short ByteCopyUp ;V - branch to just copy bytes

        and     eax,11b         ;U - get offset within first dword
        add     ecx,eax         ;V - update size after leading bytes copied

        jmp     dword ptr LeadUpVec[eax*4-4] ;N - process leading bytes

        align   @WordSize
        jmp     dword ptr TrailUpVec[ecx*4+16] ;N - process just bytes

        align   @WordSize
        jmp     dword ptr UnwindUpVec[ecx*4] ;N - unwind dword copy

        align   @WordSize
LeadUpVec       dd      LeadUp1, LeadUp2, LeadUp3

        align   @WordSize
        and     edx,ecx         ;U - trailing byte count
        mov     al,[esi]        ;V - get first byte from source

        mov     [edi],al        ;U - write second byte to destination
        mov     al,[esi+1]      ;V - get second byte from source

        mov     [edi+1],al      ;U - write second byte to destination
        mov     al,[esi+2]      ;V - get third byte from source

        shr     ecx,2           ;U - shift down to dword count
        mov     [edi+2],al      ;V - write third byte to destination

        add     esi,3           ;U - advance source pointer
        add     edi,3           ;V - advance destination pointer

        cmp     ecx,8           ;U - test if small enough for unwind copy
        jb      short CopyUnwindUp ;V - if so, then jump

        rep     movsd           ;N - move all of our dwords

        jmp     dword ptr TrailUpVec[edx*4] ;N - process trailing bytes

        align   @WordSize
        and     edx,ecx         ;U - trailing byte count
        mov     al,[esi]        ;V - get first byte from source

        mov     [edi],al        ;U - write second byte to destination
        mov     al,[esi+1]      ;V - get second byte from source

        shr     ecx,2           ;U - shift down to dword count
        mov     [edi+1],al      ;V - write second byte to destination

        add     esi,2           ;U - advance source pointer
        add     edi,2           ;V - advance destination pointer

        cmp     ecx,8           ;U - test if small enough for unwind copy
        jb      short CopyUnwindUp ;V - if so, then jump

        rep     movsd           ;N - move all of our dwords

        jmp     dword ptr TrailUpVec[edx*4] ;N - process trailing bytes

        align   @WordSize
        and     edx,ecx         ;U - trailing byte count
        mov     al,[esi]        ;V - get first byte from source

        mov     [edi],al        ;U - write second byte to destination
        add     esi,1           ;V - advance source pointer

        shr     ecx,2           ;U - shift down to dword count
        add     edi,1           ;V - advance destination pointer

        cmp     ecx,8           ;U - test if small enough for unwind copy
        jb      short CopyUnwindUp ;V - if so, then jump

        rep     movsd           ;N - move all of our dwords

        jmp     dword ptr TrailUpVec[edx*4] ;N - process trailing bytes

        align   @WordSize
UnwindUpVec     dd      UnwindUp0, UnwindUp1, UnwindUp2, UnwindUp3
                dd      UnwindUp4, UnwindUp5, UnwindUp6, UnwindUp7

        mov     eax,[esi+ecx*4-28] ;U - get dword from source
                                   ;V - spare
        mov     [edi+ecx*4-28],eax ;U - put dword into destination
        mov     eax,[esi+ecx*4-24] ;U(entry)/V(not) - get dword from source
                                   ;V(entry) - spare
        mov     [edi+ecx*4-24],eax ;U - put dword into destination
        mov     eax,[esi+ecx*4-20] ;U(entry)/V(not) - get dword from source
                                   ;V(entry) - spare
        mov     [edi+ecx*4-20],eax ;U - put dword into destination
        mov     eax,[esi+ecx*4-16] ;U(entry)/V(not) - get dword from source
                                   ;V(entry) - spare
        mov     [edi+ecx*4-16],eax ;U - put dword into destination
        mov     eax,[esi+ecx*4-12] ;U(entry)/V(not) - get dword from source
                                   ;V(entry) - spare
        mov     [edi+ecx*4-12],eax ;U - put dword into destination
        mov     eax,[esi+ecx*4-8] ;U(entry)/V(not) - get dword from source
                                  ;V(entry) - spare
        mov     [edi+ecx*4-8],eax ;U - put dword into destination
        mov     eax,[esi+ecx*4-4] ;U(entry)/V(not) - get dword from source
                                  ;V(entry) - spare
        mov     [edi+ecx*4-4],eax ;U - put dword into destination

        lea     eax,[ecx*4]     ;V - compute update for pointer

        add     esi,eax         ;U - update source pointer
        add     edi,eax         ;V - update destination pointer
        jmp     dword ptr TrailUpVec[edx*4] ;N - process trailing bytes


        align   @WordSize
TrailUpVec      dd      TrailUp0, TrailUp1, TrailUp2, TrailUp3

        align   @WordSize
        mov     eax,[dst]       ;U - return pointer to destination
        pop     esi             ;V - restore esi
        pop     edi             ;U - restore edi
                                ;V - spare

        align   @WordSize
        mov     al,[esi]        ;U - get byte from source
                                ;V - spare
        mov     [edi],al        ;U - put byte in destination
        mov     eax,[dst]       ;V - return pointer to destination
        pop     esi             ;U - restore esi
        pop     edi             ;V - restore edi

        align   @WordSize
        mov     al,[esi]        ;U - get first byte from source
                                ;V - spare
        mov     [edi],al        ;U - put first byte into destination
        mov     al,[esi+1]      ;V - get second byte from source
        mov     [edi+1],al      ;U - put second byte into destination
        mov     eax,[dst]       ;V - return pointer to destination
        pop     esi             ;U - restore esi
        pop     edi             ;V - restore edi

        align   @WordSize
        mov     al,[esi]        ;U - get first byte from source
                                ;V - spare
        mov     [edi],al        ;U - put first byte into destination
        mov     al,[esi+1]      ;V - get second byte from source
        mov     [edi+1],al      ;U - put second byte into destination
        mov     al,[esi+2]      ;V - get third byte from source
        mov     [edi+2],al      ;U - put third byte into destination
        mov     eax,[dst]       ;V - return pointer to destination
        pop     esi             ;U - restore esi
        pop     edi             ;V - restore edi
赵4老师 2013-03-08
Go to http://www.microsoft.com/visualstudio/chs/downloads#d-2010-express, expand Visual C++ 2010 Express, choose 'Simplified Chinese' as the language, and click Install Now. Then refer to C:\Program Files\Microsoft Visual Studio 10.0\VC\crt\src\intel\memcpy.asm
       page    ,132
        title   memcpy - Copy source memory bytes to destination
;memcpy.asm - contains memcpy and memmove routines
;       Copyright (c) Microsoft Corporation. All rights reserved.
;       memcpy() copies a source memory buffer to a destination buffer.
;       Overlapping buffers are not treated specially, so propagation may occur.
;       memmove() copies a source memory buffer to a destination buffer.
;       Overlapping buffers are treated specially, to avoid propagation.

        include cruntime.inc

M_EXIT  macro
        ret                     ; _cdecl return
        endm    ; M_EXIT

    extrn   _VEC_memcpy:near
    extrn   __sse2_available:dword

;memcpy - Copy source buffer to destination buffer
;       memcpy() copies a source memory buffer to a destination memory buffer.
;       This routine does NOT recognize overlapping buffers, and thus can lead
;       to propagation.
;       For cases where propagation must be avoided, memmove() must be used.
;       Algorithm:
;           Same as memmove. See Below
;memmove - Copy source buffer to destination buffer
;       memmove() copies a source memory buffer to a destination memory buffer.
;       This routine recognizes overlapping buffers to avoid propagation.
;       For cases where propagation is not a problem, memcpy() can be used.
;   Algorithm:
;       void * memmove(void * dst, void * src, size_t count)
;       {
;               void * ret = dst;
;               if (dst <= src || dst >= (src + count)) {
;                       /*
;                        * Non-Overlapping Buffers
;                        * copy from lower addresses to higher addresses
;                        */
;                       while (count--)
;                               *dst++ = *src++;
;                       }
;               else {
;                       /*
;                        * Overlapping Buffers
;                        * copy from higher addresses to lower addresses
;                        */
;                       dst += count - 1;
;                       src += count - 1;
;                       while (count--)
;                               *dst-- = *src--;
;                       }
;               return(ret);
;       }
;       void *dst = pointer to destination buffer
;       const void *src = pointer to source buffer
;       size_t count = number of bytes to copy
;       Returns a pointer to the destination buffer in AX/DX:AX
;       CX, DX

ifdef MEM_MOVE
        _MEM_     equ <memmove>
else  ; MEM_MOVE
        _MEM_     equ <memcpy>
endif  ; MEM_MOVE

%       public  _MEM_
_MEM_   proc \
        dst:ptr byte, \
        src:ptr byte, \

              ; destination pointer
              ; source pointer
              ; number of bytes to copy

;       push    ebp             ;U - save old frame pointer
;       mov     ebp, esp        ;V - set new frame pointer

        push    edi             ;U - save edi
        push    esi             ;V - save esi

        mov     esi,[src]       ;U - esi = source
        mov     ecx,[count]     ;V - ecx = number of bytes to move

        mov     edi,[dst]       ;U - edi = dest

; Check for overlapping buffers:
;       If (dst <= src) Or (dst >= src + Count) Then
;               Do normal (Upwards) Copy
;       Else
;               Do Downwards Copy to avoid propagation

        mov     eax,ecx         ;V - eax = byte count...

        mov     edx,ecx         ;U - edx = byte count...
        add     eax,esi         ;V - eax = point past source end

        cmp     edi,esi         ;U - dst <= src ?
        jbe     short CopyUp    ;V - yes, copy toward higher addresses

        cmp     edi,eax         ;U - dst < (src + count) ?
        jb      CopyDown        ;V - yes, copy toward lower addresses

; Copy toward higher addresses.
; First, see if we can use a "fast" copy SSE2 routine
        ; block size greater than min threshold?
        cmp     ecx,080h
        jb      Dword_align
        ; SSE2 supported?
        cmp     DWORD PTR __sse2_available,0
        je      Dword_align
        ; alignments equal?
        push    edi
        push    esi
        and     edi,15
        and     esi,15
        cmp     edi,esi
        pop     esi
        pop     edi
        jne     Dword_align

        ; do fast SSE2 copy, params already set
        jmp     _VEC_memcpy
        ; no return
pinkfIoyd 2013-03-08
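Quoting mujiok2003's reply (#12):
Performance optimization: to copy 128 bytes, for example, the library source needs only 128/8 = 16 loop-control statements (i > 0; --i), whereas your implementation, although general, needs 128 of them (i < size; i++). That is 128 - 16 = 112 more statements, so it is not very efficient.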
mujiok2003 2013-03-08
Make it work, then make it work better.
mujiok2003 2013-03-08
Quoting Kaile's reply (#6):
mujiok2003 2013-03-08
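Performance optimization: to copy 128 bytes, for example, the library source needs only 128/8 = 16 loop-control statements
i > 0; --i
whereas your implementation, although general, needs 128 of them
i < size; i++
That is 128 - 16 = 112 more statements, so it is not very efficient.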
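To make the counting concrete, here is a rough sketch of an 8x unrolled byte copy. The name copy_unrolled8 is hypothetical and this is not the library's code. For 128 bytes the main loop tests its condition 16 times instead of 128. It assumes the buffers do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte copy unrolled 8 times.
   Assumes the two buffers do not overlap. */
void *copy_unrolled8(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    size_t i;

    assert(dest != NULL && src != NULL);

    /* One loop test per 8 bytes copied. */
    for (i = n / 8; i > 0; --i) {
        d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
        d[4] = s[4]; d[5] = s[5]; d[6] = s[6]; d[7] = s[7];
        d += 8;
        s += 8;
    }
    /* Remaining 0 to 7 bytes, one at a time. */
    for (i = n % 8; i > 0; --i)
        *d++ = *s++;
    return dest;
}
```

An optimizing compiler may unroll the simple loop on its own, so the gain from doing it by hand is not guaranteed.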
wugui414 2013-03-08
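Quoting hepeizhong2010's reply (#10):
It is an efficiency issue. Taking 16 bytes as an example: the OP's loop does one check per copy operation, while the library source does one check per 8 copy operations, which is more efficient.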
hepeizhong2010 2013-03-08
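It is an efficiency issue. Taking 16 bytes as an example: the OP's loop does one check per copy operation, while the library source does one check per 8 copy operations, which is more efficient.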
zilaishuichina 2013-03-08
When dst == src, return immediately. When (dst > src) && ((dst - src) < size), copy from back to front. In all other cases, copy from front to back.
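Those three cases could be sketched roughly like this. The name copy_overlap_safe is made up for illustration. Addresses are compared as integers because relational comparison of pointers into unrelated objects is not defined by the C standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte copy that picks a safe direction. */
void *copy_overlap_safe(void *dest, const void *src, size_t size)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    assert(dest != NULL && src != NULL);

    /* Case 1: same address, nothing to do. */
    if (d == s)
        return dest;

    if ((uintptr_t)d > (uintptr_t)s &&
        (uintptr_t)d - (uintptr_t)s < size) {
        /* Case 2: dest starts inside the source region, so copy from
           the last byte down to the first. */
        d += size;
        s += size;
        while (size--)
            *--d = *--s;
    } else {
        /* Case 3: no harmful overlap, copy from the first byte up. */
        while (size--)
            *d++ = *s++;
    }
    return dest;
}
```

With char a[] = "abcdef", copy_overlap_safe(a + 1, a, 5) takes the backward path and leaves a holding "aabcde".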
氰客 2013-03-08
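Quoting Mr_warm's reply (#7):
Quoting gpshq's reply (#4): char buffer[] = "hello,world\n"; memcpy(buffer + 1, buffer, 1); printf("%s", buffer); So the OP thinks this logic is wrong? Doesn't that overwrite the data in buffer? The destination is buffer + 1, and when printing buffer + 1……
Then how does the OP understand the last parameter, size? Can't the caller decide how many bytes to copy? Is it logically wrong for a copy to overwrite data?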
pinkfIoyd 2013-03-08
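Quoting gpshq's reply (#4):
char buffer[] = "hello,world\n"; memcpy(buffer + 1, buffer, 1); printf("%s", buffer); So the OP thinks this logic is wrong?
Doesn't that overwrite the data in buffer? The destination is buffer + 1, and when I print buffer + 1 I find a character missing. I found a problem in my program: adding 1 to the pointer is equivalent to adding 4 bytes, and that is where the problem lies.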
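For reference, a small sketch of what the quoted snippet does. The helper names are made up for this example. Adding 1 to a char pointer moves it by exactly one byte, because sizeof(char) is 1. A step of 4 bytes happens with int pointers on platforms where sizeof(int) is 4.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy buf[0] over buf[1], as in the quoted snippet. */
void copy_first_over_second(char *buf)
{
    assert(buf != NULL && strlen(buf) >= 2);
    /* buf + 1 is one byte past buf. The 1-byte source [buf, buf + 1)
       and destination [buf + 1, buf + 2) do not overlap, so memcpy is
       well defined here. */
    memcpy(buf + 1, buf, 1);
}

/* Hypothetical helper: number of bytes an int pointer moves on "+ 1". */
size_t int_pointer_step(void)
{
    int a[2] = {0, 0};
    return (size_t)((char *)(a + 1) - (char *)a);
}
```

Starting from "hello,world\n", copy_first_over_second leaves the buffer holding "hhllo,world\n": the 'e' is overwritten by a copy of 'h' and the length is unchanged. int_pointer_step returns sizeof(int).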
Kaile 2013-03-08
