#define MAGIC_VALUE 0x10 // Must be at least 0x10 (?)
void TestFunction(unsigned char * srcbuff3)
{
unsigned char * pin27CC = srcbuff3 + 2;
unsigned long x;
pin27CC++;
srcbuff3++;
for(x = 0; x < MAGIC_VALUE; x++)
{
pin27CC++;
srcbuff3++;
if(*pin27CC != *srcbuff3)
break;
}
printf("Result is: %s\n", pin27CC);
}
int main()
{
TestFunction((unsigned char *)"123456789");
getch();
}
void TestFunction(unsigned char * srcbuff3)
{
unsigned char * pin27CC = srcbuff3 + 2;
unsigned long x;
pin27CC++;
srcbuff3++;
for(x = 0; x < MAGIC_VALUE; x++)
{
pin27CC++;
srcbuff3++;
if(*pin27CC != *srcbuff3)
break;
}
printf("Result is: %s\n", pin27CC);
}
int main()
{
TestFunction((unsigned char *)"123456789");
getch();
}
A quick compile and execute reveals that what he said is true: the debug build correctly displays "Result is: 56789", while the release build displays "Result is: 456789". So, what went wrong? Well, a look at the assembly generated reveals a number of very strange optimizations:
00401000 mov ecx,dword ptr [esp+4]
00401004 push ebx
00401005 lea eax,[ecx+3]
00401008 push esi
00401009 inc ecx
0040100A xor esi,esi
0040100C add ecx,2
0040100F nop
00401010 mov dl,byte ptr [eax+1]
00401013 cmp dl,byte ptr [ecx-1]
00401016 jne TestFunction+67h (401067h)
00401018 mov dl,byte ptr [eax+2]
0040101B cmp dl,byte ptr [ecx]
0040101D jne TestFunction+50h (401050h)
0040101F mov dl,byte ptr [eax+3]
00401022 cmp dl,byte ptr [ecx+1]
00401025 jne TestFunction+64h (401064h)
00401027 mov dl,byte ptr [eax+4]
0040102A mov bl,byte ptr [ecx+2]
0040102D add eax,4
00401030 cmp dl,bl
00401032 jne TestFunction+67h (401067h)
00401034 add esi,4
00401037 add ecx,4
0040103A cmp esi,10h
0040103D jb TestFunction+10h (401010h)
0040103F push eax
00401040 push offset string "Result is: %s\n" (40710Ch)
00401045 call printf (401095h)
0040104A add esp,8
0040104D pop esi
0040104E pop ebx
0040104F ret
00401050 add eax,2
00401053 push eax
00401054 push offset string "Result is: %s\n" (40710Ch)
00401059 call printf (401095h)
0040105E add esp,8
00401061 pop esi
00401062 pop ebx
00401063 ret
00401064 add eax,3
00401067 push eax
00401068 push offset string "Result is: %s\n" (40710Ch)
0040106D call printf (401095h)
00401072 add esp,8
00401075 pop esi
00401076 pop ebx
00401077 ret
You can see right away that the compiler unrolled his loop to 4 iterations of 4 separate compares, using offset-based MOVs to read the characters sequentially. Interestingly, the pointers are not updated inside the multiplexed loop, but rather are updated at each iteration. To compensate for this, 4 loop break points are generated, resulting in 3 separate calls to printf, each adding an appropriate number to the pointer.
Now, this is where things get really weird. The source dictates that by the time the first compare gets executed, pin27CC (eax) will have 4 added to it, and srcbuff3 (ecx) will have 2 added. Yet that's not what the assembly does. pin27CC in fact gets 3 added, and srcbuff3 3 as well. Bizarre as this is, the loop accounts for this by accessing [eax+1] to [eax+4], and [ecx-1] to [ecx+2]. Corresponding to these values, the 4 loop break points update eax before calling printf, adding as necessary 4, 3, 2, or... 0. No, that's not a typo. The first break point goes to 00401067, which is part of the +3 case, but after the addition, so that no alteration of eax is performed (this is why there are only 3 calls to printf, despite there being 4 break points in the loop). This, of course, causes eax to be off by 1, causing the output to be incorrect.
Ah, the joys of clever-but-not-quite-clever-enough optimizing compilers.
By the way, thanks to Buster for the code formatting/colorizing script.
Update:
This bug has been fixed in VC++ 2005, as well as another bug that showed up in this code (namely the use of both eax and ecx as pointers, even though they were always the same). As well, several nice new optimizations are seen, such as reusing bytes already read from the string. The assembly from VC++ 2005:
00401520 push ebx
00401521 push esi
00401522 mov esi,offset ___xi_z+3Fh (4020CFh)
00401527 xor ecx,ecx
00401529 mov eax,esi
0040152B jmp TestFunction+10h (401530h)
0040152D lea ecx,[ecx]
00401530 mov bl,byte ptr [eax+1]
00401533 cmp bl,byte ptr [eax-1]
00401536 jne TestFunction+73h (401593h)
00401538 mov dl,byte ptr [eax+2]
0040153B cmp dl,byte ptr [eax]
0040153D jne TestFunction+49h (401569h)
0040153F cmp byte ptr [eax+3],bl
00401542 jne TestFunction+5Eh (40157Eh)
00401544 add esi,4
00401547 cmp byte ptr [eax+4],dl
0040154A jne TestFunction+76h (401596h)
0040154C add ecx,4
0040154F add eax,4
00401552 cmp ecx,10h
00401555 jb TestFunction+10h (401530h)
00401557 push esi
00401558 push offset ___xi_z+2Ch (4020BCh)
0040155D call dword ptr [__imp__printf (402070h)]
00401563 add esp,8
00401566 pop esi
00401567 pop ebx
00401568 ret
00401569 add esi,2
0040156C push esi
0040156D push offset ___xi_z+2Ch (4020BCh)
00401572 call dword ptr [__imp__printf (402070h)]
00401578 add esp,8
0040157B pop esi
0040157C pop ebx
0040157D ret
0040157E add esi,3
00401581 push esi
00401582 push offset ___xi_z+2Ch (4020BCh)
00401587 call dword ptr [__imp__printf (402070h)]
0040158D add esp,8
00401590 pop esi
00401591 pop ebx
00401592 ret
00401593 add esi,1
00401596 push esi
00401597 push offset ___xi_z+2Ch (4020BCh)
0040159C call dword ptr [__imp__printf (402070h)]
004015A2 add esp,8
004015A5 pop esi
004015A6 pop ebx
004015A7 ret
No comments:
Post a Comment