Gentoo Forums
Why RDTSC is so slow?
bfc_forever
n00b
Joined: 10 Aug 2006
Posts: 49

Posted: Thu Mar 22, 2007 12:32 am    Post subject: Why RDTSC is so slow?

Hi,

I tested the following code on a Core 2 Duo E6400 machine running Gentoo Linux and found that two consecutive reads of the time stamp counter were 88 cycles apart. That is much slower than I expected. Is this result normal, or did I do something wrong? Thanks a lot for your help.

Code:
#include <stdio.h>
#include <stdint.h>

extern "C" {
  __inline__ uint64_t rdtsc() {
    uint32_t lo, hi;
    /* We cannot use "=A", since this would use %rax on x86_64 */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return (uint64_t)hi << 32 | lo;
  }
}

int main(int argc, char* argv[])
{
  unsigned long long a, b;

  a = rdtsc();
  b = rdtsc();

  printf("%llu\n", b-a);

  return 0;
}


Best,

bfc_forever
reillyeon
n00b
Joined: 26 Mar 2003
Posts: 44
Location: Boston (ish)

Posted: Fri Mar 23, 2007 4:35 am

If you look at the generated assembly for this code, you will see that without any explicit optimization GCC doesn't inline the function, so there is quite a bit of code between each call to rdtsc(), and thus between each execution of the instruction. Compiled with -O1, the difference drops from about 45 cycles (on an Athlon X2) to 8 cycles, which makes sense given that there are still 4 instructions between the two reads. You should see similar results on your machine.
_________________
Linux user #309501
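
To make the inlining point concrete, here is a minimal sketch that asks the compiler to inline the read even without -O1. It assumes GCC or Clang on x86 or x86_64; the names rdtsc_inline and back_to_back_delta are made up for illustration and are not from the original post. At -O0 the compiler may still store values to the stack between the two reads, so this removes the call overhead but not necessarily every instruction in between.

```cpp
#include <stdint.h>

// Assumption: GCC or Clang targeting x86/x86_64.
// always_inline asks the compiler to inline the body even when
// optimization is disabled, so no call/return pair sits between reads.
static inline __attribute__((always_inline)) uint64_t rdtsc_inline(void) {
  uint32_t lo, hi;
  // RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX.
  __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
  return (uint64_t)hi << 32 | lo;
}

// Hypothetical helper: cycle count between two back-to-back reads.
static inline uint64_t back_to_back_delta(void) {
  uint64_t a = rdtsc_inline();
  uint64_t b = rdtsc_inline();
  return b - a;
}
```

Printing back_to_back_delta() from main with "%llu" (after a cast to unsigned long long) gives a figure comparable to the one in the first post. The exact value depends on the processor and the compiler flags.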
bfc_forever
n00b
Joined: 10 Aug 2006
Posts: 49

Posted: Fri Mar 23, 2007 4:47 am

Thanks a lot for the information. I noticed that the compiler didn't inline the function, and I optimized the function call away. But on my computer, a Core 2 Duo E6400, the time only drops from 88 cycles to 64 cycles. I will look into it further to see whether there is any more room to optimize.


Best,

bfc_forever
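
One way to check how much of the remaining figure is measurement noise is to repeat the back-to-back read many times and keep the smallest difference. The sketch below is only an illustration under stated assumptions: it uses the __rdtsc() intrinsic from <x86intrin.h> (available in GCC and Clang on x86), and min_rdtsc_delta is a hypothetical helper name. A single pair of reads can include one-off effects such as a cold cache or an interrupt. The minimum over many samples is closer to the cost of the instruction itself, which differs between processors.

```cpp
#include <stdint.h>
#include <x86intrin.h>  // __rdtsc(); assumes GCC or Clang on x86/x86_64

// Hypothetical helper: smallest difference between two consecutive
// TSC reads observed over `samples` attempts.
static uint64_t min_rdtsc_delta(int samples) {
  uint64_t best = ~(uint64_t)0;  // start at the largest 64-bit value
  for (int i = 0; i < samples; ++i) {
    uint64_t a = __rdtsc();
    uint64_t b = __rdtsc();
    uint64_t d = b - a;
    if (d < best) best = d;  // keep the least-disturbed sample
  }
  return best;
}
```

Calling min_rdtsc_delta(1000) and printing the result shows the floor for this kind of measurement on a given machine. If that floor stays well above a handful of cycles even with optimization enabled, the time is being spent in the instruction rather than in the surrounding code.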
All times are GMT