Speed-test tool: STREAM, a memory bandwidth benchmark
最编程
2024-07-20 12:27:34
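Overview:
STREAM is a synthetic benchmark suite written in Fortran and C. Both languages compile to efficient numerical code, so the STREAM kernels can push memory close to its limit. STREAM reports the maximum sustainable memory bandwidth actually measured, not the theoretical peak that hardware vendors usually quote.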
Test procedure:
- Download the stream.c file
- Compile it into an executable with gcc
gcc -O3 -mcmodel=medium -fopenmp -DSTREAM_ARRAY_SIZE=100000000 -DNTIMES=30 -DOFFSET=4096 stream.c -o stream.o
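Key parameters:
-O3: compiler optimization level.
-mcmodel=medium: needed when the statically allocated arrays exceed roughly 2 GB in total. Here the three arrays of 100000000 doubles take about 2.4 GB, which the default small code model cannot address.
-fopenmp: enables OpenMP multithreading. By default the program uses as many threads as there are logical CPUs.
The thread count can also be set at run time:
export OMP_NUM_THREADS=12 # 12 is the user-chosen number of threads
-DSTREAM_ARRAY_SIZE=100000000: number of elements in each of the arrays a[], b[], c[].
-DNTIMES=30: number of times each kernel is run; the best result is reported.
-DOFFSET=4096: array offset, in elements; it can usually be left undefined.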
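As a rough way to check whether a chosen STREAM_ARRAY_SIZE fits in RAM and whether -mcmodel=medium is likely to be needed, the footprint can be estimated as 3 arrays x elements x 8 bytes. The sketch below is only an illustration: the helper names are hypothetical (they are not part of stream.c), and the 2 GiB threshold is an approximation, since code and other static data also count against the small code model's limit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: bytes occupied by the three arrays a[], b[], c[]. */
uint64_t stream_total_bytes(uint64_t elements, uint64_t bytes_per_element)
{
    assert(bytes_per_element > 0);
    return UINT64_C(3) * elements * bytes_per_element;
}

/* Hypothetical helper: 1 if the arrays alone exceed 2 GiB, the approximate
 * point at which static data no longer fits the default small code model. */
int needs_mcmodel_medium(uint64_t total_bytes)
{
    return total_bytes > (UINT64_C(1) << 31);
}

/* Print the footprint in MiB, in the same unit STREAM itself reports. */
void print_footprint(uint64_t elements)
{
    uint64_t total = stream_total_bytes(elements, sizeof(double));
    printf("%llu elements: %.1f MiB total, -mcmodel=medium %s\n",
           (unsigned long long)elements,
           (double)total / 1048576.0,
           needs_mcmodel_medium(total) ? "likely needed" : "likely not needed");
}
```

Calling print_footprint(100000000) from a main function covers the array size used in the compile command above.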
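How it works:
- Four array operations are used to measure memory bandwidth: array copy (Copy), array scaling (Scale), vector sum of two arrays (Add), and combined scale-and-add (Triad)
- Array elements are double precision (8 bytes each)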
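The four kernels can be sketched in C as below. This is a simplified illustration rather than the code from stream.c: the function names are made up for the example, and the loops are left serial here, whereas stream.c parallelizes them with OpenMP when built with -fopenmp. The byte counts in the comments are what STREAM charges per loop iteration when it computes bandwidth.

```c
#include <assert.h>
#include <stddef.h>

/* Copy: c[j] = a[j]. One read + one write = 16 bytes per iteration. */
void stream_copy(double *c, const double *a, size_t n)
{
    assert(c != NULL && a != NULL);
    for (size_t j = 0; j < n; j++)
        c[j] = a[j];
}

/* Scale: b[j] = scalar * c[j]. One read + one write = 16 bytes. */
void stream_scale(double *b, const double *c, double scalar, size_t n)
{
    assert(b != NULL && c != NULL);
    for (size_t j = 0; j < n; j++)
        b[j] = scalar * c[j];
}

/* Add: c[j] = a[j] + b[j]. Two reads + one write = 24 bytes. */
void stream_add(double *c, const double *a, const double *b, size_t n)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t j = 0; j < n; j++)
        c[j] = a[j] + b[j];
}

/* Triad: a[j] = b[j] + scalar * c[j]. Two reads + one write = 24 bytes. */
void stream_triad(double *a, const double *b, const double *c,
                  double scalar, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}
```

Each kernel does almost no arithmetic per byte moved, so on arrays much larger than the CPU caches the run time is dominated by memory traffic rather than computation.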
Test results (sample output; its array size and offset differ from the compile command above):
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 11029549909 (elements), Offset = 0 (elements)
Memory per array = 84148.8 MiB (= 82.2 GiB).
Total memory required = 252446.4 MiB (= 246.5 GiB).
Each kernel will be executed 30 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 40
Number of Threads counted = 40
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 1606530 microseconds.
(= 1606530 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           98431.6     1.856682     1.792847     2.098195
Scale:          77450.3     2.322944     2.278531     2.504460
Add:            87651.4     3.074648     3.020023     3.327516
Triad:          87603.4     3.072721     3.021679     3.305595
-------------------------------------------------------------
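The "Best Rate" column can be reproduced from the array size and the "Min time" column: bytes moved per kernel divided by the minimum time, with 1 MB taken as 10^6 bytes. The sketch below assumes 8-byte elements and uses a hypothetical helper name; it illustrates the arithmetic and is not an excerpt from stream.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bandwidth in MB/s (1 MB = 10^6 bytes).
 * arrays_touched is 2 for Copy and Scale, 3 for Add and Triad. */
double stream_rate_mbs(uint64_t elements, unsigned arrays_touched,
                       double min_time_s)
{
    assert(min_time_s > 0.0);
    double bytes = (double)arrays_touched * 8.0 * (double)elements;
    return 1.0e-6 * bytes / min_time_s;
}
```

For the run above, stream_rate_mbs(11029549909ULL, 2, 1.792847) comes out at about 98431.6 MB/s, matching the Copy line, and the same call with 3 arrays and a minimum time of 3.021679 s lands within rounding of the Triad figure.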
References:
CPU optimization options (GCC i386 and x86-64 Options): http://gcc.gnu.org/onlinedocs/gcc-4.5.3/gcc/i386-and-x86_002d64-Options.html