Benchmarking C++ code with nanobench

Note
This article was last updated on 2024-05-22; its content may be out of date.

nanobench is a simple microbenchmarking tool that helps us understand in depth what our code costs at run time.

nanobench

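The code repository is nanobench (https://github.com/martinus/nanobench). The whole project consists of a single header file, so it is very easy to use.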
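To get a feel for what such a tool automates, here is a minimal hand-rolled sketch that uses only the standard library: it runs a callable a fixed number of times and reports the average cost per call in nanoseconds. The name `measure_ns_per_op` and the fixed iteration count are assumptions made for illustration, and this is not how nanobench is implemented. nanobench additionally picks iteration counts on its own, repeats the measurement over several epochs, and reports an error estimate.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of nanobench): call `op` `iterations` times
// and return the average wall-clock cost of one call in nanoseconds.
template <typename Op>
double measure_ns_per_op(Op&& op, std::uint64_t iterations)
{
    using clock = std::chrono::steady_clock;
    auto const begin = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        op();
    }
    auto const end = clock::now();
    std::chrono::duration<double, std::nano> const elapsed = end - begin;
    return elapsed.count() / static_cast<double>(iterations);
}
```

For very cheap operations the compiler may remove the loop body entirely, which is the problem `doNotOptimizeAway` addresses in the nanobench example in the next section.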

Test

// https://github.com/martinus/nanobench
// g++ -O2 -I../../include main.cpp -o m

#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>

#include <chrono>
#include <cstdint>
#include <random>
#include <thread>

int main(int, char**)
{
    uint64_t x = 1;
    ankerl::nanobench::Bench().run("x += x", [&]() {
        ankerl::nanobench::doNotOptimizeAway(x += x);
    });

    ankerl::nanobench::Bench().run("sleep 10ms", [&]() {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    });

    std::random_device dev;
    std::mt19937_64 rng(dev());
    ankerl::nanobench::Bench().minEpochIterations(12045).run("random fluctuations", [&]() {
        // each run, perform a random number of rng calls
        auto iterations = rng() & UINT64_C(0xff);
        for (uint64_t i = 0; i < iterations; ++i) {
            (void)rng();
        }
    });
}

Compile the code above and run it to get the following result:

Warning, results might be unstable:
* CPU frequency scaling enabled: CPU 0 between 800.0 and 4,800.0 MHz
* Turbo is enabled, CPU frequency will fluctuate

Recommendations
* Use 'pyperf system tune' before benchmarking. See https://github.com/psf/pyperf

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                0.40 |    2,485,348,469.62 |    2.6% |      0.01 | `x += x`
|       10,124,540.00 |               98.77 |    0.0% |      0.11 | `sleep 10ms`
|              237.14 |        4,216,912.81 |    4.2% |      0.04 | `random fluctuations`
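
The `ns/op` and `op/s` columns describe the same measurement: `op/s` is 10^9 divided by `ns/op`. The small sketch below makes the conversion explicit; `ops_per_second` is a hypothetical helper name, not a nanobench function.

```cpp
// Hypothetical helper: convert a per-operation cost in nanoseconds
// into operations per second (1 s = 1e9 ns).
double ops_per_second(double ns_per_op)
{
    return 1e9 / ns_per_op;
}
```

For the `sleep 10ms` row, `ops_per_second(10124540.0)` gives about 98.77, which matches the table. The displayed `ns/op` values are rounded, so recomputing `op/s` from them is only approximate for the other rows.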

Comparing nanotime

#include "util/time_util.hpp"
#include <ctime>
#include <ratio>
#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>

#include <utils/nanotime.hpp>
#include <util/microtime.hpp>

using namespace snail;

int main(int, char**)
{
    ankerl::nanobench::Bench().run("ns", [&]()
    {
        int64_t ns {0};
        ankerl::nanobench::doNotOptimizeAway(ns = nanotime_t::ns());
    });

    ankerl::nanobench::Bench().run("microtime::us", [&]()
    {
        int64_t us {0};
        ankerl::nanobench::doNotOptimizeAway(us = microtime::now().count());
    });

    ankerl::nanobench::Bench().run("ns2us", [&]()
    {
        int64_t us {0};
        ankerl::nanobench::doNotOptimizeAway(us = nanotime_t::ns()/1000);
    });

    ankerl::nanobench::Bench().run("nanotime_t::us", [&]()
    {
        int64_t us {0};
        ankerl::nanobench::doNotOptimizeAway(us = nanotime_t::us());
    });

    ankerl::nanobench::Bench().run("microtime::ntime", [&]()
    {
        double ntime {.0};
        ankerl::nanobench::doNotOptimizeAway(ntime = to_ntime(microtime::now().count()));
    });

    ankerl::nanobench::Bench().run("nanotime_t::ntime", [&]()
    {
        double ntime {.0};
        ankerl::nanobench::doNotOptimizeAway(ntime = nanotime_t::ntime());
    });

    ankerl::nanobench::Bench().run("nanotime_t::to_str", [&]()
    {
        ankerl::nanobench::doNotOptimizeAway(nanotime_t::to_str(nanotime_t::ns()));
    });

    ankerl::nanobench::Bench().run("to_zgc_str", [&]()
    {
        ankerl::nanobench::doNotOptimizeAway(microtime::now().to_zgc_str());
    });
}

Running it gives the following result:

Warning, results might be unstable:
* CPU frequency scaling enabled: CPU 0 between 800.0 and 4,500.0 MHz
* CPU governor is 'powersave' but should be 'performance'
* Turbo is enabled, CPU frequency will fluctuate

Recommendations
* Use 'pyperf system tune' before benchmarking. See https://github.com/psf/pyperf

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|                6.67 |      149,864,077.86 |    2.1% |      0.01 | `ns`
|               15.90 |       62,902,027.27 |    1.9% |      0.01 | `microtime::us`
|                7.24 |      138,174,331.20 |    0.5% |      0.01 | `ns2us`
|                7.29 |      137,255,469.22 |    1.6% |      0.01 | `nanotime_t::us`
|               23.39 |       42,750,704.21 |    0.3% |      0.01 | `microtime::ntime`
|               12.31 |       81,232,849.21 |    0.2% |      0.01 | `nanotime_t::ntime`
|               46.80 |       21,366,089.17 |    2.6% |      0.01 | `nanotime_t::to_str`
|              393.31 |        2,542,506.82 |    2.0% |      0.01 | `to_zgc_str`
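
The `ns2us` row shows that deriving microseconds from a nanosecond timestamp costs almost nothing on top of reading the clock (7.24 ns versus 6.67 ns for `ns`). The implementation of `nanotime_t` is not shown in this article, so the following is only a sketch of the same idea built on `std::chrono::steady_clock`; `now_ns` and `ns_to_us` are hypothetical names.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a nanosecond clock read; the article's
// nanotime_t may work differently.
std::int64_t now_ns()
{
    auto const since_epoch = std::chrono::steady_clock::now().time_since_epoch();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(since_epoch).count();
}

// Convert nanoseconds to whole microseconds (truncating integer division).
std::int64_t ns_to_us(std::int64_t ns)
{
    return ns / 1000;
}
```

A call such as `ns_to_us(now_ns())` corresponds to the `nanotime_t::ns()/1000` expression in the benchmark: one clock read plus one integer division.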

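About system CPU performance

You can switch the CPU to performance mode. See the article "Linux 设置 cpu 高性能performance模式" (setting the CPU to performance mode on Linux).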

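sudo apt-get install cpufrequtils
sudo apt-get install sysfsutils

## Show the CPU status
cpufreq-info
## Show frequency information
cpupower frequency-info

## Switch the CPU to performance mode
sudo cpufreq-set -g performance

## Set the governor through sysfs by writing the value "performance".
## This takes effect immediately, but sysfs values do not survive a reboot.
echo performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

## Or set it globally so that it is kept after a reboot: add this line to the file
sudo vim /etc/default/cpufrequtils
GOVERNOR="performance"

## Restart the service so the configuration takes effect
systemctl restart cpufrequtils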
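
The second benchmark run above warns that the governor is 'powersave'. A benchmark program can check this itself before measuring. The following sketch reads the same sysfs file used in the commands above; the helper names are hypothetical, and the default path only exists on Linux systems with cpufreq support.

```cpp
#include <fstream>
#include <string>

// Read the scaling governor from a sysfs-style file. Returns an empty
// string if the file cannot be opened or read (e.g. no cpufreq support).
std::string read_governor(
    std::string const& path = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")
{
    std::ifstream in(path);
    std::string governor;
    if (!in || !std::getline(in, governor)) {
        return std::string();
    }
    return governor;
}

// True only when the governor could be read and is "performance".
bool is_performance_governor(std::string const& governor)
{
    return governor == "performance";
}
```

Calling `is_performance_governor(read_governor())` at the start of `main` lets the program print its own warning or refuse to run. This sketch only inspects CPU 0; other cores have their own `cpuN` directories.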

About rdtsc

Reference article: "细说RDTSC的坑" (a detailed look at the pitfalls of RDTSC)

uint64_t rdtsc()
{
    uint64_t a, d;
    __asm__ volatile("rdtsc" : "=a"(a), "=d"(d));
    return (d << 32) | a;
}

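We can also use GCC's built-in function directly, which is available as `__rdtsc()` after including `<x86intrin.h>`. See the StackOverflow question "How to count clock cycles with RDTSC in GCC x86".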

/* rdtsc */
extern __inline unsigned long long
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
__rdtsc (void)
{
    return __builtin_ia32_rdtsc ();
}
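
As a usage sketch, the counter can be read before and after a piece of code to get the number of elapsed ticks. The helper names `read_ticks` and `ticks_for` are hypothetical. On non-x86 targets the sketch falls back to `steady_clock` nanoseconds purely so that it still compiles; that fallback is an assumption of this example and does not count cycles.

```cpp
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

// Read a tick counter: the TSC on x86 (GCC/Clang), otherwise an assumed
// fallback that returns steady_clock nanoseconds so the sketch still builds.
inline std::uint64_t read_ticks()
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    auto const since_epoch = std::chrono::steady_clock::now().time_since_epoch();
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(since_epoch).count());
#endif
}

// Hypothetical helper: number of ticks that elapse while `op` runs once.
template <typename Op>
std::uint64_t ticks_for(Op&& op)
{
    std::uint64_t const begin = read_ticks();
    op();
    std::uint64_t const end = read_ticks();
    return end - begin;
}
```

Keep in mind that `rdtsc` is not a serializing instruction, so for very short regions the CPU may reorder it relative to the surrounding code, and the result is only meaningful when the measured region is long enough.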

Check whether the CPU supports constant TSC

cat /proc/cpuinfo |ag constant_tsc

const tsc
const tsc
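
With `constant_tsc` the counter ticks at a fixed rate regardless of frequency scaling, so tick counts can be converted to time once the tick rate is known. The sketch below estimates that rate by counting ticks across a known `steady_clock` interval. All helper names are hypothetical, the result is only an estimate, and the non-x86 fallback (nanoseconds from `steady_clock`) exists only so the example compiles everywhere.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

// Read the TSC on x86 (GCC/Clang); assumed fallback: steady_clock nanoseconds.
inline std::uint64_t read_ticks()
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    auto const since_epoch = std::chrono::steady_clock::now().time_since_epoch();
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(since_epoch).count());
#endif
}

// Estimate ticks per second by counting ticks across a known time window.
double estimate_ticks_per_second(std::chrono::milliseconds window)
{
    using clock = std::chrono::steady_clock;
    auto const t0 = clock::now();
    std::uint64_t const c0 = read_ticks();
    std::this_thread::sleep_for(window);
    std::uint64_t const c1 = read_ticks();
    auto const t1 = clock::now();
    std::chrono::duration<double> const seconds = t1 - t0;
    return static_cast<double>(c1 - c0) / seconds.count();
}

// Convert a tick count to nanoseconds using an estimated tick rate.
double ticks_to_ns(std::uint64_t ticks, double ticks_per_second)
{
    return static_cast<double>(ticks) * 1e9 / ticks_per_second;
}
```

A longer calibration window gives a more stable estimate at the cost of start-up time. Without `constant_tsc` the tick rate can change with the CPU frequency, and a single calibration like this is not reliable.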
