Using Google Benchmark in C++

william

2024-04-21

Warning

This article was last updated on 2024-04-21; its content may be out of date.

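In low-latency scenarios we push performance to the limit. To compare the overhead of different functions, we need some way to measure them. The basic measurement flow is: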

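Read the rdtsc counter right before the function call
Once the call returns, read rdtsc again and take the difference
Repeat the steps above a number of times
Finally, compute the average cost of the function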
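
The steps above can be sketched by hand as follows. This is only a minimal illustration, assuming GCC or Clang on x86; the names read_ticks and average_ticks are made up for this example, and the sketch ignores details such as instruction serialization (lfence / rdtscp) and the cost of the rdtsc reads themselves.

```cpp
#include <chrono>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

// Read a tick counter. On x86 this is the TSC via __rdtsc(); on other
// targets the sketch falls back to steady_clock ticks so it still builds.
inline std::uint64_t read_ticks()
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Time fn() `iterations` times and return the average ticks per call.
template <typename F>
double average_ticks(F&& fn, std::uint64_t iterations)
{
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i < iterations; ++i)
    {
        const std::uint64_t begin = read_ticks();  // step 1: initial value
        fn();
        const std::uint64_t end = read_ticks();    // step 2: value after the call
        total += end - begin;                      // accumulate the difference
    }
    // step 4: average over all iterations (step 3 is the loop itself)
    return static_cast<double>(total) / static_cast<double>(iterations);
}
```

Calling average_ticks with a lambda that wraps the function under test and an iteration count returns the mean tick count per call. The result is in TSC ticks rather than nanoseconds on x86.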

This whole flow is highly standardized, so we can lean on a framework to run such tests quickly. The one I currently use is Google Benchmark.

Installation

git clone https://github.com/google/benchmark.git
git clone https://github.com/google/googletest.git benchmark/googletest
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=RELEASE ../benchmark
cmake --build . --config Release
sudo cmake --build . --target install

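Testing

Function prototype

A test consists of two parts:

The test function, with a prototype compatible with std::function<void(benchmark::State&)>, i.e. void func(benchmark::State&). Register it with the main program through the macro BENCHMARK(func). Google Benchmark runs the function in a loop and collects the relevant metrics
The program entry point: BENCHMARK_MAIN()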

Code

#include <benchmark/benchmark.h>
#include <array>
#include <utils/nanotime.hpp>

using namespace falcon;

//-----------------------------------------------------------------------------
void now_ns(benchmark::State& state)
{
    for (auto _: state)
    {
        nanotime_t::ns();
    }
}
BENCHMARK(now_ns);

void now_sysns(benchmark::State& state)
{
    for (auto _: state)
    {
        nanotime_t::sysns();
    }
}
BENCHMARK(now_sysns);

void now_ntime(benchmark::State& state)
{
    for (auto _: state)
    {
        nanotime_t::ntime();
    }
}
BENCHMARK(now_ntime);

void ns2ntime(benchmark::State& state)
{
    for (auto _: state)
    {
        auto ns = nanotime_t::ns();
        nanotime_t::ns2ntime(ns);
    }
}
BENCHMARK(ns2ntime);

void ns2str(benchmark::State& state)
{
    for (auto _: state)
    {
        auto ns = nanotime_t::ns();
        nanotime_t::ns2str(ns);
    }
}
BENCHMARK(ns2str);

void ns2str_slow(benchmark::State& state)
{
    for (auto _: state)
    {
        auto ns = nanotime_t::ns();
        nanotime_t::to_str_slow(ns);
    }
}
BENCHMARK(ns2str_slow);

void ns2datetimestr(benchmark::State& state)
{
    for (auto _: state)
    {
        auto ns = nanotime_t::ns();
        nanotime_t::ns2datetimestr(ns);
    }
}
BENCHMARK(ns2datetimestr);

void strftime(benchmark::State& state)
{
    for (auto _: state)
    {
        auto ns = nanotime_t::ns();
        nanotime_t::strftime(ns);
    }
}
BENCHMARK(strftime);
//-----------------------------------------------------------------------------


///////////////////////////////////////////////////////////////////////////////
BENCHMARK_MAIN();
///////////////////////////////////////////////////////////////////////////////

Results

2024-04-21T13:52:30+08:00
Running ./test_benchmark
Run on (20 X 4800 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x10)
  L1 Instruction 32 KiB (x10)
  L2 Unified 1280 KiB (x10)
  L3 Unified 25600 KiB (x1)
Load Average: 1.13, 1.27, 1.62
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------
Benchmark               Time             CPU   Iterations
---------------------------------------------------------
now_ns               5.49 ns         5.49 ns    107478294
now_sysns            11.3 ns         11.3 ns     62195134
now_ntime            5.43 ns         5.43 ns    130701288
ns2ntime             5.47 ns         5.47 ns    123446131
ns2str               15.0 ns         15.0 ns     47280993
ns2str_slow          60.9 ns         60.9 ns     11144495
ns2datetimestr        460 ns          460 ns      1577622
strftime            15785 ns        15785 ns        44630
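
The Iterations column differs from row to row because the framework chooses the iteration count itself, running each benchmark until it has accumulated enough time for a stable average. Slow functions such as strftime therefore get far fewer iterations than fast ones. The idea can be sketched roughly as below. This is a simplified illustration and not the library's actual algorithm; run_until, run_result, and the tenfold growth factor are assumptions made for this example.

```cpp
#include <chrono>
#include <cstdint>

struct run_result
{
    std::uint64_t iterations;  // iterations used in the final batch
    double ns_per_iteration;   // average wall time per iteration in that batch
};

// Run fn() in batches, growing the batch tenfold until one batch
// takes at least min_time (or a safety cap on iterations is reached).
template <typename F>
run_result run_until(F&& fn, std::chrono::nanoseconds min_time)
{
    using clock = std::chrono::steady_clock;
    constexpr std::uint64_t max_iterations = 1000000000ULL;  // safety cap

    std::uint64_t iterations = 1;
    for (;;)
    {
        const auto begin = clock::now();
        for (std::uint64_t i = 0; i < iterations; ++i)
        {
            fn();
        }
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - begin);

        if (elapsed >= min_time || iterations >= max_iterations)
        {
            return {iterations,
                    static_cast<double>(elapsed.count()) / static_cast<double>(iterations)};
        }
        iterations *= 10;  // batch too short to trust: try a larger one
    }
}
```

With a fixed time budget, a function that costs about 5 ns per call ends up with several orders of magnitude more iterations than one that costs about 15 µs, which matches the pattern in the table.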