Benchmarking the Raspberry Pi3 Model A+

In the previous post, I set up the Raspberry Pi3 Model A+ using a USB serial module. At that time, I pinned the core clock at a lower value with core_freq=250 so that UART and Bluetooth could both be used.

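However, I am curious how much speed is lost by lowering the core clock, so I will run a benchmark. If it makes little difference, I will leave the setting as it is; if it makes a big difference, I will pin the clock only when I need to.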

Raspberry Pi3 Model A+ specifications

> lscpu
Architecture:          armv7l
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
Model:                 4
Model name:            ARMv7 Processor rev 4 (v7l)
CPU max MHz:           1400.0000
CPU min MHz:           600.0000
BogoMIPS:              38.40
Flags:                 half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32

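This is the current clock frequency, reported in kHz. It fluctuates constantly, and the value shows that the CPU runs at 600MHz when idle. While UnixBench (described later) is running, the value becomes 1400000, meaning the CPU runs at 1.4GHz.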

> cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
600000
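Since the sysfs value is in kHz, a small helper makes it easier to read. This is only a minimal sketch: the path is the one used in the command above, while the function names `khz_to_mhz` and `read_cur_freq_mhz` are my own and not part of any tool.

```python
from pathlib import Path

# Path taken from the command above; the kernel reports this value in kHz.
CUR_FREQ_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def khz_to_mhz(khz: int) -> float:
    """Convert a cpufreq value in kHz to MHz."""
    return khz / 1000.0


def read_cur_freq_mhz(path: Path = CUR_FREQ_PATH) -> float:
    """Read the current CPU frequency from sysfs and return it in MHz."""
    return khz_to_mhz(int(path.read_text().strip()))


if __name__ == "__main__":
    # Guarded so the script also runs on machines without cpufreq support.
    if CUR_FREQ_PATH.exists():
        print(f"cpu0: {read_cur_freq_mhz():.0f} MHz")
    else:
        print("cpufreq sysfs entry not found on this machine")
```

Running it in a loop while the benchmark is active should show the value moving between the 600MHz idle state and 1400MHz under load.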
  • Default parameters of the Raspberry Pi3 Model A+

The vcgencmd get_config int command returns the current integer settings: either the default values or whatever has been set in config.txt.

> vcgencmd get_config int
aphy_params_current=819
arm_freq=1400
audio_pwm_mode=514
config_hdmi_boost=5
core_freq=400
desired_osc_freq=0x331df0
desired_osc_freq_boost=0x3c45b0
disable_commandline_tags=2
disable_l2cache=1
display_hdmi_rotate=-1
display_lcd_rotate=-1
dphy_params_current=547
enable_uart=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_ignore_alpha=1
framebuffer_swap=1
gpu_freq=300
hdmi_force_cec_address=65535
init_uart_clock=0x2dc6c00
lcd_framerate=60
over_voltage_avs=31250
over_voltage_avs_boost=0x200b2
overscan_bottom=32
overscan_left=32
overscan_right=32
overscan_top=32
pause_burst_frames=1
program_serial_random=1
sdram_freq=450

By default core_freq=400. I confirmed beforehand that once the settings that enable serial communication are applied, it becomes core_freq=250. There are no other differences.
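Comparing two dumps by eye is error-prone, so here is a rough sketch of how the key=value output could be parsed and compared. The helper names `parse_config` and `diff_config` are hypothetical, and the two sample strings are abbreviated from the output above with only core_freq changed; they are not complete dumps.

```python
def parse_config(text: str) -> dict:
    """Parse key=value lines; values may be decimal, negative, or 0x-prefixed hex."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Base 0 lets int() accept both "400" and "0x2dc6c00".
        config[key.strip()] = int(value.strip(), 0)
    return config


def diff_config(before: dict, after: dict) -> dict:
    """Return {key: (before, after)} for every key whose value differs."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


# Abbreviated samples (assumption: only core_freq differs, as noted in the text).
default_cfg = parse_config("arm_freq=1400\ncore_freq=400\ninit_uart_clock=0x2dc6c00\n")
serial_cfg = parse_config("arm_freq=1400\ncore_freq=250\ninit_uart_clock=0x2dc6c00\n")

print(diff_config(default_cfg, serial_cfg))
```

In practice the two input strings would come from saving the vcgencmd output before and after editing config.txt.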

UnixBench

UnixBench is an application maintained on GitHub. Download the source, build it, and run it.

> git clone https://github.com/kdlucas/byte-unixbench.git
> cd byte-unixbench/UnixBench 
> make
:
> ./Run

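Each pass (1 copy, then 4 copies) takes about 30 minutes, so a full run needs close to an hour. Also, as expected, the CPU gets quite hot.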
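To put a number on "quite hot", the temperature could be sampled while the benchmark runs. The sketch below assumes the SoC temperature is exposed at /sys/class/thermal/thermal_zone0/temp in millidegrees Celsius, which is how Linux thermal zones usually report it; I did not record temperatures during the runs in this post, and the function name is my own.

```python
from pathlib import Path

# Assumption: thermal zone 0 is the SoC sensor and reports millidegrees Celsius.
TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")


def millideg_to_celsius(raw: str) -> float:
    """Convert the raw sysfs string (millidegrees) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


if __name__ == "__main__":
    # Guarded so the script also runs on machines without this sensor.
    if TEMP_PATH.exists():
        print(f"SoC temperature: {millideg_to_celsius(TEMP_PATH.read_text()):.1f} C")
    else:
        print("thermal zone not found on this machine")
```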

  • Raspberry Pi3 Model A+ at the default (core_freq=400)
------------------------------------------------------------------------
Benchmark Run: Sun Mar 17 2019 18:14:07 - 18:42:09
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        5067802.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1239.8 MWIPS (9.9 s, 7 samples)
Execl Throughput                               1161.9 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        155991.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           44757.4 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        357170.9 KBps  (30.0 s, 2 samples)
Pipe Throughput                              329141.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  67956.1 lps   (10.0 s, 7 samples)
Process Creation                               2740.7 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2057.4 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    636.2 lpm   (60.1 s, 2 samples)
System Call Overhead                         700722.3 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    5067802.9    434.3
Double-Precision Whetstone                       55.0       1239.8    225.4
Execl Throughput                                 43.0       1161.9    270.2
File Copy 1024 bufsize 2000 maxblocks          3960.0     155991.5    393.9
File Copy 256 bufsize 500 maxblocks            1655.0      44757.4    270.4
File Copy 4096 bufsize 8000 maxblocks          5800.0     357170.9    615.8
Pipe Throughput                               12440.0     329141.6    264.6
Pipe-based Context Switching                   4000.0      67956.1    169.9
Process Creation                                126.0       2740.7    217.5
Shell Scripts (1 concurrent)                     42.4       2057.4    485.2
Shell Scripts (8 concurrent)                      6.0        636.2   1060.4
System Call Overhead                          15000.0     700722.3    467.1
                                                                   ========
System Benchmarks Index Score                                         355.5

------------------------------------------------------------------------
Benchmark Run: Sun Mar 17 2019 18:42:09 - 19:10:28
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables       17565826.8 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4299.7 MWIPS (11.4 s, 7 samples)
Execl Throughput                               2347.9 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        226359.7 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           60848.4 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        520184.4 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1131592.3 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 238301.2 lps   (10.0 s, 7 samples)
Process Creation                               5306.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   4582.3 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                    633.9 lpm   (60.2 s, 2 samples)
System Call Overhead                        2370589.9 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   17565826.8   1505.2
Double-Precision Whetstone                       55.0       4299.7    781.8
Execl Throughput                                 43.0       2347.9    546.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     226359.7    571.6
File Copy 256 bufsize 500 maxblocks            1655.0      60848.4    367.7
File Copy 4096 bufsize 8000 maxblocks          5800.0     520184.4    896.9
Pipe Throughput                               12440.0    1131592.3    909.6
Pipe-based Context Switching                   4000.0     238301.2    595.8
Process Creation                                126.0       5306.9    421.2
Shell Scripts (1 concurrent)                     42.4       4582.3   1080.7
Shell Scripts (8 concurrent)                      6.0        633.9   1056.6
System Call Overhead                          15000.0    2370589.9   1580.4
                                                                   ========
System Benchmarks Index Score                                         778.9
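As far as I can tell from the figures, each INDEX value is RESULT divided by BASELINE times 10, and the final score is the geometric mean of the twelve indexes. The sketch below checks that reading against the 1-copy run above. The names `unixbench_index` and `overall_score` are my own, and I have not checked this against the UnixBench source.

```python
import math

# (baseline, result) pairs copied from the 1-copy run with core_freq=400 above.
RESULTS = {
    "Dhrystone 2 using register variables": (116700.0, 5067802.9),
    "Double-Precision Whetstone": (55.0, 1239.8),
    "Execl Throughput": (43.0, 1161.9),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 155991.5),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 44757.4),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 357170.9),
    "Pipe Throughput": (12440.0, 329141.6),
    "Pipe-based Context Switching": (4000.0, 67956.1),
    "Process Creation": (126.0, 2740.7),
    "Shell Scripts (1 concurrent)": (42.4, 2057.4),
    "Shell Scripts (8 concurrent)": (6.0, 636.2),
    "System Call Overhead": (15000.0, 700722.3),
}


def unixbench_index(baseline: float, result: float) -> float:
    """Index of one test: result relative to the baseline machine, times 10."""
    return result / baseline * 10.0


def overall_score(pairs) -> float:
    """Geometric mean of the per-test indexes."""
    indexes = [unixbench_index(b, r) for b, r in pairs]
    return math.exp(sum(math.log(i) for i in indexes) / len(indexes))


print(f"{overall_score(RESULTS.values()):.1f}")
```

Because the score is a geometric mean, one test that doubles has the same effect on the score as any other test that doubles, regardless of the absolute size of its index.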

  • Raspberry Pi3 Model A+ with core_freq fixed at 250
------------------------------------------------------------------------
Benchmark Run: Sun Mar 17 2019 14:03:46 - 14:31:47
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        5070314.4 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1240.0 MWIPS (9.9 s, 7 samples)
Execl Throughput                               1135.0 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        150556.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           44694.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        345471.2 KBps  (30.0 s, 2 samples)
Pipe Throughput                              326975.0 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  69945.1 lps   (10.0 s, 7 samples)
Process Creation                               2706.0 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   1989.7 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    617.5 lpm   (60.0 s, 2 samples)
System Call Overhead                         700974.8 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    5070314.4    434.5
Double-Precision Whetstone                       55.0       1240.0    225.5
Execl Throughput                                 43.0       1135.0    264.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     150556.5    380.2
File Copy 256 bufsize 500 maxblocks            1655.0      44694.9    270.1
File Copy 4096 bufsize 8000 maxblocks          5800.0     345471.2    595.6
Pipe Throughput                               12440.0     326975.0    262.8
Pipe-based Context Switching                   4000.0      69945.1    174.9
Process Creation                                126.0       2706.0    214.8
Shell Scripts (1 concurrent)                     42.4       1989.7    469.3
Shell Scripts (8 concurrent)                      6.0        617.5   1029.1
System Call Overhead                          15000.0     700974.8    467.3
                                                                   ========
System Benchmarks Index Score                                         351.2
------------------------------------------------------------------------
Benchmark Run: Sun Mar 17 2019 14:31:47 - 15:00:06
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables       17605207.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4288.6 MWIPS (11.3 s, 7 samples)
Execl Throughput                               2261.1 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        223324.1 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           60412.7 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        514305.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1125227.3 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 232475.4 lps   (10.0 s, 7 samples)
Process Creation                               5102.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   4417.6 lpm   (60.1 s, 2 samples)
Shell Scripts (8 concurrent)                    610.2 lpm   (60.2 s, 2 samples)
System Call Overhead                        2335559.6 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   17605207.9   1508.6
Double-Precision Whetstone                       55.0       4288.6    779.7
Execl Throughput                                 43.0       2261.1    525.8
File Copy 1024 bufsize 2000 maxblocks          3960.0     223324.1    563.9
File Copy 256 bufsize 500 maxblocks            1655.0      60412.7    365.0
File Copy 4096 bufsize 8000 maxblocks          5800.0     514305.0    886.7
Pipe Throughput                               12440.0    1125227.3    904.5
Pipe-based Context Switching                   4000.0     232475.4    581.2
Process Creation                                126.0       5102.1    404.9
Shell Scripts (1 concurrent)                     42.4       4417.6   1041.9
Shell Scripts (8 concurrent)                      6.0        610.2   1017.1
System Call Overhead                          15000.0    2335559.6   1557.0
                                                                   ========
System Benchmarks Index Score                                         764.2

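Lowering core_freq from the default 400 and fixing it at 250 does reduce performance slightly: the index score goes from 355.5 to 351.2 with 1 copy and from 778.9 to 764.2 with 4 copies, a drop of roughly 1 to 2%. I doubt that is noticeable in everyday use. On that basis, it may be fine to leave it at 250 permanently so that serial communication is always available.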
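The size of the drop can be worked out from the two index scores of each configuration. This is just the arithmetic, using the scores reported above; `percent_change` is a name I made up for the helper.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (core_freq=400, core_freq=250) index scores from the runs above.
SCORES = {
    "1 copy": (355.5, 351.2),
    "4 copies": (778.9, 764.2),
}

for label, (default, fixed) in SCORES.items():
    print(f"{label}: {percent_change(default, fixed):+.1f}%")
```

This prints -1.2% for the 1-copy run and -1.9% for the 4-copy run.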

Overclock settings

The raspi-config menu also has an Overclock item, but it does not seem to work at present. Overclocking is therefore apparently done by setting parameters in config.txt.

Overclocking options in config.txt

  • Default values for the Raspberry Pi3 Model A+
arm_freq=1400
gpu_freq=300
core_freq=400
sdram_freq=450
over_voltage=0
  • Raspberry Pi3 overclock settings found online

arm_freq is set between 1300 and 1550, and gpu_freq and core_freq are around 500.

Example 1:
temp_soft_limit=70
arm_freq=1550
gpu_freq=500
core_freq=500
sdram_freq=500
sdram_schmoo=0x02000020
over_voltage=6
sdram_over_voltage=2

Example 2:
arm_freq=1300
gpu_freq=500
sdram_freq=500
over_voltage_sdram=0

Example 3:
core_freq=500 # GPU Frequency 
arm_freq=1300 # CPU Frequency 
over_voltage=4 #Electric power sent to CPU / GPU (4 = 1.3V) 
disable_splash=1 # Disables the display of the electric alert screen

Example 4:
arm_freq=1350
over_voltage=5
gpu_freq=550

# sdram overclock
sdram_freq=550
sdram_schmoo=0x02000020
over_voltage_sdram_p=6
over_voltage_sdram_i=4
over_voltage_sdram_c=4

Example 5:
arm_freq=1350
sdram_freq=450
core_freq=525
over_voltage=4
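  • Testing the overclock

When I actually tried copying the settings above:

  • with arm_freq=1550, the board did not boot
  • with arm_freq=1500, it hung partway through the test

So I tried again with arm_freq left unchanged.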

arm_freq=1400
gpu_freq=500
core_freq=500
sdram_freq=500
sdram_schmoo=0x02000020
------------------------------------------------------------------------
Benchmark Run: Sun Mar 17 2019 22:00:11 - 22:28:13
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables        5065350.4 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     1240.1 MWIPS (9.9 s, 7 samples)
Execl Throughput                               1165.4 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        157901.6 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           45419.5 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        369224.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                              329239.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  68825.5 lps   (10.0 s, 7 samples)
Process Creation                               2745.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   2067.6 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    672.6 lpm   (60.0 s, 2 samples)
System Call Overhead                         700514.7 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0    5065350.4    434.0
Double-Precision Whetstone                       55.0       1240.1    225.5
Execl Throughput                                 43.0       1165.4    271.0
File Copy 1024 bufsize 2000 maxblocks          3960.0     157901.6    398.7
File Copy 256 bufsize 500 maxblocks            1655.0      45419.5    274.4
File Copy 4096 bufsize 8000 maxblocks          5800.0     369224.0    636.6
Pipe Throughput                               12440.0     329239.6    264.7
Pipe-based Context Switching                   4000.0      68825.5    172.1
Process Creation                                126.0       2745.1    217.9
Shell Scripts (1 concurrent)                     42.4       2067.6    487.6
Shell Scripts (8 concurrent)                      6.0        672.6   1121.1
System Call Overhead                          15000.0     700514.7    467.0
                                                                   ========
System Benchmarks Index Score                                         359.6

------------------------------------------------------------------------
Benchmark Run: Sun Mar 17 2019 22:28:13 - 22:56:25
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables       17874778.4 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4607.8 MWIPS (10.7 s, 7 samples)
Execl Throughput                               2453.9 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        232569.1 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           63347.4 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        548742.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1179231.6 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 244505.3 lps   (10.0 s, 7 samples)
Process Creation                               5579.7 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   4760.6 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                    650.9 lpm   (60.1 s, 2 samples)
System Call Overhead                        2555818.7 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   17874778.4   1531.7
Double-Precision Whetstone                       55.0       4607.8    837.8
Execl Throughput                                 43.0       2453.9    570.7
File Copy 1024 bufsize 2000 maxblocks          3960.0     232569.1    587.3
File Copy 256 bufsize 500 maxblocks            1655.0      63347.4    382.8
File Copy 4096 bufsize 8000 maxblocks          5800.0     548742.0    946.1
Pipe Throughput                               12440.0    1179231.6    947.9
Pipe-based Context Switching                   4000.0     244505.3    611.3
Process Creation                                126.0       5579.7    442.8
Shell Scripts (1 concurrent)                     42.4       4760.6   1122.8
Shell Scripts (8 concurrent)                      6.0        650.9   1084.8
System Call Overhead                          15000.0    2555818.7   1703.9
                                                                   ========
System Benchmarks Index Score                                         812.6

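The gains are mainly in floating-point performance and file copy, but overall my impression is "is that all?": the 4-copy index score only rose from 778.9 to 812.6, about 4%. I am especially disappointed that I could not raise arm_freq. It may come down to variation between individual boards.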
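To see which tests benefited most, the per-test indexes of the two 4-copy runs can be compared directly. The sketch below uses the INDEX columns reported above (default versus the gpu_freq/core_freq/sdram_freq=500 trial); `percent_change` and `rank_changes` are names I made up.

```python
# INDEX values of the 4-copy runs above: default settings vs. the overclock trial.
DEFAULT_4 = {
    "Dhrystone 2 using register variables": 1505.2,
    "Double-Precision Whetstone": 781.8,
    "Execl Throughput": 546.0,
    "File Copy 1024 bufsize 2000 maxblocks": 571.6,
    "File Copy 256 bufsize 500 maxblocks": 367.7,
    "File Copy 4096 bufsize 8000 maxblocks": 896.9,
    "Pipe Throughput": 909.6,
    "Pipe-based Context Switching": 595.8,
    "Process Creation": 421.2,
    "Shell Scripts (1 concurrent)": 1080.7,
    "Shell Scripts (8 concurrent)": 1056.6,
    "System Call Overhead": 1580.4,
}
OVERCLOCK_4 = {
    "Dhrystone 2 using register variables": 1531.7,
    "Double-Precision Whetstone": 837.8,
    "Execl Throughput": 570.7,
    "File Copy 1024 bufsize 2000 maxblocks": 587.3,
    "File Copy 256 bufsize 500 maxblocks": 382.8,
    "File Copy 4096 bufsize 8000 maxblocks": 946.1,
    "Pipe Throughput": 947.9,
    "Pipe-based Context Switching": 611.3,
    "Process Creation": 442.8,
    "Shell Scripts (1 concurrent)": 1122.8,
    "Shell Scripts (8 concurrent)": 1084.8,
    "System Call Overhead": 1703.9,
}


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def rank_changes(before: dict, after: dict) -> list:
    """Return (test name, percent change) pairs, largest gain first."""
    changes = [(name, percent_change(before[name], after[name])) for name in before]
    return sorted(changes, key=lambda item: item[1], reverse=True)


for name, pct in rank_changes(DEFAULT_4, OVERCLOCK_4):
    print(f"{pct:+5.1f}%  {name}")
```

With these figures, System Call Overhead and Double-Precision Whetstone come out on top at roughly +7 to +8%, while Dhrystone barely moves, which fits arm_freq having been left at 1400.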
