In my previous post, I set up a Raspberry Pi3 Model A+ using a USB serial module. In that setup, I fixed the core clock at a lower value with core_freq=250 so that both UART and Bluetooth can be used.
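However, I wondered how much lowering the core clock slows things down, so I ran a benchmark. If the difference is small, I'll keep the setting as it is; if it is large, I'll fix the clock only when I actually need it.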
Raspberry Pi3 Model A+ specifications
> lscpu
Architecture: armv7l
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Model: 4
Model name: ARMv7 Processor rev 4 (v7l)
CPU max MHz: 1400.0000
CPU min MHz: 600.0000
BogoMIPS: 38.40
Flags: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
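The current clock frequency (in kHz). It changes constantly; at idle the CPU runs at 600 MHz. While UnixBench (described below) is running, the value becomes 1400000, i.e. the CPU runs at 1.4 GHz.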
> cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
600000
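Because `scaling_cur_freq` reports kHz, 600000 means 600 MHz. The minimal sketch below reads the same sysfs file that `cat` reads above and converts the value to MHz. The helper names `khz_to_mhz` and `read_cur_freq_mhz` are illustrative. The sketch assumes a Linux system that exposes cpufreq under that path.

```python
from pathlib import Path

# sysfs file read in the article with `cat` (cpu0 = first core)
CUR_FREQ_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def khz_to_mhz(khz: int) -> float:
    """scaling_cur_freq is reported in kHz; convert it to MHz."""
    return khz / 1000


def read_cur_freq_mhz(path: Path = CUR_FREQ_PATH) -> float:
    """Read the current frequency of one core from sysfs, in MHz."""
    return khz_to_mhz(int(path.read_text().strip()))


if __name__ == "__main__":
    if CUR_FREQ_PATH.exists():
        print(f"cpu0: {read_cur_freq_mhz():.0f} MHz")
    else:
        print("cpufreq entry not found on this machine")
```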
- Default parameters of the Raspberry Pi3 Model A+
The vcgencmd get_config int command shows the default value of each integer setting, or the current value if it was set in config.txt.
> vcgencmd get_config int
aphy_params_current=819
arm_freq=1400
audio_pwm_mode=514
config_hdmi_boost=5
core_freq=400
desired_osc_freq=0x331df0
desired_osc_freq_boost=0x3c45b0
disable_commandline_tags=2
disable_l2cache=1
display_hdmi_rotate=-1
display_lcd_rotate=-1
dphy_params_current=547
enable_uart=1
force_eeprom_read=1
force_pwm_open=1
framebuffer_ignore_alpha=1
framebuffer_swap=1
gpu_freq=300
hdmi_force_cec_address=65535
init_uart_clock=0x2dc6c00
lcd_framerate=60
over_voltage_avs=31250
over_voltage_avs_boost=0x200b2
overscan_bottom=32
overscan_left=32
overscan_right=32
overscan_top=32
pause_burst_frames=1
program_serial_random=1
sdram_freq=450
By default core_freq=400. I confirmed beforehand that after enabling serial communication it becomes core_freq=250. Nothing else differs.
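The default-vs-serial comparison described above can be automated. This is a minimal sketch. It assumes you captured both dumps of `vcgencmd get_config int`, for example by calling the hypothetical `read_get_config` helper below on the Pi once per configuration. It parses the key=value lines and reports only the keys whose values differ. The short strings in the `__main__` part are abbreviated, illustrative excerpts, not real full dumps.

```python
import subprocess
from typing import Dict, Optional, Tuple


def read_get_config() -> str:
    """Run `vcgencmd get_config int` (only works on a Raspberry Pi)."""
    result = subprocess.run(
        ["vcgencmd", "get_config", "int"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_get_config(output: str) -> Dict[str, str]:
    """Turn 'key=value' lines into a dict; lines without '=' are skipped."""
    config = {}
    for line in output.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key] = value
    return config


def diff_configs(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Abbreviated, illustrative excerpts (not full dumps)
    default = parse_get_config("arm_freq=1400\ncore_freq=400\ngpu_freq=300\n")
    serial = parse_get_config("arm_freq=1400\ncore_freq=250\ngpu_freq=300\n")
    print(diff_configs(default, serial))
```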
UnixBench
UnixBench is maintained on GitHub; download the source, build it, and run it.
> git clone https://github.com/kdlucas/byte-unixbench.git
> cd byte-unixbench/UnixBench
> make
:
> ./Run
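Each pass (first 1 copy, then 4 parallel copies) takes about 28 minutes, so a full run takes close to an hour. As expected, the CPU also gets quite hot.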
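UnixBench prints a `Benchmark Run:` line with the start and end time of each pass. The results below contain several of these lines. The small sketch below turns one such line into an elapsed time. It assumes exactly that header format: a full start timestamp, then an end time of day only. `run_duration` is a hypothetical helper and is not part of UnixBench.

```python
from datetime import datetime, timedelta


def run_duration(header: str) -> timedelta:
    """Elapsed time of one UnixBench pass from its 'Benchmark Run:' line.

    Assumes the format 'Benchmark Run: Sun Mar 17 2019 18:14:07 - 18:42:09',
    i.e. a full start timestamp followed by an end time of day only.
    """
    body = header.split("Benchmark Run:", 1)[1]
    start_text, end_text = (part.strip() for part in body.split(" - ", 1))
    start = datetime.strptime(start_text, "%a %b %d %Y %H:%M:%S")
    end_clock = datetime.strptime(end_text, "%H:%M:%S").time()
    end = datetime.combine(start.date(), end_clock)
    if end < start:  # the pass ran past midnight
        end += timedelta(days=1)
    return end - start


if __name__ == "__main__":
    passes = [
        "Benchmark Run: Sun Mar 17 2019 18:14:07 - 18:42:09",
        "Benchmark Run: Sun Mar 17 2019 18:42:09 - 19:10:28",
    ]
    total = sum((run_duration(p) for p in passes), timedelta())
    print(f"total: {total}")
```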
- Raspberry Pi3 Model A+ with default settings (core_freq=400)
------------------------------------------------------------------------
Benchmark Run: Sun Mar 17 2019 18:14:07 - 18:42:09
4 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 5067802.9 lps (10.0 s, 7 samples)
Double-Precision Whetstone 1239.8 MWIPS (9.9 s, 7 samples)
Execl Throughput 1161.9 lps (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 155991.5 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 44757.4 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 357170.9 KBps (30.0 s, 2 samples)
Pipe Throughput 329141.6 lps (10.0 s, 7 samples)
Pipe-based Context Switching 67956.1 lps (10.0 s, 7 samples)
Process Creation 2740.7 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 2057.4 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 636.2 lpm (60.1 s, 2 samples)
System Call Overhead 700722.3 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 5067802.9 434.3
Double-Precision Whetstone 55.0 1239.8 225.4
Execl Throughput 43.0 1161.9 270.2
File Copy 1024 bufsize 2000 maxblocks 3960.0 155991.5 393.9
File Copy 256 bufsize 500 maxblocks 1655.0 44757.4 270.4
File Copy 4096 bufsize 8000 maxblocks 5800.0 357170.9 615.8
Pipe Throughput 12440.0 329141.6 264.6
Pipe-based Context Switching 4000.0 67956.1 169.9
Process Creation 126.0 2740.7 217.5
Shell Scripts (1 concurrent) 42.4 2057.4 485.2
Shell Scripts (8 concurrent) 6.0 636.2 1060.4
System Call Overhead 15000.0 700722.3 467.1
========
System Benchmarks Index Score 355.5
------------------------------------------------------------------------
Benchmark Run: Sun Mar 17 2019 18:42:09 - 19:10:28
4 CPUs in system; running 4 parallel copies of tests
Dhrystone 2 using register variables 17565826.8 lps (10.0 s, 7 samples)
Double-Precision Whetstone 4299.7 MWIPS (11.4 s, 7 samples)
Execl Throughput 2347.9 lps (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 226359.7 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 60848.4 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 520184.4 KBps (30.0 s, 2 samples)
Pipe Throughput 1131592.3 lps (10.0 s, 7 samples)
Pipe-based Context Switching 238301.2 lps (10.0 s, 7 samples)
Process Creation 5306.9 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 4582.3 lpm (60.1 s, 2 samples)
Shell Scripts (8 concurrent) 633.9 lpm (60.2 s, 2 samples)
System Call Overhead 2370589.9 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 17565826.8 1505.2
Double-Precision Whetstone 55.0 4299.7 781.8
Execl Throughput 43.0 2347.9 546.0
File Copy 1024 bufsize 2000 maxblocks 3960.0 226359.7 571.6
File Copy 256 bufsize 500 maxblocks 1655.0 60848.4 367.7
File Copy 4096 bufsize 8000 maxblocks 5800.0 520184.4 896.9
Pipe Throughput 12440.0 1131592.3 909.6
Pipe-based Context Switching 4000.0 238301.2 595.8
Process Creation 126.0 5306.9 421.2
Shell Scripts (1 concurrent) 42.4 4582.3 1080.7
Shell Scripts (8 concurrent) 6.0 633.9 1056.6
System Call Overhead 15000.0 2370589.9 1580.4
========
System Benchmarks Index Score 778.9
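The numbers above suggest how UnixBench derives its summary. Each INDEX is RESULT / BASELINE × 10, and the System Benchmarks Index Score is the geometric mean of the 12 index values. The sketch below recomputes the 1-copy score from the BASELINE and RESULT columns, which are copied by hand from the output. The printed RESULT values are rounded, so an individual index can differ from UnixBench's own by about 0.1. The helper names are illustrative.

```python
import math

# (BASELINE, RESULT) from the 1-copy run with default core_freq=400
SINGLE_COPY_DEFAULT = {
    "Dhrystone 2 using register variables": (116700.0, 5067802.9),
    "Double-Precision Whetstone": (55.0, 1239.8),
    "Execl Throughput": (43.0, 1161.9),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 155991.5),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 44757.4),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 357170.9),
    "Pipe Throughput": (12440.0, 329141.6),
    "Pipe-based Context Switching": (4000.0, 67956.1),
    "Process Creation": (126.0, 2740.7),
    "Shell Scripts (1 concurrent)": (42.4, 2057.4),
    "Shell Scripts (8 concurrent)": (6.0, 636.2),
    "System Call Overhead": (15000.0, 700722.3),
}


def index_value(baseline: float, result: float) -> float:
    """INDEX column: result relative to the baseline, scaled by 10."""
    return result / baseline * 10


def index_score(pairs) -> float:
    """Overall score: geometric mean of the per-test index values."""
    logs = [math.log(index_value(b, r)) for b, r in pairs.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    for name, (baseline, result) in SINGLE_COPY_DEFAULT.items():
        print(f"{name:40s} {index_value(baseline, result):8.1f}")
    print(f"Index Score: {index_score(SINGLE_COPY_DEFAULT):.1f}")
```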
- Raspberry Pi3 Model A+ with core_freq fixed at 250
------------------------------------------------------------------------
Benchmark Run: Sun Mar 17 2019 14:03:46 - 14:31:47
4 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 5070314.4 lps (10.0 s, 7 samples)
Double-Precision Whetstone 1240.0 MWIPS (9.9 s, 7 samples)
Execl Throughput 1135.0 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 150556.5 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 44694.9 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 345471.2 KBps (30.0 s, 2 samples)
Pipe Throughput 326975.0 lps (10.0 s, 7 samples)
Pipe-based Context Switching 69945.1 lps (10.0 s, 7 samples)
Process Creation 2706.0 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 1989.7 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 617.5 lpm (60.0 s, 2 samples)
System Call Overhead 700974.8 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 5070314.4 434.5
Double-Precision Whetstone 55.0 1240.0 225.5
Execl Throughput 43.0 1135.0 264.0
File Copy 1024 bufsize 2000 maxblocks 3960.0 150556.5 380.2
File Copy 256 bufsize 500 maxblocks 1655.0 44694.9 270.1
File Copy 4096 bufsize 8000 maxblocks 5800.0 345471.2 595.6
Pipe Throughput 12440.0 326975.0 262.8
Pipe-based Context Switching 4000.0 69945.1 174.9
Process Creation 126.0 2706.0 214.8
Shell Scripts (1 concurrent) 42.4 1989.7 469.3
Shell Scripts (8 concurrent) 6.0 617.5 1029.1
System Call Overhead 15000.0 700974.8 467.3
========
System Benchmarks Index Score 351.2
------------------------------------------------------------------------
Benchmark Run: Sun Mar 17 2019 14:31:47 - 15:00:06
4 CPUs in system; running 4 parallel copies of tests
Dhrystone 2 using register variables 17605207.9 lps (10.0 s, 7 samples)
Double-Precision Whetstone 4288.6 MWIPS (11.3 s, 7 samples)
Execl Throughput 2261.1 lps (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 223324.1 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 60412.7 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 514305.0 KBps (30.0 s, 2 samples)
Pipe Throughput 1125227.3 lps (10.0 s, 7 samples)
Pipe-based Context Switching 232475.4 lps (10.0 s, 7 samples)
Process Creation 5102.1 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 4417.6 lpm (60.1 s, 2 samples)
Shell Scripts (8 concurrent) 610.2 lpm (60.2 s, 2 samples)
System Call Overhead 2335559.6 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 17605207.9 1508.6
Double-Precision Whetstone 55.0 4288.6 779.7
Execl Throughput 43.0 2261.1 525.8
File Copy 1024 bufsize 2000 maxblocks 3960.0 223324.1 563.9
File Copy 256 bufsize 500 maxblocks 1655.0 60412.7 365.0
File Copy 4096 bufsize 8000 maxblocks 5800.0 514305.0 886.7
Pipe Throughput 12440.0 1125227.3 904.5
Pipe-based Context Switching 4000.0 232475.4 581.2
Process Creation 126.0 5102.1 404.9
Shell Scripts (1 concurrent) 42.4 4417.6 1041.9
Shell Scripts (8 concurrent) 6.0 610.2 1017.1
System Call Overhead 15000.0 2335559.6 1557.0
========
System Benchmarks Index Score 764.2
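Lowering core_freq from the default 400 to a fixed 250 does reduce performance slightly. The Index Score drops from 355.5 to 351.2 with 1 copy (about -1.2%) and from 778.9 to 764.2 with 4 copies (about -1.9%). I don't think the difference is noticeable in practice, so it may be fine to keep core_freq at 250 all the time and leave serial communication always available.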
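The helper below quantifies the gap between two Index Scores. It uses the scores from the runs above, typed in by hand. `pct_change` is just an illustrative name.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# System Benchmarks Index Score: (default core_freq=400, fixed core_freq=250)
SCORES = {
    "1 parallel copy": (355.5, 351.2),
    "4 parallel copies": (778.9, 764.2),
}

if __name__ == "__main__":
    for label, (default, fixed) in SCORES.items():
        print(f"{label}: {pct_change(default, fixed):+.1f}%")
```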
Overclocking settings
The raspi-config menu also has an Overclock item, but it apparently does not work at the moment. Instead, overclocking is done by setting parameters in config.txt.
Overclocking options in config.txt
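Overclocking is configured by editing config.txt. The minimal sketch below updates a key=value setting in that file's text: it replaces an existing active line or appends a new one. Several assumptions apply. The file is /boot/config.txt (as on Raspbian at the time), you edit it as root and keep a backup, and this simplified helper ignores conditional sections such as [pi3] or [all]. `set_config_option` is a hypothetical name.

```python
import re


def set_config_option(text: str, key: str, value: str) -> str:
    """Set `key=value` in config.txt contents.

    Replaces an existing active line for `key`, or appends one at the end.
    Simplified: commented-out lines (e.g. '#arm_freq=800') are left alone,
    and conditional sections such as [pi3] or [all] are not taken into account.
    """
    new_line = f"{key}={value}"
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement avoids backslash escapes in `new_line`
        return pattern.sub(lambda _match: new_line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


if __name__ == "__main__":
    current = "dtparam=audio=on\narm_freq=1400\n"
    print(set_config_option(current, "gpu_freq", "500"), end="")
```

On the Pi you would read /boot/config.txt, pass its contents through the function, write the result back as root, and reboot for the change to take effect.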
- Default values for the Raspberry Pi3 Model A+
arm_freq=1400
gpu_freq=300
core_freq=400
sdram_freq=450
over_voltage=0
- Raspberry Pi3 overclock settings found online
These examples use arm_freq values between 1300 and 1550, and gpu_freq and core_freq values of around 500.
temp_soft_limit=70
arm_freq=1550
gpu_freq=500
core_freq=500
sdram_freq=500
sdram_schmoo=0x02000020
over_voltage=6
sdram_over_voltage=2

arm_freq=1300
gpu_freq=500
sdram_freq=500
over_voltage_sdram=0

core_freq=500 # GPU Frequency
arm_freq=1300 # CPU Frequency
over_voltage=4 # CPU/GPU core voltage (4 = 1.3V)
disable_splash=1 # Disables the rainbow splash screen at boot

arm_freq=1350
over_voltage=5
gpu_freq=550
# sdram overclock
sdram_freq=550
sdram_schmoo=0x02000020
over_voltage_sdram_p=6
over_voltage_sdram_i=4
over_voltage_sdram_c=4

arm_freq=1350
sdram_freq=450
core_freq=525
over_voltage=4
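The example configs above use over_voltage values of 4 to 6. The official config.txt documentation for Pi 3-era boards describes over_voltage as moving the CPU/GPU core voltage in 0.025 V steps from 1.2 V at 0, over a range of -16 to 8. The sketch below maps a setting to the resulting voltage under that assumption. The constants and the function name are illustrative.

```python
BASE_VOLTS = 1.2    # core voltage at over_voltage=0 (per the docs)
STEP_VOLTS = 0.025  # change per over_voltage step


def over_voltage_to_volts(level: int) -> float:
    """Approximate CPU/GPU core voltage for an over_voltage setting."""
    if not -16 <= level <= 8:
        raise ValueError("over_voltage is documented for the range -16..8")
    return BASE_VOLTS + STEP_VOLTS * level


if __name__ == "__main__":
    # over_voltage values used in the example configs above
    for level in (0, 4, 5, 6):
        print(f"over_voltage={level}: {over_voltage_to_volts(level):.3f} V")
```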
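- Testing the overclock
I tried copying the settings above. With arm_freq=1550 the Pi would not boot, and with arm_freq=1500 it hung partway through the test. So I left arm_freq unchanged and tried the following.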
arm_freq=1400
gpu_freq=500
core_freq=500
sdram_freq=500
sdram_schmoo=0x02000020
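For reference, the sketch below expresses each frequency in the tried configuration as a percentage increase over the Model A+ defaults listed earlier. The dictionaries are typed in by hand from this article, and `freq_increase` is a hypothetical name.

```python
# Model A+ defaults and the configuration actually tried in this article (MHz)
DEFAULTS = {"arm_freq": 1400, "gpu_freq": 300, "core_freq": 400, "sdram_freq": 450}
TRIED = {"arm_freq": 1400, "gpu_freq": 500, "core_freq": 500, "sdram_freq": 500}


def freq_increase(defaults, tried):
    """Percentage increase of each frequency present in both configs."""
    return {
        key: (tried[key] - defaults[key]) / defaults[key] * 100
        for key in defaults
        if key in tried
    }


if __name__ == "__main__":
    for key, pct in freq_increase(DEFAULTS, TRIED).items():
        print(f"{key}: {DEFAULTS[key]} -> {TRIED[key]} MHz ({pct:+.1f}%)")
```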
------------------------------------------------------------------------
Benchmark Run: Sun Mar 17 2019 22:00:11 - 22:28:13
4 CPUs in system; running 1 parallel copy of tests
Dhrystone 2 using register variables 5065350.4 lps (10.0 s, 7 samples)
Double-Precision Whetstone 1240.1 MWIPS (9.9 s, 7 samples)
Execl Throughput 1165.4 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 157901.6 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 45419.5 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 369224.0 KBps (30.0 s, 2 samples)
Pipe Throughput 329239.6 lps (10.0 s, 7 samples)
Pipe-based Context Switching 68825.5 lps (10.0 s, 7 samples)
Process Creation 2745.1 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 2067.6 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 672.6 lpm (60.0 s, 2 samples)
System Call Overhead 700514.7 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 5065350.4 434.0
Double-Precision Whetstone 55.0 1240.1 225.5
Execl Throughput 43.0 1165.4 271.0
File Copy 1024 bufsize 2000 maxblocks 3960.0 157901.6 398.7
File Copy 256 bufsize 500 maxblocks 1655.0 45419.5 274.4
File Copy 4096 bufsize 8000 maxblocks 5800.0 369224.0 636.6
Pipe Throughput 12440.0 329239.6 264.7
Pipe-based Context Switching 4000.0 68825.5 172.1
Process Creation 126.0 2745.1 217.9
Shell Scripts (1 concurrent) 42.4 2067.6 487.6
Shell Scripts (8 concurrent) 6.0 672.6 1121.1
System Call Overhead 15000.0 700514.7 467.0
========
System Benchmarks Index Score 359.6
------------------------------------------------------------------------
Benchmark Run: Sun Mar 17 2019 22:28:13 - 22:56:25
4 CPUs in system; running 4 parallel copies of tests
Dhrystone 2 using register variables 17874778.4 lps (10.0 s, 7 samples)
Double-Precision Whetstone 4607.8 MWIPS (10.7 s, 7 samples)
Execl Throughput 2453.9 lps (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 232569.1 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 63347.4 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 548742.0 KBps (30.0 s, 2 samples)
Pipe Throughput 1179231.6 lps (10.0 s, 7 samples)
Pipe-based Context Switching 244505.3 lps (10.0 s, 7 samples)
Process Creation 5579.7 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 4760.6 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 650.9 lpm (60.1 s, 2 samples)
System Call Overhead 2555818.7 lps (10.0 s, 7 samples)
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 17874778.4 1531.7
Double-Precision Whetstone 55.0 4607.8 837.8
Execl Throughput 43.0 2453.9 570.7
File Copy 1024 bufsize 2000 maxblocks 3960.0 232569.1 587.3
File Copy 256 bufsize 500 maxblocks 1655.0 63347.4 382.8
File Copy 4096 bufsize 8000 maxblocks 5800.0 548742.0 946.1
Pipe Throughput 12440.0 1179231.6 947.9
Pipe-based Context Switching 4000.0 244505.3 611.3
Process Creation 126.0 5579.7 442.8
Shell Scripts (1 concurrent) 42.4 4760.6 1122.8
Shell Scripts (8 concurrent) 6.0 650.9 1084.8
System Call Overhead 15000.0 2555818.7 1703.9
========
System Benchmarks Index Score 812.6
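Floating-point (Double-Precision Whetstone, about +7% with 4 copies) and file copy improved, and System Call Overhead gained about +8%. Overall, though, the Index Score only went from 778.9 to 812.6 (about +4%) with 4 copies and barely moved with 1 copy (355.5 → 359.6), so my impression is "that's about it." I'm especially disappointed that I couldn't raise arm_freq. It may come down to unit-to-unit variation.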
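To check which tests actually benefited, the sketch below ranks the per-test INDEX changes between the default and overclocked 4-copy runs. The values are copied from the output above. `rank_gains` is an illustrative name.

```python
# INDEX values from the 4-parallel-copy runs: (default, overclocked)
INDEX_4_COPIES = {
    "Dhrystone 2 using register variables": (1505.2, 1531.7),
    "Double-Precision Whetstone": (781.8, 837.8),
    "Execl Throughput": (546.0, 570.7),
    "File Copy 1024 bufsize 2000 maxblocks": (571.6, 587.3),
    "File Copy 256 bufsize 500 maxblocks": (367.7, 382.8),
    "File Copy 4096 bufsize 8000 maxblocks": (896.9, 946.1),
    "Pipe Throughput": (909.6, 947.9),
    "Pipe-based Context Switching": (595.8, 611.3),
    "Process Creation": (421.2, 442.8),
    "Shell Scripts (1 concurrent)": (1080.7, 1122.8),
    "Shell Scripts (8 concurrent)": (1056.6, 1084.8),
    "System Call Overhead": (1580.4, 1703.9),
}


def rank_gains(pairs):
    """Sort tests by percentage gain, largest first."""
    gains = {
        name: (after - before) / before * 100
        for name, (before, after) in pairs.items()
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, gain in rank_gains(INDEX_4_COPIES):
        print(f"{gain:+5.1f}%  {name}")
```

Since arm_freq stayed at 1400, it is not surprising that the CPU-bound Dhrystone test moves the least.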