Comparison between Hikey960 and Hikey970

no_maddo · June 13, 2018, 8:03am

I have recently been comparing GPU performance between the Hikey960 and the Hikey970.
The results look strange to me. Any ideas?

Model	TinkerBoard	Hikey960	Hikey970
alexnet	135	31	31
googlenet	224	46	52
inception_v3	802	177	195
inception_v4	—	542	525
lenet	8	9	14
mobilenet	124	21	32
mobilenet_qasymm8	121	26	178
resnet50	546	96	112
resnext50	907	221	252
squeezenet	132	23	159
squeezenet_v1_1	70	19	110
vgg16	887	180	188
vgg19	1069	244	232

All test cases are the examples from https://github.com/ARM-software/ComputeLibrary. Every value is in milliseconds and is the minimum over 100 executions.
Hikey960 and Hikey970 ran on AOSP; TinkerBoard ran on TinkerOS using OpenCL.
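
For readers who want to reproduce the "minimum over 100 executions" style of measurement, here is a minimal Python sketch of the idea. It is not the harness used for the numbers above (those come from the Compute Library's own examples); the function name `min_time_ms` and the stand-in workload are assumptions made purely for illustration.

```python
import time


def min_time_ms(fn, runs=100):
    """Call fn() `runs` times and return the fastest wall-clock time in ms.

    Hypothetical helper: taking the minimum filters out scheduling noise and
    warm-up effects, but it says nothing about variance between runs.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would run one network inference here.
    print(f"{min_time_ms(lambda: sum(range(10_000))):.3f} ms")
```

Reporting the median alongside the minimum is a common way to see whether a board is throttling between runs.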

It makes sense to me that the Mali-G71 in the Hikey960 is much faster than the Mali-T764 in the TinkerBoard. But the Mali-G72 in the Hikey970 is equal to or slower than the Mali-G71 in the Hikey960.
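
To make the gap easier to see, the per-model slowdown can be computed directly from the table. The sketch below is only an illustration: the timings are copied from the Hikey960 and Hikey970 columns above, and the names `timings_ms` and `slowdown` are made up for this example.

```python
# Timings in ms copied from the table above: (Hikey960, Hikey970).
timings_ms = {
    "alexnet": (31, 31),
    "googlenet": (46, 52),
    "inception_v3": (177, 195),
    "inception_v4": (542, 525),
    "lenet": (9, 14),
    "mobilenet": (21, 32),
    "mobilenet_qasymm8": (26, 178),
    "resnet50": (96, 112),
    "resnext50": (221, 252),
    "squeezenet": (23, 159),
    "squeezenet_v1_1": (19, 110),
    "vgg16": (180, 188),
    "vgg19": (244, 232),
}


def slowdown(timings):
    """Return {model: Hikey970 time / Hikey960 time}; above 1 means Hikey970 is slower."""
    return {name: t970 / t960 for name, (t960, t970) in timings.items()}


ratios = slowdown(timings_ms)

# Print the models from largest to smallest slowdown.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {ratio:5.2f}x")
```

On these numbers, three models (squeezenet, mobilenet_qasymm8, squeezenet_v1_1) are roughly 5.8x to 6.9x slower on the Hikey970, while every other model stays within about 1.6x.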

It may just be a matter of software tuning.
I expected the Mali-G72 to be faster because of its shader core count.
Does anybody have an idea where these results come from?

psyhtest · July 12, 2018, 8:45am

Hi no_maddo, I have found a significant performance regression (and its cause!) with ArmCL v18.05 compared to ArmCL v18.03, especially for 1x1 convolutions used e.g. in SqueezeNet and MobileNets. As your results show a big slowdown between HiKey960 and HiKey970 especially for these models, I am wondering if it’s due to the different library versions deployed on your platforms?

no_maddo · July 13, 2018, 5:30am

Thanks. In my case, I only used v18.05.

jainrahul1 · October 3, 2018, 5:29am

@no_maddo Could you please share how you built the Compute Library on the Hikey970? Have you tried any TensorFlow models?