Micro-benchmarking iOS devices

Update: Added iPhone 5.
Update: Added iPhone 4s, iPad 3rd gen.
Update: Added iPhone 4, iPad 1st gen.

I follow the excellent weekly posts by Mike Ash, and entered a brief discussion in the comments about toll-free bridging, in particular the difference between calling a method via Objective-C (objc_msgSend) and its equivalent CoreFoundation C call. Mike suggested adding it to his original suite of tests, which led to the following results.

iPhone 5 (-mno-thumb)

Custom Apple A6 ARM Cortex A15, up to 1.2GHz

Name Iterations Total time (sec) Time per (ns)
IMP-cached message send 100000000 0.3 3.1
C++ virtual method call 100000000 0.3 3.3
Integer division 100000000 1.1 10.9
Objective-C message send 100000000 1.4 13.7
Float division with int conversion 10000000 0.2 24.7
Floating-point division 100000000 2.5 24.8
Objective-C objectAtIndex: 10000000 0.4 35.9
CF CFArrayGetValueAtIndex 10000000 0.5 50.9
16 byte memcpy 10000000 0.7 65.8
16 byte malloc/free 10000000 4.8 482.7
NSAutoreleasePool alloc/init/release 100000 0.1 533.4
NSObject alloc/init/release 100000 0.1 1169.0
NSInvocation message send 100000 0.1 1391.8
16MB malloc/free 1000 0.0 13331.8
Zero-second delayed perform 1000 0.1 99329.1
pthread create/join 100 0.0 120390.0
1MB memcpy 100 0.0 421517.1

iPhone 5 (thumb)

Custom Apple A6 ARM Cortex A15, up to 1.2GHz

Name Iterations Total time (sec) Time per (ns)
IMP-cached message send 100000000 0.3 3.1
C++ virtual method call 100000000 0.4 4.0
Integer division 100000000 1.1 10.8
Objective-C message send 100000000 1.4 13.6
Float division with int conversion 10000000 0.2 24.9
Floating-point division 100000000 2.6 26.4
Objective-C objectAtIndex: 10000000 0.4 35.6
CF CFArrayGetValueAtIndex 10000000 0.5 51.0
16 byte memcpy 10000000 0.7 65.8
16 byte malloc/free 10000000 4.7 474.3
NSAutoreleasePool alloc/init/release 100000 0.1 513.2
NSObject alloc/init/release 100000 0.1 1183.1
NSInvocation message send 100000 0.1 1241.5
16MB malloc/free 1000 0.0 12979.7
Zero-second delayed perform 1000 0.1 83574.5
pthread create/join 100 0.0 121289.2
1MB memcpy 100 0.0 426971.7

iPad 3 (thumb)

Apple A5x ARM Cortex A9 1000MHz

Name Iterations Total time (sec) Time per (ns)
IMP-cached message send 100000000 1.1 11.5
C++ virtual method call 100000000 1.3 12.7
Floating-point division 100000000 2.6 26.1
16 byte memcpy 10000000 0.3 26.2
Float division with int conversion 10000000 0.3 26.3
Integer division 100000000 2.9 28.7
Objective-C message send 100000000 3.6 35.5
Objective-C objectAtIndex: 10000000 0.7 69.0
CF CFArrayGetValueAtIndex 10000000 1.3 131.7
16 byte malloc/free 10000000 4.3 433.9
NSAutoreleasePool alloc/init/release 100000 0.1 600.2
NSObject alloc/init/release 100000 0.1 1235.4
NSInvocation message send 100000 0.3 2966.6
16MB malloc/free 1000 0.0 11633.0
Zero-second delayed perform 1000 0.1 121336.0
pthread create/join 100 0.0 130293.3
1MB memcpy 100 0.2 1662780.4

iPad 3 (-mno-thumb)

Apple A5x ARM Cortex A9 1000MHz

Name Iterations Total time (sec) Time per (ns)
IMP-cached message send 100000000 0.9 9.0
C++ virtual method call 100000000 1.1 11.1
Integer division 100000000 2.4 24.1
Floating-point division 100000000 2.6 26.1
Float division with int conversion 10000000 0.3 26.1
16 byte memcpy 10000000 0.3 27.1
Objective-C message send 100000000 2.7 27.2
Objective-C objectAtIndex: 10000000 0.7 68.4
CF CFArrayGetValueAtIndex 10000000 1.0 103.3
16 byte malloc/free 10000000 4.3 432.5
NSAutoreleasePool alloc/init/release 100000 0.1 570.4
NSObject alloc/init/release 100000 0.1 1209.9
NSInvocation message send 100000 0.2 1682.2
16MB malloc/free 1000 0.0 10251.2
pthread create/join 100 0.0 118494.2
Zero-second delayed perform 1000 0.1 121578.2
1MB memcpy 100 0.2 1635983.3

iPhone 4s (thumb)

Apple A5 ARM Cortex A9 ~800MHz

Name Iterations Total time (sec) Time per (ns)
C++ virtual method call 100000000 1.1 11.3
IMP-cached message send 100000000 1.3 12.6
Integer division 100000000 3.1 31.4
Float division with int conversion 10000000 0.3 32.6
Floating-point division 100000000 3.3 32.6
16 byte memcpy 10000000 0.3 32.6
Objective-C message send 100000000 3.4 33.8
Objective-C objectAtIndex: 10000000 0.9 85.9
CF CFArrayGetValueAtIndex 10000000 1.6 165.0
16 byte malloc/free 10000000 5.4 542.2
NSAutoreleasePool alloc/init/release 100000 0.1 753.2
NSObject alloc/init/release 100000 0.2 1511.7
NSInvocation message send 100000 0.2 2111.9
16MB malloc/free 1000 0.0 19033.7
pthread create/join 100 0.0 142817.5
Zero-second delayed perform 1000 0.1 146302.7
1MB memcpy 100 0.2 1787482.1

iPhone 4 (-mthumb)

Apple A4 ARM Cortex A8 ~800MHz

Name Iterations Total time (sec) Time per (ns)
IMP-cached message send 100000000 0.9 9.0
C++ virtual method call 100000000 1.0 10.2
16 byte memcpy 10000000 0.4 36.2
Integer division 100000000 4.1 40.6
Objective-C message send 100000000 4.1 40.8
Floating-point division 10000000 0.9 89.4
Objective-C objectAtIndex: 10000000 1.1 105.8
Float division with int conversion 10000000 1.1 105.8
CF CFArrayGetValueAtIndex 10000000 1.7 168.1
NSInvocation message send 100000 0.1 550.8
16 byte malloc/free 10000000 6.6 656.3
NSAutoreleasePool alloc/init/release 100000 0.1 979.5
NSObject alloc/init/release 100000 0.4 4277.9
16MB malloc/free 1000 0.0 20406.7
pthread create/join 100 0.0 139971.2
Zero-second delayed perform 1000 0.2 243883.3
1MB memcpy 100 0.1 1150657.9

iPhone 3GS (-mthumb)

ARM Cortex A8 ~600MHz / 1.66 ns per cycle

Name Iterations Total time (sec) Time per (ns)
IMP-cached message send 100000000 1.2 11.7
C++ virtual method call 100000000 1.3 13.5
16 byte memcpy 10000000 0.5 46.0
Objective-C message send 100000000 5.4 53.9
Integer division 100000000 6.3 62.9
Floating-point division 10000000 1.2 117.4
Float division with int conversion 10000000 1.4 138.2
Objective-C objectAtIndex: 10000000 1.4 140.1
CF CFArrayGetValueAtIndex 10000000 2.2 220.0
16 byte malloc/free 10000000 6.4 642.6
NSInvocation message send 100000 0.1 723.0
NSAutoreleasePool alloc/init/release 100000 0.1 1305.9
NSObject alloc/init/release 100000 0.6 5743.7
16MB malloc/free 1000 0.0 16104.0
pthread create/join 100 0.0 185759.2
Zero-second delayed perform 1000 0.4 353519.4
1MB memcpy 100 0.2 2170179.2

iPhone 3GS (no thumb)

ARM Cortex A8 ~600MHz / 1.66 ns per cycle

Name Iterations Total time (sec) Time per (ns)
IMP-cached message send 100000000 1.2 11.8
C++ virtual method call 100000000 4.3 42.9
Objective-C message send 100000000 5.9 59.2
CF CFArrayGetValueAtIndex 10000000 1.0 97.9
Integer division 100000000 9.8 98.4
16 byte memcpy 10000000 1.1 109.3
Floating-point division 10000000 1.2 118.5
Objective-C objectAtIndex: 10000000 1.3 129.0
Float division with int conversion 10000000 1.4 142.6
16 byte malloc/free 10000000 7.5 748.6
NSInvocation message send 100000 0.1 806.0
NSObject alloc/init/release 100000 0.5 4793.1
NSAutoreleasePool alloc/init/release 100000 0.5 4953.1
16MB malloc/free 1000 0.0 17969.2
Zero-second delayed perform 1000 0.2 211840.4
pthread create/join 100 0.0 214742.5
1MB memcpy 100 0.3 3162774.6

iPhone 3G

ARM1176 ~412MHz / 2.4ns per cycle

Name Iterations Total time (sec) Time per (ns)
IMP-cached message send 100000000 3.9 38.6
C++ virtual method call 100000000 5.0 49.9
Floating-point division 10000000 0.8 81.3
Float division with int conversion 10000000 0.8 81.4
16 byte memcpy 10000000 1.4 136.0
Objective-C message send 100000000 14.9 148.6
Integer division 100000000 16.2 162.2
CF CFArrayGetValueAtIndex 10000000 2.0 201.7
Objective-C objectAtIndex: 10000000 4.2 418.3
NSInvocation message send 100000 0.2 1833.2
16 byte malloc/free 10000000 27.3 2729.8
NSObject alloc/init/release 100000 1.4 14179.1
NSAutoreleasePool alloc/init/release 100000 1.9 18956.7
16MB malloc/free 1000 0.0 47811.3
Zero-second delayed perform 1000 0.8 803419.3
pthread create/join 100 0.1 1085830.0
1MB memcpy 100 1.0 9902796.7

iPad (-mthumb)

Name Iterations Total time (sec) Time per (ns)
IMP-cached message send 100000000 0.7 7.1
C++ virtual method call 100000000 0.8 8.1
16 byte memcpy 10000000 0.3 27.7
Objective-C message send 100000000 3.2 32.3
Integer division 100000000 3.4 33.7
CF CFArrayGetValueAtIndex 10000000 0.6 58.8
Floating-point division 10000000 0.7 70.5
Objective-C objectAtIndex: 10000000 0.8 81.6
Float division with int conversion 10000000 0.8 83.1
16 byte malloc/free 10000000 3.6 357.8
NSInvocation message send 100000 0.0 470.8
NSAutoreleasePool alloc/init/release 100000 0.3 2957.0
NSObject alloc/init/release 100000 0.3 3080.2
16MB malloc/free 1000 0.0 14824.2
pthread create/join 100 0.0 127386.2
Zero-second delayed perform 1000 0.2 225271.3
1MB memcpy 100 0.1 1064566.2

iPad (-mno-thumb)

Apple A4 ARM Cortex A8 ~1GHz / 1 ns per cycle

Name Iterations Total time (sec) Time per (ns)
IMP-cached message send 100000000 0.8 8.1
C++ virtual method call 100000000 2.2 21.8
16 byte memcpy 10000000 0.3 28.2
Objective-C message send 100000000 3.2 32.5
Integer division 100000000 3.4 33.9
CF CFArrayGetValueAtIndex 10000000 0.6 55.8
Floating-point division 10000000 0.7 70.9
Objective-C objectAtIndex: 10000000 0.8 81.6
Float division with int conversion 10000000 0.8 82.8
16 byte malloc/free 10000000 3.6 358.3
NSInvocation message send 100000 0.0 473.4
NSAutoreleasePool alloc/init/release 100000 0.3 3017.6
NSObject alloc/init/release 100000 0.3 3071.8
16MB malloc/free 1000 0.0 14623.6
pthread create/join 100 0.0 128674.6
Zero-second delayed perform 1000 0.3 255627.5
1MB memcpy 100 0.1 1063407.5
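
The "ns per cycle" figures listed with the iPhone 3GS, iPhone 3G and iPad can be used to turn a per-iteration time into a rough cycle count. The following is a minimal sketch of that arithmetic in C, not part of the test suite; the helper names are hypothetical, and because the clock speeds are nominal ("~600MHz") the results are estimates only.

```c
#include <assert.h>
#include <stdio.h>

/* Nanoseconds per clock cycle for a nominal clock speed given in MHz. */
static double ns_per_cycle(double clock_mhz) {
    assert(clock_mhz > 0.0);
    return 1000.0 / clock_mhz;
}

/* Rough cycle count for an operation that takes time_ns at that clock. */
static double approx_cycles(double time_ns, double clock_mhz) {
    return time_ns / ns_per_cycle(clock_mhz);
}

/* Print one row: operation name, measured time, estimated cycles. */
static void print_cycles(const char *name, double time_ns, double clock_mhz) {
    printf("%-28s %8.1f ns  ~%.1f cycles\n", name, time_ns,
           approx_cycles(time_ns, clock_mhz));
}
```

Calling `print_cycles("IMP-cached message send", 11.8, 600.0)` with the iPhone 3GS (no thumb) figure works out to roughly 7 cycles per call. The measured time includes loop overhead, so this is an upper bound on the cost of the operation itself.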

Note that I reduced the iteration counts from the original tests, so whilst the total times are significantly lower, the per-iteration times still reflect overall performance. Compared to Mike’s results, these show that the IMP-cached send is indeed faster, as expected, but only after I switched to a release build. I also compiled these with Thumb disabled unless otherwise specified.

I’ve recently watched some iTunes U videos released by Apple on optimizing OpenGL ES 2.0, and a key takeaway was that code for the Cortex A8 architecture should always be compiled with Thumb enabled. The Cortex CPU uses the newer Thumb-2 instruction set, which has native instructions for floating point. The benefit of Thumb is reduced code size and potentially better performance through better use of the I-cache.
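
To make the relationship between the table columns concrete, here is a minimal sketch in C of how a harness of this kind can derive "Time per (ns)" from the total time and the iteration count. It is not the code from the test suite: every name in it (`bench_result`, `run_bench`, `integer_division` and so on) is hypothetical, and it assumes a POSIX system with `clock_gettime` rather than whatever timer the original suite uses. The `volatile` variables are there because an optimised build may otherwise remove the loop entirely, which matters when comparing debug and release results.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime under strict C modes */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* One row of a results table. */
typedef struct {
    const char *name;
    uint64_t iterations;
    double total_sec;   /* "Total time (sec)" */
    double ns_per_iter; /* "Time per (ns)" */
} bench_result;

/* A test body runs its operation `iterations` times in a loop. */
typedef void (*bench_fn)(uint64_t iterations);

/* Current time in nanoseconds from a monotonic clock. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time one test and derive the per-iteration cost from the total. */
static bench_result run_bench(const char *name, bench_fn fn, uint64_t iterations) {
    assert(iterations > 0);
    uint64_t start = now_ns();
    fn(iterations);
    uint64_t elapsed = now_ns() - start;

    bench_result r;
    r.name = name;
    r.iterations = iterations;
    r.total_sec = (double)elapsed / 1e9;
    r.ns_per_iter = (double)elapsed / (double)iterations;
    return r;
}

/* Written to after each test so the optimiser cannot discard the loop. */
static volatile uint32_t sink;

/* Stand-in test body: repeated integer division. */
static void integer_division(uint64_t iterations) {
    volatile uint32_t divisor = 3; /* volatile: keep a real divide in the loop */
    uint32_t x = 1000000007u;
    for (uint64_t i = 0; i < iterations; i++) {
        x = x / divisor + 1000000007u;
    }
    sink = x;
}

/* Print a row in the same column order as the tables above. */
static void print_result(const bench_result *r) {
    printf("%s %llu %.1f %.1f\n", r->name,
           (unsigned long long)r->iterations, r->total_sec, r->ns_per_iter);
}
```

A caller would run something like `bench_result r = run_bench("Integer division", integer_division, 100000000);` and pass `&r` to `print_result`. Because the per-iteration time is just the total divided by the count, reducing the iterations shortens the run without changing what the last column measures, as long as the count stays large enough to swamp timer resolution.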

Observations

Source code for this test is available here.