Perspective: Relative computing power
A brief exploration of how far computing has come by comparing a Cray 1 supercomputer with an iPhone 6 ...
Update: Added iPhone 5.
Update: Added iPhone 4s, iPad 3rd gen.
Update: Added iPhone 4, iPad 1st gen.

I follow the excellent weekly posts by Mike Ash, and entered a brief discussion in the comments about toll-free bridging, in particular the difference between calling a method via Objective-C (objc_msgSend) and its equivalent CoreFoundation C call. Mike suggested adding it to his original suite of tests, which led to the following results.

iPhone 5 (-mno-thumb): Custom Apple A6 ARM Cortex A15, up to 1.2GHz

| Name | Iterations | Total time (sec) | Time per (ns) |
|---|---|---|---|
| IMP-cached message send | 100000000 | 0.3 | 3.1 |
| C++ virtual method call | 100000000 | 0.3 | 3.3 |
| Integer division | 100000000 | 1.1 | 10.9 |
| Objective-C message send | 100000000 | 1.4 | 13.7 |
| Float division with int conversion | 10000000 | 0.2 | 24.7 |
| Floating-point division | 100000000 | 2.5 | 24.8 |
| Objective-C objectAtIndex: | 10000000 | 0.4 | 35.9 |
| CF CFArrayGetValueAtIndex | 10000000 | 0.5 | 50.9 |
| 16 byte memcpy | 10000000 | 0.7 | 65.8 |
| 16 byte malloc/free | 10000000 | 4.8 | 482.7 |
| NSAutoreleasePool alloc/init/release | 100000 | 0.1 | 533.4 |
| NSObject alloc/init/release | 100000 | 0.1 | 1169.0 |
| NSInvocation message send | 100000 | 0.1 | 1391.8 |
| 16MB malloc/free | 1000 | 0.0 | 13331.8 |
| Zero-second delayed perform | 1000 | 0.1 | 99329.1 |
| pthread create/join | 100 | 0.0 | 120390.0 |
| 1MB memcpy | 100 | 0.0 | 421517.1 |

iPhone 5 (thumb): Custom Apple A6 ARM Cortex A15, up to 1.2GHz

| Name | Iterations | Total time (sec) | Time per (ns) |
|---|---|---|---|
| IMP-cached message send | 100000000 | 0.3 | 3.1 |
| C++ virtual method call | 100000000 | 0.4 | 4.0 |
| Integer division | 100000000 | 1.1 | 10.8 |
| Objective-C message send | 100000000 | 1.4 | 13.6 |
| Float division with int conversion | 10000000 | 0.2 | 24.9 |
| Floating-point division | 100000000 | 2.6 | 26.4 |
| Objective-C objectAtIndex: | 10000000 | 0.4 | 35.6 |
| CF CFArrayGetValueAtIndex | 10000000 | 0.5 | 51.0 |
| 16 byte memcpy | 10000000 | 0.7 | 65.8 |
| 16 byte malloc/free | 10000000 | 4.7 | 474.3 |
| NSAutoreleasePool alloc/init/release | 100000 | 0.1 | 513.2 |
| NSObject alloc/init/release | 100000 | 0.1 | 1183.1 |
| NSInvocation message send | 100000 | 0.1 | 1241.5 |
| 16MB malloc/free | 1000 | 0.0 | 12979.7 |
| Zero-second delayed perform | 1000 | 0.1 | 83574.5 |
| pthread create/join | 100 | 0.0 | 121289.2 |
| 1MB memcpy | 100 | 0.0 | 426971.7 |

iPad 3 (thumb): Apple A5x ARM Cortex A9 1000MHz

| Name | Iterations | Total time (sec) | Time per (ns) |
|---|---|---|---|
| IMP-cached message send | 100000000 | 1.1 | 11.5 |
| C++ virtual method call | 100000000 | 1.3 | 12.7 |
| Floating-point division | 100000000 | 2.6 | 26.1 |
| 16 byte memcpy | 10000000 | 0.3 | 26.2 |
| Float division with int conversion | 10000000 | 0.3 | 26.3 |
| Integer division | 100000000 | 2.9 | 28.7 |
| Objective-C message send | 100000000 | 3.6 | 35.5 |
| Objective-C objectAtIndex: | 10000000 | 0.7 | 69.0 |
| CF CFArrayGetValueAtIndex | 10000000 | 1.3 | 131.7 |
| 16 byte malloc/free | 10000000 | 4.3 | 433.9 |
| NSAutoreleasePool alloc/init/release | 100000 | 0.1 | 600.2 |
| NSObject alloc/init/release | 100000 | 0.1 | 1235.4 |
| NSInvocation message send | 100000 | 0.3 | 2966.6 |
| 16MB malloc/free | 1000 | 0.0 | 11633.0 |
| Zero-second delayed perform | 1000 | 0.1 | 121336.0 |
| pthread create/join | 100 | 0.0 | 130293.3 |
| 1MB memcpy | 100 | 0.2 | 1662780.4 |

iPad 3 (-mno-thumb): Apple A5x ARM Cortex A9 1000MHz

| Name | Iterations | Total time (sec) | Time per (ns) |
|---|---|---|---|
| IMP-cached message send | 100000000 | 0.9 | 9.0 |
| C++ virtual method call | 100000000 | 1.1 | 11.1 |
| Integer division | 100000000 | 2.4 | 24.1 |
| Floating-point division | 100000000 | 2.6 | 26.1 |
| Float division with int conversion | 10000000 | 0.3 | 26.1 |
| 16 byte memcpy | 10000000 | 0.3 | 27.1 |
| Objective-C message send | 100000000 | 2.7 | 27.2 |
| Objective-C objectAtIndex: | 10000000 | 0.7 | 68.4 |
| CF CFArrayGetValueAtIndex | 10000000 | 1.0 | 103.3 |
| 16 byte malloc/free | 10000000 | 4.3 | 432.5 |
| NSAutoreleasePool alloc/init/release | 100000 | 0.1 | 570.4 |
| NSObject alloc/init/release | 100000 | 0.1 | 1209.9 |
| NSInvocation message send | 100000 | 0.2 | 1682.2 |
| 16MB malloc/free | 1000 | 0.0 | 10251.2 |
| pthread create/join | 100 | 0.0 | 118494.2 |
| Zero-second delayed perform | 1000 | 0.1 | 121578.2 |
| 1MB memcpy | 100 | 0.2 | 1635983.3 |

iPhone 4s (thumb): Apple A5 ARM Cortex A9 ~800MHz

| Name | Iterations | Total time (sec) | Time per (ns) |
|---|---|---|---|
| C++ virtual method call | 100000000 | 1.1 | 11.3 |
| IMP-cached message send | 100000000 | 1.3 | 12.6 |
| Integer division | 100000000 | 3.1 | 31.4 |
| Float division with int conversion | 10000000 | 0.3 | 32.6 |
| Floating-point division | 100000000 | 3.3 | 32.6 |
| 16 byte memcpy | 10000000 | 0.3 | 32.6 |
| Objective-C message send | 100000000 | 3.4 | 33.8 |
| Objective-C objectAtIndex: | 10000000 | 0.9 | 85.9 |
| CF CFArrayGetValueAtIndex | 10000000 | 1.6 | 165.0 |
| 16 byte malloc/free | 10000000 | 5.4 | 542.2 |
| NSAutoreleasePool alloc/init/release | 100000 | 0.1 | 753.2 |
| NSObject alloc/init/release | 100000 | 0.2 | 1511.7 |
| NSInvocation message send | 100000 | 0.2 | 2111.9 |
| 16MB malloc/free | 1000 | 0.0 | 19033.7 |
| pthread create/join | 100 | 0.0 | 142817.5 |
| Zero-second delayed perform | 1000 | 0.1 | 146302.7 |
| 1MB memcpy | 100 | 0.2 | 1787482.1 |

iPhone 4 (-fthumb): Apple A4 ARM Cortex A8 ~800MHz

| Name | Iterations | Total time (sec) | Time per (ns) |
|---|---|---|---|
| IMP-cached message send | 100000000 | 0.9 | 9.0 |
| C++ virtual method call | 100000000 | 1.0 | 10.2 |
| 16 byte memcpy | 10000000 | 0.4 | 36.2 |
| Integer division | 100000000 | 4.1 | 40.6 |
| Objective-C message send | 100000000 | 4.1 | 40.8 |
| Floating-point division | 10000000 | 0.9 | 89.4 |
| Objective-C objectAtIndex: | 10000000 | 1.1 | 105.8 |
| Float division with int conversion | 10000000 | 1.1 | 105.8 |
| CF CFArrayGetValueAtIndex | 10000000 | 1.7 | 168.1 |
| NSInvocation message send | 100000 | 0.1 | 550.8 |
| 16 byte malloc/free | 10000000 | 6.6 | 656.3 |
| NSAutoreleasePool alloc/init/release | 100000 | 0.1 | 979.5 |
| NSObject alloc/init/release | 100000 | 0.4 | 4277.9 |
| 16MB malloc/free | 1000 | 0.0 | 20406.7 |
| pthread create/join | 100 | 0.0 | 139971.2 |
| Zero-second delayed perform | 1000 | 0.2 | 243883.3 |
| 1MB memcpy | 100 | 0.1 | 1150657.9 |

iPhone 3GS (-fthumb): ARM Cortex A8 ~600MHz / 1.66 ns per cycle

| Name | Iterations | Total time (sec) | Time per (ns) |
|---|---|---|---|
| IMP-cached message send | 100000000 | 1.2 | 11.7 |
| C++ virtual method call | 100000000 | 1.3 | 13.5 |
| 16 byte memcpy | 10000000 | 0.5 | 46.0 |
| Objective-C message send | 100000000 | 5.4 | 53.9 |
| Integer division | 100000000 | 6.3 | 62.9 |
| Floating-point division | 10000000 | 1.2 | 117.4 |
| Float division with int conversion | 10000000 | 1.4 | 138.2 |
| Objective-C objectAtIndex: | 10000000 | 1.4 | 140.1 |
| CF CFArrayGetValueAtIndex | 10000000 | 2.2 | 220.0 |
| 16 byte malloc/free | 10000000 | 6.4 | 642.6 |
| NSInvocation message send | 100000 | 0.1 | 723.0 |
| NSAutoreleasePool alloc/init/release | 100000 | 0.1 | 1305.9 |
| NSObject alloc/init/release | 100000 | 0.6 | 5743.7 |
| 16MB malloc/free | 1000 | 0.0 | 16104.0 |
| pthread create/join | 100 | 0.0 | 185759.2 |
| Zero-second delayed perform | 1000 | 0.4 | 353519.4 |
| 1MB memcpy | 100 | 0.2 | 2170179.2 |

iPhone 3GS (no thumb): ARM Cortex A8 ~600MHz / 1.66 ns per cycle

| Name | Iterations | Total time (sec) | Time per (ns) |
|---|---|---|---|
| IMP-cached message send | 100000000 | 1.2 | 11.8 |
| C++ virtual method call | 100000000 | 4.3 | 42.9 |
| Objective-C message send | 100000000 | 5.9 | 59.2 |
| CF CFArrayGetValueAtIndex | 10000000 | 1.0 | 97.9 |
| Integer division | 100000000 | 9.8 | 98.4 |
| 16 byte memcpy | 10000000 | 1.1 | 109.3 |
| Floating-point division | 10000000 | 1.2 | 118.5 |
| Objective-C objectAtIndex: | 10000000 | 1.3 | 129.0 |
| Float division with int conversion | 10000000 | 1.4 | 142.6 |
| 16 byte malloc/free | 10000000 | 7.5 | 748.6 |
| NSInvocation message send | 100000 | 0.1 | 806.0 |
| NSObject alloc/init/release | 100000 | 0.5 | 4793.1 |
| NSAutoreleasePool alloc/init/release | 100000 | 0.5 | 4953.1 |
| 16MB malloc/free | 1000 | 0.0 | 17969.2 |
| Zero-second delayed perform | 1000 | 0.2 | 211840.4 |
| pthread create/join | 100 | 0.0 | 214742.5 |
| 1MB memcpy | 100 | 0.3 | 3162774.6 |

iPhone 3G: ARM1176 ~412MHz / 2.4ns per cycle

| Name | Iterations | Total time (sec) | Time per (ns) |
|---|---|---|---|
| IMP-cached message send | 100000000 | 3.9 | 38.6 |
| C++ virtual method call | 100000000 | 5.0 | 49.9 |
| Floating-point division | 10000000 | 0.8 | 81.3 |
| Float division with int conversion | 10000000 | 0.8 | 81.4 |
| 16 byte memcpy | 10000000 | 1.4 | 136.0 |
| Objective-C message send | 100000000 | 14.9 | 148.6 |
| Integer division | 100000000 | 16.2 | 162.2 |
| CF CFArrayGetValueAtIndex | 10000000 | 2.0 | 201.7 |
| Objective-C objectAtIndex: | 10000000 | 4.2 | 418.3 |
| NSInvocation message send | 100000 | 0.2 | 1833.2 |
| 16 byte malloc/free | 10000000 | 27.3 | 2729.8 |
| NSObject alloc/init/release | 100000 | 1.4 | 14179.1 |
| NSAutoreleasePool alloc/init/release | 100000 | 1.9 | 18956.7 |
| 16MB malloc/free | 1000 | 0.0 | 47811.3 |
| Zero-second delayed perform | 1000 | 0.8 | 803419.3 |
| pthread create/join | 100 | 0.1 | 1085830.0 |
| 1MB memcpy | 100 | 1.0 | 9902796.7 |

iPad (-fthumb)

| Name | Iterations | Total time (sec) | Time per (ns) |
|---|---|---|---|
| IMP-cached message send | 100000000 | 0.7 | 7.1 |
| C++ virtual method call | 100000000 | 0.8 | 8.1 |
| 16 byte memcpy | 10000000 | 0.3 | 27.7 |
| Objective-C message send | 100000000 | 3.2 | 32.3 |
| Integer division | 100000000 | 3.4 | 33.7 |
| CF CFArrayGetValueAtIndex | 10000000 | 0.6 | 58.8 |
| Floating-point division | 10000000 | 0.7 | 70.5 |
| Objective-C objectAtIndex: | 10000000 | 0.8 | 81.6 |
| Float division with int conversion | 10000000 | 0.8 | 83.1 |
| 16 byte malloc/free | 10000000 | 3.6 | 357.8 |
| NSInvocation message send | 100000 | 0.0 | 470.8 |
| NSAutoreleasePool alloc/init/release | 100000 | 0.3 | 2957.0 |
| NSObject alloc/init/release | 100000 | 0.3 | 3080.2 |
| 16MB malloc/free | 1000 | 0.0 | 14824.2 |
| pthread create/join | 100 | 0.0 | 127386.2 |
| Zero-second delayed perform | 1000 | 0.2 | 225271.3 |
| 1MB memcpy | 100 | 0.1 | 1064566.2 |

iPad (-mno-thumb): Apple A4 ARM Cortex A8 ~1GHz / 1 ns per cycle

| Name | Iterations | Total time (sec) | Time per (ns) |
|---|---|---|---|
| IMP-cached message send | 100000000 | 0.8 | 8.1 |
| C++ virtual method call | 100000000 | 2.2 | 21.8 |
| 16 byte memcpy | 10000000 | 0.3 | 28.2 |
| Objective-C message send | 100000000 | 3.2 | 32.5 |
| Integer division | 100000000 | 3.4 | 33.9 |
| CF CFArrayGetValueAtIndex | 10000000 | 0.6 | 55.8 |
| Floating-point division | 10000000 | 0.7 | 70.9 |
| Objective-C objectAtIndex: | 10000000 | 0.8 | 81.6 |
| Float division with int conversion | 10000000 | 0.8 | 82.8 |
| 16 byte malloc/free | 10000000 | 3.6 | 358.3 |
| NSInvocation message send | 100000 | 0.0 | 473.4 |
| NSAutoreleasePool alloc/init/release | 100000 | 0.3 | 3017.6 |
| NSObject alloc/init/release | 100000 | 0.3 | 3071.8 |
| 16MB malloc/free | 1000 | 0.0 | 14623.6 |
| pthread create/join | 100 | 0.0 | 128674.6 |
| Zero-second delayed perform | 1000 | 0.3 | 255627.5 |
| 1MB memcpy | 100 | 0.1 | 1063407.5 |

Note that I reduced the iteration counts from the original tests, so whilst the total times are significantly lower, the per-iteration times still reflect overall performance. Compared to Mike’s results, these show that the IMP-cached message send is indeed faster, as expected, but only after I switched to a release build. I also compiled these with Thumb disabled unless otherwise specified. I’ve recently watched some iTunes U videos released by Apple on optimizing OpenGL ES 2.0, and a key takeaway was that code for the Cortex A8 architecture should always be compiled with Thumb enabled.
The Cortex A8 uses the newer Thumb-2 instruction set, which has native instructions for floating point. The benefit of Thumb is reduced code size and potentially better performance through better use of the I-cache. ...
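To put the per-iteration figures in context, it can help to convert them into approximate clock cycles using the clock speeds quoted with each device. The following is a minimal sketch in Python, not part of Mike's test suite: the function names are hypothetical, the sample timings are copied from the iPhone 3GS (-fthumb) results above, and the ~600MHz clock is the approximate figure given there, so the cycle counts are rough estimates only.

```python
def ns_per_iteration(total_seconds: float, iterations: int) -> float:
    """Average time per iteration in nanoseconds (the 'Time per (ns)' column)."""
    return total_seconds * 1e9 / iterations


def cycle_time_ns(clock_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds for a clock given in MHz."""
    return 1000.0 / clock_mhz


def ns_to_cycles(time_ns: float, clock_mhz: float) -> float:
    """Approximate number of clock cycles that fit in time_ns."""
    return time_ns / cycle_time_ns(clock_mhz)


# Assumption: ~600MHz, as quoted for the iPhone 3GS in the results.
CLOCK_MHZ_3GS = 600.0

# Per-iteration timings (ns) copied from the iPhone 3GS (-fthumb) results.
timings_ns = {
    "IMP-cached message send": 11.7,
    "Objective-C message send": 53.9,
    "Objective-C objectAtIndex:": 140.1,
    "CF CFArrayGetValueAtIndex": 220.0,
}

for name, ns in timings_ns.items():
    cycles = ns_to_cycles(ns, CLOCK_MHZ_3GS)
    print(f"{name:28s} {ns:7.1f} ns  ~{cycles:6.1f} cycles")
```

Because the published total times are rounded to one decimal place, recomputing the per-iteration time from them only approximates the last column; the cycle estimates likewise ignore frequency scaling and cache effects.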