How SomeGadgetGuy Tests Phones with Real-World Apps for Benchmarks and Reviews

How powerful is your phone?

If you are a gadget fan, you can probably tell me what’s on the spec sheet. You probably know what processor is inside and how much RAM it has. You might be able to tell me what GPU is in there. If you’re REALLY techie, maybe you can rattle off some synthetic benchmark scores.

But what does that mean?

Knowing your phone has a Qualcomm Snapdragon, Samsung Exynos, or an Apple A-series Bionic doesn’t tell you much about what your phone can REALLY do. Synthetic benchmarks are easy to manipulate, and gimmicky app “speed runs” are poor indicators of overall performance.

The only proper solution is to test devices using real apps that any consumer can install, and compare the results of consistent testing over several generations of gadgets. While I’ve produced benchmarking videos, I felt it was worth explaining my testing protocol in one handy blog post, for any folks who might be interested.

I’ve been running my phones like this for years now. Here’s how I do it!

Synthetic Benchmark – Geekbench

I do like getting a baseline for how a phone might perform, and for a simple number score, I prefer the tests and layout of Geekbench: single-core, multi-core, and GPU testing.

It’s not as comprehensive as some dedicated graphics and storage benchmarks, but it’s a starting point for our conversation. It’s also handy for spotting when a phone “juices” a synthetic bench but then fails to deliver that tier of performance to consumers.

Video Rendering – LumaFusion

For the first real-world test, I can think of few applications more taxing than 4K video rendering.

I take six clips from my Panasonic mirrorless camera (100Mbps UHD MP4 at 30fps) and edit each down to a 10-second snippet. Those six snippets are stitched together into a one-minute timeline with a one-second fade transition between each clip. Then I add a simple text watermark on top of the whole project, along with an MP3 soundtrack.

That one-minute project is rendered in UHD at 30fps, at a good “YouTube bitrate” of around 50Mbps. I run a similar version of this test on a proper desktop workstation using DaVinci Resolve Studio as a comparison point. LumaFusion allows for project exporting, so every test is based on the exact same project files and edits. When comparing against the desktop, I’ve tried to match edits and transitions as closely as I can, within a couple of frames.

This test looks at a complex video project, something a content producer might create while out in the field.

Scores are compared by timing how long each app takes to finish the project render. Generally, phones are “higher performance” if they can finish the render faster than the video would take to play.
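If you want to turn that comparison into a single number, here’s a minimal sketch (in Python) of the “faster than real time” idea; the function name and sample figures are mine for illustration, not part of the testing protocol. The same ratio applies to the Transcode and Trim tests below.

```python
# Minimal sketch: express a render result as a "real time ratio".
# A ratio above 1.0 means the phone rendered faster than the video plays.
# Names and numbers are illustrative, not part of the testing protocol.

def realtime_ratio(timeline_seconds: float, render_seconds: float) -> float:
    """Seconds of finished video produced per second of render work."""
    return timeline_seconds / render_seconds

# Example: a one-minute (60s) timeline rendered in 45 seconds.
ratio = realtime_ratio(timeline_seconds=60, render_seconds=45)
print(f"Render speed: {ratio:.2f}x real time")  # ~1.33x, faster than playback
```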

Video Transcode – LumaFusion

While the Video Render test involves a more complex project aimed at content creators, the Video Transcode test is a simpler test focused on file compression.

The Transcode test uses a two-minute-thirty-second clip from my Panasonic mirrorless camera (100Mbps UHD MP4 at 30fps), and we simply re-encode the video at half the bitrate. There are no additional clips, filters, or elements.

Scores are compared by timing how long the re-encode takes to finish. Generally, phones are “higher performance” if they can finish the render faster than the video would take to play.

Video Trim Test – LumaFusion / Google Photos

While the Video Render Test is focused on content creators, the Video Trim Test is a more “consumer” look at performance. I take a three-minute UHD video (100Mbps bitrate), cut it down to a two-minute video, and time how long it takes to save the new video file.

These results are compared between LumaFusion (professional mobile video editing software) and Google Photos, which is often the default photo gallery app on Android phones. Generally, phones are “higher performance” if they can finish the render faster than the video would take to play.

It’s worth noting that the transcode, trim, and render tests utilize camera footage from a “third party” to avoid any preferential treatment. A phone might have an easier time editing footage from its own camera. Often, scores improve when you use footage taken directly from the phone’s camera, but that’s difficult to account for, and to keep consistent, across all phones.

Also, for creators, there may be times when they want to shoot on one camera and edit on a phone, or mix footage from phones and cameras into a single project. Using the same footage from a “regular” camera has proven to be the fairest way to compare performance across phones, operating systems, and platforms.


Podcast Render – Audio Evolution

The Podcast Test takes two audio tracks, simulating a one-hour interview. I add music intro and outro bumpers, and two “ad breaks” during the hour-long test file. The bumpers and ads are 320Kbps MP3s. The conversation tracks are 16-bit, 48kHz WAV files. The entire project is then mixed down as a 128Kbps MP3 file.

At present, and unlike video editing, there is no audio editing app available with both an Android and an iOS version.

Scores are compared by timing the completion of the hour-long podcast mixdown. Generally, a phone is “high performance” if it can finish the task faster than 1% every two seconds (or roughly three minutes and twenty seconds to finish an hour-long project render).
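For anyone who wants to check my math, here’s the quick arithmetic behind that pace, sketched in Python; the names are mine, the threshold comes from the test above.

```python
# The podcast pace threshold: 1% of the mixdown every 2 seconds.
# At that pace, the full 100% takes 200 seconds (3 minutes, 20 seconds).

PACE_PERCENT_PER_SECOND = 1 / 2  # 0.5% of the project per second

def meets_podcast_pace(mixdown_seconds: float) -> bool:
    """True if the hour-long mixdown beat the 1%-per-2-seconds pace."""
    threshold_seconds = 100 / PACE_PERCENT_PER_SECOND  # 200 seconds
    return mixdown_seconds < threshold_seconds

print(meets_podcast_pace(185))  # True: finished in 3:05, under the 3:20 bar
print(meets_podcast_pace(240))  # False: 4:00 misses the pace
```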

Stabilization – Google Photos

This is my voodoo test.

Google Photos has an excellent stabilization plug-in which is good at smoothing out handheld video. I currently have no idea what influences the performance of this plug-in, and scores can vary wildly between phones of all power tiers. I like keeping track of this test in the hopes that some day a pattern might emerge.

The video clip is a one-minute sample shot from my Panasonic mirrorless camera, with light movement from my hands holding the camera.

This test is not available for iOS.

Scores are compared by timing the completion of the stabilization preview. There are no consistent metrics at present for what constitutes “high performance”.

Compression Test – RAR Lab

As phones displace more laptop use, it’s worth looking at their ability to work with compressed files. RARLAB’s RAR app offers a consistent test that compares performance between Android and Windows.

There are two components to this test: a built-in synthetic benchmark and a real-world compression test.

The RAR app has its own benchmark, which can roughly tell you how fast the phone can process data, measured in KB per second. This is a good “fast” reference for how powerful a processor might be.

I also run an actual compression, though, taking 3GB of WAV audio files and squishing them into a single RAR archive at “best compression” settings. This provides two benefits: we can see how far off our phones are from laptops, and we get a clearer sense of performance as the phone works harder over a longer period of time. A phone that scores well on the synthetic benchmark might not maintain that level of performance for the full duration of the compression task.

At present (in 2023), the best-performing phones complete this 3GB file compression in under 10 minutes.
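To put that real-world result next to the synthetic KB/s score, it helps to convert it into a sustained throughput figure. A minimal sketch, using example numbers rather than measured results:

```python
# Sketch: convert the 3GB compression time into sustained input throughput,
# in KB/s to match the unit of RAR's built-in synthetic benchmark.
# All figures below are examples, not measured results.

def sustained_throughput_kbps(input_bytes: int, elapsed_seconds: float) -> float:
    """Average input data processed per second, in KB/s."""
    return (input_bytes / 1024) / elapsed_seconds

three_gb = 3 * 1024**3
ten_minutes = 10 * 60

# A phone that just meets the 2023 "under 10 minutes" bar:
print(f"{sustained_throughput_kbps(three_gb, ten_minutes):,.0f} KB/s")  # ~5,243 KB/s
```

Holding that number up against the phone’s synthetic RAR score gives a rough sense of how much performance is lost to sustained load and heat.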

Batch Photo Editing – Photo Mate R3

This is now the most brutal test I run on phones.

Photo Mate R3 is similar to Lightroom, allowing users to batch process RAW files. Happily, it’s one of the only apps currently available with this feature that is also a one-time purchase, instead of a monthly or yearly subscription service.

I take two hundred RAW files from a Sony A7 III, apply the same editing presets, and run all two hundred files through the same export. Each RAW file is roughly 45-48MB. The batch is timed in two chunks, split between the first one hundred RAW files and the second one hundred.

This accomplishes two tests in one.

We can see which phones complete the whole test faster, but we can also see which phones slow down more over time as the phone approaches higher thermal limits. Some phones throttle sooner than others. It’s TWO tests in ONE!

At present (in 2023), a high-performance phone should be able to process the first hundred RAW files in under eight minutes, and we should see minimal performance degradation for the second hundred.
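Here’s a minimal sketch of that split-half scoring in Python; the function name and the sample timings are mine for illustration, not measured results.

```python
# Split-half scoring for the batch photo test: time the first hundred RAW
# files and the second hundred separately, then compare the two halves.

def thermal_slowdown_pct(first_half_s: float, second_half_s: float) -> float:
    """How much slower the second hundred files were, as a percentage."""
    return (second_half_s - first_half_s) / first_half_s * 100

# Example: first 100 files in 7.5 minutes (450s), second 100 in 9 minutes (540s).
slowdown = thermal_slowdown_pct(first_half_s=450, second_half_s=540)
print(f"Second half was {slowdown:.0f}% slower")  # 20% points to throttling
```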

Gaming – Alien Isolation

I don’t run extensive gaming benchmarks, and the tools for tracking frames per second have become inconsistent over the last year. Instead, I share more anecdotal experiences trying to drive a graphics-rich game at higher-than-console frame rates. Alien Isolation is a fantastic port of an older AAA game to mobile, and its settings can be adjusted to run above 60fps.

Not all phones can properly monitor or display an accurate measure of the frame rate, but driving this game on newer premium devices, there’s a noticeable improvement moving from 30fps to 60fps. For 2024, phones are starting to break 100fps average frame rates in 30-minute play sessions.

Other games I routinely use to examine performance are Battle Chasers, Implosion, Doom 3, Dead Cells, Neon Chrome, Tesla vs Lovecraft, and Undead Horde. These “tests” are also anecdotal until we can get a more consistent way to monitor performance and screen refresh across all devices.

Battery Tests – Video Streaming / Gaming / Recharging

I also don’t run the most extensive battery tests for my reviews. Every phone owner has specific needs and uses different services, so I’ve never found “screen-on time” to be that valuable a metric for run time. My SoT likely involves VERY different apps than what’s on YOUR screen while YOU use YOUR phone.

I’ve settled on a handful of tests designed to simulate a specific interaction you MIGHT perform during your day. If you want to use your phone for ONE task during the day, what should you expect in terms of drain for that one task?

Video Streaming is exactly that: I stream one hour of HD video over WiFi with the screen brightness set to 190 lux. Based on my old light meter, that was roughly half the brightness of a Galaxy S6, as a point of reference (and it’s also why I still track brightness via lux: I’m cheap).

Test results are compared by how the phone meters its own battery drain over that hour. Comparisons between phones need to take into account each phone’s battery capacity, where a phone with a larger battery SHOULD drain a smaller percentage of that battery during this test.
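Here’s a minimal sketch of that capacity adjustment; the numbers are made-up examples, not review results.

```python
# Sketch: convert a percentage drop into an estimated energy figure (mAh)
# so phones with different battery capacities can be compared directly.
# All numbers are illustrative examples.

def drained_mah(capacity_mah: int, percent_drained: float) -> float:
    """Estimate the charge consumed during the one-hour stream, in mAh."""
    return capacity_mah * (percent_drained / 100)

# A 5000mAh phone dropping 8% vs a 4000mAh phone dropping 9%:
print(drained_mah(5000, 8))  # 400.0 mAh consumed
print(drained_mah(4000, 9))  # 360.0 mAh: bigger percentage, less energy used
```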

For any battery examination, I try to test when the phone is reporting between 50% and 70% of its total charge capacity. Tests are re-run if the phone can operate at different screen resolutions, and at different refresh rates where applicable.

Battery recharge rates are tested once the phone reaches a critical battery warning, often around 15% remaining. The phone is powered down and connected to the cable and charger included by the manufacturer for 30 minutes.
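The recharge result reduces to a simple rate; a quick sketch with hypothetical numbers:

```python
# Sketch: score the 30-minute recharge test as percentage points per minute.
# Start and end percentages here are hypothetical examples.

def recharge_rate(start_pct: float, end_pct: float, minutes: float = 30) -> float:
    """Percentage points of charge recovered per minute while powered down."""
    return (end_pct - start_pct) / minutes

# Example: 15% at the critical warning, 72% after 30 minutes on the bundled charger.
print(f"{recharge_rate(15, 72):.1f} points/min")  # 1.9 percentage points per minute
```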

And there you have it!

That’s my current slate of phone tests and benchmarks.

There are certainly some issues with some comparisons, especially when trying to account for completely different platforms like Android vs iOS or Android vs Windows 10. The main conversation I’m trying to focus on is results-based.

Instead of asking “How fast can one app complete this task on one platform?” and then trying to compare against another platform which might not be optimized for that specific task, I want to ask “How fast can each platform arrive at a similar result?”

This methodology was born out of video rendering.

I’m really not as concerned about which video editing program might be “The BESTEST!”. What I care about is the ability to upload a video at the best compromise between quality and file size for a streaming platform like YouTube.

[Image: The “laptop” on the right is actually an Honor 20, which is spanking my Huawei MateBook.]

It bothers some in our community that I like to look at Windows, iOS, and Android, while discussing different solutions for video rendering, but if each video render is nearly identical, why should the platform matter?

If each video is one minute long, with the same edits, the same transitions, the same watermark, the same soundtrack, and rendered to similar bitrate and quality settings, I find the platform division immaterial. If I can produce the same style of video faster on a phone than I can on a laptop, I still made the same style of video. I’m not compromising the production on one platform to improve scores or make another platform look better.

When I compare mobile video editing to laptop video editing, I’m often bringing the quality DOWN on the PC to match the phone, so it should be an even easier task for the “real computer”.

It’s fair to be concerned about optimization when comparing platforms, but as mobile-focused content creators, we already know we can’t have perfect ‘oranges to oranges’ showdowns. Acting like we simply can’t compare different platforms is overly reductive, and it dismisses the growing sentiment that mobile gadgets can properly compete with more traditional computers for actual work. I don’t care about “being fair” to one platform. I care about what delivers the product I need.

The result matters more than a “perfect” fair comparison.

What to test next?

While there are certainly some issues with this kind of performance measuring, I believe this testing style is more practical. We arrive at a more realistic accounting of generational improvements, and consumers can focus on specific tasks which might be more relevant to their needs than a single “gooder” benchmark score. Someone working heavily with documents might care more about file compression than gaming performance, for example.

I’m always on the lookout for other apps and services we can use to measure performance in a consistent way. I also hope to find tests that can be replicated by other reviewers: there’s nothing stopping a reviewer (or a motivated consumer) from picking up some of the same apps I test with to confirm or refute my findings. It’s easier to run one automated synthetic benchmark, but that rarely gives us an informed look at how a phone might actually perform when put to a real-world test.

I hope this post answered any questions you might have about my review testing, and if you have any suggestions on other apps or services to try, please drop a comment below!
