Speeding up processing with Goroutines
How I increased the speed of my image processing application by 8x
I realize I still owe a deeper dive on the Go program I used to make the images displayed here, but until then, let's chat concurrency!
All you need to know is that I have a program that performs an action on every pixel in an image. I'd heard that goroutines could be used to do some of this work in parallel. What if I could process every pixel at once? While I wasn't able to get to that level of speed, I cut total processing time by a factor of roughly 5 to 8 on a 1000x1000 image, depending on how you read the benchmark.
My original function
go
It takes in two images, resizes them to the same size, loops through the width and height ranges, and passes the pixel at that coordinate of each image to modFunction, which returns a new pixel as the result. Pixels are processed one at a time, which is where I saw the opportunity for using goroutines.
The Goroutine Refactor
After much back and forth with ChatGPT, I landed on the following -
go
Basically, it divides the pixels into 500 chunks and hands each chunk to its own goroutine to process. I don't know what's happening under the hood, per se, but I can tell that it dramatically sped things up. But how can I know for sure?
Performance benchmarking with b *testing.B
I've never seen anything like this before Go, but Go makes it incredibly straightforward to test how fast a function executes.
I set up the following tests -
go
The value returned by getTestDimension dictates the size of the test squares, so a value of 1000 would mean a 1000px x 1000px square.
Performance results
When I run with getTestDimension set to 1000, I get the following results. In the test time period, the concurrent function ran 63 times, while the linear function ran 8 times, which looks just shy of 8 times faster. Though when I divide 128556984 nanoseconds by 23612317 nanoseconds, I get 5.44, so... either way it's faster?
Though when I drop the test image size to 300x300, the results become
Suddenly the concurrent implementation runs 43x faster? And this time, the run count and the nanoseconds per operation are much closer to the same ratio.
How many goroutines should I run?
I don't know! I landed on 500 because that's when the code ran the fastest at 300x300. Above 500, performance started to dip, presumably due to the overhead of orchestrating the goroutines themselves. I look forward to developing a deeper understanding of what's going on.
In conclusion
This exercise has raised more questions than answers, namely:
- Why is the goroutine function so much faster at smaller image sizes, but not as much faster at larger image sizes?
- Can I graph speed by goroutine count for a bunch of image sizes to determine whether there is an optimal goroutine count for different size bands?
- Does the optimal goroutine count vary by what computer is running the program?
Questions aside, I'm thrilled my application runs so much faster!!
made with the Go program - now at least 8ish to 44ish times faster