We’ve all heard it before: “Can you enhance that for me?” - uttered while someone squints at grainy CCTV footage during a crime investigation. Hollywood never lacks imagination. In this blog post, we might just turn that fantasy into reality.
As a final-year student in AI, I came across Brainjar in the search for an internship. Doing an internship in the middle of a pandemic wasn’t ideal, but luckily, the team welcomed me with open arms. Well, metaphorically at least, as my internship would end up being 100% remote due to government restrictions.
Thankfully though, the amazing team at Brainjar did a great job at making me feel at home, without me ever actually leaving my home.
For the subject of the internship, we mutually agreed on an assignment around image enhancing. This was the perfect combination of my two greatest passions: AI and photography. What exactly ‘enhancing’ would mean was up to me to decide.
I decided to break the problem down into three parts. The first is super-resolution, which is what TV shows typically refer to as ‘enhance’. Second, I wanted to build an AI algorithm to colorize black-and-white images. And third, one that can remove noise from images without degrading detail.
Becoming a Hollywood movie star (super-resolution)
Okay, maybe the iconic CCTV “Enhance!” in TV shows is a bit of an exaggeration. That is because, until just a few years ago, it was complete science fiction. In recent years, however, super-resolution has seen a massive increase in interest in the AI community. An ever-increasing number of new techniques are being published, each a bit better than the last.
To solve the problem, I used a GAN architecture. In oversimplified terms, we set up two neural networks that compete against each other to learn: one generates images, while the other tries to tell those generated images apart from real ones. Whenever one loses a round, it tries to understand why and improves for the next one. It is more complicated than that in the real world, of course. You can get more details on GANs here.
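That competition boils down to two opposing objectives. Below is a minimal NumPy sketch of the classic binary cross-entropy GAN losses; the real model, architecture, and training loop are of course far more involved, and this is only meant to show what each network is optimizing.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(real_logits, fake_logits):
    # The discriminator wants real images scored as 1 and fakes as 0.
    real_term = -np.log(sigmoid(real_logits) + 1e-12)
    fake_term = -np.log(1.0 - sigmoid(fake_logits) + 1e-12)
    return float(np.mean(real_term) + np.mean(fake_term))

def generator_loss(fake_logits):
    # The generator wants its fakes scored as 1, i.e. to fool the discriminator.
    return float(np.mean(-np.log(sigmoid(fake_logits) + 1e-12)))
```

When the discriminator confidently spots the fakes, its own loss is low while the generator's loss is high - and it is exactly that high loss that pushes the generator to produce more convincing images next round.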
On a lower level, I also implemented some further improvements to this architecture, the most notable one being perceptual loss. This technique scores the output on how realistic it looks perceptually, rather than comparing it pixel by pixel with the ground truth.
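To illustrate the difference, here is a toy sketch in which the ‘features’ are simple image gradients. A real perceptual loss would use activations from a pretrained network such as VGG, but the principle is the same: compare images in feature space, not pixel space.

```python
import numpy as np

def extract_features(img):
    """Toy 'feature extractor': horizontal and vertical gradients.
    A real perceptual loss uses activations from a pretrained CNN (e.g. VGG)."""
    dx = img[:, 1:] - img[:, :-1]
    dy = img[1:, :] - img[:-1, :]
    return dx, dy

def perceptual_loss(output, target):
    # Compare the two images in feature space instead of pixel space.
    loss = 0.0
    for fo, ft in zip(extract_features(output), extract_features(target)):
        loss += float(np.mean((fo - ft) ** 2))
    return loss

def pixel_loss(output, target):
    # The naive alternative: mean squared error per pixel.
    return float(np.mean((output - target) ** 2))
```

A uniformly brightened copy of an image scores zero under this gradient-based loss but a large error under the pixel loss - a crude analogue of how perceptual losses forgive differences that don't affect perceived structure.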
Sparing you the full parade of roadblocks, setbacks, and struggles, the result can be seen here. It’s quite amazing just how much detail can be generated out of nowhere thanks to deep learning.
For the second part of my image enhancer, I decided to try to colorize black and white photos. Essentially, I will try to enhance old photos and bring them back to life.
Almost all digital photos are represented in the RGB color space, where each pixel has three values containing the intensities of red, green, and blue. This means all three channels mix color information with the luminance of that pixel, which is not ideal for colorization. I essentially wanted to feed the luminance in as a black-and-white image and have the neural network output the color.
This led me to convert the RGB inputs into the LAB color space. Here, the L channel contains the luminance of the pixel, while the A and B channels encode the color along a green-red and a blue-yellow axis. Essentially, I give the L value as an input for each pixel and want to predict the A and B channels.
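For reference, this is roughly what that conversion looks like, sketched in NumPy using the standard sRGB-to-LAB formulas. In practice you would use a library implementation such as `skimage.color.rgb2lab`, which computes the same thing.

```python
import numpy as np

# Matrix mapping linear sRGB to CIE XYZ (D65 illuminant).
RGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                       [0.2126729, 0.7151522, 0.0721750],
                       [0.0193339, 0.1191920, 0.9503041]])
WHITE = np.array([0.95047, 1.00000, 1.08883])  # D65 reference white

def rgb_to_lab(rgb):
    """Convert an H x W x 3 sRGB image (values in [0, 1]) to LAB."""
    # Undo the sRGB gamma curve to get linear intensities.
    linear = np.where(rgb <= 0.04045, rgb / 12.92,
                      ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = linear @ RGB_TO_XYZ.T / WHITE
    # The cube-root nonlinearity from the LAB definition.
    delta = 6 / 29
    f = np.where(xyz > delta ** 3, np.cbrt(xyz),
                 xyz / (3 * delta ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16            # luminance, 0..100
    a = 500 * (f[..., 0] - f[..., 1])   # green-red axis
    b = 200 * (f[..., 1] - f[..., 2])   # blue-yellow axis
    return np.stack([L, a, b], axis=-1)
```

A nice sanity check: any neutral gray maps to a = b = 0, so a black-and-white photo really does carry only the L channel.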
I started with some basic convolutional networks and spent weeks perfecting the output of my AI. Unfortunately, the results were mediocre at best. The AI colored almost everything brown as a ‘safe bet’. It seemingly failed to learn patterns during training and just ended up confused.
Digging a bit deeper
This is when I decided to start over from scratch, this time using the knowledge about GANs and perceptual loss I had gained from the super-resolution work. After all, what I learned from the paper is that these are far better learning methods for stylistic purposes, which makes them ideal for colorization as well.
The ‘G’ in GAN stands for generative, which means it can generate new outputs out of random input noise. A GAN will therefore try to generate something realistic - realistic colors, in this case. It doesn’t matter whether it paints a t-shirt red or blue, as there is no way for us to tell the original color either. But it should learn that trees are usually green, and that skies and water are blue. This makes GANs better suited for these stylistic purposes, whereas traditional approaches try to learn a one-to-one mapping from grayscale patterns to colors, which is impossible and only confuses the AI. After implementing these concepts of GAN and perceptual loss in the colorizer, the results were far better, as you can see below.
Where do I even begin with this one? I never thought noise could be this complicated. While researching denoising, I fell into a (very) deep rabbit hole about sensor readout circuits and image processing pipelines. This led me to a paper that I used as the backbone for this part of the application.
In essence, the biggest problem with learned denoising is a lack of good data. Ideally, we would have a dataset that pairs each clean, noise-free image with a naturally noisy version of it, such as one taken in low light. Simply adding noise to a clean image is not a good idea, as that artificial noise is not representative of real sensor noise.
That brings me to the aforementioned paper. The idea it describes is to ‘unprocess’ a clean image back into raw sensor data, adding realistic noise at each step along the way. This way, we can generate noise that is artificial yet representative.
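A key property of realistic sensor noise is that part of it is signal-dependent. The sketch below is not the paper's full unprocessing pipeline - just a simplified shot-plus-read noise model with made-up gain values, to show why uniform Gaussian corruption falls short.

```python
import numpy as np

def add_sensor_noise(raw, shot_gain=0.01, read_std=0.002, rng=None):
    """Add a simplified shot + read noise model to raw intensities in [0, 1].

    Shot noise is signal-dependent: its variance grows with brightness.
    Read noise is a constant Gaussian floor from the readout electronics.
    The gain values here are illustrative, not from any specific camera.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Shot noise: approximate Poisson photon noise with a Gaussian whose
    # variance is proportional to the signal level.
    shot = rng.normal(0.0, 1.0, raw.shape) * np.sqrt(np.maximum(raw, 0) * shot_gain)
    # Read noise: signal-independent electronics noise.
    read = rng.normal(0.0, read_std, raw.shape)
    return np.clip(raw + shot + read, 0.0, 1.0)
```

Run it on a bright patch and a dark patch and the bright one comes out noisier in absolute terms - exactly the behavior that naively adding fixed-strength noise to a clean image fails to reproduce.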
With the data problem sorted, a convolutional network could be trained to denoise the images. A test result can be seen below, and the performance is pretty good.
Putting it all together
To make my AI models accessible to end users, I built a web application in Flask. Here, a user can upload an input image and choose one or more of the enhancement methods.
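The endpoint below is a hypothetical, minimal sketch of what such an upload route could look like. The route name, field names, and method identifiers are my own inventions, and the actual model inference is stubbed out with a summary response.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Hypothetical identifiers for the three enhancement steps; in the real app
# each would dispatch to the corresponding model's inference code.
AVAILABLE_METHODS = {"super_resolution", "colorize", "denoise"}

@app.route("/enhance", methods=["POST"])
def enhance():
    if "image" not in request.files:
        return jsonify(error="no image uploaded"), 400
    methods = request.form.getlist("methods")
    unknown = [m for m in methods if m not in AVAILABLE_METHODS]
    if not methods or unknown:
        return jsonify(error="choose one or more valid methods"), 400
    image_bytes = request.files["image"].read()
    # Stub: a real implementation would run the selected models here and
    # return the enhanced image instead of this summary.
    return jsonify(filename=request.files["image"].filename,
                   size=len(image_bytes),
                   applied=methods)
```

Keeping the route this thin makes it easy to queue the heavy model work elsewhere, since the inference cost, not the web layer, is what makes hosting expensive.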
I would have loved to host this application publicly and let you play around with it yourself. Unfortunately, the AI backend is quite resource-intensive, so hosting it for extended periods of time would be too costly. Besides, the app is not sufficiently optimized for production deployment, as that was not the focus of the internship.
So that’s it then. The end of my internship at Brainjar. I would like to thank every member of the team - and my two mentors Brecht and Kurt in particular - for the amazing support, even under these less-than-ideal working conditions. I honestly could not have asked for a better place to do my internship during this strange year.
Now there is just one thing left for me to do: