Playing with the Fourier Transform

The beauty of the Fourier Transform never ceases to amaze me. And since several effects in ArionFX are based on it, I have had to play with it a lot in recent times.

As explained in a previous post, diffraction patterns (e.g., the glare simulated in ArionFX for Photoshop) come from the Fourier Transform of the lens aperture. I will use the FT of an aperture mask for visualization in this post.

I will use power-of-2 square sizes (my FT implementation is an FFT). Let's start with this Aperture x Obstacle map, output directly from ArionFX for Photoshop v3.5.0.

Aperture x Obstacle mask, rasterized @ 512×512

The Fourier Transform of that image, rasterized in that same 512×512 buffer, is the following.

FT of the mask above

The faux-color is done on the magnitude of each complex number in the FT. All the FT images in this post are normalized equally, and offset to look “centered” around the mid-pixel.
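For those who want to reproduce these visualizations, here is a minimal sketch of the magnitude image, assuming numpy and a square, power-of-2 grayscale aperture mask (the function name and the normalization are my own, not ArionFX code):

```python
import numpy as np

def ft_magnitude(mask):
    """FFT magnitude, shifted so the spectrum looks centered around the mid-pixel."""
    ft = np.fft.fftshift(np.fft.fft2(mask))  # complex spectrum, zero frequency at the center
    mag = np.abs(ft)                         # magnitude of each complex number
    return mag / mag.max()                   # normalize to [0, 1] for display
```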

Such diffraction patterns and some heavy convolution magic are what ArionFX uses to compute glare on HDR images:

Resulting glare in ArionFX

Now, let’s focus on what happens to the FT (frequency space) when one does certain operations on the source data (image space). Or, in this example: what happens to the diffraction pattern when one plays with the rasterized aperture mask.

Note that we’re speaking of the Discrete Fourier Transform, so sampling (rasterization, pixelization) issues are mostly ignored.

Rotation about the center

A rotation of the source buffer about its center doesn’t change the frequencies present in the data; only their orientation. So a rotation of the source data rotates the FT in the exact same way.

As we will see next, this property holds true regardless of the center of rotation, because the FT is invariant with respect to translations.

Rotation about the center

Translation (with wrap-around)

Frequencies arise from the relative position of values in the data field, and not from their absolute position in said field. For this reason, shifting (wrap-around included) the source data does not affect the corresponding Fourier Transform in any way.

Invariance to translation
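This is easy to verify numerically. A quick sanity check, assuming numpy (random data stands in for the actual aperture mask here):

```python
import numpy as np

rng = np.random.default_rng(0)
mask = rng.random((512, 512))
shifted = np.roll(mask, shift=(100, -40), axis=(0, 1))  # translation with wrap-around

# The magnitudes are identical; only the phases change.
assert np.allclose(np.abs(np.fft.fft2(mask)), np.abs(np.fft.fft2(shifted)))
```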

Let’s recall that the idea behind the FT is that “any periodic function can be rewritten as a weighted sum of sines and cosines of different frequencies”. Periodic being the keyword there.

Repetition (tiling)

Tiling the data buffer NxM times (e.g., 2×2 in the example below) produces the same FT, but with frequencies “exploded” every NxM cells, canceling out everywhere else.

This is because no new frequencies are introduced, since we are transforming the same source data. However, the source data is now NxM times smaller relative to the data buffer size (i.e., the frequencies become NxM times higher).

Exploded frequencies on tiling
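Again, this is easy to check numerically. A small sketch assuming numpy (random data standing in for the mask):

```python
import numpy as np

rng = np.random.default_rng(0)
tile = rng.random((256, 256))
tiled = np.tile(tile, (2, 2))        # 2x2 repetition into a 512x512 buffer

ft_tile = np.fft.fft2(tile)
ft_tiled = np.fft.fft2(tiled)

assert np.allclose(ft_tiled[::2, ::2], 4.0 * ft_tile)  # frequencies "exploded" every 2x2 cells
assert np.allclose(ft_tiled[1::2, :], 0.0, atol=1e-6)  # everything else cancels out
assert np.allclose(ft_tiled[:, 1::2], 0.0, atol=1e-6)
```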

Data scaling

Normalization and sampling issues aside, scaling the data within the source buffer scales the FT inversely.

This is because encoding smaller data requires higher frequencies, while encoding a larger version of the same data requires lower frequencies.

Inverse effect on scaling

In the particular case of glare (e.g., ArionFX) this means that the diffraction pattern becomes blurry if the iris is sampled small. Or, in other words, for a given iris, the sharpest diffraction pattern possible is achieved when the iris is sampled as large as the data buffer itself.

Note, however, that “large” here means “with respect to the data buffer”, the absolute size of the data buffer being irrelevant, as we will see next.

Filling of missing image pixels

Here’s what we could call a mean pyramid of the 512×512 Lena image, i.e., a sequence of progressive 1:2 downscalings where each pixel in the i-th downscaled level is the average of the corresponding 4 pixels in the previous level. For people familiar with OpenGL and such, this is what happens when the mipmaps for a texture map are computed:

The 9+1 mipmap levels in the 512×512 Lena image
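A minimal sketch of that chain of downscalings, assuming numpy and a grayscale, power-of-2 image (the function name is my own):

```python
import numpy as np

def mean_pyramid(img):
    levels = [img.astype(float)]
    while levels[-1].shape[0] > 1:
        h, w = levels[-1].shape
        # each pixel of the next level is the average of a 2x2 packet of the previous one
        levels.append(levels[-1].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return levels  # 512x512, 256x256, ..., 1x1 for the Lena image above
```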

There are many smart, efficient, and relatively simple algorithms based on multi-resolution image pyramids. Modern GPUs can deal with upscaling and downscaling of RGB/A images at the speed of light, so usually such algorithms can be implemented at interactive or even real-time framerates. This paper here is a nice dissertation on the subject.

Here is the result of upscaling all the mip levels of the Lena image by doing progressive 2:1 bi-linear upscalings until the original resolution is reached:

Upscaled mipmap levels

Note the characteristic “blocky” (bi-linear) appearance, especially evident in the images of the second row.

Lately, I have been doing some experimentation with pixel extrapolation algorithms that “restore” the missing pixels in an incomplete image. After figuring out a couple of solutions of my own, I found the paper above, which explains pretty much what my algorithm had become at that point.

The pixel extrapolation algorithm works in 2 stages:

1- The first stage (analysis) prepares the mean pyramid of the source image by doing progressive 1:2 downscalings. Only the meaningful (not-a-hole) pixels in each 2×2 packet are averaged down. If a 2×2 packet does not have any meaningful pixels, a hole is passed to the next (lower) level.

2- The second stage (synthesis) starts at the smallest level and goes up, leaving meaningful pixels intact, while replacing holes by upscaled data from the previous (lower) level.

Mean pyramid (with holes)

Note that the analysis stage can stop as soon as a mip level doesn’t have any holes.

Here is the full algorithm, successfully filling the missing pixels in the image.

Mean pyramid (filled)
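Here is a rough sketch of the two-stage algorithm, assuming numpy, a grayscale power-of-2 image, and a boolean mask that is True for meaningful (not-a-hole) pixels. The upscaling here is nearest-neighbor for brevity, whereas the images above use bi-linear:

```python
import numpy as np

def fill_holes(img, mask):
    imgs, masks = [img.astype(float)], [mask.astype(float)]

    # 1- Analysis: mean pyramid, averaging only the meaningful pixels of each 2x2 packet.
    #    We can stop as soon as a level has no holes left.
    while imgs[-1].shape[0] > 1 and masks[-1].min() == 0.0:
        h, w = imgs[-1].shape
        s = (imgs[-1] * masks[-1]).reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
        c = masks[-1].reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
        imgs.append(np.where(c > 0, s / np.maximum(c, 1.0), 0.0))  # average of valid pixels only
        masks.append((c > 0).astype(float))                        # packets with no valid pixel stay holes

    # 2- Synthesis: walk back up, keeping meaningful pixels intact and replacing holes
    #    with upscaled data from the (already filled) level below.
    for lvl in range(len(imgs) - 2, -1, -1):
        up = np.kron(imgs[lvl + 1], np.ones((2, 2)))
        imgs[lvl] = np.where(masks[lvl] > 0, imgs[lvl], up)
    return imgs[0]
```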

Conclusions:

This algorithm can be implemented in an extremely efficient fashion on the GPU, and allows for fantastic parallelization on the CPU as well. The locality of color/intensity is preserved reasonably well, although holes become smears of color “too easily”.

A small (classic) inconvenience is that the source image must be pre-padded to a power-of-2 size for the progressive 1:2 downscalings to be well defined. I picked an image that is a power-of-2 in size already in the above examples.

So: given the ease of implementation and the extreme potential in terms of efficiency, the results are decent, but there is some room for improvement in terms of detail preservation.

Some procedural action

I am (fortunately!) always very busy, but lately it’s been a long chain of little-sleep, endless work sessions. I am working on new technology for Arion 3 and I couldn’t be more excited.

Here’s an image by Erwann Loison, done in Arion/ArionFX, where all the materials are strictly procedural. Not a single image texture was used:

Procedural action by Erwann Loison

Here is his original post on his blog.

Stuff like this, and much, much more, will be news in the Arion world soon (when it’s done).

Happy New 2015!

Sobel operator

The Sobel operator is a simple way to approximate the gradient of the intensity in an image. This, in visual terms, can be used for edge detection. The purpose of edge detection is to significantly reduce the amount of data in the image, while preserving the structural properties to be used for further image processing.

In practice, the Sobel operator is simply a pair of 3×3 (separable) convolution kernels. One highlights the horizontal gradient/edges, and the other one highlights the vertical gradient/edges.

S_x = \left[ \begin{array}{ccc} -1 & \phantom{+} 0 & +1 \\ -2 & \phantom{+} 0 & +2 \\ -1 & \phantom{+} 0 & +1 \end{array} \right]

S_y = \left[ \begin{array}{ccc} +1 & +2 & +1 \\ \phantom{+} 0 & \phantom{+} 0 & \phantom{+} 0 \\ -1 & -2 & -1 \end{array} \right]

In non-formal terms, and under certain theoretical assumptions, this is conceptually equivalent to computing the partial derivatives of the image with respect to x and y.
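As a minimal sketch of how the two kernels can be applied (assuming numpy and scipy.ndimage on a grayscale/luminance image; this is illustrative code, not the filter used for the figures below):

```python
import numpy as np
from scipy.ndimage import convolve

Sx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
Sy = np.array([[ 1,  2,  1],
               [ 0,  0,  0],
               [-1, -2, -1]], dtype=float)

def sobel_gradient(img):
    gx = convolve(img, Sx, mode='reflect')  # response of S_x (gradient along x, up to sign convention)
    gy = convolve(img, Sy, mode='reflect')  # response of S_y (gradient along y, up to sign convention)
    return gx, gy, np.hypot(gx, gy)         # absolute magnitude of the gradient
```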

For the examples below, I am using the same image featured by Wikipedia in the Sobel operator page:

Sobel operator

This grid presents:

1- The input image (luminance).
2- The absolute magnitude of the result of the Sobel filter.
3- The result of the Sobel (x) filter.
4- The result of the Sobel (y) filter.
5- Same as (2), but in faux color.
6- The gradient vectors, normalized and displayed as an RGB-encoded normal map.

The 3-tap Sobel convolution kernels have a 1px radius, so they have a very limited edge detection ‘range’. This makes the filter quite shaky as soon as the image presents fine details or noise. For this reason, one may want to pre-pass the input image with a subtle Gaussian blur.

This has the effect of diminishing edges in general (as expected), but the resulting gradient images are equivalent, yet much cleaner.

Sobel operator (Gaussian with sigma=2)
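For quick experimentation, the pre-blurred variant can be sketched like this (assuming scipy.ndimage; its built-in Sobel filter is used here just for brevity):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def blurred_sobel_magnitude(img, sigma=2.0):
    smooth = gaussian_filter(img, sigma=sigma)  # subtle Gaussian pre-pass
    gx = sobel(smooth, axis=1)                  # gradient along x
    gy = sobel(smooth, axis=0)                  # gradient along y
    return np.hypot(gx, gy)
```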

The Sobel operator is one of the most fundamental building blocks in feature detection in the field of Computer Vision.

Note that the Sobel operator does not characterize edges or detect features in any way. It simply produces a filtered image where pixels that most likely belong to an area of high gradient (such as an edge) are highlighted.

Bonus remark: Sobel filtering is very similar to what a render engine such as Arion does to transform a height map (bump map) into a normal map.
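As a hedged illustration of that remark (not Arion’s actual code), a height map can be turned into a tangent-space normal map from its Sobel gradients; the sign conventions and the strength factor below are arbitrary:

```python
import numpy as np
from scipy.ndimage import convolve

def height_to_normal(height, strength=1.0):
    Sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    Sy = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)
    gx = convolve(height, Sx, mode='reflect') * strength
    gy = convolve(height, Sy, mode='reflect') * strength
    n = np.stack([-gx, -gy, np.ones_like(height)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)  # normalize per pixel
    return 0.5 * n + 0.5                            # RGB-encode into [0, 1]
```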

The Error function (erf)

Here is the 1D Gaussian function:

G( x , \sigma ) = \frac{ 1 }{ \sqrt{ 2 \pi } \sigma } e^{ \frac{ - x^2 }{ 2 \sigma ^2 } }

Put shortly, the Error function is the integral of the Gaussian function from 0 to a certain point x:

\text{erf}( x ) = \frac{ 2 }{ \sqrt{ \pi } } \int_0^x \! e^{ -t^2 } \, \mathrm{ d }t

At least, that is the way the formula is presented by Wikipedia and Wolfram|Alpha. But as soon as you try to work with it you find out that in order to really match the above Gaussian function, normalization and axis scaling must be taken care of:

\text{erf}( x , \sigma ) = \frac{ 1 }{ \sqrt{ \pi } } \int_0^{ \frac{ x }{ \sqrt{ 2 } \sigma } } \! e^{ -t^2 } \, \mathrm{ d }t

The plots below display G(x,sigma) (bell-shaped) in blue, and erf(x,sigma) (S-shaped) in yellow.

A very typical use for G(x,sigma) that I’ve been talking about on this blog lately is to build convolution kernels for Gaussian blur. An image convolution kernel, for example, is a pixel-centered discretization of a certain underlying function (a Gaussian, in this case). Said discretization splits the x axis into uniform unit-length bins (a.k.a. taps, or intervals) centered at x=0.

The correct weight for each bin is, pretty much by definition, the integral of G(x,sigma) along the bin. That is, the increment of erf(x,sigma) between both end-points of the bin.

Discretized 1D Gaussian (sigma=0.5)

Discretized 1D Gaussian (sigma=1.0)

Discretized 1D Gaussian (sigma=1.5)

Doing it wrong:

Most implementations of Gaussian convolution kernels simply evaluate G(x,sigma) at the mid-point of each bin. This is a decent approximation that is also trivial to implement:

\text{bin}( a , b , \sigma ) = G( \frac{ a + b }{ 2 } , \sigma )

This value is represented in blue in the above plots.

Implementing erf:

While G(x,sigma) has a trivial explicit formulation, erf is an archetypical non-elementary function.

Some development environments provide an erf implementation. But most (the ones I use) don’t. There are some smart example implementations out there if you Google a bit. Usually they are either numerical approximations, or piece-wise interpolations.

Assuming that one has an erf implementation at one’s disposal, the correct value for the bin [a..b] is:

\text{bin'}( a , b , \sigma ) = \text{erf}( b , \sigma ) - \text{erf}( a , \sigma )

This value is represented in red in the above plots.
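Putting both together, here is a minimal sketch of a 1D Gaussian kernel built both ways, assuming Python’s math.erf and unit-length bins centered at integer offsets (the function names are my own):

```python
import math
import numpy as np

def gaussian_cdf(x, sigma):
    """Integral of G(t, sigma) from -inf to x, expressed with the error function."""
    return 0.5 * (1.0 + math.erf(x / (math.sqrt(2.0) * sigma)))

def kernel_erf(sigma, radius):
    """Correct bin weights: increment of the integral across each bin [i-0.5, i+0.5]."""
    return np.array([gaussian_cdf(i + 0.5, sigma) - gaussian_cdf(i - 0.5, sigma)
                     for i in range(-radius, radius + 1)])

def kernel_midpoint(sigma, radius):
    """Common approximation: evaluate G(x, sigma) at the mid-point of each bin."""
    g = lambda x: math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)
    return np.array([g(i) for i in range(-radius, radius + 1)])

# For small sigma the two differ noticeably; for large sigma they converge.
print(kernel_midpoint(0.5, 2))
print(kernel_erf(0.5, 2))
```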

A poor-man approximation:

If you are feeling lazy, an alternative approximation is to super-sample G(x,sigma) along the bin. The number of samples can be chosen in proportion to how small sigma is.

This method is easy to implement. However, it is far less elegant and also less efficient as more evaluation calls are required.
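A sketch of that poor-man approximation (again, my own helper, not a reference implementation):

```python
import math

def bin_supersampled(a, b, sigma, samples=8):
    """Mid-point-rule estimate of the integral of G(x, sigma) over the bin [a, b]."""
    g = lambda x: math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)
    xs = (a + (b - a) * (i + 0.5) / samples for i in range(samples))
    return (b - a) * sum(g(x) for x in xs) / samples
```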

Conclusion:

When discretizing a Gaussian function, as sigma becomes small (close to or smaller than (b-a)), the approximation bin(a,b) (in blue above) deviates significantly from the correct bin'(a,b) (in red above).

So if you are going to Gaussian-blur at radius 1px or smaller, and precision is important, then it is necessary to use erf(x,sigma) or at least super-sample G(x,sigma).