Playing with the Fourier Transform

The beauty of the Fourier Transform never ceases to amaze me. And since several effects in ArionFX are based on it, I have had to play with it a lot in recent times.

As explained in a previous post, diffraction patterns (e.g., the glare simulated in ArionFX for Photoshop) come from the Fourier Transform of the lens aperture. I will use the FT of an aperture mask for visualization in this post.

I will use pow-2 square sizes (my FT implementation is an FFT). Let’s start by this Aperture x Obstacle map output directly from ArionFX for Photoshop v3.5.0.

Aperture x Obstacle mask, rasterized @ 512×512

The Fourier Transform of that image, rasterized at that size in said 512×512 buffer, is the following.

The faux-color is done on the magnitude of each complex number in the FT. All the FT images in this post are normalized equally, and offset to look “centered” around the mid-pixel.

Such diffraction patterns and some heavy convolution magic are what ArionFX uses to compute glare on HDR images:

Resulting glare in ArionFX

Now, let’s focus on what happens to the FT (frequency space) when one does certain operations on the source data (image space). Or, in this exemplification: what happens to the diffraction pattern, when one plays with the rasterized aperture mask.

Note that we’re speaking of the Discrete Fourier Transform, so sampling (rasterization, pixelization) issues are mostly ignored.

A rotation of the source buffer about its center doesn’t change the frequencies present in the data; only their orientation. So a rotation in the source data rotates the FT rotates in the exact same way.

As we will see next, this property holds true regardless of the center of rotation, because the FT is invariant with respect to translations.

Translation (with warp-around)

Frequencies arise from the relative position of values in the data field, and not from their absolute position in said field. For this reason, shifting (warp-around included) the source data does not affect the corresponding Fourier Transform in any way.

Invariance to translation

Let’s recall that the idea behind the FT is that “any periodic function can be rewritten as a weighted sum of sines and cosines of different frequencies”. Periodic being the keyword there.

Repetition (tiling)

Tiling the data buffer NxM times (e.g., 2×2 in the example below) produces the same FT, but with frequencies “exploded” every NxM cells, canceling out everywhere else.

This is because no new frequencies are introduced, since we are transforming the same source data. However, the source data is NxM times smaller proportional to the data buffer size (i.e., the frequencies become NxM times higher).

Exploded frequencies on tiling

Data scaling

Normalization and sampling issues aside, scaling the data within the source buffer scales the FT inversely.

This is because encoding smaller data requires higher frequencies, while encoding a larger version of the same data requires lower frequencies.

Inverse effect on scaling

In the particular case of glare (e.g., ArionFX) this means that the diffraction pattern becomes blurry if the iris is sampled small. Or, in other words, for a given iris, the sharpest diffraction pattern possible is achieved when the iris is sampled as large as the data buffer itself.

Note, however, that “large” here means “with respect to the data buffer”, being the size of the data buffer irrelevant as we will see next.

Scrambled Halton

The Halton sequence, which is one of my favourite algorithms ever, can be used for efficient stratified multi-dimensional sampling. Some references:

It is possible to do stratified sampling of hyper-points in the s-dimensional unit hyper-cube by picking one consecutive dimension of the Halton series for each component. A convenient way to do so is to use the first s prime numbers as the basis for each Halton sequence.

It is well-known, however, that while this approach works great for low dimensions, high dimensions often exhibit a great degree of undesired correlation. The following image displays a grid where each cell combines two components of the 32-dimensional Halton hyper-cube.

Raw 32-dimensional Halton hyper-cube

One can easily spot how some pairs exhibit an obvious pattern, while others fill their corresponding 2D area very densely. This happens more aggressively for higher dimensions (i.e., to the right/bottom in the image) and for pairs formed with close components (i.e., near the diagonal in the image). Click to enlarge!

A successful approach to dissolve this problem without losing the good properties of stratification is to do “random digit scrambling” (a.k.a. rds). During the construction of a Halton number, digits in the range [0..base[ are combined. Given a Pseudo-Random Permutation of length=base, all that one must do is use PRP(digit) instead of digit directly. This somewhat shuffles Halton pairs in rows and columns in a strong way so the correlation disappears. However, since the PRP is a bijection, the good properties of stratification are generally preserved.

How to build a strong and efficient randomised PRP of an arbitrary length is an interesting subject which details I won’t get into here.

Here’s the scrambling strategy in action:

Scrambled 32-dimensional Halton hyper-cube

Now all the blocks in the grid look equally dense. Magic!

As long as one picks good PRPs, it is possible to generate any number of different samplings, all with the same good properties.

Uniform vs. stratified

This is a classic subject in numerical (Monte Carlo) integration.

Uniform 2D distribution vs. Halton series for the first 2 dimensions

To the left: 32768 points in a 512×512 image using a uniform random number generator (Mersenne Twister). To the right, the first 32768 pairs in the Halton series, using dimensions #0 and #1. Click to enlarge!

Sobel operator

The Sobel operator is a simple way to approximate the gradient of the intensity in an image. This, in visual terms, can be used for edge detection. The purpose of edge detection is to significantly reduce the amount of data in the image, while preserving the structural properties to be used for further image processing.

In practice, the Sobel operator is simply a pair of 3×3 (separable) convolution kernels. One highlights the horizontal gradient/edges, and the other one highlights the vertical gradient/edges.

$S_x = \left[ \begin{array}{ccc} -1 & \phantom{+} 0 & +1 \\ -2 & \phantom{+} 0 & +2 \\ -1 & \phantom{+} 0 & +1 \end{array} \right]$

$S_y = \left[ \begin{array}{ccc} +1 & +2 & +1 \\ \phantom{+} 0 & \phantom{+} 0 & \phantom{+} 0 \\ -1 & -2 & -1 \end{array} \right]$

In non-formal terms, and under certain theoretical assumptions, this is conceptually equivalent to computing the partial derivatives of the image with respect to x an y.

For the examples below, I am using the same image featured by Wikipedia in the Sobel operator page:

Sobel operator

This grid presents:

1- The input image (luminance).
2- The absolute magnitude of the result of the Sobel filter.
3- The result of the Sobel (x) filter.
4- The result of the Sobel (y) filter.
5- Same as (2), but in faux color.
6- The gradient vectors, normalized and displayed as an RGB-encoded normal map.

The 3-tap Sobel convolution kernels have a 1px radius, so they have a very limited edge detection ‘range’. This makes the filter quite shaky as soon as the image presents fine details or noise. For this reason, one may want to pre-pass the input image with a subtle Gaussian blur.

This has the effect of diminishing edges in general (as expected), but the resulting gradient images are equivalent, yet much cleaner.

Sobel operator (Gaussian with sigma=2)

The Sobel operator is one of the most fundamental building blocks in feature detection in the field of Computer Vision.

Note that the Sobel operator does not characterize edges or detects features in any way. It simply produces a filtered image where pixels that most likely belong to an area of high gradient (such as an edge) are highlighted.

Bonus remark: Sobel filtering is very similar to what a render engine such as Arion does to transform a height map (bump map) into a normal map.

The Error function (erf)

Here is the 1D Gaussian function:

$G( x , \sigma ) = \frac{ 1 }{ \sqrt{ 2 \pi } \sigma } e^{ \frac{ - x^2 }{ 2 \sigma ^2 } }$

Put in short, the Error function is the integral of the Gaussian function from 0 to a certain point x:

$\text{erf}( x ) = \frac{ 2 }{ \pi } \int_0^x \! e^{ -t^2 } \, \mathrm{ d }t$

At least, that is the way the formula is presented by Wikipedia and Wolfram|Alpha. But as soon as you try to work with it you find out that in order to really match the above Gaussian function, normalization and axis scaling must be taken care of:

$\text{erf}( x , \sigma ) = \int_0^{ \frac{ x }{ \sqrt{ 2 } \sigma } } \! e^{ -t^2 } \, \mathrm{ d }t$

The plots below display G(x,sigma) (bell-shaped) in blue, and erf(x,sigma) (S-shaped) in yellow.

A very typical use for G(x,sigma) that I’ve been talking about on this blog lately, is to build convolution kernels for Gaussian blur. An image convolution kernel, for example, is a pixel-centered discretization of a certain underlying function (a Gaussian, in this case). Said discretization splits the x axis in uniform unit-length bins (a.k.a. taps, or intervals) centered at x=0.

For each bin, this is pretty much the definition of the integral of G(x,sigma) along the bin. That is, the increment of erf(x,sigma) between both end-points of the bin.

Discretized 1D Gaussian (sigma=0.5)

Discretized 1D Gaussian (sigma=1.0)

Discretized 1D Gaussian (sigma=1.5)

Doing it wrong:

Most implementations of Gaussian convolution kernels simply evaluate G(x,sigma) at the mid-point of each bin. This is a decent approximation that is also trivial to implement:

$\text{bin}( a , b , \sigma ) = G( \frac{ a + b }{ 2 } , \sigma )$

This value is represented in blue in the above plots.

Implementing erf:

While G(x,sigma) has a trivial explicit formulation, erf is an archetypical non-elementary function.

Some development environments provide an erf implementation. But most (the ones I use) don’t. There are some smart example implementations out there if you Google a bit. Usually they are either numerical approximations, or piece-wise interpolations.

Assuming that one has an erf implementation at his disposal, the correct value for the bin [a..b] is:

$\text{bin'}( a , b , \sigma ) = \text{erf}( b , \sigma ) - \text{erf}( a , \sigma )$

This value is represented in red in the above plots.

A poor-man approximation:

If you are feeling lazy, an alternative approximation is to super-sample G(x,sigma) along the bin. The amount of samples can be chosen to be proportional to how small sigma is.

This method is easy to implement. However, it is far less elegant and also less efficient as more evaluation calls are required.

Conclusion:

When discretizing a Gaussian function, as sigma becomes small (close to or smaller than (b-a)), the approximation bin(a,b) (in blue above) deviates significantly from the correct bin'(a,b) (in red above).

So if you are going to Gaussian-blur at radius 1px or smaller, and precision is important, then it is necessary to use erf(x,sigma) or at least super-sample G(x,sigma).