Playing with the Fourier Transform

The beauty of the Fourier Transform never ceases to amaze me. And since several effects in ArionFX are based on it, I have had to play with it a lot in recent times.

As explained in a previous post, diffraction patterns (e.g., the glare simulated in ArionFX for Photoshop) come from the Fourier Transform of the lens aperture. I will use the FT of an aperture mask for visualization in this post.

I will use power-of-2 square sizes (my FT implementation is an FFT). Let’s start with this Aperture x Obstacle map output directly from ArionFX for Photoshop v3.5.0.

Aperture x Obstacle mask, rasterized @ 512×512

The Fourier Transform of that image, rasterized in that same 512×512 buffer, is the following.

FT of the mask above

The faux-color mapping is applied to the magnitude of each complex number in the FT. All the FT images in this post are normalized equally, and offset so they look “centered” around the mid-pixel.
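For reference, here is a minimal NumPy sketch (not the actual ArionFX implementation; the helper name and the [0..1] normalization are my own choices) of how such a centered magnitude image can be computed:

```python
import numpy as np

def ft_magnitude(mask: np.ndarray) -> np.ndarray:
    """Centered, normalized magnitude of the 2D FFT of a (power-of-2) square mask."""
    spectrum = np.fft.fft2(mask)          # complex frequency-space buffer
    spectrum = np.fft.fftshift(spectrum)  # offset so the DC term sits at the mid-pixel
    magnitude = np.abs(spectrum)          # magnitude of each complex number
    return magnitude / magnitude.max()    # normalize to [0..1] before faux-coloring
```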

Such diffraction patterns and some heavy convolution magic are what ArionFX uses to compute glare on HDR images:

Resulting glare in ArionFX

Now, let’s focus on what happens to the FT (frequency space) when one does certain operations on the source data (image space). Or, in this example: what happens to the diffraction pattern when one plays with the rasterized aperture mask.

Note that we’re speaking of the Discrete Fourier Transform, so sampling (rasterization, pixelization) issues are mostly ignored.

Rotation about the center

A rotation of the source buffer about its center doesn’t change the frequencies present in the data; only their orientation. So a rotation of the source data rotates the FT in the exact same way.

As we will see next, this property holds true regardless of the center of rotation, because the FT is invariant with respect to translations.

Rotation about the center
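As a quick sanity check, here is a hypothetical NumPy/SciPy sketch (my own, not ArionFX code) that compares the FT of a rotated mask against the rotated FT of the original mask; sampling issues aside, the two should match:

```python
import numpy as np
from scipy.ndimage import rotate

def ft_mag(a):
    # Centered FFT magnitude (as in the sketch above).
    return np.abs(np.fft.fftshift(np.fft.fft2(a)))

def rotation_difference(mask, angle_deg):
    # |FT| of the rotated mask vs. the rotated |FT| of the mask.
    a = ft_mag(rotate(mask, angle_deg, reshape=False, order=1))
    b = rotate(ft_mag(mask), angle_deg, reshape=False, order=1)
    # A small mean difference means the FT rotates along with the source data.
    return float(np.mean(np.abs(a - b)))
```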

Translation (with wrap-around)

Frequencies arise from the relative position of values in the data field, and not from their absolute position in said field. For this reason, shifting (wrap-around included) the source data does not affect the corresponding Fourier Transform in any way.

Invariance to translation
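A tiny NumPy check of this invariance (illustrative only): shifting the mask with wrap-around changes only the phase of the FT, so the magnitudes stay identical up to numerical noise.

```python
import numpy as np

def translation_difference(mask, dx, dy):
    shifted = np.roll(mask, shift=(dy, dx), axis=(0, 1))       # wrap-around shift
    mag_original = np.abs(np.fft.fft2(mask))
    mag_shifted  = np.abs(np.fft.fft2(shifted))
    return float(np.max(np.abs(mag_original - mag_shifted)))   # ~0
```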

Let’s recall that the idea behind the FT is that “any periodic function can be rewritten as a weighted sum of sines and cosines of different frequencies”. Periodic being the keyword there.

Repetition (tiling)

Tiling the data buffer N×M times (e.g., 2×2 in the example below) produces the same FT, but with the frequencies “exploded”: present only every N×M cells (every N-th cell along one axis and every M-th cell along the other), and canceling out everywhere else.

This is because no new frequencies are introduced, since we are transforming the same source data. However, the source data is now N×M times smaller relative to the data buffer size (i.e., the frequencies become N×M times higher).

Exploded frequencies on tiling
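This, too, is easy to reproduce with a couple of NumPy calls (a sketch, not ArionFX code): tiling the mask 2×2 leaves FT energy only on every other cell of the larger spectrum.

```python
import numpy as np

def tiled_spectrum(mask, n=2, m=2):
    # Tile the mask n x m times; the centered FT magnitude is non-zero
    # only every (n, m) cells, and cancels out everywhere else.
    tiled = np.tile(mask, (n, m))
    return np.abs(np.fft.fftshift(np.fft.fft2(tiled)))
```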

Data scaling

Normalization and sampling issues aside, scaling the data within the source buffer scales the FT inversely.

This is because encoding smaller data requires higher frequencies, while encoding a larger version of the same data requires lower frequencies.

Inverse effect on scaling

In the particular case of glare (e.g., ArionFX) this means that the diffraction pattern becomes blurry if the iris is sampled small. Or, in other words, for a given iris, the sharpest diffraction pattern possible is achieved when the iris is sampled as large as the data buffer itself.

Note, however, that “large” here means “with respect to the data buffer”; the absolute size of the data buffer itself is irrelevant, as we will see next.

Downsampling and Gaussian blur

I talked about several strategies to optimize convolutions in some of my previous posts. I still have to talk about how to approximate a Gaussian blur using a multi-step Box blur in a future post. However, there is yet another good technique to optimize a Gaussian blur that may come in handy in some cases.

This post is inspired by a need that I had some days ago: Say that you need to do a 3D Gaussian blur on a potentially humongous 3D data buffer. Working with downsampled data sounds ideal in terms of storage and performance. So that’s what I am going to talk about here:

What happens when downsampling is mixed with a Gaussian blur?

The idea:

Here’s the 0-centered and un-normalized 1D Gaussian function:

G( x, \sigma ) = e^{ -\frac{ x^2 }{ 2 \sigma^2 } }

The sigma parameter in the Gaussian function stretches the bell shape along the x axis. So it is quite straightforward to understand that if one downsamples the input dataset by a scale factor k, then applies a (smaller) Gaussian where sigma’=s/k, and finally upscales the result by the same scale factor k, the result will approximate a true Gaussian on the original dataset where sigma=s.

In cleaner terms: if one has an input dataset (e.g., an image) I and wants to have it blurred by a Gaussian where sigma=s:

1- I' <= I downsampled by a certain scale factor k.
2- I'' <= I' blurred by a small Gaussian where sigma'=s/k.
3- I''' <= I'' upscaled by a scale factor k.
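Here is a minimal 2D sketch of those three steps, assuming SciPy. The post uses a plain pixel average for downsampling and a bilinear filter for upscaling, whereas this sketch simply uses scipy.ndimage.zoom with linear interpolation for both, so treat the names and choices as illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def approx_gaussian_blur(image: np.ndarray, sigma: float, k: float) -> np.ndarray:
    small   = zoom(image, 1.0 / k, order=1)       # 1- I'  : downsample by k
    blurred = gaussian_filter(small, sigma / k)   # 2- I'' : small Gaussian, sigma' = s / k
    # 3- I''': upscale back to the original resolution (accounting for rounding
    #    of the downsampled size instead of multiplying blindly by k).
    factors = (image.shape[0] / blurred.shape[0], image.shape[1] / blurred.shape[1])
    return zoom(blurred, factors, order=1)
```

For instance, approx_gaussian_blur(img, 16.0, 16.0) should closely match gaussian_filter(img, 16.0), in the spirit of the sigma=16 comparison grid below.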

How good is this approximation?

The Sampling Theorem states that a signal must be sampled at (at least) twice its highest frequency to be represented exactly. This means that downsampling cuts the frequencies above the new Nyquist limit (half the new sampling rate). In other words: downsampling means less data to process, but at the expense of introducing an error.

Fortunately, a Gaussian blur is a form of low-pass frequency filter. This means that blurring is quite tolerant to alterations in the high part of the frequency spectrum.

Visual evaluation:

In the examples below I am downsampling with a simple pixel average, and I am upscaling with a simple bilinear filter. The 2×2 grids below compare:

1- Top-left – The original image I.
2- Top-right – I downsampled and upscaled by k (note the blocky bilinear filter look).
3- Bottom-left – The resulting image I'''.
4- Bottom-right – I blurred by a true Gaussian where sigma=s.

In these examples, I chose k=sigma for simplicity. This means that the small Gaussian uses sigma’=1.

Gaussian blur where sigma=4

Gaussian blur where sigma=16

Gaussian blur where sigma=64

Conclusion:

As shown, the approximation (bottom-left vs. bottom-right) is pretty good.

The gain in speed depends on multiple implementation factors. However, as I explained above, this post was inspired by a need to cope with a cubic memory storage problem when doing Gaussian blurs on a 3D buffer. Working with a heavily downsampled buffer clearly helps in that sense. Needless to say, decreasing the amount of data to process by k^3 also brings a dramatic speed boost, making it possible to use tiny separable convolutions along the 3 (downsampled) axes.

Note that one might pick any downsampling scale factor k>=1. The higher the value of k, the higher the approximation error and the smaller and faster the convolution.

The choice k=sigma offers a good trade-off between approximation error and speed gain as shown above.

CIE, XYZ, Yxy, RGB, and gamuts

Although digital images are often given in RGB, RGB is not always a convenient space for image processing. Splitting RGB colors into their luminance and chrominance components, and recombining them, are very common operations when it comes to image filtering.

The Yxy encoding is a very good solution due to its strong physical/perceptual background. One can go from RGB to XYZ (selecting a certain color-space transform matrix), and then go from XYZ to Yxy using the following formulas:

x = \frac{ X }{ X + Y + Z }

y = \frac{ Y }{ X + Y + Z }

The inverse transform to go from Yxy to XYZ is given by the following formulas:

X = Y \cdot \frac{ x }{ y }

Z = \frac{ X }{ x } - X - Y
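These formulas transcribe directly into code; here is a minimal sketch (note that it assumes X+Y+Z != 0 and y != 0, so pure black needs special handling):

```python
def xyz_to_yxy(X, Y, Z):
    # Forward transform: keep Y, normalize X and Y by the sum to get the chrominance.
    s = X + Y + Z
    return Y, X / s, Y / s      # (Y, x, y)

def yxy_to_xyz(Y, x, y):
    # Inverse transform, straight from the formulas above.
    X = Y * x / y
    Z = X / x - X - Y
    return X, Y, Z
```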

The color-space transform matrix that turns RGB values into XYZ values (and its inverse) is a simple 3×3 linear transform matrix. The exact values of the matrix depend on the color-space you are working in. For example, the matrix for sRGB can be found in the Wikipedia sRGB page. Actually, one can build a custom RGB-to-XYZ matrix by defining the xy coordinates of the three primary colors (R, G, and B) and the position of the white-point. There is an excellent explanation of this on this page by Ryan Juckett.
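For reference, here is a sketch using the usual sRGB (D65) matrix values, rounded to four decimals; note that they apply to linear RGB, i.e., after removing the sRGB gamma curve:

```python
import numpy as np

# Linear sRGB (D65) to XYZ; the inverse is obtained numerically here for brevity.
RGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                       [0.2126, 0.7152, 0.0722],
                       [0.0193, 0.1192, 0.9505]])
XYZ_TO_RGB = np.linalg.inv(RGB_TO_XYZ)

def rgb_to_xyz(rgb):
    return RGB_TO_XYZ @ np.asarray(rgb, dtype=float)

def xyz_to_rgb(xyz):
    return XYZ_TO_RGB @ np.asarray(xyz, dtype=float)
```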

XYZ colors and the Yxy encoding have some very interesting practical properties. Below are some of them.

1- All human-visible colors have positive X, Y, and Z values.

This means that matrices to go from RGB to XYZ can only have positive coefficients, and a valid (positive) RGB color can only map to a positive XYZ triplet. The opposite is generally not true.

2- The Y value of an XYZ color represents the relative luminance of the color as perceived by the human eye.

So if one computes the Y (luminance) field of an RGB image, one gets a grayscale version of the original image.

3- Y (luminance) and xy (chrominance) are fully independent.

This means that if one encodes an RGB color in Yxy, then amplifies or diminishes the luminance Y'=k·Y, and then goes back to RGB from Y'xy, the result is k·RGB.

This is a fundamental property in tonemapping algorithms, which typically do range compression on the luminance, and then just re-plug the original chrominance field to the compressed luminance field.
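As an illustration only (this is not how ArionFX does it), here is a sketch of that idea using a simple Y/(1+Y) compression curve; scaling RGB by Y'/Y compresses the luminance while re-plugging the original chrominance, thanks to property 3 above:

```python
import numpy as np

def tonemap_luminance(rgb, compress=lambda y: y / (1.0 + y)):
    # rgb: float array of shape (..., 3), assumed to be linear sRGB.
    y = rgb @ np.array([0.2126, 0.7152, 0.0722])  # relative luminance (Y row of the sRGB matrix)
    y_new = compress(y)                           # range compression on the luminance only
    scale = np.where(y > 0.0, y_new / np.maximum(y, 1e-12), 0.0)
    return rgb * scale[..., None]                 # xy (chrominance) is left untouched
```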

This property also means that one can do any alteration on the chrominance values (xy) and then go back to RGB while preserving the original luminance (Y). This comes in handy when implementing image filters such as Tint (xy rotation around the whitepoint) or Saturation (xy amplification away from the whitepoint).
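A hypothetical Saturation filter along those lines, operating on a single xy pair (the D65 white-point coordinates used here, (0.3127, 0.3290), are the usual sRGB values):

```python
def adjust_saturation(x, y, amount, whitepoint=(0.3127, 0.3290)):
    # amount > 1 pushes the chrominance away from the white-point (more saturation),
    # amount < 1 pulls it toward the white-point; Y is untouched, so luminance is preserved.
    wx, wy = whitepoint
    return wx + (x - wx) * amount, wy + (y - wy) * amount
```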

4- The range of Y is [0..INF), but the range of xy is constrained to [0..1]x[0..1].

Since the chrominance values (xy) are normalized, they are independent of the luminance value, and hence they have a limited range. Here’s what happens when one selects a fixed luminance (Y=1, for example) and draws the RGB colors corresponding to all the chrominances in x=[0..1] and y=[0..1]:

sRGB triangle (gamut)

The darkened colors are invalid (not plausible) results where at least one of the RGB components is negative. So the only valid values that xy can get are actually constrained to this triangle, which is, in fact, much smaller than the unit square. As a matter of fact, the maximum distance between two chrominance values in Yxy (in sRGB as displayed above) is around 0.6, derived from the length of the longest edge of the gamut triangle.
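A small sketch of that validity test, combining the Yxy-to-XYZ formulas above with the inverse of the sRGB matrix shown earlier: a chrominance is representable when none of the resulting linear RGB components is negative.

```python
import numpy as np

# Inverse of the sRGB (D65) RGB-to-XYZ matrix shown earlier.
XYZ_TO_RGB = np.linalg.inv(np.array([[0.4124, 0.3576, 0.1805],
                                     [0.2126, 0.7152, 0.0722],
                                     [0.0193, 0.1192, 0.9505]]))

def inside_srgb_gamut(x, y, Y=1.0):
    X = Y * x / y                                # Yxy -> XYZ
    Z = X / x - X - Y
    r, g, b = XYZ_TO_RGB @ np.array([X, Y, Z])   # XYZ -> linear sRGB
    return r >= 0.0 and g >= 0.0 and b >= 0.0    # any negative component => outside the gamut
```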

Different color-space transforms have larger or smaller gamut triangles, as depicted in this CIE 1931 xy gamut comparison.

Note that the gamut triangle is the same regardless of the color luminance.