Playing with the Fourier Transform

The beauty of the Fourier Transform never ceases to amaze me. And since several effects in ArionFX are based on it, I have had to play with it a lot in recent times.

As explained in a previous post, diffraction patterns (e.g., the glare simulated in ArionFX for Photoshop) come from the Fourier Transform of the lens aperture. I will use the FT of an aperture mask for visualization in this post.

I will use pow-2 square sizes (my FT implementation is an FFT). Let’s start by this Aperture x Obstacle map output directly from ArionFX for Photoshop v3.5.0.

Aperture x Obstacle mask, rasterized @ 512×512

The Fourier Transform of that image, rasterized at that size in said 512×512 buffer, is the following.

The faux-color is done on the magnitude of each complex number in the FT. All the FT images in this post are normalized equally, and offset to look “centered” around the mid-pixel.

Such diffraction patterns and some heavy convolution magic are what ArionFX uses to compute glare on HDR images:

Resulting glare in ArionFX

Now, let’s focus on what happens to the FT (frequency space) when one does certain operations on the source data (image space). Or, in this exemplification: what happens to the diffraction pattern, when one plays with the rasterized aperture mask.

Note that we’re speaking of the Discrete Fourier Transform, so sampling (rasterization, pixelization) issues are mostly ignored.

A rotation of the source buffer about its center doesn’t change the frequencies present in the data; only their orientation. So a rotation in the source data rotates the FT rotates in the exact same way.

As we will see next, this property holds true regardless of the center of rotation, because the FT is invariant with respect to translations.

Translation (with warp-around)

Frequencies arise from the relative position of values in the data field, and not from their absolute position in said field. For this reason, shifting (warp-around included) the source data does not affect the corresponding Fourier Transform in any way.

Invariance to translation

Let’s recall that the idea behind the FT is that “any periodic function can be rewritten as a weighted sum of sines and cosines of different frequencies”. Periodic being the keyword there.

Repetition (tiling)

Tiling the data buffer NxM times (e.g., 2×2 in the example below) produces the same FT, but with frequencies “exploded” every NxM cells, canceling out everywhere else.

This is because no new frequencies are introduced, since we are transforming the same source data. However, the source data is NxM times smaller proportional to the data buffer size (i.e., the frequencies become NxM times higher).

Exploded frequencies on tiling

Data scaling

Normalization and sampling issues aside, scaling the data within the source buffer scales the FT inversely.

This is because encoding smaller data requires higher frequencies, while encoding a larger version of the same data requires lower frequencies.

Inverse effect on scaling

In the particular case of glare (e.g., ArionFX) this means that the diffraction pattern becomes blurry if the iris is sampled small. Or, in other words, for a given iris, the sharpest diffraction pattern possible is achieved when the iris is sampled as large as the data buffer itself.

Note, however, that “large” here means “with respect to the data buffer”, being the size of the data buffer irrelevant as we will see next.

CIE, XYZ, Yxy, RGB, and gamuts

Despite digital images are often given in RGB, RGB is not always a convenient space for image processing. Splitting RGB colors into their luminance and chrominance, and doing the opposite, are very common operations when it comes to image filtering.

The Yxy encoding is a very good solution due to its strong physical/perceptual background. One can go from RGB to XYZ (selecting a certain color-space transform matrix), and then go from XYZ to Yxy using the following formulas:

$x = \frac{ X }{ X + Y + Z }$

$y = \frac{ Y }{ X + Y + Z }$

The inverse transform to go from Yxy to XYZ is given by the following formulas:

$X = Y \cdot \frac{ x }{ y }$

$Z = \frac{ X }{ x } - X - Y$

The color-space transform matrix that turns RGB values into XYZ values (and its inverse) is a simple 3×3 affine transform matrix. The exact values of the matrix depend on the color-space you are working in. For example, the matrix for sRGB can be found in the Wikipedia sRGB page. Actually, one can build a custom RGB-to-XYZ matrix by defining the xy coordinates of the three primary colors (R, G, and B) and the position of the white-point. There is an excellent explanation of this on this page by Ryan Juckett.

XYZ colors, and the Yxy encoding have some very interesting practical properties. Below are some of them.

1- All human-visible colors have positive X, Y, and Z values.

This means that matrices to go from RGB to XYZ can only have positive coefficients, and a valid (positive) RGB color can only map to a positive XYZ triplet. The opposite is generally not true.

2- The Y value of an XYZ color represents the relative luminance of the color as percieved by the human eye.

So if one computes the Y (luminance) field of an RGB image, one gets a grayscale version of the original image.

3- Y (luminance) and xy (chrominance) are fully independent.

This means that if one encodes an RGB color in Yxy, then amplifies or diminishes the luminance Y’=k·Y, and then go back to RGB from Y’xy, the result is k·RGB.

This is a fundamental property in tonemapping algorithms, which typically do range compression on the luminance, and then just re-plug the original chrominance field to the compressed luminance field.

This property also means that one can do any alteration on the chrominance values (xy) and then go back to RGB while preserving the original luminance (Y). This comes handy when implementing image filters such as Tint (xy rotation around the whitepoint) or Saturation (xy amplification away from the whitepoint).

4- The range of Y is [0..INF), but the range of xy is constrained to [0..1]x[0..1].

Since the chrominance values (xy) are normalized they are independent of the luminance value, and hence they have a limited range. Here’s what happens when one selects a fixed luminance (Y=1, for example) and draws the RGB colors corresponding to all the chrominances in x=[0..1] and y=[0..1]:

sRGB triangle (gamut)

The darkened colors are invalid (not plausible) results where at least one of the RGB components is negative. So the only valid values that xy can get are actually constrained to this triangle, which is, in fact, much smaller than the unit square. As a matter of fact, the maximum distance between two chrominance values in Yxy (in sRGB as displayed above) is around 0.6, derived from the length of the longest edge of the gamut triangle.

Different color-space transforms have larger or smaller gamut triangles, as depicted in this CIE 1931 xy gamut comparison.

Note that the gamut triangle is the same regardless of the color luminance.

Diaphragm and f-stop

This is just another image taken from the RCSDK Unit Testing system. The chart displays different polygonal diaphragms at different f-stop values. Doubling the f-stop number, halves the surface light can pass through.

Polygonal diaphragms and f-stop

Glare patterns

Glare in photography is due to Fraunhofer diffraction as light from distant objects passes through the camera diaphragm.

There is a magical connection between Fraunhofer diffraction (physics) and the Fourier Transform (math). As a matter of fact, the intensity of the Fraunhofer diffraction pattern of a certain aperture is given by the squared modulus of the Fourier Transform of said aperture.

Assuming a clean and unobstacled camera, the aperture is the diaphragm shape. Here you have the diffraction patterns that correspond to some basic straight-blade (polygonal) diaphragms.

Glare patterns

Interestingly, the Fourier Transform produces one infinite decaying streak perpendicular to each polygon edge. When the number of edges is even, the streaks overlap in pairs. That is why an hexagonal diaphragm produces 6 streaks, and an heptagonal diaphragm produces 14.

The leftmost pattern happens to be the Airy disk. The Airy disk is a limit case where the number of polygon edges/streaks is infinite.

The examples above were generated at 256×256. The visual definition of the pattern naturally depends on the resolution of the buffers involved in the computation of the Fourier Transform. However, note that the FT has an infinite range. This means that for ideal polygonal shapes, the streaks are infinitely long.

In the practical case, buffers are far from infinite, and you hit one property of the Fourier Transform that is often nothing but an annoyance: the FT is cyclic. The image below depicts what happens when one pumps up the intensity of one of the glare patterns obtained above: the (infinite) streaks, warp-around the (finite) FT buffer.

Cyclic glare pattern

Bonus: Here’s some real-life glare I captured this evening at the European Athletics Championships.

Real-life glare