CIE, XYZ, Yxy, RGB, and gamuts

Despite digital images are often given in RGB, RGB is not always a convenient space for image processing. Splitting RGB colors into their luminance and chrominance, and doing the opposite, are very common operations when it comes to image filtering.

The Yxy encoding is a very good solution due to its strong physical/perceptual background. One can go from RGB to XYZ (selecting a certain color-space transform matrix), and then go from XYZ to Yxy using the following formulas:

x = \frac{ X }{ X + Y + Z }

y = \frac{ Y }{ X + Y + Z }

The inverse transform to go from Yxy to XYZ is given by the following formulas:

X = Y \cdot \frac{ x }{ y }

Z = \frac{ X }{ x } - X - Y

The color-space transform matrix that turns RGB values into XYZ values (and its inverse) is a simple 3×3 affine transform matrix. The exact values of the matrix depend on the color-space you are working in. For example, the matrix for sRGB can be found in the Wikipedia sRGB page. Actually, one can build a custom RGB-to-XYZ matrix by defining the xy coordinates of the three primary colors (R, G, and B) and the position of the white-point. There is an excellent explanation of this on this page by Ryan Juckett.

XYZ colors, and the Yxy encoding have some very interesting practical properties. Below are some of them.

1- All human-visible colors have positive X, Y, and Z values.

This means that matrices to go from RGB to XYZ can only have positive coefficients, and a valid (positive) RGB color can only map to a positive XYZ triplet. The opposite is generally not true.

2- The Y value of an XYZ color represents the relative luminance of the color as percieved by the human eye.

So if one computes the Y (luminance) field of an RGB image, one gets a grayscale version of the original image.

3- Y (luminance) and xy (chrominance) are fully independent.

This means that if one encodes an RGB color in Yxy, then amplifies or diminishes the luminance Y’=k·Y, and then go back to RGB from Y’xy, the result is k·RGB.

This is a fundamental property in tonemapping algorithms, which typically do range compression on the luminance, and then just re-plug the original chrominance field to the compressed luminance field.

This property also means that one can do any alteration on the chrominance values (xy) and then go back to RGB while preserving the original luminance (Y). This comes handy when implementing image filters such as Tint (xy rotation around the whitepoint) or Saturation (xy amplification away from the whitepoint).

4- The range of Y is [0..INF), but the range of xy is constrained to [0..1]x[0..1].

Since the chrominance values (xy) are normalized they are independent of the luminance value, and hence they have a limited range. Here’s what happens when one selects a fixed luminance (Y=1, for example) and draws the RGB colors corresponding to all the chrominances in x=[0..1] and y=[0..1]:

sRGB triangle (gamut)

The darkened colors are invalid (not plausible) results where at least one of the RGB components is negative. So the only valid values that xy can get are actually constrained to this triangle, which is, in fact, much smaller than the unit square. As a matter of fact, the maximum distance between two chrominance values in Yxy (in sRGB as displayed above) is around 0.6, derived from the length of the longest edge of the gamut triangle.

Different color-space transforms have larger or smaller gamut triangles, as depicted in this CIE 1931 xy gamut comparison.

Note that the gamut triangle is the same regardless of the color luminance.

Glare patterns

Glare in photography is due to Fraunhofer diffraction as light from distant objects passes through the camera diaphragm.

There is a magical connection between Fraunhofer diffraction (physics) and the Fourier Transform (math). As a matter of fact, the intensity of the Fraunhofer diffraction pattern of a certain aperture is given by the squared modulus of the Fourier Transform of said aperture.

Assuming a clean and unobstacled camera, the aperture is the diaphragm shape. Here you have the diffraction patterns that correspond to some basic straight-blade (polygonal) diaphragms.

Glare patterns

Interestingly, the Fourier Transform produces one infinite decaying streak perpendicular to each polygon edge. When the number of edges is even, the streaks overlap in pairs. That is why an hexagonal diaphragm produces 6 streaks, and an heptagonal diaphragm produces 14.

The leftmost pattern happens to be the Airy disk. The Airy disk is a limit case where the number of polygon edges/streaks is infinite.

The examples above were generated at 256×256. The visual definition of the pattern naturally depends on the resolution of the buffers involved in the computation of the Fourier Transform. However, note that the FT has an infinite range. This means that for ideal polygonal shapes, the streaks are infinitely long.

In the practical case, buffers are far from infinite, and you hit one property of the Fourier Transform that is often nothing but an annoyance: the FT is cyclic. The image below depicts what happens when one pumps up the intensity of one of the glare patterns obtained above: the (infinite) streaks, warp-around the (finite) FT buffer.

Cyclic glare pattern

Bonus: Here’s some real-life glare I captured this evening at the European Athletics Championships.

Real-life glare

Fast convolutions (III)

These are some more remarks about the performance of the convolution methods described so far. I will be leaving the brute-force algorithm out for obvious reasons.

The plot below represents input size (in pixels) in the x axis, and convolution time (in seconds) in the y axis. So, if you go to x=4096 that means “a convolution of a 4096×4096 image by a 4096×4096 kernel”.

Even competition between FFT-based vs. separable convolution methods

Two conclusions can be made from the plot, which confirm what was explained in my previous post:

1- For large kernels (as large as the image itself) the separable convolution method is O(n^3) and times get to absurd levels very quickly. If you are dealing with large generic images/kernels, the FFT-based method is the way to go.

2- The FFT-based method uses the Fast Fourier Transform, which is O(n^2·log(n)) thanks to some decomposition technique that requires the size of the input data to be (padded to) a power-of-2. For this reason, it takes the same amount of time to do a convolution on a 257×257 image/kernel than on a 512×512 image/kernel, because both cases operate on 512×512 buffers after all. This is why the graph for the FFT method is stepped. When x crosses a power-of-2, the running time goes up all of a sudden and stays stable until the next power-of-2.

The plot was generated with my current implementation of both convolution methods in RCSDK. My FFT uses the Cooley-Tukey algorithm, and everything (i.e., FFT, IFFT, point-wise products, and 1D separable convolutions) makes use of multi-threading. There’s always room for improvement, but the running times seem pretty decent, as we’re speaking of <2s for images up to 4096×4096 in a 16-thread CPU. An implementation in CUDA would be (orders of magnitude) faster, though. 😛

A couple more remarks:

1- The graph of the FFT-based method wouldn’t change if smaller kernels were used, as the algorithm requires the kernel to be padded to the size of the image. However, the graph of the separable method becomes less steep when very small kernels are used:

Separable convolution with a fixed 11×11 kernel (in orange)

2- The running time of the FFT/IFFT is not exactly a trivial subject. Speed in FFT/IFFT algorithms depends not only on the size of the data, but also on the data itself. While generating the above plots, I came across some ‘anomalies’ in the time measurements. Some kernels or some images produced faster or slower results. Some combinations would even produce non-flat steps in the staircase-looking graph. That’s normal, but worth mentioning.