Scrambled Halton

The Halton sequence, which is one of my favourite algorithms ever, can be used for efficient stratified multi-dimensional sampling. Some references:

Van der Corput sequence
Halton sequence

It is possible to do stratified sampling of hyper-points in the s-dimensional unit hyper-cube by picking one consecutive dimension of the Halton series for each component. A convenient way to do so is to use the first s prime numbers as the basis for each Halton sequence.

It is well-known, however, that while this approach works great for low dimensions, high dimensions often exhibit a great degree of undesired correlation. The following image displays a grid where each cell combines two components of the 32-dimensional Halton hyper-cube.

Raw 32-dimensional Halton hyper-cube

One can easily spot how some pairs exhibit an obvious pattern, while others fill their corresponding 2D area very densely. This happens more aggressively for higher dimensions (i.e., to the right/bottom in the image) and for pairs formed with close components (i.e., near the diagonal in the image). Click to enlarge!

A successful approach to dissolve this problem without losing the good properties of stratification is to do “random digit scrambling” (a.k.a. rds). During the construction of a Halton number, digits in the range [0..base[ are combined. Given a Pseudo-Random Permutation of length=base, all that one must do is use PRP(digit) instead of digit directly. This somewhat shuffles Halton pairs in rows and columns in a strong way so the correlation disappears. However, since the PRP is a bijection, the good properties of stratification are generally preserved.

How to build a strong and efficient randomised PRP of an arbitrary length is an interesting subject which details I won’t get into here.

Here’s the scrambling strategy in action:

Scrambled 32-dimensional Halton hyper-cube

Now all the blocks in the grid look equally dense. Magic!

As long as one picks good PRPs, it is possible to generate any number of different samplings, all with the same good properties.

Uniform vs. stratified

This is a classic subject in numerical (Monte Carlo) integration.

Uniform 2D distribution vs. Halton series for the first 2 dimensions

To the left: 32768 points in a 512×512 image using a uniform random number generator (Mersenne Twister). To the right, the first 32768 pairs in the Halton series, using dimensions #0 and #1. Click to enlarge!