Randomly Reordering an Array - Beware of Bias
So I came upon the following blog post and found that I (just as the author has done) have been using a naive algorithm for randomly reordering arrays (mostly notably used when shuffling a deck of cards):
http://scottonwriting.net/sowblog/posts/13356.aspx
I agree I have seen this simple iterate-and-swap reordering algorithm all over the place, even in my education at college. I can think of a few applications that I am using it in right now. Here is some example C# code demonstrating the faulty algorithm:
Random rnd = new Random(); for (int i = 0; i < deck.Length; i++) { // Set swapWithPos to a random position, // such that 0 <= swapWithPos < deck.Length int swapWithPos = rnd.Next(deck.Length); // Swap the value at the "current" // position (i) with value at swapWithPos int tmp = deck[i]; deck[i] = deck[swapWithPos]; deck[swapWithPos] = tmp; }
Apparently, there is some bias introduced by this algorithm while reordering the array. For example, with a 3 item array of values [0, 1, 2], there are 27 outcomes, but only 6 unique permutations. Half of the permutations ([0, 2, 1], [1, 0, 2], and [1, 2, 0]) each appear 5 times in the 27 outcomes, while the other half of the permutations ([0, 1, 2], [2, 0, 1], and [2, 1, 0]) each appear only 4 times each. There is a slight bias toward half of the permutations in this example.
With a few modifications, this algorithm can be morphed into the Fisher-Yates Shuffle (also known as the Knuth Shuffle), which provides for an unbiased way to guarantee each permutation to be equally likely. Here is some example C# code showing the more correct reordering algorithm:
Random rnd = new Random(); // The number of items left to shuffle (loop invariant). int i = deck.length; while (i > 1) { // Set swapWithPos to a random position, // such that 0 <= swapWithPos < i int swapWithPos = rnd.Next(i); // Decrement i in order to make it the last pertinent index --i; // Swap the value at the "current" // position (i) with value at swapWithPos int tmp = deck[i]; deck[i] = deck[swapWithPos]; deck[swapWithPos] = tmp; }
This algorithm has built into it some method for setting aside a growing set of already shuffled items as it iterates through the array. Perhaps it is this feature that eliminates the bias toward a certain subset of the permutations (I can’t really tell from my readings).
FOLLOWUP (01/05/09): I found this article today on the same subject, but with a lot more graphs and detailed explanations: The Danger of Naïveté