Randomly Reordering an Array - Beware of Bias

Comments

So I came upon the following blog post and found that I (just as the author has done) have been using a naive algorithm for randomly reordering arrays (mostly notably used when shuffling a deck of cards):

http://scottonwriting.net/sowblog/posts/13356.aspx

I agree I have seen this simple iterate-and-swap reordering algorithm all over the place, even in my education at college.  I can think of a few applications that I am using it in right now.  Here is some example C# code demonstrating the faulty algorithm:

Random rnd = new Random();

for (int i = 0; i < deck.Length; i++)
{
   // Set swapWithPos to a random position,
   //    such that 0 <= swapWithPos < deck.Length
   int swapWithPos = rnd.Next(deck.Length);

   // Swap the value at the "current"
   //    position (i) with value at swapWithPos
   int tmp = deck[i];
   deck[i] = deck[swapWithPos];
   deck[swapWithPos] = tmp;
}

Apparently, there is some bias introduced by this algorithm while reordering the array.  For example, with a 3 item array of values [0, 1, 2], there are 27 outcomes, but only 6 unique permutations.  Half of the permutations ([0, 2, 1], [1, 0, 2], and [1, 2, 0]) each appear 5 times in the 27 outcomes, while the other half of the permutations ([0, 1, 2], [2, 0, 1], and [2, 1, 0]) each appear only 4 times each.  There is a slight bias toward half of the permutations in this example.

With a few modifications, this algorithm can be morphed into the Fisher-Yates Shuffle (also known as the Knuth Shuffle), which provides for an unbiased way to guarantee each permutation to be equally likely.  Here is some example C# code showing the more correct reordering algorithm:

Random rnd = new Random();

// The number of items left to shuffle (loop invariant).
int i = deck.length;

while (i > 1)
{
    // Set swapWithPos to a random position,
    //    such that 0 <= swapWithPos < i
    int swapWithPos = rnd.Next(i);

    // Decrement i in order to make it the last pertinent index
    --i;

    // Swap the value at the "current"
    //    position (i) with value at swapWithPos
    int tmp = deck[i];
    deck[i] = deck[swapWithPos];
    deck[swapWithPos] = tmp;
}

This algorithm has built into it some method for setting aside a growing set of already shuffled items as it iterates through the array.  Perhaps it is this feature that eliminates the bias toward a certain subset of the permutations (I can’t really tell from my readings).

FOLLOWUP (01/05/09):  I found this article today on the same subject, but with a lot more graphs and detailed explanations:  The Danger of Naïveté