Use a different algorithm with the low entropy source for field trials.

The new algorithm maps the original 13-bit low entropy source to a
new 13-bit entropy value through a lookup table that is shuffled using
a hash of the trial name as the seed.

The algorithm is roughly as follows:
1. Take the low entropy source as an integer in the range [0, 8191].
2. Generate an identity mapping of size 8192, where mapping[i] == i.
3. Seed a Mersenne Twister random number generator with a hash of the
   field trial name.
4. Use the seeded random number generator to shuffle the mapping array.
5. Map the low entropy source through the mapping array, i.e.
   entropy' = mapping[entropy].
6. Divide the resulting entropy' by 8192 to produce a double in the range
   [0, 1) that will be used for bucketing in field_trial.cc.
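
The steps above can be sketched in standalone C++11 as follows. This is an
illustration only, not the code in this change: it uses std::mt19937 from
<random> in place of the mt19937ar library, a hand-written Fisher-Yates
shuffle in place of std::random_shuffle (so the resulting order will not
necessarily match), and takes a hypothetical trial_seed argument where the
real code hashes the trial name. The names UniformBelow, PermutedMapping and
PermutedEntropy are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <utility>
#include <vector>

// Returns a uniformly distributed value in [0, range) by discarding raw
// generator outputs that would bias the modulo (rejection sampling).
uint32_t UniformBelow(std::mt19937& rng, uint32_t range) {
  assert(range > 0);
  const uint32_t max = std::numeric_limits<uint32_t>::max();
  const uint32_t limit = (max / range) * range;  // A multiple of |range|.
  uint32_t value;
  do {
    value = static_cast<uint32_t>(rng());
  } while (value >= limit);
  return value % range;
}

// Builds the identity mapping and shuffles it with a generator seeded from
// |trial_seed| (assumed here to stand in for the hash of the trial name).
std::vector<uint16_t> PermutedMapping(uint32_t trial_seed, size_t size) {
  assert(size > 0 && size <= 65536);
  std::vector<uint16_t> mapping(size);
  for (size_t i = 0; i < size; ++i)
    mapping[i] = static_cast<uint16_t>(i);

  std::mt19937 rng(trial_seed);
  // Fisher-Yates shuffle: swap each position with a uniformly chosen
  // position at or before it.
  for (size_t i = size - 1; i > 0; --i) {
    const size_t j = UniformBelow(rng, static_cast<uint32_t>(i + 1));
    std::swap(mapping[i], mapping[j]);
  }
  return mapping;
}

// Maps |low_entropy| through the shuffled table and scales it to [0, 1).
double PermutedEntropy(uint16_t low_entropy, uint32_t trial_seed,
                       size_t size) {
  assert(low_entropy < size);
  return PermutedMapping(trial_seed, size)[low_entropy] /
         static_cast<double>(size);
}
```

For example, PermutedEntropy(42, seed, 8192) returns the same value on every
call with the same seed, and different seeds generally send the same source
value to unrelated positions in [0, 1).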

The above algorithm improves uniformity over the existing entropy provider when
the 13-bit entropy source is used, while still producing very little overlap of
buckets between different field trials.
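
One way to see the uniformity: the shuffled mapping is a permutation, so
across all 8192 possible source values the outputs are exactly 0..8191, each
once, regardless of the trial name. The sketch below counts how those outputs
fall into equal-width buckets over [0, 1). Equal-width bucketing and the name
BucketCounts are assumptions made for illustration; the actual bucketing
lives in field_trial.cc and is not shown in this change.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many of the values 0..size-1 land in each of |num_buckets|
// equal-width buckets over [0, 1), where value v is scaled to v / size.
std::vector<size_t> BucketCounts(size_t size, size_t num_buckets) {
  assert(size > 0 && num_buckets > 0);
  std::vector<size_t> counts(num_buckets, 0);
  for (size_t v = 0; v < size; ++v)
    ++counts[v * num_buckets / size];  // floor((v / size) * num_buckets)
  return counts;
}
```

Under that assumption, BucketCounts(8192, 8) gives 1024 values per bucket, and
BucketCounts(8192, 100) gives 81 or 82 per bucket, so bucket sizes differ by
at most one value when the source itself is uniformly distributed.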

Adds third_party library mt19937ar, an implementation of Mersenne Twister, for
the seeded random number generation. This is needed until C++11 becomes available
for use in Chromium, at which point C++11's <random> could be used.

BUG=143239
TEST=Unit tests. Additionally, verified that the new algorithm produces uniform results
with very little overlap of buckets between different field trials.

Review URL: https://chromiumcodereview.appspot.com/10830318

git-svn-id: svn://svn.chromium.org/chrome/trunk/src@153322 0039d316-1c4b-4281-b951-d872f2087c98
diff --git a/chrome/common/metrics/entropy_provider.cc b/chrome/common/metrics/entropy_provider.cc
new file mode 100644
index 0000000..ac7c0639
--- /dev/null
+++ b/chrome/common/metrics/entropy_provider.cc
@@ -0,0 +1,124 @@
+// Copyright (c) 2012 The Chromium Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file.
+
+#include "chrome/common/metrics/entropy_provider.h"
+
+#include <algorithm>
+#include <limits>
+#include <vector>
+
+#include "base/logging.h"
+#include "base/rand_util.h"
+#include "base/sha1.h"
+#include "base/sys_byteorder.h"
+
+namespace metrics {
+
+namespace internal {
+
+SeededRandGenerator::SeededRandGenerator(uint32 seed) {
+  mersenne_twister_.init_genrand(seed);
+}
+
+SeededRandGenerator::~SeededRandGenerator() {
+}
+
+uint32 SeededRandGenerator::operator()(uint32 range) {
+  // Based on base::RandGenerator().
+  DCHECK_GT(range, 0u);
+
+  // We must discard random results above this number, as they would
+  // make the random generator non-uniform (consider e.g. if
+  // MAX_UINT32 was 7 and |range| was 5, then a result of 1 would be twice
+  // as likely as a result of 3 or 4).
+  uint32 max_acceptable_value =
+      (std::numeric_limits<uint32>::max() / range) * range - 1;
+
+  uint32 value;
+  do {
+    value = mersenne_twister_.genrand_int32();
+  } while (value > max_acceptable_value);
+
+  return value % range;
+}
+
+uint32 HashName(const std::string& name) {
+  // SHA-1 is designed to produce a uniformly random spread in its output space,
+  // even for nearly-identical inputs.
+  unsigned char sha1_hash[base::kSHA1Length];
+  base::SHA1HashBytes(reinterpret_cast<const unsigned char*>(name.c_str()),
+                      name.size(),
+                      sha1_hash);
+
+  uint32 bits;
+  COMPILE_ASSERT(sizeof(bits) < sizeof(sha1_hash), need_more_data);
+  memcpy(&bits, sha1_hash, sizeof(bits));
+
+  return base::ByteSwapToLE32(bits);
+}
+
+void PermuteMappingUsingTrialName(const std::string& trial_name,
+                                  std::vector<uint16>* mapping) {
+  for (size_t i = 0; i < mapping->size(); ++i)
+    (*mapping)[i] = static_cast<uint16>(i);
+
+  SeededRandGenerator generator(HashName(trial_name));
+  std::random_shuffle(mapping->begin(), mapping->end(), generator);
+}
+
+}  // namespace internal
+
+SHA1EntropyProvider::SHA1EntropyProvider(const std::string& entropy_source)
+    : entropy_source_(entropy_source) {
+}
+
+SHA1EntropyProvider::~SHA1EntropyProvider() {
+}
+
+double SHA1EntropyProvider::GetEntropyForTrial(
+    const std::string& trial_name) const {
+  // Given enough input entropy, SHA-1 will produce a uniformly random spread
+  // in its output space. In this case, the input entropy that is used is the
+  // combination of the original |entropy_source_| and the |trial_name|.
+  //
+  // Note: If |entropy_source_| has very low entropy, such as 13 bits or less,
+  // it has been observed that this method does not result in a uniform
+  // distribution given the same |trial_name|. When using such a low entropy
+  // source, PermutedEntropyProvider should be used instead.
+  std::string input(entropy_source_ + trial_name);
+  unsigned char sha1_hash[base::kSHA1Length];
+  base::SHA1HashBytes(reinterpret_cast<const unsigned char*>(input.c_str()),
+                      input.size(),
+                      sha1_hash);
+
+  uint64 bits;
+  COMPILE_ASSERT(sizeof(bits) < sizeof(sha1_hash), need_more_data);
+  memcpy(&bits, sha1_hash, sizeof(bits));
+  bits = base::ByteSwapToLE64(bits);
+
+  return base::BitsToOpenEndedUnitInterval(bits);
+}
+
+PermutedEntropyProvider::PermutedEntropyProvider(
+    uint16 low_entropy_source,
+    size_t low_entropy_source_max)
+    : low_entropy_source_(low_entropy_source),
+      low_entropy_source_max_(low_entropy_source_max) {
+  DCHECK_LT(low_entropy_source, low_entropy_source_max);
+  DCHECK_LE(low_entropy_source_max, std::numeric_limits<uint16>::max());
+}
+
+PermutedEntropyProvider::~PermutedEntropyProvider() {
+}
+
+double PermutedEntropyProvider::GetEntropyForTrial(
+    const std::string& trial_name) const {
+  std::vector<uint16> mapping(low_entropy_source_max_);
+  internal::PermuteMappingUsingTrialName(trial_name, &mapping);
+
+  return mapping[low_entropy_source_] /
+         static_cast<double>(low_entropy_source_max_);
+}
+
+}  // namespace metrics