Use a Huffman trie for top domain storage in UrlFormatter

UrlFormatter uses ICU's spoof checker to detect lookalike domains that contain Unicode confusables. It does this by extracting from the given domain a skeleton string that represents the domain's visual appearance. For example, google.com and googlé[.]com have the same skeleton string (google.corn).
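
To make the skeleton idea concrete, here is a toy sketch. It is not ICU's algorithm or API: ToySkeleton and its two mapping rules ("m" becomes "rn", UTF-8 "é" becomes "e") are hypothetical and cover only the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy stand-in for a skeleton computation (hypothetical, NOT ICU's
// implementation). It knows only two rules: "m" -> "rn" and "é" -> "e".
std::string ToySkeleton(const std::string& host) {
  std::string out;
  for (std::size_t i = 0; i < host.size(); ++i) {
    // U+00E9 (é) is encoded in UTF-8 as the byte pair 0xC3 0xA9.
    if (i + 1 < host.size() && host[i] == '\xC3' && host[i + 1] == '\xA9') {
      out += 'e';
      ++i;  // Skip the second byte of the two-byte sequence.
    } else if (host[i] == 'm') {
      out += "rn";
    } else {
      out += host[i];
    }
  }
  return out;
}

// Self-check: both spellings collapse to the same skeleton.
void CheckToySkeletonExample() {
  assert(ToySkeleton("google.com") == ToySkeleton("googl\xC3\xA9.com"));
}
```

Two hosts are treated as lookalikes when their skeletons compare equal, which is why the skeletons of the top domains are what gets stored.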

In addition to this, we want to display a "Did you mean to go to..." UI for navigations involving IDN if the domain name matches a top 10K domain. In order to do that, we need to store the domains associated with ICU skeletons.

UrlFormatter currently uses a DAFSA to store the skeletons of the top 10K domains. The DAFSA does not, and cannot, store the actual domain that each skeleton was derived from. To make the domain available alongside its skeleton, this CL changes the underlying storage from the DAFSA to the Huffman trie used by net's preload list code.

This CL:
- Generates the Huffman trie from the top domain list at compile time.
- Decodes the Huffman trie at runtime in IDNSpoofChecker::SimilarToTopDomains.
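
As a rough illustration of why a trie can return the actual domain for a matching skeleton, here is a minimal sketch. It is an assumption-laden simplification: it uses a plain pointer-based trie built at runtime, not the bit-packed, Huffman-coded trie that the preload list code generates at build time, and SkeletonTrie, Insert and Lookup are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Simplified, uncompressed trie keyed by skeleton (hypothetical sketch).
// Each terminal node carries the real top domain for that skeleton.
class SkeletonTrie {
 public:
  // Associates `skeleton` with the top domain it was derived from.
  void Insert(const std::string& skeleton, std::string domain) {
    assert(!skeleton.empty() && !domain.empty());
    Node* node = &root_;
    for (char c : skeleton) {
      std::unique_ptr<Node>& child = node->children[c];
      if (!child) child.reset(new Node());
      node = child.get();
    }
    node->domain = std::move(domain);
  }

  // Returns the stored domain for `skeleton`, or nullptr if absent.
  const std::string* Lookup(const std::string& skeleton) const {
    const Node* node = &root_;
    for (char c : skeleton) {
      auto it = node->children.find(c);
      if (it == node->children.end()) return nullptr;
      node = it->second.get();
    }
    // An empty domain marks a node that is only a prefix of some entry.
    return node->domain.empty() ? nullptr : &node->domain;
  }

 private:
  struct Node {
    std::map<char, std::unique_ptr<Node>> children;
    std::string domain;  // Non-empty only on terminal nodes.
  };
  Node root_;
};
```

With an entry inserted as ("google.corn", "google.com"), looking up the skeleton of a lookalike host yields "google.com", which is the value a "Did you mean to go to..." prompt needs.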

The design doc for the preload list migration is here: https://docs.google.com/document/d/11rqIozUDaK6DvNeu436SL3Coj65J5vhD9HVftOi-RrA/edit

As mentioned in the doc, microbenchmarks indicate that this change has minimal impact on binary size and speed (51KB of additional binary size and 4 microseconds of additional time per lookup).

Bug: 843361
Change-Id: If98b8161bf836fec6ba74e68587bd2159f4eb3d5
Reviewed-on: https://chromium-review.googlesource.com/1106539
Commit-Queue: Mustafa Emre Acer <[email protected]>
Reviewed-by: Nick Harper <[email protected]>
Reviewed-by: Peter Kasting <[email protected]>
Cr-Commit-Position: refs/heads/master@{#572055}
21 files changed