Add support for additional languages in the typo detector

This checkin adds typo databases for six additional languages, as well
as several fixes to the infrastructure.

First, it now supports "globbing", since for example the German typo
database contains glob patterns of the form
"asymetrisch*->asymmetrisch*".
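
As a rough illustration of what such a glob entry means (a sketch only,
not the detector's actual code; the function names `parse_entry` and
`suggest` are hypothetical, and it assumes the "*" appears only at the
end of a pattern):

```python
def parse_entry(entry):
    """Split a dictionary entry of the form 'typo->fix' into its two halves."""
    typo, sep, fix = entry.partition("->")
    if not sep:
        raise ValueError(f"missing '->' separator in {entry!r}")
    return typo, fix


def suggest(word, entry):
    """Return the corrected word if `word` matches the entry, else None.

    Assumption: a trailing '*' on the typo side matches any suffix, and a
    trailing '*' on the fix side copies that suffix into the replacement.
    """
    typo, fix = parse_entry(entry)
    if typo.endswith("*"):
        prefix = typo[:-1]
        if not word.startswith(prefix):
            return None
        suffix = word[len(prefix):]
        return fix[:-1] + suffix if fix.endswith("*") else fix
    # Plain (non-glob) entries must match the whole word exactly.
    return fix if word == typo else None
```

With this sketch, `suggest("asymetrische", "asymetrisch*->asymmetrisch*")`
carries the inflected ending over to the corrected form.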

Second, it supports multiword typos (such as "all zu->allzu"). This
caused some complications, since the typo detector can now match
beyond the word boundaries it was passed.
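
The complication can be sketched as follows (an illustration under
assumptions, not the detector's real implementation; `match_typo` and
its signature are hypothetical). The caller passes in the range of one
word, but the returned match end may lie past that range:

```python
def match_typo(text, begin, end, typos):
    """Look for a typo starting at `begin`, where [begin, end) is one word.

    `typos` maps typo strings (possibly containing spaces) to fixes.
    Returns (match_end, fix) for the longest match, or None. Note that
    match_end can be greater than `end` for multiword typos.
    """
    best = None
    for typo, fix in typos.items():
        stop = begin + len(typo)
        if stop < end:
            continue  # must cover at least the whole word passed in
        if not text.startswith(typo, begin):
            continue
        if stop < len(text) and text[stop].isalnum():
            continue  # match would end in the middle of a longer word
        if best is None or stop > best[0]:
            best = (stop, fix)
    return best
```

A caller that replaces or highlights text therefore has to use the
returned end offset rather than the end of the word it started with.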

Third, it adds validation code for the typo dictionaries, which
uncovered a number of inconsistencies (duplicate entries, using
"-" instead of "->" for some separators, etc). There's now a unit test
which produces a cleaned up version of each dictionary file, as well
as tests to ensure that the ASCII and the UTF-8 comparison methods are
in sync (this uncovered some bugs, which were fixed).
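
A minimal sketch of the kind of check involved (hypothetical code, not
the actual unit test; it assumes one "typo->fix" entry per line, that
lines starting with "#" are comments, and that the cleaned output is
sorted, none of which is confirmed here):

```python
def validate(lines):
    """Check dictionary lines and return (problems, cleaned_entries).

    `problems` is a list of (line_number, message) tuples; the cleaned
    list keeps the first occurrence of each typo and drops bad lines.
    """
    problems = []
    cleaned = {}
    for number, line in enumerate(lines, start=1):
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # assumed comment/blank-line convention
        typo, sep, fix = entry.partition("->")
        if not sep:
            # Catches entries written with "-" instead of "->".
            problems.append((number, "missing '->' separator"))
            continue
        if typo in cleaned:
            problems.append((number, "duplicate entry"))
            continue
        cleaned[typo] = fix
    return problems, [f"{t}->{f}" for t, f in sorted(cleaned.items())]
```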

Finally, it fixes the HTML reporter so that it handles UTF-8
properly.

Change-Id: Ie32cbbe489687a7b50184696a027f87c2e21c409
20 files changed