crypto: powerpc - Add POWER8 optimised crc32c

Use the vector polynomial multiply-sum instructions in POWER8 to
speed up crc32c.

This is just over 41x faster than the slice-by-8 method that it
replaces. Measurements on a 4.1 GHz POWER8 show it sustaining
52 GiB/sec.
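
For reference, the checksum being accelerated can be sketched as a
bit-at-a-time loop. This is only an illustration of what CRC32c
computes, not the vpmsum code added by this patch and not the
slice-by-8 code it replaces (which is a table-driven form of the same
calculation). The names crc32c_update(), crc32c() and
crc32c_selfcheck() are hypothetical and chosen for this sketch; the
polynomial and the "123456789" check value are the standard ones for
CRC32c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Castagnoli polynomial in reflected (LSB-first) form, as used by CRC32c. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/*
 * Hypothetical helper: fold len bytes into a running CRC state, one bit
 * at a time. No initial or final inversion is applied here, so calls
 * can be chained across buffers.
 */
uint32_t crc32c_update(uint32_t crc, const unsigned char *buf, size_t len)
{
	while (len--) {
		crc ^= *buf++;
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 1u)
				crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED;
			else
				crc >>= 1;
		}
	}
	return crc;
}

/* Hypothetical wrapper: standard CRC32c with all-ones init and final XOR. */
uint32_t crc32c(const unsigned char *buf, size_t len)
{
	return crc32c_update(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu;
}

/* Sanity check against the standard CRC32c check value for "123456789". */
void crc32c_selfcheck(void)
{
	assert(crc32c((const unsigned char *)"123456789", 9) == 0xE3069283u);
}
```

A slow reference like this is mainly useful for cross-checking an
optimised implementation on arbitrary buffers and split points.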

A simple btrfs write performance test:

    dd if=/dev/zero of=/mnt/tmpfile bs=1M count=4096
    sync

is over 3.7x faster.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 62fcbb9..a9377be 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -437,6 +437,17 @@
 	  gain performance compared with software implementation.
 	  Module will be crc32c-intel.
 
+config CRYPTO_CRC32C_VPMSUM
+	tristate "CRC32c CRC algorithm (powerpc64)"
+	depends on PPC64 && ALTIVEC
+	select CRYPTO_HASH
+	select CRC32
+	help
+	  CRC32c algorithm implemented using vector polynomial multiply-sum
+	  (vpmsum) instructions, introduced in POWER8. Enable on POWER8
+	  and newer processors for improved performance.
+
+
 config CRYPTO_CRC32C_SPARC64
 	tristate "CRC32c CRC algorithm (SPARC64)"
 	depends on SPARC64