64ffef0b0e5ae74b639c68c9ffa4d83ef899c4ba - platform/external/zlib

commit	64ffef0b0e5ae74b639c68c9ffa4d83ef899c4ba
author	Noel Gordon <noel@chromium.org>	Fri Dec 08 11:39:34 2017 +0000
committer	Commit Bot <commit-bot@chromium.org>	Fri Dec 08 11:39:34 2017 +0000
tree	b44c98765bd4c8dbb37900fad4c8978cbe5e1f30
parent	0f473a1d95364a4390d368bd5c1c1457133bc28a

Improve zlib inflate speed by using SSE2 chunk copy

Using SSE2 chunk copies improves the decoding rate of the PNG 140
corpus by an average 17%, giving a total 37% performance increase
when combined with SIMD adler32 code (https://crbug.com/772870#c3
for details).

Move the arm-specific code back into the main chunk copy code and
generalize the SIMD parts of chunkset_core() with inline function
calls for ARM, and Intel SSE2 devices. This removes the TODO from
arm/chunkcopy_arm.h, and that file can be deleted as a result.

Add SSE2 vector load / store SSE helpers for chunkset_core(). The
existing NEON load code had alignment issues, as noted in review.
Fix that: use unaligned loads in the ARM helper code.

Change chunkcopy.h to use __builtin_memcpy if it's available, use
zmemcpy otherwise such as on MSVC. Also call x86_check_features()
in inflateInit2_() to keep the adler32 SIMD code path enabled.

Update BUILD.gn to conditionally compile the SIMD chunk copy code
on Intel SSE2 and ARM NEON devices. Update names.h to add the new
symbol defined by the inflate chunk copy code path.

Code had various comment styles; pick one and use it consistently
everywhere. Add inffast_chunk.h TODO(cblume).

Bug: 772870
Change-Id: I47004c68ee675acf418825fb0e1f8fa8018d4342
Reviewed-on: https://chromium-review.googlesource.com/708834
Commit-Queue: Noel Gordon <noel@chromium.org>
Reviewed-by: Chris Blume <cblume@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#522764}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: c293a3255eb27dee8879f85f2c45dedff58e2452
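
Illustrative note: the chunk copy idea described in the message above can be sketched roughly as follows. This is not the code from this change. The names sketch_chunkcopy, load_chunk, store_chunk and CHUNK_SIZE are hypothetical, and the 16-byte chunk width and the memcpy fallback for non-SSE2 builds are assumptions made for illustration. The sketch uses unaligned SSE2 loads and stores where available, and copies whole chunks even when fewer bytes are needed, so both buffers need up to 15 bytes of slack past the requested length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 16 /* assumed chunk width: one 128-bit vector */

#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i chunk_t;

/* Unaligned 16-byte load: safe for any source address. */
static inline chunk_t load_chunk(const unsigned char *p) {
    return _mm_loadu_si128((const __m128i *)p);
}

/* Unaligned 16-byte store: safe for any destination address. */
static inline void store_chunk(unsigned char *p, chunk_t v) {
    _mm_storeu_si128((__m128i *)p, v);
}
#else
/* Portable fallback (an assumption of this sketch): a fixed-size memcpy,
 * which compilers typically lower to a single vector move. */
typedef struct { unsigned char b[CHUNK_SIZE]; } chunk_t;

static inline chunk_t load_chunk(const unsigned char *p) {
    chunk_t v;
    memcpy(v.b, p, CHUNK_SIZE);
    return v;
}

static inline void store_chunk(unsigned char *p, chunk_t v) {
    memcpy(p, v.b, CHUNK_SIZE);
}
#endif

/*
 * Copy len bytes (len > 0) from src to dst, one whole chunk at a time.
 * The first chunk covers the odd-sized head, so every later step is a
 * full chunk. May read and write up to CHUNK_SIZE - 1 bytes past len;
 * the caller must guarantee that slack. The buffers must not overlap.
 * Returns dst + len.
 */
static unsigned char *sketch_chunkcopy(unsigned char *dst,
                                       const unsigned char *src,
                                       size_t len) {
    size_t head;

    assert(len > 0);
    head = (len - 1) % CHUNK_SIZE + 1; /* 1..CHUNK_SIZE bytes */

    store_chunk(dst, load_chunk(src));
    dst += head;
    src += head;
    len -= head;

    /* len is now a multiple of CHUNK_SIZE. */
    while (len > 0) {
        store_chunk(dst, load_chunk(src));
        dst += CHUNK_SIZE;
        src += CHUNK_SIZE;
        len -= CHUNK_SIZE;
    }
    return dst;
}
```

Handling the head first keeps the loop free of a tail branch: a 20-byte copy becomes one 16-byte store advanced by 4 bytes, then one more full 16-byte store.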