firewire: ohci: Asynchronous Reception rewrite

Move the AR DMA descriptors out of the buffer pages, and map the buffer
pages linearly into the kernel's address space.  This allows the driver
to ignore any page boundaries in the DMA data and thus to avoid any
copying around of packet payloads.

This fixes the bug where S800 packets that are so big (> 4080 bytes)
that they can be split over three pages were not handled correctly.

Due to the changed algorithm, we can now use arbitrarily many buffer
pages, which improves performance because the controller can more easily
unload its DMA FIFO.

Furthermore, using streaming DMA mappings should improve perfomance on
architectures where coherent DMA mappings are not cacheable.  Even on
other architectures, the caching behaviour should be improved slightly
because the CPU no longer writes to the buffer pages.

v2: Detect the last filled buffer page by searching the descriptor's
    residual count value fields in order (like in the old code), instead
    of going backwards through the transfer status fields; it looks as
    if some controllers do not set the latter correctly.

v3: Fix an old resume bug that would now make the handler run into
    a BUG_ON, and replace that check with more useful error handling.
    Increase the buffer size for better performance with non-TI chips.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>

Maxim Levitsky writes:
    Works almost perfectly.  I can still see RCODE_BUSY errors
    sometimes, not very often though.  64K here eliminates these errors
    completely.  This is most likely due to nouveau drivers and lowest
    perf level I use to lower card temperature.  That increases
    latencies too much I think.  Besides that the IO is just perfect.

Tested-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2 files changed