Highly efficient and predictable histogramming for GPUs
By Cedric Nugteren
Algorithm mapping
Histogramming on a GPU:
Histogramming has been mapped onto GPUs prior to this work. Although significant research effort has gone into optimizing the mapping, we show that both the performance and the performance predictability of existing methods can still be improved. We present two novel histogramming methods, both of which achieve higher performance and better predictability than existing methods.
The first novel method (warp private) gives an average performance increase of 33% over existing methods for non-synthetic benchmarks. The second novel method (thread private) gives an average performance increase of 56% over existing methods and is guaranteed to be fully data-independent. While the second method is designed specifically for the Fermi architecture, the first method is also suitable for older architectures.
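The general idea behind "private" histograms can be illustrated with a small CPU-side sketch in C++. This is not the GPU kernel from this work, whose details are not given here; it only assumes that each worker (standing in for a warp or a thread) accumulates into its own sub-histogram, and that the sub-histograms are summed afterwards. The function names `histogram_shared` and `histogram_private`, the strided work distribution, and the `num_workers` parameter are all hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference version: a single histogram that every element updates.
// On a GPU, concurrent updates to the same bin would have to be serialized,
// so the run time depends on how the input values are distributed.
std::vector<std::uint32_t> histogram_shared(const std::vector<std::uint8_t>& data,
                                            std::size_t num_bins) {
    assert(num_bins > 0);
    std::vector<std::uint32_t> bins(num_bins, 0);
    for (std::uint8_t value : data) {
        bins[value % num_bins]++;
    }
    return bins;
}

// Privatized version (illustrative assumption, simulated sequentially):
// each of `num_workers` workers owns one sub-histogram, so no two workers
// ever update the same counter during accumulation.
std::vector<std::uint32_t> histogram_private(const std::vector<std::uint8_t>& data,
                                             std::size_t num_bins,
                                             std::size_t num_workers) {
    assert(num_bins > 0 && num_workers > 0);
    std::vector<std::vector<std::uint32_t>> sub(
        num_workers, std::vector<std::uint32_t>(num_bins, 0));

    // Worker w processes elements w, w + num_workers, w + 2 * num_workers, ...
    for (std::size_t w = 0; w < num_workers; ++w) {
        for (std::size_t i = w; i < data.size(); i += num_workers) {
            sub[w][data[i] % num_bins]++;
        }
    }

    // Reduction step: sum the private sub-histograms bin by bin.
    std::vector<std::uint32_t> bins(num_bins, 0);
    for (std::size_t w = 0; w < num_workers; ++w) {
        for (std::size_t b = 0; b < num_bins; ++b) {
            bins[b] += sub[w][b];
        }
    }
    return bins;
}
```

Both functions return the same counts for any input. The privatized version trades extra memory (one sub-histogram per worker) and a final reduction for the absence of shared counters, which is the property that makes the amount of work independent of the input data in this sketch.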