IEEE TC Editor's Pick 2016
Posted on 31-12-2016 by Gert-Jan van den Braak Tags: GPU, scratchpad memory, hash functions

On the last day of 2016 we are happy to announce that our paper Configurable XOR Hash Functions for Banked Scratchpad Memories in GPUs has been selected as one of the four 2016 editor's picks of the IEEE Transactions on Computers.

The other three editor's picks are:

  • Memory Bandwidth Management for Efficient Performance Isolation in Multi-Core Platforms by Heechul Yun, Gang Yao, Rodolfo Pellizzoni, Marco Caccamo and Lui Sha
  • Optimised Multiplication Architectures for Accelerating Fully Homomorphic Encryption by Xiaolin Cao, Ciara Moore, Maire O'Neill, Elizabeth O'Sullivan and Neil Hanley
  • A New Design of In-Memory File System Based on File Virtual Address Framework by Edwin H.-M. Sha, Xianzhang Chen, Qingfeng Zhuge, Liang Shi and Weiwen Jiang

IEEE TC July 2016 Spotlight Paper
Posted on 15-6-2016 by Gert-Jan van den Braak Tags: GPU, scratchpad memory, hash functions

IEEE Transactions on Computers (TC) has selected our paper Configurable XOR Hash Functions for Banked Scratchpad Memories in GPUs as the July 2016 Spotlight Paper. This month you can download the article for free from the IEEE TC website.

We also made a short video presentation which can be watched on YouTube (English). The video presentation is also available in Spanish and Chinese.

Microserver has landed
Posted on 24-11-2015 by Gert-Jan van den Braak Tags: Bones, microserver

The first microserver from IBM/Astron has arrived at the TU/e. We have started benchmarking the microserver. Next, we plan to update our Bones source-to-source compiler and add the microserver as a target in the coming weeks.

More news on the microserver: IBM and ASTRON provide microserver prototypes to three Dutch partners, Nieuwe microserver van ASTRON kan Noord-Nederland honderden banen opleveren ("New ASTRON microserver could create hundreds of jobs in the northern Netherlands", in Dutch) and Data niet meer naar computer, maar computer naar de data ("No longer the data to the computer, but the computer to the data", in Dutch).

Test setup: the microserver in a box

Bones v1.6.0
Posted on 1-4-2015 by Gert-Jan van den Braak Tags: compiler, skeletons, A-Darwin, Bones

Today we released version 1.6.0 of our A-Darwin and Bones tools. In this release it is now possible to have multiple scops in a single source file. Several bugs have also been fixed, including the processing of empty scops and a skeleton-argument mismatch. The source code is available on GitHub; the documentation can be found online or as a PDF.

If you have any questions, suggestions or bug reports, feel free to contact us.

About Bones
Bones is a source-to-source compiler based on algorithmic skeletons and a new algorithm classification. The compiler takes C code annotated with class information as input and generates parallelized target code. Targets include NVIDIA GPUs (through CUDA), AMD GPUs (through OpenCL) and x86 CPUs (through OpenCL and OpenMP). More information is available on the Bones project page or in the paper Bones: An Automatic Skeleton-Based C-to-CUDA Compiler for GPUs.

Best of EuroPar '14
Posted on 1-9-2014 by Cedric Nugteren Tags: GPU, programming, compiler, conference

This year’s EuroPar was held in Porto, a city at the mouth of the river Douro in the north of Portugal.

The conference started with two full days of workshops, including the 12th HeteroPar, the 7th MuCoCoS and the 7th UCHPC. Some highlights from the workshops related to our work:

  • “A visual programming model to implement coarse-grained DSP applications on parallel and heterogeneous clusters”. An image-processing language to create flows of kernels.
  • “An Empirical Evaluation of GPGPU Performance Model”. A summary of several existing GPU models, including a couple of test cases.
  • “A Study of the Potential of Locality-Aware Thread Scheduling for GPUs”. This is my own work on optimising thread scheduling for multi-threaded architectures such as the GPU.
  • “Exploiting Hidden Non-uniformity of Uniform Memory Access on Manycore CPUs”. The point of this work was to demonstrate the non-uniformity effects in the Xeon Phi co-processors. Some effects were not fully understood.

The main program included the following interesting talks:

Oh, and here is a picture of my presentation:

A Detailed GPU Cache Model Based on Reuse Distance Theory
Posted on 28-2-2014 by Gert-Jan van den Braak Tags: GPU, cache model

Last week we presented our paper on a GPU cache model at the 20th IEEE International Symposium on High-Performance Computer Architecture (HPCA) in Orlando, Florida. The slides of the presentation are now available, and the source code of the cache model is available on GitHub. You can find the full publication on our publications page.

As modern GPUs rely partly on their on-chip memories to counter the imminent off-chip memory wall, the efficient use of their caches has become important for performance and energy. However, optimising cache locality systematically requires insight into and prediction of cache behaviour. On sequential processors, stack distance or reuse distance theory is a well-known means to model cache behaviour. However, it is not straightforward to apply this theory to GPUs, mainly because of the parallel execution model and fine-grained multi-threading. This work extends reuse distance to GPUs by modelling: 1) the GPU’s hierarchy of threads, warps, threadblocks, and sets of active threads, 2) conditional and non-uniform latencies, 3) cache associativity, 4) miss-status holding registers, and 5) warp divergence. We implement the model in C++ and extend the Ocelot GPU emulator to extract lists of memory addresses. We compare our model with measured cache miss rates for the Parboil and PolyBench/GPU benchmark suites, showing a mean absolute error of 6% and 8% for two cache configurations. We show that our model is faster and even more accurate than the GPGPU-Sim simulator.

Download attachment: PDF
Computing Laws: Origins, Standing, and Impact
Posted on 10-1-2014 by Zhenyu Ye Tags: architecture, computing laws

In the last group meeting, we had a casual discussion about the unreasonable effectiveness of simple laws in computing. It turns out that the December 2013 issue of IEEE Computer has a special section on Computing Laws: Origins, Standing, and Impact. It covers several classic laws:

Three Fingered Jack: Productively Addressing Platform Diversity
Posted on 29-11-2013 by Zhenyu Ye Tags: CPU, GPU, FPGA, programming, architecture, compiler, vision, OpenCL, multicore, SIMD, High Level Synthesis

Three Fingered Jack: Productively Addressing Platform Diversity is the PhD thesis of David Sheffield from ParLab. The thesis addresses the issue of implementing computer vision applications (among others) on different targets, including multicore processors, data-parallel processors and custom hardware. This work is related to some of our ongoing research projects.