The proceedings of ASPLOS 2011 are available.
Data structures in the multicore age, by Nir Shavit, is an interesting article in Communications of the ACM (2011). It shows that optimizing even a simple data structure, such as a stack, takes a surprising amount of effort on multicore hardware. The stack example is comparable to optimizing histogramming on a GPU. Although algorithms and data structures are tightly coupled, the author’s emphasis on data structures seems to motivate a data-structure-driven exploration of parallel algorithms.
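As a rough illustration of why even a stack gets intricate, here is a minimal sketch of a Treiber-style lock-free stack in C++. It is not code from the article. The class name `LockFreeStack` and the retired-list scheme are assumptions made for this sketch: popped nodes are parked on a retired list and freed only in the destructor, which sidesteps the ABA and use-after-free problems at the cost of holding memory for the lifetime of the stack.

```cpp
#include <atomic>

// Hypothetical Treiber-style lock-free stack of ints (illustrative sketch only).
class LockFreeStack {
    struct Node {
        int value;
        Node* next;          // link inside the live stack, never rewritten after push
        Node* retired_next;  // separate link used only on the retired list
    };

    std::atomic<Node*> head_{nullptr};     // top of the live stack
    std::atomic<Node*> retired_{nullptr};  // popped nodes awaiting destruction

public:
    LockFreeStack() = default;
    LockFreeStack(const LockFreeStack&) = delete;
    LockFreeStack& operator=(const LockFreeStack&) = delete;

    // The destructor must run only after all other threads are done with the stack.
    ~LockFreeStack() {
        for (Node* n = head_.load(); n != nullptr;) {
            Node* next = n->next;
            delete n;
            n = next;
        }
        for (Node* n = retired_.load(); n != nullptr;) {
            Node* next = n->retired_next;
            delete n;
            n = next;
        }
    }

    void push(int v) {
        Node* n = new Node{v, head_.load(), nullptr};
        // On failure, compare_exchange_weak reloads the current head into n->next.
        while (!head_.compare_exchange_weak(n->next, n)) {
        }
    }

    // Returns false if the stack is empty; otherwise stores the top value in out.
    bool pop(int& out) {
        Node* old = head_.load();
        // On failure, old is refreshed with the current head and we retry.
        while (old != nullptr && !head_.compare_exchange_weak(old, old->next)) {
        }
        if (old == nullptr) {
            return false;
        }
        out = old->value;
        // Park the node instead of deleting it: another thread may still read old->next.
        old->retired_next = retired_.load();
        while (!retired_.compare_exchange_weak(old->retired_next, old)) {
        }
        return true;
    }
};
```

Both `push` and `pop` may be called from several threads at once. A production version would replace the retired list with a real reclamation scheme (hazard pointers or epochs, for example) and would add backoff or elimination to reduce contention on the head pointer, which is where most of the effort goes.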
The proceedings of FPGA 2011 have been released. There is a pre-conference workshop on The Role of FPGAs in a Converged Future with Heterogeneous Programmable Processors, where Altera describes its OpenCL initiatives.
The PIPS framework performs source-to-source transformation. The input can be C or Fortran. The output can be OpenMP, SSE, or CUDA (with limited optimization). The team is working on OpenCL output support and on improving the quality of the generated CUDA code. An overview of the framework is given in the PIPS tutorial at PPoPP 2010.
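To give a feel for what a source-to-source transformation to OpenMP looks like, here is a small hand-written sketch in C++. It is not actual PIPS output (PIPS takes C or Fortran as input), and the function names `scale_add_sequential` and `scale_add_openmp` are made up for this illustration. Both functions assume that `y` is at least as long as `x`.

```cpp
#include <cstddef>
#include <vector>

// Input-style code: a plain sequential loop whose iterations are independent.
void scale_add_sequential(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Output-style code: the same loop annotated for OpenMP.
// Without OpenMP enabled at compile time the pragma is ignored and the loop runs sequentially.
void scale_add_openmp(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size();
    #pragma omp parallel for
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The transformation is only valid because no iteration reads a value written by another iteration. Establishing that kind of independence automatically is the hard part that a tool such as PIPS has to handle.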
The Jan-Feb 2011 issue of IEEE Micro presents 11 best papers selected from the top computer architecture conferences of 2010 (5 from ISCA, 3 from MICRO, 2 from ASPLOS, 1 from HPCA). In the introduction to this special issue, Yale Patt and Onur Mutlu summarise a few observations regarding future conference reviewing. The number one observation is that insights deserve more focus than quantitative results.
The ISCA 2010 website has posted additional slides from the keynotes and oral presentations. There is a motivating keynote on the rebirth of neural networks.
NVIDIA just announced that the first release candidate of CUDA 4.0 will be available to registered developers next Friday (March 4th). The main improvement in this new CUDA version is better multi-GPU support. With the new release, multiple GPUs can be controlled from a single thread, and multiple threads can work on the same GPU. This will make it a lot easier to create multi-GPU programs. Luckily we already have a multi-GPU setup, and I can hardly wait to test these new features.
Another nice feature is unified virtual addressing, which puts all CPU and GPU data in the same address space. Direct GPU-to-GPU memory copies are now supported as well, which will further help in building multi-GPU applications.
Update: NVIDIA has placed a presentation on its developer website: CUDA Toolkit 4.0 overview
The mv5sim simulator is an extended version of m5sim. Developed in-house, mv5sim has been used for several many-core papers (listed at the bottom of its tutorial website). From past experience, m5sim has a steep learning curve compared to gpgpu-sim.