The limitations of the standard collective relocalization operations justify extended versions that give the programmer more flexibility. The MPI library provides a set of collective operations along with variants of them, and the proposed extensions for UPC closely follow MPI's approach. However, there are subtle differences between the two programming models, which naturally translate into syntax differences. Note that MPI provides nothing analogous to UPC's permute operation, and synchronization is not as significant an issue for collective operations in MPI as it is in UPC.

For each standard UPC collective function, there can be two variants. In the "vector" variant, each block of data can be of a different size, and the function can select distinct, non-contiguous data blocks. The second variant is a further generalization of the first: the programmer may explicitly specify each data block and its size.
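The difference between the two variants can be illustrated with a small sequential model of a scatter operation. This is only a sketch in plain C, not the proposed UPC extension interface: the names scatter_v_model and scatter_x_model, their parameters, and the use of ordinary arrays to stand in for per-thread memory are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sequential model, NOT the UPC extension API.
   "Thread" i is represented by the ordinary C buffer dst[i]. */

/* "Vector" variant: block i starts at src + disp[i] and holds count[i]
   ints, so blocks may differ in size and need not be contiguous. */
void scatter_v_model(int *const dst[], const int *src,
                     const size_t disp[], const size_t count[],
                     size_t nthreads)
{
    assert(dst != NULL && src != NULL && disp != NULL && count != NULL);
    for (size_t i = 0; i < nthreads; i++)
        memcpy(dst[i], src + disp[i], count[i] * sizeof(int));
}

/* Generalized variant: each block is named explicitly by its own
   pointer, so blocks need not even belong to the same source array. */
void scatter_x_model(int *const dst[], const int *const src_blocks[],
                     const size_t count[], size_t nthreads)
{
    assert(dst != NULL && src_blocks != NULL && count != NULL);
    for (size_t i = 0; i < nthreads; i++)
        memcpy(dst[i], src_blocks[i], count[i] * sizeof(int));
}
```

For example, with a source array holding 10 through 17, offsets {0, 2, 5} and counts {1, 2, 3}, scatter_v_model delivers {10} to the first buffer, {12, 13} to the second, and {15, 16, 17} to the third. A real UPC version would operate on shared arrays and would also have to address the synchronization modes, which this model ignores.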

Notes on implementation

The first version of this implementation was written as part of the last paper listed below. It was later extended to include the MYSYNC synchronization mode for the first paper listed below.

Before compiling, check that the Makefile contains the correct paths to the compiler. Then run 'make try_all' to compile and execute a test program that verifies everything was built properly.



Implementing UPC's MYSYNC Synchronization Mode Using Pairwise Synchronization of Threads

Technical Report 05-07, Michigan Technological University, Department of Computer Science (2005)
Author(s): P. Dhamne and S. Seidel

High Performance Unified Parallel C (UPC) Collectives for Linux/Myrinet Platforms

Technical Report 04-05, Michigan Technological University, Department of Computer Science (August, 2004)
Author(s): A. Mishra and S. Seidel
Abstract: Unified Parallel C (UPC) is a partitioned shared memory parallel programming language that is being developed by a consortium of academia, industry and government. UPC is an extension of ANSI C. In this project we implemented a high performance UPC collective communications library of functions to perform data relocalization in UPC programs for Linux/Myrinet clusters. Myrinet is a low latency, high bandwidth local area network. The library was written using Myrinet's low level communication layer called GM. We implemented the broadcast, scatter, gather, gather all, exchange and permute collective functions as defined in the UPC collectives specification document. The performance of these functions was compared to a UPC-level reference implementation of the collectives library. We also designed and implemented a micro-benchmarking application to measure and compare the performance of collective functions. The collective functions implemented in GM usually ran two to three times faster than the reference implementation over a wide range of message lengths and numbers of threads.
Last modified 12/8/4