Reduction in UPC

Phil Merkey

Introduction

Given an array of elements and particular operation, say addition, we often need to compute that result of applying that operation to all the elements. In UPC, if one had an array with more elements than threads then one would first compute the reduction thread wise. So the challenge is to find a fast way to finish the problem. That is, we start with each thread having a value and we want to compute the sum of all those values. We have done this by serializing the the problem with barriers and locks.

The following code is an example of how one could do it with a tree.

<reducetree.c>=
#include <upc.h>

shared int sum[THREADS];

main ()
{
  int m;

  sum[MYTHREAD] = MYTHREAD ;
  upc_barrier;

  upc_forall( m=1; m<THREADS; m=(m<<1); &sum[MYTHREAD] ){
    if( !(MYTHREAD & m) && (MYTHREAD+m < THREADS)) {
       sum[MYTHREAD] += sum[MYTHREAD+m]; 
    }
    upc_barrier;
  }
  if( MYTHREAD == 0 ) {
    printf ("OUT: %d %d %d\n",loop, MYTHREAD, THREADS, sum[MYTHREAD] );
  }
}