# Reduction in UPC

Phil Merkey

## Introduction

Given an array of elements and particular operation, say addition, we
often need to compute that result of applying that operation to all
the elements.
In UPC, if one had an array with more elements than threads then one
would first compute the reduction thread wise. So the challenge is
to find a fast way to finish the problem. That is,
we start with each thread having a value and we want to compute
the sum of all those values. We have done this by serializing the
the problem with barriers and locks.

The following code is an example of how one could do it with a tree.

<reducetree.c>=
#include <upc.h>
shared int sum[THREADS];
main ()
{
int m;
sum[MYTHREAD] = MYTHREAD ;
upc_barrier;
upc_forall( m=1; m<THREADS; m=(m<<1); &sum[MYTHREAD] ){
if( !(MYTHREAD & m) && (MYTHREAD+m < THREADS)) {
sum[MYTHREAD] += sum[MYTHREAD+m];
}
upc_barrier;
}
if( MYTHREAD == 0 ) {
printf ("OUT: %d %d %d\n",loop, MYTHREAD, THREADS, sum[MYTHREAD] );
}
}