The MuPC cache is a noncoherent, direct-mapped, write-back cache. Each UPC thread caches remote references. The total cache size scales with the number of threads: each thread has THREADS blocks of cache, each block holds cache lines only from the corresponding remote thread, and each block contains CACHE_TABLE_SIZE (default 256) cache lines of CACHE_LINE_LENGTH (default 1024) bytes each. (No thread uses its own cache block.) Both values must be 0 or a power of 2: CACHE_TABLE_SIZE may range from 0 to 4096, and CACHE_LINE_LENGTH from 0 to 65536. Setting either value to 0 turns caching off.
- Read miss
A cache line containing the desired data is fetched from the shared memory of the thread with which the data have affinity.
- Write miss
A cache line from the area of shared memory that contains the target location is fetched. The modification is made to the fetched cache line. The modified data are not written back until invalidation.
- Conflict miss
If a newly fetched cache line is to replace an existing line in cache, the replaced line is stored into a victim cache, in the hope that it will be reused soon. When the line is later displaced from the victim cache, or at an invalidation, it is written back if it is dirty and discarded if it is clean.
The cache is invalidated whenever there is a fence, barrier, strict reference, lock, unlock, or successful lock attempt. At invalidation all dirty cache lines within a thread that are destined for the same remote thread are combined and then written back. If a race condition occurs because more than one thread writes back a modified value in a cache line to the same remote location, the relaxed memory mode allows any one of them to win. Clean cache lines are discarded.
- False sharing
False sharing is avoided by associating an extra bit with every byte in a cache line. Each bit indicates whether the associated byte has been modified. When a cache line is written back, these bits are sent along, and only the modified bytes are written back to main (shared) memory.
The cost of cache invalidation can be high. All cache transactions are implemented using MPI point-to-point communication, so in the extreme case where every thread must write back cache lines to all other threads, an invalidation is equivalent to an all-to-all exchange.