Collective communication means that all
processes within a communicator call the same routine.
Portable applications should assume that collective routines
include a global synchronization, even though an implementation
is not required to provide one.
The following simple example program employs four basic
collective routines to manipulate a statically partitioned
regular domain (one-dimensional in this case). The full domain
length is broadcast from a root process to all others. The
initial dataset is distributed (scattered) among the
processes. After each compute step, a global maximum is
determined for use by the root. The root then gathers the
final dataset.
#include <mpi.h>
#include <stdlib.h>

#define NSTEPS 100      /* number of compute steps; value chosen for illustration */

/*
 * Application-supplied routines (prototypes assumed):
 * get_full_domain() allocates and fills the full domain at the root,
 * compute() advances one subdomain and reports its local maximum.
 */
void get_full_domain(double **domain, int *length);
void compute(double *domain, int length, double *local_max);

int main(int argc, char *argv[])
{
    int i;
    int myrank;
    int size;
    int root;
    int full_domain_length;
    int sub_domain_length;
    double global_max;
    double local_max;
    double *full_domain = NULL;     /* significant only at the root */
    double *sub_domain;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    root = 0;

    /*
     * Root obtains full domain and broadcasts its length.
     */
    if (myrank == root) {
        get_full_domain(&full_domain, &full_domain_length);
    }
    MPI_Bcast(&full_domain_length, 1, MPI_INT, root, MPI_COMM_WORLD);

    /*
     * Allocate subdomain memory.
     * Scatter the initial dataset among the processes.
     * (The domain length is assumed to divide evenly by the
     * number of processes.)
     */
    sub_domain_length = full_domain_length / size;
    sub_domain = (double *) malloc(sub_domain_length * sizeof(double));
    MPI_Scatter(full_domain, sub_domain_length, MPI_DOUBLE,
                sub_domain, sub_domain_length, MPI_DOUBLE,
                root, MPI_COMM_WORLD);

    /*
     * Loop computing and determining max values.
     */
    for (i = 0; i < NSTEPS; ++i) {
        compute(sub_domain, sub_domain_length, &local_max);
        MPI_Reduce(&local_max, &global_max, 1, MPI_DOUBLE,
                   MPI_MAX, root, MPI_COMM_WORLD);
    }

    /*
     * Gather final dataset.
     */
    MPI_Gather(sub_domain, sub_domain_length, MPI_DOUBLE,
               full_domain, sub_domain_length, MPI_DOUBLE,
               root, MPI_COMM_WORLD);

    free(sub_domain);
    MPI_Finalize();
    return 0;
}
Broadcast

MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
          int root, MPI_Comm comm);
All processes use the same count, datatype, root, and
communicator. Before the operation, the root buffer contains a
message. After the operation, all buffers contain the message
from the root process.
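As a further illustration, the minimal sketch below (separate from
the domain example above) has the root broadcast a three-element
parameter block; the buffer name `params' and its contents are
assumptions chosen only for this demonstration.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myrank;
    int params[3] = {0, 0, 0};      /* same count and type on every process */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0) {              /* only the root's buffer matters beforehand */
        params[0] = 64; params[1] = 128; params[2] = 256;
    }
    MPI_Bcast(params, 3, MPI_INT, 0, MPI_COMM_WORLD);

    /* Every rank, root included, now holds the root's values. */
    printf("rank %d: %d %d %d\n", myrank, params[0], params[1], params[2]);

    MPI_Finalize();
    return 0;
}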
Scatter

MPI_Scatter(void *sndbuf, int sndcnt, MPI_Datatype sndtype,
            void *rcvbuf, int rcvcnt, MPI_Datatype rcvtype,
            int root, MPI_Comm comm);
All processes use the same send and receive counts,
datatypes, root and communicator. Before the operation, the
root send buffer contains a message of length `sndcnt * N',
where N is the number of processes. After the operation, the
message is divided equally and dispersed to all processes
(including the root) following rank order.
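The minimal sketch below shows this behavior with an assumed block
size of two doubles per process; the buffer names and contents are
hypothetical and serve only to make the rank-order distribution
visible.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int myrank, size, i;
    double *sendbuf = NULL;         /* significant only at the root */
    double recvbuf[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (myrank == 0) {
        sendbuf = (double *) malloc(2 * size * sizeof(double));
        for (i = 0; i < 2 * size; ++i) {
            sendbuf[i] = (double) i;    /* block j goes to rank j */
        }
    }
    MPI_Scatter(sendbuf, 2, MPI_DOUBLE,
                recvbuf, 2, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    printf("rank %d received %g %g\n", myrank, recvbuf[0], recvbuf[1]);

    if (myrank == 0) {
        free(sendbuf);
    }
    MPI_Finalize();
    return 0;
}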
Reduce

MPI_Reduce(void *sndbuf, void *rcvbuf, int count,
           MPI_Datatype datatype, MPI_Op op,
           int root, MPI_Comm comm);
All processes use the same count, datatype, reduction
operation, root and communicator. After the operation, the
root process has in its receive buffer the result of the
pair-wise reduction of the send buffers of all processes,
including its own. MPI predefines reduction operations,
including: MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND,
MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR.
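The minimal sketch below isolates the maximum reduction used in the
example program; each rank contributes its own rank number, a value
chosen purely for illustration.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myrank;
    double local, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    local = (double) myrank;        /* each rank's contribution */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

    if (myrank == 0) {
        printf("global max = %g\n", global);    /* equals size - 1 */
    }
    MPI_Finalize();
    return 0;
}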
Gather

MPI_Gather(void *sndbuf, int sndcnt, MPI_Datatype sndtype,
           void *rcvbuf, int rcvcnt, MPI_Datatype rcvtype,
           int root, MPI_Comm comm);
All processes use the same send and receive counts,
datatypes, root and communicator. This routine is the reverse
of MPI_Scatter(): after the operation the root process has in
its receive buffer the concatenation of the send buffers of
all processes (including its own), with a total message length
of `rcvcnt * N', where N is the number of processes. The
message is gathered following rank order.
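The minimal sketch below, with hypothetical values, shows the root
collecting one double from every rank and concatenating the
contributions in rank order.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int myrank, size, i;
    double sendval;
    double *recvbuf = NULL;         /* significant only at the root */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendval = 10.0 * myrank;        /* each rank's contribution */
    if (myrank == 0) {
        recvbuf = (double *) malloc(size * sizeof(double));
    }
    MPI_Gather(&sendval, 1, MPI_DOUBLE,
               recvbuf, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (myrank == 0) {
        for (i = 0; i < size; ++i) {
            printf("from rank %d: %g\n", i, recvbuf[i]);
        }
        free(recvbuf);
    }
    MPI_Finalize();
    return 0;
}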