Report ID
1994-25
Report Authors
Klaus E. Schauser and Chris J. Scheiman
Report Date
Abstract
Active messages provide a low-latency communication architecture which on modern parallel machines achieves more than an order of magnitude performance improvement over more traditional communication libraries. They are used by library and compiler writers to obtain the utmost performance and have been used to implement the novel parallel language Split-C. This paper discusses the experience we gained while implementing active messages on the Meiko CS-2, and discusses implementations for similar architectures. The CS-2 is an interesting experimental platform, as it resembles a cluster of Sparc workstations, each equipped with a dedicated communication co-processor. During our work we have identified two mismatches between the requirements of active messages and the Meiko CS-2 architecture. First, architectures which only support efficient remote write operations (or DMA transfers as in the case of the CS-2) make it difficult to transfer both data and control as required by active messages. Traditional network interfaces avoid this problem because they have a single point of entry which essentially acts as a queue. To efficiently support active messages on modern network communication co-processors, hardware primitives are required which support this queue behavior. We overcame this problem by producing specialized code which runs on the communications co-processor and supports the active messages protocol. We also identify hardware primitives which are required to efficiently support active messages. The second mismatch is that active messages do not provide a non-blocking form of send, which is required to achieve the highest possible bandwidth while allowing the overlap of communication and computation when a communications co-processor is present. We propose to extend the current active message definition to include a non-blocking form of send. Our implementation of active messages results in a one-way latency of 12.3 µs and achieves up to 39 MB/s for bulk transfers. Both numbers are close to optimal for the current Meiko hardware and are competitive with the performance of active messages on other hardware platforms.
Document
1994-25.ps (126.19 KB)