Abstract:
An embodiment of the present invention provides a method and apparatus for sorting very large data sets using a parallel merge sort. Given sorted work files S ,. . . , S , produced by P processes, the described embodiment of the method effectively implements a parallel merge onto respective output partitions O ,. . . , O of the processes P. Because each of these output partitions O has a finite size, the invention must quickly determine âsplitting keysâ for each output partition O in such a way that the data in the work files will be split between the multiple output partitions O without overrunning the size of any of the partitions O. Once the splitting keys for each partition are determined, the processes exchange data so that the output partitions of each process contains data between the splitting keys associated with that output partition.