1. Sequential

For the sequential radix sorting, deem the integers as binary bits may be simpler than decimal digits. It can be done by partitioning the 0s and 1s for every bit, so that each integer in the i^th-bit sorting is relocated based on the counts of 0 and 1. For example, to compute the new position of an integer in the i^th-bit sorting, if the i^th bit is 0, it should be placed after the previous integers with 0 for the i^th bit; otherwise if the i^th bit is 1, it should be placed after all the integers that have 0 for the i^th bit and the previous integers valued 1 in the i^th bit. The peudocode is shown below:

FOR the i^th bit (32 bits in total){

FOR the j^th integer I[j] (N in total) {

count the 1s before I[j] (exclude I[j]): T_before[j];

}

compute the count of 0s: F_total = N – T_before[N-1];

FOR the j^th integer I[j] {

IF the i^th bit is 1: new position p[j] = F_total + T_before[j];

ELSE: new position p[j] = j – T_before[j];

}

}

2. Coalesced

To coalesce the radix sorting in CUDA, every block should take a sequence of integers and sorts them in each bit loop. Within every loop, a global reordering is executed after the blocks finishing the sortings. The peudocode in the host function is shown below:

FOR the i^th bit (32 bits in total){

call the radix sorting kernel function;

device synchronize;

call the reordering kernel function;

device synchronize;

}