Proposed by Bouyack on 10/13/2007
Suppose you want to sort N integers using P processors. The best you could hope for would be Nlog(N)/P for your running time. In reality this isn't quite possible. The question has 3 parts.
P == N?P == N^2?N) for which you can achieve constant time sorting?
Each processor has two additional registers that contain an identifying number for that processor and the total number of processors. The PIDs
start at 0 for the first processor up to P-1 for the Pth processor. Or if you prefer to think in C rather than assembly, you can think of them as two global constants: PID and NUM_PROC or something like that.
There are no semaphores or anything like that but there is an additional assembly instruction, sync, that will cause the processor to wait until
all processors have called sync. Once again if you prefer, you can think of it as a function call in C. I don't think this setup changes the
running times at all from using semaphores, etc. but makes the problem a little simpler.
If more than one processor reads from the same location at the same time they all get the correct value. If one processor reads a value at the same time as another is writing to that location the write is guaranteed to succeed but the result of the read is undefined. If multiple processors write to the same location at the same time the result is undefined unless all the processors are trying to write the same value, then the write succeeds.
For example, here's code for finding the average value of an array of floats in log(N) where P == N/2.
float find_average(float* array, unsigned int size) { float* array_copy; unsigned int a = 1; float ret; if(PID == 0) array_copy = (int*)malloc(size*sizeof(int)); sync(); array_copy[2*PID] = array[2*PID]; array_copy[2*PID+1] = array[2*PID+1]; while(a < size) { if((2*PID % a) || (2*PID + a >= size)) { sync(); continue; } array_copy[2*PID] += array_copy[2*PID+a]; if(PID == 0) a*=2; sync(); } if(PID == 0) { ret = array_copy[0]/size; free((void*)array_copy); } return ret; }
All local variables are shared. In my previous example there is only one instance of a, ret, and array_copy so make sure you don't have
different processors writing to the same local variable at the same time. It seems that there can only be one function stack so you can't have processors running different functions at the same time unless they have no local variables. I suppose you could, but you would have to make
sure that the functions return in the reverse order as they were called otherwise your stack would get pwnd. Along that same line, you can assume for-loops are compiled to behave nicely in a multiprocessor environment. That is, you don't have to write
if(PID == 0) i=0; sync(); for(;i</*whatever*/;) { //stuff if(PID == 0) i++; sync(); }
The compiler will make sure i is only initialized once and only incremented once per loop. Also, assume standard functions like malloc
have been recompiled to work correctly.
You DO need to worry about things like the line where I say a*=2; in the previous example. If you don't check to make sure that only one
processor executes that line you will end up with a*=pow(2,P);
If this is too confusing or restrictive feel free to design your own "compiler."