====== Sorting N integers on P processors ====== //Proposed by Bouyack on 10/13/2007// Suppose you want to sort ''N'' integers using ''P'' processors. The best you could hope for would be ''Nlog(N)/P'' for your running time. In reality this isn't quite possible. The question has 3 parts. - What is the best running time you can get with ''P == N''? - What is the best running you can get with ''P == N^2''? - Is there a number of processors (in terms of ''N'') for which you can achieve constant time sorting? Each processor has two additional registers that contain an identifying number for that processor and the total number of processors. The PIDs start at ''0'' for the first processor up to ''P-1'' for the ''Pth'' processor. Or if you prefer to think in C rather than assembly, you can think of them as two global constants: ''PID'' and ''NUM_PROC'' or something like that. There are no semaphores or anything like that but there is an additional assembly instruction, ''sync'', that will cause the processor to wait until all processors have called ''sync''. Once again if you prefer, you can think of it as a function call in C. I don't think this setup changes the running times at all from using semaphores, etc. but makes the problem a little simpler. If more than one processor reads from the same location at the same time they all get the correct value. If one processor reads a value at the same time as another is writing to that location the write is guaranteed to succeed but the result of the read is undefined. If multiple processors write to the same location at the same time the result is undefined unless all the processors are trying to write the same value, then the write succeeds. For example, here's code for finding the average value of an array of floats in ''log(N)'' where ''P == N/2''. float find_average(float* array, unsigned int size) { float* array_copy; unsigned int a = 1; float ret; if(PID == 0) array_copy = (int*)malloc(size*sizeof(int)); sync(); array_copy[2*PID] = array[2*PID]; array_copy[2*PID+1] = array[2*PID+1]; while(a < size) { if((2*PID % a) || (2*PID + a >= size)) { sync(); continue; } array_copy[2*PID] += array_copy[2*PID+a]; if(PID == 0) a*=2; sync(); } if(PID == 0) { ret = array_copy[0]/size; free((void*)array_copy); } return ret; } All local variables are shared. In my previous example there is only one instance of ''a'', ''ret'', and ''array_copy'' so make sure you don't have different processors writing to the same local variable at the same time. It seems that there can only be one function stack so you can't have processors running different functions at the same time unless they have no local variables. I suppose you could, but you would have to make sure that the functions return in the reverse order as they were called otherwise your stack would get pwnd. Along that same line, you can assume ''for''-loops are compiled to behave nicely in a multiprocessor environment. That is, you don't have to write if(PID == 0) i=0; sync(); for(;i The compiler will make sure ''i'' is only initialized once and only incremented once per loop. Also, assume standard functions like ''malloc'' have been recompiled to work correctly. You DO need to worry about things like the line where I say ''a*=2;'' in the previous example. If you don't check to make sure that only one processor executes that line you will end up with ''a*=pow(2,P);'' If this is too confusing or restrictive feel free to design your own "compiler."