I think that Python should use multi-processing and/or multi-threading to take advantage of as many opportunities for parallel execution as possible. To this end, I've written a drop-in replacement for map() that runs across as many processes as requested. It should otherwise be identical in every way to the built-in version (and if it's not, please let me know!).
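To make the fork()/pipe() idea concrete, here is a minimal sketch written for Python 3 on a Unix-like OS. It is not the attached forkmap.py. The name `forkmap`, the `n` keyword, the contiguous chunking, and the pickle-over-pipe protocol are all assumptions made for illustration. It follows Python 3's map() semantics (stop at the shortest sequence), whereas Python 2's built-in map() pads shorter sequences with None.

```python
import os
import pickle


def forkmap(function, *sequences, n=None):
    """Map function over sequences using n forked child processes (Unix only)."""
    args = list(zip(*sequences))  # Python 3 map() semantics: stop at the shortest
    if n is None:
        n = os.cpu_count() or 1
    n = max(1, min(n, len(args)))
    if n == 1 or not hasattr(os, "fork"):
        return [function(*a) for a in args]  # serial fallback

    # Pre-allocate one contiguous block of arguments to each child.
    bounds = [len(args) * i // n for i in range(n + 1)]
    children = []
    for i in range(n):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: compute its block, pickle it into the pipe, exit
            os.close(r)
            try:
                try:
                    block = args[bounds[i]:bounds[i + 1]]
                    payload = (True, [function(*a) for a in block])
                except Exception as exc:
                    payload = (False, exc)  # ship the exception to the parent
                with os.fdopen(w, "wb") as out:
                    pickle.dump(payload, out)
            finally:
                os._exit(0)  # never return into the parent's code path
        os.close(w)  # parent keeps only the read end
        children.append((pid, r))

    results, error = [], None
    for pid, r in children:  # read in order, so results keep their positions
        with os.fdopen(r, "rb") as inp:
            ok, value = pickle.load(inp)
        os.waitpid(pid, 0)  # reap the child
        if ok:
            results.extend(value)
        elif error is None:
            error = value
    if error is not None:
        raise error
    return results
```

For example, `forkmap(pow, [2, 3, 4], [5, 2, 3], n=2)` returns `[32, 9, 64]`. Because fork() copies the whole process, the mapped function itself never has to be pickled (lambdas work); only results and exceptions travel back through the pipes.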
I also wrote a version based on Parallel Python that is a lot simpler but not quite identical to the original. In particular, it returns a generator instead of a list of values, so program execution doesn't block until the results are actually fetched.
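The generator-returning behaviour can be sketched without Parallel Python. The version below swaps in the standard library's concurrent.futures thread pool (Python 3.2+) as an assumption; `lazy_pmap` and its `n` parameter are hypothetical names. Every call is submitted up front, and the caller only blocks when it asks for a result that isn't finished yet.

```python
from concurrent.futures import ThreadPoolExecutor


def lazy_pmap(function, *sequences, n=4):
    """Start all calls immediately, then return a generator of their results.

    Sketch only: concurrent.futures threads stand in for Parallel Python.
    """
    executor = ThreadPoolExecutor(max_workers=n)
    futures = [executor.submit(function, *args) for args in zip(*sequences)]
    executor.shutdown(wait=False)  # accept no new work; queued calls still run

    def results():
        for future in futures:
            # Blocks only on this particular call; re-raises its exception, if any.
            yield future.result()

    return results()
```

Calling `lazy_pmap(f, items)` returns at once; iterating over it yields results in input order as they become available.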
Drop me a line if you find this interesting or useful or just plain dumb.
| Attachment | Size |
|---|---|
| forkmap.py | 4.39 KB |






Optimized forkmap
Your forkmap is pretty neat. I made an optimized forkmap that runs with less overhead. I also changed the API so that the number of processors is a keyword argument, forkmap.map(..., n=nprocessors), defaulting to the number of processors on the box; to me, at least, that seemed terser and less confusing than the decorator. Feel free to recycle any of that code into your forkmap. Anyway, neat idea. - Connelly
Thanks!
Very cool, and thanks! By the way, you can use something like this to get the number of CPUs on a BSD system, including OS X:
import os
def ncpus():  # illustrative name; the original snippet was a function body
    try:
        return int(os.popen('sysctl -n hw.ncpu 2>/dev/null').read().strip())
    except ValueError:
        return 1
popen().read() should return the CPU count followed by a newline. If it returns anything else, either sysctl is missing or you're not on a BSD.
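On Python 3.3 and later, `os.cpu_count()` covers most platforms directly. A slightly more defensive sketch (assuming Python 3.7+ for `subprocess.run(..., capture_output=True, text=True)`; `cpu_count` is a hypothetical helper name) tries it first and only then falls back to the same `sysctl -n hw.ncpu` query:

```python
import os
import subprocess


def cpu_count():
    """Best-effort CPU count: os.cpu_count(), then BSD/OS X sysctl, then 1."""
    n = os.cpu_count()  # None if the platform can't tell
    if n:
        return n
    try:
        out = subprocess.run(["sysctl", "-n", "hw.ncpu"],
                             capture_output=True, text=True).stdout
        return int(out.strip())
    except (OSError, ValueError):  # sysctl missing, or non-numeric output
        return 1
```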
Optimization...
I should've noted that my forkmap pre-allocates each part of the work to each of the processors and doesn't try to actively reschedule work if a few processors finish early; those processors just sit idle. This can be an optimization if the list being mapped is long, because the communication overhead of deciding which processor does which piece of work can become significant. But it can also slow things down when processors idle in the "endgame." Smarter code could probably get the advantages of both methods by communicating only in the endgame, and only when it's worthwhile. - Connelly
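The endgame trade-off can be seen without timing anything, using a toy model. In the sketch below, `costs` are hypothetical per-item run times, and the model deliberately ignores the communication overhead that makes pre-allocation attractive for long lists. It compares the finishing time (makespan) of pre-allocated contiguous blocks against self-scheduling, where each item goes to whichever worker frees up first.

```python
import heapq


def static_makespan(costs, n):
    """Pre-allocated: worker i gets one contiguous block of the work up front."""
    bounds = [len(costs) * i // n for i in range(n + 1)]
    return max(sum(costs[bounds[i]:bounds[i + 1]]) for i in range(n))


def dynamic_makespan(costs, n):
    """Self-scheduled: each item goes to whichever worker becomes idle first."""
    free_at = [0] * n  # time at which each worker becomes idle
    heapq.heapify(free_at)
    for cost in costs:
        t = heapq.heappop(free_at)
        heapq.heappush(free_at, t + cost)
    return max(free_at)
```

With one slow item up front, `costs = [5, 1, 1, 1, 1, 1]` on two workers, the static split finishes at 7 (one worker gets 5 + 1 + 1 while the other idles after 3) and the dynamic one at 5. With uniform costs both give the same result, so pre-allocation loses nothing and saves the scheduling messages.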
Threadmap
And here's a multithreaded map:
http://www.connellybarnes.com/code/python/threadmap
I coded it slightly differently from Andrey Nordin's thread-map ( http://abstracthack.wordpress.com/2007/09/05/multi-threaded-map-for-pyth... ) because I'm calling CPU-bound C programs from Python, so by default I set the code to map across a number of threads equal to the number of cores.
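A minimal sketch of a thread-based map with a one-thread-per-core default, written for Python 3. It is neither Connelly's nor Andrey Nordin's code; `threadmap` and `n` are illustrative names. Threads pay off here when each call spends its time outside the interpreter, for example waiting on a subprocess that runs a CPU-bound C program, because CPython releases the GIL while waiting.

```python
import os
import queue
import threading


def threadmap(function, sequence, n=None):
    """Map function over sequence with n worker threads pulling from a queue."""
    items = list(sequence)
    n = n or os.cpu_count() or 1  # default: one thread per core
    results = [None] * len(items)
    errors = []
    todo = queue.Queue()
    for i in range(len(items)):
        todo.put(i)

    def worker():
        while True:
            try:
                i = todo.get_nowait()
            except queue.Empty:
                return  # no work left
            try:
                results[i] = function(items[i])
            except Exception as exc:
                errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(min(n, len(items)))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
    return results
```

Because idle threads keep pulling from the shared queue, this version schedules work dynamically rather than pre-allocating it.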
Can't get it working in my environment (Windows Vista + Cygwin)
Hi Kirk,
Thank you for forkmap.
I can't get it working in my environment (Windows Vista + Cygwin + Cygwin's Python 2.5.1).
Here is the error:
D:\Temp\d>python forkmap.py
[16, 20, 24, 28]
Traceback (most recent call last):
  File "forkmap.py", line 194, in <module>
    print map(busybeaver, range(27))
  File "forkmap.py", line 137, in map
    sendmessage(toparent, (childnum, index, excvalue))
UnboundLocalError: local variable 'index' referenced before assignment
and hangs here :(
In my application the error is different though. It is similar to this:
>>> forkmap.map(lambda x:x*10, [1,2,3,4])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "forkmap.py", line 81, in map
    return __builtins__.map(function, *sequence)
AttributeError: 'dict' object has no attribute 'map'
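The first traceback is consistent with a common pattern: an except block refers to a loop variable that was never assigned because the exception happened before the loop's first iteration. The handler then fails itself, the child probably dies without reporting back, and the parent keeps waiting, which would explain the hang. The sketch below is hypothetical (`run_child`, `tasks`, and `send` are illustrative names, not forkmap.py's code) and shows the usual fix of binding the variable before the `try`.

```python
def run_child(tasks, function, send):
    """Hypothetical child loop: apply function to (index, value) pairs."""
    index = None  # bind it up front so the except block can always use it
    try:
        for index, value in tasks:
            send((index, function(value)))
    except Exception as exc:
        # Without `index = None`, a failure before the first iteration
        # (e.g. while reading tasks) would make this line raise
        # UnboundLocalError instead of reporting the real exception.
        send((index, exc))
```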
You need to add @parallelizable() to your function definition, like so:
@parallelizable(4)
def descramble(scrambled):
    ...
I was getting the AttributeError: 'dict' object has no attribute 'map' error as well until I did that.
Also, I think the underlying problem is a bug in the code. Adding "import __builtin__" and changing __builtins__ to __builtin__ made it work correctly when parallelization is not in use.
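Both points can be sketched together in Python 3, where `__builtin__` was renamed `builtins`. The `parallelizable` decorator and `pmap` function below are hypothetical stand-ins, not forkmap.py's actual code, and the parallel branch uses a thread pool purely as a placeholder backend. The point is the fallback: unmarked functions go to the real built-in map through the `builtins` module instead of through `__builtins__`.

```python
import builtins  # Python 3's name for Python 2's __builtin__ module
from concurrent.futures import ThreadPoolExecutor


def parallelizable(n):
    """Hypothetical decorator: record how many workers a function may use."""
    def mark(function):
        function.nworkers = n
        return function
    return mark


def pmap(function, *sequences):
    """Hypothetical map(): parallel only for functions marked @parallelizable."""
    n = getattr(function, "nworkers", 1)
    if n <= 1:
        # In CPython, __builtins__ is the builtins module only inside __main__;
        # in an imported module it is usually a plain dict, which is where
        # "'dict' object has no attribute 'map'" comes from. Importing the
        # builtins module works in both cases.
        return list(builtins.map(function, *sequences))
    with ThreadPoolExecutor(max_workers=n) as pool:  # placeholder parallel backend
        return list(pool.map(function, *sequences))
```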
Might be a Windows thing
The forkmap module uses two Unix-native functions: fork() and pipe(). I haven't used Python on Windows enough to know whether those are implemented there or if they work the same way.
It might also be a Cygwin glitch, because that AttributeError exception seems really odd to me. Have you tried installing the official Python for Windows from http://www.python.org/download/ ?
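For reference, the official Windows build of Python has `os.pipe()` but no `os.fork()`, while Cygwin's Python emulates fork(). A module like this can check for fork() and degrade to a plain call. The sketch below assumes Python 3; `run_in_child` and `HAVE_FORK` are hypothetical names. It shows the basic fork()/pipe() round trip such a module relies on.

```python
import os
import pickle

HAVE_FORK = hasattr(os, "fork")  # False on the official Windows build


def run_in_child(function, arg):
    """Run function(arg) in a forked child and return its result via a pipe.

    Falls back to a direct call where fork() doesn't exist. In this sketch an
    exception in the child shows up in the parent as EOFError (nothing is sent).
    """
    if not HAVE_FORK:
        return function(arg)
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: compute, send the pickled result, exit immediately
        os.close(r)
        try:
            with os.fdopen(w, "wb") as out:
                pickle.dump(function(arg), out)
        finally:
            os._exit(0)
    os.close(w)  # parent: read the child's answer, then reap it
    with os.fdopen(r, "rb") as inp:
        result = pickle.load(inp)
    os.waitpid(pid, 0)
    return result
```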
Multi-threaded map()?
The idea of a multi-processing map() is quite nice. But what about a multi-threaded one? Threads usually cause less overhead than processes. If a mapping function is essentially side-effect free (even if it does some HTTP GETs, since they are idempotent), you don't depend on which parallel execution model you've selected. And when it isn't, such an approach is error-prone anyway. See also my blog entry.
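For I/O-bound work like HTTP GETs, a thread pool is usually enough. The sketch below uses the standard library's concurrent.futures (Python 3.2+) rather than the code from the blog entry, and `fake_fetch` is a hypothetical stand-in that sleeps instead of doing a real request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Stand-in for an idempotent HTTP GET: just waits, like network I/O."""
    time.sleep(0.2)
    return len(url)


urls = ["http://example.com/%d" % i for i in range(5)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=5) as pool:
    sizes = list(pool.map(fake_fetch, urls))
elapsed = time.monotonic() - start
# Sleeping (like waiting on a socket) releases the GIL, so the five waits
# overlap: elapsed is close to 0.2 s rather than 5 * 0.2 = 1.0 s.
```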
Big locks
From my reply on your blog:
The reason I wrote it using processes rather than threads is that CPython has a global interpreter lock (GIL) that lets only one thread execute Python bytecode at a time, so a threaded version might be a bit lacking in performance with the current implementation.
For even better results, consider using something like NetWorkSpaces to farm out requests to machines on your network. I wrote "servers" that accept an image filename and a list of operations to perform on it, pull that image from the fileserver, run the operations, and return the result (as a string) via a NWS variable. Performance improvements scaled linearly with the number of cores running servers on our network. Need it to run faster? Launch a few more instances. I have big dreams of farming out certain processes to the mostly-idle desktop machines sitting throughout the office.
GIL info
You might be interested in learning more about the GIL, especially in connection with Guido's recent post. Here is one of the latest resources on this subject.
Although I see Guido's point...
I understand the problems Guido describes, and I can see why it might be very hard (even impossible) to remove the GIL from CPython. Still, that doesn't change my mind that the GIL is going to massively hamper it on big SMP machines. Other natively threaded languages can do great things on that hardware, while a threaded Python app has to slog along on a single core.
Here's to hoping that he changes his mind.