YMMV, but for my Python web app, which executes mostly OO Python 2.5.x code, I got a 10% performance increase by using jemalloc instead of the stock malloc in RHEL 5. It's as simple as setting LD_PRELOAD=/path/to/libjemalloc.so. Memory usage is also better: a long-running process that allocated and eventually released millions of objects ended up using less memory with jemalloc.
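If you launch the app from another Python script instead of a shell, something like the following might help. This is a minimal sketch, assuming Linux with glibc's dynamic loader. `with_preload` and `jemalloc_loaded` are hypothetical helper names, and `/path/to/libjemalloc.so` is the same placeholder path, not a real install location. The second helper reads `/proc/self/maps` so the app can confirm at startup that the preload actually took effect rather than being silently ignored.

```python
import os


def with_preload(lib_path, env=None):
    """Return a copy of env (default: os.environ) with lib_path prepended to LD_PRELOAD.

    The input mapping is not modified.
    """
    new_env = dict(os.environ if env is None else env)
    existing = new_env.get("LD_PRELOAD", "")
    # glibc's ld.so accepts spaces or colons as separators in LD_PRELOAD.
    new_env["LD_PRELOAD"] = lib_path + (" " + existing if existing else "")
    return new_env


def jemalloc_loaded(maps_path="/proc/self/maps"):
    """Linux-only: True if a mapped file with 'jemalloc' in its path is present.

    Returns False if the maps file can't be read (e.g. on non-Linux systems).
    """
    try:
        with open(maps_path) as f:
            return any("jemalloc" in line for line in f)
    except (IOError, OSError):
        return False
```

Usage might look like `subprocess.call([sys.executable, "app.py"], env=with_preload("/path/to/libjemalloc.so"))`, where `app.py` is a hypothetical entry point, with a `jemalloc_loaded()` check logged once inside the app at startup.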