Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

If you are seriously worried about the performance implications of a binary tree vs. the Python hashes, you are working in the wrong language. Whatever performance implications you are worried about are utterly dwarfed by the Python penalty.

I am not being sarcastic or cynical. There really isn't much point in really nailing some of these things to the wall in Python or Perl or Ruby or the other interpreted languages hanging out at the bottom of the shootout[1] because you've already thrown performance out the window as a consideration just by cracking open your editor and starting a new file. This is an argument to take up with languages much higher up the shootout.

[1]: http://shootout.alioth.debian.org/ , if you don't know what I mean by that. The benchmarks there may not see all and know all, but they aren't meaningless, either.



I really hate arguments along these lines. Python programs might not be be as fast in the average case as, say, a C program, but that doesn't mean you don't care about performance at all. Yes, I will take a several-microsecond slowdown for the convenience of using Python over C. That's a fair tradeoff, as it really isn't noticeable from my perspective — it's the difference between, like, 10% CPU utilization and 1% utilization, and still done in the blink of an eye. But the difference between 56 seconds to process some data and 0.8 seconds? That matters.


I agree. But a binary tree vs. a hash table isn't going to be that difference. That's my point. Python won't excuse an O(n^3) when you should have been using an O(log n) algorithm, but the constant factors of Python are so huge they dominate a lot of the choices you'd make between two different O(log n) algorithms. You'd never notice the difference between the existing built-in Python dict and a hypothetical built-in binary tree, it would just be buried by some of the fundamentally-cache-hostile stuff Python does as it is bouncing all around memory with all the indirection it has on every line that has nothing to do with either structure.


> But the difference between 56 seconds to process some data and 0.8 seconds? That matters.

This must be a purely theoretical argument for you, because you wouldn't be arguing this point if you were consistently analyzing data in Python and C/C++.

I routinely analyze gigabytes of aggregated data in both Python and C++, and have occasionally rewritten the Python in C++ to be faster (as an aside, for basic processing it's surprising how simple the translation is). I routinely get between 10x and 100x improvement in computation speed, and often see similar reductions in memory used.

Python is not slower than C/C++ by a few microseconds; it's slower for most operations by two or more orders of magnitude. Whether you have a dictionary implemented by a hash table or by a balanced binary tree is not going to have any significant impact on your analysis speed.


It's not purely theoretical — it's just not universally applicable. Even Ruby is fast enough to do most things faster than I can perceive, but I know from practical experience that there are fast and slow ways to do those things even in a "slow" language. I've tweaked Ruby algorithms before to make programs go from "OMG Ruby is so slow!" to perceptually the same as a C program (though a few tenths of a second slower in absolute terms).

What you're arguing here is that because there are some cases where pure Python is simply not fast enough, there is no point in thinking about performance at all in a Python program. It's a false dichotomy. There's a wide range of performance options between "just forget about performance altogether" and "rewrite the entire application in C++."


> What you're arguing here is that because there are some cases where pure Python is simply not fast enough, there is no point in thinking about performance at all in a Python program. It's a false dichotomy.

No, it's a very real dichotomy. If your complexities are right (e.g., you're not using a quadratic algorithm where you could be using a linearithmic algorithm) and Python still isn't fast enough for your task, switching from a hash table to a binary tree is not going to make it fast enough.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: