This is great! Continuum Analytics are doing cool stuff for numeric / big data Python and it's great to see the language getting more and more traction in this domain.
It depends on what you call scientific computing. Continuum Analytics appears to have a data/stats focus (I now work as a statistician, and people speak well of their products, though I've never used them), and there I believe Python has a real shot at displacing R and Octave/SciLab/MatLab as the language for light academic work. But CFD (Computational Fluid Dynamics) and CEM (Computational Electromagnetics) people generally run C, C++, and Fortran, or whatever runs numerical linear algebra fastest; I doubt Python will take that post anytime soon.
> CFD (Computational Fluid Dynamics) and CEM (Computational Electromagnetics) people generally run C, C++, and Fortran, or whatever runs numerical linear algebra fastest; I doubt Python will take that post anytime soon.
You are welcome to harbor your doubts. :) There will continue to be a place for scientists who want to write software that takes advantage of the intricacies of low-level memory transfer. (Fundamentally, this is the control that C gives you.)
However, as hardware becomes more heterogeneous (Xeon Phi, GPUs, SSD/hybrid drives, Fusion-io, 40Gb/100Gb interconnects, etc.), it's going to get harder and harder for a programmer to acquire the knowledge required to truly optimize data transfer and compute within a single node or across a cluster, and still have any time left to do real science. Newer approaches to software development are needed to maximize the potential of all this great new hardware, while minimizing the pain, and the knowledge level required, of the programmer trying to use it.
The key saving grace is that many of these hardware innovations are really geared toward data-parallel problems, and it just so happens that parallel data structures are also easy for scientists and analysts to reason about. The goal of Blaze is to extend the conceptual triumphs of FORTRAN, Matlab, APL, etc., and build an efficient programming model around this data parallelism.
If you look at what's already been achieved with NumbaPro (the Python-to-CUDA compiler) in just a few months' work, I think the future is very promising.
The underlying premise for all of Continuum's work is: "Better performance through appropriate high-level abstractions".
I work with Python, but mainly for light statistics (numpy, scipy, pandas, and some other libs); SAS is used for the large datasets (mainly because that's the policy of the bank I work for).
I do not work with CFD anymore; I left the field five years ago and returned to Brazil. CUDA/OpenCL was just a promise back then: it was fast, but no one had included the technology in their solvers yet.
The problem for CFD is that some simulations can take weeks to run, and using Python there would just add more weeks because of the language's overhead. If time or energy consumption is a concern, people will stay with what they have now. For small stuff people use whatever they want (I used Python back then; these days they use OpenFOAM), but for performance it will be C, Fortran, or C++ compiled with the best optimizing compilers for at least another decade. In these large problems, data transport and storage were a big problem too.
Generally people use C++ these days; Fortran is only important in old codebases (although Coarray Fortran was a buzzword, just like CUDA, when I left the field).
I doubt that you will find it easy to beat well-written numeric Python code with a naive C, C++, or Fortran approach.
Python is not inherently slower than C, C++, or Fortran; it is just a language, after all. If you have a fast computational engine, like NumPy, you can reach C++ speeds on computational tasks. If you have an optimizer that simplifies your math expressions before running them, you can do better than a naive C++ approach. And an optimizer and computational engine that optimally offloads work from the CPU to the GPU can give an order-of-magnitude advantage over naive C++/BLAS code.
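To make the "optimizer that simplifies your math expressions" point concrete, here is a small NumPy sketch (my own illustration, not taken from any particular engine) of the kind of rewrite such an optimizer performs: evaluating log(sigmoid(x)) naively overflows for large negative x, while the algebraically equivalent form stays exact.

```python
import numpy as np

x = -1000.0

# Naive form: exp(-x) overflows to inf, the sigmoid underflows to 0,
# and log(0) yields -inf.
with np.errstate(over='ignore', divide='ignore'):
    naive = np.log(1.0 / (1.0 + np.exp(-x)))

# Rewritten form: log(sigmoid(x)) == -log(1 + exp(-x)) == -logaddexp(0, -x),
# which NumPy evaluates in a numerically stable way.
stable = -np.logaddexp(0.0, -x)

# naive is -inf; stable is (very nearly) -1000.0
```

Graph-rewriting engines apply exactly this kind of substitution automatically, which is how they can beat a literal transcription of the math into C++.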
And to give you a concrete example, there is a nice Python library called Theano that does just that.
>I doubt that you will find it easy to beat well-written numeric Python code with a naive C, C++, or Fortran approach.
Huh? People do it all the time. Naive C/C++ code (as long as it doesn't leak memory or use the wrong algorithmic complexity) beats Python hands down, and can be more than 10-20 times faster.
>Python is not inherently slower than C, C++ or Fortran. It is just a language after all.
An interpreted language, with a not-that-good interpreter and garbage collector, non-primitive integers, and other such things holding it back.
>If you have a fast computational engine, like NumPy, you can reach C++ speeds on computational tasks.
That's because the "fast implementation" is NOT written in Python.
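A quick way to see this point: time an interpreted Python loop against the compiled kernel behind `np.dot` (a rough sketch; absolute timings depend on your machine and BLAS build, but the gap is typically one to two orders of magnitude).

```python
import time
import numpy as np

n = 200_000
a = np.random.rand(n)
b = np.random.rand(n)

def py_dot(u, v):
    # Pure-Python loop: every iteration pays bytecode-dispatch
    # and boxed-float overhead.
    total = 0.0
    for x, y in zip(u, v):
        total += x * y
    return total

t0 = time.perf_counter()
slow = py_dot(a, b)
loop_time = time.perf_counter() - t0

t0 = time.perf_counter()
fast = np.dot(a, b)  # dispatches to a compiled (usually BLAS) kernel
blas_time = time.perf_counter() - t0

# Same answer; the compiled kernel is dramatically faster.
```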
Let's get down to a concrete example and consider a typical numeric problem, say estimating the cross-entropy gradient of some function on your data.
With Python (and the Theano computational engine) you can write something along these lines:
And then just apply that function to a matrix containing your data. That's it.
When you apply the function, it will be interpreted and converted into a computation graph; that graph will be optimized, parts of the computation will be offloaded to the GPU (with memory transfers between host and GPU taken care of), and you will get the result back in a friendly, efficient NumPy array.
The resulting computation will be nearly optimal, limited by memory bandwidth, CPU-GPU bus bandwidth, and the GPU's FLOPS rate. With luck you can get close to the theoretical maximum of your GPU's floating-point performance. And all in a few lines of Python.
Now consider the same in C++. Yes, it can be done, but there are just no open-source libraries available that do it. The closest open-source implementation I know of is gpumatrix, a port of the C++ Eigen library to the GPU, and it doesn't come close to what is available in Python. So with C++, if you want to match the performance of these few lines of Python code, good luck studying CUDA or OpenCL and implementing the computation engine correctly on the first try.
(Disclaimer: I'm not in any way affiliated with the OP, and I actually use, and like, C/C++ a lot.)
Given vectorization and compiler optimizations*, I doubt pure Python (well, CPython running pure Python) would match Fortran and the state-of-the-art open-source C solvers like LAPACK for dense matrices and SuperLU or PaStiX for sparse ones. It's not so much the language itself; it's mainly that Python has more overhead than C or Fortran.
Maybe, as you said, Python can work well for solving a single linear system on a GPU. I worked on this before GPUs came to be used for it, so I can't comment on that one.
* Compilers generally optimize mathematical expressions well unless complicated pointer arithmetic gets in the way; this is one reason Fortran is still used in this area: Fortran does not have pointer arithmetic, so its compilers are freer to optimize.
In numeric Python packages, computations are never done inside the CPython runtime; they're done in specialized kernels written in C, Fortran, and sometimes fine-tuned assembly, including a sampling of the various BLAS/LAPACK flavors... so yes, NumPy can often be competitive with, or outperform, libraries written in lower-level languages.
So you still link to functions in compiled-language code to do the actual legwork, and all threading has to be done in C extensions, or am I missing something?
Yes, you are. NumPy is much more than a wrapper around C libraries; if it were just that, it would exist in most languages, and yet few general-purpose programming languages have something like NumPy.
The whole point is that it gives you abstractions on top of those libraries, like broadcasting, ufunc, fancy indexing, etc...
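A tiny sketch of those abstractions in action (nothing here beyond stock NumPy):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Broadcasting: the 1-D column means are virtually stretched across all
# 3 rows, with no copies, and the subtraction runs in a compiled ufunc.
centered = a - a.mean(axis=0)

# Fancy indexing: select rows 0 and 2 in one expression.
picked = a[[0, 2]]

# ufuncs: one compiled kernel applied elementwise to the whole array.
squared = np.square(a)
```

None of this is "just a wrapper": the broadcasting rules, the indexing semantics, and the ufunc machinery are the abstraction layer the comment is describing.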