I cannot shake the feeling that this is just yet another iteration of the usual ...

zalman · on April 7, 2014

This comment completely misses the fundamental innovation in Halide, which is a separation of "algorithm," a technical term denoting the part of the program that defines the computation, and "schedule," the part of the program which defines the mapping to hardware resources.

The end result is not like OpenCL, CUDA/PTX, or RenderScript. (All of which we regard as suitable target languages for Halide. PTX (CUDA's low-level assembly language) and OpenCL are currently supported.) As a concrete example, there is no explicit memory allocation in Halide and loops are often implicit. The overall character of Halide is that of a functional programming language.

As another example, segmentation violations due to errors in Halide code are impossible, barring bugs in the compiler implementation or errors in user provided C++ code, which Halide has no control over. Some set of bounds errors are reported at compile time and the rest turn into (efficient) assertion checks at runtime.

Schedules do get pretty low-level and writing them is difficult. Algorithms are generally a fair bit easier to write and while there is plenty to be done in improving the language, it is already a step up in productivity over e.g. OpenCL or even straight C++ for a lot of tasks within the domain of data parallel processing.

linhat · on April 7, 2014

  I'm waiting for the first system that is equivalent of C++.

Not gonna happen anytime soon. C++ is much more than just a programming language, it is an entire "ecosystem". You have toolchains, libraries, drivers, APIs, existing software stacks, all written in C++ and able to interface with C/C++ directly, i.e. no translation layer needed. You're not just gonna replace that with a new language.

Sure, Halide may be seen as just some syntactic sugar (much like quite a few bits and pieces of C++11), but it actually provides you with a different level of abstraction than say OpenCl, MPI or OpenMP where you are very specific about e.g. level of concurrency (which heavily impacts the design of your algorithm) while Halide tries to almost completely separate algorithm and scheduling.

sharpneli · on April 7, 2014

I maybe wrote it badly. Skrebbel said it a lot better. It's about providing something that's on truly higher level.

It's nice that Halide attempts to separate the algorithm and scheduling, however as of now it's also possible in OpenCL too. The implementations are just so bad at it that it's not really useful.

skrebbel · on April 7, 2014

> I'm waiting for the first system that is equivalent of C++. Something that really brings new concepts to the table instead of just a different syntax.

Obligatory HN reaction: I'd wait it out for the first Lisp to appear.

iamsalman · on April 7, 2014

I haven't yet given it a spin but it won't be fair to call it just syntactic sugar. The value it provides, besides a little easier to write syntax is abstracting kernel level code. You don't need to specify block/grid size, move memories around CPU/GPU but I doubt if it can match the performance of hand-written OpenCL/CUDA kernels when it becomes to bleeding-edge Imaging performance.

sharpneli · on April 7, 2014

You don't also need to specify the block size in OpenCL either. It's just extremely useful to do that.

frozenport · on April 7, 2014

Did you watch the video? Did you read the article?

This is about optimizing memory locality and parralelism. Its not about getting access to the underlying hardware such as Microsoft C++AMP, NV Cuda, OpenCL.

sharpneli · on April 7, 2014

I did watch the video. It is a neat idea. However it is something that is already made explicitly possible by OpenCL.

If you write a kernel which doesn't use local memory nor doesn't use the local_id it produces a kernel that is effectively a Halide pipeline stage. The points can be evaluated in arbitrary order (spec says everything is implementation defined).

If we look at the blur example on the video the OpenCL implementation is also free to effectively merge the stages like in Halide. It's because the spec only defines that whatever the previous kernel invocation has written must be visible for the next kernel. Nothing more and nothing less.

Sure OpenCL allows you to fiddle around with low level details, but it also allows you to write completely platform neutral code that is then the responsibility of the platform to actually optimize.

I do agree that Halide allows you to easily explore the different scheduling options, that's something OpenCL is not capable of. In OpenCL you either do it manually or leave it as totally defined by the implementation.

zalman · on April 7, 2014

"OpenCL implementation is also free to effectively merge the stages like in Halide."

I gave up relying on magic compilers a long time ago. And having worked in this domain for a long time I'm actually offended by people who write off the problem as simply a matter of a good enough optimizer. This has significantly held back both performance and portability in imaging. (And likely other areas.)

Halide is not magic, it is just a better slicing of the problem backed up by a good implementation. As always there is no free lunch but when it comes down to actually shipping this kind of code across a wide variety of platforms with great performance, it is a lot more productive than anything else out there.

sharpneli · on April 7, 2014

I do agree that relying on compiler optimizations is a waste of time. They never seem to appear. I simply wished to point out that you can write Halide like code also in OpenCL. And I'd love to see an implementation which would attempt similar style of optimizations what Halide allows.

Halide is extremely domain specific, which is a good thing, it allows them to focus on the problem at hand, namely on how to easily write image processing filters that can be made performant with relative ease. However I would not wish to write a bitonic sort or anything like that in Halide.

zalman · on April 7, 2014

Andrew Adams did write bitonic sort: https://github.com/halide/Halide/blob/master/test/performanc...

As I wrote in another comment, the domain of problems for which Halide works is broader than imaging. I usually present it as "data parallel problems." In fact, I'd say the difference in domain between what Halide is good at and what OpenCL and CUDA are good at is not that significant in practice because those languages are basically C/C++ outside of kernel parallelism. (They are each adding some task parallelism facilities as well.)