Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

CTEs are defined as being optimisation boundaries and databases are not allowed to optimise across CTEs.

Wait, no. I know for a fact that SQL Server will happily optimize across CTEs. They are effectively syntactic sugar, and handled much like a view or derived table.

So, either the SQL standard defines this optimization boundary and SQL Server ignores it, or this is an idiosyncrasy of the Postgres implementation, or this is an idiosyncrasy of some other implementation that you have assumed applies to all implementations.

I honestly don't know. I wouldn't mind a syntactic construct that did define optimization boundaries, but to my mind CTEs are not that construct.



Unfortunately I haven't read the SQL standard(s) and I also seem to have trouble finding them in full on the internet.

However I was told on IRC in #postgres that not optimising across CTEs was something mandated by the standard.

Of course I might have been told something that's not quite correct or I might have misunderstood.

However in the context of postgres (which is what the article is talking about), I know for a fact that CTEs are first fully resolved. This is what I'm seeing in my query plans and it's what the manual says: http://www.postgresql.org/docs/9.3/static/queries-with.html (at the end of 7.8.1)

If I'm wrong what the standard is concerned, then I apologise.


Yes, as far as the standard is concerned you are wrong. See http://dba.stackexchange.com/questions/27425/is-the-optimisa.... But it is a very common misconception you can hardly be faulted for because it is the way postgres does it and many people claims it is due to the sql standard.

Of the only annoying features in postgres, imho. You have a large query and you try to break it up into smaller self-contained cte parts and voila, performance goes crap.


As per discussons on the email lists, PostgreSQL does it this way because this is the safest way to adhere to the standard. Presumably over time cross-CTE optimizations will be added. The problem though is guaranteeing the stability the standard requires without doing so. That's a technical issue, not a standards issue though and you are quite right for pointing that out.


Per the PostgreSQL email list discussion on the subject:

1. The standards define CTE's as being stable within a statement. I.e. they don't change as the query progresses.

2. The way this is implemented in PostgreSQL currently is to have CTE's as optimization fences.

Presumably some cross-CTE optimization may be added in the future but in usual cases, will be slowly added with a lot of care.


"Of course I might have been told something that's not quite correct or I might have misunderstood."

I think it was a partial truth. In many cases, you can't optimize because you need to produce the same results as if you had evaluated the CTE in full exactly once.

CTEs are also a convenient place to have an optimization fence, which are sometimes useful. Yes, this conflates the logical and the physical, but that's the way it is in postgres.


Yes, it's quite apparent in practice that WHERE constraints are not being propagated back into CTEs in cases where they clearly could be, so that's something to look out for.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: