Hacker Newsnew | past | comments | ask | show | jobs | submit | michae2's commentslogin

Nice one! Thanks for trying it out!



Something I’ve wondered about Datalog is whether integers can be added to the language without losing guarantees about termination of query evaluation. It seems like as soon as we add integers with successor() or strings with concat() then we can potentially create infinite relations. Is there a way to add integers or strings (well, really basic scalar operations on integer or string values) while preserving termination guarantees?

This bit at the end of the article seems to imply it’s possible, maybe with some tricks?

> We could also add support for arithmetic and composite atoms (like lists), which introduce some challenges if we wish to stay “Turing-incomplete”.


Not without a termination checker. Take a look at Twelf, it is a logic programming language and proof assistant based on the dependently typed LF logical framework. You can use general algebraic types in relations, and in general queries can be non-terminating. However, the system has a fairly simple way of checking termination using moded relations and checking that recursive calls have structurally smaller terms in all arguments.

Twelf is quite elegant, although not as powerful as other proof assistants such as Coq. Proofs in Twelf are simply logic programs which have been checked to be total and terminating.

Edit: Here's a link to a short page in the manual which shows how termination checking works: https://twelf.org/wiki/percent-terminates/

The syntax of Twelf is a bit different from other logic languages, but just note that every rule must have a name and that instead of writing `head :- subgoal1, subgoal2, ..., subgoaln` you write `ruleName : head <- subgoal1 <- subgoal2 <- ... <- subgoaln`.

Also note that this approach only works for top-down evaluation because it still allows you to define infinite relations (e.g. the successor relation for natural numbers is infinite). Bottom-up evaluation will fail to terminate unless restricted to only derive facts that contribute to some ground query. I don't know if anyone have looked into that problem, but that seems interesting. It is probably related to the "magic sets" transformation for optimizing bottom-up queries, but as far as I understand that does not give any hard guarantees to performance, and I don't know how it would apply to this problem.


By total coincidence, today I've been looking at Datalog implementations. The Datalog(-adjacent) Soufflé tutorial says this right near the start:

> For practical usage, Soufflé extends Datalog to make it Turing-equivalent through arithmetic functors. This results in the ability of the programmer to write programs that may never terminate.

https://souffle-lang.github.io/tutorial


Here’s a quite recent interesting paper about this: https://dl.acm.org/doi/abs/10.1145/3643027

> In this article, we study the convergence of datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if ever. We identify algebraic properties of the semiring that correspond to certain convergence properties of datalog programs. Finally, we describe a class of ordered semirings on which one can use the semi-naïve evaluation algorithm on any datalog program.

It’s quite neat since this allows them to represent linear regression, gradient decent, shortest path (APSP) within a very similar framework as regular Datalog.

They have a whole section on the necessary condition for convergence (i.e. termination).


Hey, author here. The one time I go straight to bed after posting on hackernews is the one time I get a bunch of comments hahaha.

Yes you can add support for integers in various ways where termination is still guaranteed. The simplest trick is to distinguish predicates (like pred(X, 42)) from constraints (like X > 7). Predicates have facts, constraints do not. When checking that every variable in the head of a rule appears in the body, add the condition that it appears in a predicate in the body.

So if you have a predicate like age(X:symbol, Y:int), you can use its facts to limit the set of integers under consideration. Then, if you write:

age(X, A), A + 1 >= 18.

You'd get everyone that becomes an adult next year. Fancier solutions are also possible, for example by employing techniques from finite domain constraint solving.


> Prolog is honestly kind of jank, and I’m not talking about good vintage jank like C.

Respectfully I would encourage you to consider this comment is confusing bad Prolog with all Prolog, and you are really missing out on some elegant and powerful concepts if you believe Prolog stops here.

I would love this to be an invitation for you to reevaluate modern Prolog and see what the language is really capable of.

I'll throw you right in the deep end with Prolog Definite Clause Grammars[1] and meta-interpreters[2].

There are a few things which are very special and magical to Prolog which are quite mind expanding. I hope you enjoy the resources.

[1]: https://youtu.be/CvLsVfq6cks

[2]: https://youtu.be/nmBkU-l1zyc


DCGs are cute, I like that writing something that looks like a grammar allows you to very easily generate sentences for that grammar. That's amazing for things like property-based testing.

But why wouldn't you use Mercury instead? No extra-logical nonsense like cuts, IO done properly with linear types, much better performance, etc. DCGs aren't exclusive to Prolog, why suffer the janky bits?

And while they're not built-in to it, Soufflé Datalog supports algebraic datatypes (so you can model difference lists), and you could preprocess a DCG into the native clause form plus a length tracking argument to prevent it from creating an infinitely sized database.

I've never worked with meta-interpreters so I can't comment on those, but I will try to watch the video you posted later.


That's a great question. The answer to "why wouldn't you use Mercury instead?" should hopefully be more clear when you watch the meta-interpeter video. The syntax of Prolog is not accidental, and this is a case where it really matters. You sacrifice a lot of power by introducing things like dot access notation.

Additionally, there is nothing stopping you from writing pure monotonic Prolog, which does not involve cuts and uses pure IO.

Circling back to DCGs, the combination of DCGs plus interpreters, which can be expressed in a handful of lines of code, allows for powerful language crafting.

While constraint solvers are not unique to Prolog, CS+DCG+MI is a killer combination I've only ever seen in Prolog that when combined allow you to solve some wicked problems rather elegantly.

That being said, thanks for the reminder about Mercury, that looks like it could come in handy on some Java-only projects.

This is DCG+CS only, still quite powerful and I have adapted it for GOAP and Hierarchical Task Networks quite successfully: https://www.metalevel.at/zurg/

Finally, I will agree with you that there is a LOT of jank Prolog, but not ALL Prolog is jank, and some Prolog is truly exquisite-- I hope you will keep an eye out for it.


Don't know if you're still reading this thread but just in case I finally got around to watching the meta-interpreter video. I can definitely see why Prolog shines for this.

Though even in that video we see a display of "jank", e.g., needing to ensure that rules don't overlap, being careful with introducing redundant choicepoints, permission errors, the trick with +, etc. in the vanilla meta-interpreter and the workarounds to get efficient tail recursion in the list-based meta-interpreter.

However, I don't think there's any way to avoid such "jank" when implementing something this powerful. The "jank" is the price to be able to do this. So fair point that this is something Prolog, and only Prolog, can do.


You made my day. I am a long time lisp developer who held a similar view of Prolog for a very long time until for various reasons I stumbled on these videos, and then realized there was a whole aspect to the language I was unaware of that went beyond the logic. Really great community developing around this stuff particularly with Scryer and Trealla that are very active if you are interested in discussing any of this further.

The fact that we were able to have this exchange was great. I appreciate your points and look forward to incorporating your work into mine!


Thanks, this is really helpful! And great article.


Yes, for some kinds of operations on some kinds of data structures. The keyword/property is "monotonicity". Monotonic functions are guaranteed to terminate under fixed-point semantics.

Look into Datafun: A total functional language that generalizes Datalog. Also be sure to watch Datafun author Michael Arntzenius's Strangeloop talk.

https://www.rntz.net/datafun/

https://www.youtube.com/watch?v=gC295d3V9gE


It’s all about the terms. As soon as rules can create an infinite sequence of new terms for a single relation, e.g. by addition, you’ve got non-termination.



Your link didn't work for me hence sharing https://archive.is/IUfhW


Oh, thank you!


To put it somewhat irreverently: running large production databases is not for the squeamish.


> If that is a team level, it should be in the team table, not the player table.

It's all contrived, of course, but the reason I would consider skill level to be a player attribute rather than a team attribute is that there could be free agents with a skill level but no team:

INSERT INTO player VALUES (10, 'Pavlo', 'AAA', NULL);

Then with enough free agents, you could imagine building a new team out of free agents that are all at the same skill level:

UPDATE player SET team = 'Otters' WHERE level = 'AAA' AND team IS NULL ORDER BY id LIMIT 3;


PG only uses EvalPlanQual under read committed isolation. Under repeatable read the first update fails with a "could not serialize" error, just as it does under serializable.


It's a good question. For simple UPDATEs, CockroachDB always executes in a deterministic, serial order and so it's likely the rows will be locked in the same order by any competing updates. (This can be confirmed by looking at the query plans.) Complex UPDATEs using joins and subqueries will need explicit ORDER BY to always lock in the same order.

If an UPDATE has to retry halfway through, locks are held across the retry to help the system make progress. But as you point out, this could cause lock acquisition to happen in an unexpected order if new rows qualify during a retry. So far we haven't run into this, but we might need to provide an option for an UPDATE to drop locks on retry if deadlock turns into a bigger problem than livelock. It depends on the workload.


> So far we haven't run into this, but we might need to provide an option for an UPDATE to drop locks on retry if deadlock turns into a bigger problem than livelock.

I'm not sure what you mean by that: the design can deadlock but you just have not seen it happening yet?

Edit: oh i see in a comment bellow that deadlocks are detected and abort the transaction.


The timing of this example is tricky because the two update statements execute concurrently (which is only possible under read committed isolation; under serializable isolation it's much more like what you're describing).

Here's a full timeline in PG (U1 for first update, U2 for second update):

0. U1 begins executing, establishes read snapshot, starts pg_sleep(5).

1. U2 runs to completion.

2. U1 wakes up after 5 sec, scans `player` using snapshot from step 0.

3. U1 filters `team = Gophers`, gets 4, 5, 6.

4. U1 locks 4, 5, 6.

5. U1 performs EvalQualPlan: re-scans latest version of those locked rows, which sees U2's write to 4 but not to 3.

6. U1 performs EvalQualPlan: re-filters those locked rows using latest version, gets 5, 6.

7. U1 writes new versions.

CRDB is easier to reason about: after U1 wakes up from the sleep, it sees that it conflicts with U2 and simply retries the entire statement.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: