I come from a background using Common Lisp for system and web development so I may see things differently than people who were introduced to it academically. Your code looks great, mine just needed to compile :)
The cons pair is a a really powerful primitive data structure, it is the linked list of Lisp. You can build a lot of powerful structures given this great base. I really don't it matters what architecture your running on.
Do they really confuse people new to Lisp? car+cdr is a pretty simple concept, when talking to new Lispers they usually don't complain about this.
Hash Tables are part of the Common Lisp Hyperspec so that is a non issue, they have been first class citizens since before Clojure existed (and are nicely integrated into things like loop)
I think the parent was saying car and cdr are confusing because they don't actually map to registers (i.e. address register and decrement register). On the other hand you don't have to think too hard about the composed versions of them (caddr, cadar, etc...) but they can easily make your code seem obfuscated. Racket offers both that and first/rest anyway so you can choose which style you prefer.
I don't mind car, cdr, and their composed versions at all. `caddr` is much easier for me to read than `(head (tail (tail ...`
I think their usage is sometimes a code smell (the ad-hoc structs I mentioned), but they're really useful when you want to deal with an "unlispy" list of tokens (say, a parser for a language that isn't Lisp).
I mean, it's neat that you can build everything out of cons, in the sense that it's neat that Turing machines and the lambda calculus can perform any computation, but that doesn't mean it's a good idea to so in your code, any more than encoding numbers with Church numerals. Why use alists when you can use hash tables, why use... weird SICP-style struct list things when you could just use structs/vectors, etc.
>Hash Tables are part of the Common Lisp Hyperspec so that is a non issue, they have been first class citizens since before Clojure existed (and are nicely integrated into things like loop)
I'm not that familiar with Common Lisp, but a quick search didn't reveal a syntax for hash table literals, which I would consider to be a prerequisite for calling them first class citizens. There are other funny uses of lists (not improper ones, but still) in Common Lisp, like named procedure arguments[1]. I guess my point is, why encourage these weird constructs in the first place? Sexps are good at representing recursive structure, tables are good at mapping names to values, why not just have both and let them do what they're good at? Obviously Common Lisp is very old and can't be changed after the fact, I'm just trying to imagine the "ideal Lisp," whatever that is.
>Do they really confuse people new to Lisp? car+cdr is a pretty simple concept, when talking to new Lispers they usually don't complain about this.
I wasn't, but I've seen it confuse other people. Either way, it's a wart on the syntax that seems out of place in Scheme (not so in Common Lisp, which has lots of other warts ;))
>I come from a background using Common Lisp for system and web development so I may see things differently than people who were introduced to it academically.
Well, I was introduced to Scheme through SICP, but my motivation for picking it up was more practical than theoretical; I'd heard all the message board talk, Paul Graham essays, etc. proclaiming Lisp to be the most powerful, productive family of programming languages that will turn you into an expert programmer and so on, and wanted in on that ;) Maybe it's more practical to be Getting Shit Done™ instead of worrying about syntactic warts, but I think choosing to not have dotted pairs runs a bit deeper, by forcing the language to promote at least tables to a first class syntactic citizen.
[1] For the unfamiliar, it's something like:
(message "Text" :style bold :color red)
Reads pretty well, but it still seems wrong to me (ymmv), like it's another spot in the language begging for first class hash tables. In Lua, you'd emulate named arguments by passing a table to the function. I wonder what Clojure programmers do here. This?
(message "Text" {style: bold, color: red})
Which doesn't make a big deal for readability, but I bet if you want to use a macro or whatever accepts named arguments, it'd be much easier to deal with the one with the real hash table.
I don't understand the argument for using a hash-table for kw-args. Hash tables only win against plists when you don't need to read every single element; when you always need to read every single element, then it's a losing proposition to use a hash-table instead of a plist.
Yes I agree with your later comment about syntactic blessing and all, but I think you picked a terrible example by choosing the single place in lisp where plists make more sense than hash tables.
Maybe it was a bad example, but in either case, I would expect a "sufficiently smart compiler" to transform
(defun fun (a &key (b 0) (c 0)) ...)
(fun 5 :b 3)
into something like
(defun fun (a b c) ...)
(fun 5 3 0)
Whether it does so as a regular macro or as a special form of the compiler doesn't matter, there's no need for there to be any difference in efficiency between representing keyword args with tables or plists.
As I later mentioned in another comment, it was a mistake to talk about hash tables when I was really more interested in the abstract map datatype.
Named arguments are there to be interfaces to procedures. With hash-tables you know nothing about them. With named arguments, we can ask for argument lists, check for missing arguments, complete arguments, prompt for arguments, ... Much of that can be done in the IDE or at compile time.
Named arguments had been introduced to Lisp with MDL (a Lisp dialect, brought to Lisp Machine Lisp and then to Common Lisp).
Using hash-tables for it is a step back from the view of development support.
; CL keyword arg syntax, taken from Practical Common Lisp
(defun foo (&key (a 0) (b 0)) (+ a b))
(foo :a 1) -> 1
(foo :a 1 :b 2) -> 3
; hypothetical table keyword arg syntax
; clojure defines commas to be whitespace, you can pretend they aren't there if you want
(defun foo ({a: 0, b: 0}) (+ a b))
(foo {a: 1}) -> 1
(foo {a: 1, b: 2}) -> 3
I fail to see why the table syntax would be any less usable or introspectable by compilers or editors. They'd have the advantage (which would be shared with alists, if Common Lisp had used them for keyword args) of sharing the syntax for keyword args with a syntax for optional structure properties (think XML's attributes)
> Generally I think it is a slight mistake to use specific data types in arglists - basically mixing syntax and data types.
That's kinda what alists and plists do! They mix up the abstract "map" or "table" data structure (something that maps keys to values) with some concrete implementation of it. '((key . value) (key . value)) and '(key value key value) are literal representations of two particular map-like data structures, whereas {key: value} or {key = value} could have any concrete representation that the language implementer desires.
Maybe I should have dropped the "hash" from "hash table" in my above posts, so that it was clear that I was interested in the syntactic benefit of using one syntax for mapping names to values, and not some imagined efficiency gains from having (read keyword-function-call) include a hash table instead of an alist.
In Common Lisp an implementation can implement a call like (foo :bar 10) how it wants. There is nothing said about lists, hash tables, property lists or assoc lists.
Whereas having a syntax for hash table {} and using this syntax in definitions/calls clearly indicates a conflict. What is it? Coincidence? Purpose?
Mind you - reader macros should be used with a degree of care otherwise you basically create completely new language. I remember working on a project where I got a drop of some code from one of the other organizations in the project (by tape, this was a long time ago) opening the file and wondering what language the code was written in and taking a while to realize that it was indeed Lisp - but one that had used reader macros to do strange and terrible things.
Mind you, once I learned about reader macros I went on to do my own crazy stuff with them - they are almost too powerful a feature (almost!).
Hash-tables are part of the standard, and available in conforming implementations out-of-the-box. (make-hash-table) produces an instance of a table which can be passed as argument to function, be assigned to variables, ... I don't think syntax defines what feature is or isn't first-class, even though it has an impact on it's usage.
Adding support for litteral hash-tables is possible, as it was already said: define a reader macro; that macro could even simply use the utility function "plist-hash-table", from Alexandria, which is used like this:
(alexandria:plist-hash-table '(:a 1 :b 2))
So that you could write #H{:a 1 :b 2} and get what you want.
On the first hand, people say that CL is bloated, but on the other hand, they want to have everything available by default. Do you expect Python or C++ implementations to come with regular expression built-in, or is it okay if you need to add an 'import' statement or link with Boost?
CL allows you to define libraries that can change even the basic surface syntax, for your convenience. Considering that there is more than enough defined in the specification, what benefits would be in having an "official" hash litteral syntax? consistency accross different projects? many people use the "cl-ppcre" regex library without problem.
Yes, regular expressions come in a library even though they could easily be labelled as a fundamental feature of a language, and hence be expected to be "first-class", or "built-in"; but again, that is not the case in CL, which is neither Perl nor Lua.
And about keyword parameters, this is just the surface syntax, which is on purpose based on a simple, basic representation of the syntax tree as lists of expressions.
Then, the compiler will work with the argument list in your function definition and take one or another approach to represent them in the object code. And I doubt that they are stored in a hash-table, which is unlikely an appropriate way to handle a couple of keyword arguments.
Similarly, when you define a class, everything is declared by a simple list of slots, where slots are lists of parameters: but in fact, the class might store the actual instance slots in a hash-table, an array, a list, a database or a memory mapped file.
The fact that you are mainly manipulating lists does not mean that everything is eventually represented as supposedly inefficient linked-list at runtime.
Fair enough, pretend I said "first class syntactic object."
> CL allows you to define libraries that can change even the basic surface syntax, for your convenience. Considering that there is more than enough defined in the specification, what benefits would be in having an "official" hash litteral syntax?
Reader macros are powerful, no doubt. The benefit of building this into a language would mainly be syntactic regularity. CL as it stands has keyword args, alists, property lists, and real hash tables (probably amongst other things I don't know about), that all fill basically the same purpose of mapping names to values. If the creators of Common Lisp and its parent lisps had started with hash tables with standard reader syntax, they probably wouldn't have felt the need to introduce so many subtly different but conceptually identical concepts. Lua, for instance, uses its single table data structure both as syntax and as an internal representation for pretty much any case where it needs to map some names to some values; from keyword arguments to environments to "metatables" that provide fairly powerful metaprogramming support, they're all just tables that you can write literally and use the same functions to manipulate. You could say that alists fill this purpose and are even more syntactically regular since they're still sexps, but then why doesn't CL use alists for keyword args, and why don't Scheme's keyword arg extensions use alist syntax? Historical accident, or are alists just ugly? If you can admit that they're a little too ugly and verbose to include in function calls, then maybe you can see why I don't like writing alist literals.
> The fact that you are mainly manipulating lists does not mean that everything is eventually represented as supposedly inefficient linked-list at runtime.
I'm well aware that a compiler can easily expand keyword arguments into regular ones. I strongly doubt your compiler will transform all the literal alists in your program into hash tables, and replace all the alist-ref functions and so on operating on them with the equivalent hash table functions. As long as alists are more syntactically blessed than hash tables, people will use them where another data structure would be more appropriate.
Assoc lists make little sense as argument lists. Named arguments are basically like property lists. The point of argument lists for functions in Common Lisp is that they enable a contract. This requires special built-in support in the language. I want to see during compile time if an arg is missing or additional.
Example: the function WRITE takes several keyword arguments, but not :BAR. The compiler complains about wrong use.
* (defun test () (write "foo" :bar 10))
; in: DEFUN TEST
; (WRITE "foo" :BAR 10)
;
; caught WARNING:
; :BAR is not a known argument keyword.
Now if you allow a hashtable or some other data structure to be passed, then it would also be great, if one could specify at definition time which keys it takes, their default values, etc. We also may want to find out which values were default values, and which were actually passed.
The keyword argument facility for functions in Common Lisp provides all of that.
This allows you compile-time checks for code and makes interfaces easier to use.
Common Lisp also allows access to arguments as lists. This is simpler and more Lispy than using hash tables. For most purposes lists are useful enough and hash-tables would just add overhead.
> I strongly doubt your compiler will transform all the literal alists in your program into hash tables,
Time for experiment.
SBCL doesn't, as far as I know. But you seem to imply that it is always better to use hash-tables, and I don't think so (even though hash-table can be implemented in a smart way, like in Lua).
Under roughly 10 elements, alist result in shorter code.
The function with an hash-table has a constant size relatively to the number of elements in the table, which is fully known at compile-time. The size of the code with a case/alist grows linearly with the number of elements.
Now, let's compare speed.
In the following versions, I have added more elements in the alist, just to be sure the code with an hash-table is the shortest (from 'a to 'l).
(time
(dotimes (i 10000000)
(dolist (x '(a b c d e f g h i j k l))
(my-fun-hash x))))
(time
(dotimes (i 10000000)
(dolist (x '(a b c d e f g h i j k l))
(my-fun-case x))))
(time
(dotimes (i 10000000)
(dolist (x '(a b c d e f g h i j k l))
(my-fun x))))
Results:
HASH:
2.491 seconds of real time
33,072 bytes consed
ALIST:
0.852 seconds of real time
0 bytes consed
CASE:
0.778 seconds of real time
0 bytes consed
Even though in terms of footprint, the code tend to be larger with alist quite rapidly (10 elements), the resulting code is faster and does not allocate memory.
Also, just to clarify, alist and property lists have a different behavior than hash-tables, namely that the sequential access allows you to shadow values from another list: if you write (cons (cons 'a b) older-alist), you have a new list where the value for key 'a is b, and where values for other keys are those found in older-list (even though older-list also contains key 'a).
I don't quite remember what is my point anyway ;-) but it was fun to test the different behaviors.
Microbenchmarks are fun but silly. I tried to write a similar one for Lua, but LuaJIT ultimately recognized the program was useless and boiled it down to a 3 instruction loop:
loop:
add ebp, 1
cmp ebp, 10000000
jle loop
Anyway, in regards to your results, I could be wrong, but I think I read somewhere that LuaJIT uses linear search for tiny tables for this reason. Dunno what the threshold is, if there is one. The performance then would be similar to an alist, but saving a few bytes and cycles by not needing to deal with the extra list pointers and indirection.
> Also, just to clarify, alist and property lists have a different behavior than hash-tables, namely that the sequential access allows you to shadow values from another list: if you write (cons (cons 'a b) older-alist), you have a new list where the value for key 'a is b, and where values for other keys are those found in older-list (even though older-list also contains key 'a).
Oh yes, doesn't emacs make good use of this trick for its configuration variables? You can get the same effect with Lua's metatables, however, with a little elbow grease:
Not quite as easy as "cons", but very flexible; __index can be another table to searched if the lookup on the child fails (which in turn can have its own parent and so on), but it can also be a "metamethod" that is called whenever a lookup fails and can then do arbitrary things. A neat example off the top of my head: OpenGL has the peculiarity that you do not know the address of any of its functions until runtime, requiring a program to call a lookup function for each function to get a usable pointer. Declaring FFI function prototypes for many OpenGL functions is not such a big deal, but looking up hundreds of functions that you will never use can add significantly to startup time. So someone (possibly an HN user?) wrote an OpenGL FFI library that uses the __index metamethod in a clever way; the first time a function like "GL.CreateShader" is called, the lookup fails, and the __index metamethod mangles the index name a bit and in turn calls (on Windows) the C function wglGetProcAddress to look up its address, which it then stores in the original table. Using this library, you can write code that uses GL functions willy-nilly, and their addresses will automatically be looked up at runtime the first time they are used.
Sure, you can do the same thing in any language with macros or a preprocessor, but is that cool or what?
> ... recognized the program was useless and boiled it down to a 3 instruction loop
The loop itself is quite useless, why wasn't it also removed ? (Just kidding)
I don't have anything agains Lua or hash tables in principle. And of course tables are used in practice in CL code, but they aren't the primary data-structure.
> __index can be another table to searched if the lookup on the child fails (which in turn can have its own parent and so on)
> the first time a function like "GL.CreateShader" is called, the lookup fails, and the __index metamethod mangles the index name a bit and in turn calls (on Windows) the C function wglGetProcAddress to look up its address, which it then stores in the original table. Using this library, you can write code that uses GL functions willy-nilly, and their addresses will automatically be looked up at runtime the first time they are used.
So, it is an association list implemented using tables, where links are given by the __index property, using a metatable.
So maybe it is convenient after all to have a very simple data-structure like cons in a language and let more complex data be implemented with it, instead of the opposite.
> The loop itself is quite useless, why wasn't it also removed ? (Just kidding)
Good question! I guess LuaJIT isn't optimized for programs that don't do anything.
> So, it is an association list implemented using tables, where links are given by the __index property, using a metatable.
I think that's a matter of opinion. It's interesting that being able to form these simple hierarchies is emergent property of alists, but just because Lua provides another mechanism to implement hierarchical lookups, doesn't mean that the language designers were trying to ape alists. If anything, I'd assume that Roberto and company were inspired by Smalltalk's doesNotUnderstand message when they implemented __index metamethods.
The cons pair is a a really powerful primitive data structure, it is the linked list of Lisp. You can build a lot of powerful structures given this great base. I really don't it matters what architecture your running on.
Do they really confuse people new to Lisp? car+cdr is a pretty simple concept, when talking to new Lispers they usually don't complain about this.
Hash Tables are part of the Common Lisp Hyperspec so that is a non issue, they have been first class citizens since before Clojure existed (and are nicely integrated into things like loop)