NaCl encourages you to cache the result of the Diffie-Hellman operation. The protocol can thus be conceptually stateless, while the cache avoids having to do public-key operations on every packet.
From a quick skim of the code, it does appear that they might be doing this. They are calling the right function for it (crypto_box_curve25519xsalsa20poly1305_afternm). See "C precomputation interface" in http://nacl.cr.yp.to/box.html
I've not looked at the actual protocol, but it's not impossible.
If we assume every packet contains the client's (random, generated at process start) public key then the server could (in theory) perform key-agreement for every packet, decrypt it and handle it. In practice, it would cache the result so that it only does one key-agreement operation per client.
If the server crashes (did you have the server in mind when you said "other side"?) then it loses its cache, but the first packet that the new instance of the server sees will have the client's public key in it, so it just refills its cache immediately.
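To make the caching idea concrete, here is a rough sketch in Python. It is not taken from the code in question. SharedKeyCache and make_placeholder_agreement are hypothetical names, and the "key agreement" is a SHA-256 stand-in used only so the example runs without NaCl. With real NaCl the expensive step would be crypto_box_beforenm, with the result handed to the _afternm functions.

```python
import hashlib
from typing import Callable, Dict


class SharedKeyCache:
    """Hypothetical server-side cache: one key agreement per client public key."""

    def __init__(self, key_agreement: Callable[[bytes], bytes]) -> None:
        self._key_agreement = key_agreement
        self._cache: Dict[bytes, bytes] = {}
        self.agreements = 0  # counts how often the expensive step actually ran

    def shared_key(self, client_public_key: bytes) -> bytes:
        key = self._cache.get(client_public_key)
        if key is None:
            # Cache miss: first packet from this client, or first packet
            # seen after a restart. Do the expensive operation once.
            key = self._key_agreement(client_public_key)
            self.agreements += 1
            self._cache[client_public_key] = key
        return key


def make_placeholder_agreement(server_secret: bytes) -> Callable[[bytes], bytes]:
    """ASSUMPTION: stand-in for Diffie-Hellman. This is NOT real key agreement
    and NOT secure; it only gives a deterministic per-client value."""

    def agree(client_public_key: bytes) -> bytes:
        return hashlib.sha256(server_secret + client_public_key).digest()

    return agree


if __name__ == "__main__":
    agree = make_placeholder_agreement(b"server long-term secret")
    server = SharedKeyCache(agree)
    client_pk = b"A" * 32  # every packet is assumed to carry this

    for _ in range(1000):  # 1000 packets from the same client
        server.shared_key(client_pk)
    print(server.agreements)

    restarted = SharedKeyCache(agree)  # simulated crash: cache is empty
    restarted.shared_key(client_pk)    # first packet refills the cache
    print(restarted.agreements)
```

The cache is purely an optimisation here: throwing it away changes how much work the server does, not what it computes, which is why the protocol can still be thought of as stateless.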