So is there a place to download the trained model? I don't see anything but the ...

gwern · on Feb 10, 2020

They probably won't since DM doesn't open-source most of its work. The authors claimed way back in November that they'd at least open-source the code (https://openreview.net/forum?id=SylKikSYDH) but nothing yet. (The model isn't so big that open-sourcing it is all that important. It's no Turing-NLG https://www.microsoft.com/en-us/research/blog/turing-nlg-a-1... that's for sure!)

In the mean time, there's always Reformer, which has Trax and PyTorch implementations.

hooande · on Feb 10, 2020

I believe the transformer-xl pre-trained model can also be downloaded, to provide a similar long term memory functionality as the compression transformer. I don't have a direct link, but it's available via huggingface https://huggingface.co/transformers/pretrained_models.html

gwern · on Feb 11, 2020

Yeah. I didn't mention Transformer-XL because I'm not sure how much of a long-range dependency it actually learns to handle. The only papers I've seen on recurrency indicate that they tend to learn very short-range dependencies, while something like Reformer with direct access to thousands of timesteps seems more likely to actually be making use of them.

ColanR · on Feb 10, 2020

Wow, that's a lot of models. Thanks for pointing that out.

jwrae · on Feb 14, 2020

Hi, I have open-sourced the tensorflow model in the sonnet package:

https://github.com/deepmind/sonnet/blob/cd5b5fa48e15e4d020f7...

Will look into releasing some pre-trained weights, but the model trained on PG-19 is not really intended to be a general purpose language generation model so I'd prefer if it not be picked up for downstream applications like gpt2 & bert. The text from these old books contains some historical bias etc.

Hopefully the model can be useful for people wanting to model long sequences generally, or build on other compressive memory ideas.