I believe the Transformer-XL pre-trained model can also be downloaded, to provide long-term memory functionality similar to the Compressive Transformer's. I don't have a direct link, but it's available via Hugging Face: https://huggingface.co/transformers/pretrained_models.html
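
If anyone wants to try it, here's a minimal sketch using the transformers library (assuming a version that still ships the Transformer-XL classes, and the "transfo-xl-wt103" WikiText-103 checkpoint listed on that page; the `mems` cache is what carries the segment-level long-term memory):

    import torch
    from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

    tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
    model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
    model.eval()

    # First segment: no memory yet, the model initializes an empty cache.
    ids = tokenizer("The quick brown fox", return_tensors="pt")["input_ids"]
    with torch.no_grad():
        out = model(ids)

    # `mems` holds cached hidden states from the segment above; feeding them
    # back in lets the next segment attend beyond its own boundary.
    ids2 = tokenizer("jumps over the lazy dog", return_tensors="pt")["input_ids"]
    with torch.no_grad():
        out2 = model(ids2, mems=out.mems)

The output format varies across library versions (plain tuple vs. ModelOutput), so `out.mems` may need adjusting on older releases.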


Yeah. I didn't mention Transformer-XL because I'm not sure how much long-range dependency it actually learns to handle. The only papers I've seen on recurrence indicate that recurrent models tend to learn very short-range dependencies, while something like Reformer, with direct access to thousands of timesteps, seems more likely to actually make use of them.
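
For what it's worth, a rough sketch of what I mean by direct access (assuming the character-level "google/reformer-enwik8" checkpoint from the same hub; the byte + 2 encoding is from its model card):

    import torch
    from transformers import ReformerModelWithLMHead

    model = ReformerModelWithLMHead.from_pretrained("google/reformer-enwik8")
    model.eval()

    # The enwik8 checkpoint is character-level; per the model card, bytes map
    # to ids with an offset of 2 for the special tokens.
    text = "long-range dependency " * 512   # 11,264 chars, a multiple of the 64-token chunk
    input_ids = torch.tensor([[b + 2 for b in text.encode("utf-8")]])

    with torch.no_grad():
        logits = model(input_ids).logits
    # Every position here can attend (through LSH bucketing) to tokens
    # thousands of steps back in the same forward pass; no recurrent state
    # is carried between segments.

That's one pass over ~11k timesteps, versus a recurrent memory that has to preserve information across segment boundaries.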


Wow, that's a lot of models. Thanks for pointing that out.



