Nahhh, that's the tried-and-true Apple approach to marketing, and OpenAI is well positioned to adopt it for themselves. They act like they invented transformers as much as Apple acts like they invented the smartphone.
What are those other areas which display age-based creative decline? Other creative fields I can think of off the top of my head - scientific research, animation, fiction writing, architecture - are overwhelmingly dominated by older people.
Even in pop music, I'd argue that artists are doing very little of the actual heavy lifting compared to the producers and the writers. Pop singers have a much shorter shelf life than producer/writers due to the importance of image in appealing to younger fans. See https://en.wikipedia.org/wiki/Max_Martin for an example.
E.g. maths, physics, poetry. All dominated by the young, as is pop music.
Note that I'm talking about creative brilliance - i.e. the outlier contributors that define or excel in the forms - not workmanlike producers of reliable commercial output as in your example (though I would assume the same general trend holds).
I said "some" because I'm aware that some fields (e.g. literature) do not follow the clear pattern the others do.
I've been playing with Llama 3 8b instruct but I've found it to be surprisingly low quality compared to some of the better Mistral 7b finetunes (zephyr, dolphin, openorca). Rather surprising because there's no way Mistral or any of the organizations doing the finetuning did even a fraction of the training volume that Meta did.
Depending on the kind of questions you're asking it, the mistral finetunes may be much better positioned to give a high quality answer. An apples-to-apples comparison IMO would be Mistral 7B instruct vs Llama3 8b instruct.
Shouldn't the finetunes be better than the vanilla LLMs? That's the whole point of a finetune. Maybe wait until there are Llama finetunes to compare to the Mistral finetunes?
Population in Japan has barely fallen (yet). So far it's only a ~2% decline from peak population, but there will be a 20% decline in the next 20 years.
There is a long lag between below-replacement fertility and actual population decline. Because of how compounding growth works and the length of human lifespans, sub-replacement fertility won't result in population decline (for a previously fast-growing country) until 40+ years after the fact. Japan is only just now seeing the effects of lowish fertility from the 70s and 80s.
Note that one of the other consequences of population math is that if a country has been previously declining in population for a while, it'll continue to decline for decades even if the current fertility rate is at or slightly above replacement rate. This means that population decline is essentially an inevitability for most East Asian and European countries for the next several generations.
None of us knows what will happen when populations are falling by 5%+ per decade, which is now the inevitable future of many countries over the next few decades. It's totally unprecedented in human history (excluding cases like war or disaster).
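The momentum argument above can be sketched with a toy three-generation model. All numbers here are illustrative, not demographic projections - the point is just the lag between fertility and headcount:

```python
# Toy model of population momentum: each "generation step" (~30 years),
# the new cohort's size is the previous cohort's size times a fertility
# ratio (children per person relative to replacement). Total population
# is the sum of the three generations alive at any time.

def simulate(fertility_ratios, initial_births=100.0):
    """Return total population after each generation step."""
    births = [initial_births] * 3          # three equal generations at start
    populations = []
    for r in fertility_ratios:
        births.append(births[-1] * r)      # new cohort is born
        births.pop(0)                      # oldest cohort dies off
        populations.append(sum(births))
    return populations

# Two generations of sub-replacement fertility (r = 0.7), then back to
# exactly replacement (r = 1.0). The population keeps shrinking for
# decades after fertility recovers, because each new cohort of parents
# is smaller than the one before it.
pops = simulate([0.7, 0.7, 1.0, 1.0, 1.0])
```

Even after the fertility ratio returns to 1.0 (step 3), the total keeps falling for two more steps before it stabilizes - which is the "decline for decades even at replacement fertility" effect described above.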
The author is way overselling how controversial Haidt's claim is.
- The effect of social media at a societal level isn't something that can be tested experimentally. And like all non-experimental research, results depend heavily on modeling methodology (for example, the meta-analysis linked by the author is literally just a regression of Facebook DAU on life satisfaction polls [1]). As a result there will NEVER be a universal consensus, same as with most "macro" level studies in social sciences. If you wait for researchers to reach an agreement, you will wait forever.
- The most credible research I've seen on this is this quasi-experimental paper [2]. Since the timing of Facebook's rollout was staggered across schools, it can be used as a natural experiment. Schools should've seen declines in student mental health that correspond with the date of Facebook's rollout on their campus, which is indeed what happened.
- The effect of using social media at an individual level (as opposed to a societal level) IS known and it is very clearly negative. See [3] for an example.
- Most importantly, the fact is that rates of mental illness and suicide have been rising significantly since exactly when social media gained popularity (the mid-to-late 2000s). The effect is global, so you can't blame country-specific policies. And the rise is most significant in the demographics most exposed to social media (young people, especially girls). There's not a single other explanation that makes sense. Frankly, I think people are afraid to admit that social media is the problem because so many people are tech addicts and don't want to admit that their own addiction is part of the problem.
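For context on the first point: the kind of "macro" regression being criticized there really is methodologically thin - it boils down to fitting a line between two aggregate series. A minimal numpy sketch, with synthetic, made-up numbers purely to show the method:

```python
import numpy as np

# Synthetic, invented yearly aggregates, purely illustrative:
# Facebook daily active users (millions) vs. a national
# life-satisfaction poll average over the same years.
dau = np.array([50, 150, 350, 600, 900, 1200, 1500], dtype=float)
satisfaction = np.array([7.1, 7.0, 6.9, 6.9, 6.8, 6.7, 6.6])

# Ordinary least-squares fit of satisfaction on DAU.
slope, intercept = np.polyfit(dau, satisfaction, 1)
```

A negative slope says the two aggregate series move in opposite directions, but with a handful of country-year points and no controls, it cannot separate a social-media effect from anything else that changed over the same period - which is why the methodology, not the data, drives the conclusions.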
Interesting, but I'm very skeptical. Over a dozen transformer-based foundation time series models have been released in the past year, and without fail, every one of them claims to be at or near SOTA. For example:
Yet not a SINGLE transformer-based model I've managed to successfully run has beaten gradient boosted tree models on my use case (economic forecasting). To be honest, I believe these foundation models are all vastly overfit. There are basically only two benchmark sets ever used in time series (the Monash set and the M-competition set), so it'd be easy to overtune a model just to perform well on those.
I would love to see someone make a broader set of varied benchmarks and have an independent third party run the evaluations, as with LLM leaderboards. Otherwise I assume all published benchmarks are 100% meaningless and gamed.
Pretty much any real-world time series prediction task is going to involve more data than just the time series itself, and some of this data will probably be tabular, so it's no surprise gradient boosted trees perform better.
Neural nets are known to struggle with tabular data. Have you tried fine-tuning, or attaching a decoder somewhere that you train on your task? Zero-shot inference might be asking too much.
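As a sketch of why trees handle this setup so naturally: lag features plus exogenous tabular columns drop straight into a gradient boosted regressor with no special architecture. Synthetic data throughout, and sklearn's GradientBoostingRegressor stands in for whatever GBT library you'd actually use:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic series driven partly by its own past and partly by an
# exogenous tabular covariate (think an interest-rate-like feature).
n = 300
exog = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] + 0.2 * y[t - 2] + 0.5 * exog[t] + 0.1 * rng.normal()

# Lag features + the tabular covariate as one design matrix -- this is
# the step that lets a tree model consume a time series alongside
# ordinary tabular data.
X = np.column_stack([y[1:-1], y[:-2], exog[2:]])   # lag-1, lag-2, exog
target = y[2:]

# Walk-forward split: train on the past, evaluate on the future.
split = 250
model = GradientBoostingRegressor(random_state=0)
model.fit(X[:split], target[:split])
preds = model.predict(X[split:])
mae = np.mean(np.abs(preds - target[split:]))
```

Nothing here needs a decoder, fine-tuning, or any awareness that the inputs came from a time series, which is a big part of the practical appeal over zero-shot foundation models.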
"Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well."
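The additive structure that blurb describes can be illustrated in a few lines. This is just numpy least squares on a toy series - not Prophet's actual implementation (Prophet fits piecewise trends, multiple Fourier orders, and holiday effects via Stan) - but it shows the decomposition idea:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy daily series: linear trend + weekly seasonality + noise.
t = np.arange(200, dtype=float)
y = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 7.0) + 0.3 * rng.normal(size=t.size)

# "Additive model": regress y on a trend column plus a Fourier pair for
# the weekly cycle, then read the components off the coefficients.
X = np.column_stack([
    np.ones_like(t),                 # intercept
    t,                               # linear trend (Prophet: piecewise)
    np.sin(2 * np.pi * t / 7.0),     # weekly seasonality, Fourier pair
    np.cos(2 * np.pi * t / 7.0),
])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
trend_slope, weekly_amp = coef[1], coef[2]
```

The recovered slope and seasonal amplitude land close to the values used to build the series, which is essentially what Prophet's component plots are showing you.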
You don't need to be a cutting edge research scientist to train a SOTA LLM. You just need money for scaling. OpenAI's "secret" was just their willingness to spend tens/hundreds of millions without guaranteed returns, and RLHF/instruct fine tuning, both of which are out of the bag now.
Disagree. It took more than 12 months from the release of GPT-4 to someone else producing a model of equivalent quality, and that definitely wasn't due to a shortage of investment from the competition.
There's a huge amount of depth in training a really good LLM. Not helped by the fact that iteration is incredibly expensive - it might take several months (and millions of dollars) before you can tell if your new model is working well or if there was some mistake in the pipeline that led to a poor quality result.
Almost all of the world-class LLMs outside of OpenAI/DeepMind have been trained by people who previously worked at those organizations - giving them invaluable experience such that they could avoid the most expensive mistakes while training their new models.
Don’t overlook the training data (used for both training and instruction fine-tuning) - it is one of the most crucial aspects, if not the most critical, given the significant quality differences observed between models with similar architectures.
While I do agree there is some amount of secret sauce, keep in mind that training takes several months. So for someone to see the success of GPT-4, decide they want to invest that amount of money, raise it, find someone competent to supervise the training, train the model for several months, then test and integrate it could easily take a year even if there were no secret sauce.
That only remains an advantage if they can continue climbing the gradient from their lead position. If they hit a snag in scaling, methodology, or research, everyone else on the planet catches up, and then it's anyone's game again.
When it comes to LLMs, metrics are misleading and easy to game. Actually talking to a model and running it through novel tasks that require the ability to reason very quickly demonstrates that it is not on par with GPT-4. As in, it can't solve things step-by-step that GPT-4 can one-shot.
This was exactly my experience. I have very complex prompts, and nothing I've tried performs as well as GPT-4 on them (Claude 3 Opus included).
Very weird that they don't cite this on their webpage and instead bury the source in their paper.