Nahhh, that's the tried-and-true Apple approach to marketing, and OpenAI is well positioned to adopt it for themselves. They act like they invented transformers as much as Apple acts like they invented the smartphone.
What are those other areas which display age-based creative decline? Other creative fields I can think of off the top of my head - scientific research, animation, fiction writing, architecture - are overwhelmingly dominated by older people.
Even in pop music, I'd argue that artists are doing very little of the actual heavy lifting compared to the producers and the writers. Pop singers have a much shorter shelf life than producer/writers due to the importance of image in appealing to younger fans. See https://en.wikipedia.org/wiki/Max_Martin for an example.
E.g. maths, physics, poetry. All dominated by the young, as is pop music.
Note that I'm talking about creative brilliance - i.e. the outlier contributors that define or excel in the forms - not workmanlike producers of reliable commercial output as in your example (though I would assume the same general trend holds).
I said "some" because I'm aware that some fields (e.g. literature) do not follow the clear pattern the others do.
I've been playing with Llama 3 8b instruct but I've found it to be surprisingly low quality compared to some of the better Mistral 7b finetunes (zephyr, dolphin, openorca). Rather surprising because there's no way Mistral or any of the organizations doing the finetuning did even a fraction of the training volume that Meta did.
Depending on the kind of questions you're asking it, the mistral finetunes may be much better positioned to give a high quality answer. An apples-to-apples comparison IMO would be Mistral 7B instruct vs Llama3 8b instruct.
Shouldn't the finetunes be better than the vanilla LLMs? That's the whole point of a finetune. Maybe wait until there are Llama finetunes to compare to the Mistral finetunes?
Population in Japan has barely fallen (yet). So far it's only a ~2% decline from peak population, but there will be a 20% decline in the next 20 years.
There is a long lag between below-replacement fertility and actual population decline. Because of how compounding growth works and the length of human lifespans, sub-replacement fertility won't result in population decline (for a previously fast-growing country) until 40+ years after the fact. Japan is only just now seeing the effects of lowish fertility from the 70s and 80s.
Note that one of the other consequences of population math is that if a country has been previously declining in population for a while, it'll continue to decline for decades even if the current fertility rate is at or slightly above replacement rate. This means that population decline is essentially an inevitability for most East Asian and European countries for the next several generations.
None of us knows what will happen when populations are falling by 5%+ per decade, which is now the inevitable future of many countries over the next few decades. It's totally unprecedented in human history (excluding cases like war or disaster).
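The momentum argument above can be sketched with a toy three-generation model. All numbers here are illustrative, not demographic projections - the point is just the lag between fertility and headcount:

```python
# Toy model of population momentum: each "generation step" (~30 years),
# the new cohort's size is the previous cohort's size times a fertility
# ratio (children per person relative to replacement). Total population
# is the sum of the three generations alive at any time.

def simulate(fertility_ratios, initial_births=100.0):
    """Return total population after each generation step."""
    births = [initial_births] * 3          # three equal generations at start
    populations = []
    for r in fertility_ratios:
        births.append(births[-1] * r)      # new cohort is born
        births.pop(0)                      # oldest cohort dies off
        populations.append(sum(births))
    return populations

# Two generations of sub-replacement fertility (r = 0.7), then back to
# exactly replacement (r = 1.0). The population keeps shrinking for
# decades after fertility recovers, because each new cohort of parents
# is smaller than the one before it.
pops = simulate([0.7, 0.7, 1.0, 1.0, 1.0])
```

Even after the fertility ratio returns to 1.0 (step 3), the total keeps falling for two more steps before it stabilizes - which is the "decline for decades even at replacement fertility" effect described above.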
The author is way overselling how controversial Haidt's claim is.
- The effect of social media at a societal level isn't something that can be tested experimentally. And like all non-experimental research, results depend heavily on modeling methodology (for example, the meta-analysis linked by the author is literally just a regression of Facebook DAU on life satisfaction polls [1]). As a result there will NEVER be a universal consensus, same as with most "macro" level studies in social sciences. If you wait for researchers to reach an agreement, you will wait forever.
- The most credible research I've seen on this is this quasi-experimental paper [2]. Since the timing of Facebook's rollout was staggered across schools, it can be used as a natural experiment. Schools should've seen declines in student mental health that correspond with the date of Facebook's rollout on their campus, which is indeed what happened.
- The effect of using social media at an individual level (as opposed to a societal level) IS known and it is very clearly negative. See [3] for an example.
- Most importantly, the fact is that rates of mental illness and suicide have been rising significantly since exactly when social media gained popularity (the mid-to-late 2000s). The effect is global, so you can't blame country-specific policies. And the rise is most significant in the demographics most exposed to social media (young people, especially girls). There's not a single other explanation that makes sense. Frankly, I think people are afraid to admit that social media is the problem because so many people are tech addicts and don't want to admit that their own addiction is part of the problem.
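For context on the first point: the kind of "macro" regression being criticized there really is methodologically thin - it boils down to fitting a line between two aggregate series. A minimal numpy sketch, with synthetic, made-up numbers purely to show the method:

```python
import numpy as np

# Synthetic, invented yearly aggregates, purely illustrative:
# Facebook daily active users (millions) vs. a national
# life-satisfaction poll average over the same years.
dau = np.array([50, 150, 350, 600, 900, 1200, 1500], dtype=float)
satisfaction = np.array([7.1, 7.0, 6.9, 6.9, 6.8, 6.7, 6.6])

# Ordinary least-squares fit of satisfaction on DAU.
slope, intercept = np.polyfit(dau, satisfaction, 1)
```

A negative slope says the two aggregate series move in opposite directions, but with a handful of country-year points and no controls, it cannot separate a social-media effect from anything else that changed over the same period - which is why the methodology, not the data, drives the conclusions.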
Interesting, but I'm very skeptical. Over a dozen transformer-based foundation time series models have been released in the past year, and without fail, every one of them claims to be at or near SOTA. For example:
Yet not a SINGLE transformer-based model I've managed to successfully run has beaten gradient boosted tree models on my use case (economic forecasting). To be honest, I believe these foundation models are all vastly overfit. There are basically only two benchmark sets ever used in time series (the Monash set and the M-competition set), so it'd be easy to overtune a model just to perform well on those.
I would love to see someone make a broader set of varied benchmarks and have an independent third party run the evaluations, as with LLM leaderboards. Otherwise I assume all published benchmarks are 100% meaningless and gamed.
Pretty much any real-world time series prediction task is going to involve more data than just the time series itself, and some of this data will probably be tabular, so it's no surprise gradient boosted trees perform better.
Neural nets are known to struggle with tabular data. Have you tried fine-tuning, or attaching a decoder somewhere that you train on your task? Zero-shot inference might be asking too much.
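As a sketch of why trees handle this setup so naturally: lag features plus exogenous tabular columns drop straight into a gradient boosted regressor with no special architecture. Synthetic data throughout, and sklearn's GradientBoostingRegressor stands in for whatever GBT library you'd actually use:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic series driven partly by its own past and partly by an
# exogenous tabular covariate (think an interest-rate-like feature).
n = 300
exog = rng.normal(size=n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] + 0.2 * y[t - 2] + 0.5 * exog[t] + 0.1 * rng.normal()

# Lag features + the tabular covariate as one design matrix -- this is
# the step that lets a tree model consume a time series alongside
# ordinary tabular data.
X = np.column_stack([y[1:-1], y[:-2], exog[2:]])   # lag-1, lag-2, exog
target = y[2:]

# Walk-forward split: train on the past, evaluate on the future.
split = 250
model = GradientBoostingRegressor(random_state=0)
model.fit(X[:split], target[:split])
preds = model.predict(X[split:])
mae = np.mean(np.abs(preds - target[split:]))
```

Nothing here needs a decoder, fine-tuning, or any awareness that the inputs came from a time series, which is a big part of the practical appeal over zero-shot foundation models.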
"Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well."
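The additive structure that blurb describes can be illustrated in a few lines. This is just numpy least squares on a toy series - not Prophet's actual implementation (Prophet fits piecewise trends, multiple Fourier orders, and holiday effects via Stan) - but it shows the decomposition idea:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy daily series: linear trend + weekly seasonality + noise.
t = np.arange(200, dtype=float)
y = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 7.0) + 0.3 * rng.normal(size=t.size)

# "Additive model": regress y on a trend column plus a Fourier pair for
# the weekly cycle, then read the components off the coefficients.
X = np.column_stack([
    np.ones_like(t),                 # intercept
    t,                               # linear trend (Prophet: piecewise)
    np.sin(2 * np.pi * t / 7.0),     # weekly seasonality, Fourier pair
    np.cos(2 * np.pi * t / 7.0),
])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
trend_slope, weekly_amp = coef[1], coef[2]
```

The recovered slope and seasonal amplitude land close to the values used to build the series, which is essentially what Prophet's component plots are showing you.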
You don't need to be a cutting edge research scientist to train a SOTA LLM. You just need money for scaling. OpenAI's "secret" was just their willingness to spend tens/hundreds of millions without guaranteed returns, and RLHF/instruct fine tuning, both of which are out of the bag now.
Disagree. It took more than 12 months from the release of GPT-4 to someone else producing a model of equivalent quality, and that definitely wasn't due to a shortage of investment from the competition.
There's a huge amount of depth in training a really good LLM. Not helped by the fact that iteration is incredibly expensive - it might take several months (and millions of dollars) before you can tell if your new model is working well or if there was some mistake in the pipeline that led to a poor quality result.
Almost all of the world-class LLMs outside of OpenAI/DeepMind have been trained by people who previously worked at those organizations - giving them invaluable experience such that they could avoid the most expensive mistakes while training their new models.
Don’t overlook the training data (used for both training and instruction fine-tuning) - it is one of the most crucial aspects, if not the most critical, given the significant quality differences observed between models with similar architectures.
While I do agree there is some amount of secret sauce, keep in mind that training takes several months. So for someone to see the success of GPT-4, decide they want to invest that amount of money, raise it, find someone competent to supervise the training, train the model for several months, then test and integrate it could easily take a year even if there were no secret sauce.
That only remains an advantage if they can continue climbing the gradient from their lead position. If they hit a snag in scaling, methodology, or research, everyone else on the planet catches up, and then it's anyone's game again.
When it comes to LLMs, metrics are misleading and easy to game. Actually talking to a model and running it through novel tasks that require the ability to reason very quickly demonstrates that it is not on par with GPT-4. As in, it can't solve things step-by-step that GPT-4 can one-shot.
This was exactly my experience. I have very complex prompts, and nothing I've tried performs as well as GPT-4 on them (Claude 3 Opus included).
Very weird that they don't cite this on their webpage and instead bury the source in their paper.