That's the thing though, there is interest in "metaverse" style programs. VRChat, the biggest one, got 80k concurrent users last month (all time peak) according to SteamDB. Seems low, but hardware is a limiting factor for them.
What happened is Facebook's version of this was a corporatized, simplified, G-rated fraction of what its competition is. Despite being in a medium whose defining feature is the ability to look out through the eyes of anything vaguely humanoid, you could only be a generic human who exists from the waist up, devoid of almost any self-expression beyond maybe accessories or retexturing.
As a result, there was no audience: the people who already use VR aren't going to go to an inferior product. And the people who would buy a VR headset aren't going to waste their time on a ghost town.
The thing is, Facebook/Meta wasn't trying to make a product with 80k concurrent users, or even with 800k concurrent users. Facebook has 3 billion MAU, and they literally renamed the entire company to Meta - they were expecting it to be big, hundreds of millions of users.
They hoped it would be a platform for fitness classes, business meetings, college classrooms, shopping, attending concerts [1] and so on.
If the primary appeal of your VR universe is that your avatar can be an anthropomorphic banana, an anime girl, a furry, a giant penis with legs - that's never going to become a 300-million-user platform.
I think what Meta didn't realize (or maybe they did and ignored it) was that they were not pioneering the metaverse. Metaverses already existed on the platforms you just mentioned. I've never played Roblox or Second Life, but I know kids and teens who live on Roblox and adults who live on Second Life. Those worlds _were_ their metaverses, and there was no reason to jump ship to another platform when they already had a digital life established. And Meta just ended up making a shitty version of the metaverse anyway, for the reason you mentioned.
It's not that the metaverse never took off - the popularity of Roblox and Second Life (and other online social spaces) is proof that the metaverse was in demand. It's that Meta never gave people a reason to join their metaverse.
Note that I'm loosely defining the "metaverse" as any online world where the community is the point and people spend real money to "get ahead" in those worlds. Many MMOs can be metaverses in this sense. I've logged onto Final Fantasy XIV and seen people who logged on just to hang out at their friend's in-game house, not to play the game at all.
I think the biggest problem, which you hint at, is that "metaverse" is an ill-defined term. When they rebranded, even though I had been working in the 3D industry for _many_ years, I couldn't define what the metaverse was.
To some extent I still can't. The real indicator was when the crypto bros started peddling it; then we all knew it was shite.
Shocking to watch this human imitate us - no shade to anyone neurodivergent - but obviously it tracks that he would allegedly[1] be OK with his bots sexting literal children; he's obviously only making an effort to be like us (but he isn't).
[1]not by me; Mark, you can sue Joseph Gordon-Levitt (Oct ‘25)
> If the primary appeal of your VR universe is that your avatar can be an anthropomorphic banana, an anime girl, a furry, a giant penis with legs - that's never going to become a 300-million-user platform.
I mean, the inherent appeal of VR is self-expression: being who you want to be, seeing the worlds you want to see. You won't get 300 million users with corporate slop either. That might work once (if ever) VR headsets become an interface suitable for white-collar work, which they currently very much aren't - and even then it wouldn't be the next Facebook, it'd be the next Microsoft Teams. Which is not really in line with Meta's other offerings, though they certainly wouldn't say no to it, I guess. But I think a 500-user survey is all it would take to get a very clear signal that current VR is NOT about to replace Teams.
Indeed, the people who would like to spend hours and hours hanging out in the digital world like something out of Snow Crash are not generally the kind of people to want to hang out in a simulated corporate lobby under the watchful gaze of someone like Zuckerberg.
I'm absolutely sure there is a massive market (or at least user base) for a metaverse but until spending more time in VR than reality is mainstream, the audience is the underground clubbers and kids behind the bike sheds of the digital world.
Until we reach the point where outside becomes ruined and hostile, I do not think a metaverse has much attraction for your average person. I see that as the main reason why VR became MR and then just AR.
Also, you missed furries from your audience group; there is overlap, but it is a pretty distinctive group that is actively drawn to VR for creative expression.
Indeed, the physical world - nature, mountains, beaches, look-in-the-eyes human interaction, the breeze of fresh air on a hill you climbed and so on - is extremely important to humans. Some feel it more, some less, but everybody recharges in nature; just not everybody is connected enough with their own body to actually recognize it.
I like a bit of gaming, and VR seems almost there, but it's just a gimmick in one's life; for quality-of-life purposes it should never become more than a fringe relaxation activity.
And as for corporate, privacy-destroying virtual spaces - they would have to pay me massive amounts to spend, unwillingly, any time there. Those are the last people who should be in charge of such a place.
Indeed! Your comment is probably the most important in this thread. The Korean/German philosopher Byung-Chul Han writes a lot about losing humanity because of tech advances.
I am retired so this is easier for me to do: For every hour each day I spend on tech (personal AI research, writing) I spend 90 minutes hiking with friends, playing games like Bridge, enjoying meals with my wife and friends, reading good literature and philosophy, etc.
I worked for 50 years before retiring, but even working, I tried to balance human time vs. tech and work - often leaving 'money on the table' but it was worth it.
Pardon an old man ranting, but I think so many people seem caught up in the wrong things.
The SteamDB player number for VRChat is kind of underselling its size since half the player base is on other platforms, primarily running it standalone on Meta Quest.
A few days ago it reached 156k across all platforms because of some event that is outside my sphere of interest. And VRChat is generally above 100k per day peak nowadays.
https://metrics.vrchat.community/?orgId=1&refresh=30s&from=n...
But it is definitely limited by hardware and while it is constantly growing, its growth is dependent on there being a supply of relatively cheap hardware.
> That's the thing though, there is interest in "metaverse" style programs. VRChat, the biggest one, got 80k concurrent users last month (all time peak) according to SteamDB. Seems low, but hardware is a limiting factor for them.
The problem here is that "the metaverse" has a specific meaning, and that meaning was a Potemkin-elevator-pitch.
People were envisioning the ability to take a rocket launcher from Halo and use it directly in all your other games. Which is a fun sketch*, but nobody thought past the sketch into any concept of why any game developer would support that, well, meta.
To the extent that VRChat gets around this, it's because it's being a playground rather than a meta-game. So, again, the "meta" part isn't there, at least not to the extent envisioned by people who saw Ready Player One and thought "Yes! Also, I like what Nolan Sorrento is saying, how many more ads can we put into our stuff?"
There is a niche interest. Meta's bet was on the next iPhone. They were either way too early or completely off.
Though I'm personally happy to see massive corporations spend their money pushing the state of the art in niche fields instead of using it for more evil stuff. I'm not sure why people care that they burn their own money on risky bets; that's great from my point of view. We need more of that.
I'm not sure how you define metaverse but some games where you get together with friends in virtual worlds like Fortnite have been pretty successful - $9bn+ revenue on that one. I've never been a big believer that it's important to strap the computer screen on your face rather than looking at it in the normal way.
Yeah, they totally did not get it & burned a lot of money. They could basically have just dumped much less money into VRChat (or even 1:1 cloned it) and gotten almost assured success.
That, and I've never had to beg an LLM for an answer, or waste 5 minutes of my life typing up a paragraph to pre-empt the XY Problem Problem. Also never had it close my question as a duplicate of an unrelated question.
The accuracy tends to be somewhat lower than SO, but IMO this is a fair tradeoff to avoid having to potentially fight for an answer.
Related, I was talking to a computational chemist at a conference a few years ago. Their work was mostly at the intersection of ML and material science.
An interesting concept they mentioned was this idea of "injected serendipity" when they were screening for novel materials with a certain target performance. They proceed as normal, but 10% or so of the screened materials are randomly sampled from the chemical space.
They claimed this had led them to several interesting candidates across several problems.
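In code, the idea is basically an explore/exploit split over the candidate pool. A minimal sketch, with illustrative names of my own (screen_batch, score_fn) rather than anything from that conversation:

    import random

    def screen_batch(candidates, score_fn, batch_size=100, serendipity_rate=0.10):
        """Pick a screening batch: mostly model-ranked picks, plus a random slice
        sampled from the rest of the pool (the "injected serendipity")."""
        n_random = int(batch_size * serendipity_rate)
        n_ranked = batch_size - n_random

        # Exploit: the candidates the surrogate model predicts will perform best.
        ranked = sorted(candidates, key=score_fn, reverse=True)[:n_ranked]

        # Explore: a uniform random sample from everything that wasn't picked.
        rest = [c for c in candidates if c not in ranked]
        serendipitous = random.sample(rest, min(n_random, len(rest)))

        return ranked + serendipitous

The random 10% is what keeps the screen from collapsing onto whatever the surrogate model already believes is good.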
Tesseract does not understand layout. It's fine for character recognition, but if I still have to pipe the output to an LLM to make sense of the layout and fix common transcription errors, I might as well use a single model. It's also easier for a visual LLM to extract figures and tables in one pass.
For my workflows, layout extraction has been so inconsistent that I've stopped attempting to use it. It's simpler to just throw everything into postgis and run intersection checks on size-normalized pages.
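As a rough illustration of that kind of check, here is a sketch in Python using shapely instead of postgis (the bounding-box format, the words structure, and the region are assumptions of mine, not the actual schema):

    from shapely.geometry import box

    def normalize_bbox(bbox, page_width, page_height):
        """Scale an OCR bounding box (x0, y0, x1, y1) in pixels to 0-1 page coordinates."""
        x0, y0, x1, y1 = bbox
        return box(x0 / page_width, y0 / page_height, x1 / page_width, y1 / page_height)

    # Hypothetical region of interest: the top strip of any page, whatever its pixel size.
    header_region = box(0.0, 0.0, 1.0, 0.15)

    def words_in_region(words, page_width, page_height, region=header_region):
        """Keep OCR words (dicts like {"text": ..., "bbox": (x0, y0, x1, y1)})
        whose normalized boxes intersect the target region."""
        return [w for w in words
                if normalize_bbox(w["bbox"], page_width, page_height).intersects(region)]

Size-normalizing first means the same region query works across pages scanned at different resolutions, which I assume is the point of doing it in a spatial database.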
My documents have one or two-column layouts, often inconsistently across pages or even within a page (which tripped older layout detection methods). Most models seem to understand that well enough so they are good enough for my use case.
Documents that come from FOIA requests. So, some scanned, some not. Lots of forms, and lots of handwriting added to squeeze in info that the form format doesn't account for. Lots of repeated documents, but also lots of one-off documents with high signal.
I like to use textual anchors for things like "line starts with" or "line ends with" or "file ends with", combined with Levenshtein distance and some normalization (merging adjacent strings in various patterns to account for OCR wonkiness). It turns into building lists of anchors that further anchors can be built off of. Of all the things I've tried, including image hashing and such, it's been the most effective generalized "tool".
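A minimal sketch of what such a fuzzy anchor might look like (function names and thresholds here are illustrative, not the actual code):

    def levenshtein(a, b):
        """Plain edit-distance DP; anchors are short strings, so no library needed."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                # deletion
                               cur[j - 1] + 1,             # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return prev[-1]

    def line_starts_with(line, anchor, max_dist=2, merge_window=3):
        """Fuzzy "line starts with anchor": try the first few tokens of the line,
        merged in different amounts in case OCR split or joined words, and accept
        if any variant is within max_dist edits of the anchor."""
        tokens = line.split()
        for n in range(1, min(merge_window, len(tokens)) + 1):
            prefix = " ".join(tokens[:n])[: len(anchor) + max_dist]
            if levenshtein(prefix.lower(), anchor.lower()) <= max_dist:
                return True
        return False

"Line ends with" and "file ends with" would be the same idea run against the tail of the line or the last lines of the file.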
But also, I hold the strong philosophy that it's important to actually read the documents that are being scanned. In that way, OCR tends to be more of a procedural step than anything.
Tesseract v4 when it was released was exceptionally good and blew everything out of the water. Have used it to OCR millions of pages. Tbh, I miss the simplicity of tesseract.
The new models are a similar leap over Tesseract v4. But what I'll say is: don't expect new models to be a panacea for your OCR problems. The edge-case problems you might be trying to solve (like identifying anchor points, or identifying shared field names across documents) are still pretty much all problematic. So you should still expect things like random spaces or unexpected characters to jam up your jams.
Also, some newer models tend to hallucinate incredibly aggressively. If you've ever seen an LLM get stuck in an infinite loop, think of that.
I used Tesseract v3 back in the day in combination with some custom layout parsing code. It ended up working quite well. When looking at many of the models coming out today the lack of accuracy scares me.
The domain expired a few days ago and was purchased by someone else and then changed. There's a recreation of the original here https://html5zombo.com/
> I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".
Most people (that I talk to, at least) in science agree that there's a reproducibility crisis. The challenge is there really isn't a good way to incentivize that work.
Fundamentally (unless you're independently wealthy and funding your own work), you have to measure productivity somehow, whether you're at a university, a government lab, or in the private sector. That turns out to be very hard to do.
If you measure raw number of papers (more common in developing countries and low-tier universities), you incentivize a flood of junk. Some of it is good, but there is such a tidal wave of shit that most people, as a heuristic, write off your work based on the other people in your cohort.
So, instead it's more common to try to incorporate how "good" a paper is, to reward people with a high quantity of "good" papers. That's quantifying something subjective though, so you might try to use something like citation count as a proxy: if a work is impactful, usually it gets cited a lot. Eventually you may arrive at something like the H-index, which is defined as "The highest number H you can pick, where H is the number of papers you have written with H citations." Now, the trouble with this method is people won't want to "waste" their time on incremental work.
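For concreteness, a tiny sketch of that definition:

    def h_index(citations):
        """Largest H such that H of the papers have at least H citations each."""
        h = 0
        for rank, c in enumerate(sorted(citations, reverse=True), 1):
            if c >= rank:
                h = rank
            else:
                break
        return h

    # e.g. h_index([10, 8, 5, 4, 3]) == 4: four papers have at least 4 citations each.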
And that's the struggle here: even if we funded and rewarded people for reproducing results, they would always be bumping up the citation count of the original discoverer. But it's worse than that, because literally nobody is going to cite your work. In 10 years, readers will just see the original paper and a few citing works reproducing it, and to save time they'll cite the original paper only.
There's clearly a problem with how we incentivize scientific work. And clearly we want to be in a world where people test reproducibility. However, it's very very hard to get there when one's prestige and livelihood is directly tied to discovery rather than reproducibility.
I'd personally like to see top conferences grow a "reproducibility" track. Each submission would be a short tech report that chooses some other paper to re-implement. Cap 'em at three pages, have a lightweight review process. Maybe there could be artifacts (git repositories, etc) that accompany each submission.
This would especially help newer grad students learn how to begin to do this sort of research.
Maybe doing enough reproductions could unlock incentives. Like, if you do 5 reproductions, then the AC would assign your next paper double the reviewers. Or, more invasively, maybe you can't submit to the conference until you complete some reproduction.
The problem is that reproducing something is really, really hard! Even if something doesn't reproduce in one experiment, it might be due to slight changes in some variables we don't even think about. There are some ways to circumvent this (e.g. the team being reproduced cooperating with the reproducing team and agreeing on which variables matter for the experiment and which do not), but it's really hard. The solutions you propose will unfortunately incentivize bad reproductions, and we might reject theories that are actually true because of that. I think one of the best ways to fight the crisis is to actually improve the quality of science: articles where authors refuse to share their data should be automatically rejected. We should also move towards requiring preregistration with strict protocols for almost all studies.
Yeah, this feels like another reincarnation of the ancient "who watches the watchmen?" problem [1]. Time and time again we see that the incentives _really really_ matter when facing this problem; subtle changes can produce entirely new problems.
That's fine! The tech report should talk about what the researchers tried and what didn't work. I think submissions to the reproducibility track shouldn't necessarily have to be positive to be accepted, and conversely, I don't think the presence of a negative reproduction should necessarily impact an author's career negatively.
And that's true! It doesn't make sense to spend a lot of resources on reproducing things when there is low hanging fruit of just requiring better research in the first place.
Is it time for some sort of alternate degree to a PhD beyond a Master's? Showing, essentially, "this person can learn, implement, validate, and analyze the state of the art in this field"?
That's what we call a Staff-level engineer. Proven ability to learn, implement, and validate is basically the "it factor" businesses are looking for.
If you are thinking about this from an academic angle, then sure, it sounds weird to count "Two Staff jobs in a row from the University of LinkedIn" as a degree. But I submit this is basically the certificate you desire.
No, this is not at all being a staff engineer. One is about delivering high-impact projects toward a business's needs, with all the soft/political things that involves, and the other is about implementing and validating cutting-edge research, with all the deep academic and technical knowledge and work that that involves. They're incredibly different skillsets, and many people doing one would easily fail in the other.
> The challenge is there really isn't a good way to incentivize that work.
What if we got undergrads (with hopes of graduate studies) to do it? It could be a great way to train them in the skills required for research without the pressure of it also being novel.
Those undergrads still need to be advised and they use lab resources.
If you're a tenure-track academic, your livelihood is much safer from having them try new ideas (that you will be the corresponding author on, increasing your prestige and ability to procure funding) instead of incrementing.
And if you already have tenure, maybe you have the undergrad do just that. But the tenure process heavily filters for ambitious researchers, so it's unlikely this would be a priority.
If instead you did it as coursework, you could get them to maybe reproduce the work, but if you only have the students for a semester, that's not enough time to write up the paper and make it through peer review (which can take months between iterations)
Unfortunately, that might just lead to a bunch of type II errors instead, if an effect requires very precise experimental conditions that undergrads lack the expertise for.
Could it be useful as a first line of defence? A failed initial reproduction would not be seen as disqualifying, but it would bring the paper to the attention of more senior people who could try to reproduce it themselves. (Maybe they still wouldn't bother, but hopefully they'd at least be more likely to.)
Most interesting results are not so simple to recreate that we could reliably expect undergrads to perform the replication, even if we ignore the cost of the equipment and consumables the replication would need and the time/supervision required to walk them through the process.
> Eventually you may arrive at something like the H-index, which is defined as "The highest number H you can pick, where H is the number of papers you have written with H citations."
It's the Google search algorithm all over again. And it's the certificate trust hierarchy all over again. We keep working on the same problems.
Like the two cases I mentioned, this is a matter of making adjustments until you have the desired result. Never perfect, always improving (well, we hope). This means we need liquidity with the rules and heuristics. How do we best get that?
I'm delighted to inform you that I have reproduced every patent-worthy finding of every major research group active in my field in the past 10 years. You can check my data, which is exactly as theory predicts (subject to some noise consistent with experimental error). I accept payment in cash.
Patent revenue is mostly irrelevant, as it's too unpredictable and typically decades in the future. Academics rarely do research that can be expected to produce economic value in the next 10–20 years, because industry can easily outspend academia on such topics.
Most papers generate zero patent revenue, or don't lead to patents at all. For major drugs maybe that works, but we already have clinical trials before a drug goes to market that validate its efficacy.
> I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".
Usually you reproduce previous research as a byproduct of doing something novel "on top" of the previous result. I don't really see the problem with the current setup.
Sometimes you can just do something new and assume the previous result, but that's more the exception. You're almost always going to at least partly reproduce the previous one, and if issues come up, it's often evident.
That's why citations work as a good proxy: X number of people have done work based around this finding and nobody has seen a clear problem.
There's a problem of people fabricating and fudging data and not making their raw data available ("on request", or with not enough metadata to be useful), which wastes everyone's time and almost never leads to negative consequences for the authors.
It's quite common to see a citation say "BTW, we weren't able to reproduce X's numbers, but we got the fairly close number Y, so Table 1 includes that one next to an asterisk."
The difficult part is surfacing that information to readers of the original paper. The Semantic Scholar people are beginning to do some work in this area.
Yeah, that's a good point. The citation might actually be pointing out a problem and not be a point in favor. It's a slog to figure out... but it seems like the exact type of problem an LLM could handle: give it a published paper and it runs through the papers that have cited it and gives you an evaluation.
That feels arbitrary as a measure of quality. Why isn't new research simply devalued and replication valued higher?
"Dr Alice failed to reproduce 20 would-be headline-grabbing papers, preventing them from sucking all the air out of the room in cancer research" is something laudable, but we're not lauding it.
No, you do not have to. You give people with the skills and interest in doing research the money. You need to ensure it's spent correctly, that is all. People will be motivated by wanting to build a reputation and by the intrinsic reward of the work.
If we did that, CERN could not publish, because nobody else has the capabilities they do. Do we really want to punish CERN (which has a good track record of scientific integrity) because their work can't be reproduced? I think the model in many of these cases is that the lab publishing has to allow some number of postdocs or competitor labs to come to their lab and work on reproducing it in-house with the same reagents (biological experiments are remarkably fragile).
SO was/is a great site for getting information if (and only if) you properly phrase your question. Oftentimes, if you had an X/Y problem, you would quickly get corrected.
God help you if you had an X/Y Problem Problem. Or if English wasn't your first language.
I suspect the popularity is also boosted by the last two; it will happily tell you the best way to do whatever cursed thing you're trying to do, and it doesn't judge your English skills.
It's less the fact that someone owns JS's trademark, and more that it's specifically Oracle (they got it when they bought Sun).
Oracle is an incredibly litigious company. Their awful reputation in this respect means that the JS ecosystem can never be sure they won't swoop in and attempt to demand rent someday. This is made worse by the army of lawyers they employ; even if they're completely in the wrong, whatever project they go after probably won't be able to afford a defense.
> Oracle is an incredibly litigious company. Their awful reputation in this respect means that the JS ecosystem can never be sure they won't swoop in and attempt to demand rent someday. This is made worse by the army of lawyers they employ; even if they're completely in the wrong, whatever project they go after probably won't be able to afford a defense.
That is why, on one level, I am surprised by the petition. They are talking to a supercharged litigation monster and asking it "Dear Oracle, ... We urge you to release the mark into the public domain". You know what a litigation-happy behemoth does in that case? It goes and asks some AI to write a "Javascript: As She Is Spoke" junk book on Amazon just so they can hang on to the trademark. Before, they didn't care, but now that someone has pointed it out, they'll go out of their way to assert their usage of it.
On the other hand, maybe someone there cares about their image and would be happy to improve it in the tech community's eyes...
> It goes and asks some AI to write a "Javascript: As She Is Spoke" junk book on Amazon just so they can hang on to the trademark.
IANAL, but I don't think that would be enough to keep the trademark.
Also, the petition was a "we'll ask nicely first so we can all avoid the hassle and expense of legal proceedings". They are now in the process of getting the trademark invalidated, but Oracle, illogically but perhaps unsurprisingly, is fighting it.
I was just using it as an example of doing the absolute minimum. They could write a dumb Javascript debugger or something with minimal effort.
But yeah, IANAL either and I'm just guessing; I just know Oracle is shady, and if you challenge them legally they'll throw their weight around. And I'm not sure if responding to a challenge with a new "product" is enough to reset the clock on it. Hopefully the judge will see through their tricks.
Trademark law is kind of about hypotheticals, though. The purpose of a trademark is to prevent theoretical damages from potential confusion, neither of which you ever have to show to be real.
In this case the trademark existing and belonging to Oracle is creating more confusion than no trademark existing, so deleting it is morally right. And because Oracle isn't actually enforcing it it is also legally right
Imho this is just the prelude to getting better press. "We filed a petition to delete the JavaScript trademark" doesn't sound nearly as good as "We collected 100k signatures for a letter to Oracle and got only silence; now we formally petition the USPTO". It's also a great opportunity to find pro-bono legal counsel or someone who would help fund the petition.
The other aspect here is that general knowledge (citation needed) says that if a company doesn't actively defend its trademark, it often won't be able to keep it if challenged in court. Or perhaps general knowledge is wrong.
Assuming Oracle did decide to go down that route, who would they sue? No one really uses the JavaScript name in anything official except for "JavaScriptCore" that Apple ships with Webkit.
My bad, after reading more it seems Deno is trying to get Oracle's trademark revoked, but I found out that "Rust for Javascript" devs have received a cease and desist from Oracle regarding the JS trademark, which may have triggered Deno to go after Oracle.
The incredibly litigious company here is Deno. Deno sued on a whim, realized they were massively unprepared, then asked the public to fund a legal campaign that will benefit Deno themselves, a for-profit, VC-backed company.
This personal vendetta will likely end with the community unable to use the term JavaScript. Nobody should support this.
1. Oracle is the litigious one here. My favorite example is that time they attacked a professor for publishing less-than-glowing benchmarks of their database: https://danluu.com/anon-benchmark/ What's to stop them from suing anyone using the term JavaScript in a way that isn't blessed by them? That's what Deno is trying to protect against.
2. Deno is filing a petition to cancel the trademark, not claim it themselves. This would return it to the public commons.
It should be obvious from these two facts that any member of the public that uses JavaScript should support this, regardless of what they think of Deno-the-company.