Yes, these stories should scare you off of cloud services in general, not one particular vendor. The root problem is that you're storing valuable information on "someone else's computer." And that someone can decide to stop serving you for any or no reason at all, and you are without recourse. This should be totally unacceptable, but somehow the world has normalized it.
Don't keep anything in a cloud service that you couldn't live with losing, unless you keep a local backup. That includes, and especially applies to, your identity (e-mail), which unlocks all your accounts.
No, the root problem is that you put all your eggs in one basket, ignoring folk wisdom that predates anything digital.
> Don't keep anything in a cloud service that you couldn't live with losing, unless you keep a local backup.
Translated: so do keep everything in a cloud service; just back it up, at a fraction of the effort, insecurity, unreliability, and unavailability of running everything on your own computer.
Yes, and, importantly, have a plan for logging in to and resetting your passwords through e-mail on all your other services if you suddenly lose you@yourcloudemail.com.
I consider “cloud” to be a single (unreliable) basket. If you have your online stuff spread across 5 cloud providers, then any of them locking you out will disrupt you in some way.
This broad reclassification makes no sense. If you put literal eggs in 5 baskets, then any of them falling down will disrupt your eggs in some way. You're missing the whole point of the principle, which is that losing one basket will not disrupt you in the same big way as having your entire digital life blocked, like in the example from the post!
This is really cool. I've built things on PostgreSQL to_tsvector() FTS in the past, which works well but doesn't have whole-index ranking algorithms, so it can't do BM25.
It's a bit surprising to me that this doesn't appear to have a mechanism to say "filter for just documents matching terms X and Y, then sort by BM25 relevance" - it looks like this extension currently handles just the BM25 ranking but not the FTS filtering. Are you planning to address that in the future?
I found this example in the README quite confusing:
SELECT * FROM documents
WHERE content <@> to_bm25query('search terms', 'docs_idx') < -5.0
ORDER BY content <@> 'search terms'
LIMIT 10;
That -5.0 is a magic number which, based on my understanding of BM25, is difficult to choose in advance, since the threshold you would want varies between datasets.
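To make the magic-number point concrete, here is a minimal Python sketch of standard Okapi BM25 (k1 = 1.2, b = 0.75, with the common non-negative IDF variant). It is not pg_textsearch's implementation, whose exact formula and sign convention aren't given here; the `< -5.0` in the README example suggests the extension returns negated scores so that ascending ORDER BY puts the best matches first, while this sketch uses plain positive scores. The function name `bm25_scores` and both corpora are hypothetical. The same document, matched by the same query, gets a very different score depending on how common the query terms are across the whole collection.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every document in `docs` for `query`.

    Documents and the query are pre-tokenized lists of terms. IDF uses the
    non-negative variant log(1 + (N - df + 0.5) / (df + 0.5)).
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpora: the target document is identical in both,
# only the collection-wide statistics of the query terms differ.
target = "postgres search terms".split()
corpus_a = [target] + [f"filler doc number {i}".split() for i in range(20)]
corpus_b = [target] + [f"search terms doc {i}".split() for i in range(20)]
query = ["search", "terms"]

score_a = bm25_scores(query, corpus_a)[0]  # query terms are rare here
score_b = bm25_scores(query, corpus_b)[0]  # query terms are everywhere here
print(round(score_a, 2), round(score_b, 2))
```

In this toy setup the target document scores roughly 6 in `corpus_a` but only about 0.05 in `corpus_b`, so a fixed cutoff of 5 keeps it in one dataset and drops it in the other.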
I actually don't love this example either, for the reasons you mention, but at some point we had questions about how to filter based on numeric ranking. Thanks for the reminder to revisit this.
Re filtering, there are often reasonable workarounds in the SQL context that caused me to deprioritize this for GA. With your example, the workaround is to apply post-filtering to select just the matches with all desired terms. The ergonomics aren't ideal, since you may have to tune the LIMIT to get enough results, but it's already a familiar pattern if you're using vector indexes. For very selective conditions, pre-filtering by those conditions and then ranking afterwards is also an option for the planner, provided you've created indexes on the columns in question.
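The post-filtering workaround can be sketched in Python as an inner ranked fetch followed by an outer filter, with the fetch size grown when too few rows survive (the "tune the LIMIT" step). This illustrates the pattern only and is not an API of the extension: `post_filter_top_k`, the sample documents, and the precomputed `ranked` order are hypothetical stand-ins for an `ORDER BY content <@> ... LIMIT n` subquery wrapped in an outer `WHERE` that checks for every required term.

```python
def post_filter_top_k(ranked_ids, docs, required_terms, limit, initial_fetch=None):
    """Return up to `limit` ids from `ranked_ids` whose docs contain all required terms.

    `ranked_ids` is the full relevance order a BM25 index would produce. Only a
    prefix of it is examined, and the prefix grows (like raising an inner LIMIT)
    until enough rows survive the filter or the ranking is exhausted.
    Returns (hits, rows_fetched).
    """
    fetch = initial_fetch or limit
    while True:
        # Inner query: best `fetch` rows by relevance.
        candidates = ranked_ids[:fetch]
        # Outer WHERE: keep rows containing every required term.
        survivors = [i for i in candidates if required_terms <= set(docs[i].split())]
        if len(survivors) >= limit or fetch >= len(ranked_ids):
            return survivors[:limit], fetch
        # Too few survivors: over-fetch and try again.
        fetch *= 2


# Hypothetical documents and a hypothetical relevance order from the index.
docs = {
    1: "postgres bm25 ranking",
    2: "postgres vector search",
    3: "bm25 search in postgres",
    4: "search engines and bm25",
    5: "postgres full text search with bm25",
}
ranked = [4, 2, 1, 3, 5]

hits, fetched = post_filter_top_k(ranked, docs, {"postgres", "bm25"}, limit=2)
print(hits, fetched)
```

Here the first fetch of 2 rows yields no document containing both terms, so the sketch doubles the fetch to 4 and returns `[1, 3]`. Doubling is just one simple growth policy; in SQL you might instead pick a generous inner LIMIT up front.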
All this is just an argument about priorities for GA. Now that v1.0 is out, we'll get signal about which features to prioritize next.
While we’re talking about filtering: is there a way to set a WHERE clause when you’re setting up the index? I’ve been working on this a lot recently for a hybrid vector search in pg. One of the things I’m running up against is setting up a good BM25 index for a subset of a table (the WHERE clause). I have document subsets with very different word frequencies, so I’m trying to make sure that the search works on a specific subset.
I think I can also set up partitions for this, but while you’re here… I’m very excited to start rolling this out.
Partitions would be one option, and we've got pretty robust partitioned-table support in the extension (TimescaleDB uses partitioning for hypertables, so we had to front-load that support). Expression indexes would be another option; they're not done yet, but there is a community PR in flight: https://github.com/timescale/pg_textsearch/pull/154
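On why indexing a subset can matter when word frequencies differ: BM25's IDF depends on the collection it is computed over, so a term that is common across the whole table can be rare within one subset. The Python sketch below computes the same IDF formula both ways over a hypothetical `rows` table with made-up "legal" and "medical" subsets. Whether the extension keeps separate statistics per partition isn't stated in this thread; this only shows the arithmetic that makes the question worth asking.

```python
import math


def idf(term, docs):
    """BM25-style IDF, log(1 + (N - df + 0.5) / (df + 0.5)), computed over `docs`."""
    n = len(docs)
    df = sum(1 for d in docs if term in d.split())
    return math.log(1 + (n - df + 0.5) / (df + 0.5))


# Hypothetical table: (subset, text) rows whose vocabularies differ sharply.
rows = (
    [("legal", f"contract clause {i}") for i in range(50)]
    + [("medical", f"patient dosage {i}") for i in range(45)]
    + [("medical", f"contract research study {i}") for i in range(5)]
)
all_docs = [text for _, text in rows]
medical_docs = [text for subset, text in rows if subset == "medical"]

# "contract" appears in 55 of 100 rows overall, but only 5 of 50 medical rows.
whole = idf("contract", all_docs)
subset = idf("contract", medical_docs)
print(round(whole, 2), round(subset, 2))
```

Here `contract` gets an IDF of about 0.60 over the whole table but about 2.23 within the medical subset, so the same match would be weighted very differently depending on which statistics the index uses.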
That's like saying there's no point in attending a lecture on "how to get the best out of your time at University" because University courses are taught in spoken language so you could just ask the professors.
> We've disabled it already. Basically it was giving product tips which was kinda ok on Copilot-originated PRs, but then when we added the ability to have Copilot work on _any_ PR by mentioning it, the behaviour became icky. Disabled product tips entirely thanks to the feedback.
I’m grateful they disabled it, but their response still feels a bit tone deaf to me.
> Disabled product tips entirely thanks to the feedback.
This sounds like they are saying “thanks for your input!”, when really it feels more like “if you didn’t go out of your way to complain, we would have left it in forever!”
Of course they would have. The squeaky wheel gets the grease. Why do you think governments spend billions upon trillions trying to get their citizens to essentially "shut up" instead of improving their conditions?
I've not seen any evidence that these were ads and not "tips".
"Ads" implies someone was paying for them. Promoting internal product features is not the same thing; if it were, then every piece of software that shows a tip would be an ad product and would be regulated as such.
By my understanding of the term, Netflix can most definitely advertise Netflix shows on its own platform, a flyer that a barber hangs on a public bulletin board is an advertisement, and the Oscar Mayer Wienermobile is advertising hot dogs when it drives through my town. Do you not consider these things to be advertisements?
I think this particular story is a very different scandal if it turns out GitHub were charging other companies money in exchange for having Copilot include promotions for their products in PRs as opposed to Copilot adding uncompensated usage "tips" to those PRs.
1. People using the word "advertisement" when commenting on this situation aren't necessarily saying that's what's happening, and they may find these tips/ads distasteful anyway (I know I do).
2. Even if someone isn't literally paying Microsoft to insert these tips/ads, promoting third parties which are themselves Microsoft customers still benefits Microsoft.
I could buy it if this was just being shown to the person who was using Copilot. Hey, here's a feature you might like. Seems OK. But it was put into the PR description. That gets seen by potentially many people, who are not necessarily using Copilot.
When Apple puts an advert for an Apple show in front of For All Mankind, that's an advert.
Maybe I put up with it and it just adds to my subconscious seething, or maybe I get the episode elsewhere, because if I watch on Jellyfin I don't get the advert. Of course, that then harms the show, as my viewing isn't counted, but they've cancelled it anyway, so perhaps it doesn't really matter.
If it isn't an advert, then at the very least there's a button to disable it.
Season 5 is coming out now with season 6 already confirmed coming—which, granted, will be its last, but that’s not a cancellation in any sense of the word.
Ads usually imply a financial incentive, but that's not always the case. Technically, if I were to praise someone's blog and link to it, that would also be an ad.
Ads also tend to imply tangential information shown to you in an undesired place. If this were a tooltip rather than being embedded in the PR comment, many wouldn't call it an ad.
For its size (1.2GB download) it's very impressive.
Here's a pelican it drew for me, running on my phone. The SVG comments are good; the image, not so much: https://tools.simonwillison.net/svg-render#%3Csvg%20width%3D...