I see you basically recommend bypassing rate limits by using proxies and the like?
Why not just respect rate limits if they're set properly? A little consideration for whatever, or whoever, is on the other end ;)
Because everyone "being nice" is how Google keeps its monopoly on search. Googlebot can do anything and everything and no one complains. Or how about sites like Twitter and Instagram, which live off selling their users' data, putting extreme limitations on their public APIs and aggressively blocking alternative frontends like Nitter or Bibliogram, because OF COURSE god forbid someone might want to look something up on their platform without having an account.
The typical response to people raising these issues is "buuuut xy is a private platform that can do what it wants". Yes, but why are you defending technocrats with bigger profits than many nation states' GDPs? (Reasonable) crawling should be allowed and promoted; in fact, it should be codified in law as a necessary element of a free and open internet. Anyone trying to prevent it, or even worse, make it illegal, is a bad actor.
I get your point and I don't have an objective answer to it. We believe the internet is an open medium and that there is immense value for humankind waiting to be discovered and unlocked in all of its data. After all, many of the world's big tech companies rely heavily on web scraping.
Rate limits can be applied for different reasons. If they protect a website from being overloaded, they are good in our opinion. If they exist to block competition, research, or new non-competitive but valuable products that do no harm to the original website, they are not ideal.
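To make the "good" case concrete, a considerate crawler can honor the overload signal itself rather than route around it. Here is a minimal sketch in Python using the requests library (the function name and defaults are my own illustration, not part of any particular tool): it backs off when a server answers 429 Too Many Requests and respects a numeric Retry-After header if one is present.

    import time
    import requests

    def polite_get(url, max_retries=5, default_wait=10):
        # Hypothetical helper: fetch a URL, backing off whenever the
        # server signals overload instead of rotating proxies around it.
        for _ in range(max_retries):
            resp = requests.get(url)
            if resp.status_code != 429:
                return resp
            # The server asked us to slow down; respect its suggested delay.
            # Retry-After may also be an HTTP date, so fall back to a
            # default wait if it is not a plain number of seconds.
            retry_after = resp.headers.get("Retry-After", "")
            wait = int(retry_after) if retry_after.isdigit() else default_wait
            time.sleep(wait)
        raise RuntimeError(f"still rate-limited after {max_retries} attempts: {url}")

Whether a client should also back off on softer signals like 503s or slow responses is a judgment call, which brings us to the next point.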
We leave it to the user to decide the ethics of their project and just provide the tools.