In praise of S3

ryandvm · on March 24, 2020

AWS S3 is a testament to just how successful a service can be if it is SIMPLE and RELIABLE. I recall using it in its very first incarnation in the 00's. From an API surface perspective there was barely anything to it, but it nailed being easy to use, easy to pay for, and as reliable as the sun.

mtberatwork · on March 24, 2020

I agree that S3 is fantastic, but not sure I would consider it simple considering how much sensitive data is left wide open in publicly accessible buckets all the time. Clearly, applying correct permissions seems to be quite a challenge for even experienced folks. The S3 console doesn't really make life any easier either and terminology can be confusing. Also, applying IAM/CORS policies, object headers, etc isn't exactly simple for the layperson.

alpha_squared · on March 24, 2020

Buckets are private by default and always have been. A bucket needs to be made public. Given that, I would hazard a guess that buckets are often made public (when they shouldn't be) for either testing purposes and never reverted or because creating proper access to the bucket took too much time/knowledge.

ratww · on March 24, 2020

That's most certainly the reason.

I sometimes have to support freelancers working in some of our Wordpress websites. Their first instinct when something is wrong on their end is asking me to run a chmod or chown command they found on Google on the whole directory. Not that it matters – we're using Docker.

Security seems to be secondary when the priority is to just deliver.

klodolph · on March 24, 2020

Security should be secondary. I know that sounds wrong. And that’s not an absolute, sometimes security comes first. But in general, everyone is trying to get stuff done and security gets in the way. My passwords get in the way of using my devices. My keys gets in the way of coming home.

That’s different from saying “security should be an afterthought”. Security is something you should consider consciously and prioritize against your other goals.

Putting security first is kind of weird if you think about it. Imagine building a house and prioritizing the locks.

xref · on March 25, 2020

Several billion people don’t have easy, immediate access to your house locks.

salamander014 · on March 24, 2020

> Security seems to be secondary when the priority is to just deliver.

Security is always an afterthought to those who don't actually understand it.

elithrar · on March 24, 2020

Further, sometimes users aren't truly aware of the implications of what making a bucket "public" means. Heck, maybe the original use-case wasn't too concerning, but it changed over time. The folks writing/reading from the bucket are also likely distinct from those setting up the infrastructure - although that's (thankfully) not always the case these days, it certainly can still be.

This reminds me of how React has a dangerouslySetInnerHTML[1] API - sometimes it's the hammer you need, but the name should (hopefully?) cause pause.

[1]: https://reactjs.org/docs/dom-elements.html#dangerouslysetinn...

ceejayoz · on March 24, 2020

A shotgun is pretty simple.

It'll still blow your head off.

scarface74 · on March 24, 2020

You have to jump through hoops to make buckets public and even then you get a clear label saying “public”.

alpha_squared · on March 24, 2020

The "public" label was somewhat recent (I think that happened in the last 18 months), but I think many of the biggest S3 leaks predate that labeling.

madeofpalk · on March 24, 2020

The S3 console now makes it quite apparent, in my opinion when things are left public. Public is no longer the default, and theres big scary warnings whenever you make something public.

mywittyname · on March 24, 2020

Do one thing, but do it very well.

dnautics · on March 24, 2020

It's also at it's heart distributed state management, and that is generally speaking very hard to do for anyone to do correctly.

Thaxll · on March 24, 2020

You don't distribute state mgmt in S3 alone since there is no lock mechanism.

dnautics · on March 24, 2020

Eventual consistency is a valid distributed strategy.

nine_k · on March 24, 2020

If there is but one direction of distribution, e.g. when you deploy config changes, and never replace objects, this suffices.

david-cako · on March 24, 2020

Unix philosophy for services encourages the user to model their problem rather than their architecture. Edge computing is a brewing shitstorm at companies which are not trending in this direction.

mpfundstein · on March 24, 2020

Care to elaborate? Like every sentence you wrote?:-)

lenkite · on March 24, 2020

Implementing a S3 client from scratch is no longer simple. V4 signature and payload signing is a pain and you have to read the doc a dozen times before you can successfully implement it.

dexen · on March 24, 2020

>V4 signature and payload signing is pain

Don't be overly discouraged :-)

I have a reliable (if a bit ugly) implementation in under 3kb of sparse PHP OOP code. Yes it took a few hours to iron out the last kinks, and one extra refactor to free up a mistakenly bound variable. It might have helped that it was the first version I implemented, thus no need to match impedance of older interface.

On the plus side, the signing is designed as self-contained, and neither implementation nor API calls spill to too many parts of request handling.

>you have to read the doc a dozen times before you can successfully implement it

Very much so - it does require a small 'eureka' moment. It's hard to put your finger on what exactly makes it hard to get right the first time around. At the very least it is written from a somewhat peculiar viewpoint.

Fortunately the primitives used are common enough to be available in typical standard libraries.

closeparen · on March 24, 2020

I'm constantly reminded of Rich Hickey's point about good platforms providing few but powerful verbs.

Put another way, if you look at your service's API and it's complicated, you probably don't have the abstraction right.

apitman · on March 24, 2020

I work in a bioinformatics lab that develops data visualizations for genetics. A typical file for a whole human genome is in the 50-100GB range. These files are heavily indexed and streaming-friendly, but still rather unwieldy to work with.

Our most common flow involves files stored on S3. The app will download the index, then use sampling and streaming techniques to pull from the full file. This works well because S3 supports range requests.

Where S3 falls down for us is sharing. I'm not aware of any easy way to share S3 files with email. You either have to create a signed URL (which has a max expiration), make the file public, or use IAM, which I'm pretty sure requires all users to have AWS accounts. That's a non-starter.

Google Drive is the opposite. It's better for sharing. Just drop in an email address, or share a permanent link. But it no longer supports range requests, so it's essentially walled off from the rest of the web.

What's crazy to me is that neither of these services provide a non-authenticated way to use a CLI tool to download/sync a directory tree, even if it's public. rclone is probably the best tool in this space, and it requires you to configure "remotes" ahead of time. You can't just point it at a public bucket and have it download.

I think there's still room for improvement and competition in the cloud storage space.

rsync · on March 24, 2020

"What's crazy to me is that neither of these services provide a non-authenticated way to use a CLI tool to download/sync a directory tree, even if it's public. rclone is probably the best tool in this space, and it requires you to configure "remotes" ahead of time."

It is actually possible to run 'rclone' without configuring a remote ahead of time. It's relatively recent, but if you look at this discussion thread:

https://forum.rclone.org/t/can-rclone-be-run-solely-with-com...

... and the later development that was done on this ... you will see that you can run a single 'rclone' command (albeit, a very long and complicated one) that will specify everything that was in your remote configuration.

I hope this is helpful ...

"I think there's still room for improvement and competition in the cloud storage space."

We're trying[1] ... and specifically, with rclone[2] ...

[1] https://www.rsync.net/products/universal.html

[2] https://rclone.org/

apitman · on March 24, 2020

That's definitely a step in the right direction for rclone, but I still think the protocols are the problem. Or more precisely, the lack of a protocol designed for sharing data between services. HTTP is good for retrieving, but lacks directories and a standard for uploads. WebDAV is good in theory, but too complicated and implementations aren't compatible enough. I've been playing around with my own take[0], but it's still really early. Also, I think OAuth is too complicated. That's tricky though because most of the features have been added for good reasons.

I've been keeping an eye on rsync.net for while. I really like a lot of the things you're doing. I think SSH is pretty much the best way of doing remote file management that we have today. The main problems are it's hard to share stuff and it doesn't work in browsers.

[0] https://github.com/remfs/remfs-http-spec

rsync · on March 24, 2020

"I've been keeping an eye on rsync.net for while. I really like a lot of the things you're doing."

Thank you - appreciated!

" think SSH is pretty much the best way of doing remote file management that we have today."

I will alert you to the issue of deep inefficiencies in SSH/SFTP over long WAN links. This problem has been solved by HPN-SSH[1] and is likely already built into your OpenSSH distribution, but I mention it here because it is not very well known ... we run into customers doing long, distant (hops) transfers over SSH that we sometimes need to step through the process of updating their ssh so they can negotiate HPN-SSH with us. It's also worth noting that you can configure ssh do use a "null" cipher which might be applicable with borg uploads, bit I digress ...

"The main problems are it's hard to share stuff and it doesn't work in browsers."

This is true - and by design. There will never be anonymous / non-credentialed access to rsync.net accounts and no rsync.net storage array will ever run an httpd. Our nmap results will always be one line long: TCP22/sshd.

apitman · on March 24, 2020

> I will alert you to the issue of deep inefficiencies in SSH/SFTP over long WAN links

Good to know, thanks!

> This is true - and by design. There will never be anonymous / non-credentialed access to rsync.net accounts and no rsync.net storage array will ever run an httpd. Our nmap results will always be one line long: TCP22/sshd.

Makes sense from a security standpoint. But I think there's a business opportunity for having cloud storage that allows you to host your files over HTTP. Imagine if google drive was your CMS. Just drag your static site over and it's live. Edit HTML files in-place. Set permissions based on path.

lukeschlather · on March 24, 2020

> Google Drive is the opposite. It's better for sharing. Just drop in an email address, or share a permanent link. But it no longer supports range requests, so it's essentially walled off from the rest of the web.

What's the difference between a permanent link and simply making the file public? (The email address, that's essentially requiring the person to have a Google account, so that's the same as IAM.)

apitman · on March 24, 2020

Not a huge difference if you add your own obfuscation layer, but the UI and tooling isn't built around that use case.

Does Drive require a google account? I haven't tested that. If so that's really sad. I know you can share with non-google email addresses but maybe they have to already be logged in with a google account for it to work.

khc · on March 24, 2020

https://github.com/kahing/goofys/ can detect public bucket automatically and not require prior setup. https://cloud.google.com/storage/docs/gsutil works as well (despite it's from google it works fine with s3) iirc

apitman · on March 28, 2020

Thanks, I'll check those out

sirsar · on March 24, 2020

Two ideas come to mind:

- You can create public pre-signed URLs with an unlimited expiration date using SigV2. I doubt SigV2 will stop working anytime soon.

- Or, if you want the link to provide access to anyone, why not make the object public, but put a UUID in the key name to prevent enumeration?

yongjik · on March 24, 2020

You have to be careful. By default, the bucket owner pays for outbound traffic, which can get really expensive: $0.09/GB for the first 10TB. (And there's no caching!)

apitman · on March 24, 2020

Never heard of sigv2, but looks like it's currently on its way out?

I like the UUID idea. Especially since you can always move the file under a new UUID to revoke previous access. Not sure if there's a way to generate a new link with GDrive.

nickcw · on March 24, 2020

I know the S3 API quite well through hacking on rclone and I have to say it is pretty good as object storage APIs go.

If I could add one thing to it, it would be a way of listing a bucket to get the objects and the metadata back at once - it is another HTTP request to get the metadata otherwise.

The large object handling is pretty good (unlike, lets say the OpenStack Swift API which has 2 ways of uploading large objects both of which basically leave everything hard to the caller!). Large object handling (say objects over 5GB) is the Achilles heel of all the cloud storage systems.

The myriad ways of authenticating with S3 (which have grown like topsy over the years) are a significant complication. However that is the same for all of the Enterprisey cloud storage solutions (eg Azure blob storage).

There are quite a few re-implementation of the S3 API, CEPH, Wasabi, Minio, Alibaba cloud, etc all of which are more or less 100% compatible (though the rclone integration tests have winkled out a few small differences mostly to do with handling of characters like LF and CR).

So impressive work AWS, making an API which has stood the test of time and become a de-facto industry standard.

gandreani · on March 24, 2020

Thanks for your work! I'm currently using rclone to backup my camera photos. It's lovely software

Also thanks for implementing sync to different storage providers. It's probably tempting to just implement the big players, like S3, but little ol' people like me appreciate the rest :)

nickcw · on March 24, 2020

You are welcome!

It has been a voyage implementing so many cloud providers and keeping them working. You might be surprised how much code is tests which get run against the cloud providers every night in the integration tests!

khc · on March 24, 2020

and then you have GCS implementing their own version of multi-part upload API...

A thing I wish they have is a way to get a consistent view of the object (similar to X-Newest in swift). Charge it 10x or something. Sometimes you really need this.

wallflower · on March 24, 2020

AWS S3 is a testament to TLA+ and formal methods.

https://lamport.azurewebsites.net/tla/formal-methods-amazon....

ktpsns · on March 24, 2020

As a devloper, I have never used S3, despite, of course, heard thousands of times of it. I'm kind of traditional: "Cloud" still means "just another person's hard drive", and "serverless" means "just another person's computer" for me. Having a feel for privacy, I still run a NAS at home, and it's terrible: I really appreciate the amount of work put into S3. Somehow, as a developer, I was able to avoid all that fancy cloud stuff. I prefer self-hosted, but I'm bad at finding good arguments which support my point nowadays :-/

dividuum · on March 24, 2020

> "just another person's hard drive"

In general I'm with you, but that here is the main benefit imho. I don't have the nerves to operate my own redundant, transparently growing and reliable storage system for my business. For use cases where privacy is important (like backups, private user data, etc), encrypting data before uploading limits the privacy leaks to metadata which might be acceptable.

vorpalhex · on March 24, 2020

The tools for doing this at "medium business" and below scale have gotten much easier, especially with things like ZFS.

hjanssen · on March 24, 2020

which is still orders of magnitude harder than create an account and pay a bill

Spivak · on March 24, 2020

AWS is impressive because they're running everyone's scalable reliable redundant storage server. It's not so scary when you just have to do it for your own small pond.

jnbiche · on March 24, 2020

But you have to build the same level of redundancy and reliability for your small pond if you want to achieve S3's uptime. Scaling is the hardest part, but it's just one part of the puzzle.

apitman · on March 24, 2020

Do you really need more than basic redundancy + backups? I think most small businesses would survive a few hours restoring backups every few years.

NikolaeVarius · on March 24, 2020

This assumes backups work.

rsync · on March 24, 2020

"Somehow, as a developer, I was able to avoid all that fancy cloud stuff."

I am genuinely interested in what you think about things like this:

    ssh user@rsync.net du -Ahd2 some/directory

    pg_dump -U postgres db | ssh user@rsync.net "dd of=db_dump"

or maybe even:

    ssh user@rsync.net "git clone git://github.com/freebsd/freebsd.git freebsd"

Setting aside these "rsync.net tricks", I think you should look into the 'borg'[1][2] backup tool. It has become the de facto standard for remote backups because it does everything that rsync does (efficient, changes only backups) but also produces strongly encrypted remote backup sets that only you have a key to ...

[1] https://borgbackup.readthedocs.io/en/stable/

[2] https://www.stavros.io/posts/holy-grail-backups/

dvfjsdhgfv · on March 24, 2020

S3 in itself is tangential to privacy abuses; you can (and probably should) use it to store encrypted data only, if it's the kind of data you'd prefer not to be accessed by someone else. But most people don't use it with encryption as it's less convenient.

The real argument against S3 though is its price. For any but most trivial amounts of data it's terribly expensive and I use it only as internal storage for other AWS services.

bradleyankrom · on March 24, 2020

I’ve found the overall price to be quite reasonable. We have about 10TB stored across maybe 40M objects - the biggest cost is for PUTs, but even those requests + storage keeps the bill around $150US. At the enterprise level, that’s a negligible cost. YMMV, of course.

dvfjsdhgfv · on March 24, 2020

For these kinds of amounts you could use the cheapest of Hetzner SX line and save half of that. But real savings start over 100 TB.

pnutjam · on March 24, 2020

Especially when you want to get the data out, B3 (from backblaze) has a much better cost structure.

ignoramous · on March 24, 2020

> I still run a NAS at home, and it's terrible... I prefer self-hosted...

One could self-host minio which is nice S3 replacement: https://github.com/minio/minio

K0SM0S · on March 24, 2020

Would you consider bootstrapping a small / freelancing business using this?

(Assuming an already enterprise-y prosumer IS, i.e. pfSense, VLANs, ZFS storage, 'enough' CPU/RAM/network, all patched and well in good sysadmin fashion, inet-facing resources behind some reverse proxy, etc).

klohto · on March 24, 2020

Who’s your target? You can do all of these with AWS already and much cheaper. On top of that, getting all the necessary certifications that AWS has will take you a year and huge sum of money.

K0SM0S · on March 25, 2020

In short,

The first target is internal customers for a consulting dev shop — i.e. me + n first partners / founders / employees (I'll begin alone but with a goal to grow to ~3 people asap).

So website, storage w. inet-facing endpoints, security all the way (encrypt etc), and basic IT stuff for dev: e.g. private GitLab + straightforward CI/CD pipe. I figure all developers need is an endpoint to push git, and access to the automation processes.

____

Longer elaboration, FWIW (maybe not much to answer the question but I'm a bit over my head on this topic so I'd rather share too much, to get debunked where needed).

The rationale to bootstrap as self-hosted is simply that it's already paid (gear is humming as I write) and I'm “99% sure I can guarantee 99% uptime” (the full setup is split in two homes 500km apart, each behind 1Gbps symmetrical business-grade fiber, each site being in High-Availability locally— from router/firewalls to storage passing by proxies and k8s-pods).

Re certs, well in a bootstrap scenario we'd worry about those later I suppose, when there's actual demand to justify the overhead cost (first clients are likely to be extremely low-profile, looking for a "break" too). Legally, clients own their code and infra at the end of the day, we would deliver a product/tool but not host it (not an MSP; we'd typically rather work with their MSP— or recommend one —to smooth out their operation, add critical value to key pain points).

In terms of actual workloads, I'm looking at custom domain-parametized ML/DL services (think: selling very specialized neural nets that solve one problem extremely well for this one company, e.g. classifying internal docs). So I need some in-house hardware anyway to experiment when developing models (it's painful and limiting to be forced to spend at the exploratory stage, for each project).

I figure my first partners or employees would either run their stuff locally or tap into my GPUs; we'd likely pay for the big guns on AWS or GC only for actual prod training iterations (those that earn us actual money when our work is 'good enough', final drafts so to speak).

It's a bit of a maverick project¹, hoping to leverage ML in profitable ways for small operations (obviously as we accumulate clients and experience, we'd get better at nailing areas of best-value optimization). It will be a rocky road during the early phase (first two years at least). The costs can skyrocket pretty fast with ML on clouds, I'm extremely wary of such fuckups, not to mention running the ongoing risk of being cancelled or shutdown (account terminated or frozen) for whatever reasons — so better have "plan B"... which historically might have begun as plan A, i.e. self-hosting.

I've also got a long-term angle of building expertise, so I've a positive bias for on-prems: if this thing works and grows enough, we'll definitely look at running the operation in-house, because clouds cost proposition for ML is too steep to scale profitably if it's your core business, or so I hear (and before one says "but Spotify is on whatever cloud", yeah, that's not their core business, they don't sell AI but music; we're the guys who sell AI to Spotify ideally). So, in ML for now, gotta get'em GPUs and TPUs to make a mark. Thus I'd rather learn all of it (myself maybe, as a company surely) sooner rather than later. It's gonna come down to this anyway, if this maverick thing works.

1: "maverick" as of 2020, but I suspect there will be a whole market of such consulting / ML shops by 2030.

jacurtis · on March 24, 2020

I can see your point if you think about S3 as a replacement to an at-home NAS. For that comparison, S3 is not at all competitive. Price alone is $23/Tb/mo, if you are replacing 4Tb on a NAS, you would be paying ~$100/mo before you even consider egress fees (paying to download). It is less convenient than a NAS, and you have the potential privacy concerns. I total agree that S3 is a terrible NAS replacement.

However, S3 isn't trying to replace your NAS. S3 offers many use cases where it is the smartest and most viable solution. The most common use case is as centralized block storage for an online web app, that runs on load balancers, a CDN, and distributed servers. If you run an app that is going to run on multiple servers and get load balanced between them, then uploading directly to the servers is complicated. Since a user could upload something to server A one time, and a month later they could request the file while visiting on server B, C, D, E, etc. This centralized S3 platform simplifies the block storage for these large apps.

S3 does offer more advanced features as well that many people aren't educated on. For example, S3 Lifecycle Management allows you to set up timelines so that your file uploads are available on S3 for 30 days (when users are most likely to access them), then automatically move them to reduced redundancy (reduces cost, but at lower 99.9% SLA) when the file is 30-120 days old, and after that it moves to Glacier storage (long term, taped storage, super cheap), for long term storage when it is unlikely to be retrieved but needs to be kept. You can also set up a CDN just for this block storage, with over 200 edge locations all around the world.

S3 also offers HIPAA and PCI compliance. There is also incredibly robust user management (Through AWS IAM). Plus you get 11x9's of availability (99.999999999% availability).

Im not trying to sell you on S3. It definetely isn't for everyone. There are times when I would tell people that it isn't a good fit. But if you work on enterprise apps, it gives you a lot of enterprise level features for month-to-month, (relatively) straightforward pricing, no long term commitements, etc. Which is something that doesn't generally exist in the world of enterprise software or PAAS (which is what S3 really is). So for that, S3 is a breakthrough product.

S3's biggest flaw is its price. If you are using all the enterprise features than it is arguably worth it, especially considering the 11 nines of availability. But for anyone on a non-enterprise app that doesn't need that stuff, there are far cheaper solutions out there for you. Blackblaze B2, Wasabi, Digital Ocean Spaces are three that come to mind.

jscholes · on March 24, 2020

> Blackblaze B2, Wasabi, Digital Ocean Spaces are three that come to mind.

Seconded on the last two in this list. I use both, for completely personal storage needs. I was lucky enough to get Wasabi's older, cheaper pricing model for the lifetime of my account, but even so it's a hugely attractive option for storing large amounts of data without worrying about local equipment and all the rest of it.

ignoramous · on March 24, 2020

> I was lucky enough to get Wasabi's older, cheaper pricing model for the lifetime of my account

You'd like https://scaleway.com free-tier of 75GB object storage / transfers per month.

jscholes · on March 24, 2020

Thanks for the heads-up, looks great and is S3-compatible too.

apitman · on March 24, 2020

And I'll second the first. B2 has been a great experience for me.

matwood · on March 24, 2020

Same. I use B2 as a backup location for Arq, and it's inexpensive and has worked great.

sfg75 · on March 24, 2020

> Plus you get 11x9's of availability (99.999999999% availability).

Note that those 9s are for durability

From https://aws.amazon.com/s3/faqs/#How_durable_is_Amazon_S3:

"This durability level corresponds to an average annual expected loss of 0.000000001% of objects. For example, if you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years"

vorpalhex · on March 24, 2020

For what it's worth, you can get "S3 compatible" solutions for your local/colo'd hardware through Ceph or Minio - and I use S3 on my homeserver because it's such a robust and well supported protocol in a lot of apps that don't support ssh/syncthing.

tmh88j · on March 24, 2020

>I'm kind of traditional: "Cloud" still means "just another person's hard drive", and "serverless" means "just another person's computer" for me.

I agree for the most part on cloud, but isn't "another person's computer" just a server? Serverless is more like a service running on a server that requires zero configuration or maintenance for you, just add your code and it works. Typically that means no messing with apache, nginx, no language nor framework setup/configuration, etc...

MonaroVXR · on March 24, 2020

> Serverless is more like a service running on a server that requires zero configuration or maintenance for you, just add your code and it works. Typically that means no messing with apache, nginx, no language nor framework setup/configuration, etc...

Whoa, I thought it was: Use a server, use a CDN, shutdown the server and its running serverless. Because everything is getting hitting the CDN and not the actual server.

tmh88j · on March 24, 2020

I'm not sure what purpose of the server is at the start in that example, but sure, that's a common serverless architecture. S3 has an option to host static websites. You can upload HTML/JS to an S3 bucket and it will serve the content. Unless the site is truly static, the JS will be making requests to other servers/services/APIs.

Obviously it's on a server, but the "serverless" part comes from not having to provision and manage a server. You can put your code a a service and allow those services (S3, Lambda, API Gateway, etc...) to do the "server" work for you.

Serverless is kind of a generic term. I guess hosting sites like GoDaddy that allow you drag and drop some PHP, HTML and JS files into a CPanel type interface are technically serverless, since you're not managing a server. Put some code up there and let their platform run it for you.

MonaroVXR · on March 24, 2020

I genuinely don't understand this. You can use Jenkins with Kubernetes or Openshift to do this right?

tmh88j · on March 25, 2020

Serverless is just a buzzword for a service that you'd typically have to operate your own server for, but the service completely removes the need for that so you can focus solely on that (usually) one task. It also goes hand in hand with microservices, you'll see those two used together a lot because microserves are often built using serverless services. You can make it as complex or as simple as you like. Your entire application doesn't have to be serverless, you can have even just a single small piece of it running serverless.

For example, going back to a serverless frontend using S3 that I mentioned before. You can host a JavaScript SPA on S3 and use a service like Route 53 for DNS. Your JavaScript can then make API calls to a full-on server that you manage, say an EC2 instance with a LAMP stack or even Azure with C#/.NET. Your front end is serverless because it's just hosted on S3 and you're not maintaining a server, but your backend code would be deployed and served however you like, such as Jenkins and Kubernetes like you mentioned.

AWS lambda is a good intro to the world of serverless computing if you're not familiar with it. While you can write code on your computer and use the AWS CLI to upload it, the AWS console even has a text editor that resembles JS Fiddle. It can receive events from other AWS services and you can write JavaScript, Python, Java, Ruby, C#, Go and PowerShell to process those events and do whatever you like, such as writing a DB record or adding something to an SQS queue. The point is you're writing real production code that does not live on a server you manage.

jbverschoor · on March 24, 2020

Simply moving that towards the developer who must adhere to the available "serverless" servers + configurationscreens. J2EE re-invented.

tmh88j · on March 24, 2020

I don't understand, are you disagreeing with what I've said or just saying we've had the concept of serverless for a while?

viridian · on March 24, 2020

I don't see how you equate serverless with another person's computer. Cloud computing is another person's computer, sure, but there's not really a local equivalent for something like AWS Lambda as far as I know. Having an API call or trigger that pops a machine into existence, runs the code, and terminates the machine in a couple hundred milliseconds isn't something I've ever really seen used outside of clouds environments.

leafmeal · on March 24, 2020

Eventual consistency is my biggest annoyance from S3.

dividuum · on March 24, 2020

It seems google is better at that: https://cloud.google.com/storage/docs/consistency (when compared to https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction...)

alex_young · on March 24, 2020

This appears to be an advertisement.

Rafuino · on March 24, 2020

It sure does seem to be one. Here's the author's sign-off at the bottom:

Forrest Brazeal is an AWS Serverless Hero and enterprise architect who has led cloud adoption initiatives for companies ranging from startups to the Fortune 50.

When I go to the AWS Heroes page, here's how a "Serverless Hero" is defined:

AWS Serverless Heroes are spirited pioneers of the AWS serverless ecosystem. They evangelize AWS serverless technologies online, in person, and via open source contributions to GitHub and the AWS Serverless Application Repository. These Serverless Heroes help evolve the way developers, companies, and the community at large build modern applications.

While the author of the blog doesn't work directly for AWS, it's in his interest to drive more business for them as he consults for businesses bringing more of their needs to AWS.

tehjoker · on March 24, 2020

This is the kind of article you get when people are motivated by money and not by an objective desire to compare and contrast different approaches to a problem.

hn2017 · on March 24, 2020

One recent headache I had, I had to download JSON files loaded since yesterday's (about 15k) using python. The issue is that the bucket had 20+ million files. The naming convention (folders) wasn't perfect unfortunately. There wasn't a way to do this filtering on the server side. I had to download the file list with the metadata and then make calls to get each file individually based on metadata date.

Not a smooth process.

kjgkjhfkjf · on March 24, 2020

Feature-wise, GCS is better. For example, it's much easier to set up a multi-region bucket with GCS.

apitman · on March 24, 2020

Something I've wondered about for a while: does GCS allow 3rd-party access via OAuth? One of my biggest annoyances with S3 is you have to automate generating signed URLs in order to allow apps to access files.

merb · on March 24, 2020

it dependes. you can invite somebody to our gcloud project and give him only access for a specific bucket, which means he can access the content via oauth. unfortunatly it does not work in all cases i.e. some hyperlinks do not work reliably when you create static pages with relative links, because after authentication it rewrites the uri's.

dvfjsdhgfv · on March 24, 2020

Greatest? The most expensive, for sure.

votepaunchy · on March 24, 2020

The price has dropped from $150/TB-month at inception to $1/TB-month with Deep Archive. Now competitive with RAID storage.

Hamuko · on March 24, 2020

Deep Archive is really cheap for storage, but you better pray you never need to restore your files.

dvfjsdhgfv · on March 24, 2020

I'm sorry, but there is no such thing as fixed per-TB price in S3.

byteshock · on March 24, 2020

If you want an S3 service with lower pricing than AWS there are providers like Wasabi. They only charge for space used and not traffic.

I believe Wasabi currently charges $5.99 per tb a month which is pretty reasonable imo.

dvfjsdhgfv · on March 25, 2020

That's way too expensive for my needs. I use an 10x10 TB server from Hetzner for €200/month. Unlimited traffic is included if you use the standard 1 Gbps link.

dordoka · on March 24, 2020

Could you please elaborate? Storage price is fixed. Other costs are requests and transfers and those depend on your usage.

dvfjsdhgfv · on March 25, 2020

> Other costs are requests and transfers and those depend on your usage.

That's my point. You can't actually state the price per TB is N, you can just estimate it and hope it doesn't exceed the estimate. For some companies and scenarios it doesn't matter, for others it's significant enough.

jacknews · on March 24, 2020

At best, 'so far'

I don't use it, so can't comment on whether it's even the best at the moment, let alone 'all time'.

_asummers · on March 24, 2020

"All time" colloquially never implies that it will be the best forever. Being all time at a specific point puts you in the running, but when talking about bands, sport teams, tech, whatever, does not discount the possibility of improvement in the future.