I don't understand the benefit of that. Could you explain, please? It sounds to me like doing what you describe would forfeit the main wins of serverless: auto-scaling and paying only for the runtime you use. In your setup you get zero auto-scaling unless you implement it yourself, and you pay more than you would for just Lambda or just an HTTP server. It sounds like adding another layer of complexity for no benefit.
I think the idea is to run a regular http server locally that forwards requests to your lambda handler function for the purpose of testing. When you actually deploy your lambda, you just hook up the lambda handler function and leave the http server out of the picture.
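To make that concrete, here is a minimal sketch in Python (stdlib only; the handler body and event shape are illustrative, though `(event, context)` is the usual Lambda signature). The HTTP server exists only for local testing; in a real deployment you would register `handler` with Lambda and drop the server entirely.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handler(event, context):
    """The lambda-style function you would actually deploy."""
    name = event.get("name", "world")
    return {"statusCode": 200,
            "body": json.dumps({"greeting": f"hello {name}"})}

class LocalLambda(BaseHTTPRequestHandler):
    """Local test scaffolding: turns an HTTP POST into a handler event."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        result = handler(event, context=None)
        self.send_response(result["statusCode"])
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(result["body"].encode())

def serve(port=8080):
    # Run locally; never shipped with the deployed function.
    HTTPServer(("127.0.0.1", port), LocalLambda).serve_forever()
```

Locally you `curl` the server; in production the same `handler` is invoked by the platform, so the test path and the deployed path exercise identical code.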
I don't see how this would be useful for people other than Google/Amazon/Microsoft because, to my uneducated self, it seems like their problem is driven by this:
> Over the past decade of research and experimentation in memory overcommit, we observed a distinct trend across millions of servers and clients: the size of page cache has been decreasing because of the growing popularity of cloud storage. Nowadays anon pages account for more than 90% of our memory consumption and page cache contains mostly executable pages.
The implication of `growing popularity of cloud storage` is that today the user's data files sit in remote cloud storage rather than on the local file system. This means a typical most-used application on a user's Android/ChromeOS device (say a web browser, or a streaming app) has very little local file storage usage and hence very little page cache usage. The bulk of the memory is non-page-cache memory, i.e. anon memory. Based on this shift in memory mix, this patchset enhances the swap system to evict anon pages better. It is useful for end-user Linux devices like phones and laptops.
Well, you are just confirming the parent's point. Not everybody uses the cloud. A database like PostgreSQL very much has local file storage and hence very high page cache usage. Or am I missing something?
To me, TFA says they've tested what is convenient for them: cloud compute nodes, Chromebooks (which are not desktops in the traditional sense), and phones. Which is fine, since it suits their needs.
But I maintain it needs more independent testing on good old non-cloud servers, especially databases (which are not client-side devices). It may very well be that it is positive for that workload too. Or not. That is all I'm saying.
Yes. Bypassing the kernel cache is a major optimization used by virtually all databases that are serious about prioritizing performance. It won't add anything to the top end of database engine performance.
However, there are plenty of databases that do not prioritize performance which could benefit, including most open source ones.
I'm one of the Google engineers who work on multigenerational LRU.
The kernel page cache doesn't know better than databases themselves when it comes to what to cache. All high performance databases do this in user space, and they use AIO with direct IO or the latest io_uring to bypass the kernel page cache. Modern cloud (distributed) file systems do the same, including GFS: https://en.wikipedia.org/wiki/Google_File_System
And speaking of io_uring, check this out to see how much improvement you get when copying files without going through the page cache: https://wheybags.com/blog/wcp.html
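As a small illustration of the "bypass the page cache" idea (a sketch, not how any particular database does it): on Linux you can open a file with `O_DIRECT`, which requires block-aligned buffers; an anonymous `mmap` is page-aligned, so `os.preadv` into it works. Not every filesystem supports direct IO (tmpfs rejects it), so the helper falls back to buffered reads.

```python
import mmap
import os

BLOCK = 4096  # O_DIRECT wants block-aligned offsets, lengths, and buffers

def read_direct(path, length=BLOCK):
    """Read up to `length` bytes bypassing the kernel page cache (Linux)."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        buf = mmap.mmap(-1, length)   # anonymous mmap is page-aligned
        n = os.preadv(fd, [buf], 0)   # read straight into the aligned buffer
        return bytes(buf[:n])
    finally:
        os.close(fd)

def read_bypassing_cache(path, length=BLOCK):
    """O_DIRECT where supported; buffered IO otherwise (e.g. on tmpfs,
    or on platforms without os.O_DIRECT)."""
    try:
        return read_direct(path, length)
    except (OSError, AttributeError):
        with open(path, "rb") as f:
            return f.read(length)
```

Real engines go further: they pair direct IO with their own buffer pool (or io_uring submission queues) so they, not the kernel, decide what stays cached.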
FWIW, the Use Cases section presents data about the impact in mobile and laptop environments:
On Android, our most advanced simulation that generates memory
pressure from realistic user behavior shows 18% fewer low-memory
kills, which in turn reduces cold starts by 16%.
...
On Chrome OS, our field telemetry reports 96% fewer low-memory tab
discards and 59% fewer OOM kills from fully-utilized devices and no
UX regressions from underutilized devices.
Stavros from TileDB here. Here is a more verbose explanation. Up until 2.0, TileDB was already powerful for the main applications we targeted: geospatial and genomics. The support for both dense and sparse arrays and the way it handles data versioning made it quite unique vs. HDF5 and Zarr. But we noticed that most of the data scientists we were working with had a lot of data beyond genomic variants, LiDAR points and rasters. They had tons of dataframes. And they were using at least two storage engines: TileDB for arrays, and Parquet or a relational database for dataframes. If you are in a large organization, this is a big pain.
In TileDB 2.0 we did a huge refactoring to support something seemingly simple: dimensions in sparse arrays that can have different types, and that can even be strings. This lets us model any dataframe as a sparse array, effectively making TileDB act as a primary multi-dimensional index. In relational-database terms, it means your data is sorted on disk in an order that enormously favors multi-column slicing, so range search becomes fast.
Therefore, what we are telling the community with this release is that you can have dense arrays, sparse arrays, and dataframes in a single embeddable library being integrated with pretty much every data science tool out there, so that data scientists never have to worry about backends, files, updates, or anything other than their scientific analysis. In other words, we believe the future of data science is more science.
Rust documentation uses Markdown and it is amazing; not sure what this person is on about. Everyone also knows Markdown very well. And if you are trying to color certain parts of your documentation blue, I think you're focusing on the wrong things... Documentation should be about example code, which works perfectly fine in Markdown; you simply go:
```code
<example>
```
Also I don't agree with the author's points about links in Markdown whatsoever.