On the grand scale of things, I don't have that many years of work experience under my belt. What do you think the optimal evolution of a company from day 1 would be in this context? Obviously it's all speculation, but I would be interested to know how you think the message-queue-on-a-database idea would go if implemented from the start.
If you do only one thing, following the "functional core, imperative shell" idea of application development will set you up nicely for easy refactors in the future.
In short: keep mutations to the edges of your application and keep your business logic purely functional.
Day 1 may be too soon to introduce queues etc., but having a clean separation between application logic and attached resources (like your storage layer) will greatly help with maintainability, testability and extensibility from the get-go.
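To make that concrete, here is a minimal Python sketch of the split. Everything in it (`Order`, `apply_discount`, `mark_paid`, `InMemoryOrderStore`, `checkout`) is a hypothetical name invented for illustration, and the in-memory store just stands in for whatever storage layer you actually have.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    """Immutable value the core works with (hypothetical example type)."""
    order_id: int
    total_cents: int
    status: str = "pending"


# Functional core: pure functions, no I/O, easy to unit test.
def apply_discount(order: Order, percent: int) -> Order:
    discounted = order.total_cents * (100 - percent) // 100
    return replace(order, total_cents=discounted)


def mark_paid(order: Order) -> Order:
    return replace(order, status="paid")


# Imperative shell: the only code that touches storage.
class InMemoryOrderStore:
    """Stand-in for a real storage layer (assumption for this sketch)."""

    def __init__(self):
        self._rows = {}

    def load(self, order_id: int) -> Order:
        return self._rows[order_id]

    def save(self, order: Order) -> None:
        self._rows[order.order_id] = order


def checkout(store, order_id: int, percent: int) -> Order:
    order = store.load(order_id)                          # read at the edge
    updated = mark_paid(apply_discount(order, percent))   # pure logic
    store.save(updated)                                   # write at the edge
    return updated
```

Swapping `InMemoryOrderStore` for a database-backed class (or, later, a queue consumer) should only touch the shell; `apply_discount` and `mark_paid` stay as they are.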
I've been mostly functional throughout my career, so that method would have appealed to me before. The idea is to never mutate but always create a copy of data to prevent race conditions. Fundamentally though, you are mutating data. We have to accept this. Creating copies at each iteration does not change this. The question then is what abstraction allows mutation without allowing data races.
Recently I was looking into how I would implement a simple TCP protocol in C++. I think Boost's ASIO (asynchronous IO) provides the correct abstraction for this. It allows you to write your code as if it were single-threaded. Code which needs to be serialised can run on the same "strand". Strands simply serialise code, even if the code runs on separate threads.
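To illustrate the strand idea without tying it to C++, here is a rough Python analogue. It is not Boost.Asio's API, just a sketch of the same guarantee: callables posted to one `Strand` never run concurrently and run in posting order, even though the underlying pool has several threads. The `Strand` class and `demo` function are hypothetical names for this example.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class Strand:
    """Runs posted callables one at a time, in FIFO order, on a shared pool."""

    def __init__(self, pool: ThreadPoolExecutor):
        self._pool = pool
        self._lock = threading.Lock()
        self._pending = deque()
        self._running = False

    def post(self, fn, *args):
        with self._lock:
            self._pending.append((fn, args))
            if self._running:
                return              # the active drain loop will pick it up
            self._running = True
        self._pool.submit(self._drain)

    def _drain(self):
        while True:
            with self._lock:
                if not self._pending:
                    self._running = False
                    return
                fn, args = self._pending.popleft()
            try:
                fn(*args)           # outside the lock, but never concurrent
            except Exception:
                pass                # sketch only: real code would log or propagate


def demo(n_tasks=1000, workers=8):
    counter = {"value": 0}

    def increment():
        counter["value"] += 1       # plain read-modify-write, no lock needed here

    with ThreadPoolExecutor(max_workers=workers) as pool:
        strand = Strand(pool)
        for _ in range(n_tasks):
            strand.post(increment)
    # Leaving the with-block waits for all submitted work to finish.
    return counter["value"]
```

The handler code mutates shared state as if it were single-threaded; the serialisation lives in the strand rather than in every handler.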
I think this abstraction would work for the example in your link as well.
> What do you think would be the optimal evolution of a company from day 1
The optimal solution is whichever one gets the product shipped and making a profit. All else can come after. If you spend too much time planning for scale before you reach any scale (unless you specifically know in advance that you will need it), you will not be successful.
Just one data point, that might be of use to know:
We're currently running into an issue where we've used a table as a message queue in a database.
We gained a compliance requirement that we must log all access (including direct DB queries) to certain info in the database. One of the tables being used as a queue contains this data and is polled hundreds of times per minute, which makes our logs ridiculously huge. We now have to move the queue somewhere else and/or split up the data.
TBH, that's the most obvious solution for e.g. task management. I am sure the majority of applications started at least 10 years ago began by using the app database for queue management, before the proliferation of message queues.
The couple of webapps I worked on moved to a rabbitmq-based solution relatively painlessly (though old BaseTasks tend to linger for a long while still).
Depends what you need the message queue for. I think it can be a sensible choice; it depends on how much of the rest of the system you need to start building. If you find yourself working out retries etc., start looking to see if there's a prebuilt solution.
I find a few things I tend to look for in solutions:
* Are other people using it for the same sort of scale of problem I have? This cuts both ways - some projects are complicated and work well for people who have much harder scaling issues than I may have, and some projects are too simple to deal with scale issues I have. Some backend systems may need to process billions of things, others sit and take 20-30 a week.
* Are there plenty of people using it? More people = more chance bugs are known about with workarounds / already fixed, more chance other people have hit the same problems I will.
* Have people been using it for a while now? Unless there's a good reason, a project a year or two old that's stable and solves your business requirements can be far better than a shiny new thing. Let other people hit the issues and upgrade next year.
* Can I debug it? This is what actually came to mind reading your comment. A database is nice when you can keep a full history of what's been going on.
* What exists where I have to build the absolute minimum amount of stuff? A nice example here is python-rq. For some of my problems I can easily set it up, it does what I need and probably more and happily chugs along. I need to build next to nothing to use it. Also, whatever you pick first likely reveals another unspecified business requirement and finding that out ASAP is important.
* Is there some industry standard? Company standard?
Don't be afraid to adjust the way you were planning to build your project to fit better with some industry standard approach. If you want to use package X (say, luigi or airflow for data processing) what philosophy does it go by? Can you tweak the way you're working to fit in with that? There are often good reasons for certain choices, and even when it's less clear it'll at least avoid you fighting your systems so much.
Not too precise I'm afraid as it all comes under "it depends" but picking something:
For job queues / processing, try for example:
* AWS Batch
* Luigi (+ AWS Batch potentially)
* Python RQ (with redis)
* Just files on S3 (an underutilised option imo)
* SQLite or whatever database you're using now if you need something more custom
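
For that last option, a minimal sketch of a table used as a job queue might look like the following. It uses Python's built-in `sqlite3` module; the `jobs` schema and the function names (`open_queue`, `enqueue`, `claim_next`, `finish`) are assumptions made up for this example, not a standard.

```python
import sqlite3


def open_queue(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are controlled explicitly with BEGIN/COMMIT.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id       INTEGER PRIMARY KEY AUTOINCREMENT,
               payload  TEXT    NOT NULL,
               status   TEXT    NOT NULL DEFAULT 'queued',
               attempts INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def enqueue(conn, payload):
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim_next(conn):
    """Mark the oldest queued job as running and return (id, payload), or None."""
    conn.execute("BEGIN IMMEDIATE")   # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs "
            "WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running', attempts = attempts + 1 "
                "WHERE id = ?",
                (row[0],),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


def finish(conn, job_id, ok=True):
    conn.execute(
        "UPDATE jobs SET status = ? WHERE id = ?",
        ("done" if ok else "failed", job_id),
    )
```

Finished rows stay in the table, so you keep the full history mentioned above for debugging. The flip side is that every `claim_next` call is a query against that table, which is exactly the polling volume that becomes painful if access to it has to be logged.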