There's a lot more to protein sequences than Legos. I think the argument is that you don't need to train a model on fundamental organic chemistry/biochemistry, electrostatic protein interactions, hydrogen bonding, hydrophobic interactions, quantum mechanics, etc. in order for it to accurately predict protein structures.
The data that AlphaFold was trained on included all that information and more. The database they used for training included software simulations (and real world data) that accounted for atomic (quantum) interactions. The 3D structure of proteins includes all the quantum interactions.
More generally, AI models (i.e., very large function graphs) are trained on tuples that represent mappings of inputs to outputs (input -> output). The idea is that whatever structure exists in those pairs/tuples/mappings is discovered by the training process: gradient descent tunes the parameters of the model/graph to optimally compress the information contained in the data. This means the model must uncover the quantum effects (or some close proxy of them) and encode them into its parameters in a way that makes compression/prediction possible [1].
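As a toy illustration of "training as compression" (my own sketch, nothing to do with AlphaFold's actual training): fit a one-parameter model to (input, output) tuples with gradient descent, and the learned parameter ends up being a compressed description of the whole dataset.

```python
# Toy sketch (mine, unrelated to AlphaFold): the data is a list of
# (input, output) tuples secretly generated by y = 3 * x. Gradient
# descent on a one-parameter model w recovers that hidden structure,
# compressing the whole dataset down to a single number.
data = [(x, 3.0 * x) for x in range(1, 11)]

w = 0.0    # model parameter, initialized arbitrarily
lr = 0.01  # learning rate
for step in range(1000):
    # gradient of the mean squared error (w*x - y)^2 with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 3))  # converges to ~3.0, the hidden structure
```

Scaled up by many orders of magnitude in parameters and data, that is the whole trick; the "structure" just gets far more interesting than a single multiplier.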
None of this is magic: compressing data requires uncovering structures and symmetries that can be used to reduce its size, and it turns out gradient descent with lots of parameters manages to do that for a large class of problems, albeit at a very steep computational cost that requires billions of dollars of hardware and software (including nuclear power plants [2]). We are not going to get AGI with this approach, but fortunately I know how to make it happen for a mere $80B.
It's been 80 years and we still haven't figured out how to reliably perform loading doses (when someone is just starting out on warfarin) or how to maintain a stable INR (a measure of anticoagulation effectiveness, essentially) in people who take it daily. Lots of work has been done on this, but it is petering out due to the recent introduction of new anticoagulants that do not require regular monitoring.
I happen to take warfarin daily, happy to answer any questions!
Interesting way of describing/thinking about hazard or risk analysis, which is applied in many industries through ISO standard frameworks such as ISO 14971 for medical devices (but is also used elsewhere). Risk analysis complements requirements analysis in that risk mitigation plans become requirements of the system (if the risks meet some threshold).
I came here to note the same thing, from an aerospace perspective.
In a formal development process following something like ARP4754A, even before one works on the requirements a system has to meet, the high-level system functions are considered and a Functional Hazard Assessment is done to look at the criticality of those functions failing. Then one can add requirements and architectural mitigations as the system and its Safety Assessment are developed.
ZeroTier is cool. With it on, I can just grab a file or test something on my laptop without needing to expose it publicly and either set up dyndns or look up my laptop's assigned IP address. I can just keep one bookmark for each service.
As a bonus, it'll route traffic directly between peers instead of through an OpenVPN server, etc.
I just wish they had an alternative to curl | bash. Something like Docker's install instructions, where you don't have to read through the install script to figure out what it's doing under sudo.
The curl | bash script just figures out your distro, adds the appropriate repository setup to your package manager, and installs a package. You can do it all by hand if you want.
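The distro-detection step is the easy part to replicate yourself. Here's a rough sketch of it in Python (my own illustration, not ZeroTier's actual script, which is shell): parse /etc/os-release and look at the ID field. The repo/key setup that follows is distro-specific and not shown.

```python
# Rough sketch (mine, not ZeroTier's actual installer) of the distro
# detection such install scripts do: parse /etc/os-release KEY=value
# lines into a dict and pick out the ID field.
def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            info[key] = value.strip('"')
    return info

# In the real script this text comes from /etc/os-release.
sample = 'NAME="Debian GNU/Linux"\nID=debian\nVERSION_ID="12"\n'
print(parse_os_release(sample)["ID"])  # debian
```

From there it's one branch per package manager: add the repo and key, then apt/dnf/zypper install.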
Have you thought about hiring someone remote, in the same or a different timezone, to be on-call for outages? I'm sure there are many people around who would be able to help with this. You could put someone on a retainer to be on-call via PagerDuty or something.
23andMe only tests for known, relatively common mutations (single nucleotide polymorphisms, or SNPs) that have been identified by non-profit large-scale sequencing projects done in the past (basically public data now; see dbSNP for instance). You will almost certainly have mutations that are not covered by their panel, so it is not a matter of comparing your 23andMe results to the "baseline" (or "reference genome"). Their panel costs in the hundreds of dollars to run since it is so focused (at least 10x less than a full "exome" test, which would look at the full sequence of all known genes).
I hadn’t come across papermill before; it looks quite full-featured. It seems to be oriented a bit towards “scripting notebooks,” with the ability to generate output notebook files with populated output cells. I think it must have to emulate some of the logic of the notebook UI to do that.
NotebookScripter is much more minimal and doesn’t support generating/updating .ipynb files. It has just one API, which lets you treat a notebook file as a function that can be called from another Python program.
I’m not entirely sure whether papermill can run the target notebook in the calling process; it looks like it spins up a Jupyter kernel and communicates with it via message passing. NotebookScripter creates an IPython interpreter context in-process and execs the notebook code there.
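To illustrate the in-process idea (a toy sketch of the mechanism only, not NotebookScripter's actual code): an .ipynb file is just JSON, so you can pull out its code cells and exec them in a single shared namespace inside the calling process, then read back whatever the notebook defined.

```python
import json

# Toy sketch of the in-process approach (not NotebookScripter's actual
# implementation): parse the notebook JSON, exec each code cell in one
# shared namespace, and return that namespace to the caller.
def run_notebook_source(ipynb_json):
    nb = json.loads(ipynb_json)
    ns = {}
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            exec("".join(cell["source"]), ns)
    return ns

# A minimal fake notebook with one markdown cell and one code cell.
fake_nb = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# notes\n"]},
        {"cell_type": "code", "source": ["result = 2 + 2\n"]},
    ]
})
print(run_notebook_source(fake_nb)["result"])  # 4
```

The kernel-based approach papermill appears to use keeps the notebook isolated in a separate process, at the cost of having to serialize everything over the messaging protocol; in-process exec gives you direct access to the resulting objects.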