
As I see it, you need a model you can train quickly so you can do tuning, model selection, and all that.

I have a BERT + SVM + Logistic Regression (for calibration) pipeline that can train 20 models for automatic model selection and calibration in about 3 minutes. I feel like I understand its behavior really well.

I've tried fine-tuning a BERT for the same task, and the shortest model builds take 30 minutes. The training curves make no sense (back in the day I could train networks with early stopping and get a good one every time). If I look at arXiv papers, it is rare for anyone to have a model selection process with any discipline at all; mostly people use a recipe that sorta-kinda seemed to work in some other paper. People scoff at you if you ask the engineering-oriented question "What training procedure can I use to get a good model consistently?"

Because of that I like classical ML.



There's a reason XGBoost is still king in large companies.


That's the thing that blows my mind. Even if NNs are some percentage better, the training and deployment headaches are not worth it unless you have a billion users, where a 0.1% lift equates to millions of dollars.


It is pleasantly surprising to see how close your pipeline is to mine. Essentially a good representation layer, usually BERT-based like MiniLM or MPNet, followed by a calibrated linear SVM. Sometimes I replace the SVM with LightGBM if I have non-language features.

If I am building a set of models for a domain, I might fine-tune the representation layer. On a per-model basis I typically just train the SVM and calibrate it. For the amount of time this whole pipeline takes (not counting the occasions when I fine-tune), it works amazingly well.
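For anyone wondering what the "calibrate it" step can look like: here is a minimal sketch, assuming calibration means fitting a one-dimensional logistic regression on held-out SVM decision scores (a plain sigmoid fit, without the target smoothing from Platt's original method). Neither comment above spells out the exact procedure, so treat this as one plausible reading. The scores and labels are made-up toy values, and `fit_sigmoid_calibrator` and `calibrated_probability` are hypothetical helper names, not anything from a library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sigmoid_calibrator(scores, labels, lr=0.1, epochs=2000):
    """Fit p(y=1 | s) = sigmoid(a * s + b) by gradient descent on log loss.

    scores: raw SVM decision values from a held-out set (assumption).
    labels: matching 0/1 ground-truth labels.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrated_probability(score, a, b):
    """Map a raw decision score to a calibrated probability."""
    return sigmoid(a * score + b)


# Toy held-out decision scores and labels (invented for illustration).
# The classes overlap slightly, so the fitted slope stays finite.
held_out_scores = [-2.1, -1.4, -0.9, -0.5, -0.3, 0.2, 0.4, 0.7, 1.1, 1.8]
held_out_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_sigmoid_calibrator(held_out_scores, held_out_labels)
for s in (-1.5, 0.0, 1.5):
    print(f"score={s:+.1f} -> p={calibrated_probability(s, a, b):.3f}")
```

The calibrator has only two parameters, so fitting it costs almost nothing next to the embedding step, which fits with the whole thing being quick to retrain. In practice you would fit it on scores from data the SVM did not train on (for example cross-validation folds), otherwise the probabilities come out overconfident.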



