As I see it, you need a model you can train quickly so you can do tuning, model ...

korkybuchek · on Jan 24, 2025

There's a reason XGBoost is still king in large companies.

3eb7988a1663 · on Jan 25, 2025

That's the thing that blows my mind. Even if NNs are some percentage better, the training and deployment headaches are not worth it unless you have a billion users, where a 0.1% lift equates to millions of dollars.

abhgh · on Jan 24, 2025

It is pleasantly surprising to see how close your pipeline is to mine. Essentially a good representation layer, usually BERT-based (like MiniLM or MPNet), followed by a calibrated linear SVM. Sometimes I replace the SVM with LightGBM if I have non-language features.

If I am building a set of models for a domain, I might fine-tune the representation layer. On a per-model basis I typically just train the SVM and calibrate it. For the amount of time this whole pipeline takes (not counting the occasions when I fine-tune), it works amazingly well.
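For anyone who wants to try the "embeddings, then calibrated linear SVM" step, here is a rough sketch with scikit-learn. It is not my actual code. To keep it self-contained, the embeddings are faked with `make_classification`; in practice `X` would be the output of whatever encoder you use. The 384-dimension width, sample counts, `C=1.0`, and sigmoid calibration are all assumptions picked for illustration.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# ASSUMPTION: synthetic vectors stand in for sentence embeddings.
# In a real pipeline, X would come from an encoder such as MiniLM or MPNet.
X, y = make_classification(
    n_samples=600, n_features=384, n_informative=32, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# LinearSVC only produces decision scores, not probabilities.
# CalibratedClassifierCV fits it with 5-fold cross-validation and learns a
# sigmoid (Platt scaling) that maps those scores to probabilities.
clf = CalibratedClassifierCV(
    LinearSVC(C=1.0, max_iter=5000), method="sigmoid", cv=5
)
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)   # shape: (n_test_samples, n_classes)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
print("first three probability rows:", np.round(proba[:3], 3))
```

Since only the SVM and the calibrator are trained per model, the embeddings can be computed once and cached, which is where most of the speed comes from. Swapping in a gradient-boosted model for mixed language and non-language features would mean replacing the `LinearSVC` estimator and concatenating the extra columns onto `X`.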