Hacker News
Ask HN: What do you use to clean text data for ML/DL?
1 point by k4ch0w on Feb 13, 2020 | 1 comment


I guess you mean converting CSV files into another format so you can load them into your ML code?
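If so, one lightweight option is converting the CSV to JSON Lines (one JSON object per line), which pandas (`read_json(..., lines=True)`) and many other loaders accept. Here is a minimal sketch using only the Python standard library. It assumes the first row is a header. `csv_to_jsonl` is a hypothetical helper name, and every value stays a string, so type conversion is left to you:

```python
import csv
import io
import json


def csv_to_jsonl(src, dst):
    """Stream rows from a CSV file object and write one JSON object per line.

    Assumes the first CSV row is a header; all values are written as strings.
    Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    count = 0
    for row in reader:
        # ensure_ascii=False keeps non-ASCII text readable in the output
        dst.write(json.dumps(row, ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # In-memory stand-ins; with real files, open the CSV with newline=""
    # as the csv module documentation recommends.
    src = io.StringIO('text,label\nhello world,1\n"a, quoted field",0\n')
    dst = io.StringIO()
    print(csv_to_jsonl(src, dst))
    print(dst.getvalue(), end="")
```

Because rows are read and written one at a time, this also works on files much larger than memory.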

Vim. It handles multi-gigabyte text files easily.

For batching: head, tail, awk, and grep, the good old command-line gems. They are still hard to beat for speed.
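Part of what makes these tools practical on huge files is that they stream input line by line instead of loading it all into memory. If you need the same batching inside a Python pipeline, here is a rough sketch of that pattern. `batch_lines` and its parameters are made up for illustration, and I would not expect it to match grep or awk for raw speed:

```python
import io
import re
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def batch_lines(lines: Iterable[str], batch_size: int,
                pattern: Optional[str] = None) -> Iterator[List[str]]:
    """Yield lists of at most batch_size lines.

    When `pattern` is given, only lines matching it are kept (a grep-like filter).
    Trailing newlines are stripped from each line.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    regex = re.compile(pattern) if pattern else None
    # Generator expression: lines are pulled lazily, never all at once.
    stream = (line.rstrip("\n") for line in lines
              if regex is None or regex.search(line))
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # A real file, e.g. open("big.txt", encoding="utf-8"), iterates lazily the same way.
    data = io.StringIO("ok 1\nskip\nok 2\nok 3\nskip\nok 4\nok 5\n")
    for batch in batch_lines(data, batch_size=2, pattern=r"^ok"):
        print(batch)
```

Since filtering and batching are both lazy, memory use is bounded by one batch rather than by the size of the file.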

If you mean "clean" in the sense of standardization (I'm thinking of natural language processing), I can hardly imagine a single tool that covers all use cases...
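That said, a handful of steps show up in a lot of pipelines. As an illustration only, here is a standard-library sketch that combines Unicode NFKC normalization, removal of invisible control/format characters, case folding, and whitespace collapsing. `normalize_text` and this particular choice of steps are assumptions, not a standard, and plenty of tasks (anything case-sensitive, for example) would need a different mix:

```python
import re
import unicodedata


def normalize_text(text: str, lowercase: bool = True) -> str:
    """Apply a few common (but task-dependent) standardization steps."""
    # NFKC folds compatibility characters: the "fi" ligature U+FB01 -> "fi",
    # full-width letters -> ASCII, non-breaking space -> regular space.
    text = unicodedata.normalize("NFKC", text)
    # Drop control (Cc) and format (Cf) characters such as zero-width spaces,
    # but keep whitespace characters so the collapsing step below handles them.
    text = "".join(ch for ch in text
                   if ch.isspace() or unicodedata.category(ch) not in ("Cc", "Cf"))
    if lowercase:
        # casefold() is a more aggressive lower() meant for caseless matching.
        text = text.casefold()
    # Collapse runs of whitespace into a single space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    raw = "  \uff28\uff45\uff4c\uff4c\uff4f,\u00a0\uff37\uff2f\uff32\uff2c\uff24!\u200b \ufb01ne\n\n"
    print(normalize_text(raw))  # prints: hello, world! fine
```

Each step is a separate, easy-to-remove line, which fits the point that the right set depends on the use case.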



