We tried all models from openai and google to get data from images and all of th...

infecto · on Oct 15, 2024

Multimodal LLMs are not up for these tasks imo. It can describe an image but its not great on tables and numbers. Now on the other hand, using something like Textract to get the text representation of the table and then feeding that into a LLM was a massive success for us.

is_true · on Oct 15, 2024

LLMs don't offer much value for our use case, almost all values are just numbers

infecto · on Oct 15, 2024

Then you should be using something like Textract or other tooling in that space. Multimodal LLMs are no replacement.

is_true · on Oct 15, 2024

We use opencv + tesseract and easyocr

thenaturalist · on Oct 15, 2024

Curious, did that make you "fall back" to more conservative OCR?

Or what else did you do to correct them?

is_true · on Oct 15, 2024

We already had an OCR solution. We were exploring models in case the information source changes