
Here is a pretty nice smattering of very basic tasks on which AI assistants do a bad job.
Now, I'd say many of these are instances where the users were treating it like SciFi AI rather than...what? That's a good question.
Forgetting any expectations or promises of flying cars, how should we treat the tool we have today that is commonly called AI?
For instance, nobody started having relationships with their operating system or acting shocked when it produced a blue screen of death -- it was just annoying and the expectation was that it would get fixed. Same for browsers: if one fails to render a site or gets stuck, you refresh or kill the browser and try again. If the problem persists, you stop using that browser and try another.
Seems like with these AI tools, the expectation is more magical. If it doesn't work it's not a bug, it's a mystery. This is silly.
  1. For my research, I recently had to take long PDFs that contained multiple documents smushed together into one PDF file (mostly letters and reports), and find the document boundaries. All the AI tools I tried did a pretty bad job at that, but it's something a human could have done easily.
  2. Check out AI's attempts to draw ASCII art: #1031420
111 sats \ 2 replies \ @kepford 3 Oct
It might be worth looking at something like Docling. We built a proof-of-concept AI chatbot a while back that used this OCR tool to pull the text out of our PDFs. Converting to plain text first is going to give much better results in the chatbot.
reply
Interesting. I actually did try passing it through Tesseract OCR first. Tesseract did a fine job, but the chatbot still couldn't do a good job of finding the boundaries. One issue was that Tesseract pulls out all the text from the PDF, including page headers and footers that should not be treated as actual content.
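For what it's worth, the header/footer noise can sometimes be filtered out after OCR without any AI at all. Here's a rough sketch, assuming you already have the OCR output as one string per page. The function name `strip_repeated_margins` and the `edge_lines` / `min_share` parameters are made up for illustration, and the idea (drop lines near the top or bottom of a page that repeat on most pages, ignoring digits so page numbers match) is just a heuristic, not something Tesseract does for you.

```python
import re
from collections import Counter


def normalize(line: str) -> str:
    # Collapse digit runs so "Page 3" and "Page 4" count as the same line.
    return re.sub(r"\d+", "#", line.strip().lower())


def strip_repeated_margins(pages, edge_lines=2, min_share=0.6):
    """Drop top/bottom lines that repeat across most pages (hypothetical helper)."""
    counts = Counter()
    for page in pages:
        lines = [l for l in page.splitlines() if l.strip()]
        edges = lines[:edge_lines] + lines[-edge_lines:]
        # Use a set so a short page does not count the same line twice.
        counts.update({normalize(l) for l in edges})

    # A line must appear on at least two pages to be treated as a header/footer.
    threshold = max(2, int(min_share * len(pages)))
    repeated = {key for key, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        lines = [l for l in page.splitlines() if l.strip()]
        kept = []
        for i, line in enumerate(lines):
            near_edge = i < edge_lines or i >= len(lines) - edge_lines
            if near_edge and normalize(line) in repeated:
                continue  # looks like a running header or footer
            kept.append(line)
        cleaned.append("\n".join(kept))
    return cleaned
```

It will misfire on documents whose real content starts with the same line on every page, so the thresholds would need tuning against your actual PDFs.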
reply
MinerU has a function to identify headers and footers, which it uses to decide what to extract. It's pretty neat: it saves a PDF with markings showing what is what. It still wouldn't do what you want without additional code, but perhaps this feature could be used to detect format changes, e.g. by making a decision based on the layout structure and text patterns?
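To make the "text patterns" part concrete, here is a minimal sketch of what that additional code might look like. It does not use MinerU at all; it assumes you have plain text per page and guesses that a page starts a new document when its page number resets to 1 or when its first few lines look like a letter or memo opening. The regexes and the names `find_boundaries`, `looks_like_opening` and `page_number` are all assumptions for illustration, and real letters and reports will need different patterns.

```python
import re

# Assumed opening patterns for letters and memos; adjust for the real corpus.
OPENERS = re.compile(r"^\s*(dear\b|to:|from:|subject:|re:|memorandum\b)", re.IGNORECASE)
# Assumed footer style "Page N"; other numbering styles need other patterns.
PAGE_NO = re.compile(r"\bpage\s+(\d+)\b", re.IGNORECASE)


def page_number(page: str):
    """Return the first 'Page N' number found on the page, or None."""
    match = PAGE_NO.search(page)
    return int(match.group(1)) if match else None


def looks_like_opening(page: str, head_lines: int = 5) -> bool:
    """True if one of the first few non-empty lines matches an opening pattern."""
    lines = [l for l in page.splitlines() if l.strip()]
    return any(OPENERS.match(l) for l in lines[:head_lines])


def find_boundaries(pages):
    """Return indices of pages that probably start a new document."""
    starts = []
    for i, page in enumerate(pages):
        if i == 0 or page_number(page) == 1 or looks_like_opening(page):
            starts.append(i)
    return starts


if __name__ == "__main__":
    sample = [
        "Dear Ms. Lee,\nbody text\nPage 1",
        "more body text\nPage 2",
        "Subject: Q3 report\nsummary\nPage 1",
        "details\nPage 2",
    ]
    print(find_boundaries(sample))
```

Layout signals (font size, margins, letterhead position) would be a separate input on top of this, and that is where a tool that labels page regions could help.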