0 sats \ 0 replies \ @optimism 4 Oct \ parent \ on: Unsexy AI Failures: The PDF That Broke ChatGPT AI
MinerU has a function that identifies headers and footers, which it uses to decide what to extract. It's pretty neat: it saves a PDF marked up to show what is what. It still wouldn't do what you want without additional code, but perhaps this feature could be used to detect format changes. For example, you could make a decision based on patterns in the layout structure and in the text.
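A rough sketch of that idea, assuming you have already turned the layout analysis into a per-page list of labelled blocks. This does not reproduce MinerU's actual output format. The `Block` structure, the `"header"`/`"footer"`/`"text"` labels, the grid size and the function names are all hypothetical. Each page gets a "signature": its header and footer text with digits masked, so running page numbers don't count, plus the approximate column positions of its body text. A page whose signature differs from the previous page's is flagged as a possible format change.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    kind: str   # hypothetical labels: "header", "footer", "text"
    text: str
    x0: float   # left edge of the block's bounding box, in page units


def normalize(text: str) -> str:
    # Mask digit runs so "Page 1" and "Page 2" compare as equal
    return re.sub(r"\d+", "#", text.strip().lower())


def page_signature(blocks: list[Block], grid: float = 50.0) -> tuple:
    headers = tuple(sorted(normalize(b.text) for b in blocks if b.kind == "header"))
    footers = tuple(sorted(normalize(b.text) for b in blocks if b.kind == "footer"))
    # Approximate the column layout by the distinct left edges of body text,
    # snapped to a coarse grid so small offsets are ignored
    columns = tuple(sorted({round(b.x0 / grid) for b in blocks if b.kind == "text"}))
    return headers, footers, columns


def format_changes(pages: list[list[Block]]) -> list[int]:
    """Return indices of pages whose signature differs from the previous page."""
    changes = []
    prev = None
    for i, blocks in enumerate(pages):
        sig = page_signature(blocks)
        if prev is not None and sig != prev:
            changes.append(i)
        prev = sig
    return changes


# Made-up example: the third page switches header and goes to two columns
pages = [
    [Block("header", "Annual Report 2023", 50), Block("text", "...", 50), Block("footer", "Page 1", 50)],
    [Block("header", "Annual Report 2023", 50), Block("text", "...", 50), Block("footer", "Page 2", 50)],
    [Block("header", "Appendix A", 50), Block("text", "...", 50),
     Block("text", "...", 320), Block("footer", "Page 3", 50)],
]
print(format_changes(pages))  # → [2]
```

Comparing only against the previous page is the simplest choice. A real document would probably need some tolerance, for example ignoring single-page blips such as a full-page table, before treating a change as a genuine format switch.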