MinerU has a function to identify headers and footers, which it uses to decide what to extract. It's pretty neat: it saves a PDF with markings showing what is what. It still wouldn't do what you want without additional code, but perhaps this feature could be used to detect format changes, e.g. by making a decision based on patterns in the layout structure and the text?

optimism

Interesting.  I actually did try passing it through tesseract OCR first.  Tesseract did a fine job, but the chatbot still couldn't do a good job.  One issue was that tesseract pulls out all text from the PDF, including page headers and footers that should not be treated as actual content.
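
For the header/footer problem specifically, here is a rough sketch of one possible cleanup pass over OCR output, assuming the text is already split into one string per page. It does not use any tesseract or MinerU API; the function names (normalize, strip_repeated_margins) and the parameters (edge_lines, min_fraction) are made up for illustration. The idea is just that a line which keeps showing up near the top or bottom of most pages, ignoring page numbers, is probably a running header or footer.

```python
import re
from collections import Counter


def normalize(line: str) -> str:
    """Collapse whitespace and replace digit runs so that
    'Page 3 of 10' and 'Page 4 of 10' compare as equal."""
    return re.sub(r"\d+", "#", " ".join(line.split())).lower()


def strip_repeated_margins(pages, edge_lines=2, min_fraction=0.6):
    """Drop lines near the top or bottom of a page that recur on most pages.

    pages: list of strings, one per page (hypothetical input shape).
    edge_lines: how many lines at each end of a page count as "margin".
    min_fraction: share of pages a line must appear on to be dropped.
    """
    # Ignore blank lines so position counting is stable across pages.
    page_lines = [[ln for ln in page.splitlines() if ln.strip()] for page in pages]

    # Count, per normalized line, how many pages have it in their margin.
    counts = Counter()
    for lines in page_lines:
        edge = lines[:edge_lines] + lines[-edge_lines:]
        counts.update({normalize(ln) for ln in edge})

    # Require at least two pages so a single page never strips itself.
    threshold = max(2, min_fraction * len(pages))
    repeated = {key for key, n in counts.items() if n >= threshold}

    cleaned = []
    for lines in page_lines:
        kept = []
        for i, ln in enumerate(lines):
            in_margin = i < edge_lines or i >= len(lines) - edge_lines
            if in_margin and normalize(ln) in repeated:
                continue  # looks like a running header or footer
            kept.append(ln)
        cleaned.append("\n".join(kept))
    return cleaned
```

This is only a heuristic: it would miss headers that change from page to page (chapter titles, for instance), and it could wrongly drop a body line that happens to repeat at page edges, so the thresholds would need tuning on the actual document.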

Unsexy AI Failures: The PDF That Broke ChatGPT

Scoresby
