
30katz t1_j2lilwg wrote on January 2, 2023 at 5:15 AM

Reply to comment by low_effort_shit-post in [D] Data cleaning techniques for PDF documents with semantically meaningful parts by cm_34978

Our company is stuck with PDFs, but they're actually not too hard to work with using Amazon Textract or the Adobe PDF Extract API. Then again, maybe that's a sign that it is hard, because the technology is owned by the two biggest tech giants in the space.