Submitted by cm_34978 t3_100rbhp in MachineLearning
30katz t1_j2lilwg wrote
Reply to comment by low_effort_shit-post in [D] Data cleaning techniques for PDF documents with semantically meaningful parts by cm_34978
Our company is stuck with PDFs, but they’re actually not too hard to work with using Amazon Textract or Adobe’s PDF Extract API. Then again, maybe that’s a sign that it is hard, because the technology is owned by the two biggest tech giants in the space.