Viewing a single comment thread. View all comments

VacuousWaffle t1_j2m3shr wrote

I remember at a hospital job I was asked to mine text from PDFs that were generated internally after another team spent a few months trying to build a solution themselves. The source data they were trying to mine was already in the data warehouse in a surprisingly well-formatted table.

2

low_effort_shit-post t1_j2mpzz5 wrote

We get pdf feeds all the time with promises that have financial implications to get a proper data feed. Usually we kick the can down the road and when we get the feed just pull it in and it moves along out usual etl process within a day. PDFs are to be ignored

3