
croninsiglos t1_iwgrpq8 wrote

I’m only referring to the latter.

If the real numbers have a bias, then the dataset can, will, and should have that bias. It's expected, but you have to watch for issues with it during validation. If problems show up, you can augment the training data, but the raw dataset itself is never going to be balanced.

−4

petgreg t1_iwgw5h4 wrote

You are misunderstanding the word bias here.

We are not saying "more (insert race) get trafficked, so that's what the data shows, but liberals are all like 'oh no, we can't say that.'"

We're talking about the prevalence of intentional and unintentional data manipulation as people collect data: seeing what they want to see, selecting subjects based on their approachability rather than true randomness, or selecting subjects based on ease of data collection (e.g., everyone in one place, or only those with internet access).

That is something we'd want AI to correct for. Such data is not representative of the population (well, not perfectly).
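The kind of collection bias described above can be illustrated with a small simulation. This is a hypothetical sketch (the population rate, sample sizes, and the "reachability" penalty are all invented for illustration): subjects with the attribute of interest are assumed to be harder to reach, so a convenience sample undercounts them, while a true random sample does not.

```python
import random

random.seed(0)

# Hypothetical population: 30% have the attribute of interest, but assume
# they are only half as likely to be "reachable" by the data collector
# (e.g., less likely to have internet access).
population = [1] * 3000 + [0] * 7000

def convenience_sample(pop, n):
    """Draw n subjects, where reachability depends on the attribute."""
    sample = []
    while len(sample) < n:
        person = random.choice(pop)
        reach_prob = 0.5 if person == 1 else 1.0  # assumed reachability gap
        if random.random() < reach_prob:
            sample.append(person)
    return sample

def true_random_sample(pop, n):
    """Draw n subjects uniformly at random, without replacement."""
    return random.sample(pop, n)

biased_rate = sum(convenience_sample(population, 2000)) / 2000
unbiased_rate = sum(true_random_sample(population, 2000)) / 2000

print(f"convenience sample estimate: {biased_rate:.3f}")  # well below 0.30
print(f"random sample estimate:      {unbiased_rate:.3f}")  # near 0.30
```

Under these assumptions the convenience sample systematically underestimates the true 30% rate, which is exactly the kind of skew that no amount of extra data from the same flawed collection process will fix.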

7

croninsiglos t1_iwgwlpk wrote

I’m not misunderstanding the word bias, just to be clear. I do this for a living.

−1

asbruckman OP t1_iwhvnl6 wrote

I think we're on the same page here. The point is that a lot of folks aren't doing the corrections you're recommending.

5