
croninsiglos t1_iwgrpq8 wrote

I’m only referring to the latter.

If the real numbers have a bias, then the dataset can, will, and should have that bias. It's expected, but you have to watch for issues with it during validation. If problems show up, you can augment the training data, but the raw dataset itself is never going to be balanced.

−4

petgreg t1_iwgw5h4 wrote

You are misunderstanding the word bias here.

We are not saying "more (insert race) get trafficked, so that's what the data shows, but liberals are all like 'oh no, we can't say that.'"

We're talking about the prevalence of intentional and unintentional data manipulation as people collect data: seeing what they want to see, selecting subjects based on their approachability rather than true randomness, or selecting subjects based on ease of data collection (e.g., everyone in one place, or only those with internet access).

That is something we'd want AI to correct for. Such data is not representative of the population (well, not perfectly).
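The kind of collection bias described above can be illustrated with a small simulation. This is a hypothetical sketch (the population rate, sample sizes, and the "reachability" penalty are all invented for illustration): subjects with the attribute of interest are assumed to be harder to reach, so a convenience sample undercounts them, while a true random sample does not.

```python
import random

random.seed(0)

# Hypothetical population: 30% have the attribute of interest, but assume
# they are only half as likely to be "reachable" by the data collector
# (e.g., less likely to have internet access).
population = [1] * 3000 + [0] * 7000

def convenience_sample(pop, n):
    """Draw n subjects, where reachability depends on the attribute."""
    sample = []
    while len(sample) < n:
        person = random.choice(pop)
        reach_prob = 0.5 if person == 1 else 1.0  # assumed reachability gap
        if random.random() < reach_prob:
            sample.append(person)
    return sample

def true_random_sample(pop, n):
    """Draw n subjects uniformly at random, without replacement."""
    return random.sample(pop, n)

biased_rate = sum(convenience_sample(population, 2000)) / 2000
unbiased_rate = sum(true_random_sample(population, 2000)) / 2000

print(f"convenience sample estimate: {biased_rate:.3f}")  # well below 0.30
print(f"random sample estimate:      {unbiased_rate:.3f}")  # near 0.30
```

Under these assumptions the convenience sample systematically underestimates the true 30% rate, which is exactly the kind of skew that no amount of extra data from the same flawed collection process will fix.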

7

croninsiglos t1_iwgwlpk wrote

I’m not misunderstanding the word bias, just to be clear. I do this for a living.

−1

asbruckman OP t1_iwhvnl6 wrote

I think we're on the same page here. The point is that a lot of folks aren't doing the corrections you're recommending.

5