Submitted by kkoolook t3_101ej0g in todayilearned
DrifterInKorea t1_j2n60xb wrote
It should be more than 50% by now, and it will keep growing for several reasons:
- More SaaS products and more servers talking to APIs to get or post data.
- AIs and other statistical tools require huge amounts of data from the web, hence more crawling.
- There are more and more motivations for crawling the web, and as we move up in abstraction layers, web pages will be crawled far more often so their data can be presented in different formats.
- AIs are most likely going to start building tools for us (and for themselves), and those tools will require far more data than they use today.
Reasons 3 and 4 are basically extensions of reasons 1 and 2.
Anopanda t1_j2psm0a wrote
The 2022 report says it's 27%.
DrifterInKorea t1_j2px16p wrote
If that's true, it means there is something wrong with their detection.
It makes no sense for there to be fewer bots crawling the web when automation is getting bigger and bigger in every field.
Also, the explosion of bot-generated content on social media during the pandemic, which hasn't really slowed down, points in the opposite direction.
UnknownQTY t1_j2qni13 wrote
Yeah, and that's what makes “bot” a bit of a misleading term here. When the average person hears “bot” online, they equate it with an account pretending to be human.
The technical definition of “bot” used here covers basically any automated process, most of which don't even interact with real users beyond passively consuming their data.
That’s not really a bot to most people.
DrifterInKorea t1_j2qotpn wrote
Yes, and it's hard to detect true bots (I mean automated processes that don't just follow links the way wget does) because even a simple curl call can spoof its signature and look "human" to an external observer.
So it goes both ways:
- On one side, you have users who may interact with tools that cause their traffic to be labelled as "bot".
- On the other side, even simple scripts (bots) can alter their behavior to make it look like they are human (adding noise and delays to mouse cursor movements, rotating IPs, varying user agents, etc.) and be labelled "human" (see the sketch below).
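To make the second bullet concrete, here is a minimal sketch (mine, not from the report or the thread) of the kind of "human-looking" scraping being described, assuming Python with the requests library; the URL and User-Agent strings are placeholder values. It just presents a browser-like User-Agent and adds random delays between requests, which is often enough for a naive observer to classify the traffic as human.

```python
# Hedged illustration only: rotating User-Agent headers plus randomized
# delays so the traffic pattern no longer looks like a default script.
import random
import time

import requests

# A small pool of browser-like User-Agent strings (illustrative values).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]

def fetch_like_a_human(url: str) -> str:
    """Fetch a page while mimicking casual human browsing patterns."""
    headers = {
        # Replace the default "python-requests/x.y" signature with a
        # browser-like one.
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }
    # Wait a random, human-ish interval so request timing is irregular.
    time.sleep(random.uniform(2.0, 8.0))
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    html = fetch_like_a_human("https://example.com/")  # placeholder URL
    print(len(html), "bytes fetched")
```

Mouse-cursor noise works the same way but needs a real browser driven by an automation tool, which is why server-side detection that only looks at headers and timing is so easy to fool.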