dmart89 OP t1_j9po770 wrote on February 23, 2023 at 6:04 PM

Reply to comment by KPTN25 in [D] Python library to collect structured datasets across the internet by dmart89

It's just a library, not a commercial tool. Anyone using it would be doing the scraping themselves, not going through a third-party service or something.

dmart89 OP t1_j9oxk6s wrote on February 23, 2023 at 3:17 PM

Reply to comment by Sal-Hardin in [D] Python library to collect structured datasets across the internet by dmart89

I'll probably keep it simple to start with and just use filters during the crawl.

dmart89 OP t1_j9olr3r wrote on February 23, 2023 at 1:50 PM

Reply to comment by ch9ki7 in [D] Python library to collect structured datasets across the internet by dmart89

Possibly, yes, I'd need to check. I recently built parsing services for TikTok, and it was super annoying to deal with.

dmart89 OP t1_j9olf7e wrote on February 23, 2023 at 1:48 PM

Reply to comment by step21 in [D] Python library to collect structured datasets across the internet by dmart89

There was a court ruling a year or two ago that concluded that scraping public LinkedIn profiles is legal :) LinkedIn obviously still doesn't want you to scrape its data, so building scrapers for it is extra tedious because you have to work around their blocking.

dmart89 OP t1_j9nkm2u wrote on February 23, 2023 at 6:28 AM

Reply to comment by noxiousmomentum in [D] Python library to collect structured datasets across the internet by dmart89

Fair. Thanks for your thoughts. Personally, I find building scrapers and parsing data annoyingly tedious, but it's probably just me (:

dmart89 t1_j98eltr wrote on February 20, 2023 at 1:13 AM

Reply to [R] Using AI/ML for Quality Control for a factory? by aumzzzz

I think what you're really asking is how to adopt ML without building something from the ground up. I don't know your industry, but there are lots of suppliers and startups that would happily partner with you to help you adopt these capabilities without you needing to hire a team to build your own infrastructure. Many other industries already do this!

dmart89 t1_j4nio9p wrote on January 16, 2023 at 11:30 PM

Reply to comment by lumin0va in [D] Can ChatGPT flag it's own writings? by MrSpotgold

Idk, I guess the point is that if text is 100% GPT-written and not reviewed by a human, then there's a risk that GPT learns from bad GPT examples. If you review and modify it to remove the watermark, then it's effectively human-reviewed/labelled content and OK for re-ingestion in future iterations.

But tbh the people at OpenAI are pretty capable, and I'm sure they'll think of something. I don't know anything beyond the headline I read.

dmart89 t1_j4mkxyd wrote on January 16, 2023 at 7:54 PM

Reply to comment by EmbarrassedHelp in [D] Can ChatGPT flag it's own writings? by MrSpotgold

I guess we don't know how they'll do it yet, but from what I understand, the purpose is to prevent future GPT versions from training on GPT-generated text, since GPT is trained on text from the internet.

dmart89 t1_j4l4vyz wrote on January 16, 2023 at 2:17 PM

Reply to [D] Can ChatGPT flag it's own writings? by MrSpotgold

Right now, no. They're working on a digital watermark for model outputs to distinguish whether GPT or a human wrote something.

dmart89 t1_j27at4r wrote on December 30, 2022 at 3:58 AM

Reply to [P]Run CLIP on your iPhone to Search Photos offline. by RingoCatKeeper

This is cool.

dmart89 t1_iy6k204 wrote on November 29, 2022 at 2:55 AM

Reply to Is coding from scratch a requirement to be able to do research? [D] by [deleted]

No need to reinvent the wheel. Code is a tool to get to an outcome. If you can do it with packages, great.