Submitted by AutoModerator t3_zcdcoo in MachineLearning
jakderrida t1_j025iqs wrote
Reply to comment by [deleted] in [D] Simple Questions Thread by AutoModerator
The problem with that is that using engagement or clicks will just give you an inferior version of Facebook's formula for turning retirees into conspiracy theorists.
On the other hand, I think you could make one. Perhaps by scraping the abstracts of published research and differentiating between those that later received extraordinary numbers of citations and those that didn't. I actually ran NLP models against Seeking Alpha's articles, which authors tag as Bullish or Bearish on the stock the article pertains to, and while I started out just hoping to beat a coin toss, the results surged to over 90% accuracy.
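In case a concrete shape helps, here's a minimal sketch of that kind of binary text classifier in scikit-learn (TF-IDF features plus logistic regression). The two inline abstracts and their labels are placeholders; a real run needs thousands of scraped, labeled examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder data -- in practice, thousands of scraped abstracts
# labeled 1 if the paper ended up highly cited, 0 otherwise.
abstracts = [
    "We introduce an attention mechanism that scales linearly with length.",
    "A study of soil moisture in one county over a single growing season.",
]
labels = [1, 0]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(abstracts, labels)

# Probability that a new abstract reads like a highly cited paper
p_high = model.predict_proba(["We propose a new transformer variant."])[0, 1]
print(f"P(highly cited) = {p_high:.2f}")
```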
[deleted] t1_j028rzq wrote
[deleted]
jakderrida t1_j02apso wrote
Well, for one, flipping the script already occurs. When I was an electrician, a manager overheard me claim that a device measures resistance in the circuit. He proclaimed that it measures the continuity of the charge going through it. I repeatedly told him it's the same thing, with no success.
If the model measures whether a paper will have many citations, then the complement of the probability it gives (1 - p) is the probability it will have few citations.
Now, if what you're looking for is something like short stories, the hurdle to cross would be finding pre-tagged data that you'd consider a reliable measure of "interesting/engaging", which can be converted into mutually exclusive dummy variables for the NLP model to train on. The only reason I mentioned published research and citations is that the corpus is massive, well-defined, and it's feasible to collect the metrics along with the associated texts. A quick sketch of the tagging step is below.
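Here's roughly what I mean by converting tags into mutually exclusive dummy variables (the "engaging"/"dull" tags are hypothetical stand-ins for whatever pre-tagged source you find), with the complement-probability point included:

```python
import pandas as pd

# Hypothetical pre-tagged data: each text carries exactly one tag.
df = pd.DataFrame({
    "text": ["story one...", "story two...", "story three..."],
    "tag": ["engaging", "dull", "engaging"],
})

# One-hot / dummy encoding of the tag column
dummies = pd.get_dummies(df["tag"])
print(dummies)  # columns: dull, engaging -- mutually exclusive 0/1

# For a binary target, one column suffices; the other class's
# probability is just the complement of the model's output.
p_engaging = 0.83        # whatever the trained model outputs
p_dull = 1 - p_engaging  # 0.17
```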
Just to ensure you don't waste your time on any dreams of building the database without outside sources, I want you to realize that deep learning/neural network technologies tend to produce terrible results unless the training data is pretty massive. Even the 50,000 tagged articles I used from Seeking Alpha would strike most in the ML community as on the small side. Not because they're jerks or anything, but because that's just how NNs work.
[deleted] t1_j02b3bj wrote
[deleted]
jakderrida t1_j02bzq2 wrote
>It must be a pretty hard problem.
Not particularly. The only hurdle is the database. I collected all the Seeking Alpha articles and tags very easily, then organized the data and built the model on Colab, to astonishing success.
An alternative would be to take literature from great writers (James Joyce, Emily Brontë, etc.), divide it into paragraphs as texts, remove paragraphs that are too short, and tag the rest as 1; then take awful writing (Twilight, Ann Coulter, Mein Kampf, etc.), do the same tagged as 0, and train the model to separate the two. Something like the sketch below.
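A rough sketch of that splitting-and-tagging step (the file names and the 40-word cutoff are just assumptions; plain-text books from Project Gutenberg would work for the public-domain side):

```python
def paragraphs_to_examples(raw_text, label, min_words=40):
    """Split a text into paragraphs, drop short ones, attach a label."""
    paras = [p.strip() for p in raw_text.split("\n\n")]
    return [(p, label) for p in paras if len(p.split()) >= min_words]

# Hypothetical corpus files on disk
good = open("ulysses.txt", encoding="utf-8").read()   # "great" writing -> 1
bad = open("twilight.txt", encoding="utf-8").read()   # "awful" writing -> 0

dataset = paragraphs_to_examples(good, 1) + paragraphs_to_examples(bad, 0)
# `dataset` is a list of (paragraph, 0/1) pairs, ready for the same
# TF-IDF + classifier pipeline sketched further up the thread.
```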
[deleted] t1_j02d1ch wrote
[deleted]
jakderrida t1_j02d90q wrote
I guess I just assumed you wanted to avoid things intellectually vacuous. My bad.
[deleted] t1_j02drwm wrote
[deleted]