trajo123 t1_jbits7f wrote on March 9, 2023 at 10:30 AM

Posting assignment questions to reddit.

RoboiosMut t1_jbj01s2 wrote on March 9, 2023 at 11:50 AM

You can ask ChatGPT this type of question.

neuralbeans t1_jbiu3io wrote on March 9, 2023 at 10:35 AM

Yes, if the features include the model's target output. Then overfitting would just result in the model outputting that feature as is. Of course this is a useless solution, but the more similar the features are to the output, the less of a problem overfitting will be and the less data you would need to generalise.

BamaDane t1_jbjhitr wrote on March 9, 2023 at 2:24 PM

I’m not sure I understand what your method does. If Y is the output, are you saying I should also include Y as an input? And if I manage to design my model so it doesn’t just select the Y input, then I’m not overfitting? It makes sense that it doesn’t overfit, but doesn’t it also mean I am dumbing down my model? Don’t I want my model to preferentially select features that are most similar to the output?

neuralbeans t1_jbjizpw wrote on March 9, 2023 at 2:35 PM

It's a degenerate case, not something anyone should do. If you include Y in your input, then overfitting will lead to the best generalisation. This shows that the input does affect overfitting. In fact, the more similar the input is to the output, the simpler the model can be and thus the less it can overfit.
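A hedged toy illustration of the "simpler model" argument (assumed data, not from the thread): suppose the target is `sin(x)` plus noise. With the raw input `x`, a straight line cannot follow the curve, so one reaches for a flexible model; here, as an extreme case, a degree-14 polynomial through 15 training points, which can memorise the noise. With the engineered feature `sin(x)`, a two-parameter straight line is enough. The sketch assumes we already know the right transformation, which real feature engineering only approximates.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
noise_sd = 0.3

# Training data: 15 evenly spaced points of a sine wave plus noise.
x_train = np.linspace(0.0, 2 * np.pi, 15)
y_train = np.sin(x_train) + noise_sd * rng.normal(size=x_train.size)
# Test data drawn from the same range and the same process.
x_test = rng.uniform(0.0, 2 * np.pi, size=1000)
y_test = np.sin(x_test) + noise_sd * rng.normal(size=x_test.size)

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

# Raw feature x: a degree-14 polynomial through 15 points interpolates
# every training label, noise included.
raw_model = Polynomial.fit(x_train, y_train, deg=14)
raw = {"train": mse(raw_model(x_train), y_train),
       "test": mse(raw_model(x_test), y_test)}

# Engineered feature sin(x): a straight line (2 parameters) is enough.
eng_model = Polynomial.fit(np.sin(x_train), y_train, deg=1)
eng = {"train": mse(eng_model(np.sin(x_train)), y_train),
       "test": mse(eng_model(np.sin(x_test)), y_test)}

print("raw x, degree-14 fit:", raw)
print("sin(x), linear fit:  ", eng)
```

The raw fit's training error is essentially zero while its test error exceeds the linear model's; the two-parameter model on `sin(x)` has little capacity left to overfit.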

Constant-Cranberry29 OP t1_jbiutcs wrote on March 9, 2023 at 10:44 AM

Can you provide a reference that states that feature engineering can address overfitting?

neuralbeans t1_jbixife wrote on March 9, 2023 at 11:19 AM

https://sciendo.com/article/10.2478/cait-2019-0001

Constant-Cranberry29 OP t1_jbixwf1 wrote on March 9, 2023 at 11:24 AM

I think feature selection and feature engineering are different.

neuralbeans t1_jbiygo0 wrote on March 9, 2023 at 11:31 AM

Well, selection is part of engineering, is it not?

Constant-Cranberry29 OP t1_jbiyw8d wrote on March 9, 2023 at 11:36 AM

Because I've read in some papers that feature selection (FS) and feature engineering (FE) are different.
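The disagreement here is partly about definitions. Under one common usage (an assumption, not taken from the papers the OP mentions), feature selection keeps a subset of the existing columns, while feature construction creates new columns from them; many people group both under "feature engineering". This hypothetical sketch shows the two operations side by side on synthetic data. The correlation-based selector and the product feature are illustrative choices, not a recommended method.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Hypothetical raw features: columns 0 and 1 drive the target, column 2 is noise.
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=n)

def select_top_k(X, y, k):
    """Feature selection: keep the k existing columns most correlated with y."""
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    kept = np.sort(np.argsort(scores)[::-1][:k])  # column indices, in original order
    return X[:, kept], kept

def add_product_feature(X, i, j):
    """Feature construction: append a new column computed from existing ones."""
    return np.column_stack([X, X[:, i] * X[:, j]])

X_sel, kept = select_top_k(X, y, k=2)  # output columns are a subset of the input columns
X_eng = add_product_feature(X, 0, 1)   # output contains a column that was not in X
print("selected columns:", kept.tolist(), "->", X_sel.shape)
print("engineered shape:", X_eng.shape)
```

Selection can only shrink the feature set; construction can add information in a new form, which is why some authors treat them as separate steps.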

trajo123 t1_jbjpz72 wrote on March 9, 2023 at 3:23 PM

Have you done any research at all? What did you find so far?

Constant-Cranberry29 OP t1_jbjt7m4 wrote on March 9, 2023 at 3:44 PM

Yes, you can see my problem here: https://stackoverflow.com/questions/75672909/why-by-adding-additional-information-as-number-of-sequence-on-dataset-can-avoid

seanv507 t1_jbkc2o1 wrote on March 9, 2023 at 5:42 PM

Please just remove the question. Basically, your Stack Overflow question is asking us to debug your code; it doesn't involve any general principles.

jzaunegger t1_jbjst8q wrote on March 9, 2023 at 3:41 PM

Here's one paper I can immediately think of: https://arxiv.org/abs/1409.7495. The authors use a synthetic dataset to select and engineer features for a “real” dataset. Not sure if this is what you are looking for, but it could be a step in the right direction.

Can feature engineering avoid overfitting?
