alkibijad

alkibijad t1_j14x08t wrote

This may not be the direct answer, but it's applicable to many problems:

  1. Use the simplest approach first. That means creating a simple model, in this case a flat fully connected layer.
  2. Measure the results.
  3. If the results aren't good enough, think about what could improve the results: different model architecture, training procedure, obtaining more data...
  4. Iterate (go to 2)
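
To make the loop above concrete, here's a rough sketch in plain Python (no dependencies, so it's easy to run). Everything in it is made up for illustration: the synthetic data, the helper names, and the 0.9 "good enough" threshold. In practice you'd do the same thing with your real data and a framework like PyTorch.

```python
import math
import random

random.seed(0)

def make_data(n, d):
    """Synthetic binary task: label is 1 when the first two features sum to > 0."""
    xs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    ys = [1.0 if x[0] + x[1] > 0 else 0.0 for x in xs]
    return xs, ys

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(-z))

def predict(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_flat_layer(xs, ys, epochs=50, lr=0.1):
    """Step 1: a single fully connected layer (logistic regression), trained with SGD."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            g = predict((w, b), x) - y  # gradient of log loss w.r.t. the pre-activation
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def accuracy(model, xs, ys):
    """Step 2: measure on data the model was not trained on."""
    hits = sum((predict(model, x) > 0.5) == (y == 1.0) for x, y in zip(xs, ys))
    return hits / len(xs)

train_x, train_y = make_data(400, 10)
val_x, val_y = make_data(200, 10)

baseline = train_flat_layer(train_x, train_y)
baseline_acc = accuracy(baseline, val_x, val_y)

TARGET = 0.9  # hypothetical threshold; pick whatever "good enough" means for you
needs_iteration = baseline_acc < TARGET  # steps 3 and 4 only kick in if this is True
print(f"baseline accuracy: {baseline_acc:.3f}, iterate further: {needs_iteration}")
```

The point is just the shape of the workflow: you get a number from the simplest model first, and every later change (architecture, training procedure, more data) gets compared against it.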

Also:

`creating linear or embedding layers for each feature group before combining them together` - this builds extra prior knowledge into the network, so it may help... but in theory the network should be able to find this out on its own: the combinations that don't make much sense will end up with weights close to zero. That's why I advise you to start without it and see how far that gets you.
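
A small sketch of why that holds, at least for plain linear layers without bias: separate per-group layers followed by concatenation compute exactly what one flat layer computes when its cross-group weights are zero. The group sizes and weights below are arbitrary and only there for illustration.

```python
def matvec(W, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Two hypothetical feature groups: one with 2 features, one with 3.
x_a = [1.0, 2.0]
x_b = [3.0, 4.0, 5.0]

# One small linear layer per group (weights picked arbitrarily).
W_a = [[0.5, -1.0]]             # 2 inputs -> 1 output
W_b = [[1.0, 0.0, 2.0],
       [0.0, 1.0, -1.0]]        # 3 inputs -> 2 outputs

# "Per-group layers, then combine": concatenate the group outputs.
grouped = matvec(W_a, x_a) + matvec(W_b, x_b)

# The same map written as one flat layer over all 5 features:
# a block-diagonal matrix whose cross-group entries are exactly zero.
W_flat = [
    [0.5, -1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 1.0, -1.0],
]
flat = matvec(W_flat, x_a + x_b)

print(grouped, flat)
```

So the grouped version is the flat layer with some weights fixed to zero up front. It has fewer parameters, but the flat layer has to learn those zeros from the data, which is the trade-off here.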

1K+ features: in some cases this is a lot of features, in others it's not that many... but it may make sense to reduce the number of features with a dimensionality reduction technique.
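
PCA is one common choice for this. Below is a bare-bones sketch of it in plain Python, using power iteration so it runs without any libraries. The data is synthetic (6 observed features driven by 2 hidden factors) and the helper names are mine; for real work you'd most likely use a library implementation such as scikit-learn's `PCA` instead.

```python
import math
import random

random.seed(0)

def center(X):
    """Subtract the per-feature mean from every row."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def top_component(X, iters=200):
    """Power iteration on X^T X: unit vector along the direction of largest variance."""
    d = len(X[0])
    v = [random.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, v)) for row in X]          # X v
        v = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(d)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v

def pca(X, k):
    """Project centered data onto its first k principal directions."""
    Xc = center(X)
    residual = [row[:] for row in Xc]
    components = []
    for _ in range(k):
        v = top_component(residual)
        components.append(v)
        # Deflation: remove the direction just found so the next pass finds the next one.
        deflated = []
        for row in residual:
            score = sum(a * b for a, b in zip(row, v))
            deflated.append([a - score * b for a, b in zip(row, v)])
        residual = deflated
    return [[sum(a * b for a, b in zip(row, v)) for v in components] for row in Xc]

# Synthetic example: 6 correlated features that really come from 2 hidden factors.
latent = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
X = [[a, b, a + b, a - b, 2 * a, 0.5 * b] for a, b in latent]

reduced = pca(X, k=2)
print(len(reduced), "rows,", len(reduced[0]), "features after reduction")
```

Whether this helps depends on the data: if the 1K+ features are strongly correlated, a much smaller set of components can carry most of the signal, and the reduced input can then be fed to the same simple model from step 1.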
