
PassionatePossum t1_it1jsw2 wrote

Too little information to go on. I hope you have a training set and an independent validation set (and by independent I don't mean different images of the same blood cell).

  1. Accuracy can be a highly misleading metric, especially if you have strong imbalances in the classes and number of examples.
  2. Increasing validation error when adding layers can be a sign of overfitting. However, don't train for a fixed number of epochs and then evaluate. Validate regularly during training and take the best checkpoint.
  3. "the learning rate is crazy small" sets off alarm bells. You are aware that the learning rate is a parameter you need to set, right?
  4. You have a CNN but you also have a dense layer with 128 units while only having 17 classes. Something does not add up here.
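To make point 2 concrete, here is a minimal sketch of "validate regularly and take the best checkpoint". The function name `best_checkpoint`, the `patience` parameter, and the loss values are made up for illustration; in a real training loop you would save the model weights at the line marked below.

```python
def best_checkpoint(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has not improved for
    `patience` consecutive epochs (a common early-stopping heuristic).
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: a real loop would save the model weights here.
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop and keep the best one.
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses: improving at first, then rising (overfitting).
val_losses = [0.90, 0.70, 0.62, 0.60, 0.63, 0.66, 0.71, 0.80]
best, stopped = best_checkpoint(val_losses, patience=3)
print(f"best epoch: {best}, stopped at epoch: {stopped}")
```

The point is that the reported model is the one from the best validation epoch, not whatever the weights happen to be after the last epoch.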

As for the number of layers: there is no definitive answer to this question, and it is also not that important. You might not get the best performance if you don't optimize it, but it should always sort of work. The problems you have are likely much more fundamental than that.

Classification of images is a well-studied problem. Why not start from existing and pretrained networks such as EfficientNet and build your own classifier on top of it?


thanderrine OP t1_it3dyfy wrote

So yes, I do have independent training and validation sets (no image appears in both training and validation).

  1. I understand that, so in this case is there a specific metric that you'd like to suggest?

  2. Totally agree. An increase in validation error could be overfitting, but what I'd also expect with overfitting is that my training accuracy should increase as well.... Unless I'm wrong, in which case please correct me.

  3. Yeah by learning rate I meant the increase in accuracy with each epoch. Sorry about using 'learning rate'

  4. so unless I'm misunderstanding your question, the 128 neuron dense layer is a hidden layer, the last layer is of course a dense with 17 neurons. If you were talking about something else do let me know.

So I have studied the architecture and hyperparameters of VGG and ResNet, but you see, I want to understand what goes into saying we're going to stack x convolutional layers and x2 dense layers with this many neurons. Like, where does this confidence come from? You know what I mean...

Sure, the hyperparameter tuning is great and the result of every stack of convolutional layers is also great. But the core architecture, i.e. the number of layers and the number of neurons in each layer, is still a bit of a mystery from the papers.

So this is a project to kind of give me a ballpark estimate, like: 'for an image of size f x f that could belong to x classes, the number of layers and neurons is around this range'.

Anywho thank you for replying. And thank you for your insights.


PassionatePossum t1_it3rd0p wrote

If the number of examples per class is strongly unbalanced, I would probably go for a precision/recall plot, one per class. Overall performance can be compared by the mean average precision.
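As a rough sketch of what that metric looks like (plain Python, one-vs-rest per class; the function names and the toy data are made up for illustration, and libraries such as scikit-learn offer their own implementations):

```python
def average_precision(scores, is_positive):
    """Average precision for one class: mean of the precision values
    measured at the rank of each positive example."""
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    # Assumption of this sketch: a class with no positives scores 0.0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(probabilities, labels, num_classes):
    """Per-class AP (one-vs-rest) and their unweighted mean."""
    per_class = []
    for c in range(num_classes):
        scores = [p[c] for p in probabilities]
        positives = [y == c for y in labels]
        per_class.append(average_precision(scores, positives))
    return per_class, sum(per_class) / num_classes


# Toy example: 4 samples, 2 classes, predicted class probabilities.
probabilities = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]]
labels = [0, 1, 0, 1]
per_class_ap, mean_ap = mean_average_precision(probabilities, labels, num_classes=2)
print(per_class_ap, mean_ap)
```

Because every class contributes equally to the mean, a rare class that is classified badly drags the score down in a way plain accuracy would hide.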

You are right. In an overfitting classifier, training accuracy should go up over the long term. But that does not have to be a strong effect. I've seen plenty of overfitting classifiers where the training loss was essentially flat but the validation loss kept increasing. Still, from what you told me, that makes my theory of overfitting slightly less likely.

Your explanation of the 128 units makes a lot more sense. However, I would argue for starting simple. One dense layer after a sufficiently deep convolutional network should be all that is needed.

I feel like your quest for "understanding" network structures is an unproductive direction. Well-performing network architectures are mostly just something that empirically works; there is no real theory behind them. You can waste a lot of time trying to tweak something that has been shown to work across a wide range of problem domains, or you can just stick with something that you know works. Especially if you only need a ballpark estimate.

My setup for a ballpark estimate for pretty much any problem is:

  1. An EfficientNet as backbone. That has the advantage that you can easily scale up the backbone if you have the resources and want to see what is possible with a larger network. I usually start with EfficientNet-B1.
  2. Pretrained ImageNet weights (without the densely connected layers)
  3. Global average pooling on the features.
  4. A single dense layer to the output neurons.
  5. I usually train only the last layer for a single epoch and then release the weights for the backbone.
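In Keras, the steps above could be sketched roughly as follows. This is only an illustration, not my exact code: the function name `build_classifier`, the 240x240 input size, and the optimizer settings in the comments are assumptions, and `train_ds` / `val_ds` stand for whatever datasets you have.

```python
import tensorflow as tf


def build_classifier(num_classes, weights="imagenet", input_shape=(240, 240, 3)):
    """EfficientNet-B1 backbone + global average pooling + one dense output layer."""
    # Steps 1 and 2: backbone with pretrained weights, without the dense top.
    backbone = tf.keras.applications.EfficientNetB1(
        include_top=False, weights=weights, input_shape=input_shape
    )
    backbone.trainable = False  # step 5, first phase: train only the new head

    inputs = tf.keras.Input(shape=input_shape)
    features = backbone(inputs)
    # Step 3: global average pooling on the feature maps.
    pooled = tf.keras.layers.GlobalAveragePooling2D()(features)
    # Step 4: a single dense layer to the output neurons.
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(pooled)
    return tf.keras.Model(inputs, outputs), backbone


# Hypothetical usage (train_ds and val_ds are placeholders for your data):
# model, backbone = build_classifier(num_classes=17)
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(train_ds, validation_data=val_ds, epochs=1)    # head only
# backbone.trainable = True                                # release the backbone
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # recompile after unfreezing
#               loss="sparse_categorical_crossentropy")
# model.fit(train_ds, validation_data=val_ds, epochs=20)
```

The low learning rate after unfreezing is a common precaution so that the pretrained weights are not destroyed in the first few updates; the exact value is a guess and worth tuning.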

After I have the initial predictions, I try to visualize the error cases to see whether I can spot commonalities and work my way up from there.

That hasn't failed me so far. I usually use a focal loss to guard against strongly unbalanced examples. Unfortunately, the multi-class case isn't implemented in TensorFlow (which is what I tend to use), but that is easily implemented in a few lines of code.
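To show the idea, here is the multi-class focal loss written out in plain Python rather than TensorFlow, so the formula is easy to read. The function name is made up, and `gamma=2.0` is just a commonly used default, not a recommendation for this dataset.

```python
import math


def focal_loss(probabilities, labels, gamma=2.0):
    """Mean multi-class focal loss.

    probabilities: list of per-class probability lists (e.g. softmax outputs)
    labels:        list of integer class indices
    Per example:   -(1 - p_true) ** gamma * log(p_true)
    """
    total = 0.0
    for probs, label in zip(probabilities, labels):
        # Clip to avoid log(0) on confidently wrong predictions.
        p_true = min(max(probs[label], 1e-7), 1.0)
        # Well-classified examples (p_true near 1) are down-weighted.
        total += -((1.0 - p_true) ** gamma) * math.log(p_true)
    return total / len(labels)


# With gamma=0 the modulating factor is 1 and this is plain cross-entropy.
easy = [[0.9, 0.05, 0.05]]
hard = [[0.2, 0.5, 0.3]]
print(focal_loss(easy, [0]), focal_loss(hard, [0]))
```

The easy example contributes almost nothing to the loss, so the gradient is dominated by the hard (often minority-class) examples.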

But in your case I wouldn't go through the trouble of tweaking the loss. A normal cross-entropy loss should be sufficient to get an idea of what is possible. If all else fails, downweight the loss on examples that are overrepresented.
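One simple way to do that downweighting is inverse-frequency class weights. A minimal sketch, assuming integer labels; the function name and the toy label counts are made up, and the formula is the usual "balanced" heuristic, not the only option:

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Overrepresented classes get a weight below 1, rare classes above 1,
    and a perfectly balanced dataset gets 1.0 everywhere.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}


# Toy dataset: class 0 has 8 examples, class 1 only 2.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)
```

Each example's cross-entropy term is then multiplied by the weight of its true class; in Keras, a dictionary like this can be passed as the `class_weight` argument of `model.fit`.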


thanderrine OP t1_it8qwlf wrote

So I used to do transfer learning for my models before, right... But it kind of feels like I'm using someone else's work. Like, if I'm using the weights and architecture from someone else's work, then how does it show my skills... You know what I mean.

All I do is take the image dataset and preprocess it so that it fits the model. So how can I possibly present something like that as my project if the majority of the work was done by someone else?

About tweaking the loss, I am kind of doing that for my model. I'm using focal Tversky loss, with gamma as 0.65.


PassionatePossum t1_it9djl0 wrote

I understand that concern if you are working on an academic paper or something like that. In that case novelty is important. If you are working in industry, as I currently am, there are no such concerns. In industry, the skill is to produce a working solution fast, and if someone has already built a framework that I am allowed to use, even better.
