trajo123

trajo123 t1_je7dgjz wrote

100 images??? Folks, neural nets are data hungry. If you don't have reams of data, don't fiddle with architectures, definitely not at first. The first thing to do when data is limited is to use pre-trained models. Then do data augmentation, and only then look at other things like architectures and losses if you really have nothing better to do with your time.

SMP offers a wide variety of segmentation models with the option to use pre-trained weights.
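A minimal sketch of that starting point, assuming the `segmentation_models_pytorch` package (the SMP mentioned above); the encoder, channel count and loss are placeholders to adapt to the actual task:

```python
# U-Net from segmentation_models_pytorch with an ImageNet pre-trained encoder,
# so the ~100 images only need to fine-tune it rather than train from scratch.
import segmentation_models_pytorch as smp
import torch

model = smp.Unet(
    encoder_name="resnet34",      # any encoder supported by SMP
    encoder_weights="imagenet",   # the pre-trained weights do the heavy lifting
    in_channels=3,
    classes=1,                    # e.g. binary segmentation
)

loss_fn = smp.losses.DiceLoss(mode="binary")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```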

12

trajo123 t1_je79rme wrote

Have you tried using the segmentation models from the SMP package (Iakubovskii, P. (2019))? I built a segmentation model for dermoscopy images and pre-trained models consistently outperformed anything else; the architecture didn't matter that much. I got the best results with "U-Net with SegFormer pre-trained encoder".

It depends on how much training data you have, but unless you have millions of samples, pre-training usually trumps architecture.
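For reference, the "U-Net with SegFormer pre-trained encoder" combination above looks roughly like this in SMP; "mit_b2" is an assumed name for the MixVisionTransformer (SegFormer) encoder, so check the SMP encoder list for the exact identifier:

```python
import segmentation_models_pytorch as smp

# U-Net decoder on top of a SegFormer-style (MixVisionTransformer) encoder with
# ImageNet pre-trained weights; encoder name assumed, verify against the SMP docs.
model = smp.Unet(encoder_name="mit_b2", encoder_weights="imagenet", in_channels=3, classes=1)
```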

1

trajo123 t1_jdsflhh wrote

> like ingesting a book

Interestingly, LLMs currently can't naturally ingest a book, since it doesn't fit in the prompt (they can fit 32K tokens, which is about 24K words). This is where GPTs differ fundamentally from the human brain. GPTs always produce one token at a time, given the full prompt. There is no state kept between token generation steps other than the prompt, which grows one token at a time. The human brain, on the other hand, has a state, and it is continuously evolving. In the case of a book, our brain state is affected by the content of the book as we read it.

LLMs need to be able to hold more state to get to the next level. Perhaps they could be augmented with some sort of LSTM-like architecture where state can be built up from a theoretically unbounded amount of input, or keep another compressed/non-human-readable prompt that gets read before generating each token and updated after generating it.
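A toy sketch of that second idea, with the state updated per chunk rather than per token to keep it short; `llm(prompt)` is a hypothetical completion function, not any real API:

```python
# Ingest a book chunk by chunk while carrying only a bounded, compressed state
# forward, instead of stuffing the whole text into the prompt.
def ingest_book(chunks, llm, state_budget_words=500):
    state = ""  # compressed state; plain text here for readability
    for chunk in chunks:
        state = llm(
            f"Current state (at most {state_budget_words} words):\n{state}\n\n"
            f"New text:\n{chunk}\n\n"
            "Rewrite the state so it keeps everything needed from the book so far."
        )
    return state

def answer(question, state, llm):
    # questions are answered against the accumulated state, not the full book
    return llm(f"State:\n{state}\n\nQuestion: {question}")
```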

4

trajo123 t1_jdscn2h wrote

> Apparently it cannot solve coding problems which require any amount of thinking.

Not yet, and this is not surprising.

First, GPT-4 can solve many coding problems on the first try. Yes, these small programs may be simple, but how many developers can write code that runs directly? Maybe in 1-2 languages, and even then only in the problem domains they are very familiar with. Also, since LLMs can write code in more languages and frameworks than most developers, LLMs can actually solve more coding problems than most of the programmers out there... So LLMs already contain vast amounts of "knowledge" and "intuitive ability". But intuition is not enough to solve larger or more complex problems.

So, finally, coming to the thinking part. What challenging problems can humans solve "off the cuff"? We also scribble, draw diagrams, try out a few things, see if things run and work as expected, do web searches, talk to stakeholders, sleep on the problem, etc. In other words, in any non-trivial problem solving we rely heavily on feedback between our brains and the external world.

Frankly, I don't see this as a problem with LLMs; they can be effectively used as foundation models. One could have another layer on top of LLMs to solve problems end-to-end. For example, one could build a meta-model where multiple instances work together in an actor-critic fashion. The actor is the one interacting with the user; the critic can be prompted (and perhaps fine-tuned) with general problem-solving strategies, with the main prompt being to second-guess and try to find flaws in the reasoning of the actor. Just as reinforcement learning (RL) was used to improve the general usability of ChatGPT, RL could be used to fine-tune such a meta-model (or maybe just fine-tune the critic). ...thinking fast, thinking slow
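A rough sketch of that actor-critic loop; `llm(system, prompt)` is a hypothetical chat-completion wrapper and the prompts are only illustrative:

```python
# Two prompted LLM instances: the actor proposes a solution, the critic tries to
# find flaws, and the actor revises until the critic approves or rounds run out.
def solve(task, llm, max_rounds=3):
    actor_system = "You are a careful problem solver."
    critic_system = ("You are a critic. Second-guess the proposed solution and point out "
                     "flaws in its reasoning. Reply only with APPROVED if none remain.")

    draft = llm(actor_system, task)
    for _ in range(max_rounds):
        critique = llm(critic_system, f"Task:\n{task}\n\nProposed solution:\n{draft}")
        if critique.strip() == "APPROVED":
            break
        draft = llm(actor_system,
                    f"Task:\n{task}\n\nPrevious attempt:\n{draft}\n\n"
                    f"Critique:\n{critique}\n\nRevise the solution.")
    return draft
```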

P.S. I think LLMs also need some sort of memory, so that not everything needs to be in the prompt to work on a problem.

6

trajo123 t1_jdhi7u8 wrote

Reply to comment by Rishh3112 in Cuda out of memory error by Rishh3112

The problem is likely in your training loop. Perhaps your computation graph keeps growing because you keep track of the average loss as an autograd variable rather than a plain number. Make sure that for any metrics/logging you use loss.item().
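For illustration, the usual pattern looks like this (model, loader, criterion and optimizer are assumed to already exist):

```python
# Accumulating the raw loss tensor keeps each iteration's graph alive and slowly
# exhausts GPU memory; .item() turns it into a plain Python float.
running_loss = 0.0
for x, y in loader:
    optimizer.zero_grad()
    loss = criterion(model(x.cuda()), y.cuda())
    loss.backward()
    optimizer.step()

    # running_loss += loss        # keeps a reference to the autograd graph
    running_loss += loss.item()   # plain float, graph can be freed

print(f"average loss: {running_loss / len(loader):.4f}")
```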

5

trajo123 t1_jabwl15 wrote

>I know it’s common for massive projects to use Fortran in order to train NN.

It is definitely not common. Yes, Fortran is used in scientific computing applications due to efficient and well-tested linear algebra libraries and other numerical computing legacy code.

Fortran code is, or can be, used under the hood of higher-level libraries/languages such as NumPy for Python or MATLAB. Even PyTorch uses LAPACK for linear algebra computations when running on the CPU. In this sense, yes, Fortran code is used indirectly for training NNs. But using Fortran to actually implement a NN model and train it is virtually unheard of, as far as I know.

Maybe having a look at LAPACK will give you more insight.
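You can also check which BLAS/LAPACK builds (often Fortran-based) your own installation links against; the exact output depends on how NumPy/SciPy were built:

```python
# Print the BLAS/LAPACK build information that NumPy and SciPy were compiled against.
import numpy as np
import scipy

np.show_config()
scipy.show_config()
```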

3

trajo123 t1_ja9aghn wrote

Very strange.

Are you sure your dataset is shuffled before the split? Have you tried different random seeds, different split ratios?

Or maybe there is a bug in how you calculate the loss, but that should affect the training set as well...

So my best guess is that either your data isn't shuffled and the validation samples are "easier", or it's something more trivial, like a bug in the plotting code. Or maybe that's the point where your model becomes self-aware :)
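A shuffled split is quick to double-check, e.g. with scikit-learn (X and y stand in for whatever features/labels the dataset uses):

```python
# Shuffle before splitting so the validation set isn't an "easy" contiguous slice;
# vary random_state and test_size to rule out an unlucky split.
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(
    X, y,
    test_size=0.2,
    shuffle=True,
    random_state=42,
)
```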

1

trajo123 t1_ja3lwj9 wrote

The architecture depends on what task you want to solve: classification, semantic segmentation, detection/localization?

On another note, by choosing to do deep learning on image-like data such as MRI in R, you are making your job more difficult from the get-go, as there are many more tooling and documentation resources available for Python.

4

trajo123 t1_j8rdzfd wrote

Some general things to try:

  • (more aggressive) data augmentation when training, to make your model generalize better to data outside the dataset
  • if by "the problem of bounding objects" you mean object detection/localization, then a single regression head on top of a classifier architecture is not a good way of solving this problem; there are specialized architectures for this, e.g. R-CNN, YOLO.
  • If you have to do it with the regression head, then go for at least a ResNet-50 (see the sketch below); it should get you better performance across the board, assuming it was pre-trained on a large dataset like ImageNet. VGG16 is quite small/weak by modern standards.
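A hedged sketch of that last bullet with torchvision (recent versions); the 4 outputs assume a single normalized (x, y, w, h) box per image:

```python
# Pre-trained ResNet-50 backbone with its classifier replaced by a box-regression head.
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Linear(backbone.fc.in_features, 4)  # regress one bounding box

criterion = nn.SmoothL1Loss()  # a typical choice for box coordinate regression
```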

Why do you need to implement this in JavaScript? Wouldn't it make sense to decouple the model development from the deployment? Get a PyTorch or TensorFlow model working first, then worry about deployment. This way you can access a zoo of pre-trained models, at Hugging Face for instance.

2

trajo123 t1_j3qwftm wrote

13 periods of history to forecast another 13? This seems like a very atypical/extreme TS forecasting problem; do these services actually handle so little data?

First, it's unlikely that this little data is enough for anything but the simplest models. Probably the best you could do in terms of a domain-independent model is linear regression. Even so, calculating performance metrics (knowing how good the model is) is going to be challenging, as that would require you to further reduce the amount of training data in order to have a validation/"out of sample" set.

Getting useful predictions with so little data is probably going to require a model with strong assumptions, e.g. coming up with a set of domain-specific parametrized equations that govern the time series and then fitting those parameters to the data.

In any case, deep learning is far from the first approach that comes to mind for this problem. Solving it is probably just a few lines of code using R or scipy.stats + sklearn, probably fewer than calling the cloud API functions. The trick is to use the right mathematical model.
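The sklearn version of the simplest domain-independent baseline, with dummy values standing in for the real 13 observations:

```python
# Fit a linear trend to 13 observed periods and extrapolate 13 more.
import numpy as np
from sklearn.linear_model import LinearRegression

history = np.array([12.0, 13.5, 13.1, 14.2, 15.0, 14.8, 16.1,
                    16.5, 17.2, 16.9, 18.0, 18.4, 19.1])  # placeholder data

t = np.arange(len(history)).reshape(-1, 1)           # time index as the only feature
model = LinearRegression().fit(t, history)

t_future = np.arange(len(history), 2 * len(history)).reshape(-1, 1)
forecast = model.predict(t_future)                   # the next 13 periods
print(forecast.round(2))
```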

2