trajo123
trajo123 t1_je7dgjz wrote
100 images??? Folks, neural nets are data hungry. If you don't have reams of data, don't fiddle with architectures, definitely not at first. The first thing to do when data is limited is to use pre-trained models. Then do data augmentation, and only then look at other things like architectures and losses, if you really have nothing better to do with your time.
SMP offers a wide variety of segmentation models with the option to use pre-trained weights.
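Something like this (a minimal sketch assuming the segmentation_models_pytorch package and a single-class/binary task; the encoder choice and channel counts are just examples):

```python
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet34",      # any SMP encoder works, e.g. "mit_b2" for a SegFormer encoder
    encoder_weights="imagenet",   # load pre-trained encoder weights
    in_channels=3,                # RGB input
    classes=1,                    # single foreground class -> 1 output channel
)

loss_fn = smp.losses.DiceLoss(mode="binary")  # Dice loss is a common choice for segmentation
```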
trajo123 t1_je79rme wrote
Have you tried using the segmentation models from the SMP package (Iakubovskii, P., 2019)? I built a segmentation model for dermoscopy images and pre-trained models consistently outperformed anything else; the architecture didn't matter that much. I got the best results with a U-Net with a SegFormer pre-trained encoder.
It depends how much training data you have, but unless you have millions of samples, pre-training usually trumps architecture.
trajo123 t1_je2gie9 wrote
Reply to [N] OpenAI may have benchmarked GPT-4’s coding ability on it’s own training data by Balance-
How much of the code that devs write on a typical day is truly novel and not just a rehash / combination / adaptation of existing stuff?
He who has not copied code from stackoverflow, let him cast the first insult at ChatGPT.
trajo123 t1_jdsflhh wrote
Reply to comment by liqui_date_me in [D] GPT4 and coding problems by enryu42
> like ingesting a book
Interestingly, current LLMs can't naturally ingest a book, since it doesn't fit in the prompt (they can fit 32K tokens, which is about 24K words). This is where GPTs differ fundamentally from the human brain. GPTs always produce one token at a time, given the full prompt. There is no state kept between token-generation steps other than the prompt, which grows one token at a time. The human brain, on the other hand, has a state, and it is continuously evolving. In the case of a book, our brain state is affected by the content of the book as we read it.
LLMs need to be able to hold more state to get to the next level. Perhaps they could be augmented with some sort of LSTM-like architecture where state can be built up from a theoretically unbounded amount of input, or with a compressed, non-human-readable auxiliary prompt that is read before generating each token and updated afterwards.
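Roughly, the difference I have in mind (purely conceptual sketch; generate_next_token and update_state are hypothetical placeholders, not a real API):

```python
def generate_stateless(model, prompt, n_tokens):
    # How current GPT-style models work: the growing prompt is the only "state".
    for _ in range(n_tokens):
        token = generate_next_token(model, prompt)  # full prompt re-read at every step
        prompt = prompt + [token]
    return prompt

def generate_with_state(model, prompt, n_tokens, state):
    # The augmentation suggested above: a persistent, compressed state that is
    # read before each step and updated after it, independent of prompt length.
    output = []
    for _ in range(n_tokens):
        token = generate_next_token(model, prompt + output, state)
        state = update_state(state, token)  # state can accumulate arbitrarily long context
        output.append(token)
    return output, state
```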
trajo123 t1_jdscn2h wrote
Reply to [D] GPT4 and coding problems by enryu42
>Apparently it cannot solve coding problems which require any amount of thinking.
Not yet, and this is not surprising.
First, GPT-4 can solve many coding problems on the first try. Yes, these small programs may be simple, but how many developers can write code that runs correctly on the first attempt? Maybe in 1-2 languages, and even then only in the problem domain they are very familiar with. Since LLMs can write code in more languages and frameworks than most developers, they can actually solve more coding problems than most of the programmers out there... So LLMs already contain vast amounts of "knowledge" and "intuitive ability". But intuition is not enough to solve larger or more complex problems.
So, finally, coming to the thinking part. What challenging problems can humans solve "off the cuff"? We also scribble, draw diagrams, try out a few things, see if things run and work as expected, do web searches, talk to stakeholders, sleep on the problem, etc. In other words, for any non-trivial problem, we also rely heavily on feedback between our brains and the external world.
Frankly, I don't see this as a problem with LLMs; they can be used effectively as foundation models. One could build another layer on top of LLMs to solve problems end-to-end. For example, a meta-model where multiple instances work together in an actor-critic fashion: the actor is the one interacting with the user, while the critic is prompted (and perhaps fine-tuned) with general problem-solving strategies, its main prompt being to second-guess the actor and try to find flaws in its reasoning. Just as reinforcement learning (RL) was used to improve the general usability of ChatGPT, RL could be used to fine-tune such a meta-model (or maybe just the critic). ...thinking fast, thinking slow
P.S. I think LLMs also need some sort of memory, so that not everything needs to be in the prompt to work on a problem.
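A rough sketch of what such an actor-critic loop could look like (call_llm is a hypothetical wrapper around whatever LLM API is available, not a real library function):

```python
def solve_with_critic(problem, rounds=3):
    draft = call_llm(role="actor", prompt=f"Solve this problem:\n{problem}")
    for _ in range(rounds):
        critique = call_llm(
            role="critic",
            prompt=f"Second-guess this solution and point out flaws:\n{draft}",
        )
        if "no flaws found" in critique.lower():  # toy stopping criterion
            break
        draft = call_llm(
            role="actor",
            prompt=f"Problem:\n{problem}\nPrevious attempt:\n{draft}\n"
                   f"Critique:\n{critique}\nRevise the solution.",
        )
    return draft
```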
trajo123 t1_jdhi7u8 wrote
Reply to comment by Rishh3112 in Cuda out of memory error by Rishh3112
The problem is likely in your training loop. Perhaps your computation graph keeps growing because you track the average loss as an autograd tensor rather than a plain number. Make sure that for any metrics/logging you use loss.item().
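Typical culprit and fix (sketch, with a toy model and random data standing in for your actual setup):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

running_loss = 0.0
for _ in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)  # stand-in for a real DataLoader
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    # running_loss += loss        # BUG: accumulates autograd tensors, keeps every graph alive
    running_loss += loss.item()   # OK: converts to a plain float, the graph can be freed
```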
trajo123 t1_jdhhteo wrote
Reply to Cuda out of memory error by Rishh3112
Have you tried asking ChatGPT? :)
trajo123 t1_jctu6my wrote
Reply to comment by chengstark in Seeking Career Advice to go from general CS background to a career in AI/Machine Learning by brown_ja
Why?
trajo123 t1_jcagr8m wrote
Reply to comment by SuperTankMan8964 in [N] Baidu to Unveil Conversational AI ERNIE Bot on March 16 (Live) by kizumada
I meant the CCP trolling, disinforming and manipulating the population using such tools.
trajo123 t1_jca1bh4 wrote
CCP trolling and disinformation capabilities just got a massive upgrade.
trajo123 t1_jbjpz72 wrote
Have you done any research at all? What did you find so far?
trajo123 t1_jbits7f wrote
Posting assignment questions to reddit.
trajo123 t1_jba7yx2 wrote
So this is where clueless managers come for inspiration!
trajo123 t1_jabwl15 wrote
Reply to [P] [R] Neural Network in Fortran! by Etterererererer
>I know it’s common for massive projects to use Fortran in order to train NN.
It is definitely not common. Yes, Fortran is used in scientific computing because of its efficient, well-tested linear algebra libraries and other numerical legacy code.
Fortran code is, or can be, used under the hood of higher-level libraries and languages such as NumPy for Python or MATLAB. Even PyTorch uses LAPACK for linear algebra computations when running on the CPU. In this sense, yes, Fortran code is used indirectly for training NNs. But using Fortran to actually implement and train a NN model is virtually unheard of, as far as I know.
Maybe having a look at LAPACK will give you more insight.
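For example, this is the kind of thing that happens under the hood (sketch, assuming NumPy/SciPy are installed):

```python
import numpy as np
from scipy.linalg import lapack

a = np.random.rand(3, 3)
b = np.random.rand(3)

x_np = np.linalg.solve(a, b)                   # NumPy call, backed by LAPACK
lu, piv, x_lapack, info = lapack.dgesv(a, b)   # the underlying LAPACK routine, called directly

print(np.allclose(x_np, x_lapack))             # True: same Fortran code path
np.show_config()                               # shows which BLAS/LAPACK build is linked
```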
trajo123 t1_ja9aghn wrote
Reply to comment by Apprehensive_Air8919 in Why does my validation loss suddenly fall dramatically while my training loss does not? by Apprehensive_Air8919
Very strange.
Are you sure your dataset is shuffled before the split? Have you tried different random seeds, different split ratios?
Or maybe there is a bug in how you calculate the loss, but that should affect the training set as well...
So my best guess is that either your data isn't shuffled and the validation samples are "easier", or it's something more trivial, like a bug in the plotting code. Or maybe that's the point where your model becomes self-aware :)
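A quick way to check the split (sketch, with random arrays standing in for your dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(1000, 20), np.random.rand(1000)  # stand-ins for the real data

for seed in (0, 1, 42):
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, shuffle=True, random_state=seed
    )
    # ...train on (X_train, y_train), validate on (X_val, y_val); if the odd
    # validation behaviour only appears for some seeds, the split is the issue.
```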
trajo123 t1_ja8cyw2 wrote
Reply to Why does my validation loss suddenly fall dramatically while my training loss does not? by Apprehensive_Air8919
How is your loss defined? How is your validation set created? Does it happen for any train/validation split?
trajo123 t1_ja3lwj9 wrote
The architecture depends on what task you want to solve: classification, semantic segmentation, detection/localization?
On another note, by choosing to do deep learning on image-like data such as MRI in R, you are making your job harder from the get-go: there are many more tooling and documentation resources available for Python.
trajo123 t1_j8rdzfd wrote
Some general things to try:
- (more aggressive) data augmentation during training, to make your model behave better on data outside the training set
- if by "the problem of bounding objects" you mean object detection/localization, then a single regression head on top of a classifier architecture is not a good way to solve it; there are specialized architectures for this, e.g. R-CNN, YOLO
- if you have to do it with a regression head, then go for at least a ResNet-50 (see the sketch below); it should give you better performance across the board, assuming it was pre-trained on a large dataset like ImageNet. VGG16 is quite small/weak by modern standards.
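Sketch of the regression-head setup with a pre-trained ResNet-50 (assumes a recent torchvision; the 4 outputs stand for bounding-box coordinates):

```python
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Linear(backbone.fc.in_features, 4)  # replace the classifier with a regression head
```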
Why do you need to implement this in JavaScript? Wouldn't it make sense to decouple the model development from the deployment? Get a Pytorch or Tensorflow model working first, then worry about deployment. This way you can access a zoo of pre-trained models - at Hugging Face for instance.
trajo123 t1_j62yj1s wrote
Convention coming from linear algebra: Ax + b, where b is a vector of real numbers, positive or negative.
What makes you feel that subtracting is in any way more meaningful?
trajo123 t1_j4rzvz3 wrote
What do you mean by "dynamically changing the structure"? Do the previous classes remain?
One solution is to treat this as a transfer-learning problem: when the set of classes changes, replace the last layer with one sized for the new classes and re-train (fine-tune) the network.
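A sketch of what that looks like in PyTorch (assuming a ResNet-style classifier; num_new_classes is a placeholder):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
num_new_classes = 7  # placeholder for the new number of classes
model.fc = nn.Linear(model.fc.in_features, num_new_classes)  # swap the last layer

# Optionally freeze the backbone and fine-tune only the new head:
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
```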
trajo123 t1_j3qwrtb wrote
Reply to comment by trajo123 in Time-series forecasting by AwayBobcat2273
I understand that you want to use 13 rows of history for prediction, but do you have more than 13 rows to train the model? How many rows do you have in total?
trajo123 t1_j3qwftm wrote
Reply to Time-series forecasting by AwayBobcat2273
Using 13 periods of history to forecast another 13 is a very atypical/extreme TS forecasting problem; do these services actually handle so little data?
First, it's unlikely that this little data is enough for anything but the simplest models. Probably the best you could do in terms of a domain-independent model is linear regression. Even so, calculating performance metrics - knowing how good the model is - is going to be challenging, as it would require further reducing the training data to create a validation/"out of sample" set.
Getting useful predictions with so little data is probably going to require you to make a model with strong assumptions - e.g. come up with a set of domain-specific parametrized equations that govern the time-series and then fit those parameters to the data.
In any case, deep learning is far from the first approach that comes to mind for this problem. Solving it is probably just a few lines of code using R or scipy.stats + sklearn, probably fewer lines than calling the cloud API functions. The trick is to use the right mathematical model.
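To illustrate the "few lines of code" point (sketch, with made-up numbers standing in for the real series):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

y = np.array([12, 14, 13, 15, 17, 16, 18, 20, 19, 21, 23, 22, 24], dtype=float)  # 13 history points
t = np.arange(len(y)).reshape(-1, 1)

model = LinearRegression().fit(t, y)                     # plain linear-trend fit
t_future = np.arange(len(y), len(y) + 13).reshape(-1, 1)
forecast = model.predict(t_future)                       # next 13 periods
print(forecast)
```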
trajo123 t1_j3ly2r2 wrote
Sorry for the ignorant question, but are there any practical applications of this theory?
trajo123 t1_jebbxaf wrote
Reply to comment by Tight-Lettuce7980 in [D] Improvements/alternatives to U-net for medical images segmentation? by viertys
Sufficient to train a model from scratch? Unlikely. Sufficient to fine-tune a model pre-trained on 1 million+ images (ImageNet, etc.)? Probably yes. As mentioned, some extra performance can be squeezed out with smart data augmentation.
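For example, a segmentation-friendly augmentation pipeline could look like this (sketch, assuming the albumentations package; the transforms and parameters are just examples):

```python
import albumentations as A

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=15, p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.ElasticTransform(p=0.2),
])

# augmented = transform(image=image, mask=mask)  # applies the same geometric transform to image and mask
```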