Submitted by d0cmorris t3_10xxxpa in MachineLearning
Clearly, large-scale deep learning approaches in image classification or NLP use all sorts of regularization mechanisms, but the parameters are typically unconstrained (i.e., every weight can theoretically attain any real value). In many other machine learning domains, constrained optimization (e.g., via Projected Gradient Descent or Frank-Wolfe) plays a huge role.
I was wondering whether there are large-scale deep learning applications that rely on constrained optimization approaches. By large-scale I mean large CNNs, transformers, diffusion models, or the like. Are there settings where constrained optimization would even be the preferred approach but isn't efficient/stable enough in practice?
Happy for any paper suggestions or thoughts! Thanks!
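For concreteness, here is roughly what I have in mind, as a minimal PyTorch sketch (not from any particular paper; the `max_norm` radius is an arbitrary illustration): an ordinary SGD step followed by a Euclidean projection of each parameter tensor back onto an L2 ball.

```python
import torch

def project_l2_ball(param: torch.Tensor, max_norm: float) -> None:
    """Project a parameter tensor onto the L2 ball of radius max_norm (in place)."""
    norm = param.norm(p=2)
    if norm > max_norm:
        param.mul_(max_norm / norm)

model = torch.nn.Linear(128, 10)           # stand-in for a large model
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = torch.nn.functional.cross_entropy(model(x), y)

opt.zero_grad()
loss.backward()
opt.step()                                 # unconstrained gradient step
with torch.no_grad():
    for p in model.parameters():
        project_l2_ball(p, max_norm=3.0)   # projection back onto the feasible set
```

So the question is whether anything like this projection step (or a Frank-Wolfe-style linear minimization oracle) shows up in training models at the scale of modern CNNs/transformers/diffusion models.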
tdgros t1_j7vdocr wrote
With constrained optimization you usually have a feasible set for the variables you optimize. But when training an NN you optimize millions of weights that aren't directly meaningful, so in general it's not clear how you would even define a feasible set for each of them.
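To make that concrete: the projection itself is mechanically trivial, e.g. clamping every weight into a box [-c, c] after each optimizer step, but the bound c is essentially arbitrary because an individual hidden weight has no interpretation on its own. Toy sketch (the value of c is made up):

```python
import torch

model = torch.nn.Linear(128, 10)  # stand-in for whatever network is being trained

# Choosing the feasible set is the hard part: nothing about the problem tells us
# what a sensible bound on an individual hidden weight would be.
c = 0.1  # arbitrary per-weight bound

with torch.no_grad():
    for p in model.parameters():
        p.clamp_(-c, c)  # Euclidean projection onto the box [-c, c]^d: easy to do, hard to justify
```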