Submitted by d0cmorris t3_10xxxpa in MachineLearning
tdgros t1_j7vdocr wrote
With constrained optimization, you usually have a feasible set for the variables you optimize, but in NN training you optimize millions of weights that aren't directly meaningful, so in general it's not clear how you would define a feasible set for them.
notdelet t1_j7vv9pi wrote
You can get constrained optimization in general, even for nonconvex nonlinear problems (see the work N. Sahinidis has done on BARON): the feasible sets are defined in the course of solving the problem, by introducing branches. But that is slow, doesn't scale to NN sizes, and doesn't really answer the question ML folks are asking (see the talk at the IAS on "Is Optimization the Right Language for ML").
d0cmorris OP t1_j819chm wrote
Exactly. I mean, I can easily define L2 constraints on the weights of my network and then do constrained optimization, which would at least in theory be equivalent to L2 regularization / weight decay. But that is not particularly useful; I am wondering whether there are applications of constraints where they actually make sense.
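To make that equivalence concrete, here is a minimal PyTorch sketch of the constrained version: plain SGD with no weight_decay, followed by a projection of the parameters back onto an L2 ball after each step. The model, data, and radius are placeholders, not anything specific from the thread.

```python
import torch

def project_l2_ball(params, radius):
    """Project the concatenated parameter vector back onto the L2 ball of the given radius."""
    with torch.no_grad():
        total_norm = torch.sqrt(sum(p.pow(2).sum() for p in params))
        if total_norm > radius:
            scale = radius / total_norm
            for p in params:
                p.mul_(scale)

model = torch.nn.Linear(10, 1)                        # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=1e-2)    # note: no weight_decay here
x, y = torch.randn(32, 10), torch.randn(32, 1)        # dummy batch

for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
    project_l2_ball(list(model.parameters()), radius=5.0)  # the constraint step
```

This is the Ivanov (hard-constraint) form; the usual weight-decay penalty is the Tikhonov form, and for a suitable pairing of radius and penalty strength they yield the same solutions.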
Mental-Reference8330 t1_j8xup7w wrote
In the early days, researchers considered the architecture itself to be a form of regularization. LeCun didn't invent the idea, but he did popularize the view that a convolutional layer (as in LeNet, in his case) is like a fully-connected layer constrained to only allow solutions whose weights can be expressed in terms of a convolution kernel. When ResNets were introduced, they were likewise motivated by the fact that they are "constrained" to start from better minima, even though you could convert a ResNet to a fully-connected model without loss of precision.
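As a small illustration of that first point (a NumPy sketch with arbitrary sizes, not tied to LeNet): a 1D "valid" convolution is exactly a fully-connected layer whose weight matrix is constrained to repeat the kernel along shifted positions, with every other entry forced to zero.

```python
import numpy as np

x = np.random.randn(8)           # input signal
k = np.random.randn(3)           # convolution kernel
out_len = len(x) - len(k) + 1    # "valid" convolution output length

# Dense weight matrix of the equivalent fully-connected layer:
# each row is the kernel shifted by one position, all other entries zero.
W = np.zeros((out_len, len(x)))
for i in range(out_len):
    W[i, i:i + len(k)] = k

conv_out = np.array([x[i:i + len(k)] @ k for i in range(out_len)])  # sliding dot products
fc_out = W @ x                                                       # same result via the dense matrix
assert np.allclose(conv_out, fc_out)
```

The "constraint" here is structural: of the out_len * len(x) entries a fully-connected layer would learn freely, only len(k) values are actually free, which is the sense in which the architecture regularizes.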