Submitted by AutoModerator t3_z07o4c in MachineLearning
BBAAQQDDD t1_ixcvhcy wrote
Maybe a stupid question, but I've always wondered how backpropagation works. I don't understand how we actually know how z changes with respect to x (where y would be the output and x a node in some layer). My intuition would be that since you know the weight (w) from x to z, you could just say that y = activationfunc(w*x) (of course with a load of other inputs and weights). So how do you know the amount by which z changes if x changes?
give_me_the_truth t1_ixcwr6u wrote
It's not clear what z is.
However, I think gradient descent can also be thought of as backpropagation in its simplest sense, where the independent variable is updated based on how the dependent variable changes with it.
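To illustrate that idea roughly (a toy example of my own, not from the comment): minimize f(x) = (x - 3)^2 by repeatedly nudging x opposite to the gradient of f.

```python
# Minimal one-dimensional gradient descent sketch (illustrative only).
# f(x) = (x - 3)^2, so df/dx = 2 * (x - 3).

def f(x):
    return (x - 3.0) ** 2

def grad_f(x):
    return 2.0 * (x - 3.0)

x = 0.0          # starting guess for the independent variable
lr = 0.1         # learning rate (step size)
for _ in range(100):
    x -= lr * grad_f(x)   # update x based on how f changes with x

print(x)  # approaches 3.0, the minimizer of f
```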
danman966 t1_ixgzwfh wrote
Backpropagation is essentially applying the chain rule a bunch of times. Neural nets and other functions just apply basic functions over and over on top of a variable x to get some output z, e.g. z = f(g(h(x))), so the derivative of z with respect to the parameters of f, g, and h is the chain rule applied three times. Since PyTorch/TensorFlow store the derivatives of all their building blocks, e.g. activation functions or linear layers in a neural network, it is easy for the software to compute each gradient.
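To make that concrete, here is a small sketch in PyTorch; the particular f, g, and h are my own toy choices, not something from the comment. Autograd applies the chain rule for us, and we can check the result against the chain rule written out by hand.

```python
import torch

# A toy chain of functions, z = f(g(h(x))) (illustrative choices of f, g, h).
def h(x): return 3.0 * x
def g(u): return u ** 2
def f(v): return torch.sin(v)

x = torch.tensor(0.7, requires_grad=True)
z = f(g(h(x)))
z.backward()   # autograd walks back through f, g, h, applying the chain rule

# The same gradient by hand: dz/dx = f'(g(h(x))) * g'(h(x)) * h'(x)
u = h(x)
v = g(u)
manual = torch.cos(v) * (2.0 * u) * 3.0

print(x.grad.item(), manual.item())   # the two values agree
```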
We need the gradient, of course, because that is how we update our parameter values, with gradient descent or something similar.
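A minimal sketch of that last step, using a made-up one-parameter model y_hat = w*x with a squared-error loss, just to show how the gradient from backprop feeds a gradient descent update:

```python
import torch

# Toy single-parameter model and one training example (made-up values).
w = torch.tensor(0.5, requires_grad=True)
x, y = torch.tensor(2.0), torch.tensor(6.0)

loss = (w * x - y) ** 2
loss.backward()              # backprop fills in w.grad = d(loss)/dw

lr = 0.01
with torch.no_grad():
    w -= lr * w.grad         # the gradient descent update
w.grad.zero_()               # clear the gradient before the next step

print(w)                     # w has moved toward y / x = 3.0
```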