Submitted by shitboots t3_zdkpgb in MachineLearning

Paper: https://www.cs.toronto.edu/~hinton/FFA13.pdf

Twitter summary: https://twitter.com/martin_gorner/status/1599755684941557761

Abstract:

> The aim of this paper is to introduce a new learning procedure for neural networks and to demonstrate that it works well enough on a few small problems to be worth serious investigation. The Forward-Forward algorithm replaces the forward and backward passes of backpropagation by two forward passes, one with positive (i.e. real) data and the other with negative data which could be generated by the network itself. Each layer has its own objective function which is simply to have high goodness for positive data and low goodness for negative data. The sum of the squared activities in a layer can be used as the goodness but there are many other possibilities, including minus the sum of the squared activities. If the positive and negative passes can be separated in time, the negative passes can be done offline, which makes the learning much simpler in the positive pass and allows video to be pipelined through the network without ever storing activities or stopping to propagate derivatives.
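
For readers who want the mechanics in code: below is a minimal, unofficial sketch of a layer trained the way the abstract describes (per-layer "goodness" = sum of squared activities, pushed above a threshold for positive data and below it for negative data). The layer sizes, threshold value, optimizer, and normalization details are assumptions for illustration, not taken from any released implementation.

```python
import torch
import torch.nn.functional as F

class FFLayer(torch.nn.Module):
    """One fully connected layer with its own local objective (no backprop across layers)."""

    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = torch.nn.Linear(d_in, d_out)
        self.threshold = threshold  # goodness threshold; the value here is a guess
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Length-normalize the input so a layer cannot simply reuse the previous
        # layer's goodness; only the orientation of the activity vector is passed on.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return F.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = sum of squared activities. Push it above the threshold for
        # positive (real) data and below the threshold for negative data.
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        # softplus(z) = -log sigmoid(-z), i.e. the logistic loss on (goodness - threshold)
        loss = (F.softplus(self.threshold - g_pos) +
                F.softplus(g_neg - self.threshold)).mean()
        self.opt.zero_grad()
        loss.backward()  # the gradient never leaves this layer
        self.opt.step()
        # Detach the outputs so the next layer sees plain data, not a computation graph.
        with torch.no_grad():
            return self.forward(x_pos), self.forward(x_neg)
```

Stacking several such layers and calling `train_step` on each one per batch gives layer-local training with no backward pass between layers; again, this is a reading of the abstract, not the author's code.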

230

Comments


lfotofilter t1_iz32jjy wrote

Geoff Hinton by now must know each of the 60,000 digits of MNIST like an old friend.

103

AsIAm t1_iz4g8q4 wrote

He knows the true probability distribution of the MNIST.

42

katprop t1_iz28tqp wrote

I watched his NeurIPS presentation. While I love explorations of alternatives to backprop, does anyone else feel like he's going a bit off the deep end by saying this paper could explain why people sleep and that we'll use non-binary computers in the future?

56

gambs t1_iz406vc wrote

Hinton has figured out how the brain works every year since the mid-80s, let the man cook

54

gunshoes t1_iz2cili wrote

These OG guys from the PDP days usually do that. I just take it as a bit of garnish for some fun hypotheticals.

52

Direct_Ad_7772 t1_iz4gg0e wrote

I think trying to understand the mind must be one of his main motivations. If it weren't for that, he would not have contributed to machine learning to begin with. So going off the deep end is a side effect of whatever it is that made him a great researcher.

9

ReginaldIII t1_iz4ries wrote

Do you have access to the video of his presentation still?

It bothers me greatly that they paywall their presentations even after the conference has ended.

By all means have exclusivity for the duration of the actual conference, and limit commenting and discussion to conference attendees. But as soon as the conference ends they should flip the switch and make everything public. There's literally no reason not to; it isn't going to stop people from wanting to attend.

8

logicbloke_ t1_iz5x64f wrote

This, 10x. I wish the paper presentations and keynotes were made available online. It doesn't take much effort to record audio plus slides of a presentation.

It wouldn't take anything away from the in-person conference, which is more about networking and discussion.

3

suedepaid t1_izco494 wrote

I was also frustrated about that, but I went on the website and it looks like they're gonna publish them all in a couple weeks. Still a bit frustrated at the delay, but it's a bit understandable.

3

The_Real_RM t1_iz41rkd wrote

What's funny is that a few decades from now the only relevant brains in the world will be the ones this guy brought into existence. It's just a self-fulfilling prophecy.

5

ktpr t1_iz2eya1 wrote

If he mentioned those extrapolations in a psychology or neuroscience conference he would be laughed out of the room. World class expertise in one area does not translate to informed speculation in another.

−5

Nameless1995 t1_iz2j0ja wrote

Incidentally, Hinton has a lot of professional experience in psychology/cognitive science: https://www.cs.toronto.edu/~hinton/fullcv.pdf

> Jan 76 - Sept 78: Research Fellow, Cognitive Studies Program, Sussex University, England
>
> Oct 78 - Sept 80: Visiting Scholar, Program in Cognitive Science, University of California, San Diego
>
> Oct 80 - Sept 82: Scientific Officer, MRC Applied Psychology Unit, Cambridge, England
>
> Jan 82 - June 82: Visiting Assistant Professor, Psychology Department, University of California, San Diego

46

ktpr t1_iz2te98 wrote

Impressive. But the latest multi-month appointment was nearly 40 years ago. Boulder of salt here.

−11

uotsca t1_iz2vg3b wrote

Hinton was educated at King's College, Cambridge, graduating in 1970 with a Bachelor of Arts in experimental psychology.

9

kebabmybob t1_iz3n3gc wrote

Cog sci stuff is all sophistry of this exact flavor. With respect to neuroscience you might be right.

5

master3243 t1_iz2f181 wrote

Interesting read, I'm always interested in research about alternatives to backprop.

One important passage (for the curious who won't read the paper):

> The forward-forward algorithm is somewhat slower than backpropagation and does not generalize quite as well on several of the toy problems investigated in this paper so it is unlikely to replace backpropagation for applications where power is not an issue. The exciting exploration of the abilities of very large models trained on very large datasets will continue to use backpropagation.

> The two areas in which the forward-forward algorithm may be superior to backpropagation are as a model of learning in cortex and as a way of making use of very low-power analog hardware without resorting to reinforcement learning (Jabri and Flower, 1992).

44

whatstheprobability t1_iz58l5i wrote

I feel like this is saying:

  1. this won't generally replace backprop, but it could lead to insight that will lead to algorithms that will replace backprop
  2. this could improve upon backprop for some specific use cases (low power), so even if it doesn't lead to major insights, researchers can still justify spending time on it

Does that sound right?

9

amassivek t1_izoh41k wrote

There is a framework for learning with forward passes, a friendly and thorough tutorial: https://amassivek.github.io/sigprop .

The most interesting insights from the framework:

  • This algorithm provides an explanation for how neurons in the brain without error connections receive learning signals.
  • It works for continuous networks with Hebbian learning. This provides evidence for this algorithm as a model of learning in the brain.
  • It works for spiking neural networks using only the membrane potential (aka voltage in hardware). This supports applying this algorithm for learning on neuromorphic chips.

The Signal Propagation framework paper: https://arxiv.org/abs/2204.01723 . The Forward-Forward algorithm is an implementation of this framework.

I am an author of this work. I was presenting this work at a reading group when one of the members pointed out the connection between signal propagation and Forward-Forward.

4

kebabmybob t1_iz3ntyw wrote

What a chad, no grad students or anybody on this paper.

32

seiqooq t1_iz43dss wrote

Probably explains why the title of the paper isn't “forward passes are all you need”.

58

modeless t1_iz28lbg wrote

This seems more interesting than the capsule stuff he was working on before. Biologically plausible learning rules are cool. Does it work on imagenet though?

20

new_name_who_dis_ t1_iz2b35v wrote

Is this actually biologically plausible? The idea of negative data seems pretty contrived.

I see that Hinton claims it's more biologically plausible, but I don't see any justification for that statement apart from comparing it to other biologically plausible approaches, and from spending time discussing why backprop is definitely not biologically plausible.

I'm not a neuroscientist so don't have much background on this.

25

modeless t1_iz2bm8r wrote

Well no one knows exactly what the brain is up to in there, but we don't see enough backwards connections or activation storage to make backprop plausible, so this is a way of learning without backwards connections, and that alone makes it more biologically plausible.

26

new_name_who_dis_ t1_iz2c6t0 wrote

I’ve heard that Hebbian learning is how brains learn, and this doesn’t seem like Hebbian learning.

However, idk if Hebbian learning is even how neuroscientists think we learn in contemporary research.

5

whymauri t1_iz38qtl wrote

As of 2019, it is what I was taught in a graduate course on associative memory and emergent dynamics in the brain. We read Hertz's Theory of Neural Computation. This was right before people started working on the connection between Hopfield networks and self-attention.

7

fortunum t1_iz2v4li wrote

Check out E-prop for recurrent spiking NNs.

3

Commyende t1_iz2euh0 wrote

Synapses can be excitatory or inhibitory, so that's basically like positive/negative, but I don't really know if that tracks with this algorithm 100%

8

jms4607 t1_iz38c09 wrote

I think the pos/neg here is more like contrastive learning.

8

new_name_who_dis_ t1_iz2fjjk wrote

It's negative data. It's basically contrastive learning, except without backprop. Like you pass a positive example and then a negative example in each forward pass, and update the weights based on how they fired in each pass.

It's a really cool idea, I'm just interested if it's actually biologically plausible.

I might be wrong, but an inhibitory synaptic connection sounds like a neural connection with weight 0, i.e. it doesn't fire with the other neuron.

6

Commyende t1_iz2wzk0 wrote

Inhibitory synapses reduce the likelihood of the downstream neuron firing.

6

Red-Portal t1_iz2kafb wrote

Geoff... everything is great but please stop abusing footnotes...

12

kebabmybob t1_iz3nsfu wrote

I like it this way. 100x more readable than your standard terse academic paper which gets off on appearing overly complex.

11

Red-Portal t1_iz3u8k2 wrote

Oh I'm not saying you should just remove the footnotes. I'm saying it's better to blend them into the main text so I don't have to jump back and forth...

2

ppg_dork t1_iz6btrx wrote

No! I think all academic papers should be structured like Infinite Jest!

1

Ulfgardleo t1_iz2ampb wrote

I will start believing in Hinton's algorithms once they prove that it is consistent with some vector field whose fixed points are meaningful optima of some objective function.

9

_der_erlkonig_ t1_iz3k920 wrote

Out of curiosity, why do you include this as a requirement for an algorithm to be good/interesting/useful/etc?

2

Ulfgardleo t1_iz3q7pd wrote

I did not. I did it for Hinton.

A heuristic can be useful without proof, especially for tasks that are very difficult to solve. However, you have to supply strong theoretical arguments for why it should work. A biological analogy is not enough, especially if it is one that we do not understand either.

Otherwise you end up like the other category of nature-inspired optimization heuristics that pretend to optimize by mimicking the hunting patterns of the Harris hawk. And I wish I were making this up.

8

chaosmosis t1_iz3ymas wrote

Gimmick animal optimization procedures are my guilty pleasure. They're like intellectually cute to me or something. I get happy every time I come across a new one.

7

Ulfgardleo t1_iz437fi wrote

I have a story to tell about the one time I got invited as an external evaluator for an MSc thesis. I agreed, later opened it, and then realized it was a comparison of 10 animal-migration algorithms.

This thesis sat on my desk for WEEKS because I did not know how to grade it. How do you grade pseudoscience?!? Like, it is not the students' fault that they fell prey to this topic, but I also can't condone their not figuring out that it IS pseudoscience.

3

chaosmosis t1_iz8fzyo wrote

I think the main problem is that they aren't theory driven except in an ad hoc sense. They'd be fine if they hadn't become a fad published on by everyone and their mother.

For actually neat discussions of distributed computing in animals, I don't think it's possible to do better than reading about octopuses. Strong recommend for Other Minds to anyone interested in the area.

2

Red-Portal t1_iz67ufu wrote

Yeah there is a whole "zoo" of those things haha.

3

PolywogowyloP t1_iz36kj7 wrote

I'm excited to see an alternative to backprop, but I think the most exciting part of this for me is the ability to still learn through stochastic layers in the model. I think this could have some major applications in probabilistic models for distributions without reparameterization tricks.

9

jms4607 t1_j1s103c wrote

Are there any problems with the reparam trick?

1

Ford_O t1_iz2eau3 wrote

So that's why I keep getting nightmares.

Jokes aside, this sounds quite plausible. However, I am unsure if this can ever be more efficient than backprop. Still, it could have a huge impact on neuroscience if it turns out that's what happens during sleep.

7

nikgeo25 t1_iz48wsj wrote

Paper reads like an idea he had in the shower. Where's the math and connection to existing work? Normalizing each layer after maximizing a square. Someone's gonna show he's doing some fancy PCA in no time I bet.

6

Wild-Ad3931 t1_iz8urzx wrote

Did anyone understand how the weights are updated?

6

SeverelyCanadian t1_izv8vvr wrote

I wondered this too. It's very unclear, and seems like a central detail is missing.

3

SatoshiNotMe t1_iz4cehg wrote

Odd thing about the abstract: it suddenly says “video” near the end. Is it only for video data?

5

tchumbae t1_iz55i98 wrote

The idea behind the paper is very cool, but there has been previous work that replaces the backward pass with a second forward pass. Check out this work by G. Dellaferrera and G. Kreiman!

5

nikgeo25 t1_iz97ctl wrote

Also the work by Ma and Wright that uses a form of generalized nonlinear PCA. Search for ReduNet.

1

Competitive_Dog_6639 t1_iz2oaue wrote

Hinton is awesome and I really enjoyed his NeurIPS talk. Naive question: are single-layer gradients biologically plausible? My understanding is that gradients back through multiple layers are not. The FF algorithm still uses gradients within single layers though, right?

4

dasayan05 t1_iz2ucqp wrote

yes, they are like "local" updates I believe

4

IDe- t1_iz356pg wrote

Backprop has really overstayed its welcome. It's great to see people doing something about it.

2

bohreffect t1_iz3gn0s wrote

You're sleeping on differentiable programming then

2

IDe- t1_iz6z4y3 wrote

The issue is that requiring a model to be differentiable puts far too many limitations on the types of models you can formulate. Much of the research in the last few decades has focused on how to deal with issues caused purely because of the artificial constraint of differentiability. It's purely "local optimization" in the space of potential models, when what we really should be doing is "basin-hopping".

2

bohreffect t1_iz74sa2 wrote

But implying that backprop is getting old neglects all of the real-world applications that haven't been explored yet.

I understand there are problems where differentiability is an intractable assumption, but saying "oh, old thing, how gauche" isn't particularly constructive.

1

IDe- t1_iz77rsw wrote

Ah, I didn't intend to say that it's old or useless, just that I think it receives disproportionate research focus/effort.

2

[deleted] t1_iz6e54k wrote

"differentiable"

1

bohreffect t1_iz6emfb wrote

I mean, can you not compute the Jacobian of a constrained optimization program and stack that into any differentiable composition of functions?

People snoozin'.

1

[deleted] t1_iz6hlao wrote

no you can't because it's not actually a Jacobian

1

bohreffect t1_iz6j2xr wrote

The Jacobian of the solution of a constrained optimization program with respect to its parameters, but I thought that was understood amongst the towering intellects of neural network aficionados, e.g. the original commenter finding backprop to be stale.

Here's the stochastic programming version: Section 3.3. https://proceedings.neurips.cc/paper/2017/file/3fc2c60b5782f641f76bcefc39fb2392-Paper.pdf
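
For anyone who hasn't seen the trick, here is a toy, self-contained sketch of the idea for the simplest case: an equality-constrained QP whose minimizer is the solution of its KKT linear system. Because the linear solve is differentiable, autograd gives you the Jacobian of the argmin with respect to the problem data. The inequality-constrained and stochastic versions in the linked paper need more machinery, so treat this purely as an illustration.

```python
import torch

def qp_argmin(Q, q, A, b):
    """x*(Q, q, A, b) = argmin_x 0.5 x^T Q x + q^T x  s.t.  A x = b.
    The KKT conditions form a linear system, and torch.linalg.solve is
    differentiable, so gradients flow through the argmin."""
    n, m = Q.shape[0], A.shape[0]
    kkt = torch.cat([torch.cat([Q, A.T], dim=1),
                     torch.cat([A, torch.zeros(m, m)], dim=1)], dim=0)
    rhs = torch.cat([-q, b])
    sol = torch.linalg.solve(kkt, rhs)
    return sol[:n]  # discard the dual variables

# Jacobian of the minimizer with respect to the linear cost term q.
Q = torch.eye(3)
A = torch.ones(1, 3)      # constraint: x1 + x2 + x3 = 1
b = torch.tensor([1.0])
q = torch.zeros(3, requires_grad=True)
jac = torch.autograd.functional.jacobian(lambda q_: qp_argmin(Q, q_, A, b), q)
print(jac)                # 3x3 sensitivity of x* to q, ready to be chained into a larger model
```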

1

Ulfgardleo t1_iz9fjio wrote

Funny that stuff always comes back. We used to differentiate SVM solutions wrt kernel parameters like that back in the day.

1

eccstartup t1_iz3mej4 wrote

It would be good if someone could provide the code.

2

ReasonablyBadass t1_iz3nk3n wrote

Can someone ELI5 what negative data means here? How does the network generate it?

2

Paluure t1_iz6zvqd wrote

Basically, for an unsupervised task, it's nonsense data that does not fall under any meaningful class in the training dataset. It can be anything. In the paper, they modify each MNIST image so that it isn't a digit anymore but still looks like one. The network doesn't generate the negative images, you do, and you feed them in as "bad data" right after the "good data" to create a contrast between the two for the model to learn from.

For a supervised task, "bad data" can also be nonsense (just as in the unsupervised task) or can be mislabeled data, such as feeding an image of "5" but embedding "4" as the label inside the image. That's obviously wrong, and is considered bad data.
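
To make that concrete, here is a rough, unofficial sketch of the two kinds of negative data described above. The mask construction is only an approximation of the repeated-blurring procedure the paper describes, and the function names and parameters are made up for illustration.

```python
import numpy as np

def hybrid_negative(img_a, img_b, blur_passes=6, rng=np.random):
    """Unsupervised case: blend two real digits with a blurred random mask so the
    result keeps digit-like strokes locally but is not any real digit globally."""
    mask = (rng.rand(*img_a.shape) > 0.5).astype(np.float32)
    for _ in range(blur_passes):  # repeated local averaging -> large blobby regions
        mask = (mask
                + np.roll(mask, 1, axis=0) + np.roll(mask, -1, axis=0)
                + np.roll(mask, 1, axis=1) + np.roll(mask, -1, axis=1)) / 5.0
    mask = (mask > 0.5).astype(np.float32)
    return mask * img_a + (1.0 - mask) * img_b

def wrong_label_negative(img, true_label, num_classes=10, rng=np.random):
    """Supervised case: overwrite the first pixels with a one-hot encoding of a
    *wrong* label; the same image with the correct label embedded is positive data."""
    wrong = (true_label + rng.randint(1, num_classes)) % num_classes
    out = img.copy().reshape(-1)
    out[:num_classes] = 0.0
    out[wrong] = img.max()  # embed the label at the image's intensity scale
    return out.reshape(img.shape)
```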

4

ObjectManagerManager t1_iz5xous wrote

(Confession: I haven't read the paper yet). I have a couple of questions:

  1. If each layer has its own objective function, couldn't you train the layers one at a time, front to back (see the sketch after these questions)? e.g., train the first layer to convergence, then train the second layer, and so on. I doubt this would be faster than training it end-to-end, but a) as the early layers adapt, they screw up the representations being fed to the later layers anyway, so it probably wouldn't be too much slower than training it end-to-end, and b) it would use significantly less memory (e.g., if you pre-compute the inputs to a layer just before you begin training it, you could imagine training any arbitrarily deep model with a finite amount of memory).
  2. What's the motivation behind "goodness"? Suppose we're talking about classification. Why doesn't each layer just minimize cross entropy? I guess that'd require each layer to have its own flatten + linear projection layers. But then you wouldn't have to concatenate the label and the input data, and so inference complexity would be (mostly) independent of the number of classes. Thinking of a typical CNN, a layer could be organized:
    1. Batch norm
    2. Activation (e.g., ReLU)
    3. Convolution (the output of which is fed into the next layer)
    4. Pooling
    5. Flatten
    6. Linear projection
    7. Cross entropy loss

Can anyone (who has read the paper) answer these questions?
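
On question 1: here is a hedged sketch of what that greedy, one-layer-at-a-time schedule could look like, reusing the hypothetical `FFLayer.train_step` from the sketch near the top of the thread (the loader format and epoch count are assumptions, not anything from the paper).

```python
import torch

def train_greedy(layers, data_loader, epochs_per_layer=5):
    """Train forward-forward style layers strictly one at a time, front to back.
    Earlier layers are frozen, so peak memory is roughly one layer's worth."""
    for i, layer in enumerate(layers):
        for _ in range(epochs_per_layer):
            for x_pos, x_neg in data_loader:    # loader yields (positive, negative) batches
                with torch.no_grad():           # already-trained layers are frozen
                    for frozen in layers[:i]:
                        x_pos = frozen(x_pos)
                        x_neg = frozen(x_neg)
                layer.train_step(x_pos, x_neg)  # local objective only; nothing flows backwards
        # In practice you could precompute and cache the frozen layers' outputs once
        # before training the next layer, which is the finite-memory point in (1b).
```

Whether this converges as well as updating all layers on every batch is exactly the open question being asked here.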

2

sytelus t1_iz8t24n wrote

Was anyone able to reproduce the results of the forward-forward algorithm?

2

kourouklides t1_j05bmni wrote

In my view, this sounds very boring. It would've been revolutionary if he had come up with a new gradient-free deep learning method to completely get rid of gradients. With very few exceptions, during the last 10 years or so we keep seeing small, incremental changes in ML, but no breakthroughs.

2

Abhijithvega t1_j0esd7x wrote

Transformers? PINNs? Skip connections, Adam, hell, even RNNs happened less than 10 years ago.

2

kourouklides t1_j0jyi5c wrote

  1. A simple google search would've revealed to you the following: "The concept of RNN was brought up in 1986. And the famous LSTM architecture was invented in 1997." Hence, not even close.
  2. Didn't I specify that "With very few exceptions?" You merely mentioned those exceptions.
  3. Do you realize that in order to challenge someone's argument you need to specify the two quantities being compared? What specific decade are you comparing it with?

1

WashiBurr t1_iz34l3z wrote

Definitely interesting at the very least.

1

ClassicJewJokes t1_iz75pwk wrote

Capsule Nets 2: Electric Boogaloo. My man off da perc, I like it.

1

wilgamesh t1_izzoduc wrote

Hinton cites Francis Crick's 1983 idea about the function of (dream) sleep in his list of references.

Like the second forward pass, which reduces the goodness of "negative data", Crick proposed that REM sleep is "reverse learning" that removes "undesirable modes."

Quite elegant to see this implemented...

1

Sepic2 t1_j0fzir8 wrote

Maybe a dumb question, but I don't see how this method enables learning in any way:

- The (first) forward pass calculates the loss/goodness, and then you need backpropagation to change the weights of the network according to derivatives of the loss/goodness. How does the network learn if the weights are not changed and you only calculate goodness?

The paper says: "The positive pass operates on real data and adjusts the weights to increase the goodness in every hidden layer. The negative pass operates on "negative data" and adjusts the weights to decrease the goodness in every hidden layer"

- Could it be that in the first "forward" you actually do both a forward and a backward pass, and the name just sounds fancy, with the second "forward" implementing contrastive learning in a clever way?

1

kourouklides t1_j0jzimn wrote

Well, nobody really knows if this method actually works, because Hinton only got as far as writing the paper. He didn't get as far as actually publishing code for it (yet).

1