Submitted by singularpanda t3_1060gfk in MachineLearning

Recently, ChatGPT has become one of the hottest tools in the NLP area. I have tried it and it gives me amazing results. I believe it will benefit most people and significantly advance our lives. However, unfortunately, as an NLP researcher working on text generation, I feel that everything I have done now seems meaningless. I also don't know what I can do, since ChatGPT is already strong enough to solve most of my previous concerns in text generation. Research on ChatGPT itself also seems impossible, as I believe it will not be an open-source project. Research on other NLP tasks also seems challenging, since a prompt to ChatGPT can solve most of them. Any suggestions or comments are welcome.

1

Comments

f_max t1_j3e2s3m wrote

I work at one of the big techs doing research on this. Frankly LLMs will be the leading edge of the field for the next 2 years imo. Join one of the big techs and get access to tens of thousands of dollars of compute per week to train some LLMs. Or in academia, lots of work needs to be done to characterize inference-time capabilities, understand bias, failure modes, smaller scale experiments w/ architecture, etc.

14

singularpanda OP t1_j3e6asd wrote

Yes, that's the benefit of being in a big company. However, a lot of NLP researchers like me do not have that many GPU resources (I believe most companies also cannot afford this).

5

f_max t1_j3eagrm wrote

Right. So if you’d rather not shoot to join a big company, there’s still work that can be done in academia with say a single A100. Might be a bit constrained at pushing the bleeding edge of capability. But there’s much to do to characterize LLMs. They’re black boxes we don’t understand in a bigger way than maybe any previous machine learning model.

Edit: there are also open-source weights for GPT-3-type models with similar performance, e.g. Hugging Face BLOOM or Meta OPT.

3

singularpanda OP t1_j3elwu4 wrote

It seems that recently not many papers are working on them; I haven't looked at the details. Maybe models like OPT are still too large?

1

f_max t1_j3frqfb wrote

They have a sequence of models ranging from 6B params up to 175B at the largest, so you can work on smaller variants if you don't have many GPUs. There are def some papers working on inference efficiency and benchmarking their failure modes if you look around.
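For example, a minimal sketch of prompting one of the smaller OPT checkpoints with Hugging Face transformers; the model name and sampling settings here are just illustrative, pick whatever size fits your hardware:

```python
# Rough sketch: prompting a small OPT variant locally with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "facebook/opt-1.3b"  # other sizes: opt-125m, opt-350m, ..., opt-66b
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("The main open problem in NLP is", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```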

1

Think_Olive_1000 t1_j3tnkld wrote

Dude, that's why you ought to put everything into NLP. Find a way of producing better results for cheaper, on less expensive hardware, and you'll be the talk of the town. I think everyone would love to have an unrestricted local version of ChatGPT on their phones. Do the research!

1

currentscurrents t1_j3eo4uc wrote

There's plenty of work to be done in researching language models that train more efficiently or run on smaller machines.

ChatGPT is great, but it needed 600GB of training data and megawatts of power. It must be possible to do better; the average human brain runs on 12W and has seen maybe a million words tops.

2

singularpanda OP t1_j3eohh7 wrote

Yes, it is quite costly. However, it seems not easy to modify in our research, as it is not open.

1

KBM_KBM t1_j3g7swj wrote

https://github.com/lucidrains/PaLM-rlhf-pytorch

This is similar to the ChatGPT architecture; you can play with it.
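Basic usage, adapted from the repo's README; constructor arguments may differ between versions, so treat this as a sketch:

```python
# Sketch adapted from the lucidrains/PaLM-rlhf-pytorch README.
import torch
from palm_rlhf_pytorch import PaLM

# a small PaLM-style decoder, trained with the usual LM objective
palm = PaLM(num_tokens=20000, dim=512, depth=12)

tokens = torch.randint(0, 20000, (1, 2048))  # dummy token ids
logits = palm(tokens)  # next-token logits, roughly (batch, seq, num_tokens)
```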

2

singularpanda OP t1_j3gdv9p wrote

Thanks! Yes, there are many similar things, but ChatGPT seems to have the most amazing performance.

1

KBM_KBM t1_j3gere2 wrote

True, but practically speaking, training a GPT model is not computationally cheap. I think instead of making such generalized language models, we need to focus more on subject-specific language models.

1

f_max t1_j3frhxs wrote

A megawatt sounds right for training, but kilowatts for inference. Take a look at Tim Dettmers's work (he's at UW) on int8 to see some of this kind of efficiency work. There's definitely significant work happening in the open.
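A sketch of what that looks like in practice: loading a model with 8-bit weights via transformers plus bitsandbytes. This needs a CUDA GPU, and the model choice and argument names (as of early-2023 transformers) are illustrative:

```python
# Sketch of LLM.int8 in practice with transformers + bitsandbytes.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "facebook/opt-6.7b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    device_map="auto",  # place layers on available devices automatically
    load_in_8bit=True,  # quantize linear-layer weights to int8 on load
)

inputs = tokenizer("Int8 inference lets you", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0]))
```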

1

allaboutthatparklife t1_j3h8d9o wrote

> Frankly LLMs will be the leading edge of the field for the next 2 years imo.

(curious outsider) what do you see being the leading edge after that? or will NLP be more or less solved by then?

1

f_max t1_j3hztd5 wrote

Idk. I have a decent idea what's being worked on for the next year, but it gets fuzzy after that. Maybe we'll have another architectural breakthrough. AlexNet 2012, transformers 2017, something else 2023 or 2024 maybe.

1

suflaj t1_j3e1h8f wrote

Not by a long shot.

ChatGPT in practice is a politically-biased conversational Google and Wikipedia summarizer with a bit of polite talk. And it is less broad than both of them.

It is truly fascinating how DEEP it can go, e.g. translating arbitrary code into almost-correct assembly, even for recent architectures like M1, but that's that. It cannot reason fully, it cannot extrapolate, and most importantly, its training data is fairly old and cannot keep up with the speed of NLP research.

But it's nifty to chat with if none of your colleagues have the time.

12

Freed4ever t1_j3e36am wrote

But that's the current state; we know there will be a v.next, to infinity, no? Would there be a state where it can train itself, similar to how DeepMind's models train themselves in games?

1

suflaj t1_j3e3piv wrote

Based on the techniques ChatGPT uses we cannot formally prove that it can generalize without infinite width. Even our training process amounts to mostly teaching the model to compress knowledge. ChatGPT made some strides by partially introducing something similar to reinforcement learning, but reinforcement learning itself is not enough to extrapolate or come up with new concepts.

All the big names in AI claim that stochastic gradient descent techniques and our current direction are fascinating, but ultimately a dead end. Certainly the area has been stale for several years and has degenerated into a dick measuring contest, only instead of dicks you measure parameters, TPUs and metrics on benchmark datasets. Blame transformers which were in a sense us getting a taste of the forbidden fruit, but you know what followed after that.

Of course, out of this you do get some advances useful for the industry, but nothing really of note in the general picture. And it seems to me that lately all these big models that imitate knowledge really well are generating negative sentiment in the population, which may ruin AI.

2

Freed4ever t1_j3e66lo wrote

Thanks. I'm not a researcher, and I'm more curious about the practicality aspect of the technology. So the problem is wide and we cannot formally prove generalization, which is fair. However, if I'm interested in the practicality of the tech, I do not necessarily need a formal proof; I just need it to be good enough. So, just to use code generation as an example, it is conceivable that it generates a piece of code, then actually executes the code and learns about its accuracy, performance, etc., and hence is self-taught. Looking at another example like poetry generation, it is conceivable that it generates a poem, publishes it, and then crowdsources feedback to teach itself as well?

2

suflaj t1_j3e9yim wrote

Well, my first paragraph covers that.

> So, just to use code generation as an example, it is conceivable that it generates a piece of code, then actually executes the code and learns about its accuracy, performance, etc., and hence is self-taught.

It doesn't do that. It learns how to have a conversation. The rest is mostly a result of learning things through learning how to model language. Don't give it too much credit. As said previously, it cannot extrapolate.

1

Think_Olive_1000 t1_j3toxf1 wrote

I think they meant: it is conceivable that in the future it could, i.e. you hook an LLM up with a REPL. https://youtu.be/pdSfgRYy8Ao, take a look at around 15 minutes in. I could easily see how you could fine-tune it using self-appraisal by executing code.
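Conceptually, something like this, with `generate_code` as a hypothetical stand-in for whatever LLM you use; nothing here is a production recipe:

```python
# Conceptual sketch of the REPL idea: execute model-generated code and
# turn the outcome into a scalar reward for fine-tuning.
import subprocess
import tempfile

def run_candidate(code: str, timeout: float = 5.0) -> float:
    """Run generated code in a subprocess; crude pass/fail reward."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            ["python", path], capture_output=True, text=True, timeout=timeout
        )
        return 1.0 if proc.returncode == 0 else -1.0
    except subprocess.TimeoutExpired:
        return -1.0  # infinite loops etc. get penalized

# reward = run_candidate(generate_code("write fizzbuzz"))  # hypothetical LLM call
# ...then feed `reward` into an RLHF-style update loop.
```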

1

suflaj t1_j3tq0u2 wrote

Sure you could. But the cost is so high it probably outweighs the benefits. And that is even if you made training stable (we already know from recurrent networks, GANs and even transformers that they're not particularly stable). Hooking it up to the REPL would make the task essentially reinforcement learning. And if you know something about reinforcement learning, you know that it generally doesn't work because the environment the agent has to traverse is too difficult to learn anything from. What DeepMind managed to achieve with their chess and Go engines is truly remarkable, but these are THEIR achievements despite the hardships RL introduces; they are not achievements of RL. Meanwhile, ChatGPT is mostly an achievement of a nice dataset, a clever task and deep learning. It is not that impressive from an engineering standpoint (other than syncing up all the hardware to preprocess the data and train it).

Unless LLMs are extremely optimized with regard to latency and cost, or unless compute becomes even cheaper (not likely), they have no practical future for the consumer.

So far, it's still a dick measuring contest, as if a larger model and dataset will make much of a difference. I do not see much interest in making them more usable or accessible, I see only effort in beating last year's paper and getting investors to dump more money into a bigger model for next year. I also see ChatGPT as being a cheap marketing scheme all the while it's being used for some pretty nefarious things, some of them being botted Russian or Ukrainian war propaganda.

So you can forget the REPL idea. Who would it serve? Programmers have shown they are not willing to pay for something like GitHub Copilot. Large companies can always hire people to do programming for them. Unless it makes strides in something very expensive, like formal verification, it's not something a large company, the kind that has the resources to research LLMs, would go into.

Maybe the next step is training it on WolframAlpha. But at that point you're just catching up to almost 15-year-old software. Maybe that "almost 15 years old" shows you how overhyped ChatGPT really is for commercial use.

0

Think_Olive_1000 t1_j3tqojo wrote

Nah, there's already work that can reduce a generic LLM's model size by half without losing any performance. And I think LLMs will be great as foundation models for training more niche, smaller models for narrower tasks; people already use OpenAI's API to generate data to fine-tune their own niche models. I think we'll look back at current LLMs and realise just how inefficient they were, though they were a necessary evil to prove that something like this CAN be done.

1

suflaj t1_j3twskh wrote

Half is not enough. We're thinking on the order of 100x or even more. Do not forget that even ordinary BERT is not really commercially viable as-is.

I mean sure, you can use them to get a nicer distribution for your dataset. But at the end of the day the API is too slow to train any "real" model, and you can already probably collect and generate data for smaller models yourself. So as a replacement for lazy people, sure: I think ChatGPT by itself probably has the potential to solve most repetitive questions people have on the internet. But it won't be used like that at scale, so ultimately it is not useful.

If it wasn't clear enough by now, I'm not skeptical because of what LLMs are, but because of how they simply do not scale to real-world requirements. Ultimately, people do not have datacenters at home, and OpenAI and other vendors do not have the hardware for any actual volume of need other than a niche, hobbyist one. And the investment to develop something like ChatGPT is too big to justify for that use.

All of this was ignoring the obvious legal risks from using ChatGPT generations commercially!

−1

Think_Olive_1000 t1_j3u3k7w wrote

BERT is being used by Google for search under the hood. It's how they've got that instant fancy extractive-answers box. I don't disagree that LLMs are large. So was the Saturn V.

1

suflaj t1_j3u4smq wrote

Google's BERT use is not a commercial, consumer product; it is an enterprise one (Google uses it and runs it on their own hardware). They presumably use the large version, or something even larger than the pretrained weights available on the internet, and to achieve the latencies they have, they are using datacenters and non-trivial distribution schemes, not just consumer hardware.

Meanwhile, your average CPU will need anywhere from 1-4 seconds to do one inference pass in ONNX Runtime. It's of course much less on a GPU, but to be truly cross-platform you're targeting JS in most cases, which means CPU, and not a stack as mature as what Python/C++/CUDA have.
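To put numbers on that claim yourself, here is a rough sketch of timing one CPU pass of BERT-base in ONNX Runtime; the export settings and sequence length are illustrative:

```python
# Sketch: export BERT-base to ONNX and time one CPU inference pass.
import time
import torch
import onnxruntime as ort
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"
model = AutoModel.from_pretrained(name)
tok = AutoTokenizer.from_pretrained(name)

enc = tok("a typical search query", return_tensors="pt",
          padding="max_length", max_length=128)
torch.onnx.export(
    model,
    (enc["input_ids"], enc["attention_mask"]),
    "bert.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    opset_version=14,
)

sess = ort.InferenceSession("bert.onnx", providers=["CPUExecutionProvider"])
feeds = {"input_ids": enc["input_ids"].numpy(),
         "attention_mask": enc["attention_mask"].numpy()}
start = time.time()
sess.run(None, feeds)
print(f"one pass: {time.time() - start:.3f}s")
```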

What I'm saying is:

  • people have said no to paid services, they want free products
  • consumer hardware has not scaled nearly as fast as DL
  • even ancient models are still too slow to run on consumer hardware after years of improvement
  • distilling, quantizing and optimizing them seems to get them to run just fast enough to not be a nuisance, but is often too tedious to work out for a free product (see the quantization sketch below)
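As one example of that last point, a rough sketch of PyTorch dynamic quantization; the model choice is illustrative:

```python
# Sketch of the "quantize and optimize" route: dynamic quantization stores
# Linear weights as int8 for faster CPU inference.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# `quantized` is a drop-in replacement for `model` at inference time.
```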
−1

currentscurrents t1_j3eiw5w wrote

I think you're missing some of the depth of what it's capable of. You can "program" it to do new tasks just by explaining in plain english, or by providing examples. For example many people are using it to generate prompts for image generators:

>I want you to act as a prompt creator for an AI image generator.

>Prompts are descriptions of artistic images that include visual adjectives and art styles or artist names. The image generator can understand complex ideas, so use detailed language and describe emotions or feelings in detail. Use terse words separated by commas, and make short descriptions that are efficient in word use.

>With each image, include detailed descriptions of the art style, using the names of artists known for that style. I may provide a general style with the prompt, which you will expand into detail. For example if I ask for an "abstract style", you would include "style of Picasso, abstract brushstrokes, oil painting, cubism"

>Please create 5 prompts for a mob of grandmas with guns. Use a fantasy digital painting style.

This is a complex and poorly-defined task, and it certainly was not trained on this exact task, since its training data stops in 2021. But the resulting output is exactly what I wanted:

>An army of grandmas charging towards the viewer, their guns glowing with otherworldly energy. Style of Syd Mead, futuristic landscapes, sleek design, fantasy digital painting.

Once I copy-pasted it into an image generator it created a very nice image.

I think we're going to see a lot more use of language models for controlling computers to do complex tasks.
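If you want to script this kind of prompting, ChatGPT itself has no public API yet, but a rough sketch with the openai Python client (0.x) and text-davinci-003, its closest scriptable relative; the parameters are illustrative:

```python
# Sketch: driving the same kind of prompt through the completions API.
import openai

openai.api_key = "sk-..."  # your key

prompt = (
    "I want you to act as a prompt creator for an AI image generator.\n"
    "Please create 5 prompts for a mob of grandmas with guns. "
    "Use a fantasy digital painting style."
)
resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=300,
    temperature=0.7,
)
print(resp["choices"][0]["text"])
```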

0

suflaj t1_j3ek7d0 wrote

> This is a complex and poorly-defined task

Not at all. First of all, ChatGPT does not understand complexity; it would do you well not to think of it as if there were some hierarchy. Secondly, there is no requirement that the task be well defined. From what I could gather, ChatGPT requires you to convince it that it is not giving out an opinion, and then it can hallucinate pretty much anything.

Specifically the task you gave it is likely implicitly present in the dataset, in the sense that the dataset allowed the model to learn the connections between the words you gave it. I hate to break your bubble, but the task is also achievable even with GPT2, a much less expressive model, since it can be represented as a prompt.

It will be easier to see the shortcomings there, but to put it simply, ChatGPT has them too. E.g., by default, in the general case, it does not differentiate between uppercase and lowercase letters even when it might be relevant for the task. Such things are too subtle for it. Once you realize the biases it has in this regard, you begin to see through the cracks. Or generally, give it a counting task: it says it can count, but it is not always successful at it.

What is fascinating is the amount of memory ChatGPT has; compared to other models it is very big. But it is limited, and it is not preserved outside of the session.

I would say that the people hyping it up probably just do not understand it that well. LLMs are fascinating, yes, but not ChatGPT specifically; what's fascinating is how malleable the knowledge is. I would advise you not to try to understand it, because then the magic stays alive. I had a lot of fun the first week I was using it, but I never use it nowadays.

I would also advise you to approach it more critically. I would advise you to first look into how blatantly racist and sexist it is. With that, you can see the reflection of its creators in it. And most of all, I would advise you to focus on its shortcomings. They are easy to find once you start talking to it more like you'd talk with a friend. They will help you use it more effectively.

2

currentscurrents t1_j3emas4 wrote

>I hate to break your bubble, but the task is also achievable even with GPT2

Is it? I would love to know how. I can run GPT2 locally, and that would be a fantastic level of zero-shot learning to play around with.

I have no doubt you can fine-tune GPT2 or T5 to achieve this, but in my experience they aren't nearly as promptable as GPT3/ChatGPT.

>Specifically the task you gave it is likely implicitly present in the dataset, in the sense that the dataset allowed the model to learn the connections between the words you gave it

I'm not sure what you're getting at here. It has learned the connections and meanings between words of course, that's what a language model does.

But it still followed my instructions, and it can follow a wide variety of other detailed instructions you give it. These tasks are too specific to have been in the training data; it is successfully generalizing zero-shot to new NLP tasks.

1

suflaj t1_j3emtbh wrote

> I would love to know how. I can run GPT2 locally, and that would be a fantastic level of zero-shot learning to play around with.

It depends on how much you can compress the prompts. GPT2 is severely limited by memory. This means that you would need to train it on already condensed prompts. But in reality, it has the same (albeit not as refined) capabilities as ChatGPT.
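For the record, trying GPT2 locally is trivial with transformers; a minimal sketch, where the prompt is illustrative and the 1024-token context window is presumably the memory limit in question:

```python
# Sketch: prompting GPT-2 locally. Prompts must fit (condensed) into its
# 1024-token context window.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "Image prompt, abstract style: style of Picasso, cubism,"
out = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.9)
print(out[0]["generated_text"])
```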

> But it still followed my instructions

Well, it turns out that following instructions can be reduced to a symbol-manipulation task. Again, you're giving it too much credit. I do agree that it is broad, but it is not as broad as Google or Wikipedia, which would represent humanity, I guess.

> it is successfully generalizing zero-shot to new NLP tasks.

As are lesser models. Transformer-based models are fairly successful at it; we have hypothesized this since GPT2 and confirmed it with GPT3. But one thing: technically it generalized few-shot to a new NLP task. On zero-shot problems it generally hallucinates or states that it doesn't know. Ask it, for example, what a "gebutzeripanim" is. I made that up just now.

As for the task you gave it, you cannot claim it is zero-shot, as you cannot prove its components were not in the dataset. Unless you want to say that you're pretty sure the prompt you gave it was not in the dataset, but hey, that can apply to all generative models; that's what generalization is. But there are tasks it fails on because it just cannot do some things. Ask it to integrate or differentiate certain functions and you'll quickly see what I mean.

It can tell you all you want to know about integration, it can tell you all the rules perfectly, but it simply cannot apply them as well.

2

blose1 t1_j3j5vpp wrote

>Ask it, for an example, what a "gebutzeripanim" is. I made that up just now.

Q: what gebutzeripanim is?

A: I'm sorry, but I am unable to find any information about "gebutzeripanim." Could you please provide more context or clarify what you are asking about?

1

suflaj t1_j3ld7gy wrote

Exactly. It's not really meant to be a zero-shot model, but rather a conversational agent. If it were a zero-shot model, here is what it could say:

  • gebutzeripanim is a new phrase for me
  • based on its start it sounds german
  • based on its middle it sounds vaguely korean
  • based on its end it sounds vaguely turkish
  • it might be made up slang

But in conversation you wouldn't just start decomposing stuff you've never heard about. Here is what it says if you try to fish out these answers:

  • I'm sorry, but I'm not familiar with the term "gebutzeripanim." Could you provide more context or clarify what you are referring to?
  • Without more context or information about the word, it is difficult for me to determine whether it has a connection to the German language. "Gebutzeripanim" is not a word that I am familiar with, and a search of online resources did not yield any results. It is possible that this is a made-up or invented word, or that it is used in a specific context that is not familiar to me.
  • Without more context or information about the word, it is difficult for me to determine whether it has a connection to the Korean language. "Gebutzeripanim" is not a word that I am familiar with, and a search of online resources did not yield any results. It is possible that this is a made-up or invented word, or that it is used in a specific context that is not familiar to me. It is also possible that the word includes elements from multiple languages or that it is a hybrid of several languages. Without more information, it is difficult to say for sure.
  • Without more context or information about the word, it is difficult for me to determine whether it has a connection to the Turkish language. "Gebutzeripanim" is not a word that I am familiar with, and a search of online resources did not yield any results. It is possible that this is a made-up or invented word, or that it is used in a specific context that is not familiar to me. It is also possible that the word includes elements from multiple languages or that it is a hybrid of several languages. Without more information, it is difficult to say for sure.
  • It is not possible for me to determine whether "gebutzeripanim" is a made-up or invented word, as I do not have sufficient information about the word. It is possible that "gebutzeripanim" is a slang term that is not widely known or used, or that it is a word that has been created recently and has not yet gained widespread recognition. Without more context or information, it is difficult to say for sure what the word means or how it is used.

It mostly bases its answers on the fact that it doesn't know. And given that it uses past answers as part of the prompt, you can see that the second time I ask it about a language, it assumes the word can be made out of more languages, but not because it understands it, only because I mentioned it.

If you ask it in a new session whether it is made out of words or phrases from several languages, it answers with

> I'm sorry, but I am unable to find any information about a word spelled "gebutzeripanim." It is possible that this is a made-up word or a word from a language that I am not familiar with. Can you provide any context or additional information about the word that might help me to better understand it?

Since it basically needs to see things explicitly in training, it's not really a zero-shot model, but rather a few-shot one. There are instances where it seems like it can connect the dots, but you can't really say that happens in the general case...

2

gamerx88 t1_j3fx20a wrote

I am very impressed by the underlying GPT-3.5 LLM and the capabilities that alignment via RLHF has unlocked in LLMs, but I don't believe any serious NLP researchers or practitioners think that NLP is solved.

There are still tonnes of challenges and limitations that need to be solved before this tech is ready, e.g. the very convincing hallucinations, failure on simple math problems, and second-order reasoning tasks, amongst others. And many other areas remain unresolved in NLP as well.

Having been in the NLP field for close to 10 years and having experienced several other developments and paradigm shifts in the past (RNN/LSTM, attention, transformer models, LLMs with emergent capabilities), I am more optimistic than fearful of this development's impact on our jobs.

Each of these past developments made certain expertise obsolete, but also expanded the problem space that NLP can tackle. The net effect, however, has been consistently positive, with the amount of money and demand for NLP expertise increasing.

12

visarga t1_j3ecrx8 wrote

Yes, I agree traditional NLP tasks are mostly solved, with a possibly large number of new skills unlocked at once. And they work so well without fine-tuning, just from the prompt.

So take your task to ChatGPT (or text-davinci-003) and label your dataset or generate more data. Then you fine-tune a slender transformer from Hugging Face, and you get an efficient and cheap model.
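A rough sketch of that workflow, assuming the 0.x openai client; the prompt, label set, and model names are illustrative:

```python
# Sketch: label data with a big model's API, then fine-tune a small
# Hugging Face classifier on the result.
import openai
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def label_with_llm(text: str) -> str:
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Label the sentiment as positive or negative.\nText: {text}\nLabel:",
        max_tokens=1,
        temperature=0,
    )
    return resp["choices"][0]["text"].strip()

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
student = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)
# labels = [label_with_llm(t) for t in unlabeled_texts]  # your corpus
# ...build a Dataset from (texts, labels) and train `student` as usual,
# e.g. with transformers.Trainer.
```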

9

[deleted] t1_j3ea4u5 wrote

LLMs cost a lot of bucks, and sometimes you just need something simple and fast; and sometimes they don't know about very specific domains or tasks.

6

leeliop t1_j3dw1fc wrote

I have heard it's essentially Googling with extra steps. Are you certain it's actually creating novel solutions to novel problems, or is it just scraping together Googlable elements? Maybe I have a subconscious bias, as I develop for a living.

4

singularpanda OP t1_j3e13z7 wrote

It's not just googling. It can summarize the information it has and write a good answer to questions. It even has some inference capability.

4

Kingstudly t1_j4xygdq wrote

Not really. It's taking an input and providing the statistically most likely string of words associated with it. There's far more to NLP than that. Think about how a human can see a word they've never seen before and infer its meaning based on context clues. I'm not sure any publicly available system can do that.

New words are entering every language constantly. There's no way to train such a massive model to keep up as fast as a human or purpose built system can.

1

Freed4ever t1_j3e04qq wrote

I'm not in the field, but I would be curious. Since you are in the field, why don't you try it out yourself and tell us? FWIW, the majority of everyday problems can be solved by putting Googlable elements together properly.

2

singularpanda OP t1_j3e17rj wrote

I have tried it and found it is a huge advance in this area. It's not just googling; it has some inference capability.

3

Freed4ever t1_j3e2ilt wrote

Again, not in the field, so don't laugh at me, but would there be opportunity/value in applying a meta layer on top of ChatGPT? We know that it needs to be prompted in certain ways, so would there be an opportunity to tune the prompting and also to evaluate the responses? Maybe you can apply your skills to this meta layer?

1

singularpanda OP t1_j3e6zvy wrote

I guess OpenAI will not open the model for us to apply a meta layer. It will remain a black box, which is why we cannot do anything on top of it.

2

Freed4ever t1_j3e790p wrote

Could you just use the API and treat it like a black box?

1

SartoriusX t1_j3e3lk5 wrote

Is this true? What type of inference would it be capable of?

1

singularpanda OP t1_j3e5ct5 wrote

I have tried many cases. For example, it gave a correct proof of one of the technical lemmas in my own paper, which amazed me. It is a simple lemma, but it is very specific to my question. I also tried to search with Google but could not find the answer.

2

El_Diel t1_j3e2sr5 wrote

When I used it, it kept saying it had no connection to the internet and was trained on a large amount of text and data. I tested in two languages.

At the time I used it, the answers to most questions were structured in the same way: paraphrasing the question, weighing a few pros/cons or facts, then a summary. Almost every answer to a question that required a decision was inconclusive, and ChatGPT usually said it was difficult to answer the question.

As an interface for human-machine communication it was great. But the conversations were simple and lacked depth. It can write short stories and expand them, and it creates poems and jokes. I'd say you are lucky if it comes up with something above middle-school level.

The next version will be far better I believe.

2

NotARedditUser3 t1_j3e0j6w wrote

This.

It basically is just a good Google searcher that can articulate results in a helpful way.

It may be useful for saving time researching things... but it has had some laughable failure results as well.

1

Western-Elevator-456 t1_j3e6vx8 wrote

Are you sure Google solves a novel problem? From what I've heard, it just pulls together a bunch of web pages that you could get with URLs.

1

singularpanda OP t1_j3em4ap wrote

But Google really changed the way we work. This is why I guess there may be another change.

1

PassingTumbleweed t1_j3gq662 wrote

Not every physicist can afford a particle accelerator, but that doesn't stop them from researching particle physics.

ChatGPT makes basic reasoning errors that even a child wouldn't make, which makes me think this is a weakness of the current approach. Maybe "more data" is not the solution to this problem. This is one direction I would consider.

3

Accomplished-Low3305 t1_j3gzgux wrote

I think you are overestimating ChatGPT a lot. It hallucinates information, it fails adding numbers, it fails at solving problems and complex reasoning, and a lot more. ChatGPT is great, but it has not solved NLP.

3

Borrowedshorts t1_j42uyey wrote

No, and just think about it. If LLMs become monetizable at the scale that other tech areas such as search or social media have, there's a ton of opportunity there, and you have a leg up on everyone else.

3

I_will_delete_myself t1_j3f9e8j wrote

I learned this today. The moment you leave the Google search engine is the moment it turns into total useless garbage.

2

TeamRocketsSecretary t1_j3fmzxw wrote

Fusion of LLM and vision models is something I'm noticing more work on. Also, embodied feedback with a human in the loop, especially in robotics applications. The vision field definitely seems to be co-opting language models, and there is research on making inference with them faster (recurrent transformers) and on bringing recurrence back into the transformer, which is interesting since transformers superseded recurrent models once the power of attention came to light.

Also, a lot of work remains on using them for mission-critical applications (healthcare) as well as on "robustifying" them (transformers using raw byte sequences show much more robustness to noise).

So I guess LLMs, built for native NLP tasks, are being used more and more for non-NLP tasks, especially now in reinforcement learning.

2

KingsmanVince t1_j3erhcv wrote

No. We still need NLP researchers to understand the output of ChatGPT. ChatGPT exists to help not to replace.

1

Longjumping_Essay498 t1_j3fjfbo wrote

Domain-specific LLMs need not be huge like ChatGPT. Those have world knowledge; in most settings, we don't need that.

1

Featureless_Bug t1_j3fwvj9 wrote

You are a lousy researcher then. The trend of using incredibly large models started a long time ago, so individual researchers haven't been able to produce SOTA NLP models for years already. And ChatGPT isn't even a great model compared to something like Chinchilla; you should know that, actually.

0

singularpanda OP t1_j3g7v2j wrote

It's a sad story, as I have put a lot of time into generation over these years. Any suggestions for what our research could focus on?

2