gamerx88 t1_jdn1dd3 wrote

> In the long run I expect this will flip; computers will get very fast and data will be the limiting factor.

I agree, but I think data is already a limiting factor today, with the largest publicly known models at 175B parameters. The data used to train these models supposedly already covers a majority of the open internet.

gamerx88 t1_jctqruk wrote

For ETL, write unit tests to handle some input edge cases, e.g. null values, mis-formatted fields, and out-of-range values, as well as some simple working cases.
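
For illustration, a minimal sketch of such tests with pytest; `clean_record` and its expected behaviour are hypothetical stand-ins for a real transform:

```python
import math

def clean_record(record: dict) -> dict:
    """Hypothetical transform: fill nulls, clip out-of-range ages."""
    age = record.get("age")
    if age is None or (isinstance(age, float) and math.isnan(age)):
        age = 0
    age = min(max(int(age), 0), 120)              # clip to a valid range
    name = str(record.get("name") or "").strip()  # null-safe string cleanup
    return {"name": name, "age": age}

def test_null_value_gets_default():
    assert clean_record({"name": "Ann", "age": None})["age"] == 0

def test_out_of_range_is_clipped():
    assert clean_record({"name": "Bob", "age": 999})["age"] == 120

def test_simple_working_case():
    assert clean_record({"name": " Cy ", "age": 30}) == {"name": "Cy", "age": 30}
```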

For model training, the tests focus on having "valid" hyperparams and configurations. I write test cases that try to overfit on a small training set, i.e. confirm the model can learn. There are also some robustness tests that I sometimes run post-training, but those are very specific to certain NLP tasks and applications.
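
A minimal sketch of the overfit sanity check in PyTorch; the toy model and random data are hypothetical placeholders for your own:

```python
import torch
import torch.nn as nn

def test_model_can_overfit_small_batch():
    torch.manual_seed(0)
    x = torch.randn(8, 16)                  # tiny fixed training set
    y = torch.randint(0, 2, (8,))
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(300):                    # plenty of steps to memorize 8 points
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    # If the hyperparams/config are sane, the model should memorize the batch.
    assert loss.item() < 0.05
```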

For model serving, test successful parsing of the request and the subsequent feature transformation (if any), very similar to ETL.
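
A minimal sketch of such a parsing test, assuming a pydantic request schema; the `PredictRequest` model and its fields are hypothetical:

```python
import pytest
from pydantic import BaseModel, ValidationError

class PredictRequest(BaseModel):
    text: str       # required field
    top_k: int = 1  # optional, with a default

def test_valid_request_parses():
    req = PredictRequest(text="hello world", top_k=3)
    assert req.top_k == 3

def test_missing_required_field_is_rejected():
    with pytest.raises(ValidationError):
        PredictRequest(top_k=3)  # "text" is missing
```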

gamerx88 t1_j70rs5v wrote

Without referring to the paper again, my intuition is that a pairwise loss over final outputs does not gel well with how the model auto-regressively generates the text.

Generation with GPT is basically a token-by-token decoding process that takes the previous time steps into account. Think about the difference between a supervised learning problem and reinforcement learning: the former ignores the step-by-step nature of the generation scheme and is a poorer fit for a decoding problem.
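
For illustration, a minimal sketch of that token-by-token (greedy) decoding loop with Hugging Face transformers; "gpt2" is just an assumed example checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The model generates", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):                      # one new token per step
        logits = model(input_ids).logits     # (batch, seq_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)  # feed it back in

print(tokenizer.decode(input_ids[0]))
```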

gamerx88 t1_j6cqerx wrote

It's not about large data or the number of parameters. OpenAI has not actually revealed details of ChatGPT's architecture and training. What is special is the fine-tuning procedure: alignment through RLHF of the underlying LLM (nicknamed GPT-3.5), which makes it extremely good at giving "useful" responses to prompts/instructions.

Prior to this innovation, zero-shot and in-context few-shot learning with LLMs was hardly working. Users had to trial-and-error their way to some obtuse prompt to get the LLM to generate a sensible response, if it worked at all. This is because LLM pre-training is purely about language structure and does not account for intent (what the human wishes to obtain via the prompt). Supervised fine-tuning on instruction-output pairs helped, but not by much. With RLHF, however, the process is so effective that a mere 6B parameter model (fine-tuned with RLHF) is able to surpass a 175B parameter model. Check out the InstructGPT paper for details.

gamerx88 t1_j3m0drc wrote

Yes, we used DistilBERT (and even logistic regression) heavily at my previous startup, where data volume was web scale.

Depending on the exact problem, large transformer models can be overkill. For some straightforward text classification tasks, even logistic regression with some feature engineering can come within 3 percentage points of a transformer, at a negligible fraction of the cost.
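
For illustration, a minimal sketch of such a baseline in scikit-learn; TF-IDF n-grams stand in for the feature engineering, and the tiny dataset is a hypothetical placeholder:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great product", "terrible service", "loved it", "awful experience"]
labels = [1, 0, 1, 0]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),  # word + bigram features
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["really great service"]))
```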

gamerx88 t1_j3fx20a wrote

I am very impressed by the underlying GPT-3.5 LLM and the capabilities that alignment via RLHF has unlocked in LLMs, but I don't believe any serious NLP researchers or practitioners think that NLP is solved.

There are still tonnes of challenges and limitations that need to be solved before this tech is ready, e.g. the very convincing hallucinations, failures on simple math problems, and second-order reasoning tasks, among others. Many other areas in NLP remain unresolved as well.

Having been in the NLP field for close to 10 years and having experienced several other developments and paradigm shifts (RNN/LSTM, attention, transformer models, LLMs with emergent capabilities), I am more optimistic than fearful about this development's impact on our jobs.

Each of these past developments made certain expertise obsolete, but also expanded the problem space that NLP can tackle. The net effect, however, has been consistently positive, with both the money in the field and the demand for NLP expertise increasing.

gamerx88 t1_j239l9d wrote

Reply to ML Impacts [D] by evomed

Technological improvements and economic restructuring taking away jobs is nothing new, and it is not specific to AI. Such creative destruction ultimately leads to a more productive economy and better standards of living for all.

I do recognize, however, that this net benefit is not equally distributed throughout society. Those who bear the brunt of the cost (unemployment) may not get even a shred of the payoff from improved productivity. Secondly, I think the potential scale of disruption from AI may be far greater than on other occasions in history, and there may be suffering on an unprecedented scale in the short term.

Hence, I do think that policymakers should seriously consider the ideas of universal basic income and enhanced social safety nets when the time comes.

gamerx88 t1_iwkogi7 wrote

I work in NLP. Ever since HuggingFace became mainstream, we have almost never had to do this.

We used to have to implement the cutting-edge stuff ourselves because papers did not come with code, or came with code that required a huge amount of work to run. Now implementations often appear on HF within a few weeks of publication.
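
For illustration, a minimal sketch of grabbing a published checkpoint off the Hub instead of reimplementing it; the model name is just one example:

```python
from transformers import pipeline

# Downloads the checkpoint from the Hub and wires up tokenizer + model.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("HuggingFace made this a one-liner."))
```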

The only occasion in the last 2 or 3 years where I wrote a DNN from scratch was when I had to give a short lecture, for pedagogical reasons.
