Submitted by Devinco001 t3_yzh6v1 in MachineLearning
skelly0311 t1_iwzz7td wrote
For starters, why are you generating word embeddings? BERT first tokenizes strings and maps each token to a pretrained embedding vector, then runs those embeddings through its transformer layers for some type of inference. So I'll assume you're feeding those embeddings into an actual transformer for inference (a minimal sketch follows the list below). If this is true:
- Depends on your time requirements. Larger models will generally be more accurate, but they also take considerably longer to perform inference than smaller models
- See above
- In my experience, and according to the literature, ELECTRA and RoBERTa are BERT variants that have outperformed BERT on a range of benchmarks
- Again, for inference, this depends on many factors, such as the maximum number of tokens per inference example
- https://mccormickml.com/2019/07/22/BERT-fine-tuning/
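
In code, that two-step pipeline looks roughly like the sketch below, using the HuggingFace `transformers` library (the checkpoint name, example sentences, and mean-pooling choice are illustrative assumptions, not something specified in this thread):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint for illustration; any BERT-style encoder works similarly.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

sentences = ["where is my order", "cancel my subscription"]

# Step 1: tokenize strings into integer token ids (plus attention masks).
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

# Step 2: run the token ids through the transformer.
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the final hidden states over real (non-padding) tokens
# to get one fixed-size vector per sentence.
mask = inputs["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embeddings.shape)  # (2, 768) for bert-base
```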
Devinco001 OP t1_ix08pyr wrote
I am going to use the embeddings to cluster the text in an unsupervised manner and surface the most popular intents, actually (a rough sketch of that step is at the end of this comment).
1,2. Would be fine with a bit of a trade-off in accuracy. Time is the main concern, since I don't want it to take more than a day. Maybe I have to use something other than BERT

3. Googled them, and RoBERTa seems to be the best choice. Much better than BERT base or BERT large

4. I actually asked this because Google Colab has some restrictions on free usage

5. Thanks, really good article
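
A rough sketch of the clustering step described above, assuming k-means over the sentence vectors (scikit-learn, random stand-in embeddings, and a guessed cluster count; none of these choices come from the thread):

```python
import numpy as np
from sklearn.cluster import KMeans

# `embeddings` would be the (num_sentences, hidden_dim) matrix produced by a
# BERT-style encoder, as in the earlier sketch; random data stands in here.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768))

# Cluster the sentence vectors; the number of intents (k=20) is an assumption
# and would be tuned in practice (e.g. via silhouette score).
kmeans = KMeans(n_clusters=20, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

# Sentences sharing a label are candidates for the same intent; inspecting a
# few examples per cluster suggests a name for each.
print(np.bincount(labels))  # cluster sizes
```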
pagein t1_ix2wkue wrote
If you want to cluster sentences, take a look at LaBSE. This model was specifically designed for embedding extraction. https://ai.googleblog.com/2020/08/language-agnostic-bert-sentence.html?m=1
Devinco001 OP t1_ix710w3 wrote
This looks really interesting, thanks. Is it open source?
pagein t1_ix71gqd wrote
There are several pretrained implementations:
- PyTorch implementation using the HuggingFace Transformers library, under the Apache 2.0 license
- Original TensorFlow model on TensorFlow Hub, under the same Apache 2.0 license
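
A minimal usage sketch of the HuggingFace port mentioned above (the `sentence-transformers/LaBSE` checkpoint name and the pooling detail are assumptions worth verifying against the model card):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint name for the HuggingFace port of LaBSE.
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/LaBSE")
model = AutoModel.from_pretrained("sentence-transformers/LaBSE")
model.eval()

# LaBSE is language-agnostic: the same intent in different languages
# should land close together in embedding space.
sentences = ["How do I reset my password?", "¿Cómo restablezco mi contraseña?"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# LaBSE's sentence embedding is the pooled [CLS]-based output, L2-normalized.
emb = torch.nn.functional.normalize(out.pooler_output, p=2, dim=1)
print(emb.shape)         # (2, 768)
print(emb[0] @ emb[1])   # cosine similarity; should be high across languages
```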
Devinco001 OP t1_ix75z7w wrote
Will surely check them out, thanks
GitGudOrGetGot t1_ix3s761 wrote
>BERT first tokenizes strings and maps each token to a pretrained embedding vector, then runs those embeddings through its transformer layers for some type of inference
Could you describe this a bit further in terms of inputs and outputs?
I think I get that you go from a string to a list of individual tokens, but when you say you then feed those into a pretrained word vector, does that mean you output a list of floating-point values representing the document as a single point in high-dimensional space?
I thought that's specifically what the transformer does, so I'm not sure what other role it performs here...