
Devinco001 OP t1_ix08pyr wrote

I'm actually going to use the embeddings to cluster the text in an unsupervised manner and find the most popular intents.
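
To make that concrete, here is a rough sketch of the clustering step, assuming the embeddings are already available as plain lists of floats. It uses a simple greedy cosine-similarity grouping as a stand-in for a real algorithm such as k-means. The function names and the 0.8 threshold are made up for illustration and would need tuning on real data.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_cluster(embeddings, threshold=0.8):
    """Put each vector in the first cluster whose seed is similar enough,
    otherwise start a new cluster with that vector as its seed.
    (Hypothetical helper; the threshold is an assumed value.)"""
    seeds = []
    labels = []
    for vec in embeddings:
        for idx, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(idx)
                break
        else:
            # No existing seed was close enough: open a new cluster.
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels


def popular_intents(texts, labels, top_k=3):
    """Return (label, size, example_text) for the top_k largest clusters."""
    counts = Counter(labels)
    result = []
    for label, size in counts.most_common(top_k):
        example = next(t for t, l in zip(texts, labels) if l == label)
        result.append((label, size, example))
    return result
```

The largest clusters are then the candidate "popular intents", and one example text per cluster gives a quick way to eyeball what each cluster is about. The result depends on the order of the input, so this is only good for a first look.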

1, 2. I would be fine with a bit of a trade-off in accuracy. Time is the main concern, since I don't want it to take more than a day. Maybe I'll have to use something other than BERT.

  1. I googled them, and RoBERTa seems to be the best choice. Much better than base BERT or the larger BERT.

  2. I actually asked this because Google Colab has some restrictions on free usage.

  3. Thanks, really good article

1

pagein t1_ix2wkue wrote

If you want to cluster sentences, take a look at LaBSE. This model was designed specifically for embedding extraction. https://ai.googleblog.com/2020/08/language-agnostic-bert-sentence.html?m=1

2

Devinco001 OP t1_ix710w3 wrote

This looks really interesting, thanks. Is it open source?

1

pagein t1_ix71gqd wrote

There are several pretrained implementations:

  • PyTorch implementation using the HuggingFace Transformers library, under the Apache 2.0 license
  • Original TensorFlow model on TensorFlow Hub, under the same Apache 2.0 license
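
For the PyTorch route, a minimal sketch might look like the following. It assumes the `sentence-transformers` package is installed and that the checkpoint is published on the HuggingFace Hub as `sentence-transformers/LaBSE`. The helper names `load_labse` and `embed_sentences` are made up for this example.

```python
def load_labse():
    """Load the LaBSE checkpoint (assumed Hub id: sentence-transformers/LaBSE)."""
    # Imported lazily so the rest of the code works without the dependency.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("sentence-transformers/LaBSE")


def embed_sentences(sentences, model=None, batch_size=64):
    """Return one embedding per sentence.

    `model` can be any object with a SentenceTransformer-style `encode`
    method; if omitted, LaBSE is downloaded and loaded.
    """
    if model is None:
        model = load_labse()
    return model.encode(
        sentences,
        batch_size=batch_size,
        normalize_embeddings=True,  # unit-length vectors, handy for cosine similarity
        show_progress_bar=False,
    )
```

Calling `embed_sentences(["how do I reset my password", "cancel my order"])` should give two vectors that can be passed straight to a clustering step. On a free Colab GPU, raising `batch_size` is usually the easiest way to speed things up, as long as memory allows.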
2