Submitted by Devinco001 t3_105la5f in MachineLearning
Hi everyone, I have to cluster a large chunk of textual conversational business data to find relevant topics in it.
Since there is lot of abstract info in every text like phone, url, numbers, email, name, etc., I have done some basic NER using regex and spacy NER to tag such info and make the texts more generic and canonicalized.
But there are some things like product names, raw materials, brand/model, company, etc. which couldn't be tagged. Also, the accuracy of regex and spacy NER isn't high enough.
Can anyone suggest a good python NER library, which is accurate and fast enough, preferably has pre-trained models and can tag diverse fields.
Thanks.
stu1011 t1_j3bp0om wrote
If spaCy’s NER isn’t picking up what you need, you’ll probably need to look into creating your own annotations and fine tuning a model or training a custom model. It isn’t too hard using BIO/BILOU tags. Things like “raw materials” and particularly niche models and brands are unlikely to be picked up by off the shelf solutions.