Channel: Active questions tagged string-manipulation - Mathematica Stack Exchange

How to train custom named entity recognition?

Version 12 has basic NER support for some entities, but how does one recognize a custom entity?

For example, I want to parse product descriptions and extract three custom entities: price, size, and color. TextCases and TextContents work nicely for price and color:

TextCases["A red tshirt costs $5 and is medium", {"Color", "CurrencyAmount"}]

But there is no way to parse entities that are not listed in guide/TextContentTypes, like sizes:

TextCases["large women's leather jacket", {"Size"}]

And there is no way to add additional synonyms or spellings to an entity:

TextCases[# <> " feather", "Color"] & /@ {"violet", "lilac", "lavender", "royal", "purpled", "plum", "grape", "maroon", "magenta", "purplish"}

I want to extend the built-in NER model with custom training data, i.e. substring labels:

newTrainingData = <|"Who is Nishanth?" -> {8, 16, "Name"}, "Who is Kamal Khumar?" -> {8, 20, "Name"}, "I like London and Berlin." -> {{8, 14, "City"}, {19, 25, "City"}}|>
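
To make the intended label format concrete, here is a minimal sketch that only reads the spans back out of the text; it does not train anything. It assumes the convention implied by the numbers above: 1-indexed start positions and an end position one past the last labelled character. The helper name spanToRule is hypothetical, and every value is written as a list of spans so that all entries have the same shape.

```wolfram
(* Span-labelled examples: {start, end, label}, 1-indexed,
   with end assumed to point one past the last labelled character. *)
trainingData = <|
  "Who is Nishanth?" -> {{8, 16, "Name"}},
  "Who is Kamal Khumar?" -> {{8, 20, "Name"}},
  "I like London and Berlin." -> {{8, 14, "City"}, {19, 25, "City"}}
|>;

(* spanToRule is a hypothetical helper that cuts the labelled substring
   out of the text. StringTake positions are inclusive, so the
   exclusive end is reduced by 1. *)
spanToRule[text_String, {start_Integer, end_Integer, label_String}] :=
  label -> StringTake[text, {start, end - 1}];

(* For each sentence, list the label -> substring pairs its spans describe. *)
labelledSubstrings = Association @ KeyValueMap[
  Function[{text, spans}, text -> (spanToRule[text, #] & /@ spans)],
  trainingData
]
```

Under that convention the spans pick out "Nishanth", "Kamal Khumar", "London" and "Berlin", which is the kind of label -> substring pairing I would like a custom recognizer to learn from.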

If this is not supported in 12.2, is there perhaps a third-party paclet, resource function, or neural net repository entry that could be extended? Or is it maybe coming in 12.3?
