
Additional Ideas for Making Stronger NER Formatting Models


In the recent PyData Israel workshop, we learned how to use RNNs and CNNs with word embeddings to build a Named Entity Recognition (NER) model that automatically bolds, italicizes, and underlines your text.


If you didn’t catch the workshop, check out the amazing slides and repo by Uri Goren below.


https://github.com/urigoren/nlp_ner_workshop

Now that we’ve trained our baseline model, here are some areas you can explore on your own time to improve it.


1. Replace Pretrained Embeddings with Contextual Embeddings such as BERT or ELMo

https://github.com/huggingface/pytorch-pretrained-BERT


For an example cloud implementation, check out the tutorial I put together on Azure:

https://github.com/microsoft/AzureML-BERT/blob/master/PyTorch/Pretrained-BERT-NER.ipynb
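
One practical detail when moving from word embeddings to BERT is that BERT tokenizes words into sub-word pieces, so word-level formatting labels have to be stretched to match. The sketch below shows one common way to do that alignment. It is only an illustration: `toy_wordpiece` is a hypothetical stand-in for a real WordPiece tokenizer, and the label names (`BOLD`, `O`, `X`) are assumptions rather than the workshop's actual label scheme.

```python
from typing import Callable, List, Tuple


def toy_wordpiece(word: str, max_piece: int = 4) -> List[str]:
    """Hypothetical stand-in for a real WordPiece tokenizer.

    Splits a word into fixed-size chunks and prefixes every
    continuation piece with '##', mimicking BERT's convention.
    """
    pieces = [word[i:i + max_piece] for i in range(0, len(word), max_piece)]
    return [p if i == 0 else "##" + p for i, p in enumerate(pieces)]


def align_labels(
    words: List[str],
    labels: List[str],
    tokenize: Callable[[str], List[str]] = toy_wordpiece,
    pad_label: str = "X",
) -> Tuple[List[str], List[str]]:
    """Expand word-level labels to sub-word tokens.

    The first piece of each word keeps the word's label; the remaining
    pieces get `pad_label`, which is typically ignored in the loss.
    """
    tokens: List[str] = []
    aligned: List[str] = []
    for word, label in zip(words, labels):
        pieces = tokenize(word)
        tokens.extend(pieces)
        aligned.extend([label] + [pad_label] * (len(pieces) - 1))
    return tokens, aligned


tokens, aligned = align_labels(["Important", "note"], ["BOLD", "O"])
```

With a real tokenizer you would pass its tokenize function in place of `toy_wordpiece`; the alignment logic itself stays the same.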


2. Combine Embeddings with Character-Level CNNs or RNNs for Handling Unseen Words

https://eli.thegreenplace.net/2018/understanding-how-to-implement-a-character-based-rnn-language-model/
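
A character-level CNN or RNN needs each word turned into a sequence of character ids first. Here is a minimal sketch of that preprocessing step, assuming a fixed maximum word length and reserved ids for padding and unknown characters. The function names and the `max_len` default are illustrative choices, not part of the workshop code.

```python
from typing import Dict, Iterable, List

PAD, UNK = 0, 1  # reserved ids: padding and unseen characters


def build_char_vocab(words: Iterable[str]) -> Dict[str, int]:
    """Map every character seen in training to an integer id (starting at 2)."""
    chars = sorted({ch for word in words for ch in word})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode_word(word: str, vocab: Dict[str, int], max_len: int = 8) -> List[int]:
    """Turn a word into a fixed-length list of character ids.

    Long words are truncated, short words are right-padded with PAD,
    and characters missing from the vocabulary map to UNK.
    """
    ids = [vocab.get(ch, UNK) for ch in word[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


vocab = build_char_vocab(["bold", "text"])
# "bolted" never appeared as a word, but all of its characters did,
# so it still gets a meaningful encoding.
encoded = encode_word("bolted", vocab)
```

The resulting id sequences can be fed to a character embedding layer followed by a CNN or RNN, and the output concatenated with the word embedding.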

3. Combine Linguistic Features with your Embeddings

https://spacy.io/usage/linguistic-features

https://allennlp.org/models
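
As a rough illustration of what combining features with embeddings can look like, the sketch below appends a few hand-crafted word-shape features to an embedding vector. These surface features are a simplified stand-in for the richer part-of-speech and dependency features a library like spaCy provides; the specific features and the length cap of 20 are assumptions made for the example.

```python
from typing import List, Sequence


def word_shape_features(token: str) -> List[float]:
    """A few cheap surface features that often help NER-style taggers."""
    return [
        float(token.isupper()),                        # ALL CAPS
        float(token[:1].isupper()),                    # starts with a capital
        float(any(ch.isdigit() for ch in token)),      # contains a digit
        float(any(not ch.isalnum() for ch in token)),  # contains punctuation
        min(len(token), 20) / 20.0,                    # capped, normalized length
    ]


def augment(embedding: Sequence[float], token: str) -> List[float]:
    """Concatenate a token's embedding with its word-shape features."""
    return list(embedding) + word_shape_features(token)


# A 2-dimensional toy embedding becomes a 7-dimensional input vector.
vector = augment([0.1, 0.2], "NASA")
```

Remember to increase the input dimension of the first network layer by the number of added features.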

4. Add Self-Attention Mechanisms to your RNN

https://towardsdatascience.com/deep-learning-for-named-entity-recognition-2-implementing-the-state-of-the-art-bidirectional-lstm-4603491087f1
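
To make the idea concrete, here is a minimal pure-Python sketch of scaled dot-product self-attention applied to a sequence of RNN hidden states. It is deliberately simplified: the queries, keys, and values are the hidden states themselves, whereas a real attention layer would usually apply learned projections first.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states: List[List[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention with Q = K = V = states.

    Each output vector is a weighted average of all hidden states,
    weighted by how similar they are to the current position.
    """
    dim = len(states[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in states:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in states]
        weights = softmax(scores)
        context = [
            sum(w * value[d] for w, value in zip(weights, states))
            for d in range(dim)
        ]
        outputs.append(context)
    return outputs


# Two toy hidden states; each output mixes both, favoring its own position.
contexts = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

The context vectors are typically concatenated with, or added to, the original hidden states before the tagging layer.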

5. Add Beam Search To Your Decoder
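
Instead of greedily picking the best tag for each token, beam search keeps the top few partial tag sequences at every step, which lets later evidence overturn an early choice. Below is a minimal sketch, assuming the model outputs a log-score per tag for each token and that you have pairwise transition scores between tags. The tag names and score values are made up for illustration.

```python
from typing import Dict, List, Tuple

Beam = Tuple[List[str], float]


def beam_search(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    beam_width: int = 2,
) -> List[Beam]:
    """Return the `beam_width` best tag sequences, best first.

    emissions[i][tag] is the model's log-score for `tag` at token i;
    transitions[(prev, tag)] is added when `tag` follows `prev`
    (missing pairs count as 0).
    """
    beams: List[Beam] = [([], 0.0)]
    for step_scores in emissions:
        candidates: List[Beam] = []
        for tags, score in beams:
            for tag, emit in step_scores.items():
                trans = transitions.get((tags[-1], tag), 0.0) if tags else 0.0
                candidates.append((tags + [tag], score + emit + trans))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


emissions = [{"BOLD": -0.5, "O": -1.0}, {"BOLD": -1.0, "O": -0.9}]
transitions = {("BOLD", "O"): -2.0}  # hypothetical penalty for ending bold early
best_tags, best_score = beam_search(emissions, transitions)[0]
```

In this toy example, a greedy per-token decoder would pick `O` for the second token, while the beam, taking the transition penalty into account, prefers to keep the span bold.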


6. Try annotating more data

https://towardsdatascience.com/text-annotation-on-a-budget-with-azure-web-apps-doccano-b29f479c0c54

These should provide some great next steps for your journey into NLP.


Additionally, if the field interests you, check out the following posts:

https://medium.com/microsoftazure/7-amazing-open-source-nlp-tools-to-try-with-notebooks-in-2019-c9eec058d9f1

https://towardsdatascience.com/beyond-word-embeddings-part-1-an-overview-of-neural-nlp-milestones-82b97a47977f

https://towardsdatascience.com/beyond-word-embeddings-part-2-word-vectors-nlp-modeling-from-bow-to-bert-4ebd4711d0ec

https://towardsdatascience.com/beyond-word-embeddings-part-3-four-common-flaws-in-state-of-the-art-neural-nlp-models-c1d35d3496d0

https://towardsdatascience.com/beyond-word-embeddings-part-4-introducing-semantic-structure-to-neural-nlp-96cf8a2723fb

If you have any questions, comments, or topics you would like me to discuss, feel free to follow me on Twitter.


About the Author

Aaron (Ari) Bornstein is an avid AI enthusiast with a passion for history, engaging with new technologies, and computational medicine. As an Open Source Engineer on Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli Hi-Tech Community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.
