Combining LSTMs with CNNs seems to be a popular theme these days. It makes sense: CNNs can be very fast to train for language recognition tasks, and are certainly easier to understand and debug for the most part. This paper shows how to combine a word-level LSTM with a character-level CNN to effectively perform named-entity recognition.
The model closely follows the title. A bi-directional LSTM is formed using input word embeddings (from GloVe, Google, or locally trained) and a few simple features that are useful for entity recognition: capitalization and a small set of lexicon features. The lexicon features are simple but worth noting: effectively, they map each word to a BIOES (begin, inside, outside, end, single) tag indicating its position in a lexicon match. The entities used for the lexicon are primarily drawn from DBpedia. The lexicon features are divided into four categories: Location, Organization, Person, and Misc. This categorization seems rather arbitrary, but follows that specified by the dataset.
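To make the BIOES encoding concrete, here is a minimal sketch of how lexicon matches might be turned into such tags. The greedy longest-match strategy and the toy lexicon are my own assumptions for illustration, not details taken from the paper:

```python
def bioes_tags(tokens, lexicon):
    """Tag tokens with BIOES labels from longest lexicon matches.

    `lexicon` is a set of entity phrases, each a tuple of tokens
    (a toy stand-in for the paper's DBpedia-derived entity lists).
    Assumption: matches are found greedily, longest-first.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Look for the longest lexicon phrase starting at position i.
        match_len = 0
        for j in range(len(tokens), i, -1):
            if tuple(tokens[i:j]) in lexicon:
                match_len = j - i
                break
        if match_len == 0:
            i += 1                      # no entity here: Outside
        elif match_len == 1:
            tags[i] = "S"               # Single-token entity
            i += 1
        else:
            tags[i] = "B"               # Begin
            for k in range(i + 1, i + match_len - 1):
                tags[k] = "I"           # Inside
            tags[i + match_len - 1] = "E"  # End
            i += match_len
    return tags
```

For example, with a lexicon containing `("New", "York", "City")` and `("Paris",)`, the sentence "I visited New York City and Paris" comes out as `O O B I E O S`. In the paper, one such tag sequence is produced per lexicon category and fed to the LSTM alongside the word embeddings.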
The detailed description of all of the hyper-parameters of the model was very nice to see. While adaptive learning rate methods (e.g. AdaGrad) have made models less sensitive to the learning rate, it's still useful to know the whole search space used for a model. A note about the sensitivity of the model to the other hyper-parameters would have been great as well.
It's hard to quantify how much "deep learning" contributes to the results of this paper. While the overall performance ends up being slightly better than previous work, it's not by much. As the authors suggest, the amount of manual feature engineering required does appear to be less for this model than in previous work: using the character-level CNN effectively allows them to opt out of engineering lexical features, and using LSTMs reduces the need to hunt for a way to identify a reasonable context window.
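The character-level CNN's trick is that convolution plus max-over-time pooling turns a variable-length character sequence into a fixed-size feature vector, which is what lets it stand in for hand-engineered lexical features (prefixes, suffixes, shape). A minimal NumPy sketch of that idea follows; the embedding size, filter width, and filter count are arbitrary choices for illustration, not the paper's settings:

```python
import numpy as np

# Hypothetical sizes; the paper's actual dimensions differ.
CHAR_DIM, FILTER_WIDTH, N_FILTERS = 8, 3, 16
rng = np.random.default_rng(0)
char_emb = {c: rng.normal(size=CHAR_DIM) for c in "abcdefghijklmnopqrstuvwxyz"}
filters = rng.normal(size=(N_FILTERS, FILTER_WIDTH * CHAR_DIM))

def char_cnn_features(word):
    """Convolve filters over character embeddings, then max-pool over time.

    Returns a fixed-size vector regardless of word length, so it can be
    concatenated with the word embedding as input to the word-level LSTM.
    """
    embs = np.stack([char_emb[c] for c in word.lower()])  # (len, CHAR_DIM)
    # Zero-pad short words so at least one filter window fits.
    if len(embs) < FILTER_WIDTH:
        pad = np.zeros((FILTER_WIDTH - len(embs), CHAR_DIM))
        embs = np.vstack([embs, pad])
    # Each window is the flattened embeddings of FILTER_WIDTH characters.
    windows = np.stack([embs[i:i + FILTER_WIDTH].ravel()
                        for i in range(len(embs) - FILTER_WIDTH + 1)])
    conv = windows @ filters.T      # (n_windows, N_FILTERS)
    return conv.max(axis=0)         # max-over-time pooling
```

Because the pooling collapses the time axis, `char_cnn_features("a")` and `char_cnn_features("antidisestablishmentarianism")` both yield a vector of length `N_FILTERS`, and the filters can learn morphological cues (capitalization patterns, suffixes) that previously required manual features.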