Building a medical search engine — Step 3: Using NLP tools to improve search results
The time caregivers can spend with their patients reportedly declines whereas the amount of available information increases, making their work harder each day (notwithstanding the pandemic). At Posos, we want to enable doctors to support their therapeutic decisions with the latest medical data in order to provide the best possible care for each patient. This is what motivates us to build a search engine enhanced by medical-specific NLP tools built on authoritative data.
Summary
This article is the third in a series showing how medical-specific NLP tools can be built and used to power a medical search engine.
- The first article explains why it is necessary to train domain-specific word embedding and how we do it at Posos.
- The second article describes the Named-Entity Recognition (NER) and Named-Entity Linking (NEL) models that we built.
This third article details how those tools can be used to improve the accuracy of a medical search engine like the one we built at Posos. If you are not familiar with word embeddings, NER or NEL, I suggest that you check out those previous articles before reading this one.
BM25: a popular algorithm for document retrieval
At Posos, our textual search engine relies mainly on the Okapi BM25 algorithm [1] through Elasticsearch. It is not the latest state-of-the-art model in NLP for document retrieval but it is a very popular trade-off between accuracy and scalability.
How it works
Okapi BM25 is a scoring model that assigns a relevance score to a document with respect to a query. It is a probabilistic model that compares, for each word of the query, its frequency in the given document with its frequency in the whole corpus. Intuitively, if a word from the query is frequent in the document but also frequent in many other documents, it will matter less than a word that is frequent in the document but rare in the corpus.
For example, in the query “What are the side-effects of Paracetamol?”, the word “the” should be present in nearly all documents (provided they are all in English), whereas “side-effects” and “paracetamol” should be specific to a small subset of documents. Hence those latter words weigh more in the BM25 score of each document.
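The scoring intuition above can be sketched in a few lines. This is a toy implementation for illustration only; production engines like Lucene use an inverted index and a slightly different, smoothed IDF formulation.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each document (a list of tokens) against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: in how many documents each query term appears.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            # Rare terms get a high IDF, ubiquitous terms (like "the") a low one.
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # Term frequency is saturated by k1 and normalized by document length via b.
            score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = [
    "the side effects of paracetamol include nausea".split(),
    "the capital of france is paris".split(),
]
print(bm25_scores(["the", "side", "effects", "paracetamol"], docs))
```

With this toy corpus, the first document scores much higher: it matches the rare, high-IDF terms, while the second matches only “the”, whose IDF is close to zero.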
This algorithm is quite popular because it is relatively fast, accurate and implemented in popular search engines like Elasticsearch, itself built upon the widespread search library called Lucene.
Drawbacks of BM25
Despite its popularity, BM25 suffers from two main drawbacks:
- It’s a bag-of-words model, meaning it does not take word order into account. Contextualizing a word is not possible without some pre-processing: the “a” in “vitamin a” cannot be treated differently from the ones in “hepatitis a” or “a drug”.
- It cannot leverage semantic similarity between distinct words such as synonyms. Since it is an exact term-matching algorithm, if the query contains “side-effects” but a document uses the term “adverse effect”, that document won’t be scored accurately.
Alternatives to BM25
Knowing these drawbacks of BM25, one might be tempted to replace it with the latest models based on deep learning. These models learn dense representations of words, meaning we can leverage some form of semantic similarity between words. Some of them, like BERT [2], can even learn contextualized representations of words.
However, deep learning models have drawbacks of their own. They have millions of parameters (110 million for BERT base), which makes it difficult to build a scalable search engine on top of them. They also require huge amounts of training data to be adapted to the domain-specific problem at hand.
These models produce dense representations, for which support in popular search libraries like Lucene is still scarce. This is not only because the technique is newer, but mainly because searching a dense space is harder: it cannot benefit from the inverted-index technique that makes BM25 so fast.
Hence, BM25 cannot be discarded right away and replaced by a heavy neural network like BERT. At Posos, we chose instead to use lighter models for simpler auxiliary tasks that enhance the results of BM25. Our NER and NEL models and our word embedding are three of them.
How to improve results with NER, NEL and word embedding
The NER model is a neural network that lets us improve BM25 results on our corpus of medical documents by reformulating the query and filtering documents based on the detected entities, whereas the word embedding lets us find words that are semantically close to those of the query and thus expand it.
Filtering entity-annotated documents
The drug-oriented documents in our corpus were tagged with the right drug and, when needed, with dose (e.g. 500 mg/L) and dose-form entities (e.g. capsule or liquid solution). For example, the Summaries of Product Characteristics (SmPC, a kind of detailed leaflet intended for healthcare professionals) are split by title and tagged with the corresponding entities.
Hence, when a user query contains a drug name, even with some misspelling or through a synonym (such as “acetaminophen” instead of “paracetamol”), the NER (Named Entity Recognition) model extracts the drug from the query and the NEL (Named Entity Linking) model links it to a normalized name. This normalized drug name enables BM25 to run exclusively on the subset of the corpus tagged with the right drug.
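In Elasticsearch terms, this kind of restriction can be expressed with a `bool` query whose `filter` clause narrows the candidate set without contributing to the BM25 score. A minimal sketch, assuming documents indexed with an illustrative `drug` keyword field and a `content` text field (the field names are assumptions, not our actual mapping):

```python
def build_filtered_query(normalized_drug, remaining_text):
    """Build an Elasticsearch query body: BM25 on the free text,
    hard-filtered to documents tagged with the normalized drug entity."""
    return {
        "query": {
            "bool": {
                # BM25 scoring runs on the non-entity part of the query...
                "must": {"match": {"content": remaining_text}},
                # ...and only over documents tagged with the right drug.
                # A "filter" clause restricts candidates without affecting the score.
                "filter": {"term": {"drug": normalized_drug}},
            }
        }
    }

query = build_filtered_query("paracetamol", "side effects")
```

Using `filter` rather than a second `must` clause also matches the point below: the drug term is excluded from scoring entirely.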
Another upside of this method is that it lets us remove the drug name from the BM25 scoring, since it will be the subject of almost every sentence in the already filtered documents.
Reformulating the query (for misspelled or unconventional words)
But even when documents are not tagged with their relevant entities, NER and NEL can be useful. By mapping some words of the query to entity databases, we can expand the query with synonyms of the detected entity. This also corrects some misspellings by normalizing the drug name.
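This reformulation step can be sketched as below, with a toy linker and synonym lexicon standing in for our actual NEL model and entity databases (both are hypothetical placeholders for illustration):

```python
# Illustrative synonym lexicon keyed by normalized entity name.
SYNONYMS = {"paracetamol": ["acetaminophen", "paracetamol"]}

def reformulate(query_tokens, link_entity):
    """Expand a tokenized query using an entity linker.

    link_entity: callable that maps a (possibly misspelled) surface form
    to a normalized entity name, or returns None for non-entity tokens.
    """
    expanded = []
    for token in query_tokens:
        entity = link_entity(token)
        if entity is not None:
            # Replace the raw token with the normalized name and its synonyms.
            expanded.extend(SYNONYMS.get(entity, [entity]))
        else:
            expanded.append(token)
    return expanded

# Toy linker that also "corrects" one misspelling.
linker = {"paracetamol": "paracetamol", "paracetomol": "paracetamol"}.get
print(reformulate(["paracetomol", "side", "effects"], linker))
```

Here the misspelled “paracetomol” is normalized and both synonym forms are injected, so BM25 can match documents using either term.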
Query augmentation with word embedding
Projection on a 2D plane of the 30 closest words to “patch”, with grey points for the following 970. The closest words to “patch” are other spellings of the same word (the plurals “patchs” and “patches”), synonyms like “dispositif transdermique”, then words with close meanings like “timbre” or “application”, as well as several drug names that are trans-dermal patches. The (signed) norms of the vectors are scaled so that the distance to “patch” in the 2D plane equals one minus the cosine similarity, which is why no point lies farther than a distance of 1.
A word embedding places words in a high-dimensional space in which semantically similar utterances should lie close to each other. The figure above shows a projection of this multi-dimensional space onto a 2D plane for a subset of our vocabulary.
By looking for the words in our vocabulary that are closest to those of the query, we can expand the query and give it some context. For example, the closest words to “patch” (whose meaning is transparent) in our French embedding are:
- “patchs”, with 0.84 similarity (the best possible is 1.0), which is the plural of “patch”.
- “dispositif transdermique” (trans-dermal device), with 0.84 similarity, which is a synonym of “patch” and may be the more frequently used of the two in medical regulatory documents. Note that we used a “phraser” in the pre-processing steps of the embedding training, which allows frequently co-occurring two-word expressions to appear in our embedding.
- “transderm” (0.80), which is just the stem of “transdermique”.
- “emlapatch” (0.80), which is the name of a drug that comes as a patch.
Of course, we might not want these added words to carry the same importance as those originally present in the query. Hence we weight them according to their similarity.
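Putting nearest-neighbour lookup and similarity weighting together, a toy sketch of the expansion step (the random vectors below are stand-ins for a trained embedding; in practice they would come from the domain-specific model described in the first article of this series):

```python
import math
import random

# Random stand-in vectors for a tiny illustrative vocabulary.
random.seed(0)
VOCAB = ["patch", "patchs", "dispositif transdermique", "hepatite"]
VECTORS = {w: [random.gauss(0, 1) for _ in range(50)] for w in VOCAB}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def expand(term, k=2, threshold=0.0):
    """Return up to k nearest neighbours of `term`, each paired with its
    cosine similarity, to be used as a boost weight for the expanded term."""
    sims = [(w, cosine(VECTORS[term], v)) for w, v in VECTORS.items() if w != term]
    sims.sort(key=lambda p: -p[1])
    # Each expanded term is weighted by its similarity (< 1.0), so it can
    # never outweigh the original query term.
    return [(w, s) for w, s in sims[:k] if s > threshold]
```

With a real embedding, `expand("patch")` would surface pairs like `("patchs", 0.84)` and `("dispositif transdermique", 0.84)`, which can then be added to the query as similarity-weighted boosts.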
Conclusion
To improve user experience with search engines, one might be tempted to use the latest end-to-end state-of-the-art models like BERT. However, from a practical point of view, more classical approaches like BM25 still have their say in domain-specific problems. At Posos, we found that it was more convenient to improve BM25 with auxiliary task-specific neural models.
These tools find multiple use-cases in our product to create a seamless search experience for healthcare professionals:
- Recognizing synonyms and abbreviations to allow physicians to use the first words that come to mind instead of an arbitrary taxonomy,
- Correcting misspelled drugs to prevent errors and save time,
- Suggesting related drugs and queries to contextualize each question,
- Extracting drugs, side-effects and patient profiles from complex queries to retrieve the most pertinent medical documents.
Nevertheless, harnessing the power of BERT-like models is becoming easier and easier. Google has announced that it is using BERT to improve the results of its search engine, although its use is restricted to roughly one in ten queries and to English only. Posos is keeping up to date with state-of-the-art document retrieval research, eager to improve healthcare professionals’ daily practice.
References
[1] Stephen E. Robertson and Karen Sparck Jones. Relevance Weighting of Search Terms, pages 143–160. Taylor Graham Publishing, GBR, 1988. ISBN 0947568212.
[2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRR, abs/1810.04805, 2018. URL http://arxiv.org/abs/1810.04805.
Thanks to Patricio Cerda, François, and Emmanuel Bilbault.