
Three Machine Learning Lessons Learned for Building Intelligent Applications


After building many machine learning models for clients, I would like to share several lessons we learned while developing TenPoint7 Intelligent Applications.

Lesson 1 – We need a platform to build machine learning applications.

Natural language understanding (NLU) is a core technology for detecting events in text data. For example, we used it to build a supplier risk application for a customer. State-of-the-art NLU models are mostly sequence (deep learning) models, such as Long Short-Term Memory (LSTM) and GRU networks. In plain English, “sequence” means the model captures not only the words but also the order of their appearance, i.e. the context. John Rupert Firth, an English linguist and a leading figure in British linguistics during the 1950s, famously said, “You shall know a word by the company it keeps.”

For example, the following two sentences describe the same lawsuit, but the order of words is quite different.

“ABC Corp. is being sued by a former employee.”

“A former employee has brought charges against ABC Corp.”

A good NLU model can detect a lawsuit risk event from either sentence. 
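
To make “sequence model” concrete, here is a minimal sketch of a binary LSTM event classifier in Keras. The layer sizes, vocabulary size, and training data names are illustrative assumptions, not our production model:

```python
# A minimal sketch of an LSTM event classifier in Keras.
# Sizes, labels, and data names are illustrative assumptions.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000  # assumed vocabulary size
model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),      # map word ids to dense vectors
    layers.LSTM(64),                        # reads words in order, capturing context
    layers.Dense(1, activation="sigmoid"),  # probability of "lawsuit risk event"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# x_train: integer-encoded, padded sentences; y_train: 0/1 event labels
# model.fit(x_train, y_train, epochs=5, validation_split=0.1)
```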

In a deep learning online course, thought leader Andrew Ng said that in the age of big data, the quality of an NLU model is largely a function of the quantity and quality of the training data. In reality, most of our customers don’t have any historical news data for their current suppliers. Lack of data, especially labeled data that can be used for training, is the single biggest challenge we see in the real world when leadership wants machine learning in every business application.

What we learned here is that you need a smart data ingestion platform in order to build machine learning applications. Say you want to collect historical news of lawsuits and government investigations so you can train models to recognize those events in the future; what can you do from scratch? You would probably google this type of news and wish you had a genie that would open every top result returned by the search engine, scrape all the text, and capture the dates automatically. The good news is that we have that genie: at TenPoint7, our search-enabled smart data ingestion platform is very mature.
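
To illustrate the idea (this is a toy, not our actual platform): given the result URLs from a news search, fetch each page, scrape the text, and capture the publication date.

```python
# A toy sketch of the "genie": given result URLs from a news search,
# fetch each page, scrape the visible text, and capture a publication date.
import requests
from bs4 import BeautifulSoup

def scrape_article(url: str) -> dict:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    # Many news sites mark the publication date with a <time> tag.
    time_tag = soup.find("time")
    return {
        "url": url,
        "date": time_tag.get("datetime") if time_tag else None,
        "text": " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p")),
    }

# result_urls would come from a news search, e.g. for "lawsuit" or "investigation"
# corpus = [scrape_article(u) for u in result_urls]
```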

Lesson 2 – Think differently about rule-based vs. machine learning.

The key lesson here is that you need both rule-based and machine learning algorithms to build a business app. You may wonder what role rule-based algorithms play in the age of AI.

Not all algorithms are machine learning or AI, but classic algorithms are still very powerful.

Most of the software we use today is still rule-based, and I don’t believe we will abandon rule-based algorithms any time soon. The early versions of the Google search engine were 100% rule-based, yet few people complained that they were not “intelligent”. Admittedly, today’s Google search engine is powered by deep learning algorithms. What we learned from Google is that there is a natural path from rule-based to machine learning algorithms.

In building our apps, we take the same glide-path approach. After we use our search-enabled ingestion platform to acquire large numbers of documents, we define keyword-based rules to detect risk events as a first cut. Say one rule for detecting bankruptcy is as follows:

“ABC Corp. … filed for bankruptcy.”

Most of the time this rule detects bankruptcy events correctly, but occasionally it is not sufficient. For example:

“ABC Corp has seen stable growth, while its competitor filed for bankruptcy.”

Here ABC Corp did not file for bankruptcy, so the match is a false positive. Human language is very flexible and full of synonyms, so rules will never be perfect no matter how many we write. To obtain higher accuracy, a supervised NLU model is used to understand the context of “bankruptcy”, such as the word “competitor”, so it can correctly predict that this is not news about an ABC Corp bankruptcy.
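
As a toy illustration (the pattern itself is an assumption, not one of our production rules), a keyword rule like the one above matches both the true bankruptcy story and the competitor sentence:

```python
# A toy keyword rule for bankruptcy detection; the "…" in the rule above
# becomes ".*" here. It fires on both sentences, showing the false positive.
import re

BANKRUPTCY_RULE = re.compile(r"ABC Corp\.?.*filed for bankruptcy", re.IGNORECASE)

true_positive = "ABC Corp. was in trouble and filed for bankruptcy."
false_positive = "ABC Corp has seen stable growth, while its competitor filed for bankruptcy."

print(bool(BANKRUPTCY_RULE.search(true_positive)))   # True  -- correct
print(bool(BANKRUPTCY_RULE.search(false_positive)))  # True  -- false positive!
```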

Our analysts inspect the detected events and relabel the incorrect ones as false positives. By combining the rule-based first cut with human labeling, we acquire a high-quality training dataset for the NLU models. In summary, we employ a two-step process: first, we use rules to take an initial guess at what type of event the news is about; second, we use event-specific NLU models to eliminate false positives.
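
A rough sketch of this two-step process might look like the following, where `rules` maps event types to compiled patterns and `nlu_models` maps event types to trained classifiers such as the LSTM sketched earlier; all names are illustrative:

```python
# A sketch of the two-step process: rules make the first cut, an
# event-specific NLU model filters false positives. Names are illustrative.
def detect_events(articles, rules, nlu_models):
    events = []
    for article in articles:                      # e.g. dicts from scrape_article()
        for event_type, rule in rules.items():
            if rule.search(article["text"]):      # step 1: rule-based first cut
                if nlu_models[event_type].predict(article["text"]):
                    events.append((event_type, article["url"]))  # step 2: confirmed
    return events
```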

Last but not least, rule-based programs create a baseline of event detection accuracy. When measuring how good ML event detection is, this baseline of comparison is highly valuable. For example, Google would have made sure that search results from its deep learning engine were superior to those from its early generations of rule-based engines.

Lesson 3 – Transfer learning is very helpful when you have small data.

Transfer learning (TL) is a research problem in machine learning (ML) that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, knowledge gained while learning to recognize general sentiment can be applied when trying to recognize financial news sentiment. Transfer learning is very effective when you only have limited labeled training data for NLU models.

You’ve probably heard of word embeddings, which are a good example of transfer learning. A word embedding is a learned representation of text in which words with similar meanings have similar representations. Many organizations have released embeddings pre-trained on very large text corpora; for example, Google’s Word2Vec vectors trained on Google News cover 3 million words and phrases. In our app, we trained classification models on top of word embeddings to eliminate false positive events.
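
As a sketch of that approach (a simplification, not our production feature set): load the pre-trained Google News vectors with gensim, average the word vectors of each sentence, and train a simple scikit-learn classifier on top.

```python
# Classification on top of pre-trained word embeddings: averaged Word2Vec
# vectors fed to a logistic regression. A simplifying sketch, not production code.
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import LogisticRegression

wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

def embed(sentence: str) -> np.ndarray:
    # Average the 300-d vectors of the words we have embeddings for.
    vecs = [wv[w] for w in sentence.lower().split() if w in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

# texts: candidate event sentences; labels: 1 = real event, 0 = false positive
# clf = LogisticRegression().fit([embed(t) for t in texts], labels)
```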

Another good example of applying transfer learning is sentiment risk detection: strong negative sentiment in the news often indicates supplier risk. We employed Universal Language Model Fine-tuning (ULMFiT), another transfer learning approach, to achieve the best accuracy with a smaller dataset of labeled supplier news.
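
The fastai library implements the ULMFiT recipe; a minimal sketch might look like the following, where `news_df` (unlabeled supplier news) and `labeled_df` (text plus sentiment label) are assumed DataFrames, not our actual data:

```python
# A sketch of the ULMFiT recipe with fastai: fine-tune a pre-trained language
# model on domain text, then reuse its encoder in a classifier trained on the
# small labeled set. DataFrame names and epoch counts are assumptions.
from fastai.text.all import *

# Step 1: fine-tune the pre-trained AWD-LSTM language model on supplier news.
dls_lm = TextDataLoaders.from_df(news_df, text_col="text", is_lm=True)
lm = language_model_learner(dls_lm, AWD_LSTM)
lm.fine_tune(1)
lm.save_encoder("supplier_news_encoder")

# Step 2: train the sentiment classifier on the small labeled dataset,
# starting from the fine-tuned encoder.
dls_clf = TextDataLoaders.from_df(labeled_df, text_col="text", label_col="label",
                                  text_vocab=dls_lm.vocab)
clf = text_classifier_learner(dls_clf, AWD_LSTM)
clf.load_encoder("supplier_news_encoder")
clf.fine_tune(3)
```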

In summary, by applying these three essential lessons:

  1. We need a platform to build machine learning applications.
  2. Think differently about rule-based vs. machine learning.
  3. Transfer learning is very helpful when you have small data.

we have been able to develop advanced applications for clients quickly and cost effectively.

Fen Wei is the Director of Data Science at TenPoint7 and has been with the company for more than four years. He previously worked at MathWorks for 15 years, where he developed several MATLAB signal processing and communications products. In the second phase of his MathWorks career, Fen worked in the Asia Pacific regional offices, helping customers apply MATLAB machine learning solutions to business and engineering problems. Fen received a master’s degree in electrical and computer engineering from Cornell University.


