MATLAB Statistics and Machine Learning in Credit Risk Modeling

This is my first blog here; my name is Fen. I was with MathWorks, the creator of MATLAB, for almost 15 years, and I built some of the modules in MATLAB. In the past 5 years, I led a team of consulting engineers building data analytics solutions for customers in APAC. In recent years, commercial banks and asset management companies in China have started to build more quantitative models to measure credit risk. With our help, some of them chose the MATLAB statistics and machine learning modules to build credit risk models. I would like to share a common approach I have used on credit risk projects.

Imagine I am a quant who oversees a decently-sized portfolio of corporate bonds. These bonds are scattered across several industries and have maturities over the next few years. One day, my manager walks into my office and says, “What kinds of risks are we taking by holding these bonds?” In order to answer my manager’s question, I first need to decide how I want to quantify risk. Let’s say that I settle on Value at Risk as my metric. Value at Risk (VaR) is the maximum loss not exceeded, with a given probability (the confidence level), over a given period of time.

Before I can calculate a metric like Value at Risk, I need to have lots of information about my bonds and about the bond markets. First off, I’ll need to know everything about the bonds themselves: their maturity dates, their coupon rates, their seniority classes, and so forth. But that’s not all. I’ve captured and organized most of what we need as follows. To make things a little more interesting, we are also going to assume that our credit ratings are not a given.

One of the fundamental tasks in credit risk management is to assign a credit grade to a borrower. Grades are used to rank customers according to their perceived creditworthiness: better grades mean less risky customers; similar grades mean similar level of risk. Grades come in two categories: credit ratings and credit scores. Credit ratings are a small number of discrete classes, usually labeled with letters, such as ‘AAA’, ‘BB-‘, etc. Credit scores are numeric grades such as ‘640’ or ‘720’. Credit grades are one of the key elements in regulatory frameworks, such as Basel II.

Assigning a credit grade involves analyzing information on the borrower. If the borrower is an individual, information of interest could be the individual’s income, outstanding debt (mortgage, credit cards), household size, possibly zip code, etc. For corporate borrowers, one may consider certain financial ratios (e.g., sales divided by total assets), industry, etc. Here, we refer to these pieces of information about a borrower as features or predictors. For larger loans, accessible to small- to medium-sized companies and larger corporations, credit ratings are usually used, and the grading process may involve a combination of automated algorithms and expert analysis.

There are rating agencies that keep track of the creditworthiness of companies. Yet, most banks develop an internal methodology to assign credit grades for their customers. Rating a customer internally can be a necessity if the customer has not been rated by a rating agency, but even if a third-party rating exists, an internal rating offers a complementary assessment of a customer’s risk profile.

Implementing credit rating policies and procedures from scratch is a complex endeavor. Let’s assume that historical information is available in the form of a data set where each record contains the features of a borrower and the credit rating that was assigned to it. These may be internal ratings, assigned by a committee that followed policies and procedures already in place. Alternatively, the ratings may come from a rating agency, whose ratings are being used to “jump start” a new internal credit rating system, in the sense that the initial internal ratings are expected to closely agree with the third-party ratings while the internal policies and procedures are assimilated and fine-tuned.

The existing historical data is used to train an automated classifier; in the vocabulary of statistical learning, this process falls in the category of supervised learning. The classifier is then used to assign ratings to new customers. In practice, these automated or predicted ratings would most likely be regarded as tentative, until a credit committee of experts reviews them. The type of classifier we use can usually facilitate the revision of these ratings, because it provides a measure of certainty, a classification score, for the predicted ratings. Here is an example of a rating system we built:


We used features from Altman’s z-score:

Working capital / Total Assets (WC_TA)

Retained Earnings / Total Assets (RE_TA)

Earnings Before Interest and Taxes / Total Assets (EBIT_TA)

Market Value of Equity / Book Value of Total Debt (MVE_BVTD)

Sales / Total Assets (S_TA)

We also have an industry sector label, an integer value ranging from 1 to 12.

We evaluate many classifiers, ranging from a naïve Bayes model all the way to artificial neural networks. In my experience, many customers have found that the TreeBagger algorithm gives higher accuracy. We use the predictors X and the response Y to fit a particular type of classification ensemble called a tree bagger. “Bagging,” in this context, stands for “bootstrap aggregation.” The methodology consists of generating a number of sub-samples, or bootstrap replicas, from the data set. These sub-samples are randomly generated, sampling with replacement from the list of customers in the data set. For each replica, a decision tree is grown. Each decision tree is a trained classifier on its own, and could be used in isolation to classify new customers. The predictions of two trees grown from two different bootstrap replicas may be different, though.

What the tree bagger does is aggregate the predictions of all the decision trees grown for all the bootstrap replicas. If the majority of the trees predict one particular class for a new customer, it is reasonable to consider that prediction to be more robust than the prediction of any single tree alone. Moreover, if a different class is predicted by a smaller set of trees, that information is useful, too. In fact, the proportion of trees that predict different classes is the basis for the classification scores that a tree bagger reports when classifying new data, as follows:
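A minimal sketch of fitting such an ensemble with the Statistics and Machine Learning Toolbox function TreeBagger. The variable names and the number of trees are illustrative; X is assumed to be a numeric matrix of the six predictors above and Y a cell array of rating labels:

```matlab
% Assumed inputs (illustrative): X is an N-by-6 numeric matrix of the
% predictors WC_TA, RE_TA, EBIT_TA, MVE_BVTD, S_TA, and Industry;
% Y is an N-by-1 cell array of rating labels such as 'AAA', 'AA', ...
rng(1);  % make the bootstrap replicas reproducible

nTrees = 200;
bagger = TreeBagger(nTrees, X, Y, ...
    'Method', 'classification', ...
    'OOBPrediction', 'on');      % keep out-of-bag estimates for validation

% Classify a new customer; scores are the proportion of trees voting
% for each class, in the order given by bagger.ClassNames
Xnew = [0.02 0.22 0.10 2.40 0.80 6];
[predictedRating, scores] = predict(bagger, Xnew);
```

The scores returned by predict are exactly the per-class vote proportions reported for the customers below.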

Customer 60644:

RE/TA    =  0.22
MVE/BVTD =  2.40
Industry =  6
Predicted Rating : AA
Classification score :

AA : 0.6418
A : 0.3299
BBB : 0.0282


Customer 33083:

RE/TA    =  0.24
MVE/BVTD =  1.51
Industry =  4
Predicted Rating : BBB
Classification score :

A : 0.0669
BBB : 0.9331

Customer 63830:

RE/TA    =  0.18
MVE/BVTD =  1.69
Industry =  7
Predicted Rating : A
Classification score :
AA : 0.0198
A : 0.6588
BBB : 0.3194
B : 0.0020


One way to evaluate the classifier is to build a Receiver Operating Characteristic (ROC) curve and check the area under the curve (AUC), as follows:


The AUC seems high enough, but it would be up to the committee to decide what level of AUC should trigger a recommendation to update the automated classifier.
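As a sketch, the ROC curve and AUC for one rating class (here 'AA', treated one-vs-rest) can be computed from the classifier's scores with perfcurve. The held-out labels, score matrix, and class list are assumptions for illustration:

```matlab
% Assumed inputs (illustrative): validationY is a cell array of true
% ratings for a held-out set, and validationScores is the matrix of
% classification scores returned by predict, with columns ordered as
% in classNames below.
classNames = {'AAA','AA','A','BBB','BB','B','CCC'};
classIdx   = find(strcmp(classNames, 'AA'));

% ROC: false positive rate vs. true positive rate, plus the AUC
[fpr, tpr, ~, auc] = perfcurve(validationY, ...
    validationScores(:, classIdx), 'AA');

plot(fpr, tpr)
xlabel('False positive rate')
ylabel('True positive rate')
title(sprintf('ROC for class AA (AUC = %.3f)', auc))
```

Repeating this per class (or averaging the per-class AUCs) gives an overall picture of the multi-class classifier.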

Our next task is to figure out what these ratings really mean. The ratings we use are defined in terms of how often obligors transition between ratings, especially how often they go into default. The only way we can give real meaning to our custom rating system is by looking at our historical data and using it to generate transition probabilities, as follows:


For example, looking at the second row, there is a 2.44% chance that an AA bond will be upgraded to AAA during one sample period, a 92.6% chance it will stay at AA, a 4.03% chance it will be downgraded to A, and so on.
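One simple way to estimate such a matrix is the cohort method: count the observed one-period transitions in the historical ratings and normalize each row. A sketch, where ratingsPrev and ratingsNext are assumed cell arrays holding each obligor's rating at the start and end of one sample period:

```matlab
% Assumed inputs (illustrative): ratingsPrev and ratingsNext hold each
% obligor's rating at consecutive period ends, aligned element-wise.
classes  = {'AAA','AA','A','BBB','BB','B','CCC','D'};
nClasses = numel(classes);

% Count observed transitions from class i to class j
counts = zeros(nClasses);
for k = 1:numel(ratingsPrev)
    i = find(strcmp(classes, ratingsPrev{k}));
    j = find(strcmp(classes, ratingsNext{k}));
    counts(i, j) = counts(i, j) + 1;
end

% Normalize each row so it sums to 1, giving transition probabilities
transMat = counts ./ sum(counts, 2);
```

The Financial Toolbox function transprob implements this (and a duration-based variant) directly, handling multiple periods per obligor.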

Our final task is to put all of this information together and arrive at our answer: a Credit Value at Risk number. First, we use financial analysis to value each of the bonds in our current portfolio. Next, we run a large Monte Carlo simulation, loosely based on JP Morgan’s CreditMetrics approach, to simulate what these bonds may do over the next year. Finally, we re-value our portfolio in each simulation and compare it against the current value. This leads to a result we can visualize as follows:


So there is a 5% chance that the portfolio will lose 798,232 USD or more over one year, assuming the composition of the portfolio does not change.
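The simulation step can be sketched as drawing one-year rating migrations from the transition matrix, revaluing the portfolio in each scenario, and reading the Credit VaR off the loss distribution. Here transMat, currentRatingIdx, and the revaluation function portfolioValue are assumptions for illustration:

```matlab
% Assumed inputs (illustrative): transMat is the one-year transition
% matrix, currentRatingIdx holds each bond's current rating index, and
% portfolioValue is a hypothetical function that revalues the whole
% portfolio given a vector of rating indices.
rng(1);
nSims    = 10000;
nBonds   = numel(currentRatingIdx);
cumProbs = cumsum(transMat, 2);    % cumulative probabilities per row

baseValue = portfolioValue(currentRatingIdx);
losses    = zeros(nSims, 1);
for s = 1:nSims
    u      = rand(nBonds, 1);      % one uniform draw per bond
    newIdx = zeros(nBonds, 1);
    for b = 1:nBonds
        % Map the uniform draw to a new rating via the cumulative row
        newIdx(b) = find(u(b) <= cumProbs(currentRatingIdx(b), :), 1);
    end
    losses(s) = baseValue - portfolioValue(newIdx);
end

% 95% Credit VaR: the loss not exceeded in 95% of scenarios
creditVaR = prctile(losses, 95);
```

For brevity this sketch assumes independent migrations; a full CreditMetrics-style simulation would instead draw correlated asset returns per obligor and map them to ratings through thresholds derived from the transition matrix.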

As you can see, MATLAB is a tool that can effectively manage most of the workflow activities of such a data science project. MATLAB is also a tool preferred by many data scientists who, like myself, come from an engineering background. To make things interesting, I have also explored the credit rating problem in some up-and-coming tools such as DATO; in my next blog I will share those results and compare them with what I got in MATLAB. Please stay tuned.

Fen Wei (Data Scientist,
