
Expert Systems with Applications 38 (2011) 13284–13290

Contents lists available at ScienceDirect

Expert Systems with Applications


journal homepage: www.elsevier.com/locate/eswa

Discovering cardholders' payment-patterns based on clustering analysis


Chien-Chou Shih a,*, Ding-An Chiang b, Yi-jen Hu c, Chun-Chi Chen b

a Department of Information and Communication, Tamkang University, Danshui Dist., New Taipei City 251, Taiwan
b Department of Computer Science and Information Engineering, Tamkang University, Danshui Dist., New Taipei City 251, Taiwan
c Department of Insurance, Tamkang University, Danshui Dist., New Taipei City 251, Taiwan

Abstract

This paper samples approximately 9.3 million entries of data concerning payments from 300,000 credit card customers of Bank A in Taiwan over the past two years. Data mining techniques are applied to decipher customers' behavior and perform risk analysis: the clustering algorithm divides card users into 9 groups with different levels of contribution and risk profiles, according to their consumption patterns. We generalize a set of clustering rules to identify high-risk customer groups in advance. The proposed suggestions can therefore indicate who is a bad risk, so that a bank can either deny the application or, for those who are already cardholders, start shrinking their available credit and increasing minimum payments to recover as much cash as possible before they default. On the other hand, banks are advised to adjust credit limits in a timely manner for the customer groups whose risks are low and contributions are high, in addition to providing value-added services, in order to enhance earnings.

© 2011 Elsevier Ltd. All rights reserved.

Keywords: Credit card; Data mining; Clustering algorithms

1. Introduction

In the past decade, several Asian nations, including South Korea, China, Hong Kong and Taiwan, have experienced not only credit card booms but also financial risks (Kang, 2007), and both governments and credit card companies have endeavored to overcome these financial risks. In Taiwan, with the liberalization and reform of the financial system and the growing diversity of corporate funding sources, consumer banking has been booming, and the credit card business has become a very important source of earnings in consumer finance. According to the nature of the business, consumer finance can be divided into consumer loans and credit cards. Based on the classifications made by the Bureau of Monetary Affairs, Ministry of Finance of the ROC, credit card businesses can be divided into revolving credits, cash advances, and balance transfers.

Due to the characteristics of the credit card business, any mismanagement of card issuing mechanisms or insufficient analysis of the consumption behavior of customers is likely to result in credit risk problems. Jordan (1998) argued that a main reason for the high overdue loans of banks is poor management policies or bad management skills. Therefore, to control overdue loans and reduce bad debt ratios, credit reviewing before card issuance, as well as authorization and risk control after card issuance, is key to controlling the credit risks of credit card customers (Jordan, 1998).

Generally speaking, credit card risks can be classified into two types: credit risks and ID theft risks. Credit risks refer to the risk of default by borrowers who are unable to repay their debts in a timely manner as required, and who remain unable to do so after actions have been taken to urge repayment. The occurrence of credit risks affects debt-servicing capacities. This paper focuses on the analysis of credit risks.

Most banks in Taiwan rely on the basic data of credit card customers as the foundation for credit reviews for card issuance and credit limits. However, credit card debt has become an increasingly serious problem in Taiwan in recent years. Given the deterioration toward bad debts in the banking industry, it is necessary to shorten the lead time required for credit analysis to facilitate prevention. Meanwhile, risk management based on simpler and less complicated data analysis can help to reduce the probability of bad debts.

This paper samples approximately 9.3 million entries of data concerning payments from 300,000 credit card customers of Bank A in Taiwan over the past two years as the data source for cluster analysis. The analysis results are grouped into clusters of customers with similar natures in accordance with target variables. Professional domain knowledge of credit cards is used to interpret why customers are divided into a number of clusters, and then to assess the various risk control strategies. This paper aims to provide the credit card divisions of banks with a reference for screening the main customer groups, adjusting the rights of credit card customers, and running marketing campaigns where appropriate, while minimizing risks and marketing costs and maximizing profits.

For the credit card divisions of banks, there are three sources of revenue from each customer: annual fees, transaction fees, and revolving interest for late payments. Therefore, to enhance the value of any customer, it is necessary to raise credit limits, increase fee rates, and encourage customers to spend more with credit cards. However, market competition in the credit card business has become fierce in recent years, and except for the top-tier customers of charge cards and infinite cards, who still pay annual fees, most banks have cancelled annual fees or capped fee rates in order to remain in this highly competitive market. Therefore, this paper intends to analyze different groups of cardholders and examine their risk profiles, in order to identify the customers with potential for high contributions and to determine methods of controlling risks. Meanwhile, banks are advised to implement appropriate marketing campaigns in a timely manner to enhance the spending of the customer groups with high potential and so increase the bottom line. By analyzing a list of high-risk customers, it is possible to limit the over-spending of the risky groups and hence avoid bad debts. Banks are advised to increase the credit limits of cardholders whose risks are controllable and whose contribution potential is high; combined with suitable marketing events, this can effectively enhance the earnings of credit card issuing banks.

* Corresponding author. Tel.: +886 2 26204762; fax: +886 2 26234052. E-mail address: ccs@mail.tku.edu.tw (C.-C. Shih).
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.04.148

2. Data mining method for clustering analysis of credit card customers

2.1. Choice of data mining algorithm

With the assistance of information technology, such as data warehousing, parallel processing, and data mining, decision makers are able to reveal hidden and undiscovered information from gigantic flows of data, in order to help companies enhance their competitiveness. Data mining has become a way of extracting knowledge and hidden rules from a myriad of data. Berry and Linoff suggested that data mining has been widely adopted by companies because transaction-processing systems have generated large volumes of data (Berry & Linoff, 1998, 1999). Gerritsen used artificial neural networks to assess customers' credit rating data, in order to ensure risk control of credit card loans and reduce the non-performing loan ratios of banks (Gerritsen, 1999). Zhang and Zhou described data mining in the context of financial applications from both technical and application perspectives. In addition, they compared different data mining techniques and discussed important data mining issues involved in specific financial applications (Zhang & Zhou, 2004).

This paper applies the Self Organizing Map (SOM), also known as the Kohonen Feature Map, to perform a cluster analysis in accordance with the business targets of credit card marketing, in order to identify low-risk, high-contribution customers. The SOM was proposed by Kohonen in 1980 as an artificial neural network model (Kohonen, 1995, 2001). The map obtains training examples (only input variables are required) from the problem and learns from them a set of rules for internal clusters. The purpose is to apply the findings to new cases (input variables are required, and inference is performed on them), as well as to other training sets from the same group. This application is similar to clustering. The map is a unique type of artificial neural network that is able to identify the most appropriate input variables from among the output results, and it can be used to detect clusters. The calculation steps of the Self Organizing Map are as follows:

Step 1: Randomly set up the weightings of the linking values, $W_j(n)$, $n \in N$. Let N be the number of neurons.

Step 2: Randomly input data variables from the training set.

Step 3: Identify the winner with the Euclidean distance formula, that is, determine the neuron with the shortest distance $d_j$ to the input:

$$d_j = \sum_{i=0}^{N-1} \left( X_i(t) - W_{ij}(t) \right)^2 .$$

Step 4: Adjust the vector of the linking values:

$$W_j(t+1) = \begin{cases} W_j(t) + \eta(t)\,\bigl( X(t) - W_j(t) \bigr), & j \in A_j(t), \\ W_j(t), & j \notin A_j(t). \end{cases}$$

Let $\eta(t)$ be the learning speed and $A_j(t)$ be the neighboring area of the winner j; both are functions of time t.

Step 5: Return to Step 2 and repeat, at each pass shortening the neighboring diameter and reducing the learning speed, until a Self Organizing Map is formed.

According to the algorithm of a Self Organizing Map, the analysis should meet the following criteria:

(1) Data probabilities are distributed in a non-linear manner within the map.
(2) Data inputs and network structures should have identical vector values.
(3) The set-up of relevant parameters should be appropriate; otherwise, the nets may be tangled or twisted.
(4) The learning process of the map is probability oriented. The learning speed $\eta(t)$ should be adjusted over time; $\eta(t) = 0.9\,(1 - t/100)$ is a reasonable parameter. According to Kohonen, the neighboring function $A_j(t)$ is usually a square area that surrounds the winner neuron j.
(5) There is no rule for the number of learning iterations. Kohonen recommends over 500 times the number of neurons in the network.
(6) Input data should be normalized.
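The calculation steps of the Self Organizing Map can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the 3 x 3 grid, the min-max normalization, the linearly shrinking square neighborhood, and all function names (`normalize`, `train_som`, `best_matching_unit`) are assumptions made for illustration. With `epochs=100` the learning speed follows the 0.9(1 - t/100) schedule mentioned in the text.

```python
import random


def normalize(rows):
    """Min-max scale each column to [0, 1] (inputs should be normalized)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def best_matching_unit(weights, x):
    """Step 3: the neuron with the smallest squared Euclidean distance to x."""
    return min(weights, key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[j])))


def train_som(data, grid=(3, 3), epochs=100, eta0=0.9, seed=0):
    """Train a small map; returns {grid position: weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 1: random initial weights for every neuron on the grid.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(grid[0]) for c in range(grid[1])}
    radius0 = max(grid) / 2.0  # assumed starting neighborhood radius
    for t in range(epochs):
        eta = eta0 * (1.0 - t / epochs)        # learning speed decreases over time
        radius = radius0 * (1.0 - t / epochs)  # neighborhood shrinks over time
        for _ in range(len(data)):
            x = rng.choice(data)                     # Step 2: random training input
            winner = best_matching_unit(weights, x)  # Step 3: find the winner
            # Step 4: move the winner and its square neighborhood toward x;
            # neurons outside the neighborhood keep their weights.
            for (r, c), w in weights.items():
                if max(abs(r - winner[0]), abs(c - winner[1])) <= radius:
                    for i in range(dim):
                        w[i] += eta * (x[i] - w[i])
    return weights  # Step 5 is the outer loop over t


# Usage: four made-up customers described by two made-up variables.
data = normalize([[0.0, 10.0], [0.1, 12.0], [0.9, 90.0], [1.0, 95.0]])
som = train_som(data)
clusters = [best_matching_unit(som, x) for x in data]
```

Each customer is assigned to the grid position of its winning neuron, so customers that share a position form one cluster.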

2.2. Classifications of credit card consumers

The raw data used in this paper are the payment records of cardholders. The goal is to group customers with similar payment patterns into clusters, based on their historical payment records. Priorities are determined for all the clusters by referring to their characteristics. Domain knowledge is then applied to determine marketing strategies targeted at priority clusters, in order to enhance the utilization rates of these cardholders, adjust credit card offerings and offer more cash cards. Alternatively, it is possible to identify customers who require small loans, which will improve the performance of consumer finance departments. This paper provides the following explanation of the variables and cluster analysis methods used in the credit card consumption behavior model.

Banks can divide their credit card customers into three groups based on their distinctive consumption behaviors (Crovelli, 1995; Grupe & Owrang, 1995): revolver users, translator users, and convenience users. All three groups increase earnings for banks. Their behavior profiles are as follows:

2.2.1. Revolver users

They are known for maintaining unpaid balances over a long period of time and repaying only slightly more than the minimum requirement each month; therefore, they pay high revolving interest to banks. They are the source of the highest earnings, but banks also bear the highest risks on them.


2.2.2. Translator users

They are usually high spenders; however, they pay off all debts in the following month. Banks cannot obtain revolving interest income from translator users; however, these users provide a steady source of income for banks through their contributions in transaction fees. Banks undertake relatively low risks for this type of cardholder.

2.2.3. Convenience users

They are occasional users of credit cards. When they need to spend a large amount, convenience users take advantage of the ease of use of credit cards and repay the borrowed amount in several installments over the following months. Therefore, compared to revolver users, the contribution of convenience users in revolving interest is not as high, but banks undertake lower risks for this group than for revolver users. Meanwhile, compared to translator users, convenience users make installment repayments, and thus they contribute higher income in revolving interest, while banks assume higher risks for this group than for translator users.

Banks profit from three sources of income in their credit card businesses: annual fees, transaction fees, and revolving interest on unpaid debts. The consumption behavior patterns of revolver users, convenience users, and translator users result in different debt-servicing capabilities. This paper applies data mining and clustering techniques to perform an analysis, and identifies high-contribution and low-risk groups based on the clustering results. Therefore, the debt servicing capacities of revolver users, convenience users, and translator users are used as the variable for repayment capability. This variable is applied to discriminate between different groups of consumption behaviors. Before clustering, the holders of idle cards are eliminated from the sample to ensure an accurate result of the cluster analysis.

Fig. 1 classifies the consumption patterns of the three groups, i.e. revolver users, convenience users, and translator users, based on their distinctive debt servicing capacities. When the debt servicing capacity is close to 0, the consumption behavior pattern is similar to that of a revolver user. When the debt servicing capacity is between 0 and 1, the consumption behavior pattern is defined as that of a convenience user. When the debt servicing capacity is close to 1, the consumption behavior pattern resembles that of a translator user. Variables are detailed in the empirical results.

2.3. Risk evaluation and value customer analyzing

As mentioned previously, credit card customers can be grouped into revolver users, translator users, and convenience users. Their different behaviors stem from their distinctive needs. Banks are advised to raise the credit limits of the customers with potential for value-added services, and to initiate relevant marketing campaigns. However, before doing so, it is necessary to understand the consumption patterns and risk profiles of customers. Actions can then be taken to target low-risk customers who have the potential to make high contributions. It should not be a one-size-fits-all standard. Based on the payment behaviors of customers, this paper sets out the analysis goals for this stage as follows:

(1) Risk assessment.
(2) Identification of potential customers and increase of credit limits in a real-time manner.
(3) Identification of the customers with needs for cash cards or small loans (e.g. small personal credit loans).

Therefore, the debt servicing capability, available credit limit, and utilization of revolving interest of the three types of customers are analyzed to provide a reference for risk control of credit cards.
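The grouping by debt servicing capacity described in Section 2.2 can be sketched in a few lines of Python. The sketch assumes the capacity is the ratio of months repaid in full to months with card spending, as defined in the empirical analysis. The `tolerance` cut-off for "close to 0" and "close to 1" is a hypothetical value, since the paper gives no numeric thresholds, and the function names are illustrative only.

```python
def debt_servicing_capability(months_paid_in_full, months_with_spending):
    """X = months with repayment in full / months with credit card spending."""
    if months_with_spending <= 0:
        # Idle cards are removed before clustering, so they have no X value.
        raise ValueError("idle card: no months with spending")
    return months_paid_in_full / months_with_spending


def classify_user(x, tolerance=0.05):
    """Map X to a behavior type; `tolerance` is an assumed cut-off."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("X must lie in [0, 1]")
    if x <= tolerance:
        return "revolver"      # almost never repays in full
    if x >= 1.0 - tolerance:
        return "translator"    # almost always repays in full
    return "convenience"       # repays in full in some months only


# Usage: a cardholder who spent in 24 months and repaid in full in 12 of them.
x = debt_servicing_capability(12, 24)
group = classify_user(x)
```

In the paper the groups come out of the cluster analysis rather than from fixed thresholds; the rule above only mirrors how Fig. 1 reads the resulting clusters.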

3. Empirical analysis

3.1. Preprocess of credit card data sets

Preprocessing of the credit card data eliminates the data of idle cardholders. About 9.3 million entries of transaction data of the remaining 15,000 customers are analyzed for credit control. In an economic recession, many people cannot make the repayments due. To identify high-risk credit card customers, it is necessary to conduct a statistical analysis. After sifting through all the training data, it is found that customers below the age of 40 account for a higher percentage of the default group. However, this statistical result fails to identify who the risky groups are; in fact, when data mining is used, age is not the most important factor. The most important factors are repayment behavior and other relevant elements.

Generally speaking, credit card customers can be divided into revolver users, translator users, and convenience users. These customers exhibit different behaviors because of their varying needs. Based on the definition of these three types of customers, a cluster analysis of their behavior is conducted. Fig. 2 shows the aggregation of the cluster analysis of behaviors. Clusters of differing behaviors are profiled, and these clusters use the following variables:

(1) X: Debt servicing capability, defined as

$$X = \frac{\text{No. of months with repayments in full}}{\text{No. of months with credit card spending}} .$$

Fig. 1. Grouping of consumption behavior patterns, according to debt servicing capacities.

Debt servicing capability indicates the debt-servicing behavior of cardholders. It is a classification based on the consumption behavior of customers. When the debt servicing capability is equal to 0, the customer is a revolver user. Revolver users are known for maintaining unpaid balances over a long period. They repay slightly more than the minimum amount and shoulder high revolving interest each month; while they bring high earnings, the banks undertake high risks as well. When the debt servicing capability is equal to 1, the customer is a translator user. Translator users are known for spending every month over long periods while repaying the borrowed amount in full in the following month; therefore, their contribution in revolving interest income is low. However, they contribute transaction fees to banks and represent one of the stable sources of earnings. Hence, translator users are premier customers of banks: even though their earning contributions are lower than those of revolver users, their risk is relatively low.


Fig. 2. Aggregation of cluster analysis.

When the debt servicing capability is between 0 and 1, the customer is a convenience user. Convenience users are known for making repayments over a number of months following their card spending, and because they repay in installments, their contributions in revolving interest are not as high as those of revolver users. For banks, convenience users bring higher earnings than translator users, but lower earnings than revolver users. Correspondingly, their risk profile is lower than that of revolver users, but higher than that of translator users. Convenience users are a good group of customers for banks.

(2) TOTAL_FIN: Revolving interest. The interest incurred by cardholders if they do not repay in full what they have spent.

(3) HISTORY1_1: Overdue payments, which can be classified as follows:

$$\text{HISTORY1\_1} = \begin{cases} 1: & \text{No records of intentionally delayed payments} \\ 2: & \text{Payments delayed for a month} \\ 3: & \text{Payments delayed for two months} \\ 4: & \text{Payments delayed for three months, and cards have been suspended} \end{cases}$$

(4) DIS_CL1: Available balance. It represents the usable balance of the cardholder. The value can be 0–5. The smaller the value, the smaller the available balance.

(5) C1: The maximum credit limit during the consumption period. The value is between 0 and 1. The closer it is to 0, the smaller the available credit limit.

The clustering analysis suggests that when a customer behaves like a revolver user, he/she presents higher risk; in contrast, when a customer behaves like a translator user, he/she presents lower risk. Therefore, the revolver users are used as a case study to analyze risk profiles. Figs. 3–5 show the cluster analysis of behaviors of three clusters. It is found that in all three clusters, X is equal to or close to 0; therefore, the cluster behavior resembles that of a revolver user. Meanwhile, when the available balance of a revolver user becomes smaller and the revolving interest becomes higher, the risk also becomes higher. For example, according to Fig. 3, 55% of revolver users make timely payments, and thus their revolving interest (TOTAL_FIN) is higher. As for the maximum credit limit (C1) during the consumption period, the smaller it becomes, the higher the risk becomes. A comparison of Figs. 3–5 indicates that DIS_CL1 (available balance) in Fig. 5 remains high. The spending is in small amounts; however, repayments are delayed. According to HISTORY1_1, among those who have not deliberately made any late repayments, 85% make timely repayments. However, the smaller the available balance, the higher the risk becomes.

This paper further reviews risk profiles and contributions based on the transactional data of credit cards. After calculation and analysis by the clustering algorithm, card users are divided into 9 groups with different levels of contribution and risk profiles, according to their consumption patterns. Table 1 summarizes the group details and the percentage of each group in the total. Among all the customers, translator users amount to 62,763 people (39.69% of the total). They are further divided into two sub-groups based on their behavior characteristics: one accounts for 26.22% and the other for 13.47%. The number of convenience users amounts to 61,675 people (39.00% of the total). They are further classified into five sub-groups, which account for 13.27%, 8.53%, 4.02%, 7.83%, and 5.35% of the total, respectively. The number of revolver users reaches 33,688. They are further categorized into two sub-groups, which account for 9.89% and 11.42%, respectively.
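The group shares can be recomputed from the head counts and compared with the sums of the sub-group percentages quoted in this paragraph. The short Python check below assumes that the total is simply the sum of the three group counts (idle cards excluded); the function name `group_shares` is illustrative.

```python
counts = {"translator": 62763, "convenience": 61675, "revolver": 33688}
subgroup_shares = {
    "translator": [26.22, 13.47],
    "convenience": [13.27, 8.53, 4.02, 7.83, 5.35],
    "revolver": [9.89, 11.42],
}


def group_shares(group_counts):
    """Return each group's share of the total head count, in percent."""
    total = sum(group_counts.values())
    return {name: 100.0 * n / total for name, n in group_counts.items()}


shares = group_shares(counts)
for name, share in shares.items():
    subtotal = sum(subgroup_shares[name])
    # The recomputed share and the sub-group sum should agree up to rounding.
    print(f"{name:12s} {share:6.2f}%  sub-groups sum to {subtotal:6.2f}%")
```

Under this assumption the three groups together contain 158,126 customers, and each recomputed share agrees with the sum of its sub-group percentages to within rounding.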


Fig. 3. Behavior analysis of cluster 15.

Fig. 4. Behavior analysis of cluster 14.

Fig. 5. Behavior analysis of cluster 13.

Table 1. Contributions of different customer groups and their percentage in total.

Customer consumer behavior    Transaction contribution    Interest contribution    Risk evaluation    Proportion (%)
Translator User_1             Low                         Low                      Low                26.22
Translator User_2             Low                         Low                      Low                13.47
Convenience User_Short_1      Middle                      Low                      Middle             13.27
Convenience User_Short_2      High                        Middle                   High                8.53
Convenience User_Middle_1     High                        High                     High                4.02
Convenience User_Middle_2     Low                         Middle                   Middle              7.83
Convenience User_Middle_3     Middle                      Middle                   Middle              5.35
Revolver User_1               Low                         Middle                   Middle              9.89
Revolver User_2               High                        Highest                  High               11.42

During the data mining process of the clustered data, this paper attempts to segment the usage behaviors of revolver users, translator users, and convenience users, with their behavior patterns as the main variables. Meanwhile, a statistical analysis is performed on the contributions of these three groups in transaction fees and revolving interest, taking into account their respective risk profiles. According to the results of the clustering analysis, this paper finds the following (as Table 1 shows):

• The customer groups with the highest contribution to earnings are Revolver User_2 and Convenience User_Short_2. However, they also present high risks.
• Due to the pattern of installment repayments of convenience users, Convenience User_Short_1 and Convenience User_Middle_3 are the ideal targets for the promotion of small-sum installment credit loans and cash cards, because they have real-time cash needs or request credit limit increases in a given month.
• Given their long-term utilization of revolving interest and their high risks, Revolver User_1 and Convenience User_Middle_2 are ideal candidates for converting their credit cards into cash cards with limits.

As the contributions made by translator users mostly come from transaction fees on the transactions they repay in full each month, the risk profile of translator users is low. However, their contributions to banks' earnings are relatively low because they do not incur revolving interest. As shown in Table 1, the contributions of Translator User_1 and Translator User_2 are of lower value as far as the research goal of this paper is concerned. Therefore, they are not included in the recommendation list. In addition to transaction fees, revolver users also pay high revolving interest each month. Therefore, their contributions to earnings are high, despite their high risks. On top of the same transaction fees, convenience users also make good contributions to earnings since they make installment repayments, and they present lower risks than revolver users.

For these reasons, this paper not only analyzes the risk profiles of credit card users, but also aims to identify the cardholders whose contributions are high and whose risks are low and controllable. The results can serve as references for risk management after card issuance, assist banks in preventing unnecessary bad debts, and provide a recommendation list of card users whose contributions are high and risks are controllable, in order to facilitate the marketing efforts of banks. With timely and appropriate marketing, banks are able to provide suitable value-added options to cardholders of different behaviors in order to improve their bottom lines. Therefore, after the analysis of revolver users and convenience users, the findings can serve as references to banks for timely marketing, together with a recommendation list of customers whose contributions are high and risks are controllable.

3.2. Validation

This paper adopts training data to perform a risk control analysis during the data mining of the clustered data. To verify the accuracy of the analysis results and further improve their precision, the results are submitted to the issuing bank for testing and validation. The results show that, after one year, the bank indeed enjoyed an increase in revenues from card applications and spending after it directed appropriate value-adding marketing campaigns at the list recommended by this paper. Marketing was targeted at customers whose risks could be controlled and who had the potential to make higher contributions. The bank increased the credit limits of the customers in the recommended list in a timely and appropriate manner. Marketing events promoting small-sum installment-repayment credit loans and cash cards were targeted at Convenience User_Short_1 and Convenience User_Middle_3. As a result, card applications and spending rose, and the bank generated more earnings than from other customers who had not been analyzed for their risk profiles. Assisted by the analysis of controllable risks, the bank saw the idle card rates and bad debt ratios of the recommended convenience users improve over those of the customers not yet analyzed. As for the recommended Revolver User_1 and Convenience User_Middle_2 groups, their applications for cash cards with a credit limit improved more than those of the customers not on the recommended list.
Their idle card rates and bad debt ratio are also lower than those of the customers not on the recommended list. As regards Revolver User_2 and Convenience User_Short_2, whose contributions and risks are both high, analysis is conducted to confirm the risk groups, and fraud analysis is also performed to improve the accuracy of the risk analysis. Based on the study results, the list of recommended products is examined. In addition to strictly monitoring the subsequent spending of this group of high-risk cardholders, the issuing bank also adopts a stringent standard for all follow-up applications from these cardholders. After the experiment confirms the analysis results, the bank intends to modify the contractual clauses for credit cards, in order to reduce the potential losses resulting from this high-risk group. Also, a follow-up analysis is performed on the high-risk group by working with the bank to assess their credit status and incomes. The extended periods of the installment repayments are examined to identify the optimal period and number of installments. This aims to ensure that high-risk cardholders are able to make repayments, and to reduce bad debts for the bank. Finally, for Translator User_1 and Translator User_2, whose risks are low and contributions are not high, this paper suggests to the issuing bank that they be upgraded to charge cards or infinity cards based on the companies they work for, the positions they hold, their leverage, and their incomes.

4. Conclusion and future work

In conclusion, a cluster analysis of the payment history of credit card customers is conducted on the behavior patterns of the customers. The analysis results can serve as references for banks in card issuance and credit reviews, as well as a benchmark for banks to identify potential customer groups whose risks are low and controllable. An analysis is performed on the credit risks and consumption patterns of credit card customers to assist banks in cross or vertical marketing of value-added products. This paper hopes that banks can make use of its findings to enhance profits, reduce bad debts, and lower marketing and advertising expenses by avoiding the recruitment of high-risk customers. This paper only analyzes the transactional data of payments, and does not explore the potential correlation between the basic attributes of customers and transactional risks. Future studies can further examine the variables that affect credit card risks in each group by combining the basic data of customers, which can serve as a reference for banks to accurately determine high-risk groups and to adjust credit limits in a timely manner for customers with the potential to make higher contributions.

References
Berry, M. J. A., & Linoff, G. (1998). Data mining techniques: For marketing, sales, and customer support.
Berry, M. J. A., & Linoff, G. (1999). Mastering data mining. The Art & Science of Customer Relation Management.
Crovelli, R. A. (1995). The generalized 20/80 law using probabilistic fractals applied to petroleum field size. Natural Resources Research, 4(3), 15207439.
Gerritsen, R. (1999). Accessing loan risks: A data mining case study. IT Professional, 1(6), 16–21.
Grupe, F. H., & Owrang, M. M. (1995). Database mining discovering new knowledge and cooperative advantage. Information Systems Management, 12(4), 26–31.
Jordan, J. S. (1998). Problem loans at New England Banks, 1989–1992: evidence of aggressive loan policies. New England Economic Review, 23–28.
Kang, T.-S. (2007). Recent episodes of credit card distress in Asia. BIS Quarterly Review, 55–68.
Kohonen, T. (1995). Self-organizing maps. Series in information sciences (Vol. 30, 2nd ed.). Heidelberg: Springer.
Kohonen, T. (2001). Self-organizing maps (3rd ed.). New York.
Zhang, D., & Zhou, L. (2004). Discovering golden nuggets: Data mining in financial application. IEEE Transactions on Systems, Man, and Cybernetics Part C: Applications and Reviews, 4(4), 513–522.
