MARCH 2012
M S Ramaiah Institute of Technology (Autonomous Institute Affiliated to VTU) Bangalore 560 054
TABLE OF CONTENTS
1.0 Abstract
2.0 Introduction to Data Mining
2.1 Data Mining Techniques
2.2 Telecommunication Marketing
2.3 Information about WEKA
2.4 Telecommunication Fraud Detection
2.5 Network Fraud Isolation
3.0 Telecommunication Fraud
3.1 Subscription fraud
3.2 Bad Debt
3.3 Call Detail Record
4.0 Problem Definition
4.1 Algorithms used
4.2 Snapshots
5.0 Conclusion
6.0 References
1. Abstract
Huge amounts of data are being collected as a result of the increased use of mobile telecommunications. Insight into the information and knowledge derived from these databases can give operators a competitive edge in customer care and retention, marketing, and fraud detection. One fraud detection strategy checks for signs of questionable changes in user behavior. Although the intentions of mobile phone users cannot be observed directly, they are reflected in the call data that define usage patterns. Over a period of time, an individual phone generates a large pattern of use. While call data are recorded for subscribers for billing purposes, we make no prior assumptions about which data indicate fraudulent call patterns; that is, the call records kept for billing purposes are unlabeled. Further analysis is thus required to isolate fraudulent usage. An unsupervised learning algorithm can analyze and cluster call patterns for each subscriber in order to facilitate the fraud detection process. This research investigates the unsupervised learning potential of two neural networks for profiling the calls made by users over a period of time in a mobile telecommunication network. Our study provides a comparative analysis and application of Self-Organizing Map (SOM) and Long Short-Term Memory (LSTM) recurrent neural network algorithms to user call data records in order to conduct descriptive data mining on users' call patterns. Our investigation shows that both techniques can learn to discriminate user call patterns, with the LSTM recurrent neural network algorithm discriminating better than the SOM algorithm when modeling long time series. LSTM discriminates different types of temporal sequences and groups them according to a variety of features. The ordered features can later be interpreted and labeled according to the specific requirements of the mobile service provider. Thus, suspicious call behaviors are isolated within the mobile telecommunication network and can be used to identify fraudulent calls.
2. Introduction
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought on-line. When implemented on high-performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?" This white paper provides an introduction to the basic technologies of data mining. Examples of profitable applications illustrate its relevance to today's business environment, and a basic description shows how data warehouse architectures can evolve to deliver the value of data mining to end users.
3. Clustering: Clustering is a data mining technique that automatically groups objects with similar characteristics into meaningful or useful clusters. Unlike classification, in which objects are assigned to predefined classes, clustering also defines the classes themselves and then places objects in them. To make the concept clearer, take a library as an example. A library holds books on a wide range of topics. The challenge is to arrange those books so that readers can pick up several books on a specific topic without hassle. Using a clustering technique, we can keep books that are similar in some way in one cluster, or on one shelf, and label it with a meaningful name. Readers who want books on a topic then only need to go to that shelf instead of searching the whole library.
4. Prediction: Prediction, as its name implies, is a data mining technique that discovers relationships among independent variables and between dependent and independent variables. For instance, prediction analysis can be used in sales to predict future profit: if we treat sales as the independent variable, profit can be the dependent variable. Then, based on historical sales and profit data, we can draw a fitted regression curve that is used for profit prediction.
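The sales and profit example can be made concrete with a small sketch. It assumes the simplest possible "regression curve", a straight line fitted by ordinary least squares, and uses made-up figures; the function name fit_line and all numbers are illustrative only and do not come from the report.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: sales (independent), profit (dependent).
sales = [100.0, 150.0, 200.0, 250.0, 300.0]
profit = [12.0, 19.0, 24.0, 31.0, 36.0]

slope, intercept = fit_line(sales, profit)

# Predict profit for a future sales level that was not observed.
predicted_profit = slope * 350.0 + intercept
print(round(slope, 2), round(intercept, 2), round(predicted_profit, 2))
```

A real application would likely use more than one independent variable and check how well the line fits before trusting its predictions.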
5. Sequential Patterns: Sequential pattern analysis is a data mining technique that seeks to discover similar patterns in transaction data over a business period. The uncovered patterns are used in further business analysis to recognize relationships among data.
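As a rough illustration of the idea, the sketch below counts ordered pairs of events (one event occurring some time before another) across several transaction sequences and keeps the pairs that appear in at least a minimum number of sequences. This is a deliberately simplified stand-in for full sequential pattern mining algorithms; the function name, the event names and the support threshold are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(sequences, min_support):
    """Return ordered pairs (a, b), with a occurring before b,
    that are found in at least min_support sequences."""
    counts = Counter()
    for seq in sequences:
        # Collect each ordered pair once per sequence.
        pairs = set()
        for i, j in combinations(range(len(seq)), 2):
            if seq[i] != seq[j]:
                pairs.add((seq[i], seq[j]))
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical customer transaction sequences, in time order.
transactions = [
    ["sim_card", "top_up", "data_pack"],
    ["sim_card", "data_pack", "top_up"],
    ["top_up", "data_pack"],
    ["sim_card", "top_up"],
]

patterns = frequent_ordered_pairs(transactions, min_support=3)
print(patterns)
```

Here only the pattern "sim_card followed by top_up" reaches the support threshold, since it occurs in three of the four sequences.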
not covered by the algorithms included in the Weka distribution is sequence modeling. The Explorer interface features several panels providing access to the main components of the workbench:
The Preprocess panel has facilities for importing data from a database, a CSV file, etc., and for preprocessing this data using a so-called filtering algorithm. These filters can be used to transform the data (e.g., turning numeric attributes into discrete ones) and make it possible to delete instances and attributes according to specific criteria.
The Classify panel enables the user to apply classification and regression algorithms (indiscriminately called classifiers in Weka) to the resulting dataset, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions, ROC curves, etc., or the model itself (if the model is amenable to visualization like, e.g., a decision tree).
The Associate panel provides access to association rule learners that attempt to identify all important interrelationships between attributes in the data.
The Cluster panel gives access to the clustering techniques in Weka, e.g., the simple k-means algorithm. There is also an implementation of the expectation maximization algorithm for learning a mixture of normal distributions.
The Select attributes panel provides algorithms for identifying the most predictive attributes in a dataset.
The Visualize panel shows a scatter plot matrix, where individual scatter plots can be selected and enlarged, and analyzed further using various selection operators.
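To give a feel for what the simple k-means algorithm mentioned under the Cluster panel does, here is a minimal sketch in plain Python. It is not Weka's implementation and does not use Weka's API; the two-dimensional points (imagined here as calls per day and average call minutes) and the starting centroids are made up for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Basic k-means (Lloyd's algorithm) for 2-D points,
    starting from the given initial centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                         for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    (sum(p[0] for p in cluster) / len(cluster),
                     sum(p[1] for p in cluster) / len(cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical subscribers: (calls per day, average minutes per call).
points = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 2), (8, 9)])
print(centroids)
print(clusters)
```

With this data the light users and the heavy users end up in separate clusters. The result of k-means in general depends on the starting centroids, which is why they are passed in explicitly here.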
3. Telecommunication Fraud
Telecommunication fraud can be defined as the theft of services or deliberate abuse of voice or data networks. Telecommunication fraud can be broken down into several generic classes. These classes describe the mode in which the operator was defrauded, for example, subscription using a false identity. Each mode can be used to defraud the network for revenue-based or non-revenue-based purposes. Most of these frauds are perpetrated either by the fraudster impersonating someone else or by technically deceiving the network systems.
3.2 Bad Debt: Bad debt occurs when payment is not received for goods or services rendered. In a telecommunication company, for example, this covers callers or customers who appear to have originally intended to honour their bills but have since lost the ability or desire to pay. If someone does not pay their bill, the telecom company has to establish whether the person was fraudulent or merely unable to pay.
4. Problem Definition
Over a period of time, the Subscriber Identity Module (SIM) card of an individual handset generates a large pattern of use. The pattern of use may include international calls and time-varying call patterns, among others. Anomalous use can be detected within the overall pattern, for example a subscriber's abuse of free call services such as emergency services. Anomalous use can be identified as belonging to one of two types:
1. The pattern is intrinsically fraudulent; it will almost never occur in normal use. This type is relatively easy to detect.
2. The pattern is anomalous only relative to the historical pattern established for that phone.
In order to detect fraud of the second type, it is necessary to have knowledge of the history of SIM usage. Hence, a descriptive analysis of the call profile of each subscriber can be used for knowledge extraction. Interpretation by way of clustering or grouping of similar patterns can help in isolating suspicious call behaviour within the mobile telecommunication network. This can also help fraud analysts in their further investigation and call pattern analysis of subscribers. While call data are recorded for subscribers for billing purposes, no prior assumptions are made about which data indicate fraudulent call patterns. In other words, the call records kept for billing purposes are unlabeled. Further analysis is thus required to identify possible fraudulent usage. Because of the huge call volumes, it is virtually impossible to analyse the data without sophisticated techniques and tools.
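The second type of anomaly, usage that is unusual only relative to a SIM's own history, can be illustrated with a small sketch. It assumes a very simple rule that is not taken from this report: a day is flagged when its total call minutes lie more than three standard deviations from that SIM's historical mean. The SIM identifiers, the figures and the threshold are hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history, today, threshold=3.0):
    """Flag today's usage if it is more than `threshold` sample standard
    deviations away from this SIM's own historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly constant history: any change at all is unusual.
        return today != mu
    return abs(today - mu) / sigma > threshold


# Hypothetical daily call minutes for two SIMs over the past week.
daily_minutes = {
    "sim_a": [30, 35, 28, 32, 31, 29, 33],
    "sim_b": [200, 220, 210, 190, 205, 215, 195],
}
# Both SIMs show the same usage today.
today = {"sim_a": 210, "sim_b": 210}

flags = {sim: is_anomalous(daily_minutes[sim], today[sim])
         for sim in daily_minutes}
print(flags)
```

The same 210 minutes is flagged for sim_a, whose history is around 30 minutes a day, but not for sim_b, whose history is around 200. This is why knowledge of each SIM's usage history is needed for the second type of fraud.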
SNAPSHOTS
5. Conclusion
This project report has described the detection of subscription fraud and bad debt in telecommunication using pattern learning with the BayesNet and JRip algorithms. Theoretical and experimental results have been presented which show that pattern learning techniques can be useful in detecting subscription fraud and bad debt in telecommunication.