

1 INTRODUCTION TO DRILLING

Drilling is a cutting process that uses a drill bit to cut or enlarge a hole in solid materials. The drill bit is a multipoint, end-cutting tool. It cuts by applying pressure and rotation to the workpiece, which forms chips at the cutting edge.

Deep hole drilling is defined as drilling a hole whose depth is greater than five times its diameter. Such holes require special equipment to maintain straightness and tolerances; roundness and surface finish are further considerations. Deep hole drilling is generally achieved with a few tooling methods, usually gun drilling or BTA drilling. These are differentiated by the coolant entry method (internal or external) and the chip removal method (internal or external). Secondary tooling methods include trepanning, skiving and burnishing, pull boring, and bottle boring. A high-tech monitoring system is used to control force, torque, vibration, and acoustic emission. Vibration is considered a major defect in deep hole drilling and can often cause the drill to break. Special coolant is usually used to aid this type of drilling.

The cutting edges produce chips, which must keep moving outwards from the hole. This works until the chips pack too tightly, either because of deeper-than-normal holes or because of insufficient backing off (retracting the drill slightly or fully from the hole while drilling). Cutting fluid is sometimes used to ease this problem and to prolong the tool's life by cooling and lubricating the tip and the chip flow. Coolant may be introduced through holes in the drill shank, which is common when using a gun drill. In computer numerical control (CNC) machine tools a process called peck drilling, or interrupted-cut drilling, is used to keep swarf from building up detrimentally when drilling deep holes (approximately when the depth of the hole is more than three times the drill diameter).
Peck drilling involves plunging the drill part way through the workpiece, by no more than five times the drill diameter at a time, and then retracting it to the surface. This is repeated until the hole is finished. A modified form of this process, called high-speed peck drilling or chip breaking, retracts the drill only slightly. This process is faster, but is suitable only for moderately deep holes; otherwise it overheats the drill bit. It is also used when drilling stringy material, to break up the chips.
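The peck cycle described above can be expressed as a short calculation. The following is a minimal sketch, assuming each peck advances by at most five drill diameters (the limit stated in the text); the function name and the rounding are illustrative choices, not part of any CNC controller's API.

```python
# Sketch: generate the successive target depths for a peck-drilling cycle,
# assuming each plunge advances at most `peck_factor` drill diameters
# (the 5x limit mentioned in the text above).

def peck_depths(hole_depth, drill_diameter, peck_factor=5.0):
    """Return the list of depths at which the drill retracts to the surface."""
    step = peck_factor * drill_diameter   # maximum advance per plunge
    depths = []
    depth = 0.0
    while depth < hole_depth:
        depth = min(depth + step, hole_depth)
        depths.append(round(depth, 4))
    return depths

# Example: a 2.0" deep hole with a 0.125" drill needs four pecks
print(peck_depths(2.0, 0.125))  # [0.625, 1.25, 1.875, 2.0]
```

The final peck is clamped to the hole depth, so the last entry always equals the finished depth.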

1.2 DRILLING PARAMETERS

1.2.1 Feed Rate

An important aspect of the drilling operation is the user's ability to program the machine's z-axis to stroke with different drill diameters at different rates. These rates are typically engineered to achieve the best output and hole quality while avoiding tool failure or breakage. Measured in inches per minute (ipm), feed rates depend on the tool diameter, machine capability and stability, laminate material type, copper layers, and stack height (see formulas).

1.2.2 Spindle Speeds

Spindle speed settings control the rotation of the drill bit during the drilling cycle. Measured in revolutions per minute (rpm), speed also impacts the quality of the drilled hole as well as the condition of the drill bit's cutting edge. High speeds can result in excessive cutting-edge corner breakdown, or "rounding". This is an undesirable condition that results in the drill "punching" rather than shearing through the material stack. Particular materials drill or fabricate favorably at certain cutting speeds, which should remain constant throughout the drilling range except when limited by spindle capability (see formulas).

1.2.3 Retract Rates

The z-axis return stroke can be programmed as well. This rate, known as the retract rate, is measured in inches per minute (ipm) and should be set to a value that minimizes the time the drill bit spends inside the hole. Z-axis stability and drill bit diameter often influence optimum retract conditions. An unstable or well-worn z-axis can lead to poor depth control, poor hole quality, and frequent breakage. The micro and small drill bit diameter range (0.002" to 0.020") is affected most.

1.2.4 Hit Count

The number of holes, or hits, a drill bit can produce depends highly on the material through which it is drilling, the feeds and speeds, and the hole quality criteria that a PCB manufacturer has predetermined as its specification. Each drill stroke adds wear, causing the cutting edge to become slightly less effective. Typical hit count ranges are 1,000 to 2,000 hits for the "via" diameter range (0.0130" to 0.0300") and 1,500 to 3,000 for the "component" diameter range (0.032" to 0.050").

1.2.5 Z-Axis Offsets

Controlling penetration into the stack of panels is a key aspect of a good drilling process. The minimum desired depth into the backup material for each drill bit is the distance needed to clear the point angle; this ensures that the hole has not been drilled shallow. Entering this value into the controller as a z-axis offset goes a long way toward controlling penetration into the backup material. The point clearance is directly related to the drill bit's diameter and point angle: as the diameter increases, so does the distance needed to clear the point angle (see formulas).

1.2.6 Surface Feet per Minute (SFPM or SFM)

Surface feet per minute is a unit of velocity used in machining to identify the machinability rating of a material. SFM combines the tool diameter and the rotational velocity (RPM) of the spindle of a milling machine or lathe, measured in feet per minute. The faster the spindle turns, and/or the larger the diameter, the higher the SFM. The goal is to tool a job to run at as high an SFM as possible to increase hourly part production, although some materials run better at specific SFMs. When the SFM is known for a specific material (e.g. 303 annealed stainless steel = 120 SFM for high-speed steel tooling), a formula can be used to determine the spindle speed for live tools or for turning materials. SFM can be calculated using the following equation:

SFM = RPM x 0.262 x Tool Diameter

1.2.7 Chip Load

Chip load is a term commonly used by tool manufacturers; the values given by tool providers specify a certain chip load for each tool. In general terms, chip load refers to the thickness of the chip after it has been machined. To target the proper chip load, the following formulas relating speeds and feeds can be used:

Surface Feet per Minute = RPM x 0.262 x Tool Diameter
RPM = Surface Feet per Minute x 3.82 / Tool Diameter
Feed Rate (inches per minute) = RPM x Chip Load per Tooth x Number of Flutes
Chip Load per Tooth = Feed Rate (inches per minute) / (RPM x Number of Flutes)

The chip load is also a measure of how much heat is carried away from the cut, and consequently from the cutting tool. If not enough heat is carried away, the tool overheats and does not last as long as expected. Manufacturers must consider the other factors responsible for heating as well.
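The formulas above, together with the point-clearance relation from Section 1.2.5, can be sketched in Python. The constants 0.262 and 3.82 are the usual rounded machining approximations of pi/12 and 12/pi; the 118-degree default point angle in point_clearance is an assumption (a common drill point angle), not a value given in the text.

```python
import math

# Sketch of the feed/speed formulas above, in inch units. The 118-degree
# default point angle below is an assumed common value, not from the text.

def sfm(rpm, diameter):
    """Surface feet per minute from spindle speed and tool diameter."""
    return rpm * 0.262 * diameter

def rpm_from_sfm(surface_feet, diameter):
    """Spindle speed needed to reach a target SFM with a given diameter."""
    return surface_feet * 3.82 / diameter

def feed_ipm(rpm, chip_load_per_tooth, flutes):
    """Feed rate in inches per minute."""
    return rpm * chip_load_per_tooth * flutes

def chip_load(ipm, rpm, flutes):
    """Chip load per tooth recovered from the feed rate."""
    return ipm / (rpm * flutes)

def point_clearance(diameter, point_angle_deg=118.0):
    """Axial distance to clear the drill point: (D/2) / tan(angle/2)."""
    return (diameter / 2.0) / math.tan(math.radians(point_angle_deg / 2.0))

# Example: 303 annealed stainless at 120 SFM with a 0.5" HSS drill, 2 flutes
spindle = rpm_from_sfm(120, 0.5)               # 916.8 rpm
print(round(spindle, 1))
print(round(feed_ipm(spindle, 0.002, 2), 4))   # ipm at 0.002 in/tooth
print(round(point_clearance(0.5), 4))          # z-offset to clear the point
```

Note that sfm and rpm_from_sfm are approximate inverses of each other, up to the rounding in the two constants.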

1.3 INTRODUCTION TO DATA MINING

We are surrounded by data, numerical and otherwise, which must be analyzed and processed to convert it into information that informs, instructs, answers, or otherwise aids understanding and decision-making. Due to the ever-increasing complexity and size of today's data sets, a new term, data mining, was created to describe the indirect, automatic data analysis techniques that utilize more complex and sophisticated tools than those analysts used in the past for mere data analysis. Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. It is most useful in an exploratory analysis scenario in which there are no predetermined notions about what will constitute an "interesting" outcome. Data mining is the search for new, valuable, and nontrivial information in large volumes of data. It is a cooperative effort of humans and computers: the best results are achieved by balancing the knowledge of human experts in describing problems and goals with the search capabilities of computers.

In practice, the two primary goals of data mining tend to be prediction and description. Prediction involves using some variables or fields in the data set to predict unknown or future values of other variables of interest. Description, on the other hand, focuses on finding patterns describing the data that can be interpreted by humans. Therefore, it is possible to put data-mining activities into one of two categories: 1. Predictive data mining, which produces a model of the system described by the given data set, or 2. Descriptive data mining, which produces new, nontrivial information based on the available data set.

1.3.1 Data Mining Software: Weka

Weka is an open-source Java application produced by the University of Waikato in New Zealand. The name stands for the Waikato Environment for Knowledge Analysis. (Also, the weka, pronounced to rhyme with Mecca, is a flightless bird with an inquisitive nature found only on the islands of New Zealand.) This software bundle features an interface through which many of the aforementioned algorithms (including decision trees) can be applied to preformatted data sets. Using this interface, several test domains were experimented with to gain insight into the effectiveness of different methods of pruning an algorithmically induced decision tree. The system is written in Java, an object-oriented programming language that is widely available for all major computer platforms, and Weka has been tested under Linux, Windows, and Macintosh operating systems. Java makes it possible to provide a uniform interface to many different learning algorithms, along with methods for pre- and post-processing and for evaluating the results of learning schemes on any given dataset.

There are several different levels at which Weka can be used. First of all, it provides implementations of state-of-the-art learning algorithms that can be applied to a dataset from the command line. It also includes a variety of tools for transforming datasets, such as algorithms for discretization. In this work, the data sets were pre-processed and fed into a learning scheme, and the resulting classifier and its performance were analyzed, all without writing any program code. As an example to get started, we have explained how to transform a spreadsheet into a dataset with the right format for this process, and how to build a decision tree from it.



In this chapter, a few selected research papers related to the application of data mining techniques in various fields of manufacturing, particularly those using the J48 algorithm, are discussed. Z. Song and A. Kusiak [1] optimised product configurations with a data-mining approach. The development of the sub-assemblies and configurations allows for effective management of enterprise resources, contributes to the innovative design of new products, and streamlines manufacturing and supply chain processes.

J. Jenkol, P. Kralj, N. Lavrac and A. Sluga [2] described a data mining and visualization experiment performed on a real-world problem and the results achieved in the experiment. A subgroup discovery algorithm was used to find useful patterns in production data, and visualization plots of workshop and manufacturing work-system utilization were created to extract useful information. Chun-Che Huang, Yu-Neng Fan, Tzu-Liang Tseng, Chia-Hsun Lee, and Horng-Fu Chuang [3] improved the effectiveness of a QA process by proposing a hybrid approach incorporating data mining techniques such as rough set theory (RST), fuzzy logic (FL), and genetic algorithms (GA). Based on an empirical case study, the proposed approach shows great promise in QA. Hovhannes Sadoyan, Armen Zakarian, and Pravansu Mohanty [4] presented a new data mining algorithm based on rough set theory for manufacturing process control. The algorithm extracts useful knowledge from large datasets obtained from the manufacturing process and represents this knowledge using if/then decision rules. An application of the data mining algorithm was presented with the industrial example of rapid tool making (RTM).

A. Kusiak [5] provided a framework for organizing and applying knowledge for decision-making in manufacturing and service applications. The framework uses decision-making constructs such as decision tables, decision maps, and atlases. It offers a new data-driven paradigm of importance to modern manufacturing and service organisations. The fact that the proposed data mining approach involves two phases, learning and decision-making, makes it suitable for system-on-a-chip applications.

Yan Cao, Xu Xu, and Fengru Sun [6] studied the optimization of production scheduling in FMS using data mining techniques. Waiting time is considered in finding the final results, and the best production schedule was obtained through experiments. Because the approach was tested in a multi-agent distributed environment, the measured waiting times are not very exact owing to network traffic.

F. Chien, C. Cheng, and S. Lin [7] found, from a decision tree result, that the left leaf in the first split and the left leaf in the second split can be identified as low-yield groups, while the right leaf in the second split can be regarded as a normal-yield group. In the first split, the combined attribute Ta60-Ta47 and the combined level AT02-BT25 imply that the 19 wafer lots of the left leaf node passed through machine AT02 in station Ta60 and machine BT25 in station Ta47; these two machines may be causing the low yield. On the left leaf in the second split, machines T702 and T707 in station Ta17 are responsible for further low yield. The decision tree model shows that these two production paths cause low yield.

N. Saravanan, V.N.S. Kumar Siddabattuni, and K.I. Ramachandran [8] observed that vibration signals extracted from the rotating parts of machinery carry a great deal of information about the condition of the operating machine. Further processing of these raw vibration signatures, measured at a convenient location on the machine, reveals the condition of the component or assembly under study. The effectiveness of wavelet-based features for fault diagnosis of a gear box using artificial neural networks (ANN) and proximal support vector machines (PSVM) was evaluated. The statistical feature vectors from Morlet wavelet coefficients were classified using the J48 algorithm, the predominant features were fed as input for training and testing the ANN and PSVM, and their relative efficiency in classifying faults in the bevel gear box was compared.



3.1 RESOURCES

The analyses necessary for this research study were performed using data belonging to the drilling of high-speed steel and other alloys. The data used in this analysis were provided by Balasore Alloys Limited.

A specimen of the total data collected is shown below. A total of 998 data records were collected.
Diameter | Feed | Speed | Retract | Z Axis Offset | Chipload | SFM | Quality
0.1086585 | 0.513361985 | 79.89265013 | 16.36076331 | -0.061859888 | 2.817973972 | 84.79763191 | EXCELLENT
0.112141358 | 0.527872899 | 79.87316392 | 12.29563048 | -0.029388129 | 2.467239955 | 85.38609306 | EXCELLENT
0.120552213 | 0.530648593 | 79.85429354 | 20.32606623 | -0.037886548 | 2.982751972 | 85.64652571 | EXCELLENT
0.127234298 | 0.530819745 | 79.8522001 | 10.86665913 | -0.034205681 | 2.947717266 | 88.53517819 | EXCELLENT
0.152475291 | 0.534387291 | 79.84236377 | 13.22727678 | -0.074924447 | 0.402537769 | 88.63111598 | EXCELLENT
0.165651014 | 0.540339154 | 79.81831377 | 11.14174929 | -0.038688605 | 1.671095186 | 89.67022839 | EXCELLENT
0.168806918 | 0.545056536 | 79.74517719 | 11.38931781 | -0.072636232 | 2.756054418 | 89.72225957 | EXCELLENT
0.169621666 | 0.547730097 | 79.73804226 | 20.26513073 | -0.08743773 | 1.753569089 | 90.17638919 | EXCELLENT
0.171576726 | 0.576543334 | 79.72404178 | 19.92893382 | -0.060908453 | 1.475009744 | 90.35996853 | EXCELLENT
0.17552562 | 0.580731323 | 79.62042945 | 9.768950138 | -0.032711351 | 2.042547487 | 91.0407287 | EXCELLENT
0.185297519 | 0.583917598 | 79.31934617 | 14.37331672 | -0.06029146 | 0.738900117 | 91.52268984 | EXCELLENT
0.189019013 | 0.594368998 | 79.29943992 | 20.10202755 | -0.091866705 | 0.685848963 | 91.64462466 | EXCELLENT
0.19107429 | 0.602243043 | 79.21143253 | 16.19700965 | -0.09240698 | 1.723110572 | 93.59791553 | EXCELLENT
0.19296783 | 0.611946221 | 79.16828057 | 10.36968562 | -0.04816706 | 1.222392582 | 93.71929665 | EXCELLENT
0.202410992 | 0.614131222 | 79.12469405 | 24.12203609 | -0.084534852 | 2.657792912 | 94.74600869 | EXCELLENT
0.208783886 | 0.614874167 | 79.10627561 | 5.526320205 | -0.087342912 | 0.678176735 | 96.227494 | EXCELLENT
0.211329531 | 0.61659959 | 78.99246263 | 24.21234543 | -0.044299525 | 0.731771912 | 96.5758212 | EXCELLENT
0.224048931 | 0.623243632 | 78.96402536 | 11.69494947 | -0.075095848 | 0.341717842 | 96.83865853 | EXCELLENT
0.229539082 | 0.634594868 | 78.9456239 | 24.61251058 | -0.036386859 | 1.317494903 | 97.00844415 | EXCELLENT
0.230637988 | 0.638548678 | 78.93568073 | 19.99659831 | -0.04489915 | 2.695318024 | 97.6029899 | EXCELLENT
0.23515761 | 0.639974003 | 78.840189 | 20.20856306 | -0.083742914 | 0.759419985 | 97.8567528 | EXCELLENT
0.237647225 | 0.641617266 | 78.71534216 | 12.80592969 | -0.083813552 | 2.380471939 | 98.02471268 | EXCELLENT
0.244179222 | 0.642031107 | 78.5511776 | 20.85597559 | -0.067077662 | 0.981761503 | 98.72816902 | EXCELLENT
0.253925462 | 0.642601853 | 78.54912828 | 10.21940039 | -0.094007945 | 2.846951324 | 99.05406408 | EXCELLENT
0.271632201 | 0.649840312 | 78.52361163 | 7.688124534 | -0.091929694 | 0.411691273 | 100.5406926 | VERY GOOD
0.280886373 | 0.668036775 | 78.52171478 | 5.262579351 | -0.060731557 | 2.592573193 | 100.8972393 | VERY GOOD
0.283015816 | 0.669046352 | 78.46054696 | 5.742639889 | -0.060723808 | 0.510032174 | 101.5650004 | VERY GOOD
0.289595628 | 0.669064939 | 78.42563695 | 11.92961321 | -0.055458264 | 1.773668725 | 101.5813657 | VERY GOOD
0.291755787 | 0.677430135 | 78.35573372 | 9.769720311 | -0.043327846 | 2.51539749 | 101.5871168 | VERY GOOD

Table 1: A sample of the data collected from Balasore Alloys, containing 30 records

The input data are Diameter, Speed, SFM, Feed, and Z Axis Offset; the output data include Chipload and Quality.

Only Quality is nominal data; the rest are numerical. Quality takes five nominal values: EXCELLENT, VERY GOOD, GOOD, AVERAGE, and BAD.

The sorting of Quality is based on the grading provided by Balasore Alloys Ltd according to the following criteria:

Grades: A, B, C, E, F


Table 2: Grading of Quality on the basis of surface finish

The company has classified the data in terms of surface finish only. But since surface properties are affected by several parameters, quality thus also depends on the input parameters. The data do not show any clear relationship between them; this relationship will be investigated with data mining tools and techniques. The experiment was performed on a MAXIMART-T Series CNC drilling machine with a standard drill bit. The workpiece is made of 1018 cold-rolled mild steel.

3.2 MODELS

Decision trees and decision rules are data-mining methodologies applied in many real-world applications as a powerful solution to classification problems. Therefore, at the beginning, let us briefly summarize the basic principles of classification. In general, classification is a process of learning a function that maps a data item into one of several predefined classes. Every classification based on inductive-learning algorithms is given as input a set of samples that consist of vectors of attribute values (also called feature vectors) and a corresponding class. The goal of learning is to create a classification model, known as a classifier, which will predict, from the values of the available input attributes, the class for some entity (a given sample). In other words, classification is the process of assigning a discrete label value (class) to an unlabeled record, and a classifier is a model (a result of classification) that predicts one attribute, the class of a sample, when the other attributes are given.

A more formalized approach to classification problems is given through a graphical interpretation. A data set with N features may be thought of as a collection of discrete points (one per example) in an N-dimensional space. A classification rule is a hypercube in this space, and the hypercube contains one or more of these points. When there is more than one cube for a given class, all the cubes are OR-ed to provide a complete classification for the class; within a cube, the conditions for each part are AND-ed. The size of a cube indicates its generality: the larger the cube, the more vertices it contains and the more sample points it potentially covers. In a classification model, the connection between classes and other properties of the samples can be defined by something as simple as a flowchart or as complex and unstructured as a procedure manual. Data-mining methodologies restrict discussion to formalized, "executable" models of classification, and there are two very different ways in which they can be constructed. On the one hand, the model might be obtained by interviewing the relevant expert or experts; most knowledge-based systems have been built this way, despite the well-known difficulties of taking this approach. Alternatively, numerous recorded classifications might be examined and a model constructed inductively by generalizing from specific examples; this is the route of primary interest for data-mining applications. The statistical approach to classification gives one type of model for classification problems: summarizing the statistical characteristics of the set of samples. The other approach is based on logic. Instead of using mathematical operations like addition and multiplication, the logical model is based on expressions that are evaluated as true or false by applying Boolean and comparative operators to the feature values.
These logical methods of modeling give accurate classification results compared with other, non-logical methods, and they have superior explanatory characteristics. Decision trees and decision rules are typical data-mining techniques belonging to the class of methodologies that give their output in the form of logical models. There exist many prominent machine learning algorithms used in modern computing applications. These algorithms have been engineered over the last several decades, and many are open source and available for public use and modification. A common application for these algorithms involves decision-based classification and adaptive learning over a training set.


The decision tree is a popular utility for implementing such tactics. A decision tree is a decision-modelling tool that graphically displays the classification process of a given input for given output class labels. This work discusses the algorithmic induction of decision trees and how varying methods for optimizing the tree, or pruning tactics, affect the classification accuracy on a testing set of data. Pruning decision trees is a fundamental step in optimizing both the computational efficiency and the classification accuracy of such a model. Applying pruning methods to a tree usually reduces the size of the tree (the number of nodes), avoiding unnecessary complexity and over-fitting of the data set when classifying new data. For example, when a test node is replaced with a class label, some training examples that reach that node will be classified correctly and some incorrectly; whether the resulting error is acceptable depends on the error tolerance for the data set and on how dominant the majority classification at that node is. Instead of replacing a test node with a class label, a smaller sub-tree can be inserted in its place; this would involve replacing a node such as t2 with a smaller test node, such as t5. In either case, the resulting tree is smaller and ideally more efficient than the pre-pruned tree. The process of pruning traditionally begins at the bottom of the tree (the child leaves) and propagates upwards.

3.4 THE J48 DECISION TREE INDUCTION ALGORITHM AND MONK

The algorithm used by Weka and the MONK project is known as J48. J48 is a version of the very popular C4.5, an earlier algorithm developed by J. Ross Quinlan. Decision trees are a classic way to represent information from a machine learning algorithm, and offer a fast and powerful way to express structures in data.
It is important to understand the variety of options available when using this algorithm, as they can make a significant difference in the quality of results. In many cases the default settings will prove adequate, but in others each choice may require some consideration. The J48 algorithm offers several options related to tree pruning. Many algorithms attempt to "prune", or simplify, their results. Pruning produces fewer, more easily interpreted results; more importantly, it can be used as a tool to correct for potential overfitting.

The basic algorithm described above recursively classifies until each leaf is pure, meaning that the data has been categorized as close to perfectly as possible. This ensures maximum accuracy on the training data, but it may create excessive rules that only describe particular idiosyncrasies of that data; when tested on new data, such rules may be less effective. Pruning always reduces the accuracy of a model on the training data, because it employs various means to relax the specificity of the decision tree in the hope of improving its performance on test data. The overall concept is to gradually generalize the decision tree until it reaches a balance of flexibility and accuracy. J48 employs two pruning methods. The first is known as subtree replacement: nodes in a decision tree may be replaced with a leaf, reducing the number of tests along a certain path. This process starts from the leaves of the fully formed tree and works backwards toward the root. The second type of pruning used in J48 is termed subtree raising: a node may be moved upwards towards the root of the tree, replacing other nodes along the way. Subtree raising often has a negligible effect on decision tree models, and there is often no clear way to predict the utility of the option, though it may be advisable to try turning it off if the induction process is taking a long time, since subtree raising can be computationally complex. Error rates are used to make the actual decisions about which parts of the tree to replace or raise. There are multiple ways to do this. The simplest is to reserve a portion of the training data as test data for the decision tree, helping to overcome potential overfitting; this approach is known as reduced-error pruning.
Though the method is straightforward, it also reduces the overall amount of data available for training the model, so for particularly small datasets it may be advisable to avoid reduced-error pruning. Other error-rate methods statistically analyze the training data and estimate the amount of error inherent in it. The mathematics is somewhat complex, but this approach seeks to forecast the natural variance of the data and to account for that variance in the decision tree. It requires a confidence threshold, which by default is set to 25 percent. This option is important for determining how specific or general the model should be.


If the training data is expected to conform fairly closely to the data on which the model will be used, this figure can be raised; conversely, if the model performs poorly on new data, try decreasing the confidence factor to produce a more heavily pruned (i.e., more generalized) tree. There are several other options that determine the specificity of the model. The minimum number of instances per leaf is one powerful option: it dictates the lowest number of instances that can constitute a leaf. The higher the number, the more general the tree; lowering the number produces more specific trees, as the leaves become more granular. The binary split option is used with numerical data. If turned on, this option takes any numeric attribute and splits it into two ranges using an inequality, greatly limiting the number of possible decision points. Rather than allowing multiple splits based on numeric ranges, this option effectively treats the attribute like a nominal value; turning it on encourages more generalized trees. There is also an option for using Laplace smoothing for predicted probabilities. Laplace smoothing prevents probabilities from ever being calculated as zero, mainly to avoid the complications that zero probabilities can cause. The MONK naive Bayes documentation offers a brief tutorial on probability that may be useful for understanding this point. The most basic parameter is the tree pruning option itself. If you decide to employ tree pruning, you will need to consider the options above. Be aware that, depending on how the training and test data have been defined, the performance of an unpruned tree may superficially appear better than that of a pruned one; as described above, this can be a result of overfitting. It is important to experiment with models by intelligently adjusting these parameters. Often, only repeated experiments and familiarity with the data will tease out the best set of options.
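The pruning trade-off described in this section can be illustrated outside Weka. J48 itself is part of Weka (Java); the sketch below uses Python's scikit-learn instead, where ccp_alpha and min_samples_leaf are rough analogues of, not equivalents to, J48's confidence factor and minimum-instances-per-leaf options. The iris dataset merely stands in for the drilling data.

```python
# Sketch: pruned vs. unpruned decision trees in scikit-learn, as a stand-in
# illustration for J48's pruning options (which live in Weka, not Python).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fully grown tree: leaves are driven to purity on the training data
unpruned = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pruned tree: cost-complexity pruning plus a minimum leaf size,
# analogous in spirit to J48's confidence factor and minNumObj
pruned = DecisionTreeClassifier(
    random_state=42, ccp_alpha=0.02, min_samples_leaf=5).fit(X_train, y_train)

# Pruning can only lower training accuracy, but the smaller tree
# often holds up as well or better on unseen data.
print(unpruned.score(X_train, y_train), pruned.score(X_train, y_train))
print(unpruned.score(X_test, y_test), pruned.score(X_test, y_test))
print(unpruned.tree_.node_count, pruned.tree_.node_count)
```

Comparing node counts and the two score pairs makes the text's claim concrete: the pruned tree is smaller and never beats the unpruned tree on the training data, yet may match or beat it on the test data.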




The process is to collect the drilling data from different drilling sites in real time and store them in a central computer. The optimum drilling parameters thus determined, when utilized, will reduce drilling-related costs and give the desired output. Following analysis of the drilling parameter database, a relation between the drilling parameters and the surface finish trend could be determined, and during actual drilling activity a predetermined surface finish could be agreed between the parties (such as operator and contractor).

Step 1: Pre-processing of Data

A spreadsheet was created using MS Excel containing all the input and output data (viz. Diameter, Feed, Speed, Retract, Z Axis Offset, Chipload, SFM, Quality) in tabular format. The spreadsheet, which was in .xls format, was saved as .csv.

Figure 1: Screenshot of Data pre-processing using MS Excel


Step 2: Feeding Data into Weka

The .csv file was opened using Weka Explorer and saved in the .arff file format (Attribute-Relation File Format), a format made exclusively for Weka. Weka Explorer automatically removed noise and found the trend between quality and the various input parameters. Quality was plotted as a function of the controllable parameters.
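The .csv to .arff conversion performed here through Weka Explorer can also be written out by hand, which makes the ARFF structure (@relation, @attribute, @data) explicit. The sketch below is illustrative: the attribute names and nominal Quality values follow the dataset described above, while the file paths and relation name are hypothetical.

```python
import csv

# Sketch: write a minimal ARFF file from a CSV whose last column is the
# nominal Quality attribute and whose other columns are numeric.
# File names and the relation name are illustrative.

def csv_to_arff(csv_path, arff_path, relation="drilling"):
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    with open(arff_path, "w") as out:
        out.write(f"@relation {relation}\n\n")
        for name in header[:-1]:
            # numeric input attributes (spaces are not ARFF-friendly)
            out.write(f"@attribute {name.replace(' ', '_')} numeric\n")
        out.write("@attribute Quality "
                  "{EXCELLENT,VERY_GOOD,GOOD,AVERAGE,BAD}\n\n")
        out.write("@data\n")
        for row in data:
            out.write(",".join(v.replace(" ", "_") for v in row) + "\n")
```

The nominal values with spaces (e.g. VERY GOOD) are written with underscores, since ARFF nominal specifications would otherwise need quoting.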

Figure 2: Screenshot of Data Feeding into Weka

Step 3: Data Classification

We selected the nominal attribute (Quality) and, in the Classify section, selected J48 under the tree category in the classifier tree. After pressing the Start button, we observed the result in the Classifier output section.


Figure 3: Screenshot of data classification using the J48 decision tree

Step 4: Data Visualisation

This is found in the Visualise tab. Several scatter plots are obtained which show the relationships between the various parameters. The noise level (Jitter) and the resolution of the plots can be adjusted using the scroll bars.

Figure 4: Screenshot of Data visualisation using Scatter plot


CHAPTER 4 RESULTS AND DISCUSSION

4.1 HISTOGRAMS

4.1.1 Relation with Feed

Feed varies from 0.51 to 5.45. It has been classified into 10 sub-groups of equal width in ascending order of feed rate. For the first and second groups, all products have excellent quality. For the third and fourth groups, most products have very good quality. For the 5th and 6th groups, most products have good quality. For the 7th and 8th groups, most products have average quality, while in the 9th and 10th groups most have poor quality.
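The equal-width grouping used for this histogram can be sketched as a small helper. The 0.51 to 5.45 feed range comes from the text; the function name and sample values are illustrative.

```python
# Sketch: map a feed value to its 1-based equal-width bin out of 10,
# using the 0.51-5.45 feed range given in the text.

def bin_index(value, lo=0.51, hi=5.45, bins=10):
    """Return which of the `bins` equal-width groups `value` falls into."""
    width = (hi - lo) / bins
    idx = int((value - lo) / width) + 1
    return min(max(idx, 1), bins)   # clamp the endpoints into range

print(bin_index(0.60))   # near the bottom of the range -> group 1
print(bin_index(5.45))   # the top endpoint clamps into group 10
```

The clamp keeps the upper endpoint in the last group rather than spilling into a non-existent eleventh bin.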

Figure 5: Histogram showing relationship wrt feed 4.1.2 Relation with Speed The speed varies from 20 rpm to 80 rpm. Like feed, speed has also been classified into 10 sub-groups. In the 1st and 2nd groups most of the products have poor quality while some have average quality. In the 3rd and 4th groups all products have good quality. In the 5th and 6th groups most have good quality. In the 7th and 8th groups most products have very good quality, and in the 9th and 10th groups most of the products are of excellent quality.

Figure 6: Histogram showing relationship wrt speed


4.1.3 Relation with Other Parameters

From the figure it is clearly visible that there is hardly any variation in quality with z-axis offset or chip load. Only diameter, feed and speed affect SFM and Quality.

Figure 7: Other histograms 4.2 SCATTER PLOT Scatter plots were obtained during visualisation of the data. The Jitter control adds a small random displacement to each point so that overlapping observations can be distinguished; it was kept low enough that the underlying, roughly linear trends remain visible, and points with very large deviation were disregarded. PlotSize and PointSize were used to adjust the resolution of the plots.


4.2.1 Diameter v/s Quality

Figure 8: Diameter v/s Quality It shows that Quality remains constant with respect to diameter within a particular range; for higher diameters Quality becomes poorer. The increasing inertial effect at higher diameters may account for this variation. 4.2.2 Diameter v/s SFM

Figure 9: Diameter v/s SFM A linear plot has been obtained with slight disturbances. The randomness has been reduced to an optimal level by the noise reduction applied during data mining.


4.2.3 Feed v/s Quality

Figure 10: Feed v/s Quality plot It shows the same variation as the Diameter v/s Quality plot, which indicates that diameter and feed have a similar effect on Quality and that diameter and feed are linearly related. 4.2.4 Feed v/s SFM

Figure 11: Feed v/s SFM plot As expected, Feed v/s SFM shows a trend similar to the Diameter v/s SFM plot: a linearly increasing plot with slight disturbances.


4.2.5 Speed v/s Quality

Figure 12: Speed v/s Quality plot Speed shows just the reverse of the trend shown by feed and diameter, indicating an inverse relationship between speed on the one hand and diameter and feed on the other. Increasing speed changes quality grade-wise, while within a grade quality is independent of speed. 4.2.6 Combined Effect of Speed and Feed on Quality

Figure 13: Combined effect of Speed and Feed v/s Quality plot


The high concentration of pink dots at high feed and low speed shows that the surface finish there is poor. On the other hand, the high concentration of blue dots at low feed and high speed shows that the surface finish there is excellent. 4.3 PRUNED TREE A pruned tree was obtained as the classifier output by using the J48 algorithm. It is represented in hierarchical form as shown below.

It clearly shows the relationship between Speed, Feed and Retract, which can be simplified into a flowchart as shown below.



Feed <= 2.51
|   Speed <= 59.72
|   |   Feed <= 1.49
|   |   |   Speed <= 39.02: Result = C
|   |   |   Speed > 39.02: Result = D
|   |   Feed > 1.49: Result = C
|   Speed > 59.72: Result = E
Feed > 2.51
|   Feed <= 4.487
|   |   Speed <= 59.85: Result = C
|   |   Speed > 59.85: Result = B
|   Feed > 4.487
|   |   Speed <= 43.1
|   |   |   Retract <= 16.48: Result = A
|   |   |   Retract > 16.48: Result = B
|   |   Speed > 43.1
|   |   |   Retract <= 8.587: Result = A
|   |   |   Retract > 8.587: Result = B

Figure 14: Flowchart showing the decision tree
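One possible reading of the flowchart in Figure 14, expressed as nested conditions; the branch nesting is our transcription of the figure, so treat it as an assumption:

```python
def quality_grade(feed, speed, retract):
    """Grade (A-E) predicted by the pruned J48 tree of Figure 14,
    as transcribed from the flowchart."""
    if feed <= 2.51:
        if speed <= 59.72:
            if feed <= 1.49:
                return "C" if speed <= 39.02 else "D"
            return "C"
        return "E"
    if feed <= 4.487:
        return "C" if speed <= 59.85 else "B"
    if speed <= 43.1:
        return "A" if retract <= 16.48 else "B"
    return "A" if retract <= 8.587 else "B"

print(quality_grade(1.0, 70.0, 5.0))   # low feed, high speed
print(quality_grade(5.0, 40.0, 10.0))  # high feed, low speed
```

Written this way, the tree makes the ordering of the splits explicit: feed is tested first, speed second, and Retract only matters in the high-feed branch.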



Using data-mining techniques we established the relationships between the various input and output parameters. From the decision tree it is clearly seen that quality grading depends basically on Feed, Diameter and Speed. Retract plays a prominent part at higher speeds, where it improves quality. The scatter plots show the trends and relationships of the various combinations of input and output data. It is concluded that diameter and feed change the quality grade-wise, but within a grade the plots are constant, showing no effect; feed and diameter, however, continuously increase the SFM. Speed, on the other hand, has just the opposite effect on Quality: we observed a declining curve, a grade-wise decline, which again is the inverse of the relationship obtained with respect to diameter and feed. Relationships between the input parameters themselves can thus also be drawn using data-mining techniques.


