
A Novel Aggregations Approach for Preparing Datasets

CHAPTER 1 INTRODUCTION
In a relational database, especially one with normalized tables, significant effort is required to prepare a summary data set that can be used as input for a data mining or statistical algorithm. Most algorithms require as input a data set with a horizontal layout, with several records and one variable or dimension per column. That is the case for models such as clustering, classification, regression, and PCA. Each research discipline uses different terminology for the data set: data mining commonly speaks of points and dimensions, the statistics literature of observations and variables, and machine learning of instances and features. This article introduces a new class of aggregate functions that can be used to build data sets in a horizontal layout (denormalized, with aggregations), automating SQL query writing and extending SQL capabilities. We show that evaluating horizontal aggregations is a challenging and interesting problem, and we introduce alternative methods and optimizations for their efficient evaluation.

1.1 PROBLEM STATEMENT:
Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations have limitations for preparing data sets because they return one column per aggregated group. In general, a significant manual effort is required to build data sets where a horizontal layout is needed. We propose simple, yet powerful, methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized layout (e.g., point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms. Our proposed horizontal aggregations provide several unique features and advantages. First, they represent a template to generate SQL code from a data mining tool. Such SQL code automates writing SQL queries, optimizing them, and testing them for correctness.

1.2 MOTIVATION:
Building a suitable data set for data mining purposes is a time-consuming task. This task generally requires writing long SQL statements or customizing SQL code if it is automatically generated by some tool. There are two main ingredients in such SQL code: joins and aggregations; we focus on the second one. The most widely known aggregation is the sum of a column over groups of rows. Other aggregations return the average, maximum, minimum, or row count over groups of rows. There exist many aggregation functions and operators in SQL. Unfortunately, all these aggregations have limitations for building data sets for data mining purposes. The main reason is that, in general, data sets stored in a relational database (or a data warehouse) come from On-Line Transaction Processing (OLTP) systems where database schemas are highly normalized, but data mining, statistical, or machine learning algorithms generally require aggregated data in summarized form. Based on the functions and clauses currently available in SQL, a significant effort is required to compute aggregations when they are desired in a cross-tabular (horizontal) form, suitable for use by a data mining algorithm. Such effort is due to the amount and complexity of SQL code that needs to be written, optimized, and tested. There are further practical reasons to return aggregation results in a horizontal (cross-tabular) layout. Standard aggregations are hard to interpret when there are many result rows, especially when grouping attributes have high cardinalities. To analyze tables exported to spreadsheets, it may be more convenient to have all aggregations on the same group in one row (e.g., to produce graphs or to compare data sets with repetitive information). OLAP tools generate SQL code to transpose results (sometimes called PIVOT). Transposition can be more efficient if there are mechanisms combining aggregation and transposition together. With such limitations in mind, we propose a new class of aggregate functions that aggregate numeric expressions and transpose results to produce a data set with a horizontal layout. Functions belonging to this class are called horizontal aggregations. Horizontal aggregations represent an extended form of traditional SQL aggregations, which return a set of values in a horizontal layout (somewhat similar to a multidimensional vector) instead of a single value per row.


1.3 SCOPE:
Data mining algorithms require input in a cross-tabular (horizontal) form, and significant effort is required to compute the necessary aggregations. This effort is due to the amount and complexity of SQL code that needs to be written, optimized, and tested.

1.4 OUTLINE:
Data aggregation is a process in which information is gathered and expressed in summary form, for purposes such as statistical analysis. A common aggregation purpose is to get more information about particular groups based on specific variables such as age, name, phone number, address, profession, or income. Most algorithms require as input a data set with a horizontal layout, with several records and one variable or dimension per column. This layout is used by models like clustering, classification, regression, and PCA, where each record is treated as a point and each column as a dimension.


CHAPTER 2 BACKGROUND
Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. SQL aggregations have limitations for preparing data sets because they return one column per aggregated group. In general, a significant manual effort is required to build data sets where a horizontal layout is needed. We describe methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized layout (e.g., point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms.

2.1 WHAT IS SQL:


SQL (Structured Query Language) is a widely used database language, providing means of data manipulation (store, retrieve, update, delete) and database creation. SQL can insert data into database tables, modify data in existing tables, and delete data from tables. Finally, SQL can modify the database structure itself by creating, modifying, and deleting tables and other database objects. SQL uses a set of commands to manipulate the data in relational databases. For example, SQL INSERT is used to insert data into database tables, SQL SELECT is used to retrieve data from one or more tables, and SQL UPDATE is used to modify existing database records.
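To make these commands concrete, here is a minimal sketch against a hypothetical Users table (the table and its columns are illustrative, not part of this project's schema):

-- Create a simple table (hypothetical schema for illustration)
CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    UserName VARCHAR(50),
    City VARCHAR(50)
);

-- INSERT adds a new row to the table
INSERT INTO Users (UserID, UserName, City) VALUES (1, 'Ravi', 'Chennai');

-- SELECT retrieves rows, optionally filtered with WHERE
SELECT UserID, UserName FROM Users WHERE City = 'Chennai';

-- UPDATE modifies existing rows
UPDATE Users SET City = 'Hyderabad' WHERE UserID = 1;

-- DELETE removes rows
DELETE FROM Users WHERE UserID = 1;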

2.2 WHAT IS DATA MINING:


Data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is also known as Knowledge Discovery in Data (KDD). Data mining can answer questions that cannot be addressed through simple query and reporting techniques.

The key properties of data mining are:

- Automatic discovery of patterns
- Prediction of likely outcomes
- Creation of actionable information
- Focus on large data sets and databases

2.3 WHAT CAN DATA MINING DO AND NOT DO:


Data mining is a powerful tool that can help us find patterns and relationships within our data, but it does not work by itself. It does not eliminate the need to know our business, to understand our data, or to understand analytical methods. Data mining discovers hidden information in our data, but it cannot tell us the value of that information to our organization. In addition to finding new patterns that may not be immediately discernible through simple observation, data mining can confirm or qualify empirical observations. It is important to remember that the predictive relationships discovered through data mining are not necessarily causes of an action or behavior.

2.4 THE SCOPE OF DATA MINING:


Data mining derives its name from the similarities between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material or intelligently probing it to find exactly where the value resides. Given databases of sufficient size and quality, data mining technology can generate new business opportunities by providing these capabilities.

2.4.1 AUTOMATED PREDICTION OF TRENDS AND BEHAVIORS:


Data mining automates the process of finding predictive information in large databases. Questions that traditionally required extensive hands-on analysis can now be answered directly from the data, quickly. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.

2.4.2 AUTOMATED DISCOVERY OF PREVIOUSLY UNKNOWN PATTERNS:


Data mining tools sweep through databases and identify previously hidden patterns in one step. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.

2.4.3 THE MOST COMMONLY USED TECHNIQUES IN DATA MINING:


- Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID).
- Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
- Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k >= 1). Sometimes called the k-nearest neighbor technique.
- Rule induction: The extraction of useful if-then rules from data based on statistical significance.

2.5 WHAT IS K-MEANS CLUSTERING:


K-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. K-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.
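As a rough illustration of how one k-means step maps to SQL (the tables Points(i, x1, x2) and Centroids(j, x1, x2) and the two-dimensional case are assumptions for this sketch, not part of this project), the assignment of each point to its nearest centroid can be computed as:

-- Squared Euclidean distance from every point to every centroid,
-- then keep, per point i, the centroid j with the minimum distance
WITH Dist AS (
    SELECT P.i, C.j,
           (P.x1 - C.x1) * (P.x1 - C.x1) +
           (P.x2 - C.x2) * (P.x2 - C.x2) AS d
    FROM Points P CROSS JOIN Centroids C
)
SELECT D.i, D.j
FROM Dist D
WHERE D.d = (SELECT MIN(D2.d) FROM Dist D2 WHERE D2.i = D.i);

-- The update step would then recompute each centroid as the mean of its
-- assigned points, e.g. SELECT j, AVG(x1), AVG(x2) ... GROUP BY j.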


CHAPTER-3 EARLIER WORK


3.1 BAYESIAN CLASSIFIERS PROGRAMMED IN SQL:
Classification is a fundamental problem in machine learning and statistics. Bayesian classifiers stand out for their robustness, interpretability, and accuracy. They are deeply related to maximum likelihood estimation and discriminant analysis, highlighting their theoretical importance. In this work, we focus on Bayesian classifiers, considering two variants: Naive Bayes and a Bayesian classifier based on class decomposition using clustering. We integrate these Bayesian classification algorithms into a DBMS. Such integration allows users to directly analyze data sets inside the DBMS and to exploit its extensive capabilities (storage management, querying, concurrency control, fault tolerance, and security). We use SQL queries as the programming mechanism, since SQL is the standard language in a DBMS. More importantly, using SQL eliminates the need to understand and modify the internal source code, which is a difficult task. Unfortunately, SQL has two important drawbacks: it has limitations for manipulating vectors and matrices, and it has more overhead than a systems language like C. Keeping those issues in mind, we study how to evaluate mathematical equations with several table layouts and optimized SQL queries.
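For example, the per-class statistics needed by Naive Bayes reduce to plain GROUP BY aggregations. A hedged sketch follows (the training table X(g, x1, x2), with class label g and two numeric attributes, is hypothetical):

-- Class counts, used for the prior probabilities pi_g = N_g / N
SELECT g, COUNT(*) AS Ng
FROM X
GROUP BY g;

-- Per-class mean and variance of each attribute, as needed by
-- Gaussian Naive Bayes (VAR is the T-SQL sample variance function)
SELECT g,
       AVG(x1) AS mean1, VAR(x1) AS var1,
       AVG(x2) AS mean2, VAR(x2) AS var2
FROM X
GROUP BY g;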

3.2 INTEGRATING K-MEANS CLUSTERING WITH A RELATIONAL DBMS USING SQL:


Data mining research on analyzing large data sets is extensive, but most work has proposed efficient algorithms and techniques that work outside the DBMS on flat files. Well-known data mining techniques include association rules, clustering, and decision trees, among others. The problem of integrating data mining techniques with a DBMS has received scant attention due to their mathematical nature and DBMS software complexity. Thus, in a modern database environment, users generally export data sets to a statistical or data mining tool and perform most or all of the analysis outside the DBMS. SQL has been used as a mechanism to integrate data mining algorithms since it is the standard language in relational DBMSs, but unfortunately, it has limitations in performing complex matrix operations, as required by statistical models. Some statistical tools (e.g., SAS) can directly score data sets by generating SQL queries, but they are mathematically limited. In this work, we show that UDFs represent a promising alternative to extend the DBMS with multidimensional statistical models. The statistical models studied in this work include linear regression, principal component analysis (PCA), factor analysis, clustering, and Naive Bayes. These models are widely used and cover the whole spectrum of unsupervised and supervised techniques. UDFs are a standard Application Programming Interface (API) available in modern DBMSs. In general, UDFs are developed in the C language (or a similar language), compiled to object code, and efficiently executed inside the DBMS like any other SQL function. Thus, UDFs represent an outstanding alternative for extending a DBMS with statistical models, exploiting the flexibility and speed of the C language. Therefore, there is no need to change SQL syntax with new data mining primitives or clauses, making UDF implementation and usage easier. The UDFs proposed in this work can be programmed on any DBMS supporting scalar and aggregate UDFs.
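The UDFs in this line of work are written in C, but to give a flavor of the interface, a minimal scalar UDF can also be sketched directly in T-SQL (the function name and signature below are illustrative only, not from the cited work):

-- Scalar UDF: squared Euclidean distance between a 2-D point and a centroid
CREATE FUNCTION dbo.SqDist2 (@x1 FLOAT, @x2 FLOAT, @c1 FLOAT, @c2 FLOAT)
RETURNS FLOAT
AS
BEGIN
    RETURN (@x1 - @c1) * (@x1 - @c1) + (@x2 - @c2) * (@x2 - @c2);
END;

-- Once registered, it is called like any built-in function:
-- SELECT i, dbo.SqDist2(x1, x2, 1.0, 2.0) FROM Points;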

3.3 PIVOT AND UNPIVOT OPERATORS:


Pivot and Unpivot are complementary data manipulation operators that modify the role of rows and columns in a relational table. Pivot transforms a series of rows into a series of fewer rows with additional columns. Data in one source column is used to determine the new column for a row, and another source column is used as the data for that new column. Unpivot provides the inverse operation, removing a number of columns and creating additional rows that capture the column names and values from the wide form. The wide form can be considered as a matrix of column values, while the narrow form is a natural encoding of a sparse matrix. Figure 1 demonstrates how Pivot and Unpivot can transform data between narrow and wide tables. For certain classes of data, these operators provide powerful capabilities for RDBMS users to structure, manipulate, and report data in useful ways. Implementations of pivoting functionality already exist for the purpose of data presentation, but these operations are usually performed either outside the RDBMS or as a simple post-processing operation outside of query processing. Microsoft Excel, for example, supports pivoting: users can perform a traditional SQL query against a data source, import the result into Microsoft Excel, and then perform pivoting operations on the results returned from that data source. Microsoft Access (which uses the Microsoft Jet Database Engine) also provides pivoting functionality; its pivot implementation is a post-processing operation through cursors. While existing implementations are certainly useful, they fail to consider Pivot or Unpivot as first-class RDBMS operations.

Inclusion of Pivot and Unpivot inside the RDBMS enables interesting and useful possibilities for data modeling. Existing modeling techniques must decide both the relationships between tables and the attributes within those tables to persist. The requirement that columns be strongly defined contrasts with the nature of rows, which can be added and removed easily. Pivot and Unpivot, which exchange the role of rows and columns, allow the a priori requirement for pre-defined columns to be relaxed. These operators provide a technique to allow rows to become columns dynamically at the time of query compilation and execution. When the set of columns cannot be determined in advance, one common table design scenario employs property tables, where a table containing (id, property name, property value) is used to store in rows a series of values that it would be desirable to represent as columns. Users typically use this design to avoid RDBMS implementation restrictions (such as an upper limit on the number of columns in a table or the storage overhead associated with many empty columns in a row) or to avoid changing the schema when a new property needs to be added. This design choice has implications for how tables in this form can be used and how well they perform in queries. Property table queries are more difficult to write and maintain, and the complexity of the operation may result in less optimal query execution plans. In general, applications written to handle data stored in property tables cannot easily process data in the wide (pivoted) format. Pivot and Unpivot enable property tables to look like regular tables (and vice versa) to a data modeling tool. These operations provide the framework to enable useful extensions to data modeling.
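To make the property-table idea concrete, a small hedged example (the table PropTable and its property names are hypothetical) pivots narrow (id, property name, property value) rows into one wide row per id, using SQL Server's PIVOT syntax:

-- Narrow form: PropTable(id, propName, propValue), one row per property
SELECT id, [color], [size], [weight]
FROM (SELECT id, propName, propValue FROM PropTable) AS src
PIVOT (MAX(propValue) FOR propName IN ([color], [size], [weight])) AS p;

-- MAX is used only to satisfy PIVOT's requirement for an aggregate;
-- each (id, propName) pair is assumed to occur at most once.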

Including Pivot and Unpivot explicitly in the query language provides excellent opportunities for query optimization. Properly defined, these operations can be used in arbitrary combinations with existing operations such as filters, joins, and grouping. For example, since Unpivot transposes columns into rows, it is possible to convert a filter (an operation that restricts rows) over Unpivot into a projection (an operation that restricts columns) beneath it. Algebraic equivalences between Pivot/Unpivot and existing operators enable consideration of many execution strategies through reordering, with the standard opportunity to improve query performance. Furthermore, new optimization techniques can be introduced that take advantage of the unique properties of these new operators. Consideration of these issues provides powerful techniques for improving existing user scenarios currently performed outside the confines of a query optimizer.



CHAPTER 4 EXISTING SYSTEM


In existing work, preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations have limitations for preparing data sets because they return one column per aggregated group. Standard aggregations are hard to interpret when there are many result rows, especially when grouping attributes have high cardinalities. There exist many aggregation functions and operators in SQL. Unfortunately, all these aggregations have limitations for building data sets for data mining purposes. Horizontal aggregations are a new class of functions that return aggregated columns in a horizontal layout. Most algorithms require as input data sets with a horizontal layout, with several records and one variable or dimension per column. Managing large data sets without DBMS support can be a difficult task. Trying different subsets of data points and dimensions is more flexible, faster, and easier to do inside a relational database with SQL queries than outside with an alternative tool. Horizontal aggregation can be performed by an operator that can easily be implemented inside a query processor, much like select, project, and join. The PIVOT operator on tabular data exchanges rows and columns, enabling data transformations useful in data modeling, data analysis, and data presentation.

4.1 EXECUTION STRATEGIES IN HORIZONTAL AGGREGATION:
Horizontal aggregations are a new class of functions that aggregate numeric expressions and transpose the results to produce data sets with a horizontal layout. The operation is needed in a number of data mining tasks, such as unsupervised classification and data summarization, as well as segmentation of large heterogeneous data sets into smaller homogeneous subsets that can be easily managed, separately modeled, and analyzed. To create data sets for data mining, efficient summaries of data are needed. For that purpose, the proposed system collects the needed attributes from the different fact tables and displays them as columns in order to create data in a horizontal layout. The main goal is to define a template to generate SQL code combining aggregation and transposition (pivoting). A second goal is to extend the SELECT statement with a clause that combines transposition with aggregation.

4.2 HORIZONTAL AGGREGATION METHODS:

4.2.1 SPJ METHOD: The SPJ method is based only on relational operators. The basic concept of the SPJ method is to build a table with a vertical aggregation for each resulting column. To produce the horizontal aggregation FH, the system must join all those tables. There are two sub-strategies to compute a horizontal aggregation: the first calculates each aggregation directly from the fact table; the second computes the corresponding vertical aggregation and stores it in a temporary table first.
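A minimal sketch of the SPJ strategy, assuming a fact table F(K, D, A) where K is the grouping key, D is the column to transpose with two values 'd1' and 'd2', and A is the measure (all names hypothetical):

-- Step 1: one vertical aggregation per transposed value (temporary tables)
SELECT K, SUM(A) AS A_d1 INTO F1 FROM F WHERE D = 'd1' GROUP BY K;
SELECT K, SUM(A) AS A_d2 INTO F2 FROM F WHERE D = 'd2' GROUP BY K;

-- Step 2: left-join the partial results on K to form the horizontal layout
SELECT F0.K, F1.A_d1, F2.A_d2
FROM (SELECT DISTINCT K FROM F) F0
LEFT JOIN F1 ON F0.K = F1.K
LEFT JOIN F2 ON F0.K = F2.K;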

4.2.2 CASE METHOD: SQL provides the built-in CASE programming construct, which returns a value selected from a set of values based on a Boolean expression. It can be used in any statement or clause that allows a valid expression. The Boolean expression for each CASE statement is a conjunction of K equality comparisons. Query evaluation needs to combine the desired aggregation with a CASE statement for each distinct combination of values, as sketched below.
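Under the same assumptions as the SPJ sketch (fact table F(K, D, A) with D taking the values 'd1' and 'd2'), the CASE method computes all horizontal columns in a single GROUP BY pass:

SELECT K,
       SUM(CASE WHEN D = 'd1' THEN A END) AS A_d1,
       SUM(CASE WHEN D = 'd2' THEN A END) AS A_d2
FROM F
GROUP BY K;

-- Rows not matching a given value yield NULL, which SUM ignores,
-- so each output column aggregates exactly its own subgroup.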

4.2.3 PIVOT METHOD: Pivot transforms a series of rows into a series of fewer rows with additional columns. Data in one source column is used to determine the new column for a row, and another source column is used as the data for that new column. The wide form can be considered as a matrix of column values, while the narrow form is a natural encoding of a sparse matrix. In the current implementation, the PIVOT operator is used to calculate the aggregations. One method to express pivoting uses scalar subqueries: each pivoted column is created through a separate subquery. The PIVOT operator provides a technique to allow rows to become columns dynamically at the time of query compilation and execution, as the sketch below shows.
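The same aggregation expressed with the built-in PIVOT operator (SQL Server syntax, same hypothetical fact table F(K, D, A)):

SELECT K, [d1] AS A_d1, [d2] AS A_d2
FROM (SELECT K, D, A FROM F) AS src
PIVOT (SUM(A) FOR D IN ([d1], [d2])) AS p;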



4.3 EXISTING SYSTEM DISADVANTAGES:


- Existing SQL aggregations have limitations for preparing data sets; they return one column per aggregated group.
- The existing methods do not provide efficient SQL queries.
- They are not suitable for very large warehousing projects.
- They take more time to create the data sets.
- The existing system does not reduce manual work.



CHAPTER 5 PROPOSED SYSTEM


In the proposed system, a new standard PIVOT option is incorporated using data mining. This is achieved with SQL Server Analysis Services (SSAS). The data will be taken and transformed into knowledge cubes, which are queried with MDX. On top of that, the knowledge data will be customized based on generalization and suppression algorithms.

5.1 ADVANTAGES OF PROPOSED SYSTEM:


- The SQL code reduces manual work in the data preparation phase of a data mining project.
- Because the SQL code is automatically generated, it is likely to be more efficient than SQL code written by an end user.
- The data sets can be created in less time.
- The data set can be created entirely inside the DBMS.

5.2 PROPOSED PROCESS FLOW:

FIGURE 3: PROCESS FLOW

5.3 MODULES:
- Admin Module
- User Module
- View Module
- Download Module



5.3.1 ADMIN MODULE:


Admin will upload the new connection form based on regulations in various states. Admin will be able to upload various details regarding user bills, such as a new connection for a new user and the amount paid or payable by the user. In case of payment, various details regarding the payment will be entered, and a separate username and password will be provided to each user.

5.3.2 USER MODULE:


User will be able to view his bill details on any date, whether after a month, several months, or years, and he can also view his bill details in various ways: year-wise bills, month-wise bills, and the total amount paid to the EB. This will reduce the cost of transactions. If the user thinks that his password is insecure, he has the option to change it. He can also view the registration details and is allowed to change, edit, and save them.

5.3.3 VIEW MODULE:


Admin has three ways to view the user bill details:
- SPJ: While using SPJ, the viewing and processing time of user bills is reduced.
- PIVOT: This is used to draw the user details in a customized table. This table elaborates the various bill details regarding the user on a monthly basis.
- CASE: Using a CASE query we can customize the present table and columns based on conditions. This helps to reduce the enormous amount of space used by various user bill details. It can be viewed in two different ways, namely horizontal and vertical. In the vertical case, the number of rows is reduced to the extent needed and the columns remain the same; the horizontal view reduces rows in the same way and also expands the columnar format.

5.3.4 DOWNLOAD MODULE:


User will be able to download various details regarding bills. A new user can download the new connection form, subscription details, etc. He/she can also download his/her previous bill details to keep them in hand for verification.

CHAPTER-6 SOFTWARE REQUIREMENTS SPECIFICATION


6.1 HARDWARE REQUIREMENTS
The hardware requirements may serve as the basis for a contract for the implementation of the system and should therefore be a complete and consistent specification of the whole system. They are used by software engineers as the starting point for the system design. They should state what the system should do, not how it should be implemented.

PROCESSOR : PENTIUM IV 2.6 GHz, Intel Core 2 Duo
RAM : 2 GB DDR RAM
MONITOR : 15" COLOR
HARD DISK : 40 GB
CD DRIVE : LG 52X
KEYBOARD : STANDARD 102 KEYS
MOUSE : 3 BUTTONS

6.2 SOFTWARE REQUIREMENTS


The software requirements document is the specification of the system. It should include both a definition and a specification of requirements. It is a statement of what the system should do rather than how it should do it. The software requirements provide a basis for creating the software requirements specification. It is useful in estimating cost, planning team activities, performing tasks, and tracking the team's progress throughout the development activity.

Operating system : Windows 7 / XP Professional
IDE : Visual Studio 2010
Front End : ASP.Net
Database : SQL Server 2005


CHAPTER-7 DESIGN DIAGRAM


7.1 USE CASE DIAGRAM:

FIGURE: 7.1 USE CASE DIAGRAM

DESCRIPTION:
User: The user registers and logs in; the user can view the bill details in aggregation-wise, vertical, and horizontal views depending on need.
Admin: The admin uploads the new connection form and enters the bill details, using SPJ-wise, CASE-wise, and PIVOT-wise views to display bill details to the user.


7.2 CLASS DIAGRAM:

FIGURE: 7.2 CLASS DIAGRAM

DESCRIPTION:
User gets registered and logs in to his account. Admin will create bill entries with fields like meter no, issue date, reading, and total cost. Admin will upload the form on request. Both user and admin can view the bill details, such as meter no, paid date, and paid amount.



7.3 SEQUENCE DIAGRAM:

FIGURE: 7.3 SEQUENCE DIAGRAM

DESCRIPTION:
Admin will upload the new connection form to the server. User will register. Admin will upload bill details to the server. Both user and admin can view bill details from the server, and aggregate-wise displays are served to the user.



7.4 ACTIVITY DIAGRAM:

FIGURE: 7.4 ACTIVITY DIAGRAM

DESCRIPTION:
When there is a need for a search form, the user can download a new search form. The database will display user bill details in SPJ-wise, CASE-wise, and PIVOT-wise views when the admin logs in and uploads the form. When the user logs in, the database will display options like change password, view details, and aggregate view.



CHAPTER 8 IMPLEMENTATION
8.1 CODE TEMPLATE
In this project, the implementation was carried out with continuous revision of the requirements, matching them against the tasks the system is expected to perform.

8.2 DESIGN IMPLEMENTATION
The design for the system was formulated manually by me under the internal guide's supervision. The design for the flow of the system and data was revised many times and finalized. In the same way, the design of the forms and reports was made subject to user convenience and the organization's needs, respectively. The design implementation helped me gain knowledge about user-convenient methods of designing forms. The product entry screen was designed to make the user feel more comfortable with a Graphical User Interface. I have used check boxes to ensure that the user checks the eligible deductions. The design implementation required knowing the inputs and controls that could carry the design from paper to Windows forms. This implementation phase made me aware of the limitations of the controls that could be used in developing the software.

8.3 CODE IMPLEMENTATION
The code for this project was implemented step by step, starting from deciding the events in which the code must execute and the logic the code has to implement. Since my project was done with a menu-driven interface, the coding was done form by form. The code was mainly focused on executing the logic expected from it in sequence. After implementing the code, testing was carried out.



8.4 IMPLEMENTATION DETAILS
In horizontal aggregations we used three aggregation methods to develop the horizontal tabular layout.

PIVOT METHOD: A common scenario where PIVOT can be useful is when you want to generate cross-tabulation reports to summarize data. For example, suppose you want to query the user bill details table in the Horizontal database to determine the number of bills placed by certain users. The following query template provides this report.
USE Adv
SELECT <non-pivoted column>,
       [first pivoted column] AS <column name>,
       [second pivoted column] AS <column name>,
       ...
       [last pivoted column] AS <column name>
FROM (<SELECT query that produces the data>) AS <alias for the source query>
PIVOT
(
    <aggregation function>(<column being aggregated>)
    FOR [<column that contains the values that will become column headers>]
        IN ( [first pivoted column], [second pivoted column], ... [last pivoted column])
) AS <alias for the pivot table>
<optional ORDER BY clause>;
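As a hedged instantiation of this template (the table UserBills(UserName, BillMonth, Amount) and its month values are hypothetical, not this project's actual schema), counting bills per user per month could look like:

-- One row per user, one column per month, each cell a bill count
SELECT UserName, [Jan] AS JanBills, [Feb] AS FebBills, [Mar] AS MarBills
FROM (SELECT UserName, BillMonth FROM UserBills) AS src
PIVOT (COUNT(BillMonth) FOR BillMonth IN ([Jan], [Feb], [Mar])) AS pvt
ORDER BY UserName;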

CASE METHOD: Using a CASE query we can customize the present table and columns based on conditions. This helps to reduce the enormous amount of space used by various user bill details. It can be viewed in two different ways, namely horizontal and vertical.


CHAPTER-9 TESTING
9.1 TEST CASES:

Login

Name : Test Case
Action : Login
Input Parameters : User ID, Password
Expected Output : If the user ID and password are correct, redirect to the Admin or User home page; else show a warning message to enter a valid user ID/password.
Actual Output : We tried different inputs; on entering a correct ID and password it redirects to the user's home page, otherwise it shows a login failed message.
Result : Login action performs correctly.

Table no: 9.1 Test Case 1

User Registration

Name : Test Case
Action : User Registration
Input Parameters : Details regarding the new user
Expected Output : On clicking the submit button, the complete user details should be stored in the user registration database table.
Actual Output : On clicking the submit button, the complete user details are stored in the user database table.
Result : User registration action performs correctly.

Table no: 9.2 Test Case 2

User Form Downloading

Name : Test Case
Action : Download form
Input Parameters : Download button, click save
Expected Output : When the user clicks the download button, a save-file toolbar is displayed, and clicking the save button should download the form.
Actual Output : We tried different ways; the form downloaded.
Result : Form downloaded.

Table no: 9.3 Test Case 3

User Bill Details

Name : Test Case
Action : User bill details
Input Parameters : User name, meter number
Expected Output : When the user enters his name and meter number, the user bill details should be displayed immediately.
Actual Output : We entered the username and meter number; after clicking the submit button, it automatically moves to the user bill details page.
Result : It displays the user bill details.

Table no: 9.4 Test Case 4

Database Connectivity

Name : Test Case
Action : Check database connectivity
Input Parameters : User ID and password
Expected Output : When we enter the username and password, it should retrieve the user bill details.
Actual Output : We tried different inputs; it retrieves the corresponding data.
Result : Database is connected.

Table no: 9.5 Test Case 5

CHAPTER-10 SCREEN SHOTS

Screen no: 10.1 Screen shots for Home Page

Screen no: 10.2 Screen shots for NCForm Page

Screen no: 10.3 Screen shots for Admin Login Page

Screen no: 10.4 Screen shots for Admin SPJ View

Screen no: 10.5 Screen shots for Admin Case View

Screen no: 10.6 Screen shots for Admin Vertical View

Screen no: 10.7 Screen shots for Admin Pivot View

Screen no: 10.8 Screen shots for Admin Update

Screen no: 10.9 Screen shots for NA Form Upload

Screen no: 10.10 Screen shots for Home Page

Screen no: 10.11 Screen shots for User Login Page

Screen no: 10.12 Screen shots for User Registration Page

Screen no: 10.13 Screen shots for User Bill Details

Screen no: 10.14 Screen shots for User Date Wise Details

Screen no: 10.15 Screen shots for User Average Details

Screen no: 10.16 Screen shots for Total Details of User

CHAPTER-11
CONCLUSION AND FURTHER DEVELOPMENT
We proposed three query evaluation methods. The first one (SPJ) relies on standard relational operators. The second one (CASE) relies on the SQL CASE construct. The third (PIVOT) uses a built-in operator in a commercial DBMS that is not widely available. The SPJ method is important from a theoretical point of view because it is based on select, project, and join (SPJ) queries. The CASE method is our most important contribution; it is in general the most efficient evaluation method and has wide applicability, since it can be programmed by combining GROUP BY and CASE statements. We proved that the three methods produce the same result. We explained that it is not possible to evaluate horizontal aggregations without either joins or CASE constructs using standard SQL operators. Our proposed horizontal aggregations can be used as a database method to automatically generate efficient SQL queries with three sets of parameters: grouping columns, subgrouping columns, and the aggregated column. The fact that the output horizontal columns are not available when the query is parsed (when the query plan is explored and chosen) makes evaluation through standard SQL mechanisms infeasible. Our experiments with large tables show that our proposed horizontal aggregations evaluated with the CASE method have performance similar to the built-in PIVOT operator. We believe this is remarkable, since our proposal is based on generating SQL code and not on internally modifying the query optimizer.

Horizontal aggregations produce tables with fewer rows but more columns. Thus, query optimization techniques used for standard aggregations are inappropriate for horizontal aggregations. In the future, this work can be extended to develop a more formal model of evaluation methods to achieve better results, as well as more complete I/O cost models.
