RELATED STUDIES (DATABASE INFERENCE)

The inference problem is a well-known problem in database security and information system security, and it has received a great deal of attention over the past two decades. This part summarizes several approaches to addressing it. In the article Protection of Database Security via Collaborative Inference Detection, Chen and Chu point out that database inference has been extensively studied. Delugach and Hinke used database schema and human-supplied domain information to detect inference problems during database design time. Garvey developed a tool for database designers to detect and remove specific types of inference in a multilevel database system. Both approaches use schema-level knowledge and do not infer knowledge at the data level. These techniques are also applied during database design time and not at run time.

However, Yip pointed out the inadequacy of schema-level inference detection and identified six types of inference rules from the data level that serve as deterministic inference channels. In order to provide a multilevel secure database management system, an inference controller prototype was developed to handle inferences during query processing. Rule-based inference strategies were applied in this prototype to protect security. Further, since data updates can affect data inference, a mechanism was proposed that propagates updates to the user history files to ensure that no query is rejected based on outdated information. To reduce the time spent examining the entire history log when computing inferences, it was also proposed to use prior knowledge of data dependencies to reduce the search space of a relation and thus the processing time for inference. Open inference techniques were proposed to derive approximate query answers when network partitions occur in distributed databases. Feasible open inference channels can be derived from the query and the database schema.
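The idea of an inference controller that consults a per-user history during query processing can be illustrated with a small sketch. This is not the prototype described in the literature; the class name, the attribute names and the notion of a "channel" as a set of attributes that must not all be released to the same user are assumptions made only for illustration.

```python
class InferenceController:
    """Toy run-time controller (hypothetical, for illustration only)."""

    def __init__(self, channels):
        # Each channel is a set of attributes that, taken together,
        # would let a user infer something sensitive.
        self.channels = [frozenset(c) for c in channels]
        # user name -> set of attributes already released to that user
        self.history = {}

    def request(self, user, attributes):
        """Return True and record the query if it is safe, else False."""
        seen = self.history.setdefault(user, set())
        combined = seen | set(attributes)
        # Reject if the history plus this query would complete a channel.
        if any(channel <= combined for channel in self.channels):
            return False
        seen.update(attributes)
        return True


controller = InferenceController([{"name", "rank", "salary"}])
print(controller.request("alice", ["name", "rank"]))    # first query
print(controller.request("alice", ["rank", "salary"]))  # completes the channel
```

Each query looks harmless on its own; only the history reveals that the second one would complete the channel. A real controller would also have to handle updates to the data, which is why the history files need to be kept current.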

To summarize the article of Chen and Chu, inference refers to a process in which various data sets available to a user at a lower security level may be pieced together to produce a larger and more important body of data, which as a whole is only available at a higher security level. Inference is a severe issue: hackers and other external or even internal threats may use it to obtain data that they do not have rightful access to, and may then manipulate, contaminate or even falsify that data. Chen and Chu tackle this by constructing a Semantic Inference Model (SIM) and mapping it into a Bayesian network, which monitors each possible inference channel and computes the inference probability. This reduces the possibility of a security breach through inference.
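To make the notion of an inference probability more concrete, the sketch below applies Bayes' rule repeatedly to a single sensitive fact as a user sees more query answers. It is a deliberately simplified stand-in for a full Bayesian network, not the SIM of Chen and Chu. The function names, the probabilities and the threshold of 0.6 are all assumed values chosen for illustration, and successive answers are assumed to be conditionally independent.

```python
def update_belief(prior, p_obs_if_true, p_obs_if_false):
    """Bayes' rule: belief in a sensitive fact after one observed answer."""
    numerator = prior * p_obs_if_true
    evidence = numerator + (1.0 - prior) * p_obs_if_false
    return numerator / evidence


def answer_allowed(belief, threshold):
    """Hypothetical policy: release only while the belief stays below threshold."""
    return belief < threshold


THRESHOLD = 0.6   # assumed policy value
belief = 0.1      # assumed prior belief of the user in the sensitive fact
for step in range(1, 3):
    # Each answer is assumed to be 0.9 likely if the fact is true, 0.2 if false.
    belief = update_belief(belief, 0.9, 0.2)
    print(step, round(belief, 4), answer_allowed(belief, THRESHOLD))
```

Under these assumed numbers the first answer still leaves the belief below the threshold, while the second pushes it above, so a controller using this policy would withhold the second answer.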

In the article Parity-based inference control for multi-dimensional range sum queries, Lingyu Wang, Yingjiu Li, Sushil Jajodia and Duminda Wijesekera point out that inference control has been extensively studied in statistical databases and that the proposed methods usually fall into two categories: restriction-based techniques and perturbation-based techniques. Restriction-based techniques include restricting the size of query sets, restricting the size of overlaps between query sets, detecting inferences through auditing queries, suppressing tabular data to prevent inferences, partitioning values, and restricting queries to complete blocks in the partition. Perturbation-based techniques add random noise to source data, outputs, or the structure of databases. Other aspects of the inference problem include inferences in multi-level databases and inferences targeting approximated values. Directly applying the inference control methods of statistical databases to multi-dimensional range (MDR) queries is not desirable, because these methods are intended for arbitrary queries and typically ignore the unique structure of MDR queries.
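Two of the restriction-based techniques listed above, restricting the size of query sets and restricting the overlap between query sets, can be sketched as follows. The class name, the sample records and the parameter values are hypothetical, and the sketch is a minimal illustration rather than any of the published methods.

```python
class RestrictedSumServer:
    """Answers SUM queries only when size and overlap restrictions hold."""

    def __init__(self, records, min_size, max_overlap):
        self.records = records
        self.min_size = min_size        # smallest allowed query set
        self.max_overlap = max_overlap  # largest allowed overlap with any past query
        self.answered = []              # query sets (record indices) already answered

    def sum_query(self, predicate, field):
        ids = frozenset(i for i, r in enumerate(self.records) if predicate(r))
        n = len(self.records)
        # Size restriction: neither too small nor too close to the whole table.
        if len(ids) < self.min_size or len(ids) > n - self.min_size:
            return None
        # Overlap restriction: compare with every previously answered query set.
        if any(len(ids & prev) > self.max_overlap for prev in self.answered):
            return None
        self.answered.append(ids)
        return sum(self.records[i][field] for i in ids)


records = [
    {"name": "a", "dept": "x", "salary": 50},
    {"name": "b", "dept": "x", "salary": 60},
    {"name": "c", "dept": "x", "salary": 70},
    {"name": "d", "dept": "y", "salary": 80},
    {"name": "e", "dept": "y", "salary": 90},
    {"name": "f", "dept": "y", "salary": 100},
]
server = RestrictedSumServer(records, min_size=2, max_overlap=1)
print(server.sum_query(lambda r: r["dept"] == "x", "salary"))
print(server.sum_query(lambda r: r["name"] == "a", "salary"))
```

A query that targets a single record is refused by the size restriction, and a query whose set overlaps an earlier one in more than one record is refused by the overlap restriction. Both checks treat queries as arbitrary sets of records, which is why they do not exploit the structure of MDR queries.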

Controlling inferences of a special class of MDR queries, namely data cube queries, has been studied. First, one study shows that a SUM-only data cube is free of inferences if the number of previously known values is below a tight upper bound (the bound is tight in the sense that no better bound exists). However, the converse is not necessarily true, and an inference-free data cube may be mistakenly taken as causing inferences by that method. Second, another study restricts queries such that the adversary cannot combine multiple queries for an inference. This approach greatly eases inference control and applies to any aggregation function as long as certain algebraic properties are satisfied. Third, a further study addresses approximate inferences in terms of lower and upper bounds of the actual values. However, these methods do not directly apply to general MDR queries, because data cube queries are only a subset of all MDR queries based on explicit dimension hierarchies.
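How an adversary can combine several SUM answers to isolate one value can be shown with elementary linear algebra. If each answered query is written as a 0/1 row over the underlying values, a value is exactly determined when its unit vector lies in the row space of the query matrix. The sketch below checks this with Gaussian elimination over exact fractions. It is a generic auditing illustration, not the data cube bound discussed above, and the function names are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form of a matrix, computed over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    pivot_row = 0
    for col in range(ncols):
        if pivot_row == len(m):
            break
        sel = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if sel is None:
            continue
        m[pivot_row], m[sel] = m[sel], m[pivot_row]
        pivot = m[pivot_row][col]
        m[pivot_row] = [v / pivot for v in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m[:pivot_row]


def inferable_cells(queries):
    """Indices of values that the answered SUM queries determine exactly."""
    result = set()
    for row in rref(queries):
        nonzero = [i for i, v in enumerate(row) if v != 0]
        if len(nonzero) == 1:  # the row is a unit vector
            result.add(nonzero[0])
    return result


# Two range sums over three values: x0+x1+x2 and x0+x1.
print(inferable_cells([[1, 1, 1], [1, 1, 0]]))
```

Subtracting the second answer from the first leaves x2 alone, which is the kind of combination that restriction-based methods try to rule out.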

The inference problem of one-dimensional range queries has been studied before, and its author considers the multi-dimensional case difficult. The usability (i.e., the highest possible ratio of the number of inference-free queries to that of all queries) of MDR queries in the absence of previously known values has been studied. The restriction of even MDR queries is mentioned but not fully explored, and the more general case with arbitrary known values is regarded as challenging. Chin gives a necessary and sufficient condition for sum-two queries to be inference-free and shows that finding the maximal inference-free subsets of sum-two queries is NP-hard. However, in practice queries are rarely limited to sum-two queries. Wang et al. generalize the results on sum-two queries to even MDR queries. Perturbation-based methods have been proposed for preserving privacy in data mining applications. Random noise is added to destroy the sensitive information, while the statistical distribution is approximately reconstructed from the perturbed data to facilitate data mining tasks.
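For sum-two queries the role of parity can be seen with a small graph argument, offered here as an elementary observation and not as a restatement of Chin's condition or of the method of Wang et al. Treat each value as a vertex and each sum-two query as an edge. Around an odd cycle, alternately adding and subtracting the answers leaves twice one value, so that value is inferred. If the graph is two-colorable, adding an amount t to every value of one color and subtracting t from every value of the other color changes no answer, so no queried value is pinned down. The function name below is hypothetical.

```python
from collections import deque


def sum_two_queries_inference_free(num_values, queries):
    """True if the query graph is bipartite, i.e. contains no odd cycle.

    queries is a list of pairs (a, b), each meaning the sum x_a + x_b was answered.
    """
    adjacency = {v: [] for v in range(num_values)}
    for a, b in queries:
        adjacency[a].append(b)
        adjacency[b].append(a)
    color = {}
    for start in range(num_values):
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:  # breadth-first two-coloring of one component
            v = queue.popleft()
            for w in adjacency[v]:
                if w not in color:
                    color[w] = 1 - color[v]
                    queue.append(w)
                elif color[w] == color[v]:
                    return False  # odd cycle found
    return True


print(sum_two_queries_inference_free(3, [(0, 1), (1, 2)]))
print(sum_two_queries_inference_free(3, [(0, 1), (1, 2), (0, 2)]))
```

In the second call the three answers q01, q12 and q02 give x0 = (q01 - q12 + q02) / 2, which is why the triangle is reported as unsafe.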

The proposed methods can approximately reconstruct COUNTs from perturbed data with statistically bounded errors, so OLAP tasks like classification can be fulfilled. However, in general, protecting sensitive data in OLAP is different from protecting it in data mining, because OLAP tasks may demand small details, such as outliers, that cannot be obtained from distribution models alone. Potential errors in individual values may be significant, preventing OLAP users from gaining trustworthy insights. The methods Wang et al. study are based on restrictions and hence do not introduce any noise. Secure multi-party data mining allows multiple mutually distrustful parties to cooperatively compute data mining results with minimal disclosure of their own data. This problem is different from inference control, because the threat of inferences comes from what users know, not from the way they know it.
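The trade-off described above, where a COUNT can be reconstructed approximately while individual values stay unreliable, can be illustrated with randomized response. This is a swapped-in textbook technique, not necessarily the perturbation method the cited work uses. The function names, the keep probability of 0.8 and the sample data are assumptions.

```python
import random


def perturb(bits, keep_prob, rng):
    """Keep each 0/1 value with probability keep_prob, otherwise flip it."""
    return [b if rng.random() < keep_prob else 1 - b for b in bits]


def estimate_count(observed_count, n, keep_prob):
    """Unbiased estimate of the true number of 1s from the perturbed count.

    Uses E[observed] = c * keep_prob + (n - c) * (1 - keep_prob);
    keep_prob must differ from 0.5.
    """
    return (observed_count - n * (1.0 - keep_prob)) / (2.0 * keep_prob - 1.0)


rng = random.Random(0)             # fixed seed so the run is repeatable
true_bits = [1] * 3000 + [0] * 7000
noisy_bits = perturb(true_bits, 0.8, rng)
print(estimate_count(sum(noisy_bits), len(true_bits), 0.8))
```

The estimate of the aggregate is close to the true count of 3000, yet any single perturbed bit is wrong with probability 0.2. That is acceptable for distribution-level tasks but not when an OLAP user needs an exact outlier.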

Inference problems are an important but still relatively unexplored aspect of database security. However, the works surveyed in this essay show a number of promising ways of attacking these problems and a number of interesting angles from which they can be approached. A complete and general solution to the inference problem is impossible. However, in the future, given a good understanding of one's problem area and the ways in which inferences are made, it should be possible to construct a database that is both usable and reasonably secure against inference.
