
ACEEE Int. J. on Network Security, Vol. 01, No. 03, Dec 2010

Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Clouds using the concept of Application Metadata
Bharath C. and Spoorthy A. Raman
International Institute of Information Technology, Bangalore, India Email: {bharath.c, spoorthya.raman}@iiitb.net
Abstract: Cloud computing is an emerging computing paradigm. It aims to share data, resources and services transparently among the users of a massive grid. Although the industry has started selling cloud-computing products, research challenges in areas such as architectural design, task decomposition, task distribution, load distribution, load scheduling and task coordination remain open. We therefore study methods to reason about and model cloud computing as a step towards identifying fundamental research questions in this paradigm. In this paper, we propose a model for load distribution in cloud computing by modeling clouds as cognitive systems, using aspects that depend not only on the present state of the system but also on a set of predefined transitions and conditions. This model is then applied to the task of job distribution using the concept of application metadata. We present a qualitative and simulation-based evaluation of the proposed model, and finally draw a series of key conclusions for future exploration in cloud computing.

Index Terms: Cloud Computing, Load Balancing, Application Metadata.

I. INTRODUCTION AND RELATED WORK

The problem of load balancing has always been challenging, whether for a mechanical machine or for distributed and cloud-computing systems [1]. This is because such systems have polymorphic requirements that span different domains and domain parameters. Cloud computing in particular has seen a wide range of solutions to this problem, ranging from the operating-system stratum, to distributed application design, to high-level game-theoretic optimizations, and even to building machines that host computing sub-trees (similar to a remote computing machine) [1]. Although these models differ morphologically, they share the common goal of increasing economic efficiency and providing supplementary computing power to a cloud of users when it is needed. They also face the same underlying problem: to optimally balance and distribute loads.

The classic approach is to use algorithms that analyze the current state of the system, as in the work of Shivaratri et al. [4]. This method has some advantages, but it also has notable disadvantages, such as requiring medium-to-large computing power; as a consequence, its scalability is poor on larger systems. The other approach is to estimate and predict the future system state and then behave in a pre-coded manner, as in [7, 8]. This is a far more efficient solution, but it is complicated in terms of algorithmic complexity and ease of modeling. Nevertheless, the roots of this approach are elegant, and it appears to be the future of this domain, especially because the classical approach can lead to a dramatic increase in resource needs. In other words, the speedup achieved by the classical approach is either very low or nonexistent. This is due to the behavior of the workstation user, who may change the computing needs at any moment. As a result, remote tasks are unacceptably delayed and may even be considered dead by some elements, increasing the need for cloud elements to compute the task locally or to send it to another lightly loaded station elsewhere. One way of estimating workstation load is to use statistical approaches such as load functions, obtained through repeated measurements and a large number of a priori experiments. These functions often have a Gaussian aspect, like many other models of natural processes. Their basic flaw, however, is that they ignore the very cause of a workstation's load, which depends on the behavior of the individual user.

Since both types of algorithms fail to solve practical problems on their own, suitable variations are needed to realize their capabilities. In this paper, we modify algorithms of the first type, which continuously monitor the system state. In addition to monitoring, a set of transition rules is specified over the system states based on the type of input received: given an input a, when the present state of the system is q1, the system moves to state q2. Such a transition is not possible without some a priori knowledge of the input a. This knowledge is supplied by the parameters of the application metaheader that accompanies every task to be processed on the cloud. The major parameters include the number of processor cycles required for the application's execution as observed during the development and testing phases, the total memory it occupies while executing, and architecture-based metrics. These parameters help define an optimal transition function for the system, so that the system can cognitively decide what should be done with the load and where and how it must be scheduled to balance the entire system load.
II. SYSTEM MODEL

A. The Application Metaheader Model

The Application Metaheader is a concept conceived by us. It deals with embedding a priori knowledge about the cloud application under consideration. The idea is to affix a compulsory, descriptive and structured metaheader to each incoming cloud application (analogous to the well-known TCP/IP headers). Formally, the technique can be worded as follows. Let S be the header of an incoming cloud application A, modeled as a set containing the sub-application parameters:

S = {s_1, s_2, ..., s_n}

where s_i denotes the i-th of the n sub-application parameters constituting the cloud application A. Each s_i is in turn composed of a set of task-associated data, formalized as an ordered set:

s_i = (p_i(α), m_i, d_i)

where p_i(α) is the estimated percentage of processor usage needed by sub-application i as observed on an architecture α (multiple such values are present for different processor architectures), m_i is the estimated memory utilized by the process in bytes, and d_i is a description of each sub-application component along with its length and displacement from the beginning of the code. The sub-application and application related data are obtained during the development and testing phase of the cloud application and are affixed to the code before it is run on the cloud system. They form the metaheader, since they are the meta-knowledge required to provide information about the application in the system.
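To make the header structure concrete, here is a minimal Python sketch of S and its sub-application parameters; the class and field names are our own illustrative choices, not the paper's notation:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SubAppParams:
    """One sub-application entry s_i of the metaheader S."""
    cpu_pct: Dict[str, float]  # p_i(alpha): estimated processor usage (%) per architecture
    mem_bytes: int             # m_i: estimated memory utilization in bytes
    description: str           # d_i: component description, length and displacement

@dataclass
class Metaheader:
    """The application metaheader S = {s_1, ..., s_n}, affixed to the
    application code during the development and testing phase."""
    sub_apps: List[SubAppParams] = field(default_factory=list)

# Example: a two-component cloud application.
header = Metaheader(sub_apps=[
    SubAppParams(cpu_pct={"x86": 12.5, "arm": 17.0}, mem_bytes=64 * 1024,
                 description="parser, len=2048, disp=0"),
    SubAppParams(cpu_pct={"x86": 40.0, "arm": 55.0}, mem_bytes=8 * 1024 * 1024,
                 description="solver, len=16384, disp=2048"),
])
```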

B. The Mathematical Model

Starting with the mathematical modeling of the system, the load of the system is defined as follows: the system load of a cloud computing system is the sum of the loads on each processing node forming the cloud, as determined by processor activity:

L = Σ_{i=1..N} a_i · l_i    (1)

The value of a_i is determined by the on or off state of the particular node; we associate the value 1 if node i is on and 0 otherwise. The load l_i on each node, following the load-index construction of [9], is taken to be of the form

l_i(t) = k · u_i(t)    (2)

where u_i(t) is the node's measured resource activity. The value t in Equation (2) ranges over R (to reflect the situation at a certain moment), and k, termed the fatigue factor, lies in the closed interval [0, 1].

Now, there are two ways of looking at l_i in the above equations. In the start state of the entire system, the values of l_i are zero, since there is no load on the system; in this case the load balancer is expected to perform an optimal load scheduling before balancing. On the other hand, when the system is already loaded, the load balancer must schedule the new process as well as balance the load globally across the system.

The value k can be represented as follows (Equation (3)):

k = f(C_p, T, T_N, δ)    (3)

Here, k is the fatigue factor; C_p is the present performance capacity of the CPU, which is a complex, statistically determined function of factors such as the age of the CPU, its average fan speed, clock speed and average fetch time; T and T_N are the present temperature and the normal (optimal) temperature rating of the CPU, respectively; and δ is the deterioration quotient, which is determined by real-life temperature tests and is provided by the vendor. (δ is calculated by running the CPU at a constant 60 °C and then at 70 °C; the resultant decrease in CPU life determines δ, as explained in [3].) Of course, apart from these, other indicators such as video memory and virtual memory may be considered for a more detailed analysis, extending the relation to

k = f(C_p, T, T_N, δ, m_video, m_virtual, ...)

Each of these values depends on user behavior, leading to different values of the sub-application parameters for different tasks. Given these relations and functions for the system, the load balancer algorithm must be a function of the following type:

B : (q_1, S, L_1) → (q_2, L_2)

This states that, given a present state q_1 of the system, an input task with sub-application metaheader S and a present load L_1 on the system, the load balancer acts cognitively to put the system into state q_2 with load L_2.

The above equations illustrate the situation of an incremental load on the system since its start. Let us now model the case of a decremental load, where the load in the system decreases. This is modeled as another incoming task whose sub-application metaheader reads all zeros for its parameter values. Given these metaheader values, the total system load is recalculated as follows. Let L be the load of the system while the sub-application with header s_i was executing, and let l_s be the load contributed by that sub-application. The total load after the sub-application in consideration exits is

L' = L − l_s

This change is triggered as soon as there is a change in the processor and memory utilization of a particular node. A code sketch of this bookkeeping and of the transition B follows.
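Building on the metaheader sketch above, here is a minimal sketch of the load bookkeeping of Equations (1) and (2) and of the transition B; the least-loaded-node placement shown is a placeholder policy of ours, not the paper's algorithm:

```python
from typing import List, Tuple

class Node:
    def __init__(self, arch: str, fatigue_k: float):
        self.on = True       # a_i: on/off state of the node
        self.load = 0.0      # l_i: current load index of the node
        self.arch = arch
        self.k = fatigue_k   # fatigue factor k, in [0, 1]

def system_load(nodes: List[Node]) -> float:
    """L = sum_i a_i * l_i, Equation (1)."""
    return sum(n.load for n in nodes if n.on)

def balance(nodes: List[Node], header: "Metaheader") -> Tuple[List[Node], float]:
    """Cognitive transition B: given the current node states (q1, L1) and an
    incoming task's metaheader S, produce the new states (q2) and total load L2.
    Placeholder policy: each sub-application goes to the least-loaded active node."""
    for s in header.sub_apps:
        target = min((n for n in nodes if n.on), key=lambda n: n.load)
        # Load contribution declared in the metaheader, scaled by fatigue k.
        target.load += target.k * s.cpu_pct.get(target.arch, 0.0) / 100.0
    return nodes, system_load(nodes)

def remove_task(node: Node, s: "SubAppParams") -> None:
    """Decremental load: L' = L - l_s when the sub-application exits."""
    node.load -= node.k * s.cpu_pct.get(node.arch, 0.0) / 100.0
```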
III. DETAILED ALGORITHM

Since we now have a basic mathematical model of the algorithmic prototype, our next aim is to wire these mathematical models into the working of the proposed algorithm. Our basic objective, as discussed earlier, is to derive a new state for the total load in a cloud by intelligently modeling the transition rules and transition parameters. For this, we define three basic criteria of parameterization, the current-state-of-system q, the affinity-to-work w and the possible system-fatigue state k, together with an expected-dominant-behavior rule b, which maps the objective parameterized values of the load onto a more generic, abstract set of three labels: High, Moderate and Low. The importance of the sub-application metaheader is demonstrated by arriving at a suitable value of the affinity-to-work parameter using the load contribution mentioned in the metaheader. Since the decision depends on the affinity to work and on the incoming application, the resulting decision can be assumed to be accurate and optimal.

To begin the modeling process from scratch, let us consider some basic behavioral rules from the user's point of view, of the form: if the current-state-of-system is q, when the fatigue is k and the affinity-to-work is w, then the favorable behavior is b.
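Read operationally, these rules form a small lookup from (state, fatigue, affinity) labels to a behavior. A minimal Python sketch, in which the concrete rule entries and the 30% label thresholds are illustrative assumptions of ours:

```python
# Behavioral rules of the form:
#   if current-state-of-system is q, fatigue is k and affinity-to-work is w,
#   then the favorable behavior is b.
# The High/Moderate/Low labels follow the paper; the rule entries below
# are illustrative placeholders.

LOW, MODERATE, HIGH = "low", "moderate", "high"

def to_label(value: float) -> str:
    """Map a parameter in [0, 1] to the abstract labels using ~30% scale
    jumps, consistent with the paper's simulation assumption."""
    if value < 0.3:
        return LOW
    if value < 0.6:
        return MODERATE
    return HIGH

# (current_state, fatigue, affinity) -> expected dominant behavior
RULES = {
    (LOW, LOW, HIGH):      "accept remote tasks",
    (MODERATE, LOW, HIGH): "accept small remote tasks",
    (HIGH, HIGH, LOW):     "offload local tasks",
}

def favorable_behavior(state: float, fatigue: float, affinity: float) -> str:
    key = (to_label(state), to_label(fatigue), to_label(affinity))
    return RULES.get(key, "monitor and hold")
```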

These concepts are shown in Figure 1 as a state transition diagram.

Figure 1: A simple state transition diagram for the user behavior model.

Having the user behavior model and the transition rules, a generalization can also be performed on the above model that maps the node states 1 to 5 to commonly performed daily computing tasks, which would be the ideal clients to run on a cloud computing system. These generalizations are made taking into account the broad resource-utilization patterns of such ubiquitous tasks. For example, if the system state is such that both processor and memory utilization are high, we conclude that a heavy computation-intensive task is being carried out at that node. Similarly, if memory utilization is moderate and processor utilization is low, it could be an input/output-intensive task such as document processing (a small sketch of this mapping appears at the end of this section). The values high, low and moderate are obtained from the metaheader knowledge, used cognitively to define the system state. Figure 2 shows the state transition diagram of Figure 1 after this task mapping, using the state parameters and characteristics. Having modeled the user-based behavior for some basic state clauses, we can now extrapolate the user behavior to work in tandem with the system behavior.

Figure 2: A simple state transition diagram for mapped tasks.

To model the global system, we must integrate all these basic user behaviors, form a higher-order abstraction and approximate them into a global behavior.
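The utilization-to-task mapping just described can be read directly as a small classifier. A minimal sketch; the two explicit rows follow the examples in the text, while the fallback label is our own assumption:

```python
def classify_node_activity(cpu_label: str, mem_label: str) -> str:
    """Map (processor, memory) utilization labels to the broad task classes
    discussed above."""
    if cpu_label == "high" and mem_label == "high":
        return "computation-intensive task"
    if cpu_label == "low" and mem_label == "moderate":
        return "I/O-intensive task (e.g. document processing)"
    return "mixed/background activity"  # fallback row: our own assumption
```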

IV. WORKING OF THE MODEL

With the mathematics and the state diagrams behind the proposed model in place, it becomes important to know how exactly the entire algorithm is supposed to work. We then carry this ideal expectation forward to the simulation phase to verify the correctness of the model. For the working of this model, the basic inputs are values such as t_start (the start time of work) and t_work (the total time of work), along with the dependencies needed to calculate values such as the fatigue factor k. Functions that are binary in behavior take values equivalent to on and off, which we model as the crisp values 1 and 0. Subjective entities such as fatigue and affinity-to-work, on the other hand, cannot be modeled simply as 0/1 or low/high. For these we make a simulation-phase assumption: low motivation may be assumed when a previously defined function satisfies X < X_threshold, where the threshold is based purely on the aggressiveness and granularity demanded of the algorithm (for our simulation we considered a jump of about 30% per scale level). It must also be emphasized that all these values and activity types are arbitrary; only studying real users in real working situations can produce an ecologically valid model.

With the help of our user behavior and system behavior models, we can then approximate the loads of the corresponding workstations in the cloud. Three load functions were selected:
i. State, which shows whether the workstation is on or off.
ii. Processor, which shows the processor activity.
iii. Memory, which shows the memory use on the workstation.
These functions can then be defined in a fuzzy manner, as presented in Figure 2, with the help of the types of behavior already computed; a sketch of one possible encoding follows. We then apply these functions and the mathematical models in a simulated environment to find out how the model works.
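A possible encoding of the three load functions; the crisp state function follows the text directly, while the membership shapes and breakpoints for the fuzzy levels are our own assumptions, consistent with the 30% scale jump:

```python
def state(on: bool) -> int:
    """Load function (i): crisp on/off value for the workstation."""
    return 1 if on else 0

def fuzzy_level(u: float) -> dict:
    """Load functions (ii) and (iii): map a utilization u in [0, 1] to
    memberships in the low / moderate / high labels, using ~30% scale
    jumps per level (the exact membership shapes are our own choice)."""
    u = max(0.0, min(1.0, u))
    return {
        "low":      max(0.0, 1.0 - u / 0.3),
        "moderate": max(0.0, 1.0 - abs(u - 0.45) / 0.3),
        "high":     max(0.0, min(1.0, (u - 0.6) / 0.3)),
    }

# Example: a processor at 70% activity is mostly "high", slightly "moderate".
print(fuzzy_level(0.70))  # {'low': 0.0, 'moderate': 0.166..., 'high': 0.333...}
```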

During the simulation phase, a number of key efficiency observations were made. Algorithms in most of the literature use solely the instantaneous run-queue length (i.e., the number of jobs being served or waiting for service at the sampling instant) as the load index of a computing node, which is loosely based on the "do not overload others" intuition. The run-queue length may be a good load index if all the nodes of the system are homogeneous and the inter-node communication delay is negligible or constant. However, it is not a reliable load indicator in a heterogeneous environment, since it ignores variations in computing power. We therefore modified both assumptions and decided upon an accumulative job-execution-load based algorithm, as sketched below.
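The contrast between the two load indices can be sketched as follows; the node_speed normalization is our own illustrative reading of "accumulative job execution load", not the paper's exact formula:

```python
def run_queue_index(jobs_in_service_or_waiting: int) -> float:
    """Instantaneous load index used in most of the literature:
    the run-queue length at the sampling instant."""
    return float(jobs_in_service_or_waiting)

def accumulated_execution_index(remaining_job_costs: list, node_speed: float) -> float:
    """Accumulative job-execution load: total remaining work on the node
    normalized by its relative computing power, so that heterogeneous
    nodes become comparable."""
    return sum(remaining_job_costs) / node_speed

# A slow node (speed 0.5) with the same queue is twice as loaded as a fast one.
print(accumulated_execution_index([3.0, 2.0], node_speed=0.5))  # 10.0
print(accumulated_execution_index([3.0, 2.0], node_speed=1.0))  # 5.0
```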
V. SIMULATIONS AND OBTAINED RESULTS

In order to verify our model, an application was designed and implemented using a combination of Perl, NetSim-2, AWK scripting and the Java development platform. In our application the user behavior characteristics are modeled using the following data values:
TABLE I: ASSUMPTIONS DURING SIMULATIONS

Data                              Value
Start time of work (t_start)      8 ± 2 a.m.
Total time of work (t_work)       8 ± 1 (in hours)
(unlabeled)                       21 (in hours)
Scale jump                        30% (each stage), up to 90%
Number of Computing Elements      5 (heterogeneous, 2+2+1)
Normal temperature rating (T_N)   60 °C (arbitrary)
Deterioration quotient            1 (arbitrarily chosen)
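As an illustration of how the Table I values might drive the simulated user behavior, here is a minimal sketch; interpreting the ± ranges as normal distributions is our own assumption, not stated in the paper:

```python
import random

def simulated_workday(rng: random.Random) -> tuple:
    """Draw one user's working interval from the Table I assumptions:
    start at 8 +/- 2 a.m., total work of 8 +/- 1 hours. Reading the
    '+/-' ranges as normal distributions is our assumption."""
    t_start = rng.gauss(8.0, 2.0)  # hour of day the user starts work
    t_work = rng.gauss(8.0, 1.0)   # total hours of work
    return t_start, t_start + t_work

rng = random.Random(42)
print(simulated_workday(rng))
```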

The descriptions of the simulations along with their corresponding results are given below.

A. Simulation #1

The first test performed via simulation examined the load-predicting ability of the proposed model. For this we simulated a processor, represented in the graph of Figure 3 by a red line, and ran our algorithm simultaneously to predict the load, represented by a green line. A detailed analysis of the graph gave a positive result with respect to the model's ability to predict the load. We found that the initial latency of the model in predicting the first 100% load spike was about 1.7 seconds, and we observed that at higher loads the prediction is usually higher by about 6%, which we considered a good result. On the other hand, a downward fall was predicted at a very fast pace (perhaps due to the model's affinity for sensing fatigue levels): it could predict a downward fall of about 18.67% in about 0.5 seconds.

Figure 3: Load Prediction Test

B. Simulation #2

The second test examined the load-balancing ability of the proposed model. The simulation result is shown in the graph of Figure 4. Here, the red line represents the actual load applied on the cloud, whereas the green line represents the balanced load of the system.

Figure 4: Load Balancing Test

A detailed analysis of the graph shows that the result is positive. We observed that the system did not allow the load to cross the 85% mark, even though the actual load applied was about 98%. It was also observed that the load was brought down considerably at very high loads (by about 40% when the actual load is about 98%); when the load is low, however, the balancing capability reduces to less than 5% (perhaps because the model concentrates on high-load systems and because we capped the overload buffer at around 90%).

C. Simulation #3

The final simulation tested how the system behaves as an integration of elements. The results of this test are depicted in the graph of Figure 5. The testing criterion here was not a usual one: in order to test the integral performance along with the flexibility, we simulated three systems, running two of them (shown as red and green lines respectively in Figure 5) so that they are on from t = 0 to t = 10 and are switched off (i.e., removed from the cloud) after t = 10, while the third (shown as a blue line in Figure 5) is idle in the interval [0, 10] and is then loaded to work at 100%. A sketch of this schedule follows.
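A minimal sketch of the Simulation #3 schedule; the 0.5 load placeholder for the first two systems is our own assumption, since the text specifies only that they are on:

```python
def node_on(node: int, t: float) -> bool:
    """On/off schedule: nodes 1 and 2 leave the cloud after t = 10;
    node 3 stays on throughout."""
    return t <= 10 if node in (1, 2) else True

def offered_load(node: int, t: float) -> float:
    """Node 3 is idle on [0, 10] and is then driven at 100% load.
    The loads of nodes 1 and 2 are not given numerically in the text,
    so 0.5 is purely a placeholder."""
    if node == 3:
        return 0.0 if t <= 10 else 1.0
    return 0.5 if node_on(node, t) else 0.0
```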

Figure 5: Integration and Flexibility testing

The result was fully in line with our expectation. A careful analysis of the graph (Figure 5) shows that even though the third system (blue line) is idle in the interval [0, 10], it still shows a positive load (perhaps because load was distributed to it to balance the load of the first system, depicted by the red line); after the 10th second we observed that the third system continues to work at 100%, showing the lack of balancing once the other two systems have left the cloud.

CONCLUSIONS AND FUTURE WORKS

After a series of simulations over the proposed model, one can notice that the amount of resources available to a cloud is not exactly a function of Gaussian predictability. Although the shape of the behavior may seem Gaussian, it is clearly not congruent with a Gaussian, and it depends not only on the individual system but also on its behavioral aspect. It is therefore important to model clouds along the lines of a behavior model rather than with purely statistical models. Fatigue and behavior prediction were observed to play a vital role in this model's success in the simulation tests; so, even though every user has different subjective parameters, it is very important to capture the key cognitive aspects of the user and to model the cloud as a system of cognitive users, which in turn allows us to predict the future state of the system more easily. The simulation results are in good agreement with this model. However, it is very important to test the system as a whole in a real-world setting. This would perhaps need a slightly different approach, especially because of the bystander behavior [10] of the cognitive elements in the real world. It is also important to extrapolate this model to greater heterogeneity and load variance, and to model it to provide better transient behavior in balancing loads. This forms a major cluster of work for times to come.

REFERENCES

[1] I. Foster and C. Kesselman (Eds.), The Grid: Blueprint for a New Computing Infrastructure, The Elsevier Series in Grid Computing, Morgan Kaufmann, 2005.
[2] S. Zhou et al., "A trace-driven simulation study of dynamic load balancing," IEEE Transactions on Software Engineering, vol. 14, no. 9, pp. 1327-1341, Sept. 1988.
[3] X-bit Labs, "X-bit labs Investigation: Influence of Intel Pentium 4 Core Temperature on CPU Performance." Available at http://www.xbitlabs.com/articles/cpu/display/p4-temp.html
[4] N. G. Shivaratri, P. Krueger and M. Singhal, "Load distributing for locally distributed systems," Computer Magazine, pp. 33-44, 1992.
[5] A. Y. Zomaya and Yee-Hwei Teh, "Observations on using genetic algorithms for dynamic load-balancing," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 9, pp. 899-911, Sept. 2001.
[6] T. Bourke, Server Load Balancing, O'Reilly, ISBN 0-596-00050-2.
[7] University of Paderborn, "Dynamic Load Balancing and Scheduling." Available at http://wwwcs.uni-paderborn.de/cs/ag-monien/RESEARCH/LOADBAL/
[8] SocketPro, "Dynamic Load Balancing across Many Real Servers with Disaster Recovery." Available at http://www.udaparts.com/document/Tutorial/TutorialFour.htm
[9] D. Ferrari and S. Zhou, "An Empirical Investigation of Load Indices for Load Balancing Applications," Proc. 12th International Symposium on Computer Performance Modeling, Measurement, and Evaluation, pp. 515-528.
[10] P. Prevos, "Explanation Models for the Bystander Effect in Helping Behaviour," Monash University, Victoria, Australia, unpublished.


