
Technical White Paper

In-Memory Computing / In-Memory Data Management: A Foundation for SAP HANA

Harun-Ul-Rasheed Shaik, SAP BI, IBM GBS, July 2013

Abstract: This paper discusses how in-memory technology is changing the current landscape of enterprise applications. It presents a Hype Cycle analysis for companies that are interested in implementing in-memory technology now or are seriously considering in-memory computing in the near future. It emphasizes the data volume growth that affects operational and analytical applications and traces the evolution of in-memory data management in detail, with analysis of multi-core processors, memory, and parallel and distributed computing. It explains the benefits of merging analytical and operational applications and also examines the benefits and limitations of in-memory technology. Finally, it concludes with how SAP has begun its in-memory journey by developing its proprietary solution, SAP HANA, for its vast business customer base.

Introduction:
For computer science researchers and IT professionals, the word "computing" is inherently attractive and draws attention to the concepts behind it. As the computing paradigm has matured, recent years have brought parallel and distributed computing, grid computing, mobile computing, high-performance/high-availability computing, cluster computing, cloud computing, and in-memory computing. Technological advances are pushing us into new realms that make better use of the power of computing, such as mobile, business analytics, and cloud environments.

Information technology (IT) has produced many inventions over the last 40 years and has made great progress in easing the core functions of business processes in companies' daily operations. IT researchers and business leaders work collaboratively on real-world problems and their solutions, aiming to run businesses with IT and to leverage IT's potential to the fullest. The conjunction of business and IT has paid rich dividends, such as eliminating repetitive tasks and giving higher-level management instant visibility of changes in a firm.

Enterprise data is widely distributed across the different applications running on heterogeneous systems. Adopting new applications and adding new landscapes is a continuous process in an organization, analogous to the expansion of a city with new buildings, bridges, and communities. To cope with the information needed to make business decisions, most companies use Decision Support Systems (DSS). For the last few years, most companies and IT experts have used the term Business Intelligence (BI) for systems that help a firm make decisions with the available data; currently the market more often uses the synonymous term Business Analytics.

Data is the key to decision making, and because of constraints in DBMSs there are separate OLTP (Online Transactional Processing) and OLAP (Online Analytical Processing) systems. Higher management usually needs reports from BI systems to run the business. BI professionals can typically provide reports based only on yesterday's (the previous day's) data, because the data must be loaded into warehouses, where cleansing and aggregation are mandatory before the required reports can be run. Ad-hoc reporting on up-to-the-minute data is practically impossible, because data available in OLTP systems must first be transferred to OLAP systems for reporting. Ideally, new data entered into a system should be available to the user immediately, and the user should have the flexibility to analyze it irrespective of its volume and complexity.

To achieve this, the concept of In-Memory Computing (IMC) was developed on the basis of the In-Memory Database (IMDB). Analyzing data in the blink of an eye is the core objective of IMC.

Market Drivers:

Business Dynamics of In-Memory Computing:


"In-memory computing is about storing the database of record in main memory, not on disk. Driven by fast hardware evolution and maturing software technologies, IMC is quickly moving toward mainstream adoption" - Gartner, Inc. (Massimo Pezzini, Roxane Edjlali and Donald Feinberg). Businesses nowadays are thinking about simplifying business analytics in their firms by using current developments in technology for better analysis of huge volumes of operational data.

Hype Cycle Analysis:


To understand the current trend and dynamics of IMC in the industry, one should analyze the Hype Cycle; based on this outside analysis of the market and an internal evaluation, an organization can decide what should be done in the best interests of the company. Gartner's 2012 Hype Cycle, shown in Figure 1.0, places IMC in the Trough of Disillusionment.

Sliding Into the Trough - Points to Be Noted:


- Repetition of early success stories
- Failures concentrate on inappropriate uses of the technology
- Media coverage projects the challenges faced by the technology rather than the opportunities it creates
- The technology is discredited
- There will be no drop in adoption

Sliding Into the Trough - Indicators:


- Start of supplier consolidation
- Suppliers use successful adopters' case studies and references

Key Takeaway from Hype Cycle of IMC:


Industry experts should not get caught in the Hype Cycle trap of giving up too soon during the Trough of Disillusionment phase. Firms that have a roadmap to implement IMC should seriously study the success stories in the current market to fulfill their vision.

Figure 1.0: Emerging Technologies Hype Cycle 2012 (Source: Gartner)

As per Gartner, "The rapid maturation of application infrastructure technologies and a continued dramatic decline in the cost of semiconductor technologies are paving the way for mainstream use of in-memory computing (IMC)." It is predicted that although the in-memory data grid (IMDG) market, a key IMC segment, is small, it is likely to grow fast and to reach $1 billion by 2016 (Gartner, Inc., Egham, UK, April 3, 2013).

Data Volume growth and Business needs:


Enterprise Resource Planning (ERP) systems have their roots in the mid-1960s and were developed to manage an organization's assets, materials, and resources effectively. Today, ERP systems are inundated with data, particularly after the technological shift of exposing systems to the web by incorporating web servers during the early 2000s. Data from different modules such as Financials, Manufacturing, Human Resources, Supply Chain, Customer Relationship Management, and Business Intelligence is growing exponentially in industries like retail, manufacturing, utilities, and energy. "Information at your fingertips" was a phrase coined by Bill Gates in 1994; that vision has since become reality in the software industry, and so it has in the ERP world. Enterprise applications consist of a persistence layer, an application layer, and a presentation layer. In SAP, the DBMS sits in the persistence layer, the business logic is attached to application servers in the application layer, and the presentation layer consists of normal clients, web clients, or mobile clients. The objective of SAP R/3 was to shift as much load as possible from the database server to a number of application servers. To optimize performance, application server caches are used; this ensures that the database is not accessed on every request, because frequently needed results are placed in the application server cache for quick responses. Over time, however, this approach resulted in higher loads on the application servers.
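
To make the caching idea concrete, the following is a minimal, hypothetical sketch of a read-through result cache in Python. It is purely illustrative and does not represent SAP's actual application server cache; the table, its contents, and the function name are assumptions introduced only for this example.

import sqlite3
from functools import lru_cache

# Hypothetical in-process database standing in for the persistence layer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])

@lru_cache(maxsize=1024)          # application-server-style result cache
def get_customer_name(customer_id: int) -> str:
    # Only executed on a cache miss; repeated calls are served from memory.
    row = conn.execute("SELECT name FROM customers WHERE id = ?",
                       (customer_id,)).fetchone()
    return row[0] if row else ""

print(get_customer_name(1))   # miss: reads the database
print(get_customer_name(1))   # hit: served from the cache, no database access

As in the architecture described above, the benefit (fewer database round trips) comes at the cost of extra work and memory on the caching tier.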

Business Challenges and Technical limitations:


Performance:
An organization generally holds huge volumes of both transactional and analytical data. Processing the complete data set and executing the required business reports has a significant impact on performance. The ad-hoc queries and business reports needed to run daily operations in BI require a workload dominated by read operations and aggregations, which is very different from the workload that traditional databases are optimized for.

Not Meeting Sub-Second Response and Decision Making:


To overcome the performance bottleneck, the data was separated into two systems: a transactional system (OLTP) and an analytical system (OLAP). This division of data into OLTP and OLAP has many disadvantages; for example, the time gap between data being entered in the operational system and that data becoming available in reports in the analytical system spans hours to days. Applications that need both operational and analytical processing, such as demand planning, dunning, and sales order processing and analysis, cannot achieve sub-second response, and decision making is affected by the time gap.

Loss of granularity because of Aggregation:


In enterprise analytical applications, data undergoes additional processing in the form of complex aggregations; most of the time it must be prepared to meet user requirements when it arrives from different source systems. Only a summarized subset of the total data is made available to users for their analysis, which means the data loses its granularity.

Evolution of In-Memory Computing/In-Memory Data Management:

Drivers in Technology:


- Multi-core processors: integrated circuits with multiple CPUs on a single chip
- Main memory
- Column-oriented data storage

Moore's Law: Intel co-founder Gordon E. Moore's law states that the number of transistors on a chip doubles approximately every two years. An interesting interpretation of the law is that it also holds for processing power (the performance of Central Processing Units, CPUs); in other words, apart from the transistor count, the processing power also roughly doubles. In recent decades, faster transistors increased processor clock speed, and transistor density per CPU also increased rapidly. The transistor count of a current processor has grown exponentially compared with a processor of the 1970s, and the remarkable achievement is that a processor with billions of transistors is available at roughly the same price. After their period of exponential growth, Front Side Bus (FSB) speed and clock speed have stagnated. Refer to Figure 1.1 below on clock speed, FSB speed, and transistor development.

Figure 1.1: Clock speed, FSB speed and transistor development (Source: Plattner/Zeier, In-Memory Data Management)

The multi-core processor revolution started in 2005; refer to Figure 1.2 below on the development of the number of cores.

Figure 1.2: Development of the number of cores (Source: Plattner/Zeier, In-Memory Data Management)

Intel introduced Hyper-Threading technology in 2002, which allows computations to be performed in parallel on a single processor. It is applicable to both single-core and multi-core processors.

Parallel and Distributed Computing:

Distributed Computing: Known as a loosely coupled architecture, with a number of processors connected over a network. Communication between processors happens through message passing; every component has its own memory (local memory), and there may be a global memory called distributed shared memory.

Parallel Computing: Tightly coupled processors access shared memory to exchange information via a common bus or on-chip links. Every processor has its own cache memory and does not always access main memory; it uses the cache most of the time, and a great deal of research continues to go into cache optimization. In-memory computing uses the parallel computing paradigm, in which the server has many processors sharing common memory.
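
As a rough illustration of splitting one aggregation across several cores (not taken from the paper; the data set, worker count, and chunking scheme are assumptions), consider the following Python sketch. Strictly speaking, the standard library copies each chunk to the worker processes, so this approximates the shared-memory model rather than implementing it.

from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker aggregates its own slice on a separate core.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    workers = 4
    step = len(data) // workers
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with Pool(processes=workers) as pool:
        total = sum(pool.map(partial_sum, chunks))
    print(total)   # same result as sum(data), computed in parallel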

Parallel processing using Multi-Core Processors: An ERP database grows over time, and the number of users also goes up as time passes. Around 2002, clock speed stagnated, and the industry chose to increase the number of cores per CPU to maximize performance. Gene Amdahl, once an employee of IBM, stated in his widely accepted Amdahl's law that there is a limit to the gains from parallel computing. If C is the proportion of a program that is sequential and (1 - C) is the proportion that can be parallelized, then the maximum speedup that can be achieved using P processors is

Speedup S(P) = 1 / (C + (1 - C)/P)

Case Study:
a) If 90% of the program is parallelized and 10% is sequential, the maximum speedup achievable on 10 processors is 1 / (0.1 + (1 - 0.1)/10) = 5.26, roughly 5.3; i.e., about 5.3x faster on 10 processors than on a single processor.
b) If 90% is parallelized and 10% is sequential, the maximum speedup achievable on 20 processors is 1 / (0.1 + (1 - 0.1)/20) = 6.89, roughly 6.9; i.e., doubling the hardware increases the speedup by only about 30%.
c) Hypothetically, if we could use 2000 processors in the future with 90% parallelized and 10% sequential, the maximum speedup would be 1 / (0.1 + (1 - 0.1)/2000) = 9.95, roughly 10.

No matter how many processors we use, with 10% sequential code the speedup factor cannot exceed 10. Parallel computing is therefore best suited to a small number of processors, or to programs in which a very high proportion of the code can be parallelized.
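
The case-study figures can be reproduced with a few lines of Python; this is a minimal sketch, and the helper name amdahl_speedup is introduced here only for illustration.

def amdahl_speedup(sequential_fraction: float, processors: int) -> float:
    # Amdahl's law: S(P) = 1 / (C + (1 - C) / P)
    c = sequential_fraction
    return 1.0 / (c + (1.0 - c) / processors)

for p in (10, 20, 2000):
    # Prints speedups of roughly 5.3, 6.9 and 10 for a 10% sequential program.
    print(p, round(amdahl_speedup(0.1, p), 2))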

Main Memory is All:


Referring to his 1936 publication, the British mathematician and pioneer of the theory of computation Alan Turing wrote of "...an unlimited memory capacity obtained in the form of an infinite tape marked out into squares". Unlimited memory cannot be achieved with tape or disk, but nowadays we can make use of huge amounts of memory (on the order of 100 TB) thanks to developments in solid-state technology. Current Relational Database Management Systems (RDBMSs) generally retrieve data from disk, which degrades the desired performance, so queries take longer to execute. Caching is used for faster access, but for large amounts of data a disk read is unavoidable. "The relentless declines in DRAM and NAND flash memory prices, the advent of solid-state drive technology and the maturation of specific software platforms have enabled IMC to become more affordable and impactful for IT organizations," said Massimo Pezzini, vice president and Gartner Fellow. Dynamic Random Access Memory (DRAM) and NAND (Not AND) flash prices have become so cheap that firms are considering the in-memory database concept of using main memory as the primary storage location; main memory has been shown to deliver 4x the speed, or much more, compared with any disk in use.
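
As a crude, hypothetical illustration of the gap between memory access and disk access (the file layout, key count, and lookup strategy are assumptions made only for this example), the following Python sketch compares a lookup in an in-memory dictionary with a sequential scan of the same data stored in a file:

import os, tempfile, time

# Keep one copy of the data in memory and one on disk.
data = {i: i * 2 for i in range(1_000_000)}
path = os.path.join(tempfile.gettempdir(), "kv_demo.txt")
with open(path, "w") as f:
    for k, v in data.items():
        f.write(f"{k},{v}\n")

key = 999_999

t0 = time.perf_counter()
_ = data[key]                        # in-memory hash lookup
mem_s = time.perf_counter() - t0

t0 = time.perf_counter()
with open(path) as f:                # scan the disk file for the same key
    for line in f:
        k, v = line.split(",")
        if int(k) == key:
            break
disk_s = time.perf_counter() - t0

print(f"memory: {mem_s * 1e6:.1f} microseconds, disk scan: {disk_s * 1e3:.1f} ms")

The comparison is deliberately simple, but it mirrors the point above: when data resides in main memory, a result is available orders of magnitude faster than when it must be read from disk.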

CPU and Main Memory Functionality:


As shown in Figure 1.1, FSB speed stopped scaling; the latest trend in the market is to integrate the memory controller into the processor. AMD developed the HyperTransport protocol in 2001, and to compete with AMD, Intel came up with QuickPath Interconnect (QPI), which replaces the FSB with a point-to-point interconnect for memory and multiple cores. The concept of Non-Uniform Memory Access (NUMA) is that a processor accesses its own local memory faster than non-local memory (the local memory of another processor), even though the overall memory is shared by all processors. In cache-coherent NUMA (ccNUMA) systems, coherency is maintained by a protocol implemented in hardware, and the available memory is shared across all CPU caches. Prefetching helps mitigate the difference between local and remote memory access.

Column-oriented data storage:


Using more cores is not a panacea for every problem; issues remain, such as the gap between CPU clock speed and memory access speed. The concept of column-oriented storage used in in-memory databases is well suited to this environment, because it enables the operator parallelism discussed in the section "Parallel processing using Multi-Core Processors" above.
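
To make the two storage layouts concrete, here is a small, hypothetical Python sketch (the table contents and column names are invented for the example) contrasting a row-oriented layout with a column-oriented one; aggregating a single attribute in the column layout only touches that column's contiguous values.

# Row-oriented layout: one record per entity.
rows = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0},
    {"order_id": 2, "region": "APJ",  "amount": 75.5},
    {"order_id": 3, "region": "EMEA", "amount": 210.0},
]

# Column-oriented layout: one array per attribute.
columns = {
    "order_id": [1, 2, 3],
    "region":   ["EMEA", "APJ", "EMEA"],
    "amount":   [120.0, 75.5, 210.0],
}

# Row store: every record must be touched to aggregate one attribute.
total_rows = sum(r["amount"] for r in rows)

# Column store: the aggregation scans a single contiguous array,
# which is cache-friendly and easy to split across cores.
total_cols = sum(columns["amount"])

print(total_rows, total_cols)   # both print 405.5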

Merging OLTP and OLAP in Business by IMC:


Going back to basics, the current development in computing is to merge OLTP (transactional) and OLAP (analytical) processing into a single system, with an in-memory database holding the entire ERP data set. Management Information Systems (MIS) can then leverage the single database for up-to-the-second data availability, running real-time OLTP and OLAP on the same system to make efficient decisions.
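
The following is a minimal sketch of the idea using Python's built-in sqlite3 module as a stand-in for an in-memory database (SAP HANA is not SQLite; the schema and queries are assumptions for illustration): a transactional insert and an analytical aggregate run against the same in-memory store, with no ETL step in between.

import sqlite3

db = sqlite3.connect(":memory:")          # single in-memory data store
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# OLTP-style writes: individual transactions as they happen.
db.execute("INSERT INTO sales VALUES ('EMEA', 120.0)")
db.execute("INSERT INTO sales VALUES ('APJ', 75.5)")
db.commit()

# OLAP-style read: an aggregate over the same, current data,
# with no transfer to a separate warehouse first.
for region, total in db.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(region, total)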

IMC Technology Benefits:


Sub-second Response Time:
Fast responses from enterprise applications to queries on data residing in a single in-memory database are an added advantage for business processes like dunning, planning, and forecasting. Higher management can make decisions based on the ERP data available across the enterprise.

Improving Database Performance: Replacing the traditional database with a single in-memory database can optimize performance by a factor of 100 or more.

Decommissioning Application Servers: With an in-memory database, the business logic resides in the database itself, so application servers can be eliminated. This gives better performance and avoids network-related issues.

IMC Technology Limitations:


Security Concerns: When huge amounts of data are exposed to end users, there is a risk of data deletion, incorrect updates, or data loss. Utmost care has to be taken to protect the data with proper authorizations when granting users access.

Cost Factors: The technology mostly suits large enterprises that hold massive amounts of data, need high-speed real-time reporting, and can afford the correspondingly high price tag.

SAP HANA (High Performance Analytic Appliance):


SAP HANA is a bundle of TREX (Text Retrieval and Extraction), P*Time, and MaxDB. To meet the point-of-sale (POS) needs of an FMCG company with many retailers and around 465 billion data records, SAP built a system with 10 blades, each having 0.5 TB of memory and 32 processors, for a total hardware capacity of 5 TB of memory and 320 processors. The cost of the system was around half a million dollars. The pilot project was a great success, handling the most complicated query in 60 seconds in real-time reporting. Based on this real-time in-memory application built to handle the huge data volumes of a POS system, SAP developed its proprietary in-memory database at the end of 2010, named it SAP HANA (High Performance Analytic Appliance), and unveiled it at SAP TechEd; SAP also gave HANA considerable attention at the SAP Influencer Summit 2010. Since then SAP has made continuous progress and improvements to SAP HANA, most recently releasing SAP HANA 1.0 SPS6 on June 28, 2013. As per SAP chairman Hasso Plattner, the HANA IMC architecture is based on four distinct pillars:
- Multi-core computing
- In-memory (main memory) storage
- Column and row store
- Insert-only
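
As a purely illustrative sketch of the insert-only pillar (this is not HANA's actual implementation; the record layout and helper names are assumptions), a logical update can be modeled as appending a new version of a row rather than overwriting the old one:

from datetime import datetime, timezone

# Append-only table: each logical update adds a new version of the row.
versions = []

def insert_or_update(order_id, status):
    versions.append({
        "order_id": order_id,
        "status": status,
        "valid_from": datetime.now(timezone.utc),
    })

def current_status(order_id):
    # The latest appended version is the current one; history is preserved.
    for row in reversed(versions):
        if row["order_id"] == order_id:
            return row["status"]
    return None

insert_or_update(42, "OPEN")
insert_or_update(42, "SHIPPED")      # no overwrite: a second version is appended
print(current_status(42))            # SHIPPED
print(len(versions))                 # 2 versions kept, so history remains queryable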

HANA can be attached non-disruptively to existing ECC, BW, and BOBJ systems, providing a chance to use its benefits within current landscapes. SAP has also made progress in handling data that comes from non-SAP systems using HANA. At SAPPHIRE 2013, SAP announced the seamless integration of cloud, mobility, and HANA in the Business Suite used by its customers. "Applications on HANA" is the latest term used by Dr. Hasso Plattner in his keynote speech at SAPPHIRE 2013.

Conclusion:
The paradigm shift in memory management is one of the key elements of IMC and will set a new trend in enterprise business computing, with quicker decision making and better productivity for the end user. Large in-memory databases will change today's computing landscape considerably and will also be the foundation for real-time analytics on raw operational data. With IMC technology, enterprises gain significant benefits: batch programs are no longer needed, the latency between OLTP and OLAP systems disappears, and complex computational real-time analysis of the business is performed with sub-second response times. A single source of truth for the business, obtained by combining OLTP and OLAP, is a remarkable achievement of IMC, and IMC can potentially change enterprise software as a whole. SAP, as the ERP market leader, has developed its proprietary solution SAP HANA based on in-memory technology to give maximum benefit to its huge customer base. The number of SAP HANA implementations has been increasing steadily over the last two years. Many applications are being developed on HANA, and tight integration of mobility, cloud, and HANA is the current direction in which SAP is progressing.
