
SMP vs. MPP for EDW


Neoview whitepaper

The purpose of this document is to help the field EAM or CBM recognize when Neoview should be sold, based on the prospect's requirements and the competition involved. Hewlett Packard is, and has long been, a significant player in the data warehousing/Business Intelligence market. HP has installed more servers running DBMS, BI analytics, BI reporting or BI visualization than any other vendor. All of these data warehouse servers have been based on the SMP architecture, running merchant DBMS systems like Oracle, SQL Server, DB2, RDBMS and others. So why has HP decided to bring to market a new data warehouse platform based on a different architecture (MPP), instead of extending our SMP-based product line? The simple answer is that there is an emerging market within the data warehousing space that has been dominated by MPP systems, and HP wants to win in that market. This in no way detracts from our leadership in, continuing investment in, and strategy for SMP-based systems in all of the other data warehousing markets.

So why MPP for an Enterprise Data Warehouse?


Although SMP systems are used for a very broad range of data warehouse/data mart solutions, MPP systems are needed for a growing number of uniquely challenging EDW initiatives. As the tools for BI development have evolved to let business managers create their own queries, and to empower corporations to delve into billions of rows of detailed data, the number and magnitude of queries running on many Enterprise Data Warehouses have pushed the limits of the SMP architecture. The very large number of users and the amount of data to be moved about in a single massive EDW for a Fortune 1000 company have, in some cases, exceeded the ability of the shared components in an SMP system to deliver acceptable response times for ad hoc queries (no indexes or aggregations). A 100-node MPP system does not have any shared resources, so the problem is broken into smaller pieces that use different CPUs, memories, I/O subsystems and disk drives until the final merge of the output for the user.
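The "smaller pieces, then a final merge" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how Neoview or any other product is implemented: the row layout, the function names (`partition_rows`, `scan_slice`, `mpp_scan`) and the round-robin distribution are all hypothetical, and the threads merely stand in for independent nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, node_count):
    """Deal rows round-robin into one private slice per node (nothing is shared)."""
    slices = [[] for _ in range(node_count)]
    for i, row in enumerate(rows):
        slices[i % node_count].append(row)
    return slices


def scan_slice(rows, predicate):
    """Full scan of one slice: no index, every row is inspected."""
    count = 0
    total = 0
    for row in rows:
        if predicate(row):
            count += 1
            total += row["amount"]
    return count, total


def mpp_scan(rows, node_count, predicate):
    """Scan every slice independently, then merge the partial results."""
    slices = partition_rows(rows, node_count)
    # Threads stand in for nodes here; in a real MPP system each slice
    # would live on separate hardware with its own CPU, memory and disks.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(lambda s: scan_slice(s, predicate), slices))
    # Final merge: the only step that sees results from all nodes.
    return sum(c for c, _ in partials), sum(t for _, t in partials)


# Hypothetical detail rows, purely for illustration.
rows = [{"store": i % 7, "amount": i % 13} for i in range(10_000)]


def is_store_3(row):
    return row["store"] == 3


single_result = scan_slice(rows, is_store_3)      # one system scans everything
mpp_result = mpp_scan(rows, 100, is_store_3)      # 100 slices, merged at the end
```

Both calls return the same count and total. The difference is that each simulated node touches only about one hundredth of the rows, which is the property the paragraph above relies on.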

Why MPP for the EDW and gigantic data mart solutions
Giant data mart - Major corporations want to mine detailed data, down to the transaction level, to discover important information about customers, fraud, supply chain efficiency, network behavior, vendor performance, and other massive data stores. This requires ad hoc processing of tens or hundreds of billions of rows of data (without the help of indexes or aggregations). Some examples of this are:
Retail market basket analysis - a row in the POS table for every line item purchased at every store by every customer over a multi-year period.
Telco CDR analysis - 100 billion rows in the call detail record table for a tier 1 telco for one year's worth of calls.
eRetail click stream analysis - one row in the DBMS for every click by every internet user for several months to a year or two.
RFID - a row in the DBMS for every movement of every part, sub-assembly, and product from the raw material stage, through manufacturing, distribution and inventory, and ultimately onto store shelves.
Financial services compliance - every transaction for every operational system for years, accessible via standard queries and the ad hoc queries demanded by government auditors.

SMP and MPP defined


SMP - Symmetric Multi-Processing architecture
This is the most common architecture deployed in IT shops worldwide. It evolved from the original single-CPU systems and has the identical programming paradigm of those systems. SMP allows growth in power past that of a single-CPU system by adding more CPUs that share the system memory, I/O infrastructure, communications hardware, mass storage and all peripherals attached to the system. Solutions written for a single-CPU system will run on an SMP system unchanged. Almost all of the major off-the-shelf solutions on the market are based on this architecture (SAP, Oracle Financials, PeopleSoft, etc.).

MPP - Massively Parallel Processing architecture


In this architecture, a large number of independent processing modules (each containing its own CPU, memory, I/O, OS and middleware) are loosely coupled. The independent modules each run internal code that works with the other modules to create a totally virtualized environment (distributed OS, file system, DBMS, and pool of data and peripherals). All applications see a single system image, no matter how many independent modules are in play. The key to this architecture is the massively parallel system software, written by the vendor, which takes advantage of all of the hardware in parallel while hiding the complexity normally associated with clusters. A very high-speed, high-capacity internal system interconnect between modules is also necessary. Applications that are designed to optimize the performance of SMP systems do not optimize the performance of MPP systems. Having CPUs on different nodes of an MPP system share memory across the system's node-to-node interconnect does not deliver the performance gain of multiple CPUs sharing the same memory within one SMP system. As a result, there are far fewer off-the-shelf applications running on the MPP architecture than on SMP. MPP vendors do, however, provide transparent execution of the same SQL statements created for SMP systems. The MPP system has a parallel query optimizer and a parallel execution environment that take those queries and exploit all of the parallel hardware in the system.
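To make the "same SQL, run in parallel" point concrete, here is a minimal sketch using Python's standard `sqlite3` module. It assumes a hypothetical `sales` table, hash distribution on a made-up `txn_id` column, and four in-memory databases standing in for nodes. It is an illustration of the general shared-nothing pattern only, not of any vendor's optimizer or execution engine.

```python
import sqlite3

# The statement an application would submit; it is unchanged whether it
# runs on one database or on every node's partition.
QUERY = "SELECT store, SUM(amount), COUNT(*) FROM sales GROUP BY store"


def make_node():
    """Create one independent 'node': its own in-memory database and table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (txn_id INTEGER, store TEXT, amount INTEGER)")
    return conn


def load(nodes, rows):
    """Hash-distribute rows across nodes on txn_id (hypothetical key choice)."""
    for txn_id, store, amount in rows:
        node = nodes[txn_id % len(nodes)]
        node.execute("INSERT INTO sales VALUES (?, ?, ?)", (txn_id, store, amount))


def run_everywhere(nodes, sql):
    """Run the same SQL on every node and merge the partial aggregates."""
    merged = {}
    for node in nodes:
        for store, total, count in node.execute(sql):
            prev_total, prev_count = merged.get(store, (0, 0))
            merged[store] = (prev_total + total, prev_count + count)
    return merged


# Hypothetical data, purely for illustration.
rows = [(i, f"store-{i % 5}", (i * 37) % 101) for i in range(1000)]

nodes = [make_node() for _ in range(4)]
load(nodes, rows)
result = run_everywhere(nodes, QUERY)

# Reference: the same statement against a single, unpartitioned database.
reference = make_node()
load([reference], rows)
expected = {store: (total, count) for store, total, count in reference.execute(QUERY)}
```

`result` and `expected` hold the same per-store totals and counts. Sums and counts merge by simple addition; an average would have to be derived from the merged sum and count rather than by averaging the per-node averages, which is the kind of detail a parallel query optimizer handles on the application's behalf.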

SMP problems with giant data marts


SMP systems that try to scan these enormous, multi-billion-row tables end up swamping the shared backplane, the shared memory and sometimes the I/O system. This can result in queries that take days to execute. So, many companies try to work around the massive scanning problem by building indexes and/or aggregations for every kind of query they can think of. This speeds up the queries, at the expense of losing information via summarization. Also, the DBA effort needed to keep this kind of complexity tuned and retuned is enormous as new queries are needed, new subject areas are added to the warehouse and more hardware is attached to the system. Finally, these types of systems have millions of updates per 24-hour period. A lengthy and complex nightly batch window has to be used to combine these updates and inserts with the existing data and then rebuild the indexes, aggregates and materialized views.
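The point about losing information via summarization can be shown with a tiny, entirely hypothetical POS data set. In the sketch below, a pre-built per-store aggregate answers the question it was designed for, but a new ad hoc question (which items are bought together?) can only be answered from the detail rows. The table layout and function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detail rows: (basket_id, store, item, amount).
pos_rows = [
    (1, "A", "bread", 3), (1, "A", "butter", 4),
    (2, "A", "bread", 3), (2, "A", "jam", 5),
    (3, "B", "bread", 3), (3, "B", "butter", 4),
]


def sales_by_store(rows):
    """The pre-built aggregate: one total per store, detail discarded."""
    totals = defaultdict(int)
    for _, store, _, amount in rows:
        totals[store] += amount
    return dict(totals)


def pairs_bought_together(rows):
    """An ad hoc question that needs the line-item detail, not the aggregate."""
    baskets = defaultdict(set)
    for basket_id, _, item, _ in rows:
        baskets[basket_id].add(item)
    counts = defaultdict(int)
    for items in baskets.values():
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return dict(counts)


summary = sales_by_store(pos_rows)        # all an aggregate table would retain
pairs = pairs_bought_together(pos_rows)   # requires scanning the detail rows
```

Nothing in `summary` records which items shared a basket, so the second question cannot be answered from it. That is why ad hoc analysis of this kind pushes sites back toward scanning the full detail table.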

HP Restricted for internal HP and Partner use only

SMP vs. MPP for EDW Sales Battle Card


In these sites, the complexity, the number of DBAs and the nightly batch window result in very high people costs, lengthy delays in adding new data warehouse services for end users and, sometimes, an inability to complete the nightly batch processing by the start of business the next day. Also, companies are now moving to 24x7 use of the data warehouse to aid in operational decisions as well as strategic ones. Nightly batch processing is not compatible with this requirement.

Data warehouse appliances are a new breed of MPP-based data systems that represent another threat. They are replacing HP SMP-based systems at key customers for the giant data mart solutions (companies like Amazon.com for click stream analysis, Verizon for CDR analysis, etc.). Netezza is the most successful of these data warehouse appliance vendors, with 87 very large corporations using its systems (Bank of America, Orange UK, Ross Stores, Neiman Marcus, AT&T Wireless, and others).

EDW
An EDW compounds the problems above by adding all data from all subject areas of a corporation. Massive loading must go on 24 hours a day, while a very large number of users are submitting queries. There is no allowance for nightly batch periods to rebuild indexes, aggregations or materialized views. Very complex queries, with up to 25-way joins, must be processed efficiently. When we talk to the CIOs of these companies about their main problems with data warehousing, they are almost unanimous in their frustration with rapidly growing costs (hardware, licenses and DBAs), the unmanageable complexity of their massive data stores, performance, and the time it takes to add new services.

Neoview - HP's MPP data warehouse


HP has brought together its best-of-breed technologies to build a platform to win in the EDW and giant data mart markets. Neoview has all of the capabilities of the high-end Teradata systems, but has the low cost and simplicity of the less capable appliance platforms.

To contact the Neoview specialist nearest you, send an email to one of the following Neoview specialist sales managers:
Worldwide - Lambert Billet (Billet, Lambert)
Americas - Tom Whitelaw (Whitelaw, Thomas)
EMEA - Neil Garnett (Garnett, Neil)
APJ - Michelle Dorn (Dorn, Michelle)

MPP
MPP-based data warehouse vendors dominate the EDW and giant data warehouse market. Teradata has systems running in 850 of the largest corporations in the world. These central EDWs cost from $1 million to $60 million and pay for themselves within a few years. Teradata is the EDW standard in such companies as Wal-Mart, Bank of America, FedEx, UPS, Continental Airlines, Nationwide Insurance, Royal Bank of Canada, etc.

© 2007 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. Publication #XXX-XXXXXX-EWN

