This is Google's cache of http://doc.mapr.com/display/MapR/Hive+ODBC+Connector. It is a snapshot of the page as it appeared on 13 Nov 2013 15:32:57 GMT.

Hive ODBC Connector

Added by Peter Conrad, last edited by Peter Conrad on Sep 24, 2013

Comment: Published by Scroll Versions from space trunk and version 3.0.1

This page contains details about setting up and using the ODBC Connector for Hive. It covers the following topics:

- Before You Begin
- The SQL Connector
- Software and Hardware Requirements
- Installation and Configuration
- Configuring SSL on a DSN
- Configuring DSN-less Authentication
- SQLPrepare Optimization
- Notes
- Data Types
- HiveQL Notes
- Notes on Applications
- Microsoft Access
- Microsoft Excel/Query
- Tableau Desktop

Before You Begin


The MapR Hive ODBC Connector is an ODBC driver for Apache Hive 0.7.0 and later that complies with the ODBC 3.52 specification. To use the ODBC driver, configure a Data Source Name (DSN), a definition that specifies how to connect to Hive. DSNs are typically managed by the operating system and may be used by multiple applications. Some applications do not use DSNs; refer to your application's documentation to understand how it connects using ODBC.

The standard query language for ODBC is SQL. HiveQL, the standard query language for Hive, includes a subset of ANSI SQL-92. Applications that connect to Hive using ODBC may need their queries altered if those queries use SQL features that Hive does not support. Applications that use SQL will recognize HiveQL, but might not provide access to HiveQL-specific features such as multi-table insert. Refer to the HiveQL wiki for up-to-date information on HiveQL.

The SQL Connector


The SQL Connector feature translates standard SQL-92 queries into equivalent HiveQL queries. It performs both syntactic translations and structural transformations. For example:

- Quoted identifiers: HiveQL quotes identifiers with back quotes (`), while SQL uses double quotes ("). Even when a driver reports the back quote as the quote character, some applications still generate double-quoted identifiers.
- Table aliases: HiveQL does not support the AS keyword between a table reference and its alias.
- Joins: the JOIN, INNER JOIN, and CROSS JOIN SQL syntaxes are translated to the HiveQL JOIN syntax.
- TOP N: SQL TOP N queries are transformed to HiveQL LIMIT queries.
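The quoted-identifier, table-alias, and TOP N rewrites can be imitated with regular expressions on simple queries. The Python sketch below uses a hypothetical to_hiveql helper; it is not the connector's implementation, it does not parse SQL or rewrite JOIN syntax, and it would mangle string literals that contain double quotes.

```python
import re


def to_hiveql(sql: str) -> str:
    """Imitate three SQL Connector rewrites on a simple SELECT (toy sketch)."""
    # Quoted identifiers: "name" -> `name`
    out = re.sub(r'"([^"]*)"', r"`\1`", sql)
    # Table aliases: drop AS after a table reference that follows FROM or JOIN
    out = re.sub(r"\b((?:FROM|JOIN)\s+\S+)\s+AS\s+(\w+)", r"\1 \2", out,
                 flags=re.IGNORECASE)
    # TOP N: SELECT TOP n <rest> -> SELECT <rest> LIMIT n
    m = re.match(r"\s*SELECT\s+TOP\s+(\d+)\s+(.*)", out,
                 flags=re.IGNORECASE | re.DOTALL)
    if m:
        out = f"SELECT {m.group(2).rstrip().rstrip(';')} LIMIT {m.group(1)}"
    return out
```

For example, SELECT TOP 5 "name" FROM "users" AS u comes back as SELECT `name` FROM `users` u LIMIT 5.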

Software and Hardware Requirements


Using the MapR Hive ODBC Connector on Windows requires:

- Windows 7 Professional or Windows Server 2008 R2. Both 32-bit and 64-bit editions are supported.
- The Microsoft Visual C++ 2010 Redistributable Package (the runtime components required to run applications developed with Visual C++ on a computer that does not have Visual C++ 2010 installed).
- A Hadoop cluster with the Hive service installed and running. Ask the cluster administrator for the hostname or IP address of the Hive service and the port it listens on. (The default port for Hive is 10000.)
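Before configuring a DSN, it can help to confirm that the Hive host and port from your cluster administrator are reachable from the Windows machine. A minimal Python sketch using only the standard library follows; the function name hive_port_reachable is hypothetical, and a successful connection shows only that some service is listening, not that it is Hive.

```python
import socket

HIVE_DEFAULT_PORT = 10000  # default Hive port, per the requirements above


def hive_port_reachable(host: str, port: int = HIVE_DEFAULT_PORT,
                        timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout.

    A successful connection only shows that something is listening on the
    port; it does not prove the listener is a Hive server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, name lookup failure, or timeout
        return False
```

For example, hive_port_reachable("hive.example.com") tries port 10000 on a placeholder host.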

Installation and Configuration


There are versions of the connector for 32-bit and 64-bit applications. The 64-bit version of the connector works only with 64-bit DSNs; the 32-bit connector works only with 32-bit DSNs. Because 64-bit Windows machines can run both 64-bit and 32-bit applications, install both versions of the connector in order to set up DSNs to work with both types of applications. If both the 32-bit connector and the 64-bit connector are installed, you must configure DSNs for each independently, in their separate Data Source Administrators.
To install the Hive ODBC Connector:

1. Run the installer to get started:
   - To install the 64-bit connector, download and run http://package.mapr.com/tools/MapRODBC/MapR_odbc_2.1.0_x64.exe.
   - To install the 32-bit connector, download and run http://package.mapr.com/tools/MapRODBC/MapR_odbc_2.1.0_x86.exe.
2. Perform the following steps, clicking Next after each:
   1. Accept the license agreement.
   2. Select an installation folder.
   3. On the Information window, click Next.
   4. On the Completing... window, click Finish.
3. Install a DSN corresponding to your Hive server.
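Because the connector's bitness has to match the application that will use it, not the operating system, a script that fetches the installer can key off the application's bitness. The Python sketch below is hypothetical; it uses the two download URLs from step 1 and, as an example, reports the bitness of the running Python interpreter via struct.calcsize("P").

```python
import struct

# Download URLs from the installation steps above, keyed by bitness.
INSTALLERS = {
    64: "http://package.mapr.com/tools/MapRODBC/MapR_odbc_2.1.0_x64.exe",
    32: "http://package.mapr.com/tools/MapRODBC/MapR_odbc_2.1.0_x86.exe",
}


def process_bits() -> int:
    """Bitness of the running Python process (pointer size in bits)."""
    return struct.calcsize("P") * 8


def installer_for(app_bits: int) -> str:
    """Pick the connector build that matches an application's bitness.

    A 32-bit application on 64-bit Windows still needs the 32-bit connector.
    """
    if app_bits not in INSTALLERS:
        raise ValueError(f"unsupported bitness: {app_bits}")
    return INSTALLERS[app_bits]
```

On 64-bit Windows, install both builds if you use both 32-bit and 64-bit applications.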
To create a Data Source Name (DSN):

1. Open the Data Source Administrator from the Start menu. Example: Start > MapR Hive ODBC Driver 2.0 > 64-Bit ODBC Driver Manager

2. On the User DSN tab, click Add to open the Create New Data Source dialog.

3. Select MapR Hive ODBC Connector and click Finish to open the Hive ODBC Driver DSN Setup window.

4. Enter the connection information for the Hive instance:

   - Data Source Name: Specify a name for the DSN.
   - Description: Enter an optional description for the DSN.
   - Host: Enter the hostname or IP address of the server running HiveServer1 or HiveServer2.
   - Port: Enter the listening port for the Hive service.
   - Database: Leave as default to connect to the default Hive database, or enter a specific database name.
   - Hive Server Type: Set to HiveServer1 or HiveServer2.
   - Authentication: If you are using HiveServer2, set the following:
     - Mechanism: Set to the authentication mechanism you are using. The MapR ODBC driver supports the User Name, User Name and Password, and User Name and Password (SSL) mechanisms.
     - User Name: Set the user to run queries as.
     - Password: Enter the user's password, if your selected authentication mechanism requires one.

5. Click Test to test the connection.

6. When you are sure the connection works, click Finish. Your new DSN appears in the User Data Sources list.
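The fields entered in step 4 have a few dependencies: authentication settings apply only to HiveServer2, and the two password-based mechanisms need a password. The hypothetical HiveDsn class below records those rules as a pre-flight check in Python. It is not part of the driver, and its checks (for example, requiring a User Name for HiveServer2) are an interpretation of the field descriptions above; clicking Test in the setup window remains the authoritative check.

```python
from dataclasses import dataclass
from typing import List, Optional

# Authentication mechanisms supported by the MapR ODBC driver (HiveServer2 only).
HS2_MECHANISMS = {"User Name", "User Name and Password",
                  "User Name and Password (SSL)"}


@dataclass
class HiveDsn:
    """Hypothetical record of the fields in the DSN Setup window."""
    name: str
    host: str
    port: int = 10000                  # default Hive port
    database: str = "default"          # Hive's default database
    server_type: str = "HiveServer2"   # "HiveServer1" or "HiveServer2"
    mechanism: Optional[str] = None    # used only with HiveServer2
    user: Optional[str] = None
    password: Optional[str] = None
    description: str = ""

    def problems(self) -> List[str]:
        """List inconsistencies; an empty list means the fields look complete."""
        found = []
        if not self.name:
            found.append("Data Source Name is required")
        if not self.host:
            found.append("Host is required")
        if not 1 <= self.port <= 65535:
            found.append("Port must be between 1 and 65535")
        if self.server_type not in ("HiveServer1", "HiveServer2"):
            found.append("Hive Server Type must be HiveServer1 or HiveServer2")
        elif self.server_type == "HiveServer2":
            if self.mechanism not in HS2_MECHANISMS:
                found.append("unsupported authentication mechanism")
            elif self.mechanism != "User Name" and not self.password:
                found.append("this mechanism requires a password")
            if not self.user:
                found.append("User Name is required for HiveServer2")
        return found
```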

Configuring SSL on a DSN

Select the DSN in the ODBC Data Source Administrator window, then click Configure to display the Setup dialog. In the Setup dialog, click Advanced Options... to display the Advanced Options dialog.

In the SSL pane, select the Allow Common Name Host Name Mismatch check box to allow the common name of a CA-issued certificate to differ from the host name of the Hive server. For self-signed certificates, the driver always allows the certificate's common name to differ from the host name. To specify a local trusted certificates file, click Browse next to the Trusted Certificates field and navigate to your cacerts.pem file. By default, the driver uses the trusted CA certificates PEM file installed with it.

Note: The driver always accepts a self-signed SSL certificate.
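For readers scripting their own TLS checks against the Hive server, the two SSL options map loosely onto settings of Python's standard ssl module. This is an analogy, not the driver's code: the hypothetical hive_ssl_context helper treats trusted_certs like the Trusted Certificates field and allow_cn_mismatch like Allow Common Name Host Name Mismatch, and when no file is given it falls back to the system CA store rather than the PEM file shipped with the driver.

```python
import ssl
from typing import Optional


def hive_ssl_context(trusted_certs: Optional[str] = None,
                     allow_cn_mismatch: bool = False) -> ssl.SSLContext:
    """Client TLS settings analogous to the DSN's SSL options.

    trusted_certs: path to a PEM bundle such as cacerts.pem; None loads the
    system's default CA certificates.
    allow_cn_mismatch: skip the host name check but still verify the chain.
    """
    ctx = ssl.create_default_context(cafile=trusted_certs)
    if allow_cn_mismatch:
        # The certificate must still chain to a trusted CA.
        ctx.check_hostname = False
    return ctx
```

Unlike the driver, this context rejects a self-signed certificate unless that certificate is in the trusted_certs file.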

Advanced Options

The Advanced Options dialog provides the following settings:

- Use Native Query: Select this check box to disable the SQL Connector feature. The SQL Connector feature transforms the queries emitted by an application into an equivalent form in HiveQL. If the application is Hive-aware and already emits HiveQL, turning off the SQL Connector feature avoids the extra overhead of query transformation.
- Fast SQLPrepare: Select this check box to defer query execution to SQLExecute. In Native Query mode, the driver executes the HiveQL query to retrieve the result-set metadata for SQLPrepare, so SQLPrepare can be slow. Enable this option if you do not need the result-set metadata after calling SQLPrepare.
- Rows Fetched Per Block: Type the number of rows to fetch per block. Any positive 32-bit integer is valid. Performance gains are marginal beyond the default value of 10000 rows.
- Default String Column Length: Type the default length to report for STRING columns. Hive does not provide the length of STRING columns in its column metadata; this option lets you tune that length.
- Decimal Column Scale: Type the maximum number of digits to the right of the decimal point for numeric data types.
- Allow Common Name Hostname Mismatch: Select this check box to allow the common name of a CA-issued SSL certificate to differ from the host name of the Hive server. This setting applies only to the User Name and Password (SSL) authentication mechanism; other mechanisms ignore it.
- Trusted Certificates: Enter the path of a file containing trusted certificates. The driver loads the certificates from this file to authenticate the Hive server when using SSL. This setting applies only to the User Name and Password (SSL) authentication mechanism; other mechanisms ignore it. If it is not set, the driver uses the trusted CA certificates PEM file installed with the driver.
- Server-side properties: To create a server-side property, click Add, type appropriate values in the Key and Value fields, and then click OK. Click Edit to alter an existing property or Remove to delete one.

Note: Type set -v at the Hive CLI command line or in Beeline to display a list of the Hadoop and Hive server-side properties that your implementation supports.
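To see what Rows Fetched Per Block changes, assume (this page does not say) that the driver sends one fetch request per block. The number of requests is then the row count divided by the block size, rounded up: a 1,000,000-row result takes 100 requests at the default of 10000 rows and 20 requests at 50000 rows. The hypothetical fetch_requests helper below does that arithmetic and applies the positive 32-bit range check.

```python
INT32_MAX = 2**31 - 1  # largest positive 32-bit integer


def fetch_requests(total_rows: int, rows_per_block: int = 10000) -> int:
    """Block fetches needed for total_rows, assuming one request per block."""
    if not 1 <= rows_per_block <= INT32_MAX:
        raise ValueError("Rows Fetched Per Block must be a positive 32-bit integer")
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    return -(-total_rows // rows_per_block)  # ceiling division on integers
```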

If you selected Hive Server 2 as the Hive server type, select or clear the Apply Server Side Properties with Queries check box as needed; it is selected by default. Selecting the check box configures the driver to apply each server-side property you set by executing a query when it opens a session to the Hive server. Clearing the check box configures the driver to use a more efficient method that applies server-side properties without additional network round trips. Some Hive Server 2 builds are not compatible with the more efficient method; if the server-side properties you set do not take effect when the check box is clear, select it. If you selected Hive Server 1 as the Hive server type, the check box is selected and unavailable.
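With Apply Server Side Properties with Queries selected, each property is applied by running a query when the session opens. HiveQL's statement for setting a session property is SET key=value, so a plausible sketch of those queries follows. The function name is hypothetical, this page does not say exactly which statements the driver sends, and the property names in the example (hive.exec.parallel, mapred.reduce.tasks) are standard Hive and Hadoop settings used only for illustration.

```python
from typing import Dict, List


def session_setup_queries(props: Dict[str, str]) -> List[str]:
    """One HiveQL SET statement per server-side property, in insertion order."""
    queries = []
    for key, value in props.items():
        # Property names such as hive.exec.parallel contain no spaces or '='.
        if not key or "=" in key or any(ch.isspace() for ch in key):
            raise ValueError(f"invalid property name: {key!r}")
        queries.append(f"SET {key}={value}")
    return queries
```

Two properties yield two statements; if each is sent separately (an assumption), that is two extra round trips, which is the overhead the more efficient method avoids.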

Configuring DSN-less Authentication

Some client applications, such as Tableau, provide some support for connecting to a data source using a driver without a DSN. Applications that connect using ODBC data sources work with Hive Server 2 by sending the authentication credentials defined in the data source. Applications that are Hive Server 1 aware but not Hive Server 2 aware, and that connect using a DSN-less connection, have no facility for sending authentication credentials to Hive Server 2. For these applications, you can configure the ODBC driver with authentication credentials using the Driver Configuration tool.

Note: Credentials defined in a data source take precedence over credentials configured using the Driver Configuration tool. Credentials configured using the Driver Configuration tool apply to all DSN-less connections unless the client application is Hive Server 2 aware and requests credentials from the user.
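The credential precedence rules described above can be written as a small decision function. The Python sketch below is hypothetical and only restates those rules; placing DSN credentials ahead of credentials that a HiveServer2-aware application prompts for is an assumption, because this page does not cover that combination.

```python
from typing import Optional, Tuple

Credentials = Tuple[str, Optional[str]]  # (user name, password)


def effective_credentials(dsn_creds: Optional[Credentials],
                          driver_config_creds: Optional[Credentials],
                          prompted_creds: Optional[Credentials] = None
                          ) -> Optional[Credentials]:
    """Pick the credentials a connection would use under the rules above."""
    if dsn_creds is not None:
        # Credentials defined in a data source take precedence.
        return dsn_creds
    if prompted_creds is not None:
        # A HiveServer2-aware application asked the user directly.
        return prompted_creds
    # Otherwise the Driver Configuration tool's credentials apply.
    return driver_config_creds
```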
To configure driver authentication for a DSN-less connection:

1. Launch the Driver Configuration program from the Start menu.
2. Select a Hive Server Type from the drop-down list.
3. Select an authentication mechanism from the drop-down list, then configure any fields that mechanism requires.
4. (Optional) Click Advanced and configure any desired advanced options.

Note: The MapR ODBC driver only supports the User Name, User Name and Password, and User Name and Password (SSL) authentication mechanisms.

SQLPrepare Optimization
The connector currently executes the query to determine the result-set metadata for SQLPrepare. The downside is that SQLPrepare is slow, because query execution tends to be slow. If you do not need the result-set metadata, you can configure the connector to speed up SQLPrepare: create a String value named NOPSQLPrepare under your DSN. If the value is nonzero, SQLPrepare does not use query execution to derive the result-set metadata. If this registry entry is not defined, the default value is 0.
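User DSNs on Windows are stored in the registry under HKEY_CURRENT_USER\Software\ODBC\ODBC.INI\<DSN name>, so the NOPSQLPrepare value can be added with a .reg file. The hypothetical dsn_reg_file helper below generates one in Python; it assumes a user DSN (system DSNs live under HKEY_LOCAL_MACHINE instead) and writes every value as a string, as this section specifies for NOPSQLPrepare.

```python
def dsn_reg_file(dsn: str, values: dict) -> str:
    """Return .reg text that adds string values under a user DSN's registry key."""
    key = "HKEY_CURRENT_USER\\Software\\ODBC\\ODBC.INI\\" + dsn
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in values.items():
        # .reg string data escapes backslashes and double quotes with a backslash.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'"{name}"="{text}"')
    return "\r\n".join(lines) + "\r\n"
```

Save the returned text as, for example, nopsqlprepare.reg and import it with Registry Editor.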

Notes
Data Types
The following data types are supported:
Type        Description
TINYINT     1-byte integer
SMALLINT    2-byte integer
INT         4-byte integer
BIGINT      8-byte integer
FLOAT       Single-precision floating-point number
DOUBLE      Double-precision floating-point number
DECIMAL     Decimal numbers
BOOLEAN     True/false value
STRING      Sequence of characters
TIMESTAMP   Date and time value

Not yet supported:

The aggregate types (ARRAY, MAP, and STRUCT)

HiveQL Notes
CAST Function

HiveQL doesn't support the CONVERT function; it uses the CAST function to perform type conversion. Syntax:
CAST (<expression> AS <type>)

Using CAST in HiveQL:

- Use the HiveQL names for the data types supported by Hive in the CAST expression. For example, to convert 1.0 to an integer, use CAST (1.0 AS INT) rather than CAST (1.0 AS SQL_INTEGER).
- Hive does not do a range check during CAST operations. For example, CAST (1000000 AS TINYINT) returns a TINYINT of value 64, rather than the expected error.
- Unlike SQL, Hive returns NULL instead of an error if it fails to convert the data. For example, CAST ('abc' AS INT) returns NULL.
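The value 64 in the TINYINT example is what two's-complement narrowing produces: 1000000 = 3906 × 256 + 64, and 64 fits in a signed 8-bit integer. The Python sketch below models that wraparound for Hive's integer types. Treating Hive's unchecked CAST as equivalent to Java-style narrowing is an assumption that matches the example on this page, and the function name narrow is hypothetical.

```python
HIVE_INT_BITS = {"TINYINT": 8, "SMALLINT": 16, "INT": 32, "BIGINT": 64}


def narrow(value: int, hive_type: str) -> int:
    """Keep the low-order bits of value and reinterpret them as signed.

    Models an integer CAST with no range check (two's-complement wraparound).
    """
    bits = HIVE_INT_BITS[hive_type]
    value &= (1 << bits) - 1          # keep only the low `bits` bits
    if value >= 1 << (bits - 1):      # top bit set: the result is negative
        value -= 1 << bits
    return value
```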

Using CAST with BOOLEAN values:


- The boolean value TRUE converts to the numeric value 1.
- The boolean value FALSE converts to the numeric value 0.
- The numeric value 0 converts to the boolean value FALSE; any other number converts to TRUE.
- The empty string converts to the boolean value FALSE; any other string converts to TRUE.
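The four rules above translate directly into code. The Python sketch below is a model with hypothetical function names, not Hive's implementation.

```python
from typing import Union


def bool_to_number(b: bool) -> int:
    """TRUE -> 1, FALSE -> 0."""
    return 1 if b else 0


def to_boolean(value: Union[int, float, str]) -> bool:
    """Apply the BOOLEAN conversion rules listed above."""
    if isinstance(value, str):
        return value != ""   # empty string -> FALSE, any other string -> TRUE
    return value != 0        # 0 -> FALSE, any other number -> TRUE
```

One consequence of these rules: the string 'false' converts to TRUE, because it is not empty.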

The HiveQL STRING type stores text strings, and corresponds to the SQL_LONGVARCHAR data type. The CAST operation successfully converts strings to numbers if the strings contain only numeric characters; otherwise the conversion fails. You can tune the column length used for STRING columns. To change the default length reported for STRING columns, add the registry entry DefaultStringColumnLength under your DSN and specify a value. If this registry entry is not defined, the default length of 1024 characters is used.
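The string-to-number rule can be modeled as follows, returning None where Hive returns NULL. This hypothetical Python sketch handles integers only, does not check the target type's range, and accepts an optional leading sign, which is an assumption beyond the "only numeric characters" wording above.

```python
import re
from typing import Optional

# Optional sign (assumption), then one or more ASCII digits.
_INTEGER_TEXT = re.compile(r"[+-]?[0-9]+")


def cast_string_to_int(text: str) -> Optional[int]:
    """Convert numeric-only text to int; anything else yields None (NULL)."""
    if _INTEGER_TEXT.fullmatch(text):
        return int(text)
    return None
```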
Delimiters

The connector uses Thrift to connect to the Hive server. Hive returns the result set of a HiveQL query as newline-delimited rows whose fields are tab-delimited, and it currently does not escape tab characters within fields. Make sure to escape any tab or newline characters in the Hive data, including platform-specific newline sequences such as line feed (LF) on UNIX, Linux, and Mac OS X, carriage return/line feed (CR/LF) on Windows, and carriage return (CR) on older Macintosh platforms.
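One way to follow this advice when preparing data is to replace backslashes, tabs, carriage returns, and line feeds with two-character escape sequences before loading. The backslash scheme in the Python sketch below is an assumption, not something Hive or the connector defines; whatever scheme you choose, the application reading the data must reverse it.

```python
def escape_hive_field(field: str) -> str:
    """Replace characters that would split Hive's tab/newline-delimited rows."""
    return (field.replace("\\", "\\\\")  # escape the escape character first
                 .replace("\t", "\\t")
                 .replace("\r", "\\r")
                 .replace("\n", "\\n"))
```

A Windows CR/LF sequence becomes the four characters \r\n, so no raw tab or newline survives in the field.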

Notes on Applications
Microsoft Access
Version tested: 2010 (14.0), 32-bit and 64-bit.
Notes: Linked tables are not currently available.

Microsoft Excel/Query
Version tested: 2010 (14.0), 32-bit and 64-bit.
Notes: From the Data ribbon, use From Other Sources and select either From Data Connection Wizard or From Microsoft Query. The former requires a predefined DSN, while the latter supports creating a DSN on the fly. You can also use the ODBC driver through the OLE DB for ODBC Driver bridge.

Tableau Desktop
Version tested: 7.0, 32-bit only.
Notes: Works with v1 of the ODBC driver only. Prior to version 7.0.n, you need to install a TDC to make full use of the driver's capabilities. From version 7.0.n onward, you can select the driver through the MapR Hadoop Hive option on the Connect to Data tab.
