Text Analysis HANA From SAP

Text Search and Text Analysis
with SAP HANA

created by jagadeesh NUNE on Oct 13, 2014 12:20 PM, last modified by jagadeesh NUNE on Oct 13, 2014 12:20 PM
Version 1
Text Search and Text Analysis with SAP HANA:-
Text Analysis: Text analysis is the process of analyzing unstructured text, extracting relevant information, and then transforming
that information into structured information that can be queried and leveraged in different ways.
Hidden facts in text: 80% of enterprise information originates in unstructured data, making this a huge source of
information. Unstructured data provides insights into customers' perceptions of brands, products, marketing
campaigns, and the like. Text analysis also enables request extraction, a method used to extract wishes or
requests for improvement from customers.
SEO (Search Engine optimization) Analytics:-
Text Data processing
SAP HANA supports in-database Text Analysis (SPS05).

The main goal of this feature is to extract meaningful information from texts. In other words, companies can now process large
volumes of data and extract meaningful information without having to read every single sentence.
With SAP HANA's full text analysis, text analysis goes beyond simple keyword searches.
The table shows that SAP HANA's text analysis includes entity extraction, sentiment analysis,
and much more.
Various file formats, such as PDF, TXT, XML, and HTML, can be loaded and analyzed in SAP
HANA.
Terminology:
Normalization - transforming text into a single canonical form, e.g. résumé -> resume
Tokenization - decomposing a word sequence into tokens, e.g. "the quick brown fox" -> "the", "quick", "brown", "fox"
Stemming - reducing words to their base form, e.g. flew or flying -> fly
Part-of-speech tagging - e.g. quick -> adjective; houses -> noun, plural
Fuzzy searching - approximate string searching
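The terms above can be made concrete with a small sketch in plain Python. This only illustrates the ideas and is not how SAP HANA implements them: the function names are hypothetical, the tokenizer splits on whitespace only, and the fuzzy lookup relies on the standard library's difflib similarity ratio rather than HANA's fuzzy search.

```python
import difflib
import unicodedata
from typing import List


def normalize(text: str) -> str:
    """Lower-case the text and strip accents so that spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.lower()


def tokenize(text: str) -> List[str]:
    """Split a word sequence on whitespace (real tokenizers also handle punctuation)."""
    return text.split()


def fuzzy_lookup(term: str, vocabulary: List[str], cutoff: float = 0.8) -> List[str]:
    """Return up to three vocabulary entries that approximately match the term."""
    return difflib.get_close_matches(term, vocabulary, n=3, cutoff=cutoff)
```

Stemming and part-of-speech tagging are left out on purpose: mapping "flew" to "fly" needs language-specific dictionaries, which is the kind of work the database's linguistic analysis takes over.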
Text analysis with SAP HANA requires that the unstructured data is of a supported file type
and gets loaded into a HANA table.
Text being loaded into HANA tables is saved in individual rows. These rows are called
documents.
Each document must have an ID.
Configuration: A configuration tells SAP HANA which type of analysis the user wants to perform.
Configurations are saved in XML format and contain all the important text analysis options.
Users can access configurations through the HANA repository. There are five predefined configurations.
Loading the PDF documents to SAP HANA
The easiest and quickest way to load binary documents into a HANA table is by using a
Python script. The user can use the same script for multiple documents. The only parameters
that have to be adjusted are:
HANA server connection information
Path of the binary document
Schema/table name
Additional information on data provisioning of binary files into HANA is available at
academy.saphana.com
The SAP HANA Academy provides a how-to video on loading data via a Python script:
Video available at https://www.youtube.com/watch?v=CUZcDecMnxI
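The script from the video is not reproduced here. As a rough sketch of what such a loader could look like, the function below inserts one binary file as one row through any Python DB-API 2.0 connection that uses "?" placeholders. The function name, the ID and CONTENT column names, and the table layout are assumptions for illustration; for a real HANA system the connection would come from a HANA database driver, which is not shown.

```python
import sqlite3  # only needed for the local stand-in database in the usage note
from pathlib import Path


def load_document(conn, table: str, doc_id: int, path: str) -> int:
    """Insert one binary file as a single row (one 'document'); return its size in bytes."""
    data = Path(path).read_bytes()
    cursor = conn.cursor()
    # A table name cannot be bound as a parameter, so it is interpolated here.
    # Pass only trusted names; the ID and CONTENT columns are assumed to exist.
    cursor.execute(f"INSERT INTO {table} (ID, CONTENT) VALUES (?, ?)", (doc_id, data))
    conn.commit()
    cursor.close()
    return len(data)
```

The logic can be tried locally against an in-memory SQLite database created with sqlite3.connect(":memory:") and a table with an integer ID column and a BLOB CONTENT column. The three parameters listed above map directly onto the connection, the path, and the table name.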
Unstructured data must be of a supported file type and is loaded into a HANA table.
Unstructured data is saved in a source table. Each file is a separate record and receives an ID. This ID serves as
a foreign key in the results table.
The user chooses what kind of analysis to perform (e.g. semantic analysis, entity extraction, linguistic
analysis).
The user creates a FULLTEXT INDEX.
Results are saved in a separate table with the prefix $TA_
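To make the last two steps concrete, here is a small Python helper that assembles the CREATE FULLTEXT INDEX statement and derives the name of the results table. Treat it as a sketch under assumptions: the five configuration names are the predefined ones as I recall them from SAP's documentation and should be checked against your release, and the helper names and the index name DEMO_INDEX in the usage note are hypothetical.

```python
# Predefined text analysis configurations (assumed names; verify for your HANA release).
PREDEFINED_CONFIGURATIONS = (
    "LINGANALYSIS_BASIC",
    "LINGANALYSIS_STEMS",
    "LINGANALYSIS_FULL",
    "EXTRACTION_CORE",
    "EXTRACTION_CORE_VOICEOFCUSTOMER",
)


def fulltext_index_sql(index_name: str, schema: str, table: str,
                       column: str, configuration: str) -> str:
    """Build a CREATE FULLTEXT INDEX statement with text analysis switched on."""
    if configuration not in PREDEFINED_CONFIGURATIONS:
        raise ValueError(f"unknown configuration: {configuration}")
    return (
        f'CREATE FULLTEXT INDEX "{index_name}" ON "{schema}"."{table}" ("{column}") '
        f"CONFIGURATION '{configuration}' TEXT ANALYSIS ON"
    )


def result_table_name(index_name: str) -> str:
    """Results land in a table named after the index, prefixed with $TA_."""
    return f"$TA_{index_name}"
```

For example, fulltext_index_sql("DEMO_INDEX", "TXT", "DEMOTABLE", "STRING", "EXTRACTION_CORE_VOICEOFCUSTOMER") yields a statement for the exercise table used in this article, and result_table_name("DEMO_INDEX") gives the table to query afterwards.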
EXERCISE:
Creating a table for inserting text:
CREATE SCHEMA TXT;
CREATE COLUMN TABLE "TXT"."DEMOTABLE"
(ID INTEGER PRIMARY KEY,
STRING nvarchar(200));
INSERT INTO "TXT"."DEMOTABLE" VALUES (1, 'Tom enjoys working at Accenture');
Result Table in HANA
The TA_TYPE column specifies the type of entity extracted. For instance, PERSON usually refers to people.
Sentiments are divided into Positive-, WeakPositive-, StrongPositive-, Negative-, WeakNegative-, and
StrongNegative sentiments.
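When post-processing the results table, it can help to fold the six sentiment types into a polarity and a strength. The sketch below assumes the TA_TYPE values are spelled like "StrongPositiveSentiment" or "NegativeSentiment", which is inferred from the naming above rather than confirmed by it; the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Optional strength prefix, then the polarity, then the literal suffix "Sentiment".
_SENTIMENT_TYPE = re.compile(r"^(Strong|Weak)?(Positive|Negative)Sentiment$")


def sentiment_polarity(ta_type: str) -> Optional[Tuple[str, str]]:
    """Return (polarity, strength) for a sentiment TA_TYPE, or None for other entity types."""
    match = _SENTIMENT_TYPE.match(ta_type)
    if match is None:
        return None  # e.g. PERSON or another non-sentiment entity type
    strength = (match.group(1) or "normal").lower()
    polarity = match.group(2).lower()
    return polarity, strength
```

Rows whose TA_TYPE is not a sentiment, such as PERSON, return None and can be handled separately.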
***************************************************************************************
SAP HANA Text Analysis

created by zgn EFE on Feb 23, 2016 4:12 PM, last modified by zgn EFE on Feb 23, 2016 4:19 PM
Version 3
As many are aware, twenty-first-century corporations are facing a crisis. Many corporations have been
accurately and comprehensively storing data for years. The data comes in a variety of forms, such as social media posts,
email, blogs, news, feedback, tweets, and business documents.
It is very important to extract meaningful information without having to read every single sentence. Now, what
is meaningful information? The extraction process should identify the "who", "what", "where", "when" and
"how much" (among other things) from this data.
For example, use social media data to find out:
What are people saying about my brand or products?
How many people recommend my brand vs. advocate against it?
Text analysis is the solution to all of these problems.
In this article we will explain:
What is text analysis?
Why is text analysis so important for business?
How does SAP HANA support text analysis?
Before understanding text analysis, you first have to understand structured data and
unstructured data.
Structured and Unstructured Data:

Structured Data: Data that resides in a fixed field within a record or file is called
structured data. This includes data contained in relational databases and spreadsheets.
For example, data stored in database tables is structured data.
Structured data has the advantage of being easily entered, stored, queried and analyzed.
Unstructured Data: The phrase "unstructured data" usually refers to information that
doesn't reside in a traditional row-column database.
Unstructured data files often include text and multimedia content. Examples include email messages, word processing documents, videos, photos, audio files, presentations,
webpages and many other kinds of business documents.
Digging through unstructured data can be cumbersome and costly. Email is a good
example of unstructured data. It's indexed by date, time, sender, recipient, and subject,
but the body of an email remains unstructured. Other examples of unstructured data
include books, documents, medical records, and social media posts.
Why is unstructured data so important for business? Experts estimate that
80 to 90 percent of the data in any organization is unstructured. The amount of
unstructured data in enterprises is also growing significantly, often many times faster than
structured databases are growing.
The only problem is extracting meaningful information from unstructured data.
