

Integrating Apache NiFi with external APIs

Ayyakutty Ramesh — February 22, 2019 in Big data
Contents

 1 What is NiFi?
 2 Connecting NiFi to an external API
 3 Implementation
  3.1 1) Type casting
   3.1.1 Challenges Faced
  3.2 2) Handling apostrophe
  3.3 3) Handling a large dataset
  3.4 4) Storing records using a timestamp
  3.5 5) Handling Null values
  3.6 6) Pagination
What is NiFi?

Apache NiFi is a data logistics platform used for automating the data flow between disparate data sources and systems, making data ingestion fast and secure.



Connecting NiFi to an external API:


To connect NiFi with the external API, we used the InvokeHTTP processor. We can configure it by right-clicking the processor and providing the link from which the data needs to be fetched. Provide the URL and the authorization credentials in the Properties tab of the processor.
Implementation:

We need to get data from an API and store the necessary columns in a Postgres database. We get the data using the InvokeHTTP processor; the resultant data is in JSON format. To split the JSON array into individual records we use the SplitJson processor. In some cases the resultant data is in nested JSON format, so we use JoltTransformJSON to flatten it into single-level JSON. To evaluate one or more JSON path expressions we use the EvaluateJsonPath processor; the results are assigned to flow file attributes, which lets us filter the required data out of the JSON. Then we use the AttributesToJSON processor to convert the resultant attributes back into JSON format. Finally, we use the ReplaceText processor to build the insert query and the ExecuteSQL processor to execute it.
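The chain of processors above can be sketched in plain Python. This is an illustrative equivalent, not NiFi itself; the response shape, column names, and table name are hypothetical:

```python
import json

# Hypothetical API response: a JSON array with one nested object per
# record, as InvokeHTTP might return it before SplitJson runs.
response = json.dumps([
    {"dept": {"name": "CSE"}, "active": "1"},
    {"dept": {"name": "ECE"}, "active": "0"},
])

records = json.loads(response)             # SplitJson: one element per record

rows = []
for rec in records:
    flat = {                               # JoltTransformJSON: flatten nesting
        "dept_name": rec["dept"]["name"],  # EvaluateJsonPath: $.dept.name
        "active": rec["active"],           # EvaluateJsonPath: $.active
    }
    rows.append(flat)

# ReplaceText/ExecuteSQL step: build a parameterised insert per record
query = "INSERT INTO departments (dept_name, active) VALUES (%s, %s)"
params = [(r["dept_name"], r["active"]) for r in rows]
```

Note that every value in `params` is still a string at this point, which is exactly the type-casting problem discussed next.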


1) Type casting:

In our use case, we need to store the column ‘active’ as an integer in the Postgres database.

Challenges Faced:

When the AttributesToJSON processor writes out all the flow file attributes, the resultant JSON values are of string data type.

Example:

{
  “dept_name” : “CSE”,
  “active” : “1”
}

To store ‘active’ as an integer despite this, we used the UpdateAttribute processor, which supports NiFi Expression Language. We added a property named ‘active’ and converted the value to a number by setting the property value to ${active:toNumber()}.
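In plain Python terms, the cast the UpdateAttribute property performs amounts to the following (illustrative only; the attribute names come from the example above):

```python
# Flow file attributes arrive as strings; cast 'active' before insert,
# mirroring UpdateAttribute with ${active:toNumber()}.
attributes = {"dept_name": "CSE", "active": "1"}
attributes["active"] = int(attributes["active"])
```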

2)Handling apostrophe:

In our use case, we had to store values containing an apostrophe in the database. While trying to store them using the ExecuteSQL processor, we got the error message “Invalid string”. To store these values, we added a property in the UpdateAttribute processor to replace the apostrophe with an empty string.

Example:

If the value of dept_name is Royalty’s, add a new property named “dept_name” and pass the condition in the property value as ${dept_name:replaceAll("'", "")}.
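A minimal Python equivalent of that replacement (the value is taken from the example above):

```python
# Strip apostrophes so the generated SQL is not rejected,
# mirroring a replaceAll of ' with the empty string.
dept_name = "Royalty's"
dept_name = dept_name.replace("'", "")
```

A common alternative in SQL contexts is to double the apostrophe instead of dropping it, which preserves the original text; the post's approach simply removes it.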

3)Handling a large dataset:

The MergeContent processor can be used for executing batch queries. It reduces the execution time taken to insert bulk data. By default, it merges 1,000 records into a batch; the number of records per batch is configurable.
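The batching behaviour can be sketched in Python. This is a hypothetical chunking helper, not MergeContent's actual implementation:

```python
def batches(records, size=1000):
    # MergeContent-style batching: yield chunks of `size` records,
    # with a final partial chunk for the remainder.
    for i in range(0, len(records), size):
        yield records[i:i + size]

rows = list(range(2500))
sizes = [len(b) for b in batches(rows)]
```

Each chunk would then be handed to ExecuteSQL as one batch insert instead of 2,500 single-row inserts.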

4)Storing records using a timestamp:

To store only recent records, based on the updated date, in the Postgres database, we added a property in the UpdateAttribute processor.

Example:

If the value of updated_at is 2017-12-28T01:47:05Z, we want to convert this UTC timestamp into a plain date format. To do this, we added a property named updated_at and passed the condition in the property value as ${updated_at:toDate("yyyy-MM-dd'T'HH:mm:ss'Z'"):format('yyyyMMdd')}.

Then, in the RouteOnAttribute processor, pass the condition by which the data should be fetched. Here, we keep only the records that were updated after the given timestamp.

We can also filter the records on the timestamp while fetching them from the API. The property above retrieves all the records from the API and stores only the recent ones based on the timestamp, whereas by defining the timestamp globally we can retrieve only the records updated after the given time frame.

To pass a variable as global, right-click on the process group and, under Variables, add the variable that should be available to all the flows.
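The date conversion and the recency check can be sketched in Python (the cutoff value is hypothetical; the timestamp is the one from the example):

```python
from datetime import datetime

# Mirror ${updated_at:toDate("yyyy-MM-dd'T'HH:mm:ss'Z'"):format('yyyyMMdd')}
updated_at = "2017-12-28T01:47:05Z"
parsed = datetime.strptime(updated_at, "%Y-%m-%dT%H:%M:%SZ")
stamp = parsed.strftime("%Y%m%d")

# RouteOnAttribute-style condition: keep records updated after a cutoff.
# yyyyMMdd strings compare correctly as plain strings.
cutoff = "20171201"
is_recent = stamp > cutoff
```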
5)Handling Null values:

To handle null records and route them to failure, we added a property in the UpdateAttribute processor. This property checks whether all the columns are empty; if so, the record is not stored and is routed to failure.
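A Python sketch of that all-columns-empty check (the record values are invented; the route names mirror NiFi's success/failure relationships):

```python
def route(record):
    # Route to failure only when every column is null or empty,
    # mirroring the UpdateAttribute null check described above.
    if all(v in (None, "") for v in record.values()):
        return "failure"
    return "success"
```

For example, `route({"dept_name": "", "active": None})` goes to failure, while a record with any populated column goes to success.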

6)Pagination:

To iterate over multiple pages and retrieve all the records, we used the GenerateFlowFile and Set Initial Pagination Parameter processors. In the Set Initial Pagination Parameter processor, add a page property. This property supplies the value of the page parameter in the URL that we have given in the InvokeHTTP processor.

https://api.example.com/v2/clients?page=${page}

Then, in the EvaluateJsonPath processor, add the name of the property that holds the total number of pages.

In our use case, the next_page attribute contains the total number of pages.

Then we use the ExtractText processor to get the current page number.

In the RouteOnAttribute processor, add a property so that the flow iterates until the last page.

Each time, the page argument in the InvokeHTTP URL is replaced with the current page number, and this runs until the last page is reached. Below is the whole flow.
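The pagination loop can be sketched in Python with a stubbed fetch() standing in for InvokeHTTP calling https://api.example.com/v2/clients?page=${page}. The page data is invented for illustration, and next_page holds the total page count as in our use case:

```python
# Fake paginated API: three pages, next_page = total number of pages.
PAGES = {
    1: {"clients": ["a", "b"], "next_page": 3},
    2: {"clients": ["c"], "next_page": 3},
    3: {"clients": ["d"], "next_page": 3},
}

def fetch(page):
    # Stand-in for InvokeHTTP with ?page=${page}
    return PAGES[page]

clients, page = [], 1          # Set Initial Pagination Parameter: page = 1
while True:
    body = fetch(page)
    clients.extend(body["clients"])   # SplitJson / EvaluateJsonPath
    if page >= body["next_page"]:     # RouteOnAttribute: last page reached
        break
    page += 1                         # next iteration fetches the next page
```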

