D.K.Chandradeep 170030297
K L University
DEPARTMENT OF COMPUTER SCIENCE ENGINEERING
(DST-FIST Sponsored Department)
CERTIFICATE
This is to certify that this project-based lab report entitled “Ola Cab Service” is a bonafide work done
by D. K. Chandradeep (170030297), in partial fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY in Computer Science and Engineering during the academic year
2019-2020.
K L University
DEPARTMENT OF COMPUTER SCIENCE ENGINEERING
(DST-FIST Sponsored Department)
DECLARATION
We hereby declare that this project-based skilling report entitled “Ola Cab Service” has been
prepared by us in partial fulfillment of the requirements for the award of the degree “BACHELOR OF
TECHNOLOGY in COMPUTER SCIENCE ENGINEERING” during the academic year 2019-2020.
We also declare that this project-based lab report is our own effort and that it has not been
submitted to any other university for the award of any degree.
Date:
Place: Vaddeswaram
D.K. Chandradeep
170030297
ACKNOWLEDGMENTS
My sincere thanks to Mr. Dhawaleswar of the lab for his outstanding support throughout the project.
We express our gratitude to Dr. V. Hari Kiran, Head of the Department of Computer Science &
Engineering, for providing us with adequate facilities, ways and means by which we were able to
complete the project.
We would like to place on record our deep sense of gratitude to the honourable Vice Chancellor, K L
University, for providing the necessary facilities to carry out the project work.
Last but not least, we thank all the teaching and non-teaching staff of our department, and especially
my classmates and friends, for their support in the completion of our project work.
Finally, we gratefully acknowledge our indebtedness to all those who devoted themselves, directly or
indirectly, to the completion of this project.
Name Student ID
D.K.Chandradeep 170030297
TABLE OF CONTENTS
Abstract
Introduction
Architecture of Hive and HBase
Project Introduction
Requirement Specifications
ABSTRACT
Ola is a popular cab service provider in India, especially in metro cities such as
Hyderabad, Bangalore, Chennai and Delhi. Ola's management has categorized its
employees into drivers and help-desk teams. The basic function of the company is to
help customers book a cab over the internet or by phone. The help desk's primary task is to
assist customers in the booking process. The help-desk employee will ask for the customer
ID (in the case of existing customers) and the type of vehicle needed. At the time of booking,
a vehicle of the requested type is reserved for the customer. In cases where the customer is not
the passenger, the passenger details have to be provided separately and are verified
against the ID shown by the passenger on boarding. New customer registration requires
customer details such as name, address and mobile number; the help desk will
generate a unique ID for a new customer. All customers or passengers provide a valid
identity proof, such as an Aadhar card, for verification when they board the cab for any
route. The same identity card number should be used while booking the cab.
At the time of reservation, the customer provides information such as the desired route, pick-up
point, destination, date and baggage details. Once a customer reserves a cab, he/she cannot
make any other reservations until the end of the reserved duration, although a single
reservation may cover one or more vehicles. Ideally, the Ola cab service will have authorized
vehicles, agents with a licence to lease cab services, and authorized drivers. An authorized
vehicle is one that has all the required permissions from the RTO, such as a licence and a
no-objection certificate; a cab driver is an authorized driver if he/she holds a valid
licence issued by the RTO. The company stores vehicle information such as
registration number, vehicle type, insurance number, insurance expiry date, model number, tax
ID, mileage price (for example, mini cars are charged at Rs. 10/km), agent ID and issuing
RTO ID.
Driver information is stored under the employee section: employee ID, name, address,
gender, date of birth, licence ID, salary, employee type (driver, help-desk, attender, etc.) and
licence expiry date. An Ola agent collects vehicles from owners and provides them
on lease to the company, and stores owner information such
as the owner's name, address, PAN number and account number. As Ola has its own vehicles,
it is itself treated as one agent, with agent ID '001'. An
agent is identified by agent ID, agent name, agent address, vehicle ID, account number
and account type. Once the driver furnishes the ride details, the accounts team
generates the bill amount to be paid by the customer along with the payment due date.
The customer can pay by either cash or card.
INTRODUCTION
Importance of Hive and HBase:
Today, Hadoop has the privilege of being one of the most widespread technologies when
it comes to crunching huge amounts of Big Data. Hadoop is like an ocean with a vast
array of tools and technologies that are exclusively associated with it. One such
technology is Apache Hive. Apache Hive is a Hadoop component that is normally
deployed by data analysts. Even though Apache Pig can also be deployed for the same
purpose, Hive is used more by researchers and programmers. It is an open-source data
warehousing system, which is exclusively used to query and analyze huge datasets stored
in Hadoop.
One of the most important features of HBase is that it can handle data sets with
billions of rows and millions of columns. It can also combine data sources of a wide
variety of types, structures and schemas extremely well.
Apache HBase carries all the features of the original Google Bigtable paper, such as
Bloom filters, in-memory operations and compression. The tables of this database can
serve as the input for MapReduce jobs on the Hadoop ecosystem, and also as the
output after the data is processed by MapReduce. The data can be accessed via the Java
API, the REST API, or the Thrift and Avro gateways.
PROJECT INTRODUCTION
The project is to design a database in the Hive and HBase environments, based on the
criteria and the entities mentioned above.
In order to create the database, we need to concentrate on the environment or platform
we are working on, so that we create appropriate tables with the entities and
relationships defined in the problem statement.
Below are some of the queries in our project, run in both the Hive and HBase
environments.
REQUIREMENT SPECIFICATIONS
Hard Drive
A minimum of 1 GB of disk space is required for installing the software.
To calculate the disk space requirements for the job results folder, see Job Results Folder
Disk Space Calculation.
500 MB of free disk space is required for the log folder.
Hive
Big data analytics is the process of examining large amounts of data. Analyzing big data
is a challenging task, as it involves large distributed file systems which should be fault
tolerant, flexible and scalable. The size of the data sets being collected and analyzed in
industry for business intelligence is growing rapidly, making traditional warehousing
solutions prohibitively expensive. Hadoop is a popular open-source MapReduce
implementation used in companies like Yahoo and Facebook to store and
process extremely large data sets on commodity hardware. However, the MapReduce
programming model is very low level and requires developers to write custom programs
which are hard to maintain and reuse. Hive is an open-source data warehousing solution
built on top of Hadoop. Hive supports queries expressed in a SQL-like declarative
language, HiveQL, which are compiled into MapReduce jobs executed on
Hadoop. In addition, HiveQL enables users to plug custom MapReduce scripts into
queries. This section discusses the two technologies, MapReduce and Hive, that we rely
on for handling massive data.
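The compilation of HiveQL into MapReduce jobs described above can be inspected directly. As a small illustration (a sketch only; it assumes a working Hive installation and uses the employee table created later in this report), the EXPLAIN command prints the plan of stages Hive generates for a query:

```
hive> explain select emptype, count(*) from employee group by emptype;
```

The output lists the stages of the job, showing how the GROUP BY is split into a map-side scan and a reduce-side aggregation.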
Hbase
Facebook Messages was the first user-facing application at Facebook built on the
Apache Hadoop platform. Apache HBase, a database-like layer built on Hadoop, was
designed to support billions of messages per day. The paper describing that system
explains why Facebook chose Hadoop and HBase over systems such as Apache
Cassandra and Voldemort, and discusses the application's requirements for consistency,
availability, partition tolerance, data model and scalability.
WHAT IS HADOOP:
Hadoop is an open source distributed processing framework that manages data processing
and storage for big data applications in scalable clusters of computer servers. It's at the
center of an ecosystem of big data technologies that are primarily used to support
advanced analytics initiatives, including predictive analytics, data mining and machine
learning. Hadoop systems can handle various forms of structured and unstructured data,
giving users more flexibility for collecting, processing and analyzing data than
relational databases and data warehouses provide.
Hadoop's ability to process and store different types of data makes it a particularly good
fit for big data environments. Such environments typically involve not only large amounts of
data, but also a mix of structured transaction data and semistructured and unstructured
information, such as internet clickstream records, web server and mobile application logs,
social media posts, customer emails and sensor data from the internet of things.
The core components in the first iteration of Hadoop were MapReduce, HDFS and
Hadoop Common, a set of shared utilities and libraries. As its name indicates, MapReduce
uses map and reduce functions to split processing jobs into multiple tasks that run at the
cluster nodes where data is stored and then to combine what the tasks produce into a
coherent set of results. MapReduce initially functioned as both Hadoop's processing
engine and cluster resource manager, which tied HDFS directly to it and limited users to
running MapReduce batch applications.
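The map-and-reduce split described above is exactly what happens when Hive runs an aggregation: scanning the stored rows is done by map tasks on the nodes holding the data, and combining the per-node partial results is done by reduce tasks. A sketch, assuming the employee table defined later in this report:

```
hive> select gender, count(*) from employee group by gender;
```

Each HDFS block of the employee file is read by a map task, and the partial counts per gender are merged in the reduce phase into the final result.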
BIBLIOGRAPHY:
Hive:
https://www.gocit.vn/files/Oreilly.Programming.Hive-www.gocit.vn.pdf
https://www.researchgate.net/publication/307145382_Practical_Hive
HBase:
https://www.mpam.mp.br/attachments/article/6214/HBase%EF%BC%9AThe%20Definitive%20Guide.pdf
https://hbase.apache.org/apache_hbase_reference_guide.pdf
Hadoop:
https://manning-content.s3.amazonaws.com/download/8/83d06da-05f9-473b-a6a8-cc83f3f950d9/HadoopiA2E_MEAP_ch01.pdf
https://people.cs.kuleuven.be/~joost.vennekens/DN/bigdata.pdf
Hive is a data warehouse system for Hadoop that facilitates easy data summarization,
ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file
systems. Hive provides a mechanism to project structure onto this data and query the data
using a SQL-like language called HiveQL.
Hive Syntax
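The original screenshots for this section are not reproduced here; the following is a representative sketch of the basic Hive statements used throughout this project (the table name demo and the HDFS path are hypothetical, chosen only for illustration):

```
hive> create table demo(id int, name string)
    > row format delimited
    > fields terminated by '\t'
    > stored as textfile;
hive> load data inpath '/user/hadoop/demo.txt' into table demo;
hive> select * from demo where id > 0;
```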
HBase Syntax:
Update/Insert:
put '<table name>', '<row>', '<column family:column name>', '<value>'
Delete:
delete '<table name>', '<row>', '<column name>', '<time stamp>'
Scan:
scan '<table name>'
These are some of the syntax forms related to our project; we need a good command of
all the Hive and HBase commands in order to work effectively.
IMPLEMENTATION:
HIVE (CODE)
Tables
Creation
Employee Table:
hive> create table employee(id int,name string,gender string,dob string,address string,salary int,licence
string,emptype string,licence_id bigint)
    > comment 'employee details'
    > row format delimited
    > fields terminated by '\t'
    > lines terminated by '\n'
    > stored as textfile;
Customer Table:
hive> create table customer(id int,name string,gender string,mobile bigint,prooftype string,address
string,pickup string,destination string,rv_id int)
    > comment 'customer details'
    > row format delimited
    > fields terminated by '\t'
    > lines terminated by '\n'
    > stored as textfile;
Agent Table:
hive> create table agent(id int,name string,gender string,mobile bigint,address string,v_id int,acc_no
bigint,acc_type string)
    > comment 'agent details'
    > row format delimited
    > fields terminated by '\t'
    > lines terminated by '\n'
    > stored as textfile;
Reservation Table
hive> create table reservation(id int,v_id int,c_id int,e_id int)
    > comment 'reservation details'
    > row format delimited
    > fields terminated by '\t'
    > lines terminated by '\n'
    > stored as textfile;
Vehicle Table
hive> create table vehicle(id int,type string,name string,model_no int,price int,insurance int,insuexpdate
string,aeid int)
    > comment 'vehicle details'
    > row format delimited
    > fields terminated by '\t'
    > lines terminated by '\n'
    > stored as textfile;
RTO Table:
hive> create table rto(r_id int,address string,dept_id int)
    > comment 'rto details'
    > row format delimited
    > fields terminated by '\t'
    > lines terminated by '\n'
    > stored as textfile;
Employee Data
hadoop@hadoop-laptop:~/Desktop/project$ hdfs dfs -put employee.txt /user/hadoop/project/employee.txt
hive> load data inpath '/user/hadoop/project/employee.txt' into table employee;
Customer Data
hadoop@hadoop-laptop:~/Desktop/project$ hdfs dfs -put customer.txt /user/hadoop/project/customer.txt
hive> load data inpath '/user/hadoop/project/customer.txt' into table customer;
Agent Data
hadoop@hadoop-laptop:~/Desktop/project$ hdfs dfs -put agent.txt /user/hadoop/project/agent.txt
hive> load data inpath '/user/hadoop/project/agent.txt' into table agent;
Reservation Data
hadoop@hadoop-laptop:~/Desktop/project$ hdfs dfs -put reservation.txt /user/hadoop/project/reservation.txt
hive> load data inpath '/user/hadoop/project/reservation.txt' into table reservation;
RTO Data
hadoop@hadoop-laptop:~/Desktop/project$ hdfs dfs -put rto.txt /user/hadoop/project/rto.txt
hive> load data inpath '/user/hadoop/project/rto.txt' into table rto;
Vehicle Data
hadoop@hadoop-laptop:~/Desktop/project$ hdfs dfs -put vehicle.txt /user/hadoop/project/vehicle.txt
hive> load data inpath '/user/hadoop/project/vehicle.txt' into table vehicle;
HIVE OUTPUTS:
Query-1
Display the list of drivers who have valid licences?
hive>select * from employee where emptype='driver' and licence !='0';
Query-2
Display the list of valid CABs?
hive>select * from vehicle where insurance != 0;
Query-3
Display the list of drivers/employees whose gender is 'M'?
hive>select * from employee where gender='M';
Query-4
Display the Cab details of INNOVA vehicle?
hive>select * from vehicle where name ='Innova';
Query-5
Display the list of drivers whose address starts with 's' and gender is 'M'?
hive>select * from employee where address like 's%' and gender ='M' and emptype='driver';
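Beyond these single-table queries, HiveQL also supports joins across the tables created earlier. A sketch (column names follow the create statements in this report) that lists each reservation together with the corresponding customer's name:

```
hive> select r.id, c.name, r.v_id
    > from reservation r join customer c on (r.c_id = c.id);
```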
IMPLEMENTATION:
HBase(CODE)
Tables
Creation
Employee Table:
hbase(main):022:0> create 'employee','details'
Customer Table:
hbase(main):025:0> create 'customer','status'
Agent Table:
hbase(main):024:0> create 'agent','details'
Reservation Table:
hbase(main):026:0> create 'reservation','bill'
Vehicle Table:
hbase(main):031:0> create 'vehicle','details'
RTO Table:
hbase(main):027:0> create 'rto','details'
Customer Data:
hbase(main):027:0> put 'customer','1','status:id','3001'
hbase(main):027:0> put 'customer','1','status:name','Teja'
hbase(main):027:0> put 'customer','1','status:gender','M'
hbase(main):027:0> put 'customer','1','status:mobile','9553878342'
hbase(main):027:0> put 'customer','1','status:prooftype','Aadhar'
hbase(main):027:0> put 'customer','1','status:address','vijayawada'
hbase(main):027:0> put 'customer','1','status:pickup','bhavanipuram'
hbase(main):027:0> put 'customer','1','status:destination','klu'
hbase(main):027:0> put 'customer','1','status:rv_id','22002'
Agent Data:
hbase(main):027:0> put 'agent','1','details:id','1001'
hbase(main):027:0> put 'agent','1','details:name','vikram'
hbase(main):027:0> put 'agent','1','details:gender','M'
hbase(main):027:0> put 'agent','1','details:mobile','9367594261'
hbase(main):027:0> put 'agent','1','details:address','guntur'
hbase(main):027:0> put 'agent','1','details:v_id','22002'
hbase(main):027:0> put 'agent','1','details:acc_no','120005551'
hbase(main):027:0> put 'agent','1','details:acc_type','savings'
Reservation Data:
hbase(main):027:0> put 'reservation','1','bill:id','15001'
hbase(main):027:0> put 'reservation','1','bill:v_id','22001'
hbase(main):027:0> put 'reservation','1','bill:c_id','3001'
hbase(main):027:0> put 'reservation','1','bill:e_id','5005'
RTO Data:
hbase(main):027:0> put 'rto','1','details:r_id','18001'
hbase(main):027:0> put 'rto','1','details:address','vijayawada'
hbase(main):027:0> put 'rto','1','details:dept_id','1'
Vehicle Data:
hbase(main):027:0> put 'vehicle','1','details:id','22001'
hbase(main):027:0> put 'vehicle','1','details:type','6'
hbase(main):027:0> put 'vehicle','1','details:name','Innova'
hbase(main):027:0> put 'vehicle','1','details:model_no','1221'
hbase(main):027:0> put 'vehicle','1','details:price','6000'
hbase(main):027:0> put 'vehicle','1','details:insurance','456987'
hbase(main):027:0> put 'vehicle','1','details:insuexpdate','10/10/22'
hbase(main):027:0> put 'vehicle','1','details:aeid','5005'
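Once rows have been inserted with put, a single row can be read back with get instead of scanning the whole table. A sketch against the vehicle table populated above:

```
hbase(main):028:0> get 'vehicle', '1'
hbase(main):029:0> get 'vehicle', '1', {COLUMN => 'details:name'}
```

The first form returns every cell of row '1'; the second restricts the result to one column.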
HBase OUTPUTS:
Query-1:
hbase(main):027:0> scan 'employee'
Query-2:
hbase(main):027:0> scan 'customer'
Query-3:
hbase(main):027:0> scan 'reservation'
Query-4:
hbase(main):027:0> scan 'vehicle'
Query-5:
hbase(main):027:0> scan 'agent'
Query-6:
hbase(main):027:0> scan 'rto'
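A plain scan returns every row and column of a table; the HBase shell also accepts options to restrict a scan. A sketch against the tables created above (the option values are illustrative):

```
hbase(main):028:0> scan 'customer', {COLUMNS => ['status:name', 'status:mobile']}
hbase(main):029:0> scan 'employee', {LIMIT => 2}
```

The first scan returns only the name and mobile columns; the second stops after two rows.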
CONCLUSION:
We have thus worked through the given information regarding the creation of the
database, with the entities and the relationships among them. We created six tables in
both Hive and HBase and executed all the given queries against the tables created in
each environment.
We can therefore conclude that the project was completed successfully.
REFERENCES
https://www.tutorialspoint.com/hbase/hbase_scan.htm
https://www.edureka.co/blog/hive-commands-with-examples
https://www.guru99.com/hive-queries-implementation.html
https://www.dezyre.com/hadoop-tutorial/hive-commands
http://www.corejavaguru.com/books/best-hbase-books
https://www.javatpoint.com/hive-commands