
Apache Avro

Zafar Gilani
Muhammad Adnan Khan
Hui Shang

Outline

Overview
Comparison
Specification
SASL profile and usage
References

Overview

A data serialization system.
An RPC framework.
For: storage and communication.
Purpose:
Provide rich data structures.
A compact and fast binary data format.
Simple integration with dynamic languages.

Overview
Avro uses JSON as its Interface Description Language (IDL):
To specify data types.
To specify protocols.

Review: JavaScript Object Notation (JSON) is a light-weight, text-based standard for data interchange.

Why the need for Avro?

Primary usage is in Hadoop, where it provides a standard:
1. Serialization format for persistent data.
2. Wire format for communication among Hadoop nodes and from client programs to Hadoop services.

Overview
Avro relies on schemas.
Schema stored with data.
Each datum written with no per-value overheads.
Thus serialization is fast and small.
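To make "no per-value overheads" concrete, here is a minimal sketch of how a record can be laid out in Avro's binary encoding: integers are zig-zag encoded and written as variable-length bytes, strings are a length followed by UTF-8 bytes, and a record is just its fields concatenated in schema order, with no field names or type tags. The helper names (`encode_long`, `encode_string`, `encode_record`) are illustrative assumptions, not part of any Avro library, and only three types are covered.

```python
def encode_long(n: int) -> bytes:
    """Zig-zag encode a signed 64-bit integer, then emit it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes map to small unsigned values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """A string is its byte length (as a long) followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Hypothetical dispatch table covering only the types used in this sketch.
ENCODERS = {"int": encode_long, "long": encode_long, "string": encode_string}


def encode_record(fields: list, datum: dict) -> bytes:
    """Concatenate field encodings in schema order; no names or tags are written."""
    return b"".join(ENCODERS[f["type"]](datum[f["name"]]) for f in fields)


greeting_fields = [{"name": "message", "type": "string"}]
encoded = encode_record(greeting_fields, {"message": "hi"})
print(encoded.hex())  # 046869: one length byte plus two payload bytes
```

Because the reader already has the schema, the three bytes above are enough to recover the record; the equivalent JSON text `{"message": "hi"}` is several times larger.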

Avro in RPC:
Schema exchange during client-server handshake.
Correspondence in fields can be easily resolved.
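The field correspondence mentioned above can be sketched as follows. This is a simplified illustration of record resolution, assuming the datum has already been decoded with the writer's schema into a dict: reader fields are matched by name, writer-only fields are ignored, and a reader field missing from the writer falls back to its default. The function name `resolve_record` and the sample fields `lang` and `debug` are hypothetical.

```python
def resolve_record(writer_datum: dict, reader_fields: list) -> dict:
    """Project a datum written with one schema onto the reader's record fields."""
    resolved = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_datum:
            # Field exists on both sides: take the written value.
            resolved[name] = writer_datum[name]
        elif "default" in field:
            # Reader expects a field the writer did not send: use the default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for reader field {name!r}")
    # Fields present only in writer_datum are dropped silently.
    return resolved


# The writer sent an extra 'debug' field; the reader added a 'lang' field with a default.
written = {"message": "hi", "debug": True}
reader_fields = [
    {"name": "message", "type": "string"},
    {"name": "lang", "type": "string", "default": "en"},
]
print(resolve_record(written, reader_fields))
```

A real implementation also resolves types (for example promoting int to long) and works directly on the binary stream; this sketch only shows the name-matching step.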

APIs
Supporting API for:
Java
C
C++
C#
Python
Ruby

Comparison with other systems

Avro vs. Protobuf and Thrift.
A quick note about Thrift:
Initially developed at Facebook by a Google intern.
Closer in design to Google's protobuf.

Comparison with other systems

                 Avro                 Google protobuf                 Thrift
Implementation   Hmm..                Cleaner                         Hmm..
Error handling   Complex              Simple                          OK
Extensibility    Hmm..                Richer                          OK
Compatibility    Java, C, C++, C#,    That and much more, such as     About the same
                 Python and Ruby      Adobe ActionScript, Microsoft   as protobuf
                                      Silverlight, etc.

Specification
Schema represented in one of:
JSON string, naming a defined type.
JSON object of the form:
{"type": "typeName" ...attributes...}
JSON array, representing a union of embedded types.

Primitive types: null, boolean, int, long, float, double, bytes, string
{"type": "string"}

Complex types: records, enums, arrays, maps, unions, fixed
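The three JSON forms above can be told apart mechanically. The sketch below, using only Python's standard `json` module, classifies a parsed schema as primitive, complex, union, or a reference to a named type. The function `schema_kind` and its return labels are assumptions made for illustration; it does not validate the schema.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
COMPLEX = {"record", "enum", "array", "map", "fixed"}


def schema_kind(schema) -> str:
    """Classify a parsed Avro schema by its JSON shape (no validation)."""
    if isinstance(schema, list):
        return "union"  # a JSON array is a union of its member schemas
    if isinstance(schema, dict):
        return schema_kind(schema["type"])  # a JSON object names its type
    if schema in PRIMITIVES:
        return "primitive"
    if schema in COMPLEX:
        return "complex"
    return "named"  # a JSON string naming a previously defined type


samples = [
    '"string"',
    '{"type": "string"}',
    '["null", "string"]',
    '{"type": "record", "name": "Greeting", "fields": []}',
    '"Greeting"',
]
for text in samples:
    print(text, "->", schema_kind(json.loads(text)))
```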

Specification, example protocol

{
  "namespace": "com.acme",
  "protocol": "HelloWorld",
  "doc": "Protocol Greetings",

  "types": [
    {"name": "Greeting", "type": "record", "fields": [
      {"name": "message", "type": "string"}]},
    {"name": "Curse", "type": "error", "fields": [
      {"name": "message", "type": "string"}]}
  ],

  "messages": {
    "hello": {
      "doc": "Say hello.",
      "request": [{"name": "greeting", "type": "Greeting" }],
      "response": "Greeting",
      "errors": ["Curse"]
    }
  }
}
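Since the protocol is plain JSON, any JSON parser can read it. As a rough sketch, the snippet below loads the HelloWorld protocol with Python's standard `json` module and prints each message as a signature. The helper `describe_messages` and its output format are illustrative assumptions, not an Avro API.

```python
import json

PROTOCOL_JSON = """
{
  "namespace": "com.acme",
  "protocol": "HelloWorld",
  "doc": "Protocol Greetings",
  "types": [
    {"name": "Greeting", "type": "record", "fields": [
      {"name": "message", "type": "string"}]},
    {"name": "Curse", "type": "error", "fields": [
      {"name": "message", "type": "string"}]}
  ],
  "messages": {
    "hello": {
      "doc": "Say hello.",
      "request": [{"name": "greeting", "type": "Greeting"}],
      "response": "Greeting",
      "errors": ["Curse"]
    }
  }
}
"""


def describe_messages(protocol: dict) -> list:
    """Render each protocol message as 'name(params) -> response [errors: ...]'."""
    lines = []
    for name, msg in protocol["messages"].items():
        params = ", ".join(f'{p["name"]}: {p["type"]}' for p in msg["request"])
        errors = ", ".join(msg.get("errors", [])) or "none"
        lines.append(f'{name}({params}) -> {msg["response"]} [errors: {errors}]')
    return lines


protocol = json.loads(PROTOCOL_JSON)
full_name = f'{protocol["namespace"]}.{protocol["protocol"]}'
print(full_name)
for line in describe_messages(protocol):
    print(line)
```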

SASL profile
Simple Authentication and Security Layer.
Provides a framework for
Authentication.
Security of network protocols.

SASL usage
Negotiation procedure for connection-oriented Avro RPC, using four commands:
0: START Used in a client's initial message.
1: CONTINUE Used while negotiation is ongoing.
2: FAIL Terminates negotiation unsuccessfully.
3: COMPLETE Terminates negotiation successfully.
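The command sequence above can be modeled as a small state check. This is a simplified sketch that assumes a negotiation is an ordered list of commands from both peers: it opens with START, continues with CONTINUE, and ends on the first FAIL or COMPLETE. The names `SaslCommand` and `negotiation_outcome` are hypothetical, and message payloads and framing are left out entirely.

```python
from enum import IntEnum


class SaslCommand(IntEnum):
    """Command codes as listed on the slide."""
    START = 0
    CONTINUE = 1
    FAIL = 2
    COMPLETE = 3


def negotiation_outcome(commands: list) -> bool:
    """Return True if the negotiation completed, False if it failed.

    Raises ValueError for a sequence that does not follow the procedure.
    """
    if not commands or commands[0] != SaslCommand.START:
        raise ValueError("negotiation must begin with START")
    for index, command in enumerate(commands[1:], start=1):
        if command in (SaslCommand.FAIL, SaslCommand.COMPLETE):
            if index != len(commands) - 1:
                raise ValueError("messages sent after negotiation terminated")
            return command == SaslCommand.COMPLETE
        if command != SaslCommand.CONTINUE:
            raise ValueError("START may only appear as the first command")
    raise ValueError("negotiation never terminated")


ok = negotiation_outcome(
    [SaslCommand.START, SaslCommand.CONTINUE, SaslCommand.CONTINUE, SaslCommand.COMPLETE]
)
rejected = negotiation_outcome([SaslCommand.START, SaslCommand.FAIL])
print(ok, rejected)  # True False
```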

References
1. Apache Avro, http://avro.apache.org/docs/current/
2. Google protocol buffers vs Apache Avro, http://www.sammur.com/?p=36
3. Avro vs Thrift, http://tech.puredanger.com/2011/05/27/serialization-comparison/
4. SASL, http://avro.apache.org/docs/current/sasl.html
