
5/13/2016 CCPDataEngineer

CCP Data Engineer


Join an elite group of data engineering professionals who have proven mastery
developing solutions for big data.


Benefits
Individuals
Performance-Based
Employers want to hire candidates with proven skills. The CCP program lets you
demonstrate your skills in a rigorous hands-on environment.
Skills not Products
Cloudera's ecosystem is defined by choice, and so are our exams. CCP exams
test your skills and give you the freedom to use any tool on the cluster. You are
given a customer problem, a large data set, a cluster, and a time limit. You
choose the tools, languages, and approach (see below for the cluster configuration).
Promote and Verify
As a CCP, you've proven you possess skills where it matters most. To help you
promote your achievement, Cloudera provides the following for all current CCP
credential holders:
A unique profile link on certification.cloudera.com, also integrated with
LinkedIn, to promote your skills and achievements to your employer or
potential employers (example of a current CCP profile:
http://certification.cloudera.com/user/ccp002-luis-carlos-quintela)
CCP logo for business cards, résumés, and online profiles
http://www.cloudera.com/training/certification/ccpdataengineer.html 1/8

Current
The big data space is rapidly evolving. CCP exams are constantly updated to
reflect the skills and tools relevant for today and beyond. And because change
is the only constant in open-source environments, Cloudera requires all CCP
credential holders to stay current through mandatory re-testing every three
years in order to maintain current CCP status and privileges.

Companies
Performance-Based
Cloudera's hands-on exams require candidates to prove their skills on a live
cluster, with real data, at scale. This means the CCP professionals you hire or
manage have skills where it matters.
Verified
The CCP program provides a way to find, validate, and build a team of qualified
technical professionals.
Current
The big data space is rapidly evolving. CCP exams are constantly updated to
reflect the skills and tools relevant for today and beyond. And because change
is the only constant in open-source environments, Cloudera requires all CCP
credential holders to stay current through mandatory re-testing every three years.

CCP Data Engineer Exam (DE575) Details


Exam Question Format
You are given five to eight customer problems, each with a unique, large data set,
along with a 7-node high-performance CDH5 cluster and four hours. For each problem,
you must implement a technical solution with a high degree of precision that meets
all the requirements. You may use any tool or combination of tools on the cluster
(see list below); you get to pick the tool(s) that are right for the job. You must
possess enough industry knowledge to analyze the problem and arrive at an optimal
approach given the time allowed. You need to know what you should do and then do
it on a live cluster under rigorous conditions, including a time limit and
observation by a proctor.

Audience and Prerequisites




Candidates for CCP Data Engineer should have in-depth experience developing data
engineering solutions and a high level of mastery of the skills below. There are no
other prerequisites.
Register for DE575 (https://university.cloudera.com/content/de575)

Required Skills
Data Ingest
The skills to transfer data between external systems and your cluster. This includes
the following:
Import and export data between an external RDBMS and your cluster, including
the ability to import specific subsets, change the delimiter and file format of
imported data during ingest, and alter the data access pattern or privileges.
Ingest real-time and near-real-time (NRT) streaming data into HDFS, including
the ability to distribute to multiple data sources and convert data on ingest from
one format to another.
Load data into and out of HDFS using the Hadoop File System (FS) commands.

Transform, Stage, Store


Convert a set of data values in a given format stored in HDFS into new data values
and/or a new data format and write them into HDFS or Hive/HCatalog. This includes
the following skills:
Convert data from one file format to another
Write your data with compression
Convert data from one set of values to another (e.g., Lat/Long to Postal Address
using an external library)
Change the data format of values in a data set
Purge bad records from a data set, e.g., null values

Deduplicate and merge data
Denormalize data from multiple disparate data sets
Evolve an Avro or Parquet schema
Partition an existing data set according to one or more partition keys
Tune data for optimal query performance
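
Two of the skills listed above, purging bad records and deduplicating, can be illustrated with a minimal sketch in plain Python. It is not an exam solution and does not use any of the cluster tools; the function name `purge_and_deduplicate`, the tuple-per-record layout, and the choice of the first field as the record key are all assumptions made for illustration.

```python
def purge_and_deduplicate(records, key_index=0):
    """Drop rows with null or empty fields, then keep the first row per key.

    `records` is an iterable of tuples; `key_index` (hypothetical) selects
    the field that identifies a record.
    """
    seen = set()
    cleaned = []
    for row in records:
        # Purge: skip any row that has a missing (None or empty) value.
        if any(value is None or value == "" for value in row):
            continue
        key = row[key_index]
        # Deduplicate: skip rows whose key has already been kept.
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned
```

On a real cluster the same logic would more likely be expressed in one of the available tools (for example Hive, Pig, or Spark), since the data sets are too large for a single process.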

Data Analysis
Filter, sort, join, aggregate, and/or transform one or more data sets in a given format
stored in HDFS to produce a specified result. All of these tasks may include reading
from Parquet, Avro, JSON, delimited text, and natural language text. The queries will
involve complex data types (e.g., array, map, struct), external libraries,
partitioned data, and compressed data, and will require the use of metadata from
Hive/HCatalog.
Write a query to aggregate multiple rows of data
Write a query to calculate aggregate statistics (e.g., average or sum)
Write a query to filter data
Write a query that produces ranked or sorted data
Write a query that joins multiple data sets
Read and/or create a Hive or an HCatalog table from existing data in HDFS
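
As a rough illustration of the aggregate and ranking skills above, the following plain-Python sketch computes a per-key average and then ranks the keys by that average. The function names and the `(key, value)` row layout are assumptions; on the exam cluster the equivalent would normally be a query in a tool such as Hive or Impala.

```python
from collections import defaultdict


def average_by_key(rows):
    """Aggregate (key, value) rows into a per-key average."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, row count]
    for key, value in rows:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def rank_descending(averages):
    """Return (key, average) pairs sorted from highest to lowest average."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```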

Workflow
The ability to create and execute various jobs and actions that move data towards
greater value and use in a system. This includes the following skills:
Create and execute a linear workflow with actions that include Hadoop jobs,
Hive jobs, Pig jobs, custom actions, etc.
Create and execute a branching workflow with actions that include Hadoop
jobs, Hive jobs, Pig jobs, custom actions, etc.
Orchestrate a workflow to execute regularly at predefined times, including
workflows that have data dependencies


Exam delivery and cluster information


CCP: Data Engineer Exam (DE575) is a remote-proctored exam available anywhere,
anytime. See the FAQ (/training/certification/faq.html) for more information and
system requirements.
CCP: Data Engineer Exam (DE575) is a hands-on, practical exam using Cloudera
technologies. Each user is given their own 7-node, high-performance CDH5 (currently
5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume,
Kite, Hue, Oozie, DataFu, and many others (see a full list
(/documentation/cdh/5-1-x.html)). In addition, the cluster comes with Python (2.6
and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm,
Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans.

Documentation Available online during the exam


Cloudera Product Documentation (http://www.cloudera.com/documentation/enterprise/5-3-x.html)
Hadoop - Apache Hadoop 2.5.0-cdh5.3.2 (http://archive-primary.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2/)
Apache Hive (http://hive.apache.org/)
Sqoop Documentation (v1.4.5-cdh5.3.2) (http://archive-primary.cloudera.com/cdh5/cdh/5/sqoop-1.4.5-cdh5.3.2/)
Spark (http://spark.apache.org/docs/1.2.1/)
Apache Crunch (http://crunch.apache.org/)
Apache Pig (http://archive-primary.cloudera.com/cdh5/cdh/5/pig-0.12.0-cdh5.3.2/)
Kite: A Data API for Hadoop (http://kitesdk.org/docs/current/)
Apache Avro 1.7.7 (http://avro.apache.org/docs/1.7.7/)
Apache Parquet (https://parquet.incubator.apache.org/documentation/latest/)
Cloudera HUE (http://archive-primary.cloudera.com/cdh5/cdh/5/hue-3.7.0-cdh5.3.2/)
Apache Oozie (http://archive-primary.cloudera.com/cdh5/cdh/5/oozie-4.0.0-cdh5.3.2/)
Apache Flume 1.5.0 (http://archive-primary.cloudera.com/cdh5/cdh/5/flume-ng-1.5.0-cdh5.3.2/)
DataFu 1.1.0 (http://archive-primary.cloudera.com/cdh5/cdh/5/datafu-1.1.0-cdh5.3.2/javadoc/)
JDK 7 API Docs (http://docs.oracle.com/javase/7/docs/api/)
Only the documentation, links, and resources listed above are accessible during the
exam. All other websites, including Google and other search functionality, are
disabled. You may not use notes or other exam aids.

Sample Exam Question


LoudAcre Mobile is a mobile phone service provider that is moving a portion of their
customer analytics workload to Hadoop. Before they can use their customer data,
they want you to clean it and make it consistent.
Errors were found while looking at the customer records. Unfortunately, different
input methods wrote date fields in different formats. Your task is to standardize
these date fields into a consistent format.
Data Description
The Hive metastore contains a database named problem1 that contains a table
named customer. The customer table contains 90 million customer records
(90,000,000), each with a birthday field.

Sample Data (birthday is the last field in each row)

1904287 Christopher Rodriguez Jan 11, 2003

96391595 Thomas Stewart 6/17/1969

2236067 John Nelson 08/22/54

Output Requirements

Create a new table named solution in the problem1 database of the Hive
metastore
Your solution table must have its data stored in the HDFS
directory /user/cert/problem1/solution
Your solution table must have exactly the same columns as the customer table
in the same order, as well as keeping the existing file format
For every row in the solution table, replace the contents of the birthday field
with a date string in MM/DD/YY format.
MM is the zero-padded month (01-12),
DD is the zero-padded day (01-31),
YY is the zero-padded 2-digit year (00-99)
End of Sample Problem
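
The date standardization at the heart of the sample problem above can be sketched in Python as follows. This is a minimal illustration, not a reference solution: it covers only the three input formats visible in the sample data (the full data set may contain others), the function name `normalize_birthday` is hypothetical, and `%b` assumes English month abbreviations in the active locale. How the function would be applied to 90 million rows (for example through Spark or a Hive transform script) is left open.

```python
from datetime import datetime

# Formats observed in the three sample rows only; this list is an assumption
# and may need extending for the real data set.
INPUT_FORMATS = ("%b %d, %Y", "%m/%d/%Y", "%m/%d/%y")


def normalize_birthday(raw):
    """Return `raw` as an MM/DD/YY string, or None if no known format matches."""
    text = raw.strip()
    for fmt in INPUT_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next candidate format
        # %m, %d, and %y are all zero-padded, as the output requirements ask.
        return parsed.strftime("%m/%d/%y")
    return None
```

Returning None for unrecognized values keeps such rows visible so they can be inspected rather than silently rewritten.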
Certification FAQ (/training/certification/faq.html)
Verify a Certification (http://certification.cloudera.com/verify/)


© 2016 Cloudera, Inc. All rights reserved. Apache Hadoop (http://hadoop.apache.org) and associated
open source project names are trademarks of the Apache Software Foundation (http://apache.org).
For a complete list of trademarks, click here. (/legal/terms-and-conditions.html)
