CCP Data Scientist

CCP Data Scientist Exams

Required Exams

DS700 - Descriptive and Inferential Statistics on Big Data
DS701 - Advanced Analytical Techniques on Big Data
DS702 - Machine Learning at Scale

The exams may be taken in any order. All three exams must be passed within 365 days of each other. Candidates who fail an exam must wait thirty calendar days, beginning the day after the failed attempt, before they may retake the same exam. Candidates must pay for each exam attempt. Each passed exam is verifiable in your exam transcript and history (http://certification.cloudera.com/verify/).

Exam Format

Each exam is a single challenge scenario. You are provided access to the scenario, the data sets, and the cluster. You are given eight (8) hours to complete the challenge. See below for more information on the cluster.

Register for DS700 (https://university.cloudera.com/content/DS700)
Register for DS701 (https://university.cloudera.com/content/DS701)
Register for DS702 (https://university.cloudera.com/content/DS702)

Required Skills

Common Skills (all exams)

Extract relevant features from a large dataset that may contain bad records, partial records, errors, or other forms of "noise"
Extract features from data stored in a wide range of possible formats, including JSON, XML, raw text logs, industry-specific encodings, and graph link data

DS700 - Descriptive and Inferential Statistics on Big Data

Use statistical tests to determine confidence for a hypothesis
Calculate common summary statistics, such as mean, variance, and counts
Fit a distribution to a dataset and use that distribution to predict event likelihoods
Perform complex statistical calculations on a large dataset

DS701 - Advanced Analytical Techniques on Big Data

Build a model that contains relevant features from a large dataset
Define relevant data groupings, including number, size, and characteristics
Assign data records from a large dataset into a defined set of data groupings
Evaluate goodness of fit for a given set of data groupings and a dataset
Apply advanced analytical techniques, such as network graph analysis or outlier detection

DS702 - Machine Learning at Scale

Build a model that contains relevant features from a large dataset
Predict labels for an unlabeled dataset using a labeled dataset for reference
Select a classification algorithm that is appropriate for the given dataset
Tune algorithm metaparameters to maximize algorithm performance
Use validation techniques to determine how successful a given algorithm is for the given dataset

How to Prepare

Q. What technologies/languages do I need to know?
A. You'll be provided with a cluster running Hadoop technologies, plus standard tools like Python and R. Among these standard technologies, it's your choice what to use to solve the problem.

Q. How difficult are the problems?
A. Think of a scaled-down Kaggle problem that's intended to be solved in hours, not days of effort. If you can solve a Kaggle problem in a weekend, you're in good shape. You may also take a look at a sample past exam and the solution in our free solution kit (http://certification.cloudera.com/prep/dscisk/intro.html).

Q. What should I study to prepare?
A. Coursera's intro "machine learning" course is a good level of preparation, but here are several more links of interest.

General Data Science

The Open Source Data Science Masters Curriculum (http://datasciencemasters.org/)

Theory

Machine Learning
Data Science at Scale Using Spark and Hadoop (/training/courses/data-science-at-scale-using-spark-and-hadoop.html?course=data-science-at-scale-using-spark-and-hadoop&loc=online)
Coursera specialization (https://www.coursera.org/specializations/machine-learning)
Introductory Machine Learning (Coursera) (https://www.coursera.org/learn/machine-learning)
Advanced Probabilistic Graphical Models (Coursera) (https://www.coursera.org/course/pgm)
Learning from Data (Caltech through EdX) (https://work.caltech.edu/telecourse.html)

Statistics
Statistics: Making Sense of Data (Coursera) (https://www.coursera.org/course/introstats) or Statistics One (Coursera) (https://www.coursera.org/course/stats1)
Open Intro Statistics (https://www.openintro.org/stat/?stat_book=os)

Linear Algebra
https://www.coursera.org/course/matrix

Engineering Tools

Spark
Cloudera Developer Training for Spark and Hadoop (/training/courses/developer-training-for-spark-and-hadoop.html?course=developer-training-for-spark-and-hadoop&loc=online)
Data Science at Scale Using Spark and Hadoop (/training/courses/data-science-at-scale-using-spark-and-hadoop.html?course=data-science-at-scale-using-spark-and-hadoop&loc=online)

R
R Programming (Coursera) (https://www.coursera.org/course/rprog)

Exam Delivery and Cluster Information

All CCP: Data Scientist exams are remote-proctored and available anywhere, anytime. See the FAQ (/training/certification/faq.html) for more information and system requirements.

Exams are hands-on, practical exams using data science tools on Cloudera technologies. Each user is given their own 7-node, high-performance CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others (see a full list: /documentation/other/CDH-tarballs.html).

In addition, the cluster also comes with Python (2.6 and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, NetBeans, scikit-learn, octave, NumPy, SciPy, Anaconda, R, plyr, dplyr, impaladb, SparkML, vowpal wabbit, clouderML, oryx, impyla, CoreNLP, The Stanford Parser: A statistical parser, Stanford Log-linear Part-Of-Speech Tagger, Stanford Named Entity Recognizer (NER), Stanford Word Segmenter, opennlp, H2O, java-ml, RapidMiner, caffe, Weka, NLTK, matplotlib, ggplot, d3py, SparkingPandas, randomforest, R: ggplot2, Sparkling Water.

Currently, the cluster is open to the internet and there are no restrictions on tools you can install or websites or resources you may use.

CCP Data Scientist Solution Kit (http://certification.cloudera.com/prep/dscisk/intro.html)
Certification FAQ (/training/certification/faq.html)
Verify a Certification (http://certification.cloudera.com/verify/)

© 2016 Cloudera, Inc. All rights reserved. Apache Hadoop (http://hadoop.apache.org) and associated open source project names are trademarks of the Apache Software Foundation (http://apache.org).
