
Google Certified Professional - Data Engineer

Job Role Description

A Google Certified Professional - Data Engineer enables data-driven decision making by collecting, transforming, and visualizing data. The data engineer should be able to design, build, maintain, and troubleshoot data processing systems with a particular emphasis on the security, reliability, fault-tolerance, scalability, fidelity, and efficiency of such systems. The data engineer should also be able to analyze data to gain insight into business outcomes, build statistical models to support decision-making, and create machine learning models to automate and simplify key business processes.

Certification Exam Guide

Section 1: Designing data processing systems

1.1 Designing flexible data representations. Considerations include (a brief schema sketch follows the list):

● future advances in data technology

● changes to business requirements

● awareness of current state and how to migrate the design to a future state

● data modeling

● tradeoffs

● distributed systems

● schema design
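
For illustration only: the sketch below shows one way the schema-design and data-modeling considerations might be expressed as an explicit, partitioned BigQuery table definition. It assumes the google-cloud-bigquery client library and application default credentials; the project, dataset, table, and field names are hypothetical.

```python
# Minimal sketch: an explicit schema for a partitioned BigQuery table.
# Assumes the google-cloud-bigquery client library; project, dataset,
# table, and field names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

schema = [
    bigquery.SchemaField("event_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("event_ts", "TIMESTAMP", mode="REQUIRED"),
    bigquery.SchemaField("user_id", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("payload", "STRING", mode="NULLABLE"),
]

table = bigquery.Table("example-project.analytics.events", schema=schema)
# Partitioning by event time keeps the design open to data growth
# without forcing a later migration of historical rows.
table.time_partitioning = bigquery.TimePartitioning(field="event_ts")
table = client.create_table(table)
```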

1.2 Designing data pipelines. Considerations include (a brief pipeline sketch follows the list):

● future advances in data technology

● changes to business requirements

● awareness of current state and how to migrate the design to a future state

● data modeling

● tradeoffs

● system availability

● distributed systems

● schema design

● common sources of error (e.g., removing selection bias)
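
As a sketch of the pipeline considerations above, the following Apache Beam batch pipeline reads raw records, drops obviously bad rows, and writes a cleaned output. The bucket paths and record layout are hypothetical, and the same pipeline shape could later be pointed at a streaming source as requirements change.

```python
# Minimal sketch of a batch data pipeline with Apache Beam.
# File paths and the CSV record layout are hypothetical.
import csv
import apache_beam as beam

def parse_line(line):
    # Hypothetical CSV layout: event_id,event_ts,user_id,amount
    event_id, event_ts, user_id, amount = next(csv.reader([line]))
    return {"event_id": event_id, "event_ts": event_ts,
            "user_id": user_id, "amount": float(amount)}

with beam.Pipeline() as p:
    (
        p
        | "Read" >> beam.io.ReadFromText("gs://example-bucket/raw/events.csv")
        | "Parse" >> beam.Map(parse_line)
        # Dropping records with empty user_ids or negative amounts is a
        # simple guard against one common source of downstream error.
        | "FilterBad" >> beam.Filter(lambda r: r["user_id"] and r["amount"] >= 0)
        | "Format" >> beam.Map(lambda r: ",".join(
            str(r[k]) for k in ("event_id", "event_ts", "user_id", "amount")))
        | "Write" >> beam.io.WriteToText("gs://example-bucket/clean/events")
    )
```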

1.3 Designing data processing infrastructure. Considerations include (a brief messaging sketch follows the list):

● future advances in data technology

● changes to business requirements

● awareness of current state and how to migrate the design to a future state

● data modeling

● tradeoffs

● system availability

● distributed systems

● schema design

● capacity planning

● different types of architectures: message brokers, message queues, middleware, service-oriented
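
One of the architectures named above, a message broker, might look like the following minimal Cloud Pub/Sub publisher. It assumes the google-cloud-pubsub client library; the project, topic, and event fields are hypothetical.

```python
# Minimal sketch: decoupling producers from consumers with a message
# broker (Cloud Pub/Sub). Assumes the google-cloud-pubsub client
# library; project, topic, and event fields are hypothetical.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("example-project", "clickstream-events")

event = {"event_id": "e-123", "user_id": "u-42", "action": "page_view"}

# publish() returns a future; the broker buffers the message so the
# downstream pipeline can be scaled or replaced independently.
future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
print("published message id:", future.result())
```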

Section 2: Building and maintaining data structures and databases

2.1 Building and maintaining flexible data representations

2.2 Building and maintaining pipelines. Considerations include (a brief cleansing sketch follows the list):

● data cleansing

● batch and streaming

● transformation

● acquire and import data

● testing and quality control

● connecting to new data sources
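
A minimal sketch of the cleansing and transformation items above: a plain-Python transform that can run unchanged in either a batch or a streaming pipeline. The record layout (user_id, email, amount) is hypothetical.

```python
# Minimal sketch of a cleansing/transformation step that works the same
# whether records arrive in a batch file or on a stream.
from typing import Optional

def clean_record(record: dict) -> Optional[dict]:
    """Return a normalized record, or None if it should be dropped."""
    user_id = (record.get("user_id") or "").strip()
    email = (record.get("email") or "").strip().lower()
    try:
        amount = float(record.get("amount", ""))
    except ValueError:
        return None  # unparseable amounts are rejected, not guessed
    if not user_id or amount < 0:
        return None
    return {"user_id": user_id, "email": email, "amount": round(amount, 2)}
```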

2.3 Building and maintaining processing infrastructure. Considerations include:

● provisioning resources

● monitoring pipelines

● adjusting pipelines

● testing and quality control

Section 3: Analyzing data and enabling machine learning

3.1 Analyzing data. Considerations include (a brief analysis sketch follows the list):

● data profiling

● data correlation

● patterns and insights

● anomaly detection

● statistical models

● machine learning

● assessing the statistical relevance of conclusions
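
To make the profiling, correlation, and anomaly-detection items above concrete, here is a short pandas sketch; the input file and column names are hypothetical.

```python
# Minimal sketch: profiling a dataset, checking correlations, and
# flagging simple anomalies with pandas. File and columns hypothetical.
import pandas as pd

df = pd.read_csv("transactions.csv")  # hypothetical input file

# Data profiling: basic distributions and missing-value counts.
print(df.describe(include="all"))
print(df.isna().sum())

# Data correlation between numeric columns.
print(df[["amount", "items", "session_seconds"]].corr())

# Simple anomaly detection: flag values more than 3 standard
# deviations from the mean of "amount".
z = (df["amount"] - df["amount"].mean()) / df["amount"].std()
anomalies = df[z.abs() > 3]
print(f"{len(anomalies)} potential outliers out of {len(df)} rows")
```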

3.2 Transforming data to enable machine learning and pattern discovery. Considerations include (a brief sketch follows the list):

● repeatability

● generalization

● distributed computing

● improved model accuracy
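
A small sketch of the repeatability and generalization points above: the scaler is fit on training data only and persisted so the identical transform can be replayed later. It assumes scikit-learn and NumPy; the data here is a synthetic stand-in.

```python
# Minimal sketch: repeatable, generalizable feature transformation.
# The scaler is fit on training data only (generalization) and saved
# to disk so the same transform can be replayed at serving time
# (repeatability). The feature matrix is synthetic stand-in data.
import pickle
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(1000, 3)               # hypothetical feature matrix
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # hypothetical label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)    # never fit on test data
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

with open("scaler.pkl", "wb") as f:
    pickle.dump(scaler, f)                # reuse the exact same transform later
```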

3.3 Identifying or building data visualization and reporting tools. Considerations include (a brief sketch follows the list):

● automation

● decision support

● data summarization

● enabling patterns and insights
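
As one possible reporting building block for the items above, the sketch below aggregates a metric and renders it as a chart. It assumes pandas and matplotlib; the input file and column names are hypothetical.

```python
# Minimal sketch: summarizing data into a chart for a report.
# The file path and column names are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("transactions.csv", parse_dates=["event_ts"])

# Summarize: daily revenue, a typical decision-support rollup.
daily = df.set_index("event_ts")["amount"].resample("D").sum()

fig, ax = plt.subplots(figsize=(8, 3))
daily.plot(ax=ax)
ax.set_title("Daily revenue")
ax.set_xlabel("Date")
ax.set_ylabel("Amount")
fig.savefig("daily_revenue.png", bbox_inches="tight")  # embed in a report
```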

Section 4: Modeling business processes for analysis and optimization

4.1 Mapping business requirements to data representations. Considerations include:

● working with business users

● gathering business requirements

4.2 Optimizing data representations, data infrastructure performance, and cost. Considerations include:

● resizing and scaling resources

● data cleansing, distributed systems

● high performance algorithms

● common sources of error (e.g., removing selection bias)

Section 5: Ensuring reliability

5.1 Performing quality control. Considerations include (a brief test-suite sketch follows the list):

● verification

● building and running test suites

● pipeline monitoring
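
Building and running test suites can start as small as unit tests over the cleansing logic; the sketch below uses pytest against the hypothetical clean_record() transform from the Section 2.2 sketch.

```python
# Minimal sketch: a pytest test suite for the hypothetical
# clean_record() transform sketched under Section 2.2.
# Run with: pytest test_cleansing.py
from cleansing import clean_record  # hypothetical module name

def test_valid_record_is_normalized():
    rec = {"user_id": " u-42 ", "email": "A@B.COM", "amount": "19.991"}
    assert clean_record(rec) == {
        "user_id": "u-42", "email": "a@b.com", "amount": 19.99}

def test_negative_amount_is_dropped():
    assert clean_record({"user_id": "u-1", "email": "x@y.z",
                         "amount": "-5"}) is None

def test_unparseable_amount_is_dropped():
    assert clean_record({"user_id": "u-1", "email": "x@y.z",
                         "amount": "n/a"}) is None
```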

5.2 Assessing, troubleshooting, and improving data representations and data processing infrastructure.

5.3 Recovering data. Considerations include (a brief retry sketch follows the list):

● planning (e.g., fault tolerance)

● executing (e.g., rerunning failed jobs, performing retrospective re-analysis)

● stress testing data recovery plans and processes
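
The "rerunning failed jobs" item above can be exercised with a simple retry loop; the sketch below is generic Python with exponential backoff, and run_job() is a hypothetical stand-in for any idempotent load or re-analysis step.

```python
# Minimal sketch: rerunning a failed job with exponential backoff.
# run_job() stands in for any idempotent load or re-analysis step;
# both the callable and its failure mode are hypothetical.
import time

def rerun_with_backoff(run_job, max_attempts=5, base_delay=2.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return run_job()
        except Exception as exc:        # in practice, catch narrower errors
            if attempt == max_attempts:
                raise                   # escalate after the final attempt
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay}s")
            time.sleep(delay)
```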

Section 6: Visualizing data and advocating policy

6.1 Building (or selecting) data visualization and reporting tools. Considerations include:

● automation

● decision support

● data summarization (e.g., translation up the chain, fidelity, trackability, integrity)

6.2 Advocating policies and publishing data and reports.

Section 7: Designing for security and compliance

7.1 Designing secure data infrastructure and processes. Considerations include:

● Identity and Access Management (IAM)

● data security

● penetration testing

● Separation of Duties (SoD)

● security control

7.2 Designing for legal compliance. Considerations include:

● Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA), etc.

● audits