4/27/20 Machine Learning By Sathish Yellanki Slide No : 1
Data Engineer Versus Data Scientist
Let Us Understand: Who Is a Data Engineer?
• Data Engineers Build And Optimize The Systems That Allow Data Scientists And Analysts To Perform Their Work.
• A Data Engineer Should Ensure That Any Data Is Properly
  • Received
  • Transformed
  • Stored
  • Made Accessible

Data Engineer Responsibilities
• Establish The Foundation Architecture For Data Analysts And Data Scientists
• Take Responsibility For Constructing The Data Pipelines That Handle Huge Data
• Should Understand The Entire Software Development Life Cycle
• Should Keep Focus on
  • Leveraging Data Tools
  • Maintaining Databases
  • Creating And Managing Data Pipelines
• Should Develop a Mindset For Building And Optimizing Applications

What Are The Tasks of a Data Engineer?
• Building APIs For Data Consumption.
• Integrating External Or New Datasets Into Existing Data Pipelines.
• Applying Feature Transformations For Machine Learning Models on New Data.
• Continuously Monitoring And Testing The System To Ensure Optimized Performance.

Finally, What Is Data Engineering?
[Diagram: Software Engineering, Business Intelligence, Big Data Abilities]
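The four duties listed earlier (data is received, transformed, stored, and made accessible) can be illustrated with a very small pipeline. This is only a sketch under stated assumptions: the sample CSV text, the `events` table, and every function name below are hypothetical and chosen for illustration, and an in-memory SQLite database stands in for whatever store a real team would use.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read logs, files, or an API.
RAW_CSV = """user_id,event,amount
1,purchase,19.99
2,purchase,
1,refund,-5.00
"""


def receive(raw_text):
    """Receive: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast the types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["user_id"]), row["event"], float(row["amount"])))
    return cleaned


def store(conn, records):
    """Store: write the cleaned records into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (user_id INTEGER, event TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", records)
    conn.commit()


def total_by_user(conn):
    """Make accessible: expose an aggregate that analysts can query."""
    cursor = conn.execute(
        "SELECT user_id, SUM(amount) FROM events GROUP BY user_id ORDER BY user_id"
    )
    return cursor.fetchall()


def run_pipeline(raw_text):
    """Chain the stages and return the connection holding the stored data."""
    conn = sqlite3.connect(":memory:")  # assumption: in-memory store for the demo
    store(conn, transform(receive(raw_text)))
    return conn
```

Calling `total_by_user(run_pipeline(RAW_CSV))` returns one row per user that had at least one complete record. Keeping each stage as its own function makes it easier to test and monitor the stages separately.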
Services Provided By a Data Engineer
Data Ingestion
• “Scraping” Databases, Loading Logs, Fetching Data From External Stores Or APIs.
Metric Computation
• Frameworks To Compute & Summarize Engagement, Growth Or Segmentation Related Metrics.
Anomaly Detection
• Automating Data Consumption To Alert People on Anomalous Events Or Changing Trends.
Metadata Management
• Allow Generation & Consumption of Metadata, Make It Easy To Find Information in The Data Warehouse (DWH).
Experimentation
• A/B Testing And Experimentation Frameworks For The Company’s Analytics, With a Significant Data Engineering Component Integrated Into Them.
Instrumentation
• Log Events And The Attributes Related To Every Event, Make Sure That High-Quality Data Is Captured.
Upstream Dependencies
• Establish Pipelines That Are Specialized in Understanding Series of Actions in Time, Allowing Analysts To Understand User Behaviors.

Learning To Be a Data Engineer
• Data Engineers Must Focus More on Learning
  • Data Modeling Techniques
  • Relational And Non-Relational Database Theory And Practice
  • Database Clustering Tools And Techniques
  • ETL Design
  • Architectural Projections

Salary Projections
Let Us Understand: Who Is a Data Analyst?
• A Big Data Analyst Reviews, Analyzes And Reports on Big Data Stored And Maintained by an Organization.
• Big Data Analysts Use
  • Manual Techniques
  • Automated Big Data Analysis/Analytics Software
• Big Data Analysts Analyze
  • Large Amounts of Raw & Unstructured Data
• A Big Data Analyst's Main Intent Is To Find
  • Business Insight
  • Intelligence
  • Useful Information

Big Data Analyst Responsibilities
• Should Be Well Versed in Big Data Concepts
• Should Possess Knowledge & Skills in Using
  • Database Querying Languages
  • Big Data Analytics Software
• Should Have a Good Understanding of
  • Data Mining
  • Data Extraction Techniques
• Should Usually Work in Coordination With
  • Data Scientists
  • Database Developers/Administrators
  • The Management Team

Big Data Analyst Skills
• A High Level of Mathematical Ability.
• Programming Languages, Such As
  • Oracle SQL Or Any SQL Flavor
  • Python
  • R Language
  • Java Or Scala
• Good Ability To
  • Analyze The Data And The Business
  • Model The Data For The Business
  • Interpret The Data in The Business
• Problem-Solving Skills With Design of Algorithms
• A Methodical And Logical Approach
• Should Have Good Ability To
  • Plan The Work
  • Meet Deadlines
• Should Develop Good
  • Accuracy And Attention To Detail
  • Interpersonal Skills
  • Team Working Skills
  • Written & Verbal Communication Skills

Let Us Understand: Who Is a Data Scientist?
• Data Science Is a Field of Study That Involves Extracting Knowledge From Data.
• A Data Scientist Should Have The Skill To Turn Raw Data Into The Valuable Insights That An Organization Needs.
• A Data Scientist Should Find The Valuable Insights That Help The Business Owner Grow And Compete.
• A Data Scientist Should Have The Skill To Interpret And Analyze Data From Multiple Sources To Come Up With Imaginative Solutions To Problems.
• A Data Scientist Should Use Strong Business Sense Along With An Ability To Communicate Findings To Both Business And IT.
• Should Have The Leadership To Influence “How An Organization Approaches A Business Challenge”.
• Data Scientists May Have Different Functions Depending on The Industry/Sector They Are Involved In.
• Should Have The Ability To Combine Practical Skills Such As Coding And Mathematics With The Ability To Analyze Statistics.
• Should Have The Ability To Model The Data in The Interest of Business Growth And Targets.
• A Data Scientist Should Eliminate The Noise And Identify The Canonical Representative Data Points.
• A Data Scientist Generalizes The Data Model To Be Able To Make Useful Statistical Predictions.
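The last two points (separating signal from noise, and generalizing a model so that it predicts well) can be made concrete with a small experiment. The sketch below is an illustration under stated assumptions, not a method taken from these slides: the data is synthetic, the true relationship `y = 2x + 1` and the noise level are invented for the demo, and `fit_line` and `mean_abs_error` are hypothetical helper names. A line is fitted on a training split and then checked on data the model has never seen.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute gap between the predictions and the observed values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (assumption): a linear signal plus Gaussian noise.
rng = random.Random(0)
xs = [float(i) for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]

# Hold out 20% of the points so the model is judged on unseen data.
indices = list(range(100))
rng.shuffle(indices)
train, test = indices[:80], indices[80:]

model = fit_line([xs[i] for i in train], [ys[i] for i in train])
test_error = mean_abs_error(model, [xs[i] for i in test], [ys[i] for i in test])
```

If the fitted slope stays close to the underlying trend and `test_error` stays near the noise level, the model has captured the signal rather than memorized the noise.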
Data Scientist Responsibilities
• Should Use Strong Business Acumen
• For Useful Insights, They Should Have Great Ability To
  • Communicate Findings
  • Mine Vast Amounts of Data
  • Use Insights To Influence How An Organization Approaches Business Challenges
• To Solve Problems, Use a Combined Knowledge of
  • Computer Science And Applications
  • Modeling
  • Statistics
  • Analytics
  • Mathematics
• Extract Data From Multiple Sources, Which Can Be
  • Unstructured
  • Semi-Structured
  • Structured
• Sift And Analyze Data From Multiple Angles, Looking For Trends That Highlight Problems Or Opportunities
• Communicate Important Information & Insights To Business And IT Leaders
• Make Recommendations To Adapt Existing Business Strategies

Key Skills For Data Scientists (Non-Technical)
• Problem-Solving Skills
• Communication Skills
• Teamwork Skills
• Investigative Skills
• Interest in Statistics
• Interest in Predicting Trends And Identifying Patterns
• Innovative Thinking
• Observation Skills
• Critical Thinking

Key Skills For Data Scientists (Technical)
• Java Or Scala Coding
• Python Coding
• R Programming
• Understanding of The Hadoop Platform
• SQL Database/Coding
• Apache Spark
• Machine Learning And AI
• Data Visualization With Reporting Tools
• Design of Algorithms
• Advanced Statistics

Let Us Get More Insights
Thank You Very Much