Fixing Data Science
Challenges, Problems, Issues, Measures,
Mistakes, Opportunities, Ideas,
Technologies, Research and Visions
Manoj Kumar Ragupathi
Challenges

Source: https://www.kaggle.com/surveys/2017

Current Relevance Discussion: https://www.reddit.com/r/datascience/comments/eeok6g/how_relevant_are_these_challenges_in_data_science/
Issues

• Wrong Focus
• Wrong Commitments and Promises
• Misunderstanding-led Wrong Expectations
• Unexplainable AI
• Narrowness and Inability to Transfer Knowledge
Problems

• The Over Hype – Failed Promises
• https://www.reddit.com/r/datascience/comments/egqsmy/how_many_successful_aiml_models_implementations/
• https://analyticsindiamag.com/the-role-of-big-data-analytics-in-the-future-of-managers/ reports:
• Gartner reported in November 2017 that 60% of big data projects failed. A year later, Gartner analyst Nick Heudecker said his company was “too conservative” with its 60% estimate and put the failure rate at closer to 85%. Today, he says nothing has changed.
• In July 2019, VentureBeat AI reported that 87% of data science projects never make it into production.
• In January 2019, a NewVantage survey reported that 77% of businesses found “business adoption” of big data and AI initiatives to be a continuing big challenge (which suggests that three-fourths of the software being built is collecting dust).

• Another AI Winter
• https://mindmatters.ai/2019/12/just-a-light-frost-or-ai-winter/
Technical Efforts Segmentation in Data Science

Data Engineering
• Data Architecting
• Data Acquisition
• Building Data Pipeline
• Ensuring Reliability
• Performance Tuning
• Providing DS Infrastructure
• Data Discovery Enablement

Data Preparation and Analysis
• Data Exploration
• Domain Understanding
• Data Visualization
• Insights Gathering
• Hypothesis Validation
• Feature Engineering
• Data Transformation

Modelling and Validation

Productionization
• APIfication
• Containerization
• Continuous Train & Test
• DevOps CI/CD
• Monitoring
Mistakes

• Professionals and students mostly focus on learning ML, DL and NLP, although modelling needs the least effort in the entire Data Science cycle
• Fastest-growing technical ecosystem (software, tools, techniques and practices), without standardization
• Efforts already spent are rarely reused
Mistakes: Data Infrastructure Sharing

• Businesses have Data Science infrastructure, but it is only for the internal DS team
• Rarely, it is opened to one IT vendor
• Cloud Data Science infrastructure providers profit more from this data infrastructure redundancy, which often leads to a huge waste of resources
• Need for Data Mesh
Mistakes: “My Precious” Data

• Businesses won’t share data easily, so there is no way to get “Open Data” unless governments mandate it
• Data Science projects won’t succeed without using external data
• Data vendors’ profitability is higher
• Data monetization is not done, due to lack of trust and visibility
Mistakes: If Data = Oil, then, from Power Perspective
Mistakes: The Silent “Linked Data”

• Social Media and Tech Giants
• Cloud Providers with Admin Access
• Blockchain systems connect global business data together

“Artificial General Super Intelligence Powered By Tech Giants” - Safe AI or Dystopic Future?
The Vision: A Platform

• Serves as Global Data Hub for Global Linked Data
• Anybody with access Can Peek & Work, Cannot Sneak and Steal
• Data Science for Digital Nomads and Telecommuters
• Hyper Data Monetization by Businesses
• Data Control and Tracking
• Nano-Payments for Outcomes
• Data Science Effort Reutilization and Transfer Learning
• A Safe Artificial Super Intelligence (ASI) Powered Global Auto Governance
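The "Can Peek & Work, Cannot Sneak and Steal" and "Data Control and Tracking" points above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the platform's design: the `GloveBox` class, its `run` method and the log format are hypothetical names invented here. The idea shown is that a caller may compute on the data, but only a single aggregated number leaves the box and every access is recorded.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class GloveBox:
    """Hypothetical wrapper: callers compute on the rows but never receive them."""
    _rows: List[Dict[str, float]]
    access_log: List[str] = field(default_factory=list)

    def run(self, user: str, column: str,
            aggregate: Callable[[List[float]], float]) -> float:
        # Work happens inside the box on a copy of one column.
        values = [row[column] for row in self._rows]
        result = aggregate(values)
        # Only a scalar may leave; returning the values themselves is refused.
        if not isinstance(result, (int, float)):
            raise TypeError("only scalar results may leave the glove box")
        # Record who touched which column with which function (tracking).
        self.access_log.append(f"{user}:{column}:{aggregate.__name__}")
        return float(result)


if __name__ == "__main__":
    box = GloveBox([{"sales": 10.0}, {"sales": 30.0}])
    print(box.run("alice", "sales", mean))
    print(box.access_log)
```

A real system would need far stronger isolation (an aggregate function with side effects could still leak data here); the sketch only shows the access pattern.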
The Virtual Glove Box Platform
For Global Data Science Efforts, Tracking, Monetization and Safe AI Governance
Safe ASI

• According to Wikipedia, a glovebox (or glove box) is a sealed container that is designed to allow one to manipulate objects where a separate atmosphere is desired.
• We need a virtual glove box for ASI initiatives
• We can accelerate ASI development through this Platform
Vision Enabler 1: Data Mesh
https://www.slideshare.net/ManojKumarR41/data-mesh-212917511
https://martinfowler.com/articles/data-monolith-to-mesh.html
https://fast.wistia.net/embed/iframe/vys2juvzc3?videoFoam
Vision Enabler 2: Data Trajectories

If Data = Oil, then, where are the refineries?

http://www.ijdc.net/article/view/11.1.1/419
Vision Enabler 3: Hash Graph
https://www.swirlds.com/downloads/SWIRLDS-TR-2016-02.pdf
https://www.hedera.com/hh-whitepaper-v2.0-17Sep19.pdf
Vision Enabler 4: BigPrivacy from Anonos
https://www.anonos.com/ : Anonos technology is “cool” because it enables the creation of re-linkable non-identifying privacy-enhanced data called Variant Twins that enable lawful analytics, AI, ML, data sharing and combining.
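To give a feel for what "re-linkable non-identifying" data can mean, here is a generic keyed-hash pseudonymization sketch. It is not Anonos's technology or API, and it is not how Variant Twins are built; the function names `pseudonymize` and `build_relink_table` are hypothetical and exist only for this illustration. The assumption is simply that a key holder replaces identifiers with tokens and privately keeps the way back.

```python
import hashlib
import hmac
from typing import Dict, Iterable


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a stable token; without the key the token cannot be recomputed."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncated to 16 hex characters for readability in this sketch.
    return digest.hexdigest()[:16]


def build_relink_table(identifiers: Iterable[str],
                       secret_key: bytes) -> Dict[str, str]:
    """Key holder's private map from token back to identifier (re-linking)."""
    return {pseudonymize(i, secret_key): i for i in identifiers}


if __name__ == "__main__":
    key = b"demo-key-kept-by-the-data-owner"  # assumption: managed securely
    emails = ["a@example.com", "b@example.com"]
    # The shared dataset carries tokens instead of e-mail addresses.
    shared = [{"user": pseudonymize(e, key), "purchases": n}
              for e, n in zip(emails, [3, 5])]
    relink = build_relink_table(emails, key)
    print(shared)
    print(relink[shared[0]["user"]])
```

Analysts can still join records on the token, while only the key holder can map a token back to a person.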
Vision Enabler 5: Citrix HDX
https://www.citrix.com/en-in/digital-workspace/hdx/
A combination of these five and a few other ideas will ultimately lead us to the VGB Platform. I will soon come up with another document explaining the vision and how exactly to work on it to gradually develop this Platform, which fixes Data Science efforts globally and also accelerates ASI development.
To Be Continued…
I thank all the ideators, inventors and companies who came up with these awesome enablers.
About me: https://www.linkedin.com/in/manoj-kumar-r-427b0b195/