
Just as Newtonian physics becomes useless at very high speeds, so techniques for processing data become non-performant with very large data sets.

Sorting is O(n log n), meaning that sorting 10 million records takes roughly 12 times as long as sorting 1 million records.
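
For concreteness, a quick back-of-the-envelope check of that ratio in Python (an idealized cost model; the constant factors of a real sort are ignored):

import math

def nlogn_cost(n):
    # Idealized comparison-sort work: n * log2(n).
    return n * math.log2(n)

ratio = nlogn_cost(10_000_000) / nlogn_cost(1_000_000)
print(f"10M vs 1M records: about {ratio:.1f}x the work")  # roughly 11.7x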

Mapping, that is, taking a data object and transforming it into another data object, is O(n), so ten times more records take ten times as much time.

Folding, or reducing, which takes a list of objects and reduces them to a single value, is also O(n).
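
Both operations in miniature, in Python (the record shape here is a made-up example, only to show the single linear pass):

from functools import reduce

records = [{"name": "a", "amount": 10}, {"name": "b", "amount": 32}]

# Map: one transformed object per input object -> O(n).
amounts = [r["amount"] for r in records]

# Fold/reduce: accumulate the whole list into one value -> also O(n).
total = reduce(lambda acc, x: acc + x, amounts, 0)
print(total)  # 42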

Moore's law was sufficient to allow single-processor/storage systems to scale with the data. I suspect one would see an inflection point and a steeper slope in data per user starting with the internet in the mid-90s.

So very large datasets have to be split up, which imposes latency costs: time spent on network control and communication, plus the overhead of dividing the data and recombining the results. With very large distributed processor pools, the probability of a failure during a job increases, so the controller has to be able to requery a failed subset, and the retries add time to the job.
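
A toy sketch of that split/process/recombine pattern with retries, in Python (the chunk size, failure rate and retry limit are arbitrary assumptions; real frameworks add scheduling, shuffling and far more machinery):

import random

def process_chunk(chunk):
    # Stand-in for work shipped to a remote worker; fails now and then.
    if random.random() < 0.1:
        raise RuntimeError("worker failed")
    return sum(chunk)

def run_job(data, chunk_size=1000, max_retries=3):
    # Divide the data, process each piece, recombine the partial results.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    partials = []
    for chunk in chunks:
        for attempt in range(max_retries):
            try:
                partials.append(process_chunk(chunk))
                break
            except RuntimeError:
                continue  # each retry adds wall-clock time to the job
        else:
            raise RuntimeError("chunk failed after all retries")
    return sum(partials)

print(run_job(list(range(10_000))))

Every retry and every recombination step is pure overhead that a single-machine run would never pay.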

Further speed-ups are possible via caching, but caching also adds overhead: storage to hold the caches, time to check whether a value is in the cache, and some form of management to be sure the cache is not stale when staleness matters.
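
As a rough illustration, a minimal time-to-live cache might look like this in Python (the 60-second TTL is an arbitrary assumption, and a real cache also needs eviction and explicit invalidation):

import time

class TTLCache:
    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, time stored): the storage overhead

    def get(self, key):
        entry = self.store.get(key)  # the time spent checking the cache
        if entry is None:
            return None
        value, stored_at = entry
        if time.time() - stored_at > self.ttl:  # stale: treat as a miss
            del self.store[key]
            return None
        return value

    def put(self, key, value):
        self.store[key] = (value, time.time())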

All of this added infrastructure is not necessary in the domains for which I wrangle data. Access would be powerful enough, because the data sets are in the hundreds of records, not the billions. I actually use PostgreSQL, which is more power than I need, but it provides more scalability than my clients will need and can be deployed on all the server platforms.

I could move the dataset management onto these large-scale NoSQL / map-reduce frameworks - and I have been looking into them - but I'd be spending more time writing for moving targets: SQL and RDBMSs are well understood, while the new tools are still figuring out where the sweet spots of interfaces, cost/benefit and CAP (http://en.wikipedia.org/wiki/CAP_theorem) lie. I see this as adding hours to my development time, and the result would be a 0.33-second response time instead of 1.2 seconds for internal applications. I work with small businesses; they don't need and won't pay for the jet-powered Pregnant Guppy (http://en.wikipedia.org/wiki/Aero_Spacelines_Pregnant_Guppy) for their data transport.

I understand where we agree. Back in the day, the slow-and-steady mainframe, which took up half the second floor, did the payroll overnight every other week. People did plan their processes around the limits of the 100%-capacity jobs, and as those limits increased, the size of the routine jobs increased with them. Even with the new hardware, programmers quickly found themselves at the edge of the flat world, applying all their creativity to keep the ship from going over into the realm of dragons. Think about weather forecasting, which gets better with more and more comprehensive station reports. Seven-day forecasts? Back when I was a young grown-up, unimaginable. And I imagine that today meteorologists have a firm grasp on which modeling jobs are out of reach.

But my point is that something like the CAP theorem wouldn't even have been thought of, except that the largest datasets are now orders of magnitude larger than pre-internet datasets and in constant flux. However, the typical (plus or minus one standard deviation) datasets are still manageable without adding the complexities of distributed processing.

If vendors and customers wish to label the suite of tools for handling tera- and peta-record datasets as BigData, to distinguish them from the stuff I need (for which the tools are now given away), then I see why. These are not my grandfather's databases (which would have been filing cabinets).

VOLUME, VARIETY AND VERACITY OF DATA
