
Thinking Big






Slide 0

Thinking Big: An Introduction to Big Data


Slide 1

About Me
- Shawn Hermans
- Data Engineer/Scientist
- Technology consultant
- Physics, math, data geek


Slide 2

About this Talk
- Non-technical introduction to Big Data
- Not focused on any technology or platform
- Focus on concepts


Slide 3

Should you believe the hype?


Slide 4

Big Data Promises
- No need for scientific method
- Predict disease outbreaks before the CDC
- Cure cancer
- Innovate healthcare
- Solve world hunger
- Bring about world peace


Slide 5


Slide 6

Big Data Criticism
- Garbage in, garbage out
- Ignores the role of the scientific method
- Lots of questions don't require large amounts of data to get good stats
- Privacy issues


Slide 7

Big Data is just another way to think about data


Slide 8

Mental Models
"A mental model is simply a representation of an external reality inside your head. Mental models are concerned with understanding knowledge about the world."
- Farnam Street Blog


Slide 9

Examples
- Occam's razor
- Mind maps
- Law of supply and demand
- Never get in a land war in Asia


Slide 10

All models are wrong, but some are useful


Slide 11

Relational Resistance
Resistance to big data concepts, technologies, and techniques because of the belief that the relational model is the only way to think about data.
See also: Theory-induced blindness


Slide 12


Slide 13

Data Mental Models
- Relational
- Linked
- Object Oriented
- Geospatial
- Temporal
- Semantic
- Event Based
- Data as Code
- Bayesian
- Unstructured


Slide 14

What is Big Data?


Slide 15

According to Gartner
"Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."


Slide 16

According to Me
Big data is the Bazaar to traditional data's Cathedral


Slide 17

Cathedral and Bazaar

Traditional Data:
- Clean
- Top down
- Carefully collected
- Scales vertically
- One true way

Big Data:
- Disorderly
- Bottom up
- Randomly collected
- Scales horizontally
- More than one way


Slide 18

Big Data Differences

Relational:
- Normalization
- ACID
- SQL/Query
- Structured/Schema

Big Data:
- Denormalization
- BASE
- MapReduce/Other
- Loosely Structured
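
The "MapReduce/Other" entry above can be illustrated with a toy word count. This is only a minimal single-process sketch of the map, shuffle, and reduce steps; the function names and the sample records are invented for illustration and are not taken from the talk or from any particular platform.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical sample input; a real job would read records from distributed storage.
records = ["big data big ideas", "Data as code"]
counts = word_count(records)
print(counts)
```

On a real cluster the map and reduce steps run in parallel on many machines, which is what lets this style of processing scale horizontally.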


Slide 19

Integrating all available data is the promise of Big Data


Slide 20

Why should you care?


Slide 21


Slide 22

Information as an Asset
- Target specific customers' needs rather than broad segments
- Just-in-time inventory management
- Evaluate demand for products
- Predict and track traffic patterns


Slide 23

Big Data and You
- What information do you have that no one else has?
- Can you easily integrate your data, or is it locked in silos?
- What data don't you collect?
- What data don't you archive?


Slide 24

Big Data Technology


Slide 25

Big Data Platforms

Cloud:
- AWS
- Google
- Microsoft

Hadoop:
- Cloudera
- MapR
- Hortonworks

This isn't an all-inclusive list, but a sample of the big players in the space.


Slide 26

Big Data Stack
- Batch Processing
- Data Collection
- SQL/Query
- Search
- Machine Learning
- Serialization
- Security
- Stream Processing
- File Storage
- Resource Management
- Online NoSQL
- Data Pipeline


Slide 27


Slide 28

What about data science?


Slide 29

What IS Data Science?
- Data science is statistics on a Mac
- A data scientist is a statistician who lives in San Francisco
- A person who is better at statistics than any software engineer and better at software engineering than any statistician


Slide 30


Slide 31

The Need for Data Science
- There is a LOT of data
- Too much data for people to look at it all
- Probabilistic models help extract signal from the noise
- Need to automate the analysis and exploitation of data
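
As one hedged illustration of a probabilistic model separating signal from noise, the sketch below applies Bayes' rule to a single noisy alert. The function name and all of the numbers are made up for the example and do not come from the talk.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule: probability of hypothesis H after seeing the evidence."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1 - prior)
    return numerator / evidence


# Hypothetical numbers: 1% of events are real signals, the detector fires on
# 90% of real signals and on 5% of noise events.
p_signal = posterior(prior=0.01, p_evidence_given_h=0.9, p_evidence_given_not_h=0.05)
print(f"P(signal | alert) = {p_signal:.3f}")
```

Even with a fairly accurate detector, most alerts here are noise because real signals are rare, which is why the analysis has to be automated and repeated over many observations.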


Slide 32

Big Data has its limits


Slide 33

Black Swans and Big Data
- There are fundamental limits to prediction
- Hard to predict rare events where no prior data exists (i.e., Black Swans)
- Complex systems often have feedback loops (e.g., stock market)


Slide 34

What’s next?


Slide 35

Getting Started

Business:
- Identify some unresolved questions
- Figure out what data could answer those questions
- Pick the easiest and test out your hypothesis

Technology:
- Pick a technology you know or want to learn
- Pick a platform
- Pick a data set and identify some basic problems to solve


Slide 36

My Info
- Twitter: @shawnhermans
- Github: github.com/shawnhermans
- Blog: http://shawnhermans.github.io/ (In Progress)
- Slideshare: www.slideshare.net/shawnhermans/
- Quora: http://www.quora.com/Shawn-Hermans


Slide 37

Backup Slides


Slide 38


Slide 39

The Fourth Quadrant and the Failure of Statistics


Slide 40

Soothsayer
- Simple HTTP/JSON API for training/classifying data
- Lots of built-in classifier statistics
- https://github.com/shawnhermans/soothsayer


Slide 41

