How Can Startups Leverage Big Data?

How Can Startups Leverage Big Data? Trudging Through Myth To Discover Real Value

Slide 1

What is Big Data?
Mostly Unstructured Data
Client Data
Customer Data
Social Data
Driving towards insight
www.rackspace.com

Slide 2

RACKSPACE® HOSTING | WWW.RACKSPACE.COM
“Big Data is any dataset not suited to be processed by traditional legacy technology.”

Slide 3

The Three V’s
V3C
Mining social data for sentiment
Analyzing web clickstreams
Analyzing log data for security breaches
Telemetry from sensors and machines
eCommerce predictive analytics

Slide 4

Evolution of Data
[Chart: data complexity plotted against time]

Slide 6

Big Data is Here to Stay
Big Data is now much more than hype: real customers with real use cases are adopting it daily
A recent survey found that business leaders expected the deployment of Hadoop to result in a 3-year benefit ranging from $5M to $50M+
Close to 100% of business leaders have already deployed or plan to deploy Apache™ Hadoop®
“Enterprises are showing increasing interest in the value provided by the large-scale data processing that Hadoop and Spark can provide, but can be wary of the upfront cost and complexity of setting up a cluster to prove that value. Managed services such as [OnMetal™ Cloud Big Data Platform] enable enterprises to focus their energies on generating business insights rather than configuring and managing infrastructure.”
Matt Aslett, 451 Research Director, Data Platforms and Analytics

Slide 7

Why leverage Big Data?
To learn more about your customers
To optimize your business processes
To become a more targeted marketer
To interact with users and customers in real time
To add additional revenue and services

Slide 8

What Is the Cost of Lacking a Big Data Strategy?
Today every company can be a data company
Successful companies will be data companies
Under Armour isn’t just a fitness company; they’re a data company

Slide 9

Hadoop Has Emerged As A Leader In Distributed Data Sets
Open source
Able to process petabytes of data quickly
Based on designs published by Google (MapReduce and the Google File System); implemented at scale at Yahoo
Handles unstructured data very well
One of the fastest-growing ecosystems

Slide 10

Fundamentals of Hadoop v1
Service categories: Data Services, Core Services, Operational Services
HDFS: Distributed file system
HBase: Distributed, scalable, non-relational database
HCatalog: Metadata and table management system
Pig: Data flow scripting language
Hive: Data warehouse analysis layer through HiveQL (SQL-like) queries
MapReduce: Data processing framework
Flume: Log data aggregation and movement
Sqoop: Bulk data transfer from and to relational databases
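To make the MapReduce entry above more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the example. It does not use Hadoop itself; the function names and sample lines are assumptions made for illustration, and a real job would distribute each phase across a cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical input; on a cluster these lines would be blocks of files in HDFS.
sample_lines = ["big data is big", "data is everywhere"]
counts = word_count(sample_lines)
print(counts)
```

Because each map call looks at one line and each reduce call looks at one key, both phases can in principle run in parallel on separate machines, which is the property the framework relies on.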

Slide 11

Hadoop is Hard
Biggest impediments include:
Insufficient skills in-house to design and deploy
Designing and deploying takes too long
High cost of physical infrastructure

Slide 12

Hadoop is Changing
Original focus on batch processing
Streaming and interactive use cases emerging
Shift from jobs that take hours to jobs that take seconds
Impala, Spark, and Presto are emerging tools

Slide 13

But what are these companies doing with Big Data?
Gaining Insights!!!

Slide 14

What are Companies Doing with Hadoop?

Slide 15

Application Underpinning
Mobile: Enterprises consider support for mobility and productivity enhancement for mobile workers their top-priority new application category, according to a recent survey by CIMI Corp. That means most companies that have adopted, or are adopting, Hadoop will likely have to integrate the framework with mobile applications.
Data Aggregation: "The two big use cases we're seeing for Impala are aggregating data in Hadoop to present analytic dashboards and improving data-discovery applications by providing faster performance than Hive," said Alex Gutow, Cloudera's product marketing manager.
Dashboarding: Users are increasingly choosing Hadoop as the underlying technology to power interactive dashboarding capability.
Internet of Things: As wearables and other data-generating devices become commonplace, the backend of your application needs to be built to handle the velocity and volume of data these devices produce.
People are building net-new applications with Hadoop as their database

Slide 16

Clickstream Analysis
Your home page looks great. But how do you move customers on to bigger things, like submitting a form or completing a purchase? Get more granular with customer segmentation. Hadoop makes it easier to analyze, visualize and ultimately change how visitors behave on your website.
A clickstream is a series of page requests. Every page requested generates a signal. These signals can be graphically represented for clickstream reporting. The main point of clickstream tracking is to give webmasters insight into what visitors on their site are doing.
Clickpath: The study of human clicks on a website
Tracking Cookies: Tool used to understand and track online activity
Data Mining: Collecting data from websites and online properties
Understand how your users are behaving on your website and optimize your experience
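As a rough illustration of the clickpath idea above, the sketch below groups page requests by visitor and counts how many visitors reach each step of a purchase funnel. The visitor IDs, page paths, and funnel definition are all hypothetical; real clickstream data would come from web server logs or a tracking service.

```python
from collections import defaultdict

# Hypothetical page requests in time order: (visitor_id, page).
requests = [
    ("v1", "/home"), ("v1", "/product"), ("v1", "/checkout"),
    ("v2", "/home"), ("v2", "/product"),
    ("v3", "/home"),
]

# Assumed funnel: the ordered pages a visitor passes through to complete a purchase.
FUNNEL = ["/home", "/product", "/checkout"]

def build_clickpaths(requests):
    """Group page requests into one ordered clickpath per visitor."""
    paths = defaultdict(list)
    for visitor, page in requests:
        paths[visitor].append(page)
    return dict(paths)

def funnel_counts(paths, funnel):
    """Count the visitors who reached each funnel step, in order."""
    counts = [0] * len(funnel)
    for path in paths.values():
        step = 0
        for page in path:
            if step < len(funnel) and page == funnel[step]:
                counts[step] += 1
                step += 1
    return dict(zip(funnel, counts))

paths = build_clickpaths(requests)
reached = funnel_counts(paths, FUNNEL)
print(reached)
```

The drop between consecutive steps shows where visitors leave, which is the kind of signal a site owner would use to decide which page to improve first.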

Slide 17

Sentiment Analysis
Your customers are talking. With Hadoop, you can mine Twitter, Facebook and other social media conversations for sentiment data about you and your competition, and use it to make targeted, real-time decisions that increase market share.
Sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document.
Social Media Feeds: Many companies are now capturing entire Twitter and Facebook feeds to analyze.
Data Mining: Users are searching the web for comments, blogs, and whitepapers that can point to overall sentiment.
E-Communities: Forums, user groups, Heroku
Find out what your users are saying about you. Are they happy? Does your product make them a promoter?
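One simple way to approximate the polarity of a post, as described above, is to count words from small positive and negative word lists. The sketch below assumes tiny hand-picked word lists and made-up posts; production systems generally use much larger lexicons or trained models.

```python
import re

# Hypothetical, deliberately tiny sentiment lexicons.
POSITIVE = {"love", "great", "happy", "fast"}
NEGATIVE = {"hate", "slow", "broken", "angry"}

def polarity(text):
    """Label a text positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Made-up posts standing in for a social media feed.
posts = [
    "I love this product, support was great",
    "Checkout is slow and the app is broken",
    "Ordered it yesterday",
]
labels = [polarity(post) for post in posts]
print(labels)
```

A word-count approach like this misses negation and sarcasm, so it is best read as a baseline rather than a finished classifier.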

Slide 18

Machine Learning
Your machines know things. From out in the field to the assembly line floor, machines stream low-cost, always-on data. Hadoop makes it easier for you to store and refine that data and identify meaningful patterns, providing you with the insight to make proactive business decisions.
Machine Learning is a scientific discipline that deals with the construction and study of algorithms that can learn from data. Such algorithms operate by building a model based on inputs and using that to make predictions or decisions, rather than following only explicitly programmed instructions.
Pattern Recognition: Users are building clusters to detect patterns and identify anomalies in the data that these devices are generating
Decision Tree: Allows the system to take action and make choices based on the data
Predictive Modeling: Aims to automate detection of the most common mistakes and errors as part of a preventative model
Interactive devices are now streamlining things like maintenance and troubleshooting
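The anomaly detection mentioned under pattern recognition can be sketched with a basic statistical rule: flag any reading that sits far from the mean of its series. This is a stand-in technique chosen for brevity, not a description of any particular product; the sensor readings and the threshold of 2.0 standard deviations are assumptions.

```python
from statistics import mean, pstdev

# Hypothetical temperature readings streamed from one machine.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.0, 20.2]

# Assumed cut-off: how many standard deviations from the mean counts as anomalous.
Z_THRESHOLD = 2.0

def find_anomalies(values, threshold):
    """Return (index, value) pairs whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # All readings identical, so nothing stands out.
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]

anomalies = find_anomalies(readings, Z_THRESHOLD)
print(anomalies)
```

With short series a single extreme value also inflates the standard deviation, so a lower threshold or a median-based measure may be needed in practice.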

Slide 19

Fraud Detection
Fraud is a billion-dollar business and it is increasing every year. The PwC global economic crime survey of 2009 suggests that close to 30% of companies worldwide have reported being victims of fraud in the past year.
Fraud involves one or more persons who intentionally act secretly to deprive another of something of value, for their own benefit. Fraud is as old as humanity itself and can take an unlimited variety of different forms. However, in recent years, the development of new technologies has also provided further ways in which criminals may commit fraud.
Rules-Based Detection: Even though internet hackers have become better at tricking online systems, they still exhibit very calculated behavior.
Machine Learning: The aggregation of data points can help you collect more information about the potential sale and detect whether it might be fraud.
User Tagging and Tracing: Once users are flagged as fraudulent, their repeated attempts can be prevented.
Users are detecting fraudulent online behavior and rejecting those users before they commit an offense
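The rules-based detection and user tagging described above might look roughly like the following sketch. The order fields, the amount limit, and the rule set are hypothetical, and tagging a user as soon as any single rule fires is a simplification made to keep the example short.

```python
from dataclasses import dataclass

@dataclass
class Order:
    user: str
    amount: float
    billing_country: str
    shipping_country: str

# Hypothetical threshold above which an order is treated as suspicious.
MAX_AMOUNT = 1000.0

# Users tagged by an earlier check; later orders from them are flagged at once.
flagged_users = set()

def check_order(order):
    """Return the list of rules an order trips, tagging the user if any fire."""
    reasons = []
    if order.user in flagged_users:
        reasons.append("user previously flagged")
    if order.amount > MAX_AMOUNT:
        reasons.append("amount over limit")
    if order.billing_country != order.shipping_country:
        reasons.append("billing/shipping mismatch")
    if reasons:
        flagged_users.add(order.user)
    return reasons

# Made-up orders processed in arrival order.
orders = [
    Order("alice", 40.0, "US", "US"),
    Order("mallory", 2500.0, "US", "RU"),
    Order("mallory", 15.0, "US", "US"),
]
results = [check_order(order) for order in orders]
print(results)
```

The third order would pass every rule on its own merits; it is flagged only because the user was tagged by the second order, which is the tracing behavior the slide describes.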

Slide 20

Server Log Data
Security breaches happen. And when they do, your server logs may be your best line of defense. Hadoop takes server-log analysis to the next level by speeding and improving security forensics and providing a low-cost platform to show compliance.
Server logs are generally small files that track user information inside a confined environment; they are often used to meet compliance requirements or troubleshoot an incident.
Scrub Data for Forensics: If a security incident occurs, it is important to remediate fast
Identify Anomalies: Anti-patterns are often the first sign
Discover Trends: Some types of errors might become common; learn to identify them
Actively Automate to Solve Issues with Log Files: Many of these errors can be proactively eliminated through the use of automation.
Aggregate server logs to find trends and anomalies in your security records
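To illustrate aggregating server logs for trends and anomalies, the sketch below parses a handful of log lines, tallies response codes, and lists addresses with repeated failed logins. The log format (timestamp, IP, status, path), the sample lines, and the failure limit are all assumptions; real logs would need a parser matched to their actual format.

```python
from collections import Counter

# Hypothetical log lines in an assumed "timestamp ip status path" format.
log_lines = [
    "2015-06-01T10:00:01 10.0.0.5 200 /login",
    "2015-06-01T10:00:02 10.0.0.9 401 /login",
    "2015-06-01T10:00:03 10.0.0.9 401 /login",
    "2015-06-01T10:00:04 10.0.0.9 401 /login",
    "2015-06-01T10:00:05 10.0.0.7 500 /cart",
    "2015-06-01T10:00:06 10.0.0.5 401 /login",
]

# Assumed limit: this many failed logins from one address is worth a look.
FAILED_LOGIN_LIMIT = 3

def parse(line):
    """Split one log line into a record with named fields."""
    timestamp, ip, status, path = line.split()
    return {"timestamp": timestamp, "ip": ip, "status": int(status), "path": path}

records = [parse(line) for line in log_lines]

# Trend: how often each response status appears.
status_trend = Counter(r["status"] for r in records)

# Anomaly: addresses with repeated 401 (unauthorized) responses.
failed_by_ip = Counter(r["ip"] for r in records if r["status"] == 401)
suspects = sorted(ip for ip, n in failed_by_ip.items() if n >= FAILED_LOGIN_LIMIT)

print(status_trend)
print(suspects)
```

The same two counters scale naturally to a distributed job, since counts computed on separate log files can simply be added together.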

Slide 21

360 View of Customer: Dashboards and Analytics
Whenever a customer interacts with an organization, it is vital that the richness of information available on that customer informs and guides the processes that will help to maximize their experience, while simultaneously making the interaction as effective and efficient as possible. This includes everything from avoiding repetition or rekeying of information, to viewing customer history, establishing context and initiating desired actions.
A total 360 view often contains 3 views:
The Past: Understanding how your users acted in the past lets you understand who they are and serve them relevant content and products
The Present: Where are users coming from? What is their experience on your site right now? Do they need help?
The Future: Did they buy? Can we serve them more information to help their choice? Can we market to them better?
Create in-depth personas for your customers based on how they are actually behaving.

Slide 22

What’s Next? Interactive Processing!
What if, instead of reacting to behavior, we could engage virtually with the user to inhibit behavior? This is called interactive processing: it takes input from humans and reacts based on patterns and algorithms. The quicker we can serve up this interaction to the user, the better equipped we are to inhibit their behavior!
Interact with customers in real time, offering suggestions and inhibiting behavior
source: Teach-ICT.com

Slide 23

Apache Spark
Introducing support of Apache Spark™
Apache Spark enables enterprises to combine the breadth of structured and unstructured data with the speed of in-memory processing to build streaming, machine learning, and graph-optimized applications that allow businesses to take action at the speed of insight.

Slide 24

New Use Cases
Deeper Integration with SQL Workloads
Streaming Applications
Machine Learning
Iterative Processing
Real-time Graphical Dashboards

Slide 25

Does the delivery method matter?
YES

Slide 26

Choose The Best Deployment Model

Slide 27


Slide 28

Advantages of storing data in the cloud:

Slide 29

Advantages of Dedicated Hosting/On-Premise
Dedicated Hosting:
  No Capex investment
  Choose new hardware and software versioning easily
  Rely on extended support personnel
  Increased security options
  Concurrent and predictable performance
On-Premise:
  Control data access
  Integrate with core mainframe and systems
  Build your own IP
  Control every aspect of design and operation

Slide 30

The Trade Off...
Custom Built: Consistent, Available, Performant
Purpose Built: Elastic, Flexible, On-Demand

Slide 31

OnMetal Lets You Scale Like the Internet Giants
“Rackspace Cloud, because of its single-tenant OnMetal line, is the only place on Earth where you can enjoy Facebook/Google-style infrastructure rented by the hour.”
-Ev Kontsevoy, Director, Product, Rackspace

Slide 32

Benefits of Outsourced Hosting

Slide 33

The Level of Management You Need
Only you can decide what model is best for you!
DIY Platform Managed Service Turnkey Service

Slide 34

Data as a Service: more time building, less time managing databases
For some businesses, database or infrastructure management IS core to the business
For most software-based businesses, database or infrastructure management represents time and resources not spent building the application
You must answer for yourself: are you in the business of managing infrastructure, or in the business of [your market here]?

Slide 35


Slide 36


Slide 37


Slide 38

Rackspace Offerings for the Data Tier
Infrastructure for Data
Managed Offerings of Most Popular Big Data, SQL, & NoSQL Databases
Managed Database Services for Production Apps
Cloud IaaS: Get started fast
Dedicated Hosting: Predictable costs & performance
OnMetal: Cloud elasticity & dedicated performance

Slide 39

What’s Next?
Sign up for a free trial
Want to know more? Read my blog and check out the articles

Slide 40

Questions?

Slide 41