Spring XD. Glenn Renfro, grenfro@pivotal.io, @CPPWFS
420 million wearables; 90% of enterprise data is unstructured; 60-100 sensors in each car; 22 billion sensors by 2020; 86% suspect data inaccuracy; 30% revenue loss due to bad data quality; 500 million tweets each day; 2.3 trillion GBs of data each day. Data points: McKinsey, Twitter, Gartner, IBM
Batch and streaming are often handled by multiple platforms. Fragmented Big Data ecosystem. Not all data is Hadoop bound.
Spring XD (eXtreme Data): “One-stop shop for developing and deploying Big Data applications”
Spring XD to the Rescue. The problems: batch and streaming are often handled by multiple platforms; fragmented Big Data ecosystem; not all data is Hadoop bound. What Spring XD offers: portable (on-prem, YARN, EC2, PCF, Mesos, Docker, etc.); easy to use, extend, and integrate with other technologies; built on proven Spring EAI and Batch projects; (volume, velocity, veracity, and variety); unified stream and batch operations; Hadoop batch workflow orchestration; predictive analytics and model scoring.
Spring XD - 10,000 Foot View
Create a stream with http as the source and hdfs as the sink. The hdfs sink's --rollover option is set to a small value so that we can read the file on HDFS.
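To illustrate why a small rollover value makes the output readable sooner, here is a minimal Python sketch of a size-based rolling sink. It is a toy model, not Spring XD's hdfs sink: the class name `RollingSink`, the `rollover_bytes` parameter, and the idea that only closed files are visible to readers are assumptions made for illustration.

```python
class RollingSink:
    """Toy in-memory stand-in for a sink that rolls files over by size.

    Hypothetical class for illustration; not part of Spring XD.
    """

    def __init__(self, rollover_bytes):
        self.rollover_bytes = rollover_bytes
        self.open_chunks = []    # data in the file still being written
        self.open_size = 0       # bytes written to the open file so far
        self.closed_files = []   # completed files, assumed readable

    def write(self, payload):
        data = payload.encode("utf-8") + b"\n"
        self.open_chunks.append(data)
        self.open_size += len(data)
        # Close the current file once the threshold is reached.
        if self.open_size >= self.rollover_bytes:
            self._roll()

    def _roll(self):
        self.closed_files.append(b"".join(self.open_chunks))
        self.open_chunks = []
        self.open_size = 0


sink = RollingSink(rollover_bytes=10)
sink.write("hello")   # 6 bytes: below the threshold, file stays open
sink.write("world")   # 12 bytes in total: threshold reached, file is closed
```

With a large threshold, the same two writes would leave everything in the open file and nothing in `closed_files`, which is the situation a small rollover value avoids in a demo.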
Spring XD - Distributed Runtime Container State
Spring XD - Analytics
Counters and Gauges: Simple & Field Value Counter (how many tweets for #java?); Aggregate Counter (how many tweets for #java in the week/day/hour?); Gauge & Rich Gauge (how many requests per minute?); abstract API implemented in Redis and in-memory.
Predictive Model Evaluation: JPMML (Is this transaction fraudulent? What group does this user belong to?); interoperable with R, Rattle, KNIME, RapidMiner, MADLib.
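The counter types on this slide can be sketched in a few lines of Python. This is an in-memory illustration only, not the Spring XD analytics API: the class names, the `hashtags` and `at` field names, and the sample tweets are all made up for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime


class FieldValueCounter:
    """Counts occurrences of each distinct value of one field (hypothetical)."""

    def __init__(self, field):
        self.field = field
        self.counts = Counter()

    def process(self, message):
        values = message.get(self.field, [])
        if isinstance(values, str):
            values = [values]
        self.counts.update(values)


class AggregateCounter:
    """Keeps a running total plus per-hour and per-day buckets (hypothetical)."""

    def __init__(self):
        self.total = 0
        self.by_hour = defaultdict(int)
        self.by_day = defaultdict(int)

    def increment(self, when, amount=1):
        self.total += amount
        self.by_hour[when.strftime("%Y-%m-%d %H")] += amount
        self.by_day[when.strftime("%Y-%m-%d")] += amount


# Made-up sample data for illustration.
tweets = [
    {"hashtags": ["java", "spring"], "at": datetime(2015, 3, 2, 9, 15)},
    {"hashtags": ["java"], "at": datetime(2015, 3, 2, 9, 40)},
    {"hashtags": ["hadoop"], "at": datetime(2015, 3, 2, 10, 5)},
]

hashtag_counter = FieldValueCounter("hashtags")
java_per_period = AggregateCounter()
for tweet in tweets:
    hashtag_counter.process(tweet)
    if "java" in tweet["hashtags"]:
        java_per_period.increment(tweet["at"])
```

The field value counter answers "how many tweets for #java" through `hashtag_counter.counts["java"]`, while the aggregate counter answers the same question per hour or per day through its buckets.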
[Architecture diagram. Labels: FILES, MOBILE, SENSORS, SOCIAL; Spring XD; GemFire XD; SPEED LAYER, BATCH LAYER, SERVING LAYER; PCF - BOSH Service; PCF - Apps]
Unified runtime for both real-time and batch use cases; scalable, distributed, and fault-tolerant runtime; increased productivity through out-of-the-box components; closed-loop analytics through online (stream) and offline (batch) data; Swiss-army knife of data movement and data pipelines; repeatable 'turnkey' solution for next-generation data-centric use cases.
Agility: Easy to Set Up and Run. Writing HTTP Data to HDFS …that simple!
Spring XD on YARN Spring XD Running on YARN!
Even easier with PCF
Natural Fit: Reactive Streaming Pipelines
Deployment Manifest - Module Count
Stream: http | doWork | hdfs
Deployed instances: 2 x http, 4 x doWork, 3 x hdfs
stream deploy --name s1 --properties "module.http.count=2,module.doWork.count=4,module.hdfs.count=3"
Deployment Manifest - Module Placement
Stream: http | doWork | hdfs
Deployed instances: 2 x http, 4 x doWork, 3 x hdfs
stream deploy --name s1 --properties "module.http.count=2,module.doWork.count=4,module.hdfs.count=3,module.http.criteria=groups.contains('WEB')"
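The effect of a count plus a group criterion can be sketched as a small placement function in Python. This is a toy model under stated assumptions: the container names, their group memberships, and the round-robin assignment are invented for illustration, and the slides do not describe how the real runtime spreads instances across containers.

```python
def place_instances(containers, count, required_group=None):
    """Pick containers for `count` instances of one module, round-robin.

    containers: dict mapping container name -> set of group names.
    required_group: only containers in this group are eligible
                    (None means every container is eligible).
    Hypothetical helper; not a Spring XD API.
    """
    eligible = sorted(
        name for name, groups in containers.items()
        if required_group is None or required_group in groups
    )
    if not eligible:
        return []
    return [eligible[i % len(eligible)] for i in range(count)]


# Made-up cluster: three containers with assumed group memberships.
containers = {
    "c1": {"WEB"},
    "c2": {"WEB", "HDFS"},
    "c3": {"HDFS"},
}

# Analogue of module.http.count=2 with criteria groups.contains('WEB').
http_placement = place_instances(containers, count=2, required_group="WEB")
# Analogue of module.doWork.count=4 with no criteria.
dowork_placement = place_instances(containers, count=4)
```

In this sketch the http instances can only land on `c1` and `c2`, because `c3` is not in the WEB group, while the doWork instances may use any container.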
Deployment Manifest - Data Partitioning
Stream: http | doWork | hdfs
Deployed instances: 2 x http, 4 x doWork, 3 x hdfs
stream deploy --name s1 --properties ... module.http.producer.partitionKeyExpression=payload.customerId
Each doWork module instance will always process the same set of customer IDs.
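Key-based partitioning of this kind is commonly done by hashing the key and taking it modulo the number of consumers. The Python sketch below assumes that scheme; the slides do not confirm which hash Spring XD uses, and the function name, the CRC32 choice, and the sample messages are illustrative only.

```python
import zlib


def select_partition(partition_key, partition_count):
    """Map a key to a partition index in [0, partition_count).

    Hypothetical helper. crc32 is used because it is stable across runs,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(partition_key.encode("utf-8")) % partition_count


# Made-up sample payloads for illustration.
messages = [
    {"customerId": "alice", "amount": 10},
    {"customerId": "bob", "amount": 5},
    {"customerId": "alice", "amount": 7},
]

DOWORK_INSTANCES = 4  # matches module.doWork.count=4 on the earlier slides
routed = [
    (select_partition(m["customerId"], DOWORK_INSTANCES), m) for m in messages
]
```

Both messages for customer `alice` get the same partition index, so the same doWork instance would see every message for that customer.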
Learn More
Project: http://projects.spring.io/spring-xd/
GitHub: https://github.com/spring-projects/spring-xd/
Wiki: https://github.com/spring-projects/spring-xd/wiki
Samples: https://github.com/spring-projects/spring-xd-samples