What is the Present State of the Art Of In-Memory Analytics?

Slide 0

What is the Present State of the Art Of In-Memory Analytics? Timo Elliott, Innovation Evangelist timoelliott.com

Slide 1

Disclaimer: “I think you’ll find it’s a bit more complicated than that.”

Slide 2

A Bit of History

Slide 3

LEO: Lyons Electronic Office, 1951. Sixty-four 5ft-long mercury tubes, each weighing half a ton, were used to provide a massive 8.75 KB of memory (i.e. one hundred-thousandth of today’s entry-level iPhone).

Slide 4

1980s: the first in-memory BI tools. Usefulness was limited by the high cost of memory and by 16-bit memory addressing (640 KB maximum memory).

Slide 5

1995: Windows 95 & 32-bit Architectures. QlikView, TimesTen, and others took advantage of the new 32-bit memory addressing to provide in-memory analytics.

Slide 6

Complex Event Processing: extracting insight from events. Sensor readings arrive at tens of thousands per second, and a single isolated event contains virtually no useful information. Value comes from the event history over a window (e.g. 30 minutes), for example comparing the variance of trends across multiple sensors against historical norms and raising an alert.

Slide 7

Complex Event Processing: continuous queries. Traditional BI asks, “How many fraudulent credit card transactions occurred last week in Madrid?” Complex event processing runs a continuous query: “When three credit card authorizations for the same card occur in any five-second window, deny the requests and check for fraud.”
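To make the continuous-query pattern concrete, here is a minimal Python sketch of the five-second sliding window described above; the event source, thresholds, and function names are illustrative assumptions, not part of any particular CEP product.

```python
# A minimal sketch of the continuous query above, over a simple in-memory
# event stream; real CEP engines express this declaratively.
from collections import defaultdict, deque
import time

WINDOW_SECONDS = 5
MAX_AUTHS_PER_WINDOW = 3

# card_id -> deque of recent authorization timestamps
recent_auths = defaultdict(deque)

def on_authorization(card_id, timestamp):
    """Return 'deny' if three authorizations for the same card fall
    within any five-second window, otherwise 'approve'."""
    window = recent_auths[card_id]
    window.append(timestamp)
    # Drop events that have slid out of the window
    while window and timestamp - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_AUTHS_PER_WINDOW:
        return "deny"          # flag for fraud review
    return "approve"

# Example: three rapid authorizations for the same card trigger a denial
now = time.time()
print(on_authorization("card-42", now))        # approve
print(on_authorization("card-42", now + 1))    # approve
print(on_authorization("card-42", now + 2))    # deny
```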

Slide 8

In-Memory and the Internet of Things: input streams (sensors, messages, transactions, market data, clicks, …) flow through adapters into a CEP engine and studio, producing alerts, dashboards, and application feeds.

Slide 9

“Traditional” Business Intelligence: slow, painful, and expensive, with data copied through ETL into separate systems.

Slide 10

It’s Like An Onion… The more layers there are, the more it makes you cry…

Slide 11

What Was The Problem? Slow disks and CPUs, I/O bottlenecks, expensive memory, databases optimized for transactions with BI as an afterthought, and 30-year-old database design principles.

Slide 12

Why Talk About In-Memory?

Slide 13

Analysts Recommend In-Memory: “An in-memory data platform offers more than performance benefits” “Recommendations: Invest in an in-memory data platform to gain competitive edge” “In-Memory Database Is Gaining Momentum Across All Use Cases” “In-Memory Delivers Extreme Performance And Scalability” “In-Memory Data Platform Is No Longer An Option — It’s A Necessity!”

Slide 14

Companies Like Yours Are Implementing In-Memory: 32% run in-memory databases at their location today, and 75% expect to expand their in-memory use in the next 3 years. Source: 2014 DBTA survey of IT and data managers.

Slide 15

Database vendors are investing in in-memory The Forrester Wave: In-Memory Database Platforms, Q3 ‘15

Slide 16

All Analytics Vendors Now Support In-Memory To Some Extent. Oracle Database In-Memory Option: “The Oracle Database In-Memory option dramatically accelerates the performance of analytic queries by storing data in a highly optimized columnar in-memory format.” Microsoft SQL Server In-Memory OLTP: “When data lives totally in memory, we can use much, much simpler data structures. When a table is declared memory-optimized, all of its records live in memory.” DB2 with BLU Acceleration: “IBM DB2 with BLU Acceleration speeds analytics and reporting using dynamic in-memory columnar technologies. In-memory columnar technologies provide an extremely efficient way to scan and find relevant data.” Qlik: “In-memory indexing automatically builds and maintains all data relationships from multiple sources for unrestricted exploration.” SAP HANA: “A good example of a modern in-memory database technology is SAP’s HANA platform.” Teradata: “Teradata uses a hybrid approach to in-memory that intelligently puts the right data in memory to deliver high-speed in-memory performance at a fraction of the cost of putting all data in memory.” Tableau: “The Data Engine is a high-performing analytics database on your PC. It has the speed benefits of traditional in-memory solutions without the limitations that your data must fit in memory.” Spark: “Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.”

Slide 17

What Is In-Memory? And why now?

Slide 18

What Is In-Memory? Data access times of various storage types relative to RAM (logarithmic scale): RAM is 300,000 times faster than hard disks, and a CPU register is 61 million times faster than hard disks.

Slide 19

In-Memory Databases vs. Caching “Much of the work that is done by a conventional, disk-optimized RDBMS is done under the assumption that data primarily resides on disk. Even when a disk-based RDBMS has been configured to hold all of its data in main memory, its performance is hobbled by assumptions of disk-based data residency. When the assumption of disk-residency is removed, complexity is dramatically reduced.” - Oracle TimesTen Overview

Slide 20

In-Memory Computing Costs Have Plummeted. Cost of 1 MB of memory in 2000: about $1 (shown at the scale of the 190m Turning Torso tower).

Slide 21

In-Memory Computing Costs Have Plummeted. Cost of 1 MB of memory today: a fraction of a cent, and shrinking, and shrinking, and shrinking… (shown at the scale of a 75cm IKEA MICKE desk, 399 kr).

Slide 22

Prices Continue to Slide DRAM production costs drop by 30% every 12 months
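As a rough worked example of that compounding (an illustration, not a figure from the deck), a 30% annual drop leaves roughly a sixth of today's cost after five years:

```python
# Quick compounding check of the 30%-per-year figure above: what does a unit of
# DRAM capacity cost after n years if production costs keep falling 30% annually?
ANNUAL_DROP = 0.30

cost = 1.0  # normalized cost today
for year in range(1, 6):
    cost *= (1 - ANNUAL_DROP)
    print(f"after {year} year(s): {cost:.2f} of today's cost")

# After 5 years: roughly 0.17, i.e. about a sixth of today's cost
```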

Slide 23

In-Memory Computing: up to 1,000x faster, with no optimizations required.

Slide 24

Row vs. Column Databases: my filing system (row-based) vs. my wife’s filing system (column-based).
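To make the filing-system analogy concrete, here is a minimal Python sketch of the same three records laid out row-wise and column-wise; the table and field names are illustrative assumptions, not from the deck.

```python
# The same table stored row-wise and column-wise (illustrative only)
rows = [
    {"id": 1, "region": "EMEA", "amount": 120.0},
    {"id": 2, "region": "APJ",  "amount": 200.0},
    {"id": 3, "region": "EMEA", "amount": 80.0},
]

columns = {
    "id":     [1, 2, 3],
    "region": ["EMEA", "APJ", "EMEA"],
    "amount": [120.0, 200.0, 80.0],
}

# Row store: fetching one whole record is natural (good for transactions)
record = rows[1]

# Column store: aggregating one attribute scans a single contiguous array
# (good for analytics) instead of touching every field of every record
total_amount = sum(columns["amount"])

print(record, total_amount)
```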

Slide 25

Column Databases: up to 1,000x faster, and more data in less space.
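One reason column stores fit more data into less space is that values within a column repeat and compress well. A minimal sketch of dictionary plus run-length encoding follows, using assumed example data rather than any particular product's storage format.

```python
# Dictionary encoding plus run-length encoding of one column (illustrative)
from itertools import groupby

region_column = ["EMEA", "EMEA", "EMEA", "APJ", "APJ", "EMEA"]

# Dictionary encoding: store each distinct value once, keep small integer codes
dictionary = sorted(set(region_column))                 # ['APJ', 'EMEA']
codes = [dictionary.index(v) for v in region_column]    # [1, 1, 1, 0, 0, 1]

# Run-length encoding: collapse consecutive repeats into (code, run_length) pairs
rle = [(code, len(list(run))) for code, run in groupby(codes)]

print(dictionary, rle)   # ['APJ', 'EMEA'] [(1, 3), (0, 2), (1, 1)]
```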

Slide 26

Massively Parallel Systems, e.g. Netezza (technology now part of IBM PureSystems) and Greenplum (now part of EMC).

Slide 27

Column Stores, Compression, and Parallel Processing E.g. DB2 with BLU acceleration

Slide 28

“In-Chip” Processing, e.g. SiSense: vector-based instructions, cache-optimized decompression, and close collaboration between in-memory software vendors and chip developers (e.g. SAP & Intel Haswell).
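A rough illustration of why vectorized, cache-friendly processing pays off, assuming NumPy is available as a stand-in for SIMD-style column scans; the exact timings are machine-dependent.

```python
# The same column scan twice: an interpreted scalar loop vs. a vectorized
# operation over a contiguous in-memory array (which the CPU can run with
# SIMD instructions).
import time
import numpy as np

values = np.random.randint(0, 100, size=1_000_000)

start = time.perf_counter()
total_loop = sum(int(v) for v in values)   # scalar: one value at a time
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
total_vector = int(values.sum())           # vectorized column scan
vector_seconds = time.perf_counter() - start

assert total_loop == total_vector
print(f"loop: {loop_seconds:.4f}s  vectorized: {vector_seconds:.4f}s")
```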

Slide 29

Massively Parallel Hardware: up to 1,000x faster, with queries optimized for the hardware.

Slide 30

In-Database Processing E.g. SAS & Teradata

Slide 31

Move Processing to the Data: a single platform serving both operational (OLTP) and analytic (OLAP) workloads, with processing engines (planning, predictive, text search, spatial, ETL, data quality) running directly on the data stores (relational row-based and columnar stores, plus document, object, and graph stores).

Slide 32

In-Database Analytics: up to 1,000x faster; push processing down to the dedicated hardware, with less data traffic.
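A minimal sketch of the push-down idea, using SQLite purely as a stand-in for an analytic platform: the aggregation runs inside the database, so only the small result set moves to the application.

```python
# Aggregate inside the database instead of copying every row out
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 120.0), ("EMEA", 80.0), ("APJ", 200.0)],
)

# Anti-pattern: pull every row across the wire, then aggregate in the client
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
totals = {}
for region, amount in rows:
    totals[region] = totals.get(region, 0.0) + amount

# Push-down: the database does the aggregation, only the result moves
pushed_down = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

assert totals == pushed_down
print(pushed_down)   # {'APJ': 200.0, 'EMEA': 200.0}
```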

Slide 33

Real-Time Data: real-time replication instead of an ETL copy. Why have a separate operational data store at all?

Slide 34

Transactions: ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantees that database transactions are processed reliably.
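A minimal sketch of ACID behaviour in practice, using SQLite as an assumed example engine: the transfer either commits as a whole or rolls back as a whole.

```python
# Atomic money transfer: both updates commit together, or neither does
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100.0), ("bob", 0.0)])
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
        # Business rule: no negative balances; violating it aborts the transfer
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = 'alice'").fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
except ValueError:
    pass

# Atomicity: the failed transfer left both balances unchanged
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# [('alice', 100.0), ('bob', 0.0)]
```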

Slide 35

In-Memory Enterprise Applications E.g. Microsoft SQL Server In-Memory OLTP

Slide 36

In-Memory Enterprise Applications E.g. SAP S/4 HANA

Slide 37

Hybrid Transactional/Analytical Processing: use a single platform for both analytics and applications, without copies.

Slide 38

Virtuous Circle of Technology: in-memory, columnar databases, hardware acceleration, and the calculation engine reinforce each other. Columnar storage increases the amount of data that can be stored in limited memory (compared to disk); column databases enable easier parallelization of queries; in-memory processing gives more time for the relatively slow updates to column data; in-memory allows sophisticated calculations in real time; and hardware acceleration makes those sophisticated calculations possible. Each technology works well on its own, but combining them all is the real opportunity — provides all of the upside benefits while mitigating the downsides.

Slide 39

Apache Spark: where Hadoop V1 chains MapReduce jobs through HDFS, writing intermediate results back to disk, Spark keeps transformations such as map(), join(), and cache() in memory across multiple data sources.
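A minimal PySpark sketch of the operations named in the diagram (map(), join(), cache()), assuming a local Spark installation; the sample data and names are illustrative. Intermediate results stay cached in memory rather than being written back to disk between steps as in Hadoop V1 MapReduce.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
sc = spark.sparkContext

orders = sc.parallelize([("c1", 100.0), ("c2", 50.0), ("c1", 25.0)])
customers = sc.parallelize([("c1", "EMEA"), ("c2", "APJ")])

# map(): reshape each record; cache(): keep the result in memory for reuse
order_totals = orders.map(lambda kv: (kv[0], kv[1] * 1.2)).cache()

# join(): combine the two in-memory datasets by key
by_region = order_totals.join(customers)

print(by_region.collect())
# e.g. [('c1', (120.0, 'EMEA')), ('c1', (30.0, 'EMEA')), ('c2', (60.0, 'APJ'))]

spark.stop()
```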

Slide 40

Lots of Support for Spark

Slide 41

SAP HANA Vora: the SAP HANA in-memory platform connects to Spark and Vora running on YARN and HDFS. A HANA-Spark adapter improves performance between the distributed systems (via HANA Smart Data Access, UDFs, and others); compiled queries enable applications and data analysis to work more efficiently across nodes; drill-downs provide a familiar OLAP experience on Hadoop to derive business insights from big data, such as drilling down into HDFS data; and Spark extensions with extensive programming support for Scala, Python, C, C++, R, and Java allow data scientists and developers who prefer Spark R or Spark ML to use their tool of choice and mash up corporate data with Hadoop/Spark data easily, optionally leveraging HANA’s multiple data processing engines to develop new insights from business and contextual data.

Slide 42

Persistence & Failover

Slide 43

Next-Generation Chips Are On Their Way: NVM (non-volatile memory).

Slide 44

Scale Up: directly addressable memory grows from 64 kilobytes with 16-bit addressing, to 4 gigabytes with 32-bit addressing, to 16 exabytes with 64-bit addressing (4,294,967,296 times the 32-bit limit).
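The addressable-memory figures follow directly from the address width (2 to the power of the number of address bits); a quick check in Python:

```python
# Back-of-the-envelope check of directly addressable memory per address width
UNITS = {"KB": 2**10, "GB": 2**30, "EB": 2**60}

for bits, unit in [(16, "KB"), (32, "GB"), (64, "EB")]:
    addressable_bytes = 2 ** bits
    print(f"{bits}-bit: {addressable_bytes / UNITS[unit]:.0f} {unit}")

# 16-bit: 64 KB
# 32-bit: 4 GB
# 64-bit: 16 EB
```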

Slide 45

What About Scale? There are now systems with more than half a petabyte of in-memory capacity, and growing…

Slide 46

Balancing Data Temperature and Costs: hot data is accessed frequently, warm data is not accessed frequently, and cold data is only accessed sporadically. The tradeoff is between volume of data and performance (and direct cost), and many different solutions are possible.
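A purely illustrative sketch of a temperature-based placement policy; the tier thresholds and target storage in the comments are assumptions, not recommendations from the presentation.

```python
# Classify records as hot, warm, or cold by how recently they were accessed
from datetime import datetime, timedelta
from typing import Optional

def data_temperature(last_accessed: datetime, now: Optional[datetime] = None) -> str:
    """Return 'hot', 'warm', or 'cold' based on the record's last access time."""
    now = now or datetime.now()
    age = now - last_accessed
    if age <= timedelta(days=7):
        return "hot"    # keep in memory
    if age <= timedelta(days=90):
        return "warm"   # e.g. columnar storage on fast disk or flash
    return "cold"       # e.g. cheap archival storage

print(data_temperature(datetime.now() - timedelta(days=2)))    # hot
print(data_temperature(datetime.now() - timedelta(days=30)))   # warm
print(data_temperature(datetime.now() - timedelta(days=365)))  # cold
```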

Slide 47

What Type of In-Memory Is the Right One? The ROI calculations are complex: data volumes, relative costs, cost of storage, value of speed, and value of agility all factor in.

Slide 48

Fast-Moving Market

Slide 49

Hybrid vs. Pure In-Memory Tradeoffs: data duplication vs. a single source, replicated vs. real-time data, and unpredictable vs. consistent response times.

Slide 50

Top Benefits

Slide 51

Speed “If things seem under control, you’re just not going fast enough.” Mario Andretti

Slide 52

Real-Time Operations Instead of analyzing the shards of glass after the accident, what if you could catch the vase BEFORE it hit the ground?

Slide 53

Agility (Speed of Change)

Slide 54

Simplification = Lower Costs “In-memory changes the cost equation through simplification. It can help save costs on hardware and software, as well as reduce labor required for administration and development needs. Based on a composite cost model, an in-memory platform can save an organization 37% across hardware, software, and labor costs, depending on various factors.”

Slide 55

Lower Costs “Don’t let somebody say to you we can’t go in-memory because it’s so much more money. Acquisition costs may be higher. If you calculate out a TCO, it’s going to be less.” Donald Feinberg, Gartner

Slide 56

The price of light… …is less than the cost of darkness ROI = Return On Ignorance?

Slide 57

New, Simpler Infrastructures and Business Models Weissbeerger Beverage Analytics

Slide 58


Slide 59

Myths & Facts. Myths: it’s a niche technology to run analytics faster; it’s new and unproven; there is only a small number of in-memory vendors; the main users of in-memory analytics are SMBs; it’s only for deep-pocketed organizations. Facts: it has been around since the late 1990s; more than 50 software vendors deliver in-memory technology; and entire industries (SaaS, social networks, financial trading, online gaming) would not exist as we know them today without in-memory computing.

Slide 60

Business Impact of In-Memory Computing (opportunities to run, grow, and transform the business): reducing application running costs via database and legacy application offloading; improving transactional application performance; enabling horizontal, elastic scalability (scale up/down); boosting response time in analytical applications; low-latency (<1 microsecond) application messaging; dramatically shortening batch process execution time; enabling real-time, "self-service" business intelligence and unconstrained data exploration; detecting correlations and patterns across millions of events in "the blink of an eye"; supporting "big data" (big data needs big memory); and running transactional and analytical applications on the same physical dataset.

Slide 61

In-Memory Changes Everything “In-memory computing will have a long-term, disruptive impact by radically changing users’ expectations, application design principles, products’ architecture and vendors’ strategy.” — Gartner

Slide 62

Thank you!