What Is the Present State of the Art of In-Memory Analytics?

Slide 0

What Is the Present State of the Art of In-Memory Analytics?
Timo Elliott, Innovation Evangelist, timoelliott.com

Slide 1

Disclaimer: “I think you’ll find it’s a bit more complicated than that.”

Slide 2

A Bit of History

Slide 3

LEO: Lyons Electronic Office, 1951. Sixty-four 5-ft-long mercury tubes, each weighing half a ton, were used to provide a massive 8.75 Kb of memory (i.e. one hundred-thousandth of today’s entry-level iPhone).

Slide 4

1980s: First in-memory BI tools. Their usefulness was limited by the high cost of memory and the limits of 16-bit memory addressing (640 KB maximum memory).

Slide 5

1995: Windows 95 and 32-bit Architectures. QlikView, TimesTen, and others take advantage of new 32-bit memory addressing to provide in-memory analytics.

Slide 6

Complex Event Processing: Extracting Insight from Events. Sensor readings arrive at tens of thousands per second, and there is virtually no useful information in a single isolated event. Insight comes from comparing the events in an event window (e.g. 30 minutes) against history, e.g. comparing the variance of trends across multiple sensors against historical norms, and raising an alert.
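A minimal Python sketch of the windowed comparison on this slide, simplified to one sensor at a time. The class name `WindowedVarianceMonitor`, the per-sensor historical variances, and the alert factor of 3 are hypothetical choices for illustration, not part of any CEP product:

```python
from collections import deque
from statistics import pvariance


class WindowedVarianceMonitor:
    """Hypothetical sketch: alert when a sensor's variance inside a sliding
    time window exceeds `factor` times its (assumed) historical variance."""

    def __init__(self, window_seconds, historical_variance, factor=3.0):
        self.window_seconds = window_seconds            # e.g. 30 min = 1800 s
        self.historical_variance = historical_variance  # sensor_id -> assumed norm
        self.factor = factor                            # alert threshold (assumption)
        self.windows = {}                               # sensor_id -> deque of (ts, value)

    def on_event(self, sensor_id, ts, value):
        window = self.windows.setdefault(sensor_id, deque())
        window.append((ts, value))
        # Evict readings that have fallen out of the time window.
        while window and window[0][0] <= ts - self.window_seconds:
            window.popleft()
        if len(window) < 2:
            return None  # a single isolated reading says almost nothing
        current = pvariance([v for _, v in window])
        norm = self.historical_variance[sensor_id]
        if current > self.factor * norm:
            return f"ALERT {sensor_id}: window variance {current:.2f} vs. norm {norm:.2f}"
        return None


monitor = WindowedVarianceMonitor(window_seconds=1800, historical_variance={"s1": 1.0})
for ts, value in [(0, 10.0), (60, 10.5), (120, 20.0)]:
    alert = monitor.on_event("s1", ts, value)
    if alert:
        print(alert)
```

Each reading is handled as it arrives, and old readings drop out of the window, so memory use is bounded by the window length rather than by the full history.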

Slide 7

Complex Event Processing: Continuous Queries. Traditional BI: “How many fraudulent credit card transactions occurred last week in Madrid?” Complex Event Processing: “When three credit card authorizations for the same card occur in any five-second window, deny the requests and check for fraud.” (Diagram: events along a time axis.)
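The continuous query on this slide can be sketched in Python as a rule evaluated on every incoming authorization, rather than once over stored data as in the traditional BI question. The `FraudRule` class and its return strings are hypothetical; a real CEP engine would express the rule in its own query language:

```python
from collections import defaultdict, deque


class FraudRule:
    """Hypothetical continuous query: 3 authorizations for the same card
    within any 5-second window -> deny and check for fraud."""

    def __init__(self, max_auths=3, window_seconds=5.0):
        self.max_auths = max_auths
        self.window_seconds = window_seconds
        self.recent = defaultdict(deque)  # card_id -> timestamps still in the window

    def on_authorization(self, card_id, ts):
        times = self.recent[card_id]
        times.append(ts)
        # Drop authorizations older than the window that ends at this event.
        while times and times[0] <= ts - self.window_seconds:
            times.popleft()
        if len(times) >= self.max_auths:
            return "DENY_AND_CHECK_FRAUD"
        return "APPROVE"


rule = FraudRule()
events = [("card-1", 0.0), ("card-1", 1.2), ("card-2", 1.5), ("card-1", 3.9)]
for card, ts in events:
    print(card, ts, rule.on_authorization(card, ts))
```

Because only the last few seconds of authorizations per card are kept, the rule can run entirely in memory no matter how long the stream runs.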

Slide 8

In-Memory and the Internet of Things. Diagram: input streams (sensors, messages, transactions, market data, clicks, …) pass through adapters into a CEP engine and studio, which feed alerts, dashboards, and applications.

Slide 9

“Traditional” Business Intelligence (diagram: Copy, ETL). Slow, painful, expensive.

Slide 10

It’s Like An Onion… The more layers there are, the more it makes you cry…

Slide 11

What Was the Problem? Slow disks and CPUs; the I/O bottleneck; expensive memory; databases optimized for transactions, with BI as an afterthought; 30-year-old database design principles.

Slide 12

Why Talk About In-Memory?

Slide 13

Analysts Recommend In-Memory: “An in-memory data platform offers more than performance benefits” “Recommendations: Invest in an in-memory data platform to gain competitive edge” “In-Memory Database Is Gaining Momentum Across All Use Cases” “In-Memory Delivers Extreme Performance And Scalability” “In-Memory Data Platform Is No Longer An Option — It’s A Necessity!”

Slide 14

Companies Like Yours Are Implementing In-Memory: 32% run in-memory databases at their location today, and 75% expect to expand their in-memory use in the next three years (source: 2014 DBTA survey of IT and data managers). Charts: Top Uses; Top Benefits.

Slide 15

Database Vendors Are Investing in In-Memory. Figure: The Forrester Wave: In-Memory Database Platforms, Q3 2015.

Slide 16

All Analytics Vendors Now Support In-Memory to Some Extent
Oracle Database In-Memory Option: “The Oracle Database In-Memory option dramatically accelerates the performance of analytic queries by storing data in a highly optimized columnar in-memory format.”
Microsoft SQL Server In-Memory OLTP: “When data lives totally in memory, we can use much, much simpler data structures. When a table is declared memory-optimized, all of its records live in memory.”
DB2 with BLU Acceleration: “IBM DB2 with BLU Acceleration speeds analytics and reporting using dynamic in-memory columnar technologies. In-memory columnar technologies provide an extremely efficient way to scan and find relevant data.”
Qlik: “In-memory indexing automatically builds and maintains all data relationships from multiple sources for unrestricted exploration”
SAP HANA: “A good example of a modern in-memory database technology is SAP's HANA platform.”
Teradata: “Teradata uses a hybrid approach to in-memory that intelligently puts the right data in memory to deliver high-speed in-memory performance at a fraction of the cost of putting all data in memory.”
Tableau: “The Data Engine is a high-performing analytics database on your PC. It has the speed benefits of traditional in-memory solutions without the limitations that your data must fit in memory.”
Spark: “Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.”

Slide 17

What Is In-Memory? And why now?

Slide 18

What Is In-Memory? Chart: data access times of various storage types relative to RAM (logarithmic scale). RAM is 300,000 times faster than hard disks; a CPU register is 61 million times faster than hard disks.
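Taking the slide's two ratios at face value, a quick calculation gives the implied gap between a CPU register and RAM. This is only arithmetic on the slide's own figures; the constant names are illustrative:

```python
# Speed-ups relative to a hard disk, as quoted on the slide.
RAM_VS_DISK = 300_000
REGISTER_VS_DISK = 61_000_000

# If RAM is x times faster than disk and a register is y times faster,
# the register is y / x times faster than RAM.
register_vs_ram = REGISTER_VS_DISK / RAM_VS_DISK
print(f"CPU register vs. RAM: about {register_vs_ram:.0f}x faster")
```

So even after data moves from disk into RAM, roughly two orders of magnitude remain between RAM and the CPU's registers.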

Slide 19

In-Memory Databases vs. Caching: “Much of the work that is done by a conventional, disk-optimized RDBMS is done under the assumption that data primarily resides on disk. Even when a disk-based RDBMS has been configured to hold all of its data in main memory, its performance is hobbled by assumptions of disk-based data residency. When the assumption of disk-residency is removed, complexity is dramatically reduced.” (Oracle TimesTen Overview)

Slide 20

In-Memory Computing Costs Have Plummeted. Cost of 1 MB of memory in 2000: about $1 (illustrated by the 190 m Turning Torso tower).

Slide 21

In-Memory Computing Costs Have Plummeted. Cost of 1 MB of memory today: a fraction of a cent (illustrated by a 75 cm IKEA MICKE desk, 399 kr). And shrinking, and shrinking, and shrinking…

Slide 22

Prices Continue to Slide: DRAM production costs drop by 30% every 12 months.
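Assuming the 30% yearly decline holds steadily (a simplification; real prices fluctuate), compounding shows how fast memory cost falls. The helper names `projected_cost` and `halving_time` and the $1.00 starting price are illustrative:

```python
import math


def projected_cost(initial_cost, annual_decline, years):
    """Cost after `years` if it falls by `annual_decline` (0.30 = 30%) every year."""
    return initial_cost * (1.0 - annual_decline) ** years


def halving_time(annual_decline):
    """Years until the cost halves at a constant annual decline rate."""
    return math.log(0.5) / math.log(1.0 - annual_decline)


# Starting from an assumed $1.00 per unit of memory.
for years in (1, 5, 10, 15):
    print(f"after {years:2d} years: ${projected_cost(1.00, 0.30, years):.4f}")
print(f"cost halves roughly every {halving_time(0.30):.1f} years")
```

At a steady 30% per year, cost halves about every 1.9 years and falls to well under 1% of its starting value within 15 years.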

Slide 23

In-Memory Computing (diagram: Copy, ETL). Up to 1,000x faster; no optimizations required.

Slide 24

Row vs. Column Databases: “my filing system” (row-based) vs. “my wife’s filing system” (column-based).
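The filing-system analogy maps directly onto data layout. A toy Python sketch with made-up sales data, storing the same table row-wise and column-wise:

```python
# Illustrative data: (name, city, amount) records.
rows = [
    ("alice", "Madrid", 120.0),
    ("bob", "Paris", 80.0),
    ("carol", "Madrid", 45.5),
]

# Row-based layout: each record's fields sit together (list of tuples).
# Column-based layout: each column's values sit together (one list per column).
names, cities, amounts = (list(col) for col in zip(*rows))
columns = {"name": names, "city": cities, "amount": amounts}

# SUM(amount): the row layout visits every whole record to pick out one field;
# the column layout scans a single list and ignores the other columns.
total_from_rows = sum(record[2] for record in rows)
total_from_columns = sum(columns["amount"])
print(total_from_rows, total_from_columns)
```

The difference grows with table width: an analytic table may have hundreds of columns, yet a query like this needs only one of them.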

Slide 25

Column Databases (diagram: Copy, ETL). Up to 1,000x faster; more data in less space.
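“More data in less space” usually comes from compressing each column, since values within one column tend to repeat. Dictionary encoding and run-length encoding are two common column-compression techniques; which ones a given product uses varies, so treat this Python sketch as illustrative only:

```python
def dictionary_encode(values):
    """Replace each value with a small integer code; return (dictionary, codes)."""
    dictionary = sorted(set(values))
    index = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]


def run_length_encode(codes):
    """Collapse runs of identical codes into [code, run_length] pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return runs


city = ["Madrid", "Madrid", "Madrid", "Paris", "Paris", "Stockholm"]
dictionary, codes = dictionary_encode(city)
runs = run_length_encode(codes)
print(dictionary, codes, runs)
```

Six strings become three dictionary entries plus three short runs; on a sorted column with millions of rows and few distinct values, the savings are far larger.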

Slide 26

Massively Parallel Systems, e.g. Netezza technology, now part of IBM PureSystems, and Greenplum, now part of EMC.
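The core idea of a massively parallel (shared-nothing) system is to split data across nodes, aggregate locally, and merge the partial results. This Python sketch uses threads as stand-in “nodes” purely to show that structure; it is not how Netezza or Greenplum are implemented, and in standard CPython threads do not give real CPU parallelism for this kind of work:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_partitions):
    """Hash-distribute (key, amount) records so each key lives on one 'node'."""
    parts = [[] for _ in range(n_partitions)]
    for key, amount in records:
        parts[hash(key) % n_partitions].append((key, amount))
    return parts


def local_aggregate(part):
    """Each 'node' computes SUM(amount) GROUP BY key over its own slice only."""
    totals = Counter()
    for key, amount in part:
        totals[key] += amount
    return totals


def parallel_group_sum(records, n_partitions=4):
    parts = partition(records, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(local_aggregate, parts))
    # The coordinator merges the small partial results.
    result = Counter()
    for partial in partials:
        result.update(partial)
    return dict(result)


sales = [("Madrid", 10), ("Paris", 5), ("Madrid", 7), ("Oslo", 1)]
print(parallel_group_sum(sales))
```

Only the small per-node totals travel to the coordinator, which is what lets this pattern scale as nodes are added.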

Slide 27

Column Stores, Compression, and Parallel Processing, e.g. DB2 with BLU Acceleration.

Slide 28

“In-Chip” Processing, e.g. SiSense: vector-based instructions, cache-optimized decompression, and close collaboration between in-memory software vendors and chip developers (e.g. SAP and Intel Haswell).

Slide 29

Massively Parallel Hardware (diagram: Copy, ETL, Query). Up to 1,000x faster; optimized for hardware.

Slide 30

In-Database Processing, e.g. SAS and Teradata.

Slide 31

Move Processing to the Data. Processing engines: operational (OLTP), analytics (OLAP), planning, predictive, text search, spatial. Relational stores: row-based, columnar. Also: ETL, data quality, document store, object graph store.

Slide 32

In-Database Analytics (diagram: Copy, ETL, Query). Up to 1,000x faster; push processing down to dedicated hardware, with less traffic.
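Python's built-in `sqlite3` module with an in-memory database can illustrate the push-down idea on a small scale. SQLite is only a stand-in here, not SAS or Teradata, and the `sales` table is made up; the point is how many rows cross the boundary between database and client:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite database held entirely in RAM
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("south", 50.0), ("north", 25.0)],
)

# Pull model: ship every row to the client, then aggregate there.
pulled = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    pulled[region] = pulled.get(region, 0.0) + amount

# Push-down model: the database aggregates; one row per group comes back.
pushed = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall())
print(pulled, pushed)
```

Both produce the same totals, but the pull version transfers every row while the push-down version transfers one row per region; with millions of rows, that difference dominates.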

Slide 33

Real-Time Data (diagram: Copy, ETL). Real-time replication: why have a separate operational data store?

Slide 34

Transactions and ACID Compliance: ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.
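Atomicity, the “A” in ACID, can be demonstrated with an in-memory SQLite database in Python. The `accounts` table and the `transfer` function are hypothetical. Note that a `:memory:` SQLite database disappears when the connection closes, so this shows atomicity but not durability, which dedicated in-memory databases typically provide through logging and persistence to disk:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit together, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the credit to dst was rolled back as well


print(transfer(conn, "a", "b", 30))   # True
print(transfer(conn, "a", "b", 500))  # False: would make a's balance negative
print(dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id")))
```

The failed transfer credits `b` before the debit on `a` violates the constraint, yet the final balances show no trace of it: the whole transaction was undone.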

Slide 35

In-Memory Enterprise Applications, e.g. Microsoft SQL Server In-Memory OLTP.

Slide 36

In-Memory Enterprise Applications, e.g. SAP S/4HANA.

Slide 37

Hybrid Transactional/Analytical Processing (diagram: Copy). Use a single platform for both analytics and applications.

Slide 38

Virtuous Circle of Technology: in-memory, columnar databases, hardware acceleration, calculation engine.
Columnar storage increases the amount of data that can be stored in limited memory (compared to disk).
Column databases enable easier parallelization of queries.
In-memory processing gives more time for relatively slow updates to column data.
In-memory allows sophisticated calculations in real time.
Hardware acceleration makes sophisticated calculations possible.
Each technology works well on its own, but combining them all is the real opportunity: it provides all of the upside benefits while mitigating the downsides.

Slide 39

Apache Spark. Diagram comparing Hadoop V1 (MAP, Reduce, HDFS, MAP, Reduce stages) with Spark (data sources, e.g. Data Source 2, transformed with map(), join(), and cache()).

Slide 40

Lots of Support for Spark

Slide 41

SAP HANA Vora. Diagram: the SAP HANA in-memory platform connects through the HANA-Spark adapter (plus HANA Smart Data Access, UDFs, and others) to Vora and Spark nodes running on YARN and HDFS.
Spark Adapter: a HANA-Spark adapter for improved performance between distributed systems.
Compiled Queries: compiled queries enable applications and data analysis to work more efficiently across nodes.
Drill Downs: a familiar OLAP experience on Hadoop to derive business insights from big data, such as drilling down into HDFS data.
Spark Extensions: extensive programming support for Scala, Python, C, C++, R, and Java allows data scientists to use their tool of choice; data scientists and developers who prefer SparkR and Spark ML can easily mash up corporate data with Hadoop/Spark data; optionally, HANA’s multiple data processing engines can be used to develop new insights from business and contextual data.

Slide 42

Persistence & Failover

Slide 43

Next-Generation Chips Are on Their Way: NVM (non-volatile memory).

Slide 44

Scale Up: Directly Addressable Memory. 16-bit: 64 kilobytes; 32-bit: 4 gigabytes (65,536x more); 64-bit: 16 exabytes (4,294,967,296x more).
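Directly addressable memory follows from the address width: n bits can address 2^n bytes, assuming flat, byte-addressable memory. A small Python check of the figures (in practice, many current 64-bit CPUs implement fewer than 64 address bits):

```python
def addressable_bytes(address_bits):
    """Bytes reachable with a flat, byte-addressable address of this width."""
    return 2 ** address_bits


for bits in (16, 32, 64):
    print(f"{bits}-bit: {addressable_bytes(bits):,} bytes")

# 2**16 = 64 KiB, 2**32 = 4 GiB, 2**64 = 16 EiB (binary units).
growth_16_to_32 = addressable_bytes(32) // addressable_bytes(16)
growth_32_to_64 = addressable_bytes(64) // addressable_bytes(32)
print(f"16 -> 32 bit: {growth_16_to_32:,}x; 32 -> 64 bit: {growth_32_to_64:,}x")
```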

Slide 45

What About Scale? There are now systems with more than half a petabyte of in-memory capacity, and growing…

Slide 46

Balancing Data Temperature and Costs. Hot: data is accessed frequently. Warm: data is not accessed frequently. Cold: data is only accessed sporadically. Chart axes: volume of data; performance (and direct cost). Many different solutions are possible.
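One simple way to act on data temperature is to assign tiers by access frequency. The thresholds, table names, and tier placements in this Python sketch are made-up examples; real systems may also weigh data age, size, and business rules:

```python
def classify_temperature(accesses_per_day, hot_threshold=100, warm_threshold=1):
    """Map an access frequency to a data temperature (thresholds are made up)."""
    if accesses_per_day >= hot_threshold:
        return "hot"   # one possible placement: keep in memory
    if accesses_per_day >= warm_threshold:
        return "warm"  # e.g. a cheaper memory or fast-disk tier
    return "cold"      # e.g. low-cost bulk storage


# Hypothetical tables and access rates.
access_counts = {"orders_2015": 5_000, "orders_2012": 12, "orders_2005": 0.1}
tiers = {table: classify_temperature(n) for table, n in access_counts.items()}
print(tiers)
```

Moving the two thresholds is the cost/performance dial on the slide: lower them and more data sits in fast, expensive memory.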

Slide 47

What Type of In-Memory Is the Right One? Complex ROI calculations: data volumes, relative costs (?), cost of storage, value of speed, value of agility.

Slide 48

Fast-Moving Market

Slide 49

Hybrid vs. Pure In-Memory Trade-offs: data duplication vs. single source; replicated vs. real-time; unpredictable response times vs. consistent response times.

Slide 50

Top Benefits

Slide 51

Speed: “If things seem under control, you’re just not going fast enough.” (Mario Andretti)

Slide 52

Real-Time Operations: Instead of analyzing the shards of glass after the accident, what if you could catch the vase BEFORE it hit the ground?

Slide 53

Agility (Speed of Change)

Slide 54

Simplification = Lower Costs: “In-memory changes the cost equation through simplification. It can help save costs on hardware and software, as well as reduce labor required for administration and development needs. Based on a composite cost model, an in-memory platform can save an organization 37% across hardware, software, and labor costs, depending on various factors.”

Slide 55

Lower Costs: “Don’t let somebody say to you we can’t go in-memory because it’s so much more money. Acquisition costs may be higher. If you calculate out a TCO, it’s going to be less.” (Donald Feinberg, Gartner)

Slide 56

The price of light… …is less than the cost of darkness ROI = Return On Ignorance?

Slide 57

New, Simpler Infrastructures and Business Models, e.g. Weissbeerger Beverage Analytics.

Slide 58


Slide 59

Myths & Facts
Myth: It’s a niche technology to run analytics faster. Fact: Entire industries (SaaS, social networks, financial trading, online gaming) would not exist as we know them today without in-memory computing.
Myth: There is a small number of in-memory vendors. Fact: More than 50 software vendors deliver in-memory technology.
Myth: It’s only for deep-pocketed organizations. Fact: The main users of in-memory analytics are SMBs.
Myth: It’s new and unproven. Fact: It has been around since the late 1990s.

Slide 60

Business Impact of In-Memory Computing: Opportunities
Reducing application running costs via database/legacy application offloading
Improving transactional application performance
Enabling horizontal, elastic scalability (scale up/down)
Boosting response time in analytical applications
Low-latency (<1 microsecond) application messaging
Dramatically shortening batch process execution time
Enabling real-time, "self-service" business intelligence and unconstrained data exploration
Detecting correlations/patterns across millions of events in "the blink of an eye"
Supporting "big data" (big data needs big memory)
Running transactional and analytical applications on the same physical dataset
Run the business, grow the business, transform the business.

Slide 61

In-Memory Changes Everything “In-memory computing will have a long-term, disruptive impact by radically changing users’ expectations, application design principles, products’ architecture and vendors’ strategy.” — Gartner

Slide 62

Thank you!