
Moving Mountains of Data for Gaming






Slide 0

MOVING MOUNTAINS OF PLAYER DATA
SCALABLE INTERNET SERVICES, UCLA/UCSB - NOV 2015
SEAN MALONEY, RIOT GAMES
@SEAN_SEANNERY


Slide 1

WHO IS THIS GUY?
SEAN MALONEY, BIG DATA ENGINEER
- Lead developer on Riot’s ETL tools
- Intern at Appfolio
FUN FACT: Was a student in this class 4 years ago


Slide 2

MOVING MOUNTAINS OF DATA
1. INTRODUCTION
2. THE GAME PLATFORM: OUR MAIN DATA SOURCE
3. HOW WE INGEST AND QUERY DATA
4. HOW WE SCALE IN AWS
5. CONCLUSION - SEAN’S PRO TIPS


Slide 3

INTRODUCTION


Slide 4

WHAT IS LEAGUE OF LEGENDS?
- 2009 LAUNCH
- ONLINE MULTIPLAYER
- WINDOWS / OSX
- 40-50 MIN GAMES


Slide 5

- YOUR CHAMP
- THE TEAM
- THE BATTLE GROUND


Slide 6


Slide 7


Slide 8


Slide 9

THE GAME PLATFORM


Slide 10

THE CLIENT.


Slide 11


Slide 12

Load Balancers and Firewalls
CHAT / STORE / AUDIT


Slide 13

ORACLE COHERENCE (IN MEMORY DB)
- PRIMARY DB: CHAT, STORE, AUDIT, GAME, ETC.
- HOT BACKUP DB: CHAT, STORE, AUDIT, GAME, ETC.
- 2nd BACKUP DB / ETL: CHAT, STORE, AUDIT, GAME, ETC.


Slide 14

OTHER DATA SOURCES <REST>


Slide 15


Slide 16

DATA INGESTION


Slide 17

INGESTION
- PULL-BASED / ETL: FuETL
  - OLTP game data
  - External Data Sources
- PUSH-BASED: HONU
  - Anything pushed to it
  - Server logs
STORAGE
- MASTER WAREHOUSE
- DATA AUDITING
QUERY / VIEWS
- AGGREGATE QUERIES
- BATCH QUERIES
- SINGLE-ROW QUERIES
- VIZ. TOOLS


Slide 18



Slide 19

FuETL
- Distributed ETL software written in Ruby.
- Same ETL applied to multiple regions / datacenters.
- Self-service UI with SQL query templating.
- Scales horizontally.
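A minimal sketch of how one templated SQL definition might be rendered once per region. This is not FuETL's actual template syntax (FuETL is written in Ruby and its internals are not shown in the deck); the `${region}` placeholder, the region codes, and the helper names are all assumptions for illustration. The table naming follows the `games_played_na` pattern that appears in the deck's own SQL.

```python
from string import Template

# Hypothetical template: one ETL definition, parameterized by region and date.
ETL_TEMPLATE = Template(
    "INSERT INTO games_played "
    "(SELECT * FROM games_played_${region} WHERE date >= '${start_date}')"
)

# Illustrative region codes for the regions named in the deck (NA, Korea, Russia).
REGIONS = ["na", "kr", "ru"]


def render_etl(region: str, start_date: str) -> str:
    """Fill the shared template for a single region."""
    return ETL_TEMPLATE.substitute(region=region, start_date=start_date)


def render_all(start_date: str) -> dict[str, str]:
    """Render the same ETL definition once per region."""
    return {region: render_etl(region, start_date) for region in REGIONS}


if __name__ == "__main__":
    for region, sql in render_all("2015-10-25").items():
        print(region, "->", sql)
```

Plain string substitution is used here only to keep the sketch short; a real self-service tool would need to validate user-supplied values before they reach a database.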


Slide 20

NA Korea Russia


Slide 21

Create an ETL


Slide 22

Create an ETL


Slide 23

FUETL CAN CONNECT TO
- Amazon S3
- SQS
- (S)FTP
- Hive
- Microsoft SQL Server
- MySQL
- DynamoDB
- Vertica
- Redshift
- REST websites


Slide 24

Create an ETL


Slide 25


Slide 26


Slide 27

- Webapp View (backbone.js, Bootstrap CSS)
- Scheduler Process
- Worker Process
- Command Line Tool
- Task / Helper / Controllers
Core Libraries
- Task Service, Environment Service, Helper Service
- Tasks, Helpers
- Task DAO, Env. Task DAO, Environment DAO, Env. Helper DAO, Helper DAO


Slide 28



Slide 29



Slide 30



Slide 31

FuETL STATISTICS
- 5213 ACTIVE REGIONAL ETLS
- 23125 DAILY ETL RUNS
- 14 TB DATA MOVED DAILY


Slide 32

FuETL SCALING


Slide 33

FuETL SCALING


Slide 34

Idempotency
Idempotent - an operation that produces the same result whether it is executed once or multiple times
EXAMPLE:
Non-Idempotent:
- x = x * 5;
- Submitting a purchase
Idempotent:
- abs( abs(x) ) = abs(x)
- Cancelling a purchase
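The four examples on this slide can be checked mechanically. The sketch below is illustrative only: the purchase functions and their data model (a list of purchased items, a set of cancelled order ids) are hypothetical stand-ins, not anything from Riot's systems.

```python
def scale(x: int) -> int:
    """Non-idempotent: every extra application changes the result again."""
    return x * 5


def is_idempotent(op, value) -> bool:
    """Check op(op(value)) == op(value) for one sample value."""
    once = op(value)
    return op(once) == once


def submit_purchase(ledger: list[str], item: str) -> list[str]:
    """Non-idempotent: each call records one more purchase."""
    return ledger + [item]


def cancel_purchase(cancelled: set[int], order_id: int) -> set[int]:
    """Idempotent: cancelling the same order again leaves the same state."""
    return cancelled | {order_id}


if __name__ == "__main__":
    print("abs idempotent:", is_idempotent(abs, -7))
    print("x * 5 idempotent:", is_idempotent(scale, 3))
```

Note that `is_idempotent` only tests a single sample value, so it can show that an operation is not idempotent but cannot prove that it is.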


Slide 35

Idempotent?
In the transactional OLTP world…
INSERT INTO games_played
  (SELECT * FROM games_played_na
   WHERE date >= '2015-10-25')


Slide 36

Idempotent?
In the big data / OLAP world…
INSERT INTO games_played
  (SELECT * FROM games_played_na
   WHERE date >= '2015-10-25')
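The transcript does not record the slide's answer, so the following is a hedged illustration rather than the speaker's own. It assumes a warehouse table with no primary key or unique constraint (Hive-style tables traditionally enforce none), uses SQLite purely as a stand-in engine, and invents a three-column schema. Under those assumptions, re-running the plain INSERT duplicates rows, while a delete-then-insert load (similar in spirit to Hive's `INSERT OVERWRITE` on a partition) can be re-run safely.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory source and target table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE games_played_na (game_id INTEGER, region TEXT, date TEXT)")
    conn.execute("CREATE TABLE games_played (game_id INTEGER, region TEXT, date TEXT)")
    conn.executemany(
        "INSERT INTO games_played_na VALUES (?, 'na', ?)",
        [(1, "2015-10-24"), (2, "2015-10-25"), (3, "2015-10-26")],
    )
    return conn


def load_append(conn: sqlite3.Connection, start_date: str) -> None:
    """Plain INSERT ... SELECT: each rerun appends the same rows again."""
    conn.execute(
        "INSERT INTO games_played "
        "SELECT * FROM games_played_na WHERE date >= ?",
        (start_date,),
    )


def load_overwrite(conn: sqlite3.Connection, start_date: str) -> None:
    """Clear the NA rows for the date range, then insert: safe to rerun."""
    with conn:  # single transaction: a failure rolls back the delete too
        conn.execute(
            "DELETE FROM games_played WHERE region = 'na' AND date >= ?",
            (start_date,),
        )
        conn.execute(
            "INSERT INTO games_played "
            "SELECT * FROM games_played_na WHERE date >= ?",
            (start_date,),
        )


def count_rows(conn: sqlite3.Connection) -> int:
    """Number of rows currently in the target table."""
    return conn.execute("SELECT COUNT(*) FROM games_played").fetchone()[0]
```

Running `load_append` twice against a fresh demo database leaves four rows in `games_played`; running `load_overwrite` twice leaves two.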


Slide 37

KEEPING INTEGRITY X


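Slide 38

Message Queues
[Diagram: the SCHEDULER aka PRODUCER puts messages ETL1, ETL2, ETL3, ETL4, ETL5, ... ETLN on a queue that is read by WORKER aka CONSUMER processes; two items in the diagram are marked X]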
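The scheduler / worker split on this slide can be sketched with Python's in-process `queue.Queue` standing in for a real message broker. The function names and the "work" performed are hypothetical; a production setup would use a durable queue service instead of threads in one process.

```python
import queue
import threading


def run_scheduler(task_queue: queue.Queue, etl_names: list[str]) -> None:
    """Producer: enqueue one message per ETL that is due to run."""
    for name in etl_names:
        task_queue.put(name)


def run_worker(task_queue: queue.Queue, results: list[str], lock: threading.Lock) -> None:
    """Consumer: pull ETL messages until the queue is empty."""
    while True:
        try:
            name = task_queue.get_nowait()
        except queue.Empty:
            return
        with lock:
            results.append(name)  # stand-in for actually running the ETL
        task_queue.task_done()


def run_all(etl_names: list[str], worker_count: int = 3) -> list[str]:
    """Schedule every ETL, then let several workers share the load."""
    task_queue: queue.Queue = queue.Queue()
    results: list[str] = []
    lock = threading.Lock()
    run_scheduler(task_queue, etl_names)
    workers = [
        threading.Thread(target=run_worker, args=(task_queue, results, lock))
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

Because the scheduler only knows about the queue, workers can be added or removed without changing it, which is the decoupling the deck lists among the benefits of message queues.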


Slide 39

Message Queues
● REDUNDANCY
● DELIVERY GUARANTEE
● SCALABILITY
● ASYNC. COMMUNICATION
● ABSTRACTION / DECOUPLING


Slide 40

Message Queues
● AMAZON SIMPLE QUEUE SERVICE
● APACHE ACTIVEMQ
● RABBITMQ
● HORNETQ
● MICROSOFT MQ (MSMQ)


Slide 41



Slide 42

Honu
- Self Service, Custom HTTP Edge Service (Java)
- Fronted by ELB in front of ~40 autoscaled m1.xlarge instances
- Forwards JSON data indirectly to S3
- The batches then need to be unpacked and converted into Hive tables
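The deck does not show what a Honu batch looks like, so the following is only a sketch of the "unpack and convert" step under stated assumptions: a batch is newline-delimited JSON, each event carries a `table` name and a `data` object, and the Hive tables are plain delimited text. All field and function names here are hypothetical.

```python
import json
from collections import defaultdict

# Hive's default text format represents NULL as a backslash followed by N.
NULL_MARKER = r"\N"


def unpack_batch(batch_text: str) -> dict[str, list[dict]]:
    """Group newline-delimited JSON events by their target table."""
    tables: dict[str, list[dict]] = defaultdict(list)
    for line in batch_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between events
        event = json.loads(line)
        tables[event["table"]].append(event["data"])
    return dict(tables)


def to_delimited_rows(records: list[dict], columns: list[str], sep: str = "\t") -> list[str]:
    """Flatten records into delimited rows; missing fields become NULL."""
    return [
        sep.join(str(record.get(col, NULL_MARKER)) for col in columns)
        for record in records
    ]
```

Filling missing fields with a NULL marker is one way to cope with events whose shape varies from one submitting developer to the next.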


Slide 43

Honu
- Custom Collector Infrastructure (Java) - Derived from Netflix Suro
- Deployed in every data center worldwide and also AWS
- Self Service, Custom HTTP Edge Service (Java API)


Slide 44

Honu =


Slide 45

DRADIS
- Custom HTTP Edge Service (Java)
- Fronted by ELB in front of ~40 m1.xlarge instances
- Forwards data indirectly to S3 via Honu Collectors


Slide 46

Honu
[Diagram: JSON messages flow from the REST ENDPOINT to the COLLECTORS]


Slide 47

Honu
[Diagram: JSON messages flow from the REST ENDPOINT to the COLLECTORS, labeled batchid = 20150512]


Slide 48

Honu
[Diagram: JSON messages flow from the REST ENDPOINT to the COLLECTORS; several messages are labeled GAM1 and one is marked X]


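Slide 49

Idempotency
Use application logic to make processing idempotent

msg = queue.pop();
if (processed_games.contains(msg.game_id)) {
    return; // already handled: do nothing
} else {
    process_game(msg);
    processed_games.add(msg.game_id); // remember it so a repeat is skipped
}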
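A runnable version of the same idea, as a sketch. It assumes messages are dictionaries with a `game_id` key and keeps the set of processed ids in memory; both are simplifications, since a real consumer would need that set to survive restarts.

```python
from collections import deque


def consume(messages, process_game) -> set:
    """Process each game at most once, even if a message is redelivered."""
    processed_games = set()
    pending = deque(messages)
    while pending:
        msg = pending.popleft()
        if msg["game_id"] in processed_games:
            continue  # duplicate delivery: do nothing
        process_game(msg)
        processed_games.add(msg["game_id"])
    return processed_games


if __name__ == "__main__":
    handled = []
    deliveries = [{"game_id": 1}, {"game_id": 2}, {"game_id": 1}]
    consume(deliveries, handled.append)
    print(len(handled), "games processed from", len(deliveries), "deliveries")
```

Recording the id only after `process_game` returns means a crash mid-processing leads to a retry rather than a silently skipped game, at the cost of `process_game` itself having to tolerate a partial earlier run.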


Slide 50

THE DOWN SIDE
- Inconsistent data structure: It’s formatted however the developer submits it
- What’s in there? The data team doesn’t know everything that is submitted
- Compliance: Are we violating international data laws?


Slide 51

SELF SERVICE HOW?
- Focus on UX: Your tools need to be easy for non-technical people to use.
- User Documentation: No one likes doing it, but it helps a lot.
- Onboard training: Get new coworkers in-the-know.
- Familiar Protocols: Use REST or RPC so developers are on the same page.


Slide 52



Slide 53

AMAZON S3 STRUCTURE

HIVE
schema1
  table1
    env
      dt
        time
  table2
  table3
schema2
  table1
  ...
schema3
schema4

AMAZON S3
s3n://datawarehouse/
  schema1/
    table1/
      env/
        dt/
          time/
    table2/
    table3/
  schema2/
s3n://telemetrydata/
  application1/
    table1/
      env/
        dt/
    table2/
  application2/
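A small sketch of how a warehouse partition path could be derived from this layout. The bucket name and the schema / table / env / dt / time nesting come from the slide; the date format, the hourly `time` granularity, and the use of bare directory names (rather than `key=value` style) are assumptions.

```python
from datetime import datetime

WAREHOUSE_ROOT = "s3n://datawarehouse"


def partition_path(schema: str, table: str, env: str, ts: datetime) -> str:
    """Build the S3 prefix for one env / dt / time partition of a table."""
    dt = ts.strftime("%Y-%m-%d")   # assumed date format
    time = ts.strftime("%H")       # assumed hourly granularity
    return f"{WAREHOUSE_ROOT}/{schema}/{table}/{env}/{dt}/{time}/"


if __name__ == "__main__":
    print(partition_path("schema1", "table1", "na", datetime(2015, 10, 25, 13)))
```

Generating paths from one function, instead of hand-writing them per job, is one way to enforce naming standards across every writer.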


Slide 54


Slide 55


Slide 56



Slide 57

Warehouse Auditing Service Platform
- REST micro-service built with Java and Docker.
- Source and target comparison.
- Reports and visualizations we can use to find problems.
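The deck does not describe how the auditing service compares source and target, so this is a sketch of one common approach: comparing row counts per partition. The input shape (a mapping from partition key to row count) and the function name are assumptions.

```python
def audit_counts(source: dict[str, int], target: dict[str, int]) -> list[dict]:
    """Compare per-partition row counts and report every mismatch."""
    findings = []
    for key in sorted(set(source) | set(target)):
        src = source.get(key, 0)  # a partition missing on one side counts as 0
        tgt = target.get(key, 0)
        if src != tgt:
            findings.append(
                {"partition": key, "source": src, "target": tgt, "diff": tgt - src}
            )
    return findings


if __name__ == "__main__":
    source = {"2015-10-24": 100, "2015-10-25": 120}
    target = {"2015-10-24": 100, "2015-10-25": 240}
    for finding in audit_counts(source, target):
        print(finding)
```

A positive `diff` points at duplicated loads in the target, and a negative one at missing rows; counts alone cannot catch rows that are present but wrong.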


Slide 58

HOW TO AUDIT


Slide 59

VISUALIZING


Slide 60

VISUALIZING


Slide 61


Slide 62

HOW TO AUDIT


Slide 63



Slide 64

- BATCH
- OLAP
- POINT


Slide 65

SCALING IN AWS


Slide 66

- RESOURCE CONTENTION
- SCALING


Slide 67

AWS Infrastructure Today
[Diagram. Column headings: EMR, EC2, Storage, Networking. Components shown: AWS Direct Connect, VPC, RDS, DynamoDB (Point Data Store), S3 (Source of “Truth”), Data Science, DynamoDB Loading, Telemetry, Platfora, Solr (real time), Auditing, ETL, Metastore, Telemetry collectors, Rocana (real time dashboard), Data dictionary, Point Data Service, ETL App DB, Fraud, Analytics / Hue]


Slide 68

CONCLUSION


Slide 69

SEAN’S PRO TIPS OF THE DAY
DO
➔ Keep idempotency in mind and use MQ architecture
➔ Get an auditing solution for DW accuracy
➔ Prepare for multiple data access patterns
➔ Allocate time for tuning AWS infrastructure
DON’T
➔ Don’t underestimate simple problems in big data.
➔ Don’t forget to track cost. AWS bills can surprise you
➔ Don’t wait. Create S3 permissions and naming standards early
➔ Don’t stop. Believing


Slide 70

CHAMPION MASTERY
- Custom rewards for mastering different champions
- Intensive query that spans every game that every player has played
- Improves player engagement


Slide 71

PLAYER SUPPORT
- Full copy of our data warehouse in DynamoDB
- Hive->DynamoDB Dynamic Partition
- Support can answer questions faster than ever.


Slide 72

OFFENSIVE CHAT DETECTION
- Data science team queries all chat messages in game
- Sentiment analysis and classification
- Identifies negative, offensive players and mutes them automatically.


Slide 73

QUESTIONS?
ENGINEERING BLOG: engineering.riotgames.com
SMALONEY@RIOTGAMES.COM
@SEAN_SEANNERY


Slide 74

