
Managing growth in Production Hadoop Deployments






Slide 0

Managing growth in Production Hadoop Deployments
Soam Acharya @soamwork
Charles Wimmer @cwimmer
Altiscale @altiscale
Hadoop Summit 2015, San Jose


Slide 1

Altiscale: INFRASTRUCTURE NERDS
Soam Acharya - Head of Application Engineering. Formerly Chief Scientist @ Limelight OVP, Yahoo Research Engineer.
Charles Wimmer - Head of Operations. Former Yahoo! and LinkedIn SRE; managed 40,000 nodes in Hadoop clusters at Yahoo!
Hadoop as a Service, built and managed by Big Data, SaaS, and enterprise software veterans: Yahoo!, Google, LinkedIn, VMWare, Oracle, ...


Slide 2

So, you've put together your first Hadoop deployment.
It's now running production ETLs.


Slide 3

Congratulations!


Slide 4

But then ...
Your data scientists get on the cluster and start building models.


Slide 5

But then ...
Your data scientists get on the cluster and start building models.
Your BI team starts running interactive SQL-on-Hadoop queries ...


Slide 6

But then ...
Your data scientists get on the cluster and start building models.
Your BI team starts running interactive SQL-on-Hadoop queries ...
Your mobile team starts sending RT events into the cluster ...


Slide 7

But then ...
Your data scientists get on the cluster and start building models.
Your BI team starts running interactive SQL-on-Hadoop queries ...
Your mobile team starts sending RT events into the cluster ...
You sign up more clients.
And the input data for your initial use case doubles ...


Slide 8

Soon, your cluster ...


Slide 9

And you ...


Slide 10

The "success disaster" scenario:
  Initial success
  Many subsequent use cases on the cluster
  Cluster gets bogged down


Slide 11

Why Do Clusters Fail?
Failure categories:
  Too much data
  Too many jobs
  Too many users


Slide 12

How to extricate yourself?
Short term strategy:
  Get more resources for your cluster: expand cluster size!
  Buys headroom for the longer term strategy
Longer term strategy: see the next slide


Slide 13

Longer Term Strategy
Can't cover every scenario.
Per failure category:
  Identify selected pressure points (PPs); these can occur at different levels of the Hadoop stack
  Shore up those pressure points
  Squeeze more capacity from the cluster


Slide 14

Hadoop 2 Stack Reminder (diagram): Application Layer / Execution Framework / Core Hadoop Layer / Machine Level


Slide 15

Failure Category 1 - Too Much Data
PP: HDFS at capacity
PP: Too many objects


Slide 16

Pressure Point - HDFS at Capacity
Unpredictable cluster behavior:
  Transient errors
  Hadoop daemons can't save logs to HDFS
  Execution framework errors, e.g. Hive unable to run queries that create temp tables:

  Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/hive-user/hive_2014-07-23_08-43-40_408_2604848084853512498-1/_task_tmp.-ext-10001/_tmp.000121_0 could only be replicated to 1 nodes instead of minReplication (=2). There are xx datanode(s) running and no node(s) are excluded in this operation.
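A quick way to see how close HDFS is to capacity is the standard CLI (a minimal sketch; both commands ship with Hadoop 2):

  hdfs dfs -df -h /        # filesystem-wide capacity, used, and remaining
  hdfs dfsadmin -report    # per-datanode capacity and usage breakdown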


Slide 17

HDFS at Capacity - Mitigation
Use HDFS quotas!

  hdfs dfsadmin -setSpaceQuota 113367670 /

Quotas can be set per directory, but cannot be set per user.
Protection against accidental cluster destabilization.
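In practice you would set the quota on each top-level project or user directory and check consumption with -count -q (a sketch; the directory name and size are illustrative):

  hdfs dfsadmin -setSpaceQuota 10t /user/etl    # caps raw space, which includes replication
  hdfs dfs -count -q /user/etl                  # shows quota, remaining quota, dirs, files, bytes
  hdfs dfsadmin -clrSpaceQuota /user/etl        # removes the quota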


Slide 18

Too Many Objects
"Elephants are afraid of mice. Hadoop is afraid of small files."
Objects: # of dirs + files, # of blocks


Slide 19

Too Many Objects
Memory pressure:
  Namenode heap: too many files + directories + objects in HDFS
  Datanode heap: too many blocks allocated per node
Performance overhead:
  Too much time spent on container creation and teardown
  More time spent in the execution framework than in the actual application


Slide 20

Where Are the Objects?
Use HDFS count: hdfs dfs -count -q <directory name>
  Reports number of directories, files, and bytes on a per-directory basis
Use fsimage files:
  Can be produced by the NameNode
  hdfs oiv <fsimage file>
  Detailed breakdown of the HDFS file system
  Hard!
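A minimal sketch of the fsimage route (the image filename and local paths are illustrative; -fetchImage requires HDFS superuser privileges):

  hdfs dfsadmin -fetchImage /tmp/nn-image    # downloads the latest fsimage from the NameNode
  hdfs oiv -p XML -i /tmp/nn-image/fsimage_0000000000012345678 -o /tmp/fsimage.xml
  grep -c '<inode>' /tmp/fsimage.xml         # rough count of objects in the namespace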


Slide 21

Too Many Objects - Mitigation
Short term:
  Increase NN/DN heap sizes (up to node physical limits)
  Increase cluster node count
Longer term:
  Find and compact: coalesce multiple files
  Use HAR
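Raising the heap means setting the daemon JVM options in hadoop-env.sh (a minimal sketch; the values are illustrative, sized by a commonly cited rule of thumb of roughly 1 GB of NameNode heap per million objects):

  # hadoop-env.sh
  export HADOOP_NAMENODE_OPTS="-Xms48g -Xmx48g ${HADOOP_NAMENODE_OPTS}"
  export HADOOP_DATANODE_OPTS="-Xmx4g ${HADOOP_DATANODE_OPTS}"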


Slide 22

COALESCE MULTIPLE FILES I
Hadoop streaming job: reads whatever Hadoop can read on the cluster, writes LZO output.

  hadoop \
    jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -D mapreduce.job.reduces=40 \
    -D mapred.output.compress=true \
    -D mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec \
    -D mapreduce.output.fileoutputformat.compress.type=BLOCK \
    -D mapreduce.reduce.memory.mb=8192 \
    -mapper /bin/cat \
    -reducer /bin/cat \
    -input $IN_DIR \
    -output $DIR


Slide 23

COALESCE MULTIPLE FILES II
Build an index for the LZO output to tell Hadoop where the splits are:

  hadoop \
    jar /opt/hadoop/share/hadoop/common/lib/hadoop-lzo-*.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer \
    $DIR


Slide 24

Combine Files into HAR
HAR: Hadoop Archive

  hadoop archive -archiveName <archive name>.har -p <HDFS parent path> <dir1> <dir2> ... <outputDir>

Runs an MR job to produce the archive.
Watch out for the replication factor: on versions 2.4 and earlier, source files are set to a default replication factor of 10. Not good for small clusters. The -r <replication factor> option was added in 2.6.
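On 2.6 and later, an invocation with -r might look like this (a sketch; the archive name and paths are illustrative):

  hadoop archive -archiveName logs-2015-06.har -p /user/etl/logs -r 3 2015/06 /user/etl/archives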


Slide 25

Combine Files into HAR
HAR archives are useful if you want to preserve the file/directory structure of the input:

  [alti_soam@desktop ~]$ hdfs dfs -ls har:///tmp/alti_soam_test.har
  Found 3 items
  drwxr-xr-x - alti_soam hdfs 0 2013-09-03 22:44 har:/tmp/alti_soam_test.har/examples
  drwxr-xr-x - alti_soam hdfs 0 2013-11-16 03:53 har:/tmp/alti_soam_test.har/test-pig-avro-dir
  drwxr-xr-x - alti_soam hdfs 0 2013-11-12 22:23 har:/tmp/alti_soam_test.har/test-camus


Slide 26

Failure Category 2 - Too Many Jobs
"Help! My job is stuck!"
  Jobs don't make progress
  Jobs don't start
  "Right" jobs finish last
  Mixed profile job issues


Slide 27

Too Many Jobs - Remediation
Need to quantify job processing on the cluster.
Hadoop job usage analysis sources:
  Resource Manager logs
  History Server logs, job history files
  APIs
Analysis goals:
  Queue usage => cluster utilization
  Time spent by jobs/containers in waiting state
  Job level stats: # of jobs, type of jobs, ...
  Queue tuning
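The ResourceManager REST API is the easiest programmatic source here (a sketch; the hostname and queue name are illustrative):

  curl -s 'http://rm.example.com:8088/ws/v1/cluster/metrics'    # cluster-wide counters
  curl -s 'http://rm.example.com:8088/ws/v1/cluster/apps?states=RUNNING&queue=production'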


Slide 28

Hadoop Logs - Resource Manager
Job stats: outcome, duration, start date
Queue used
Containers: number allocated, memory and vCPU allocation
State transition times


Slide 29

Hadoop Logs - Job History Files
Configure the History Server to produce these files; one is created for every MR job.
Per job: HDFS data volume processed by mappers/reducers, CPU time, memory used, start/end time, max parallel maps and reduces, GC time.
Not available for Tez/Spark:
  Use the Timeline Server for better logging
  Mind the Timeline Server dependencies
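The files land in the History Server's done directory on HDFS and can be inspected directly (a sketch; the path, hostname, and job id are illustrative, and the location is governed by mapreduce.jobhistory.done-dir):

  hdfs dfs -ls /mr-history/done/2015/06/09/000000/    # one .jhist + _conf.xml per job
  mapred job -history /mr-history/done/2015/06/09/000000/job_1433813107785_0042.jhist
  curl -s 'http://jhs.example.com:19888/ws/v1/history/mapreduce/jobs?limit=5'    # History Server REST API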


Slide 30

Hadoop Log Analysis
Analysis goals:
  Queue usage => cluster utilization
  Time spent by jobs/containers in waiting state
  Job level stats: # of jobs, failed/killed vs successful, type of jobs
  Container level stats
How to analyze the logs?
  Custom scripts: parse job history files and Hadoop logs, load into a data warehouse, visualize
  Not much by way of publicly available tools
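As a flavor of what "custom scripts" means, a minimal sketch that counts the day's jobs per user straight from .jhist filenames (it assumes the default naming scheme job_<id>-<submitTime>-<user>-... and an illustrative done-dir path):

  hdfs dfs -ls -R /mr-history/done/2015/06/09 \
    | awk '/\.jhist$/ { n = split($NF, parts, "/"); split(parts[n], f, "-"); count[f[3]]++ }
           END { for (u in count) print count[u], u }' \
    | sort -rn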


Slide 31

Sample plot: container wait time and utilization per queue (panels: container wait times, queue utilization, vCore usage)


Slide 32

Sample plot: daily job type and status


Slide 33

Sample plot: daily job breakdown by user


Slide 34

Queue Tuning Strategy
Determine how you want your cluster to behave, then pick a scheduler depending on that behavior.
Real world examples:
  Production jobs must get resources: dedicate a certain portion of the cluster regardless of cluster state (idle, at capacity)
  Data loading jobs: constrain to a small portion of the cluster to preserve network bandwidth
  Research jobs: small portion of the cluster at peak, large portion when the cluster is idle
  Divide up the cluster amongst business units
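With the Capacity Scheduler, that policy might translate into capacity-scheduler.xml entries like these, shown key = value for brevity (a minimal sketch; queue names and percentages are illustrative):

  yarn.scheduler.capacity.root.queues = production,dataload,research
  yarn.scheduler.capacity.root.production.capacity = 60        # guaranteed share
  yarn.scheduler.capacity.root.dataload.capacity = 10
  yarn.scheduler.capacity.root.dataload.maximum-capacity = 10  # hard cap to protect network bandwidth
  yarn.scheduler.capacity.root.research.capacity = 30
  yarn.scheduler.capacity.root.research.maximum-capacity = 90  # can expand when the cluster is idle

Reload with: yarn rmadmin -refreshQueues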


Slide 35

Queue Tuning - Scheduler Basics
(YARN ships two pluggable schedulers: the Fair Scheduler and the Capacity Scheduler.)


Slide 36

More on Each Scheduler
Fair Scheduler: "Job Scheduling With the Fair and Capacity Schedulers" - Matei Zaharia, Hadoop Summit 2009
Capacity Scheduler: "Towards SLA-based Scheduling on YARN" - Sumeet Singh, Nathan Roberts, Hadoop Summit 2015 (6/9, 12:05pm)


Slide 37

Too Many Jobs - Mixed Profile Jobs
Jobs may have different memory profiles:
  Standard MR jobs: small container sizes
  Newer execution frameworks (Spark, H2O): large container sizes, all-or-nothing scheduling
A job with many little tasks can starve jobs that require large containers.


Slide 38

Too Many Jobs - Mixed Profile Jobs Mitigation
Reduce container sizes if possible; always start with the lowest container sizes.
Node labels (YARN-2492) and gang scheduling (YARN-624).
More details: "Running Spark and MapReduce Together in Production" - David Chaiken, Hadoop Summit 2015, 06/09, 2:35pm


Slide 39

Too Many Jobs - Hardening Your Cluster
Cluster configuration audit:
  Container vs heap size, e.g. a mismatch the audit should catch:
    mapreduce.map.memory.mb = 1536
    mapreduce.map.java.opts = -Xmx2560m
  Appropriate kernel level configuration
Turn on the Linux Container Executor; enable Hadoop Security.
Use operating system cgroups:
  Protect Hadoop daemons
  Cage user processes (e.g. Impala)
  Limits on what Hadoop can control: CPU, but not memory or network & disk bandwidth
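A consistent pairing keeps the JVM heap inside the YARN container (an illustrative sketch; sizing the heap to roughly 80% of the container is a common rule of thumb, leaving headroom for non-heap memory):

  mapreduce.map.memory.mb = 1536
  mapreduce.map.java.opts = -Xmx1228m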


Slide 40

Failure Category 3 - Too Many Users
Data access control
Inter-departmental resource contention (too many jobs)


Slide 41

Too Many Users - Queue Access
Use queue ACLs to:
  Restrict which users can submit jobs to a queue
  Define per-queue administrator roles (submit job, administer job)
  Restrict whether users can view applications in another queue
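With the Capacity Scheduler, queue ACLs are capacity-scheduler.xml properties (a sketch; queue, user, and group names are illustrative, and yarn.acl.enable must be set to true in yarn-site.xml):

  # value format: <comma-separated users> <space> <comma-separated groups>
  yarn.scheduler.capacity.root.production.acl_submit_applications = etl_svc prod_group
  yarn.scheduler.capacity.root.production.acl_administer_queue = hadoop_ops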


Slide 42

Data Access Control
By default, Hadoop supports UNIX-style file permissions, which are easy to circumvent:

  HADOOP_USER_NAME=hdfs hdfs dfs -rm /priv/data

Use Kerberos.
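Enabling Kerberos involves, among many other steps, these core-site.xml settings; once authentication is kerberos, identity comes from the Kerberos ticket and the HADOOP_USER_NAME trick above no longer works:

  hadoop.security.authentication = kerberos
  hadoop.security.authorization = true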


Slide 43

Data Access Control - Accountability
HDFS audit logs, produced by the NameNode:

  2015-02-24 20:59:45,382 INFO FSNamesystem.audit: allowed=true ugi=soam (auth:SIMPLE) ip=/10.251.255.181 cmd=delete src=/hive/what_a_con.db dst=/user/soam/.Trash/Current/hive/what_a_con.db perm=soam:hiveusers:rwxrwxr-x
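The audit log is plain text, so simple accountability questions are a grep away (a sketch; the log path varies by install, and with the stock hadoop-env.sh template audit events can be routed to their own file via HDFS_AUDIT_LOGGER=INFO,RFAAUDIT):

  # who is deleting things, and how often?
  grep 'cmd=delete' /var/log/hadoop/hdfs-audit.log \
    | sed 's/.*ugi=\([^ ]*\).*/\1/' \
    | sort | uniq -c | sort -rn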


Slide 44

Squeeze More Capacity from the Cluster
Targeted upgrades and optimizations across the stack: Application Layer / Execution Framework / Core Hadoop Layer


Slide 45

Squeeze More Capacity from the Cluster
Optimizations:
  Application layer: query optimizations, algorithmic level optimizations
Upgrading:
  Execution framework: tremendous performance improvements in Hive/Tez and Spark over the past two years; Pig and Cascading continue to improve
  Hadoop layer: recent focus on security and stability
Recommendation: focus on upgrading the execution framework.


Slide 46

Questions? Comments?

