
…And Metrics For All






Slide 0

…And Metrics For All
Paul O’Connor
github.com/pauloconnor
2015-05-19


Slide 1

About Yelp
Founded: 2004
Monthly Active Users: ~142 Million
Non-US Monthly Users: ~31 Million
Reviews: ~77 Million
Local Businesses: 2.1 Million
Territories: Available in 31 countries


Slide 2

What are metrics?
Name   Value


Slide 3

What are metrics?
Name   Value   Timestamp


Slide 4

What are metrics?
Name             Value      Timestamp
server1.load.1m  28.826667  1431950640


Slide 5

What are metrics?
Name             Value      Timestamp
server1.load.1m  28.826667  1431950640
server1.load.1m  29.188333  1431950700
server1.load.1m  29.231667  1431950760
server1.load.1m  29.083333  1431950820
server1.load.1m  29.710000  1431950880
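Each row in the table is one datapoint in Graphite's plaintext form, `name value timestamp`, with samples here 60 seconds apart. The minimal sketch below formats and parses such lines; `Datapoint`, `format_line` and `parse_line` are hypothetical helper names, not part of Graphite.

```python
from typing import NamedTuple


class Datapoint(NamedTuple):
    """One metric sample: a dotted name, a numeric value and a Unix timestamp."""
    name: str
    value: float
    timestamp: int


def format_line(point: Datapoint) -> str:
    # Plaintext line: "<name> <value> <timestamp>\n"
    return f"{point.name} {point.value} {point.timestamp}\n"


def parse_line(line: str) -> Datapoint:
    # Exactly three whitespace-separated fields are expected.
    name, value, timestamp = line.split()
    return Datapoint(name, float(value), int(timestamp))


samples = [
    Datapoint("server1.load.1m", 28.826667, 1431950640),
    Datapoint("server1.load.1m", 29.188333, 1431950700),
]
payload = "".join(format_line(p) for p in samples)
print(payload, end="")
```

The dotted name doubles as a path: each segment becomes a directory level when the metric is stored.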


Slide 6


Slide 7

Graphite Components
Carbon: relay, cache, aggregator
Whisper
Web app


Slide 8

Carbon Relay
Handles two things:
    Replication
    Sharding


Slide 9

Relay Methods
Rules:
    [replicate]
    pattern = ^services\.ads\..+
    destinations = 10.1.2.3, 10.2.2.3
    continue = true
Consistent Hashing:
    Defines a sharding strategy across multiple backends
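The rule-based method can be sketched as an ordered scan: the first matching rule decides where a metric goes, unless it sets `continue`, in which case later rules also apply. This is a simplified model of that behaviour, not carbon's code; the `RelayRule` class, `route` function and the catch-all destination `10.9.9.9` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class RelayRule:
    """One rule: a regex on the metric name plus where matching metrics go."""
    pattern: "re.Pattern[str]"
    destinations: List[str]
    continue_matching: bool = False


def route(metric: str, rules: List[RelayRule]) -> List[str]:
    """Collect destinations for a metric by scanning rules in order.

    A matching rule stops the scan unless it sets continue_matching, in which
    case later rules (including a catch-all default) are applied too.
    """
    destinations: List[str] = []
    for rule in rules:
        if rule.pattern.search(metric):
            destinations.extend(rule.destinations)
            if not rule.continue_matching:
                break
    return destinations


rules = [
    # The [replicate] rule from the slide.
    RelayRule(re.compile(r"^services\.ads\..+"), ["10.1.2.3", "10.2.2.3"], continue_matching=True),
    # Hypothetical catch-all default destination.
    RelayRule(re.compile(r".*"), ["10.9.9.9"]),
]

print(route("services.ads.clicks", rules))  # ['10.1.2.3', '10.2.2.3', '10.9.9.9']
print(route("servers.db01.load", rules))    # ['10.9.9.9']
```

With `continue` enabled, ad metrics are copied to both listed servers and still reach the default backend, which is how a rule can replicate a subset of traffic without diverting it.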


Slide 10

Carbon Cache
Receives metrics and persists them to disk
Writes according to the storage schemas


Slide 11

Storage Schemas
Define how long metrics are kept, and at what resolution:
    [databases_10sec_1year]
    pattern = ^servers\.db.*$
    retentions = 10s:7d,1m:30d,5m:90d,30m:365d
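Each `precision:duration` pair in `retentions` describes one archive: 10-second points for 7 days, then 1-minute points for 30 days, and so on. The sketch below converts the string into `(seconds_per_point, point_count)` pairs; `parse_retentions` is a hypothetical name, and only single-letter unit suffixes are handled.

```python
from typing import List, Tuple

# Single-letter unit suffixes; "m" means minutes in retention definitions.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(spec: str) -> int:
    """Convert '10s', '1m', '7d', ... to seconds; a bare number is taken as seconds."""
    if spec.isdigit():
        return int(spec)
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def parse_retentions(retentions: str) -> List[Tuple[int, int]]:
    """Turn 'precision:duration,...' into (seconds_per_point, point_count) pairs."""
    archives = []
    for archive in retentions.split(","):
        precision, duration = archive.split(":")
        step = to_seconds(precision)
        # A bare number after the colon is a point count rather than a duration.
        points = int(duration) if duration.isdigit() else to_seconds(duration) // step
        archives.append((step, points))
    return archives


print(parse_retentions("10s:7d,1m:30d,5m:90d,30m:365d"))
# [(10, 60480), (60, 43200), (300, 25920), (1800, 17520)]
```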


Slide 12

Storage Aggregation
Rules for aggregating data into lower-precision retentions:
    [all_min]
    pattern = \.min$
    xFilesFactor = 0.1
    aggregationMethod = min
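When a higher-precision archive rolls up into a lower one, `xFilesFactor` is the minimum fraction of known (non-null) points required before an aggregate is written, and `aggregationMethod` decides how those points are combined. A minimal sketch of that decision, assuming `None` marks a missing point; `downsample` is a hypothetical name and only a subset of methods is included.

```python
from typing import Callable, Dict, List, Optional

# A subset of aggregation methods, keyed by their storage-aggregation names.
AGGREGATION_METHODS: Dict[str, Callable[[List[float]], float]] = {
    "average": lambda vals: sum(vals) / len(vals),
    "sum": sum,
    "min": min,
    "max": max,
    "last": lambda vals: vals[-1],
}


def downsample(points: List[Optional[float]], x_files_factor: float, method: str) -> Optional[float]:
    """Collapse higher-precision points into one lower-precision point.

    Returns None when no points are known, or when the known fraction
    falls below x_files_factor.
    """
    known = [p for p in points if p is not None]
    if not known or len(known) / len(points) < x_files_factor:
        return None
    return AGGREGATION_METHODS[method](known)


# Six 10-second points rolled into one 1-minute point, only one of them known.
print(downsample([None, None, 4.0, None, None, None], 0.1, "min"))  # 4.0
print(downsample([None, None, 4.0, None, None, None], 0.5, "min"))  # None
```

With `xFilesFactor = 0.1`, even one known point out of six (about 17%) is enough to produce a rolled-up value.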


Slide 13

Carbon Aggregator
Buffers metrics before forwarding them to carbon cache
Rolls up metrics based on rules


Slide 14

Aggregation Rules
Not to be confused with storage aggregation
Tells the carbon aggregator what to aggregate and how:
    output_template (frequency) = method input_pattern
Example:
    <env>.applications.<app>.all.requests (60) = sum <env>.applications.<app>.*.requests
These inputs:
    prod.applications.apache.www01.requests
    prod.applications.apache.www02.requests
    prod.applications.apache.www03.requests
    prod.applications.apache.www04.requests
    prod.applications.apache.www05.requests
are summed into:
    prod.applications.apache.all.requests
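The example rule can be modelled as: turn the input pattern into a regex, capture the `<env>` and `<app>` fields, fill them into the output template, and sum the values. This sketch ignores the 60-second buffering interval and handles only `sum`; `compile_input_pattern`, `aggregate` and the request counts 1 to 5 are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def compile_input_pattern(pattern: str) -> "re.Pattern[str]":
    """Turn '<env>.applications.<app>.*.requests' into an anchored regex.

    A <field> segment captures one dot-free path component; '*' matches
    anything inside a single component.
    """
    parts = []
    for segment in pattern.split("."):
        field = re.fullmatch(r"<(\w+)>", segment)
        if field:
            parts.append(f"(?P<{field.group(1)}>[^.]+)")
        else:
            parts.append(re.escape(segment).replace(r"\*", "[^.]*"))
    return re.compile("^" + r"\.".join(parts) + "$")


def aggregate(points: Iterable[Tuple[str, float]], input_pattern: str,
              output_template: str) -> Dict[str, float]:
    """Sum buffered values per output name (the rule's method is 'sum')."""
    regex = compile_input_pattern(input_pattern)
    totals: Dict[str, float] = defaultdict(float)
    for name, value in points:
        match = regex.match(name)
        if match:
            fields = match.groupdict()
            output = re.sub(r"<(\w+)>", lambda m: fields[m.group(1)], output_template)
            totals[output] += value
    return dict(totals)


buffered = [(f"prod.applications.apache.www0{i}.requests", float(i)) for i in range(1, 6)]
print(aggregate(buffered,
                "<env>.applications.<app>.*.requests",
                "<env>.applications.<app>.all.requests"))
# {'prod.applications.apache.all.requests': 15.0}
```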


Slide 15

Whisper
Fixed-size database
Allows for roll-ups
Allows for backfilling data
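"Fixed size" means a Whisper file is allocated for its full retention when created, so its size depends only on the archive layout, not on how many points have been written. The sketch below estimates that size, assuming Whisper's on-disk layout of a 16-byte metadata header, 12 bytes per archive header and 12 bytes per point (32-bit timestamp plus 64-bit value); `whisper_file_size` is a hypothetical name, and the archives are the 10s:7d,1m:30d,5m:90d,30m:365d schema expressed as point counts.

```python
import struct

# Layout modelled on Whisper's format (network byte order, no padding).
METADATA_SIZE = struct.calcsize("!2LfL")    # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_SIZE = struct.calcsize("!3L")  # offset, seconds per point, point count
POINT_SIZE = struct.calcsize("!Ld")         # uint32 timestamp + float64 value


def whisper_file_size(archives):
    """Bytes needed for a file with the given (seconds_per_point, points) archives."""
    header = METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives)
    return header + POINT_SIZE * sum(points for _, points in archives)


archives = [(10, 60480), (60, 43200), (300, 25920), (1800, 17520)]
print(whisper_file_size(archives))  # 1765504
```

At roughly 1.7 MB per metric for that schema, the number of distinct metric names, not traffic volume, drives disk usage.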


Slide 16

Web App
Django-based app for rendering graphs


Slide 17

Putting it all together
Carbon cache listening on port 2003
Writes to disk
Read back through the web app
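Port 2003 is where carbon accepts plaintext lines, so sending a metric only needs a TCP connection. A minimal sketch, assuming a carbon cache reachable at a hypothetical `127.0.0.1`; `send_metric` is a hypothetical helper.

```python
import socket
import time


def send_metric(name, value, host="127.0.0.1", port=2003, timestamp=None):
    """Send one datapoint to a carbon plaintext listener and return the line sent."""
    if timestamp is None:
        timestamp = int(time.time())
    line = f"{name} {value} {timestamp}\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
    return line
```

For example, `send_metric("server1.load.1m", 28.826667, timestamp=1431950640)` sends the first row of the earlier metrics table. Opening a connection per point is fine for a demo; a real sender would batch many lines per connection.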


Slide 18

Getting more complicated
Carbon relay using consistent hashing to multiple caches
Individual caches are responsible for specific metrics
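Consistent hashing is what lets each cache own a stable set of metrics: the same name always lands on the same node, and adding a node moves only the names that now fall on its part of the ring. Below is a generic hash ring with virtual nodes, offered as an illustration only; carbon's own ring differs in details, so it will not reproduce carbon's assignments. The class name, replica count and node names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    """A generic consistent-hash ring with virtual nodes (replicas) per server."""

    def __init__(self, nodes: List[str], replicas: int = 100):
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            for i in range(replicas):
                ring.append((self._position(f"{node}:{i}"), node))
        ring.sort()
        self._ring = ring
        self._positions = [pos for pos, _ in ring]

    @staticmethod
    def _position(key: str) -> int:
        # First 4 bytes of the MD5 digest, read as an unsigned integer.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")

    def get_node(self, metric: str) -> str:
        """Return the node owning this metric: the next ring position clockwise."""
        index = bisect.bisect(self._positions, self._position(metric)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.get_node("servers.db01.load.1m")
```

Virtual nodes spread each cache across many ring positions, which evens out how many metrics each one receives.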


Slide 19

More Relays
Use HAProxy to load balance between relays
Run more relays to make use of more CPU


Slide 20

Even more relays
Useful for sending metrics to other locations


Slide 21

Replicate the metrics
Duplicate your metrics for backup and redundancy


Slide 22

More caches instead
Consistent hashing across multiple nodes


Slide 23

Where does the aggregator fit?
The aggregator uses a lot of CPU. Put it on its own node


Slide 24

Scaling further
Dedicate nodes to particular functions:
    Forwarding relay nodes that only forward
    Consistent-hashing nodes
    Aggregation nodes


Slide 25


Slide 26


Slide 27

Getting your data back out
Graphite Dashboard
Third-party dashboards
    We use Grafana: http://grafana.org/
Graphite-api: https://github.com/brutasse/graphite-api
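Dashboards such as Grafana read data back through Graphite's `/render` endpoint, passing one or more `target` expressions plus a time range and output format. A minimal URL builder, assuming a hypothetical host `graphite.example.com`; `render_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def render_url(base_url, targets, from_="-1h", fmt="json"):
    """Build a /render URL; several targets become repeated target= parameters."""
    params = [("target", t) for t in targets] + [("from", from_), ("format", fmt)]
    return f"{base_url.rstrip('/')}/render?{urlencode(params)}"


print(render_url("http://graphite.example.com", ["server1.load.1m"]))
# http://graphite.example.com/render?target=server1.load.1m&from=-1h&format=json
```

Function expressions such as `sumSeries(...)` are percent-encoded by `urlencode`, so they can be passed as targets unchanged.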


Slide 28


Slide 29

Tips
Aggregate before ingestion
Control the metrics that can be sent
Metrics are a gas: they expand to fill all available room
Use a C implementation of carbon
Use the latest webapp
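One way to control which metrics can be sent is an allow/deny check at ingestion, so that unbounded names (for example per-request IDs) never create new Whisper files. A minimal sketch; the `servers.` and `services.` namespaces, the `request_id` deny rule and the `accept` function are all hypothetical.

```python
import re
from typing import Iterable, List


def compile_all(patterns: Iterable[str]) -> List["re.Pattern[str]"]:
    return [re.compile(p) for p in patterns]


# Hypothetical policy: only servers.* and services.*, never per-request IDs.
ALLOW = compile_all([r"^servers\.", r"^services\."])
DENY = compile_all([r"\.request_id\."])


def accept(metric: str) -> bool:
    """Keep a metric only if it matches an allow rule and no deny rule."""
    if any(p.search(metric) for p in DENY):
        return False
    return any(p.search(metric) for p in ALLOW)


incoming = [
    "servers.db01.load.1m",
    "services.ads.clicks",
    "services.ads.request_id.8f3a.latency",
    "tmp.debug.counter",
]
print([m for m in incoming if accept(m)])  # ['servers.db01.load.1m', 'services.ads.clicks']
```

Checking the deny list first means a single broad allow rule cannot accidentally let an exploding namespace through.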


Slide 30

Optimize your dashboard queries
services.biz_app.*.*.timers.pyramid_uwsgi_metrics_tweens_*.p99
    2154 results
    35 seconds just to find these files on disk
    Then running functions against these results
    Timeout after a minute
    Dashboard automatically refreshing every 10 seconds
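In a Graphite path, `*` matches within a single dot-separated segment, so every wildcard segment multiplies the number of series returned; with Whisper storage, each of those series is a separate file to find. The sketch below matches segment by segment to show the fan-out on a hypothetical inventory of 3 hosts, 4 workers and 5 tweens; `graphite_match` and `expand` are hypothetical names, and brace alternatives such as `{a,b}` are not handled.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def graphite_match(pattern: str, metric: str) -> bool:
    """Compare a dotted name with a Graphite-style glob, one segment at a time.

    '*' never crosses a dot, so both must have the same number of segments.
    """
    pattern_parts = pattern.split(".")
    metric_parts = metric.split(".")
    if len(pattern_parts) != len(metric_parts):
        return False
    return all(fnmatchcase(name, pat) for pat, name in zip(pattern_parts, metric_parts))


def expand(pattern: str, metrics: Iterable[str]) -> List[str]:
    return [m for m in metrics if graphite_match(pattern, m)]


# Hypothetical inventory: 3 hosts x 4 workers x 5 tweens.
inventory = [
    f"services.biz_app.host{h}.worker{w}.timers.pyramid_uwsgi_metrics_tweens_t{t}.p99"
    for h in range(3) for w in range(4) for t in range(5)
]
query = "services.biz_app.*.*.timers.pyramid_uwsgi_metrics_tweens_*.p99"
print(len(expand(query, inventory)))  # 60
print(len(expand("services.biz_app.host0.*.timers.pyramid_uwsgi_metrics_tweens_*.p99", inventory)))  # 20
```

Pinning one wildcard to a concrete value divides the result set by that segment's branching factor, which is usually the cheapest way to speed up a slow dashboard panel.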


Slide 31


Slide 32

What’s the Future?
InfluxDB
Cassandra
Third party


Slide 33

We’re hiring!
http://www.yelp.com/careers
Hiring SREs in Dublin, London, New York, San Francisco

