Humans By The Hundred


Slide 0

Humans By The Hundred: Scaling Big Data for Big Team Growth


Slide 1

$ whoami
- SRE Manager at Yelp
- CWRU Alum
- Pittsburgh native
- <3 Web Operations
- Just a dude


Slide 2

Yelp’s Mission: Connecting people with great local businesses.


Slide 3

Yelp Stats: As of Q2 2015


Slide 4

What is Yelp?
- Many sites: www, m, biz, api
- Mobile apps
- Partner platform
- Hundreds of developers
- Thousands of servers


Slide 5

Why Am I Here?


Slide 6


Slide 7

DATA


Slide 8

This talk is about people


Slide 9


Slide 10


Slide 11


Slide 12


Slide 13


Slide 14


Slide 15


Slide 16

The Goal


Slide 17

Iterate as fast as possible


Slide 18

Regardless of how many people are participating


Slide 19

Deployment


Slide 20

How It Starts


Slide 21

Deployment: the early days
- Get a few people together in slack/irc/etc.
- Merge up the code
- Run the tests
- Manually test it in stage
- Cross your fingers


Slide 22


Slide 23


Slide 24

Things get slower...
- Tests take longer to run
- More hosts = longer downloads
- More developers = more eyeballs
- More features = more code


Slide 25

The Problem: Humans Are Fallible


Slide 26

The Problem: Humans Are Fallible
“…oh @$#&”


Slide 27


Slide 28

The Problem, With Math
Assume:
- Every change has a chance of success: 98%
  - That means no test failures, no reverts, etc.
- Every deploy has a number of changes: n
- Any failure in the pipeline invalidates the deploy
Let’s figure out the probability of a successful deployment: p


Slide 29

The Problem, With Math
- Only you: p = .98 (98%)
- You and a friend: p = .98 * .98 = .96 (96%)
- You and nine co-workers: p = .98 * .98 * .98 * … * .98 = .82 (82%)


Slide 30

The Problem, With Math
p = (.98)^n


Slide 31

The Problem, With Math
p = (.98)^n
exponential decay!
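
The decay on these slides can be checked numerically. The following is a minimal sketch, assuming (as the slides do) that every change succeeds independently with probability 0.98; the function name `deploy_success_probability` is made up for illustration and is not from the talk.

```python
def deploy_success_probability(n_changes: int, per_change_success: float = 0.98) -> float:
    """Probability that every one of n independent changes succeeds: p = s^n."""
    if n_changes < 0:
        raise ValueError("n_changes must be non-negative")
    return per_change_success ** n_changes


# The three cases from the slides, plus two larger deploys to show the decay.
for n in (1, 2, 10, 50, 100):
    p = deploy_success_probability(n)
    print(f"n = {n:3d}  ->  p = {p:.2f}")
```

With these assumptions, a deploy carrying 50 changes is more likely to fail than to succeed.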


Slide 32


Slide 33

This doesn’t scale!
- More developers = more changes
- More changes = longer deploys
- Longer deploys = less time to develop
- Less time to develop = slower to iterate
- Slower to iterate != the goal


Slide 34

Mitigating Exponential Decay
p = (.98)^n


Slide 35

Mitigating Exponential Decay
p = (.98)^n


Slide 36


Slide 37

Making it harder to screw up
- Write more tests
- Write better tests
- Get better code reviews
- Get better infrastructure
- Switch programming languages
- Use better tools


Slide 38

Just write better software and stop making mistakes!


Slide 39

PROBLEM SOLVED


Slide 40


Slide 41

The Real World
- Testing builds confidence in our changes
- Testing does not protect you from failure
- Better tools, tests, and infrastructure can raise our success rates


Slide 42

Mitigating Exponential Decay
p = (.98)^n


Slide 43

Mitigating Exponential Decay
p = (.98)^n


Slide 44

Service-Oriented Architecture
- Large monolith > smaller services
- Services communicate over network
  - Usually HTTP, but you can do RPC, SOAP, etc.
- Service = independent code base
- Independent deployments


Slide 45

Service-Oriented Architecture
Benefits
- Smaller code bases = upper bound to n
- Failure domains become isolated
- Technology independence
- Federated responsibility
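
The "upper bound to n" benefit can be made concrete by inverting the earlier formula: for a target deploy success probability, the largest allowable n is floor(ln(target) / ln(0.98)). The sketch below assumes the same 98% per-change success rate; the function name and the 90% target are illustrative choices, not figures from the talk.

```python
import math


def max_changes_for_target(target_p: float, per_change_success: float = 0.98) -> int:
    """Largest n such that per_change_success ** n >= target_p."""
    if not (0.0 < target_p <= 1.0) or not (0.0 < per_change_success < 1.0):
        raise ValueError("probabilities must lie in (0, 1]")
    return math.floor(math.log(target_p) / math.log(per_change_success))


# To keep deploys succeeding at least 90% of the time, each deploy
# may carry only a handful of changes.
n_max = max_changes_for_target(0.90)
print(n_max, round(0.98 ** n_max, 3), round(0.98 ** (n_max + 1), 3))
```

A smaller code base with fewer contributors keeps each deploy under a bound like this without anyone having to enforce it by hand.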


Slide 46

Service-Oriented Architecture
Drawbacks
- Everything becomes decoupled
- Function calls start looking like HTTP requests
- Versioning can be a nightmare
- Tracking dependencies is hard
- Data consistency becomes challenging
- End-to-end testing becomes hard(er), if not impossible


Slide 47

SOA scales people, not code.


Slide 48

Conquering SOA
With the monolith, it’s easy to focus on mean time between failures (MTBF)


Slide 49

Conquering SOA
In a SOA, focus on mean time to recovery (MTTR)


Slide 50

Conquering SOA
- Fail fast
- Anticipate failure
- Leverage iteration speed to recover fast


Slide 51

Conquering SOA
- Treat everything as distributed
  - That means everything will fail
- Use timeouts, retries
- Find ways to degrade gracefully
- Fail fast & isolated
- Don’t rely on synchronous processes
- Prepare for eventual consistency
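
One way to combine "use timeouts, retries" with "degrade gracefully" is sketched below. It is not code from the talk: `call_with_fallback`, its parameters, and the simulated `flaky_reviews_service` are all hypothetical, and the remote call is stood in for by a plain Python callable so the example runs without a network.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_fallback(
    fetch: Callable[[float], T],
    fallback: T,
    *,
    timeout_s: float = 0.5,
    attempts: int = 3,
    backoff_s: float = 0.1,
) -> T:
    """Call fetch(timeout_s) up to `attempts` times; return `fallback` if all fail."""
    for attempt in range(attempts):
        try:
            # The callee is expected to honour the timeout and raise on expiry.
            return fetch(timeout_s)
        except (TimeoutError, ConnectionError):
            if attempt < attempts - 1:
                # Exponential backoff between retries.
                time.sleep(backoff_s * 2 ** attempt)
    # Degrade gracefully: serve a default instead of failing the whole request.
    return fallback


# Simulated dependency that times out twice and then answers.
calls = {"n": 0}


def flaky_reviews_service(timeout_s: float) -> list[str]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated slow dependency")
    return ["review-1", "review-2"]


print(call_with_fallback(flaky_reviews_service, fallback=[], backoff_s=0))
```

Bounding both the per-call timeout and the number of attempts keeps the worst-case latency of the caller predictable, which is what lets a failure stay isolated.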


Slide 52

Reaping the Benefits
- Smaller failure domains
- Fewer people & changes to manage
- Deploys get smaller
- Deploys get faster
- Deploys become continuous


Slide 53

Reaping the Benefits
Smaller changes
- means smaller code reviews
- means faster validation
- means smaller blast radius
- means faster iteration


Slide 54

Continuous Delivery
- Everyone works against master branch
- Master is deployed when commits are added
- Deployment gated by tests
- Monitoring knows something is wrong before you do!


Slide 55

PROBLEM SOLVED


Slide 56

Testing


Slide 57

Tests are hard to get right.


Slide 58


Slide 59


Slide 60


Slide 61


Slide 62


Slide 63


Slide 64

How can we do better?


Slide 65


Slide 66

“Not Recommended” Tests


Slide 67

“Not Recommended” Tests
If a test fails on master:
- a feature is broken on the live website, or
- your test sucks and you should ditch it
In either case, we disable it
- Ticket is created
- Developers can fix it later
- or just bin it and start fresh
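
The policy on this slide (disable any test that fails on master, then file a ticket) could be sketched as follows. This is an illustration under assumptions, not Yelp's tooling: the `Quarantine` class, its method names, and the ticket text are invented for the example, and test results are modelled as a plain name-to-passed mapping.

```python
from dataclasses import dataclass, field


@dataclass
class Quarantine:
    """Tracks tests disabled after failing on master (hypothetical helper)."""

    disabled: set[str] = field(default_factory=set)
    tickets: list[str] = field(default_factory=list)

    def record_master_run(self, results: dict[str, bool]) -> None:
        """Disable every newly failing test and open one ticket for each."""
        for name, passed in sorted(results.items()):
            if not passed and name not in self.disabled:
                self.disabled.add(name)
                self.tickets.append(f"Fix or delete not-recommended test: {name}")

    def runnable(self, all_tests: list[str]) -> list[str]:
        """Tests that still gate deployment."""
        return [t for t in all_tests if t not in self.disabled]


quarantine = Quarantine()
quarantine.record_master_run({"test_search": True, "test_checkout": False})
print(quarantine.runnable(["test_search", "test_checkout"]))
print(quarantine.tickets)
```

The ticket keeps the failure visible while the disabled test stops blocking everyone else's deploys.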


Slide 68

Reliable tests >> test coverage.


Slide 69

Don’t always run all the tests!


Slide 70

Tests of external services should be monitoring


Slide 71

Define your boundaries.


Slide 72

yelp.com/dataset_challenge
Academic dataset from 10 cities in 4 countries!
- 61K businesses
- 61K checkin-sets
- 481K business attributes
- 1.6M reviews
- 366K users
- 2.8M edge social-graph
- 495K tips
Your academic project, research or visualizations, submitted by Dec 31, 2015 = $5,000 prize + $1,000 for publication + $500 for presenting*
*See full terms on website


Slide 73

@YelpEngineering
YelpEngineers
engineeringblog.yelp.com
github.com/yelp


Slide 74

yelp.com/careers


Slide 75

Questions?

