
Demystifying Machine Learning

Slide 0

@dhianadeva MACHINE LEARNING FOR EVERYONE Demystifying machine learning!


Slide 1

AGENDA
Goal: Encourage you to start a machine learning project. Today!
● About me
● About you
● Machine Learning
● Problems
● Design
● Algorithms
● Evaluation
● Code snippets
● Pay-as-you-go
● Competitions


Slide 2

ABOUT ME Electronics Engineering, Software Development and Data Science… Why not?


Slide 3

DHIANA DEVA


Slide 4

NEURALTB


Slide 5

CERN


Slide 6

NEURALRINGER


Slide 7

DJBRAZIL


Slide 8

HIGGS CHALLENGE


Slide 9

ABOUT YOU You can do it!


Slide 10

FOR ALL


Slide 11

MASSIVE ONLINE OPEN COURSES


Slide 12

OPEN SOURCE TOOLS


Slide 13

OPEN SOURCE PYTHON TOOLS


Slide 14

PAY-AS-YOU-GO SERVICES


Slide 15

MACHINE LEARNING Learning, machine learning!


Slide 16

EXPECTATIONS


Slide 17

REALITY


Slide 18

FEATURE EXTRACTION Item → { Feature 1, Feature 2, …, Feature N }


Slide 19

FEATURE SPACE


Slide 20

SUPERVISED LEARNING
Training: Items → Feature Vectors + Labels → Machine Learning Algorithm → Predictive Model
  458316,86.513,24.312,64.983,65.8    label 623
  458318,-999.0,91.803,113.007,120.   label 150
  58317,135.493,2.204,101.966,46.5    label 74
Prediction: New item → Feature Vector → Predictive Model → Expected Label
  458316,86.513,24.312,64.983,65.8    expected label 133
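The two flows on this slide (learn from labelled feature vectors, then predict a label for a new item) can be sketched in a few lines of Python. This is only an illustration: it uses a 1-nearest-neighbour rule, which the slide does not prescribe, and the feature vectors, labels, and the `fit`/`predict` names are made up for the example.

```python
import math


def fit(feature_vectors, labels):
    """'Training' a 1-nearest-neighbour model just stores the labelled examples."""
    return list(zip(feature_vectors, labels))


def predict(model, feature_vector):
    """Return the label of the stored example closest to the new feature vector."""
    _, nearest_label = min(
        model, key=lambda example: math.dist(example[0], feature_vector)
    )
    return nearest_label


# Made-up feature vectors and labels, for illustration only.
train_vectors = [(1.0, 1.2), (0.8, 1.0), (5.0, 5.5), (5.2, 4.9)]
train_labels = ["A", "A", "B", "B"]

model = fit(train_vectors, train_labels)   # training flow
predicted_label = predict(model, (4.8, 5.1))  # prediction flow for a new item
print(predicted_label)
```

Real projects would normally swap the hand-written model for a library estimator, but the shape stays the same: fit on (feature vector, label) pairs, then predict on unseen feature vectors.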


Slide 21

UNSUPERVISED LEARNING
Training: Items → Feature Vectors → Machine Learning Algorithm → Predictive Model
  458316,86.513,24.312,64.983,65.8
  58317,135.493,2.204,101.966,46.5
  458318,-999.0,91.803,113.007,120.
Prediction: New item → Feature Vector → Predictive Model → Better Representation
  458316,86.513,24.312,64.983,65.8
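As a rough illustration of learning without labels, the sketch below runs a few iterations of k-means (one of the algorithms listed later in the deck) and uses the cluster index as the "better representation" of each item. The points, the initial centroids, and the function names are assumptions made for the example, not something taken from the slides.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def update(points, assignments, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == k]
        if not members:
            new_centroids.append(old)  # an empty cluster keeps its old centroid
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


def k_means(points, initial_centroids, iterations=10):
    """Alternate assignment and update steps for a fixed number of iterations."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        assignments = assign(points, centroids)
        centroids = update(points, assignments, centroids)
    return centroids, assign(points, centroids)


# Made-up, unlabelled feature vectors forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
centroids, clusters = k_means(points, initial_centroids=[points[0], points[3]])
print(clusters)
```

The initial centroids are fixed here so the run is repeatable; practical implementations usually pick them randomly and stop when the assignments no longer change.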


Slide 22

MODELS


Slide 23

BIOLOGICAL MOTIVATION


Slide 24

PROBLEMS I've got 99 problems, but machine learning ain't one!


Slide 25

CLASSIFICATION A ? B


Slide 26

CLASSIFICATION


Slide 27

REGRESSION ? 8 15 7 1 11 13 6 3


Slide 28

REGRESSION


Slide 29

CLUSTERING


Slide 30

CLUSTERING


Slide 31

DIMENSIONALITY REDUCTION


Slide 32

DIMENSIONALITY REDUCTION


Slide 33

DESIGN DECISIONS 1, 2 steps!


Slide 34

NORMALIZATION
● z-score
● min-max
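The two normalization schemes named on this slide can be written out directly. This is a minimal sketch for a single feature column, assuming the column is not constant (otherwise both formulas divide by zero); the sample values and function names are made up.

```python
import statistics


def z_score(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Made-up values of one feature across four items.
feature = [2.0, 4.0, 6.0, 8.0]
print(z_score(feature))
print(min_max(feature))
```

Whichever scheme is used, the mean/standard deviation or min/max should be computed on the training data and then reused unchanged for new items.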


Slide 35

TRAINING


Slide 36

REGULARIZATION


Slide 37

RELEVANCE ANALYSIS


Slide 38

CROSS VALIDATION


Slide 39

ALGORITHMS Cheat sheet included!


Slide 40

CHEAT SHEET


Slide 41

CHEAT SHEET


Slide 42

ALGORITHMS PT. I: Linear Regression, Decision Trees, Random Forest


Slide 43

ALGORITHMS PT. II: K-Nearest Neighbors, K-Means


Slide 44

NEURAL NETWORKS


Slide 45

SELF-ORGANIZING MAPS


Slide 46

PRINCIPAL COMPONENTS ANALYSIS


Slide 47

T-SNE


Slide 48

EVALUATION How you doin'?


Slide 49

PRECISION AND ACCURACY


Slide 50

CONFUSION MATRIX
TP = True Positives
TN = True Negatives
FP = False Positives
FN = False Negatives
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-score = 2 * precision * recall / (precision + recall)
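To make the definitions concrete, the sketch below counts the four confusion-matrix cells for a binary problem and derives precision, recall and F1 from them. The label lists are made up, the function names are illustrative, and the code assumes at least one predicted positive and one actual positive so that no denominator is zero.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Apply the slide's formulas to the confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up ground truth and model output (1 = positive class).
actual = [1, 1, 1, 0, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 1, 1, 0, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, tn, fp, fn, precision, recall, f1)
```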


Slide 51

ROC CURVE


Slide 52

A/B TESTS


Slide 53

Code Snippets "Hello, Machine Learning"


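Slide 54

MATLAB 101
    % Load the ovarian cancer sample dataset: x = inputs, y = targets
    [x,y] = ovarian_dataset;
    % Pattern-recognition network with one hidden layer of 5 neurons
    net = patternnet(5);
    % Train; tr records the train/validation/test split
    [net,tr] = train(net,x,y);
    % Run the trained network on the held-out test samples
    testX = x(:,tr.testInd);
    testY = net(testX);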


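Slide 55

MATLAB 201
    % Assumes train_inputs, train_targets and test_inputs already exist in the workspace
    net = patternnet(14);
    % Input preprocessing: min-max scaling, unknown-value handling, then PCA
    net.inputs{1}.processFcns = {'mapminmax', 'fixunknowns', 'processpca'};
    % Third processing function is processpca: drop components below 2% of the variance
    net.inputs{1}.processParams{3}.maxfrac = 0.02;
    net.trainFcn = 'trainlm';     % Levenberg-Marquardt training
    net.performFcn = 'mse';       % mean squared error
    % 70% training, 15% validation, 15% test
    net.divideParam.trainRatio = 70/100;
    net.divideParam.valRatio = 15/100;
    net.divideParam.testRatio = 15/100;
    % Train the configured network, then apply it to the test inputs
    [net, tr] = train(net, train_inputs, train_targets);
    outputs = net(test_inputs);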


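Slide 56

R
    library(randomForest)
    # Read the tab-separated training data
    raw.orig <- read.csv(file="train.txt", header=T, sep="\t")
    # Model Metal as a function of three predictors
    frmla = Metal ~ OTW + AirDecay + Koc
    fit.rf = randomForest(frmla, data=raw.orig)
    # Summary of the fitted forest and per-variable importance
    print(fit.rf)
    importance(fit.rf)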


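Slide 57

SCIKIT LEARN
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # Training data: 'Activity' is the target column, the rest are features
    dataset = pd.read_csv('Data/train.csv')
    target = dataset.Activity.values
    train = dataset.drop('Activity', axis=1).values
    test = pd.read_csv('Data/test.csv').values
    # Random forest with 100 trees, using all available CPU cores
    rf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    rf.fit(train, target)
    # Probability of the second class (column 1) for each test row
    predicted_probs = [x[1] for x in rf.predict_proba(test)]
    importances = rf.feature_importances_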


Slide 58

PAY-AS-YOU-GO SERVICES Amazon Machine Learning


Slide 59

AMAZON MACHINE LEARNING
Five easy steps
1. Upload a CSV dataset to Amazon S3
2. Create a Datasource with metadata about the uploaded dataset
3. Create an ML Model with configurations for model training
4. Create an Evaluation to analyse and tune model efficiency
5. Create a Prediction to use the trained model with new data


Slide 60

EVALUATION


Slide 61

SDKs


Slide 62

DATA SCIENCE COMPETITIONS Challenge accepted!


Slide 63

KAGGLE


Slide 64

SPONSORED


Slide 65

END TO END
TRAIN.CSV → TRAIN → TRAINED.DAT → RUN (with TEST.CSV) → SOLUTION.CSV
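One possible reading of this pipeline, sketched in Python: a `train` step turns `train.csv` into a saved model file, and a `run` step applies that file to `test.csv` and writes `solution.csv`. Everything beyond the file names is an assumption made for illustration: the row layout (id, features, label), the pickle format for `trained.dat`, the 1-nearest-neighbour model, and the tiny made-up dataset.

```python
import csv
import math
import pickle
import tempfile
from pathlib import Path


def train(train_csv, model_path):
    """TRAIN: read labelled rows and save the resulting model to disk."""
    with open(train_csv, newline="") as f:
        rows = list(csv.reader(f))
    # Assumed row layout (no header): id, feature 1..N, label.
    examples = [([float(v) for v in row[1:-1]], row[-1]) for row in rows]
    # A 1-nearest-neighbour "model" is simply the stored examples.
    with open(model_path, "wb") as f:
        pickle.dump(examples, f)


def run(model_path, test_csv, solution_csv):
    """RUN: load the saved model and write one (id, label) row per test item."""
    with open(model_path, "rb") as f:
        examples = pickle.load(f)
    with open(test_csv, newline="") as f_in, \
            open(solution_csv, "w", newline="") as f_out:
        writer = csv.writer(f_out)
        for row in csv.reader(f_in):
            features = [float(v) for v in row[1:]]
            _, label = min(examples, key=lambda ex: math.dist(ex[0], features))
            writer.writerow([row[0], label])


# Demo in a temporary directory with made-up data.
with tempfile.TemporaryDirectory() as workdir:
    work = Path(workdir)
    (work / "train.csv").write_text("1,0.1,0.2,b\n2,0.9,0.8,s\n")
    (work / "test.csv").write_text("3,0.8,0.9\n4,0.2,0.1\n")
    train(work / "train.csv", work / "trained.dat")
    run(work / "trained.dat", work / "test.csv", work / "solution.csv")
    with open(work / "solution.csv", newline="") as f:
        solution = list(csv.reader(f))

print(solution)
```

Keeping training and prediction as separate steps joined only by a file means the model can be trained once and reused on new test files.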


Slide 66

THANK YOU Questions? Dhiana Deva ddeva@thoughtworks.com

