# W3 Automated Machine Learning
# Why use AutoML
- Reduces time-to-market of the product, since fewer iterations are needed to create the model
- A lack of particular ML skill sets in a team is less of a concern
- Ability to iterate and experiment quickly
- Ability to make the most of scarce resources and skill sets
- Lets experts focus on harder tasks that involve domain knowledge
# AutoML workflow
- AutoML aims to automate the process of building models
- Steps of the workflow
  - You provide a labelled dataset, from which it detects the type of problem to solve (regression, classification, etc.)
  - It then selects an algorithm
  - It applies transformations and preprocessing
  - It selects various hyperparameters and configurations to train and test the models
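
The workflow steps above can be illustrated with a toy, standard-library-only sketch. Everything in it is an assumption made for illustration: the `detect_problem_type` heuristic (string labels or at most 10 distinct values means classification), the min-max scaling, the two candidate algorithms (a label-only baseline and k-nearest neighbours) and the sample data are hypothetical, and real AutoML services search far larger spaces.

```python
import math
from collections import Counter

def detect_problem_type(labels, max_classes=10):
    """Hypothetical heuristic: string labels or few distinct values -> classification."""
    if any(isinstance(y, str) for y in labels) or len(set(labels)) <= max_classes:
        return "classification"
    return "regression"

def fit_scaler(rows):
    """Preprocessing: learn per-column minimum and range for min-max scaling."""
    return [(min(c), (max(c) - min(c)) or 1.0) for c in zip(*rows)]

def scale(rows, scaler):
    return [[(v - lo) / span for v, (lo, span) in zip(row, scaler)] for row in rows]

def predict(algo, params, train_x, train_y, x, problem):
    if algo == "knn":
        nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], x))
        ys = [y for _, y in nearest[: params["k"]]]
    else:  # "baseline": ignore the features and use all training labels
        ys = train_y
    if problem == "classification":
        return Counter(ys).most_common(1)[0][0]  # majority vote
    return sum(ys) / len(ys)                     # mean for regression

def score(preds, truth, problem):
    """Accuracy for classification, negative mean squared error for regression."""
    if problem == "classification":
        return sum(p == t for p, t in zip(preds, truth)) / len(truth)
    return -sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth)

def auto_ml(train_x, train_y, val_x, val_y, candidates):
    problem = detect_problem_type(train_y)           # 1. detect the problem type
    scaler = fit_scaler(train_x)                     # 2. fit preprocessing on training data
    train_s, val_s = scale(train_x, scaler), scale(val_x, scaler)
    best = None
    for algo, params in candidates:                  # 3. try each algorithm / config
        preds = [predict(algo, params, train_s, train_y, x, problem) for x in val_s]
        s = score(preds, val_y, problem)
        if best is None or s > best["score"]:        # 4. keep the best on validation data
            best = {"problem": problem, "algorithm": algo, "params": params, "score": s}
    return best

# Hypothetical toy dataset: two numeric features, string labels.
train_x = [[1.0, 200.0], [1.2, 210.0], [0.9, 190.0], [3.0, 800.0], [3.2, 820.0], [2.9, 790.0]]
train_y = ["neg", "neg", "neg", "pos", "pos", "pos"]
val_x, val_y = [[1.1, 205.0], [3.1, 810.0]], ["neg", "pos"]
candidates = [("baseline", {}), ("knn", {"k": 1}), ("knn", {"k": 3})]

result = auto_ml(train_x, train_y, val_x, val_y, candidates)
print(result)
```

The loop keeps the first candidate with the highest validation score, so ties go to the earlier (here, simpler) configuration.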
# SageMaker Autopilot
- Fully transparent: it shares the code and notebooks for all processing steps, so the results are reproducible
- Steps
  - Upload the dataset to S3
  - Provide Autopilot with the target variable
  - It goes through the entire AutoML workflow
  - It returns 2 notebooks: the data exploration notebook (what it learned and potential issues with the data) and the candidate generation notebook (each preprocessing step, algorithm and hyperparameter choice)
- Interfaces available
  - AWS CLI
  - AWS SDK
  - Amazon SageMaker Python SDK
  - SageMaker Studio
# TF-IDF vectorizer for text
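- $\mathrm{tf}(t, d) = \frac{f_{t, d}}{\sum_{t' \in d} f_{t', d}}$
- $\mathrm{idf}(t, D) = \log\left(\frac{|D|}{|\{d \in D : t \in d\}|}\right)$ (down-weights terms that appear in many documents)
- $\text{tf-idf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)$
- Where $t$ = term, $d$ = document, $D$ = corpus, and $f_{t, d}$ = number of times $t$ occurs in $d$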
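
As a minimal sketch, the three formulas above translate directly into plain Python. The token-list representation of documents and the toy corpus are assumptions for illustration, and the natural logarithm is used because the notes do not fix a base.

```python
import math
from collections import Counter

def tf(term, doc):
    """Term frequency: occurrences of `term` in `doc` divided by the document length."""
    return Counter(doc)[term] / len(doc)

def idf(term, corpus):
    """Inverse document frequency: log(|D| / number of documents containing `term`)."""
    containing = sum(1 for doc in corpus if term in doc)
    # Assumes the term occurs in at least one document; otherwise this divides by zero.
    return math.log(len(corpus) / containing)

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

# Hypothetical toy corpus: each document is a list of tokens.
corpus = [
    ["great", "product", "great", "price"],
    ["poor", "product"],
    ["great", "service"],
]
doc = corpus[0]

print(tf("great", doc))                # "great" is 2 of the 4 tokens in doc
print(idf("price", corpus))            # "price" appears in 1 of 3 documents
print(tf_idf("product", doc, corpus))  # frequent in the corpus, so weighted lower
```

Library vectorizers often differ from this textbook form. For example, scikit-learn's `TfidfVectorizer` smooths the idf term and L2-normalizes each document vector by default, so its numbers will not match this sketch exactly.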
# Autopilot results
- Data transformation and job config code
- Data exploration and candidate notebooks
- Transformed data (training and validation sets)
- Models
- Metrics report
# Model hosting
- Involves a stack containing a proxy, a web server, serving code and the model
- With Autopilot, you choose the instance count and the Docker container for inference, and it takes care of creating the endpoint
- PipelineModel contains several containers
  - Data transformation: built from the model that was trained to transform the data
  - Algorithm: built from the trained model of the best-performing algorithm, used to predict
  - Inverse label transformer: converts the numerical intermediate prediction back to labels
- All of the above are hosted on the same endpoint as an inference pipeline
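
```python
import sagemaker.automl.automl
# target_attribute names the label column; "..." marks arguments elided in these notes
automl = sagemaker.automl.automl.AutoML(target_attribute="", ...)
automl.fit(inputs=...)
```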
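
The three-container PipelineModel from the Model hosting list can be pictured as function composition behind a single entry point. The sketch below is a plain-Python analogy, not the SageMaker API: `InferencePipeline`, the word lists, the rule-based "algorithm" and the label mapping are all hypothetical stand-ins for the trained containers.

```python
class InferencePipeline:
    """Chains stages so each stage's output feeds the next, behind one predict() call."""

    def __init__(self, stages):
        self.stages = stages

    def predict(self, payload):
        for stage in self.stages:
            payload = stage(payload)
        return payload

# Hypothetical stand-ins for the three containers.
POSITIVE = {"great", "good"}
NEGATIVE = {"poor", "bad"}
LABELS = {0: "negative", 1: "neutral", 2: "positive"}

def data_transform(text):
    """Data transformation: raw text -> numeric features (positive and negative word counts)."""
    tokens = text.lower().split()
    return [sum(t in POSITIVE for t in tokens), sum(t in NEGATIVE for t in tokens)]

def algorithm(features):
    """Algorithm: numeric features -> numeric class index."""
    pos, neg = features
    if pos > neg:
        return 2
    if neg > pos:
        return 0
    return 1

def inverse_label_transform(class_index):
    """Inverse label transformer: numeric prediction -> human-readable label."""
    return LABELS[class_index]

pipeline = InferencePipeline([data_transform, algorithm, inverse_label_transform])
print(pipeline.predict("Great product, good price"))  # prints "positive"
```

The caller sends raw text and receives a label, without seeing the intermediate numeric features or class index, which mirrors hosting all three containers on one endpoint.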