# W3 Automated Machine Learning

# Why use AutoML

- Reduces the product's time-to-market, because fewer iterations are needed to create the model
- A lack of particular ML skill sets on a team is no longer a blocker
- Enables quick iteration and experimentation
- Makes better use of scarce resources and skill sets
- Lets experts focus on harder tasks that involve domain knowledge

AutoML aims to automate the process of building models.
Steps of the workflow:
- You provide a labelled dataset, from which it detects the type of problem to solve (regression, classification, etc.)
- It then selects an algorithm
- It applies transformations and preprocessing
- It selects various hyperparameters and configurations to train and test the models
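
The workflow steps above can be sketched in plain Python. This is a toy illustration, not how any real AutoML service works: the problem-type heuristic, the two preprocessing options, the k-nearest-neighbours model, and all names (`detect_problem_type`, `auto_ml`, `PREPROCESSORS`, `K_VALUES`) are assumptions made up for this example.

```python
from collections import Counter
from itertools import product
from statistics import mean


def detect_problem_type(targets):
    """Guess the task from the label column (a heuristic assumed for this sketch)."""
    numeric = all(isinstance(t, (int, float)) and not isinstance(t, bool) for t in targets)
    if numeric and (any(isinstance(t, float) for t in targets) or len(set(targets)) > 10):
        return "regression"
    return "classification"


def fit_identity(train_rows):
    """Preprocessing option 1: leave the features unchanged."""
    return lambda row: list(row)


def fit_min_max(train_rows):
    """Preprocessing option 2: min-max scale each feature using training-set ranges."""
    columns = list(zip(*train_rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return lambda row: [(v - lo) / s for v, lo, s in zip(row, lows, spans)]


def knn_predict(train_x, train_y, x, k, problem_type):
    """k-nearest neighbours: average the neighbours (regression) or take a majority vote."""
    order = sorted(range(len(train_x)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    neighbours = [train_y[i] for i in order[:k]]
    if problem_type == "regression":
        return mean(neighbours)
    return Counter(neighbours).most_common(1)[0][0]


def evaluate(predictions, truth, problem_type):
    """Higher is better: accuracy for classification, negative MSE for regression."""
    if problem_type == "regression":
        return -mean((p - t) ** 2 for p, t in zip(predictions, truth))
    return mean(1.0 if p == t else 0.0 for p, t in zip(predictions, truth))


PREPROCESSORS = {"none": fit_identity, "min_max": fit_min_max}  # hypothetical search space
K_VALUES = (1, 3, 5)                                            # hypothetical hyperparameter grid


def auto_ml(rows, targets, test_fraction=0.25):
    """Try every preprocessing/hyperparameter combination and keep the best candidate."""
    problem_type = detect_problem_type(targets)
    split = int(len(rows) * (1 - test_fraction))
    train_x, test_x = rows[:split], rows[split:]
    train_y, test_y = targets[:split], targets[split:]
    best = None
    for (prep_name, fit_prep), k in product(PREPROCESSORS.items(), K_VALUES):
        transform = fit_prep(train_x)
        scaled_train = [transform(r) for r in train_x]
        predictions = [knn_predict(scaled_train, train_y, transform(r), k, problem_type)
                       for r in test_x]
        candidate = {"problem_type": problem_type, "preprocessing": prep_name, "k": k,
                     "score": evaluate(predictions, test_y, problem_type)}
        if best is None or candidate["score"] > best["score"]:
            best = candidate
    return best


# Made-up data: two numeric features and a string label.
rows = [[1.0, 200.0], [1.2, 210.0], [0.9, 190.0], [5.0, 800.0],
        [5.2, 790.0], [4.8, 810.0], [1.1, 205.0], [5.1, 805.0]]
targets = ["small", "small", "small", "large", "large", "large", "small", "large"]
best = auto_ml(rows, targets)
print(best)
```

A real system would search over many algorithm families and use cross-validation rather than a single split; the loop structure (detect, preprocess, train, score, keep the best) is the part that mirrors the steps above.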

Amazon SageMaker Autopilot is fully transparent: it shares reproducible code and notebooks for all of its processing.
Steps
- Upload dataset to S3
- Provide Autopilot with the target variable
- It goes through the entire AutoML workflow
- It returns two notebooks: the data exploration notebook (what it learned and potential issues with the data) and the candidate generation notebook (each preprocessing step, algorithm, and hyperparameter choice)
Available interfaces
- AWS CLI
- AWS SDK
- Amazon SageMaker Python SDK
- SageMaker Studio
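
As a rough sketch of the Autopilot steps above through the AWS SDK for Python (boto3), the snippet below only assembles the request for the `CreateAutoMLJob` API. The helper `build_autopilot_request`, the job name, bucket paths, target column, and role ARN are all hypothetical placeholders; the actual call is left in comments because it needs AWS credentials.

```python
def build_autopilot_request(job_name, s3_input_uri, s3_output_uri, target_column, role_arn):
    """Assemble keyword arguments for the SageMaker CreateAutoMLJob API."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            # The dataset must already be uploaded to S3.
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": s3_input_uri}},
            # The target variable Autopilot should learn to predict.
            "TargetAttributeName": target_column,
        }],
        "OutputDataConfig": {"S3OutputPath": s3_output_uri},
        "RoleArn": role_arn,
    }


# Hypothetical values, for illustration only.
request = build_autopilot_request(
    job_name="example-autopilot-job",
    s3_input_uri="s3://example-bucket/autopilot/input/",
    s3_output_uri="s3://example-bucket/autopilot/output/",
    target_column="sentiment",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)

# To launch the job (requires the boto3 package and AWS credentials):
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_auto_ml_job(**request)
#   artifacts = sm.describe_auto_ml_job(AutoMLJobName=request["AutoMLJobName"])["AutoMLJobArtifacts"]
#   artifacts["DataExplorationNotebookLocation"]       # data exploration notebook
#   artifacts["CandidateDefinitionNotebookLocation"]   # candidate notebook
```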

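TF-IDF (term frequency-inverse document frequency):

$$\mathrm{tf}(t, d) = \frac{f_{t, d}}{\sum_{t' \in d} f_{t', d}}$$

$$\mathrm{idf}(t, D) = \log\left(\frac{|D|}{|\{d \in D : t \in d\}|}\right)$$

(idf down-weights terms that appear in many documents)

$$\mathrm{tf\text{-}idf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)$$

Where $t$ = term, $d$ = document, $D$ = corpus, and $f_{t, d}$ = number of times $t$ occurs in $d$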
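
The TF-IDF formulas above translate fairly directly into code. The sketch below assumes documents are already tokenized into lists of words, uses the natural logarithm (the notes do not state a base), and returns 0 for a term that appears in no document, which is a convention chosen for this example. The corpus is made up.

```python
import math
from collections import Counter


def tf(term, doc):
    """Term frequency: count of the term divided by the total number of terms in the document."""
    return Counter(doc)[term] / len(doc)


def idf(term, corpus):
    """Inverse document frequency: log of (corpus size / number of documents containing the term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # convention assumed here; the formula itself is undefined in this case
    return math.log(len(corpus) / containing)


def tf_idf(term, doc, corpus):
    """Product of the two quantities above."""
    return tf(term, doc) * idf(term, corpus)


# Made-up corpus of three tokenized documents.
corpus = [
    ["great", "product", "great", "value"],
    ["poor", "product"],
    ["great", "product", "service"],
]

print(tf_idf("great", corpus[0], corpus))    # frequent in this document, absent from one other
print(tf_idf("product", corpus[0], corpus))  # appears in every document, so idf is log(1) = 0
```

A term such as "product" that occurs in every document gets a weight of zero, which is the scaling effect the idf factor is meant to provide.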

Model hosting involves a stack containing a proxy, a web server, serving code, and the model.
With Autopilot, you choose the instance count and the Docker container for inference, and it takes care of creating the endpoint.
A PipelineModel contains several containers:
- Data transformation - built from the model that was trained to transform the data
- Algorithm - built from the trained model of the best-performing algorithm, used to predict
- Inverse label transformer - converts the numerical intermediate prediction back to labels
All of the above are hosted on the same endpoint as an inference pipeline.
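
To make the flow through the three containers concrete, here is a toy stand-in written as plain Python functions chained together. It does not use SageMaker; the vocabulary, weights, labels, and function names are invented for illustration, and a real pipeline would run each stage in its own container behind one endpoint.

```python
VOCABULARY = ["great", "poor", "okay"]           # hypothetical feature vocabulary
LABELS = ["negative", "neutral", "positive"]     # hypothetical class labels
CLASS_WEIGHTS = [                                # hypothetical linear model, one row per class
    [-1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
    [2.0, -1.0, 0.0],
]


def data_transform(text):
    """Stage 1: turn raw text into numeric features (word counts over the vocabulary)."""
    words = text.lower().split()
    return [words.count(v) for v in VOCABULARY]


def algorithm(features):
    """Stage 2: score each class and return the index of the highest-scoring one."""
    scores = [sum(w * f for w, f in zip(weights, features)) for weights in CLASS_WEIGHTS]
    return scores.index(max(scores))


def inverse_label_transform(class_index):
    """Stage 3: map the numeric prediction back to a human-readable label."""
    return LABELS[class_index]


def run_pipeline(payload, stages):
    """Pass the request through each stage in order, like containers behind one endpoint."""
    for stage in stages:
        payload = stage(payload)
    return payload


STAGES = [data_transform, algorithm, inverse_label_transform]
print(run_pipeline("great great product", STAGES))
print(run_pipeline("poor service", STAGES))
```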

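```python
import sagemaker

automl = sagemaker.automl.automl.AutoML(target_attribute_name="", ...)  # the SDK parameter is target_attribute_name; other arguments elided
automl.fit(inputs=...)  # training data input, elided in the notes
```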