# W3 Automated Machine Learning

# Why use AutoML

  • Reduces time-to-market of the product, because creating the model takes fewer iterations
  • Makes a lack of particular ML skillsets in a team less of a concern
  • Lets you iterate and experiment quickly
  • Makes better use of scarce resources and skillsets
  • Lets experts focus on harder tasks that involve domain knowledge

# AutoML workflow

  • AutoML aims to automate the process of building models
  • Steps of the workflow
    • You provide a labelled dataset, from which it detects the type of problem to solve (regression, classification, etc.)
    • It then selects an algorithm
    • It applies transformations and preprocessing
    • It selects various hyperparameters and configurations to train and test the models
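
The first and last workflow steps can be illustrated with a small toy sketch in plain Python. It is not how any particular AutoML service works internally: the function names, the `max_classes` threshold, and the search-space keys are assumptions made up for this example.

```python
from itertools import product


def infer_problem_type(labels, max_classes=20):
    """Guess the problem type from the target column (toy heuristic).

    max_classes is an assumed cut-off, not a value taken from any real tool.
    """
    distinct = set(labels)
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in labels
    )
    if len(distinct) == 2:
        return "binary_classification"
    if all_numeric and len(distinct) > max_classes:
        return "regression"
    return "multiclass_classification"


def candidate_configs(search_space):
    """Expand a dict of option lists into every combination (a plain grid)."""
    names = sorted(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical search space: two algorithms crossed with two preprocessing choices.
space = {"algorithm": ["xgboost", "linear"], "scaler": ["standard", "none"]}
configs = list(candidate_configs(space))
```

Each entry in `configs` is one candidate to train and evaluate. Real systems typically search this space more selectively than an exhaustive grid.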

# SageMaker Autopilot

  • Fully transparent: it shares the code and notebooks for all of the processing, so the results are reproducible
  • Steps
    • Upload dataset to S3
    • Provide Autopilot with the target variable
    • It goes through the entire AutoML workflow
    • It returns two notebooks: the data exploration notebook (what it learned and potential issues with the data) and the candidate generation notebook (each preprocessing step, algorithm and hyperparameter choice)
  • Available interfaces
    • AWS CLI
    • AWS SDK
    • Amazon SageMaker Python SDK
    • SageMaker Studio
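
As a rough sketch of the steps above through the AWS SDK for Python, the helper below only builds the request dictionary for the `create_auto_ml_job` operation of the boto3 SageMaker client. It does not call AWS. The helper name, the bucket, the role ARN, the target column and the candidate limit are all placeholder assumptions, not values from these notes.

```python
def build_automl_job_request(job_name, train_s3_uri, target_column,
                             output_s3_uri, role_arn, max_candidates=10):
    """Assemble keyword arguments for create_auto_ml_job (hypothetical helper)."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                # Step 1: the dataset already uploaded to S3.
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                    }
                },
                # Step 2: the target variable Autopilot should predict.
                "TargetAttributeName": target_column,
            }
        ],
        # Where notebooks, transformed data and models are written.
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "AutoMLJobConfig": {
            "CompletionCriteria": {"MaxCandidates": max_candidates}
        },
        "RoleArn": role_arn,
    }


# All values below are placeholders for illustration only.
request = build_automl_job_request(
    job_name="example-autopilot-job",
    train_s3_uri="s3://example-bucket/autopilot/train/",
    target_column="sentiment",
    output_s3_uri="s3://example-bucket/autopilot/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)

# With AWS credentials configured, the job could then be started with:
#   import boto3
#   boto3.client("sagemaker").create_auto_ml_job(**request)
```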

# TFIDF vectorizer for text

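  • $\mathrm{tf}(t, d) = \frac{f_{t, d}}{\sum_{t' \in d} f_{t', d}}$ (count of term $t$ in document $d$, divided by the total number of terms in $d$)
  • $\mathrm{idf}(t, D) = \log\left(\frac{|D|}{|\{d \in D : t \in d\}|}\right)$ (down-weights terms that appear in many documents of the corpus)
  • $\text{tf-idf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)$
  • Where $t$ = term, $d$ = document, $D$ = corpus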
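
The formulas above can be checked with a minimal plain-Python sketch. Documents are assumed to be pre-tokenized lists of words, and the tiny corpus is invented for illustration. Returning 0.0 for a term that appears in no document is a choice made here to avoid dividing by zero. Library vectorizers such as scikit-learn's `TfidfVectorizer` use smoothed variants of idf by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf(term, doc):
    """Term frequency: occurrences of term divided by the document length."""
    return Counter(doc)[term] / len(doc)


def idf(term, corpus):
    """Inverse document frequency: log(|D| / number of docs containing term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        return 0.0  # assumption: unseen terms get zero weight
    return math.log(len(corpus) / containing)


def tf_idf(term, doc, corpus):
    """Product of the two quantities above."""
    return tf(term, doc) * idf(term, corpus)


# Made-up corpus of three tokenized documents.
corpus = [
    ["great", "product", "great", "price"],
    ["poor", "product"],
    ["great", "service"],
]

score_great = tf_idf("great", corpus[0], corpus)  # 0.5 * log(3 / 2)
score_price = tf_idf("price", corpus[0], corpus)  # 0.25 * log(3 / 1)
```

Although "great" occurs twice as often as "price" in the first document, "price" scores higher because it appears in only one of the three documents.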

# Autopilot results

  • Data transformation and job config code
  • Data exploration and candidate notebooks
  • Transformed data (train, val data)
  • Models
  • Metrics report

# Model hosting

  • Involves a stack containing a proxy, a web server, the serving code and the model
  • With Autopilot, you choose the instance count and the Docker container for inference, and it takes care of creating the endpoint
  • A PipelineModel contains several containers
    • Data transformation - built from the model that was trained to transform the data
    • Algorithm - built from the trained model of the best-performing algorithm, used to predict
    • Inverse label transformer - converts the numerical intermediate prediction back to labels
  • All of the above are hosted on the same endpoint as an inference pipeline
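
The three-container chain can be mimicked locally with a toy sketch. Every class here is a made-up stand-in (keyword counts instead of a trained transformer, fixed weights instead of a trained model), meant only to show how one request flows through the stages in order. None of it is SageMaker code.

```python
class FeatureTransformer:
    """Stage 1: turns raw text into numeric features (keyword counts)."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary

    def transform(self, text):
        tokens = text.lower().split()
        return [tokens.count(word) for word in self.vocabulary]


class ScoreModel:
    """Stage 2: stand-in for the trained algorithm; returns a class index."""

    def __init__(self, weights):
        self.weights = weights

    def predict(self, features):
        score = sum(w * x for w, x in zip(self.weights, features))
        return 1 if score > 0 else 0


class InverseLabelTransformer:
    """Stage 3: maps the numeric prediction back to a readable label."""

    def __init__(self, labels):
        self.labels = labels

    def inverse_transform(self, class_index):
        return self.labels[class_index]


class InferencePipeline:
    """Runs the three stages in sequence behind a single predict() call."""

    def __init__(self, transformer, model, label_decoder):
        self.transformer = transformer
        self.model = model
        self.label_decoder = label_decoder

    def predict(self, raw_input):
        features = self.transformer.transform(raw_input)
        class_index = self.model.predict(features)
        return self.label_decoder.inverse_transform(class_index)


pipeline = InferencePipeline(
    FeatureTransformer(["great", "poor"]),
    ScoreModel([1.0, -1.0]),
    InverseLabelTransformer(["negative", "positive"]),
)
```

Calling `pipeline.predict("great product at a great price")` returns `"positive"`. The caller sees only raw text going in and a label coming out, which is the point of hosting all stages behind one endpoint.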
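
```python
from sagemaker.automl.automl import AutoML

automl = AutoML(target_attribute="", ...)  # label column name and remaining arguments elided
automl.fit(inputs=...)  # training data location elided
```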