#
W4 Built-in Algorithms
#
Why use built-in algorithms
- Implementations are highly optimized and scalable, with support for GPUs and distributed training
- Lets you focus on domain-specific tasks rather than low-level code
- Trained models can be downloaded and reused
#
Choosing an approach
- If the task is common and well supported, use a built-in algorithm
- If the task is more niche, use script mode: write your own training script with a supported Python framework while SageMaker manages the infrastructure
- For the highest level of customization, bring your own container
#
Built-in examples
- Classification - XGBoost, KNN
- Regression - Linear, XGBoost
- Time series forecasting - DeepAR forecasting (uses RNNs)
- Dimensionality reduction - PCA
- Anomaly detection - Random Cut Forest (RCF)
- Clustering - KMeans
- Topic modeling - Latent Dirichlet Allocation (LDA), Neural Topic Model (NTM)
- Content moderation - Image classification
- Object detection
- Semantic segmentation
- Machine translation
- Text summarization
- Speech to text
- Text classification
#
Text analysis
- Word2Vec
- Converts text into vectors (embeddings)
- Architectures to create embeddings
- Continuous bag of words (CBOW)
- Continuous skip-gram
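The two architectures differ in the direction of prediction: CBOW predicts a center word from its context, skip-gram predicts each context word from the center word. A minimal sketch of the training pairs each one generates (the toy sentence and window size below are illustrative):

```python
def cbow_pairs(tokens, window=2):
    # CBOW: predict the center word from its surrounding context words
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, window=2):
    # Skip-gram: predict each context word from the center word
    pairs = []
    for i, target in enumerate(tokens):
        for ctx in tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]:
            pairs.append((target, ctx))
    return pairs

sentence = "the service was really great".split()
print(cbow_pairs(sentence)[2])       # (['the', 'service', 'really', 'great'], 'was')
print(skipgram_pairs(sentence)[:2])  # [('the', 'service'), ('the', 'was')]
```

Skip-gram produces more training pairs per sentence, which tends to help with rare words; CBOW trains faster.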
- GloVe
- FastText
- Extension on Word2Vec
- Breaks word into character n-grams
- Embedding is aggregate of embedding of each n-gram within the word
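For example, with n = 3 and boundary markers, a sketch of how FastText breaks a word into character n-grams (the helper below is illustrative, not FastText's actual code):

```python
def char_ngrams(word, n=3):
    # FastText wraps the word in boundary markers before slicing,
    # so prefixes/suffixes get distinct n-grams
    w = f"<{word}>"
    return [w[i:i + n] for i in range(len(w) - n + 1)]

print(char_ngrams("where"))  # ['<wh', 'whe', 'her', 'ere', 're>']
```

Because the word vector is an aggregate of these n-gram vectors, FastText can produce embeddings even for words never seen during training.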
- Transformers
- Uses self-attention
- BlazingText (we use this since it's AWS's own implementation)
- Scales Word2Vec to distributed compute
- Extends FastText to use GPU with CUDA
- Saves money through early stopping
- Optimized dataset I/O
- ELMo
- Bidirectional language models
- GPT
- BERT
#
Training the model
- BlazingText accepts hyperparameters such as:
- epochs
- learning_rate
- vector_dim
- word_ngrams
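A hypothetical hyperparameter dict covering the settings above (the values are illustrative, not tuned recommendations, and `mode` is an additional assumption — BlazingText also supports `cbow` / `skipgram` / `batch_skipgram` modes for embeddings):

```python
# Illustrative BlazingText hyperparameters for supervised text classification
blazingtext_hyperparams = {
    "mode": "supervised",   # classification mode (assumed here)
    "epochs": 10,           # passes over the training data
    "learning_rate": 0.05,  # step size for gradient updates
    "vector_dim": 100,      # dimensionality of the word embeddings
    "word_ngrams": 2,       # use unigrams and bigrams as features
}
```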
# Sentiment analysis
import nltk
import sagemaker

def tokenize(review):
    return nltk.word_tokenize(review)

train = sagemaker.inputs.TrainingInput(...)
val = sagemaker.inputs.TrainingInput(...)
channels = {
    'train': train,
    'val': val,
}

# Docker image with the BlazingText algorithm
image_uri = sagemaker.image_uris.retrieve(framework='blazingtext', region=...)
estimator = sagemaker.estimator.Estimator(image_uri=image_uri, ...)
estimator.set_hyperparameters(...)
estimator.fit(inputs=channels)

# Deploys the model behind a real-time API endpoint
classifier = estimator.deploy(initial_instance_count=1, ...)
payload = {'instances': ['Nice']}
response = classifier.predict(...)
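A sketch of handling the prediction response, assuming the fastText-style `__label__` prefix that BlazingText uses for class labels (the sample response body below is made up for illustration):

```python
import json

# Hypothetical response body from a BlazingText classification endpoint
sample_body = json.dumps([{"label": ["__label__positive"], "prob": [0.97]}])

def parse_prediction(body):
    # Take the top prediction, strip the "__label__" prefix,
    # and pair the label with its probability
    result = json.loads(body)[0]
    label = result["label"][0].replace("__label__", "")
    return label, result["prob"][0]

print(parse_prediction(sample_body))  # ('positive', 0.97)
```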