CSE590/MB590 Special Topics
  - Machine Learning and NLP on Cloud

  • 23 hours (8 weeks) of in-class lectures plus dedicated mentoring sessions from our faculty of industry experts
  • 1.5 semester credits for both certificate and master’s degree
  • Access to high-quality live class recordings
  • Online live classroom available for all classes
  • Lifetime learning resources for our students
  • $990
Course Description

This course introduces students to Big Data and NLP on Cloud. It provides an overview of Google Cloud Platform and a deeper dive into its data processing and NLP capabilities.

Through a combination of presentations, demos, and hands-on labs, students will learn how to design data processing systems, orchestrate end-to-end data pipelines, and build scalable, accurate, and production-ready natural language models using cloud technologies.

Prerequisite: programming experience with Python and a basic understanding of the command line.

Measurable Course Objectives
  • Learn about the infrastructure and platform services provided by Google Cloud for data processing
  • Process big data at scale for analytics and machine learning
  • Understand the wide spectrum of problem statements, tasks, and solution approaches within NLP
  • Evaluate various algorithms and approaches for the given task, dataset, and stage of the NLP pipeline
  • Demonstrate the ability to build, train, and deploy NLP models using Google Cloud
University-wide Student Learning Outcomes

The University Student Learning Outcomes assessed and reinforced in this course include but are not limited to the following:

  • Communication
  • Critical Thinking
  • Information Literacy
Course Topics
Week 1-2: Unsupervised Learning and Clustering
  • Introduction to Microsoft Azure Cloud
  • Getting Started
  • Storage services
  • Compute Services
  • Big Data/Machine Learning Services
  • Modernizing Data Lakes and Data Warehouses with Microsoft Azure
  • Introduction to Azure Storage
  • Fundamental Microsoft Azure Features
  • Query Basics
  • Microsoft ML Pipeline
Week 3-4: Creating a Data Transformation and Building Batch Data Pipeline
  • Introduction to Azure Analytics
  • Azure Data Factory
  • Azure HDInsight
  • Azure Databricks pipeline
  • Other Azure data-based solutions
  • Building Distributed Batch Data Pipelines on Azure
  • Executing Spark on Azure Databricks and Blob Storage
  • Optimizing Azure Databricks
  • The Hadoop ecosystem
  • Running Hadoop on Cloud Dataproc
Week 5-8: Natural Language Processing
  • Intro to Natural Language Processing
  • What is Natural Language Processing?
  • Real-world use cases for NLP
  • Python, NLTK, spaCy libraries
  • Extracting, Cleaning and Preprocessing Text
  • Tokenization
  • Frequency Distribution
  • Different Types of Tokenizers
  • Bigrams, Trigrams & N-Grams
  • Stemming
  • Lemmatization
  • Stopwords
  • POS Tagging
  • Named Entity Recognition
  • Analyzing Sentence Structure
  • Syntax Trees
  • Chunking
  • Chinking
  • Context Free Grammars (CFG)
  • Intro to text classification
  • Machine Learning: Brush Up
  • Bag of Words
  • CountVectorizer
  • Term Frequency (TF)
  • Inverse Document Frequency (IDF)
  • Multinomial Naive Bayes Classifier
  • Leveraging Confusion Matrix
  • Feature Extraction and Embeddings
  • Converting text to features and labels
  • Embedding algorithms, such as Word2Vec and GloVe
  • Text Summarization and Generation
  • BERT
  • GPT
  • Turing-NLG
  • Final Project (Students will apply what they’ve learned to build real-world cloud applications in the following domains)
  • Text classification
  • Chatbots
  • Speech recognition
  • Document summarization
  • Automatic text generation
  • Subjectivity production
  • Data visualization
  • Question answering
About the Instructor

Bhairav Mehta

Bhairav Mehta is a Data Science Manager at Apple Inc. He is a senior data scientist and technical program manager with 14 years of experience in analytics, data science, AI/ML, big data, and program management at Fortune 10 companies (Apple, Qualcomm, Ford) and startups (MIT Startup) across various industry verticals. A focused, highly dependable, and detail-oriented solutions architect, he offers exceptional problem-solving and troubleshooting skills and a talent for developing innovative solutions to unusual and difficult problems. He has a demonstrated record of leading high-performance teams in delivering actionable solutions to business problems. His extensive education includes 5 master's degrees, among them engineering, statistics (Cornell), computer science (Georgia Tech), and an MBA (Cornell). He is a naturalized US citizen.