AI+ Data™
About Course
Executive Summary
The AI+ Data™ program equips professionals with essential data science skills, covering statistics, programming, data wrangling, machine learning, and generative AI.
- Learn to analyze, model, and visualize data for actionable insights.
- Apply advanced techniques to solve real-world problems using Python, R, and cloud tools.
- Complete a capstone project on Employee Attrition Prediction.
- Gain expertise in Data-Driven Decision Making and Data Storytelling, enabling effective communication of insights to stakeholders.
Learning Outcomes
Participants will be able to:
- Understand the fundamentals and lifecycle of data science projects.
- Apply statistical concepts and probability for informed analysis.
- Manipulate, clean, and preprocess structured and unstructured data.
- Develop data visualization and storytelling skills to convey insights effectively.
- Build predictive models using machine learning and generative AI tools.
- Optimize model performance and apply advanced ML techniques like ensemble learning and dimensionality reduction.
- Make data-driven decisions using BI tools, including Power BI and the open-source Apache Superset, Pentaho, and Redash.
- Communicate findings effectively through dashboards, reports, and narratives.
Course Modules
Module 1 – Foundations of Data Science
- Introduction to Data Science: concepts, importance, and applications
- Data Science Life Cycle: business problem, data preparation, exploratory analysis, modeling, deployment, evaluation
- Real-world data science applications
Module 2 – Foundations of Statistics
- Descriptive & inferential statistics
- Probability distributions & central limit theorem
- Hypothesis testing & confidence intervals
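As an illustration of the confidence interval topic in this module, the sketch below estimates a 95% interval for a sample mean. It assumes a normal approximation to the sampling distribution (for small samples a t-distribution is usually the better choice), and the function name and sample values are hypothetical, not course material.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, low, high) using a normal approximation."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value (roughly 1.96 for 95% confidence).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical measurements, invented for illustration only.
sample = [12.1, 9.8, 11.4, 10.7, 13.2, 10.1, 11.9, 12.5, 9.4, 11.0]
mean, low, high = mean_confidence_interval(sample)
print(f"mean={mean:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

A wider interval signals more uncertainty about the true mean; raising `confidence` or shrinking the sample both widen it.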
Module 3 – Data Sources and Types
- Structured, semi-structured, unstructured data
- Accessing data: databases, APIs, web scraping
- Data storage: SQL & NoSQL databases
- Hands-on: querying and handling different data types
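To make the distinction between structured and semi-structured data concrete, here is a minimal sketch using only Python's standard library: an in-memory SQLite table queried with SQL, and a JSON record with nested, optional fields. The table, columns, and values are hypothetical and chosen only for illustration.

```python
import json
import sqlite3

# Structured data: rows with a fixed schema, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees (dept, salary) VALUES (?, ?)",
    [("sales", 52000.0), ("sales", 58000.0), ("engineering", 91000.0)],
)
# Aggregate per department; each result row is a (dept, average) pair.
avg_by_dept = dict(
    conn.execute("SELECT dept, AVG(salary) FROM employees GROUP BY dept").fetchall()
)
conn.close()

# Semi-structured data: a JSON document whose fields may be nested or missing.
raw = '{"employee": {"id": 7, "skills": ["sql", "python"], "manager": null}}'
record = json.loads(raw)
skills = record["employee"].get("skills", [])  # default if the key is absent
manager = record["employee"].get("manager")    # JSON null becomes Python None

print(avg_by_dept, skills, manager)
```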
Module 4 – Programming Skills for Data Science
- Python and R basics
- Key libraries: NumPy, Pandas, Matplotlib, Seaborn, ggplot2, dplyr
- Hands-on: data manipulation and visualization
Module 5 – Data Wrangling & Preprocessing
- Handling missing values: imputation techniques
- Outlier detection & data transformation: normalization & standardization
- Hands-on: cleaning, preprocessing, and preparing data
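The preprocessing steps named in this module can be sketched in plain Python as follows. This is a simplified illustration, assuming median imputation, min-max normalization to [0, 1], and z-score standardization; the helper names and the `ages` column are hypothetical, and in practice a library such as Pandas would typically handle these steps.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1] (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit variance (assumes non-constant data)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Hypothetical column with two missing entries.
ages = [25, None, 31, 40, None, 28]
filled = impute_median(ages)
normalized = min_max_normalize(filled)
standardized = standardize(filled)
print(filled)
```

Normalization keeps the original shape of the distribution within fixed bounds, while standardization centers it, which suits methods that assume comparable feature scales.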
Module 6 – Exploratory Data Analysis (EDA)
- Summary statistics and data visualization
- Selecting the right visualization: histograms, scatter plots, box plots
- Hands-on: visualizations with Python (Matplotlib, Seaborn) and R (ggplot2)
Module 7 – Generative AI Tools for Insights
- Introduction to generative AI: autoencoders, GANs, VAEs
- Applications in data synthesis, augmentation, anomaly detection
- Hands-on exercises with Gen AI tools
Module 8 – Machine Learning Refresher
- Supervised learning: regression, logistic regression, KNN, decision trees, SVM
- Unsupervised learning: clustering, including hierarchical clustering
- Association rule learning
- Hands-on exercises with ML tools
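As a small illustration of one supervised method from this module, the sketch below implements k-nearest neighbors (KNN) classification with Euclidean distance and majority voting. The function name, the two-feature toy data, and the "stay"/"leave" labels are hypothetical, chosen to echo the capstone theme rather than taken from the course.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training point's distance to the query with its label, nearest first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in a 2-D feature space.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["stay", "stay", "stay", "leave", "leave", "leave"]

prediction = knn_predict(train_points, train_labels, (1.1, 1.0))
print(prediction)
```

Because KNN relies on distances, features are usually normalized or standardized first so that no single feature dominates the vote.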
Module 9 – Advanced Machine Learning
- Ensemble learning: Random Forest, Bagging, Boosting, Stacking, XGBoost
- Dimensionality reduction: PCA, LDA, t-SNE
- Advanced optimization: SGD, momentum-based methods, RMSprop, Adam, learning rate schedulers
- Practical tips for model training and optimization
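To show how a momentum-based update and a learning rate scheduler fit together, here is a minimal sketch on a one-dimensional toy loss, f(w) = (w - 3)^2. It uses the exact gradient rather than a stochastic mini-batch estimate, and the function names, hyperparameters, and step-decay schedule are assumptions made for illustration.

```python
def gradient(w):
    """Derivative of the toy loss f(w) = (w - 3)^2, minimized at w = 3."""
    return 2.0 * (w - 3.0)


def momentum_descent(w0, base_lr=0.1, beta=0.9, steps=300, decay_every=100, decay=0.5):
    """Gradient descent with momentum and a step-decay learning rate schedule."""
    w, velocity = w0, 0.0
    for step in range(steps):
        # Step decay: multiply the learning rate by `decay` every `decay_every` steps.
        lr = base_lr * (decay ** (step // decay_every))
        # Momentum: keep a fraction `beta` of the previous update direction.
        velocity = beta * velocity - lr * gradient(w)
        w += velocity
    return w


w_final = momentum_descent(w0=-5.0)
print(f"w after training: {w_final:.4f}")
```

Optimizers such as RMSprop and Adam extend this idea by also adapting the step size per parameter from running gradient statistics.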
Module 10 – Data-Driven Decision Making
- Importance of data-driven decision making
- Tools: Apache Superset, Pentaho, Redash, Power BI
- Case study: Adidas sales dataset for predictive modeling, segmentation, and insights
Module 11 – Data Storytelling
- Crafting compelling narratives with data
- Identifying use cases, business relevance, and audience
- Visualizing data for impact: charts, graphs, maps, dashboards
- Interactive and engaging presentation techniques
Module 12 – Capstone Project: Employee Attrition Prediction
- Problem statement, data collection, and preparation
- Exploratory data analysis and feature engineering
- Predictive modeling: logistic regression, decision trees, random forests, gradient boosting
- Model evaluation: accuracy, precision, recall, F1-score
- Data storytelling: dashboards, visualizations, and actionable business insights
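The evaluation metrics listed for the capstone can be computed directly from the counts of a binary confusion matrix. The sketch below is a minimal illustration, assuming label 1 means "employee left"; the function name and the label vectors are hypothetical and do not come from the capstone dataset.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = employee left, 0 = employee stayed.
y_true = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Attrition data is often imbalanced, so precision, recall, and F1 tend to be more informative than accuracy alone.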
