Python With Data Science

Course Objectives

  • NumPy, pandas, Matplotlib, scikit-learn
  • Python REPLs
  • Jupyter Notebooks
  • Data analytics life-cycle phases
  • Data repairing and normalizing
  • Data aggregation and grouping
  • Data visualization
  • Data science algorithms for supervised and unsupervised machine learning

Agenda

  • Using Modules
  • Listing Methods in a Module
  • Creating Your Own Modules
  • List Comprehension
  • Dictionary Comprehension
  • String Comprehension
  • Python 2 vs Python 3
  • Sets (Python 3+)
  • Python Idioms
  • Python Data Science “Ecosystem”
  • NumPy
  • NumPy Arrays
  • NumPy Idioms
  • pandas
  • Data Wrangling with pandas’ DataFrame
  • SciPy
  • Scikit-learn
  • SciPy or scikit-learn?
  • Matplotlib
  • Python vs R
  • Python on Apache Spark
  • Python Dev Tools and REPLs
  • Anaconda
  • IPython
  • Visual Studio Code
  • Jupyter
  • Jupyter Basic Commands
  • Summary

  • What is Data Science?
  • Data Science Ecosystem
  • Data Mining vs. Data Science
  • Business Analytics vs. Data Science
  • Data Science, Machine Learning, AI?
  • Who is a Data Scientist?
  • Data Science Skill Sets Venn Diagram
  • Data Scientists at Work
  • Examples of Data Science Projects
  • An Example of a Data Product
  • Applied Data Science at Google
  • Data Science Gotchas
  • Summary

  • Big Data Analytics Pipeline
  • Data Discovery Phase
  • Data Harvesting Phase
  • Data Priming Phase
  • Data Logistics and Data Governance
  • Exploratory Data Analysis
  • Model Planning Phase
  • Model Building Phase
  • Communicating the Results
  • Production Roll-out
  • Summary

  • Repairing and Normalizing Data
  • Dealing with Missing Data
  • Sample Data Set
  • Getting Info on Null Data
  • Dropping a Column
  • Interpolating Missing Data in pandas
  • Replacing the Missing Values with the Mean Value
  • Scaling (Normalizing) the Data
  • Data Preprocessing with scikit-learn
  • Scaling with the scale() Function
  • The MinMaxScaler Object
  • Summary

  • Descriptive Statistics
  • Non-uniformity of a Probability Distribution
  • Using NumPy for Calculating Descriptive Statistics Measures
  • Finding Min and Max in NumPy
  • Using pandas for Calculating Descriptive Statistics Measures
  • Correlation
  • Regression and Correlation
  • Covariance
  • Getting Pairwise Correlation and Covariance Measures
  • Finding Min and Max in pandas DataFrame
  • Summary

  • Data Aggregation and Grouping
  • Sample Data Set
  • The pandas.core.groupby.SeriesGroupBy Object
  • Grouping by Two or More Columns
  • Emulating SQL’s WHERE Clause
  • Pivot Tables
  • Cross-Tabulation
  • Summary

  • Data Visualization
  • What is matplotlib?
  • Getting Started with matplotlib
  • The Plotting Window
  • The Figure Options
  • The matplotlib.pyplot.plot() Function
  • The matplotlib.pyplot.bar() Function
  • The matplotlib.pyplot.pie() Function
  • Subplots
  • Using the matplotlib.gridspec.GridSpec Object
  • The matplotlib.pyplot.subplot() Function
  • Hands-on Exercise
  • Figures
  • Saving Figures to File
  • Visualization with pandas
  • Working with matplotlib in Jupyter Notebooks
  • Summary

  • Data Science, Machine Learning, AI?
  • Types of Machine Learning
  • Terminology: Features and Observations
  • Continuous and Categorical Features (Variables)
  • Terminology: Axis
  • The scikit-learn Package
  • scikit-learn Estimators
  • Models, Estimators, and Predictors
  • Common Distance Metrics
  • The Euclidean Metric
  • The LIBSVM Format
  • Scaling of the Features
  • The Curse of Dimensionality
  • Supervised vs Unsupervised Machine Learning
  • Supervised Machine Learning Algorithms
  • Unsupervised Machine Learning Algorithms
  • Choose the Right Algorithm
  • Life-cycles of Machine Learning Development
  • Data Split for Training and Test Data Sets
  • Data Splitting in scikit-learn
  • Hands-on Exercise
  • Classification Examples
  • Classifying with k-Nearest Neighbors (SL)
  • k-Nearest Neighbors Algorithm
  • The Error Rate
  • Hands-on Exercise
  • Dimensionality Reduction
  • The Advantages of Dimensionality Reduction
  • Principal Component Analysis (PCA)
  • Hands-on Exercise
  • Data Blending
  • Decision Trees (SL)
  • Decision Tree Terminology
  • Decision Tree Classification in Context of Information Theory
  • Information Entropy Defined
  • The Shannon Entropy Formula
  • The Simplified Decision Tree Algorithm
  • Using Decision Trees
  • Random Forests
  • SVM
  • Naive Bayes Classifier (SL)
  • Naive Bayesian Probabilistic Model in a Nutshell
  • Bayes Formula
  • Classification of Documents with Naive Bayes
  • Unsupervised Learning Type: Clustering
  • Clustering Examples
  • k-Means Clustering (UL)
  • k-Means Clustering in a Nutshell
  • k-Means Characteristics
  • Regression Analysis
  • Simple Linear Regression Model
  • Linear vs Non-Linear Regression
  • Linear Regression Illustration
  • Major Underlying Assumptions for Regression Analysis
  • Least-Squares Method (LSM)
  • Locally Weighted Linear Regression
  • Regression Models in Excel
  • Multiple Regression Analysis
  • Logistic Regression
  • Regression vs Classification
  • Time-Series Analysis
  • Decomposing Time-Series
  • Summary

  • Lab 1 – Learning the Lab Environment
  • Lab 2 – Using Jupyter Notebook
  • Lab 3 – Repairing and Normalizing Data
  • Lab 4 – Computing Descriptive Statistics
  • Lab 5 – Data Grouping and Aggregation
  • Lab 6 – Data Visualization with matplotlib
  • Lab 7 – Data Splitting
  • Lab 8 – k-Nearest Neighbors Algorithm
  • Lab 9 – The k-means Algorithm
  • Lab 10 – The Random Forest Algorithm
Tags
Technical Core Type: Core 4 Vendor: Web Age Solutions, Inc. Product Line: Other Technical LoD Tier: 1
FREE

Course Type: Instructor Led