Information Technology

Python With Data Science

Covers theoretical and technical aspects of using Python in Applied Data Science projects and Data Logistics use cases.

2 Days, 16 Hrs, 9 Lessons

Course Objectives

• NumPy, pandas, Matplotlib, scikit-learn
• Python REPLs
• Jupyter Notebooks
• Data analytics life-cycle phases
• Data repairing and normalizing
• Data aggregation and grouping
• Data visualization
• Data science algorithms for supervised and unsupervised machine learning

Agenda

Lesson 1: Python for Data Science

• Using Modules
• Listing Methods in a Module
• Creating Your Own Modules
• List Comprehension
• Dictionary Comprehension
• String Comprehension
• Python 2 vs Python 3
• Sets (Python 3+)
• Python Idioms
• Python Data Science “Ecosystem”
• NumPy
• NumPy Arrays
• NumPy Idioms
• pandas
• Data Wrangling with pandas’ DataFrame
• SciPy
• scikit-learn
• SciPy or scikit-learn?
• Matplotlib
• Python vs R
• Python on Apache Spark
• Python Dev Tools and REPLs
• Anaconda
• IPython
• Visual Studio Code
• Jupyter
• Jupyter Basic Commands
• Summary

Lesson 2: Applied Data Science

• What is Data Science?
• Data Science Ecosystem
• Data Mining vs. Data Science
• Business Analytics vs. Data Science
• Data Science, Machine Learning, AI?
• Who is a Data Scientist?
• Data Science Skill Sets Venn Diagram
• Data Scientists at Work
• Examples of Data Science Projects
• An Example of a Data Product
• Applied Data Science at Google
• Data Science Gotchas
• Summary

Lesson 3: Data Analytics Life-cycle Phases

• Big Data Analytics Pipeline
• Data Discovery Phase
• Data Harvesting Phase
• Data Priming Phase
• Data Logistics and Data Governance
• Exploratory Data Analysis
• Model Planning Phase
• Model Building Phase
• Communicating the Results
• Production Roll-out
• Summary

Lesson 4: Repairing and Normalizing Data

• Repairing and Normalizing Data
• Dealing with Missing Data
• Sample Data Set
• Getting Info on Null Data
• Dropping a Column
• Interpolating Missing Data in pandas
• Replacing Missing Values with the Mean Value
• Scaling (Normalizing) the Data
• Data Preprocessing with scikit-learn
• Scaling with the scale() Function
• The MinMaxScaler Object
• Summary

Lesson 5: Descriptive Statistics Computing Features in Python

• Descriptive Statistics
• Non-uniformity of a Probability Distribution
• Using NumPy for Calculating Descriptive Statistics Measures
• Finding Min and Max in NumPy
• Using pandas for Calculating Descriptive Statistics Measures
• Correlation
• Regression and Correlation
• Covariance
• Getting Pairwise Correlation and Covariance Measures
• Finding Min and Max in pandas DataFrame
• Summary

Lesson 6: Data Aggregation and Grouping

• Data Aggregation and Grouping
• Sample Data Set
• The pandas.core.groupby.SeriesGroupBy Object
• Grouping by Two or More Columns
• Emulating SQL’s WHERE Clause
• Pivot Tables
• Cross-Tabulation
• Summary

Lesson 7: Data Visualization with matplotlib

• Data Visualization
• What is matplotlib?
• Getting Started with matplotlib
• The Plotting Window
• The Figure Options
• The matplotlib.pyplot.plot() Function
• The matplotlib.pyplot.bar() Function
• The matplotlib.pyplot.pie() Function
• Subplots
• Using the matplotlib.gridspec.GridSpec Object
• The matplotlib.pyplot.subplot() Function
• Hands-on Exercise
• Figures
• Saving Figures to File
• Visualization with pandas
• Working with matplotlib in Jupyter Notebooks
• Summary

Lesson 8: Data Science and ML Algorithms in scikit-learn

• Data Science, Machine Learning, AI?
• Types of Machine Learning
• Terminology: Features and Observations
• Continuous and Categorical Features (Variables)
• Terminology: Axis
• The scikit-learn Package
• scikit-learn Estimators
• Models, Estimators, and Predictors
• Common Distance Metrics
• The Euclidean Metric
• The LIBSVM Format
• Scaling of the Features
• The Curse of Dimensionality
• Supervised vs Unsupervised Machine Learning
• Supervised Machine Learning Algorithms
• Unsupervised Machine Learning Algorithms
• Choose the Right Algorithm
• Life-cycles of Machine Learning Development
• Data Split for Training and Test Data Sets
• Data Splitting in scikit-learn
• Hands-on Exercise
• Classification Examples
• Classifying with k-Nearest Neighbors (SL)
• k-Nearest Neighbors Algorithm
• The Error Rate
• Hands-on Exercise
• Dimensionality Reduction
• The Advantages of Dimensionality Reduction
• Principal Component Analysis (PCA)
• Hands-on Exercise
• Data Blending
• Decision Trees (SL)
• Decision Tree Terminology
• Decision Tree Classification in the Context of Information Theory
• Information Entropy Defined
• The Shannon Entropy Formula
• The Simplified Decision Tree Algorithm
• Using Decision Trees
• Random Forests
• SVM
• Naive Bayes Classifier (SL)
• Naive Bayesian Probabilistic Model in a Nutshell
• Bayes Formula
• Classification of Documents with Naive Bayes
• Unsupervised Learning Type: Clustering
• Clustering Examples
• k-Means Clustering (UL)
• k-Means Clustering in a Nutshell
• k-Means Characteristics
• Regression Analysis
• Simple Linear Regression Model
• Linear vs Non-Linear Regression
• Linear Regression Illustration
• Major Underlying Assumptions for Regression Analysis
• Least-Squares Method (LSM)
• Locally Weighted Linear Regression
• Regression Models in Excel
• Multiple Regression Analysis
• Logistic Regression
• Regression vs Classification
• Time-Series Analysis
• Decomposing Time-Series
• Summary

Lesson 9: Lab Exercises

Lab 1 – Learning the Lab Environment
Lab 2 – Using Jupyter Notebook
Lab 3 – Repairing and Normalizing Data
Lab 4 – Computing Descriptive Statistics
Lab 5 – Data Grouping and Aggregation
Lab 6 – Data Visualization with matplotlib
Lab 7 – Data Splitting
Lab 8 – k-Nearest Neighbors Algorithm
Lab 9 – The k-means Algorithm
Lab 10 – The Random Forest Algorithm

FREE

Course Type: Instructor Led

Lesson 1: Python for Data Science

Lesson 2: Applied Data Science

Lesson 3: Data Analytics Life-cycle Phases

Lesson 4: Repairing and Normalizing Data

Lesson 5: Descriptive Statistics Computing Features in Python

Lesson 6: Data Aggregation and Grouping

Lesson 7: Data Visualization with matplotlib

Lesson 8: Data Science and ML Algorithms in scikit-learn

Lesson 9: Lab Exercises
