Python for Data Analysis

Audience

Data analysis is an essential skill in business, science, and social science, and Python has become the preferred language for that work. Adding Python data analysis to your skill set can lead to new career opportunities. This course is for anyone who wants to master the fundamentals of data analysis using Python.

Prerequisites

You should have experience with Python. You can gain this by attending our Python Programming course.

Duration

4 days. Hands-on.

Course Objectives

This course consists of 4 sections.

Section 1 helps you get started with data analysis as quickly and effectively as possible. You’ll learn how to use JupyterLab and Jupyter Notebooks to organize and develop your analyses. You’ll learn how to use a subset of the Pandas module for data analysis and visualization. And you’ll learn how to use a subset of the Seaborn module to create professional data visualizations that can be used for presentations. At the end of this section, you’ll be able to start doing analyses of your own.

Most analysis is descriptive analysis, in which you analyze past data to gain new insights. That’s why Section 2 presents the critical descriptive analysis skills that you need for success on the job. That includes:

  • How to read data into a Pandas DataFrame
  • How to clean the data by dropping unneeded rows and columns and fixing missing values, data types, and outliers
  • How to prepare the data by adding columns, modifying the data in columns, and combining DataFrames
  • How to analyze the data by grouping and aggregating the data, using pivot tables, and more
  • How to analyze time-series data by reindexing, downsampling, and working with rolling windows and running totals
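
As a rough illustration of the skills listed above, the sketch below applies a few of them to a small made-up dataset. The column names and values are hypothetical and are not taken from the course materials; the median fill is just one possible way to handle a missing value.

```python
import io

import pandas as pd

# Hypothetical CSV data, made up for illustration only.
csv_text = """date,region,sales
2024-01-01,North,100
2024-01-02,North,
2024-01-03,South,150
2024-01-04,South,200
2024-01-05,North,120
"""

# Read the data into a DataFrame, parsing the date column as datetimes.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Clean: fill the one missing sales value with the column median.
df["sales"] = df["sales"].fillna(df["sales"].median())

# Prepare: add a derived column.
df["sales_thousands"] = df["sales"] / 1000

# Analyze: group by region and aggregate.
totals = df.groupby("region")["sales"].sum()

# Time series: index by date, then compute a running total,
# a 2-row rolling mean, and a downsample to 2-day bins.
ts = df.set_index("date")["sales"]
running_total = ts.cumsum()
rolling_mean = ts.rolling(window=2).mean()
two_day_totals = ts.resample("2D").sum()

print(totals)
print(two_day_totals)
```

On real data, the cleaning step usually needs more care than this: whether to fill, drop, or flag missing values depends on the dataset and the question being asked.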

Predictive analysis takes data analysis to another level by using statistical models to predict unknown or future values. Although a complete treatment of predictive analysis is far beyond the scope of this course, all analysts should know the basic concepts and skills. That’s why Section 3 presents those concepts and gets you started doing your own predictions. This introduction includes how to find the correlations between variables, how to use Scikit-learn to work with linear regression models, and how to use Seaborn to create and plot linear regression models. It also shows you how to select the right variables and the right number of variables for multiple regressions, which is one of the critical skills for making effective predictions.
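
To give a flavour of the correlation and linear regression steps described above, here is a minimal sketch using Pandas and Scikit-learn. The data is synthetic, generated with an assumed slope of 3 and intercept of 5, and the variable names are hypothetical rather than taken from the course.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 5.0 + rng.normal(0, 1.0, size=200)
df = pd.DataFrame({"x": x, "y": y})

# Pearson correlation between the two variables.
r = df["x"].corr(df["y"])

# Fit a simple linear regression: one feature column, one target.
model = LinearRegression()
model.fit(df[["x"]], df["y"])

slope = model.coef_[0]
intercept = model.intercept_
r_squared = model.score(df[["x"]], df["y"])

# Predict y for a new, unseen x value.
new_data = pd.DataFrame({"x": [12.0]})
prediction = model.predict(new_data)[0]

print(f"r={r:.3f} slope={slope:.3f} intercept={intercept:.3f}")
print(f"R^2={r_squared:.3f} prediction at x=12: {prediction:.2f}")
```

A multiple regression follows the same pattern with more than one feature column passed to fit(); choosing which columns to include is the variable selection skill mentioned above.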

Section 4 presents a number of case studies that show you how the skills you’ve been learning can be applied to real-world datasets:

  • The polling data for the 2016 US presidential election
  • The US Forest Service data for forest fires
  • The US social survey data taken from hundreds of polls
  • The basketball shot location data for NBA player Stephen Curry

Course Content

Section 1
Introduction to Python for data analysis
The Pandas essentials for data analysis
The Pandas essentials for data visualization
The Seaborn essentials for data visualization

Section 2
How to get the data
How to clean the data
How to prepare the data
How to analyze the data
How to analyze time-series data

Section 3
How to make predictions with a linear regression model
How to make predictions with a multiple regression model

Section 4
The Polling case study
The Forest Fires case study
The Social Survey case study
The Sports Analytics case study

Virtual Courses

All of our courses can be delivered virtually. Our Bath public schedule of courses is now also available as live virtual sessions, using the popular Zoom Virtual Classroom and remote labs. Delegates can test their access at: www.zoom.us/test

On-Site Courses

Can't attend one of our public classes? Booking for multiple people?

All our courses are available on your site! Delivered for your staff, at your premises.

Contact us to find out more...