Introduction to PySpark

Open to: Government analysts
Training category: Analytical, Data science
Type of training: Online
Length: 2 days
Organiser: Data Science Campus Faculty
Provider: Data Science Campus Faculty
Location: Online

This course will give you an understanding of PySpark, the Python interface to the distributed processing engine Apache Spark. PySpark helps you to handle very large data sets by spreading the work across several machines. It will also help you to process, query and manipulate data that is too large to handle comfortably on a single machine.

The course will:

  • cover distributed processing
  • give a strong introduction to the main data structure of PySpark
  • teach you how to investigate data, combine it, query it, and run complex transformations on it

This is a practical course. You will write a lot of code and there will be plenty of opportunities to practise what you are learning. The course ends with a pair of case studies designed to bring together everything you have learnt.

Who this course is for

To enrol on this course you will need experience with Python. You do not need any knowledge of PySpark or distributed processing.

Learning outcomes

On this course you will:

  • gain confidence in using PySpark
  • gain an understanding of distributed programming
  • learn to import and export data
  • learn to investigate data sets
  • learn to manipulate data sets
  • learn to draw conclusions from data
  • learn to perform basic visualisation
  • gain the knowledge to handle large data sets with efficient code

How to book

Please use your Learning Hub account to enrol on this course.

If you do not have a Learning Hub account, please contact Data.Science.Campus.Faculty@ons.gov.uk.

Contact

If you would like more information about this course, please email Data.Science.Campus.Faculty@ons.gov.uk. 

Related courses

Introduction to Python 

Introduction to R

Foundations of SQL