Introduction to PySpark

Open to
All staff in the Government Statistical Service
Training category
Data science, Using statistical computer programs
Type of training
Face to face
Length
2 days
Organiser
Learning Academy
Provider
Office for National Statistics
Location
London, Newport, Titchfield

Description:

This course will give you an understanding of PySpark, the Python interface to the distributed processing tool Spark. With it, you will be able to process, query and manipulate data sets that are too large to handle comfortably on a single machine.

Over two days, the course will cover the why and how of distributed processing, give a strong introduction to the key data structure of PySpark, and teach you how to investigate data, combine it, query it and run complex transformations on it.

The course is hands-on: you will write a lot of code throughout, trying out each technique as soon as you have learnt it, before finishing with a pair of case studies that bring together everything covered in the course.

Prerequisites:

Experience with Python is essential. No prior knowledge of PySpark or distributed processing is needed.

Learning outcomes:

By the end of the course you will:

  • be confident using PySpark
  • have an understanding of distributed programming
  • be able to import and export data
  • be able to investigate data sets
  • be able to manipulate data sets
  • be able to draw conclusions from data
  • be able to perform basic visualisation
  • have the knowledge to handle large data sets with efficient code

How to book:

Find this course on the Learning Academy Eventbrite webpage.

To search the list of courses on Eventbrite:

  1. Select “show more” at the bottom of the page.
  2. Press Ctrl+F if you’re using a PC or ⌘+F if you’re using a Mac.
  3. Type the name of the event that you’re looking for.

Contact:

If you have any problems or are unable to find the course on Eventbrite, please contact us.

Email: gss.capability@statistics.gov.uk