Introduction to PySpark
- Open to: Government analysts
- Training category: Analytical, Data science
- Type of training: Online
- Length: 2 days
- Organiser: Data Science Campus Faculty
- Provider: Data Science Campus Faculty
- Location: Online
This course will give you an understanding of PySpark, the Python interface to the distributed processing tool “Spark”. PySpark lets you process, query and manipulate data sets that are too large to handle comfortably with standard, single-machine Python tools.
The course will:
- cover distributed processing
- give a strong introduction to the main data structure of PySpark
- teach you how to investigate data, combine it, query it, and run complex transformations upon it
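As a rough illustration of the idea behind distributed processing, the sketch below uses plain Python rather than PySpark. It splits a small data set into partitions, works on each partition independently, and then combines the partial results. The function names and the sample data are hypothetical and are not taken from the course material. Threads on one machine stand in for the separate workers that a tool like Spark would coordinate across a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(values, n_partitions):
    """Split a non-empty list into roughly equal chunks, one per worker."""
    size = -(-len(values) // n_partitions)  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum_and_count(partition):
    """Work carried out independently on each partition."""
    return sum(partition), len(partition)


def distributed_mean(values, n_partitions=4):
    """Combine the per-partition results into one overall mean."""
    partitions = split_into_partitions(values, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(partial_sum_and_count, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


ages = [23, 35, 41, 29, 52, 38, 47, 31]  # made-up illustrative data
print(distributed_mean(ages))  # prints 37.0
```

The key design point is that each worker returns a sum and a count rather than its own mean, so the partial results can be combined correctly even when the partitions differ in size.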
This is a practical course. You will write a lot of code throughout, and there will be plenty of opportunities to practise what you are learning. The course ends with a pair of case studies designed to combine everything you have learnt.
Who this course is for
To enrol on this course you will need to have experience with Python. You do not need any knowledge of PySpark or distributed processing to take part.
Learning outcomes
On this course you will:
- gain confidence in using PySpark
- gain an understanding of distributed programming
- learn to import and export data
- learn to investigate data sets
- learn to manipulate data sets
- learn to draw conclusions from data
- learn to perform basic visualisation
- gain the knowledge to handle large data sets with efficient code
How to book
Please use your Learning Hub account to enrol on this course.
If you do not have a Learning Hub account, please contact Data.Science.Campus.Faculty@ons.gov.uk.
Contact
If you would like more information about this course, please email Data.Science.Campus.Faculty@ons.gov.uk.