Infrastructure for Reproducible Analytical Pipelines (RAP)

RAP requires one or more programming languages and a version control system. There are no other requirements.

Some other tools are highly recommended.

  • Slack for conversing with the RAP community and other people working with data in the UK civil service. The Gov Data Science Slack has many active members across the civil service. The #rap_collaboration channel is the best place to ask questions about RAP, and there are channels for different technologies, statistical methods and learning communities. Most departments allow access.
  • GitHub or GitLab for publishing code and collaborating within and beyond your team. This website is developed and published on GitHub. Many RAP projects are on GitHub.

Consider the Technology Code of Practice, a set of criteria to help government design, build and buy technology.

Programming languages

R and Python are the most popular programming languages for working with data. Lots of help is available for them within the RAP community and in general. Both languages are widely taught at universities to students in many fields, not just computer science, so it is often possible to recruit people with programming experience as well as a specialism in your area.

JavaScript and C# (pronounced “C sharp”) have also been used in RAP projects. HTML and CSS are used for publishing on the internet. SQL is a language for use with databases and is often used alongside other languages.
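As an illustration of SQL used alongside another language, the sketch below runs a SQL query from Python using the standard-library `sqlite3` module and a temporary in-memory database. The table `records`, its columns and the function name `total_by_region` are hypothetical, chosen only for this example.

```python
import sqlite3


def total_by_region(rows):
    """Load (region, amount) rows into SQLite and total them by region with SQL.

    The table and column names are hypothetical, used only for illustration.
    """
    # An in-memory database keeps the example self-contained; a real pipeline
    # would connect to an existing database instead.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE records (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
        # SQL does the aggregation; Python handles everything around it.
        result = conn.execute(
            "SELECT region, SUM(amount) FROM records "
            "GROUP BY region ORDER BY region"
        ).fetchall()
    finally:
        conn.close()
    return dict(result)


if __name__ == "__main__":
    data = [("North", 3), ("South", 5), ("North", 4)]
    print(total_by_region(data))  # {'North': 7, 'South': 5}
```

The same division of labour applies with R or with a server database: the query stays in SQL, and the host language handles reading inputs and producing outputs.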

Most modern programming languages are free to use.

See some examples of RAP projects that use different programming languages, sometimes in combination.

Software for writing code

You can write code in almost any text editor, but you will be much more productive using software designed for writing code, called an integrated development environment (IDE). There is a huge choice, and most of them are free.

For writing code in the R language, RStudio is by far the most popular IDE. It is free, and paid-for licences are also available that include enterprise features and support.

There is no clear leader for the Python language. Visual Studio Code, Spyder and PyCharm are popular, but there are many others.

Version control systems

Version control systems track changes to your code.

Git is the most widely used version control system. It is supported by GitHub, GitLab and Bitbucket, the biggest platforms for developing code collaboratively. It would be unwise to choose a version control system that those platforms do not support.

GitHub, GitLab and Bitbucket are online platforms. They are free to use, with some restrictions that are relaxed if you pay for a subscription. You can also host GitLab yourself, offline.
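Git shows each change to a file as lines removed and lines added. As a rough illustration only (this is not how Git stores history internally), Python's standard-library `difflib` module can produce the same unified diff format that `git diff` displays. The file name `clean_data.py` and its contents are hypothetical.

```python
import difflib


def show_change(old_text, new_text, filename="clean_data.py"):
    """Return a unified diff between two versions of a file's contents.

    `filename` is a hypothetical name used only to label the diff headers.
    """
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="a/" + filename,
        tofile="b/" + filename,
    )
    # Returns an empty string when the two versions are identical.
    return "".join(diff)


if __name__ == "__main__":
    before = "threshold = 10\nprint(threshold)\n"
    after = "threshold = 12\nprint(threshold)\n"
    print(show_change(before, after))
```

In the printed diff, the line starting with `-` is the old version, the line starting with `+` is the new one, and the unchanged line is kept as context. A version control system keeps a record like this for every change, together with who made it and when.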

Continuous integration

Continuous integration systems perform some action whenever you change your code. Usually they are used to test your code by running a set of tests that you have written, and alert you to faults. You won’t need continuous integration straight away, but once you are maintaining several projects it can save you a lot of time.
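The sketch below shows the kind of test a continuous integration service might run each time the code changes. The helper `round_half_up` is a hypothetical example of something an analytical pipeline might need; the tests use Python's built-in `unittest` module, so a CI service could run them with `python -m unittest` if the file is saved with a name such as `test_rounding.py` (also hypothetical).

```python
import unittest
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places=0):
    """Round to `places` decimal places, with halves rounded away from zero.

    Python's built-in round() rounds halves to the nearest even number, so
    round(2.5) == 2; published statistics often expect 2.5 to round to 3.
    """
    quantum = Decimal(10) ** -places  # e.g. places=2 -> Decimal('0.01')
    # Converting via str() treats 2.675 as written, not as its binary approximation.
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))


class TestRoundHalfUp(unittest.TestCase):
    def test_half_rounds_up(self):
        self.assertEqual(round_half_up(2.5), 3.0)

    def test_negative_half_rounds_away_from_zero(self):
        self.assertEqual(round_half_up(-2.5), -3.0)

    def test_decimal_places(self):
        # round(2.675, 2) gives 2.67 in Python; this helper gives 2.68.
        self.assertEqual(round_half_up(2.675, 2), 2.68)
```

If someone later changes `round_half_up` and breaks one of these cases, the CI run fails and alerts the team before the faulty code reaches a publication.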

Travis is one of the most widely used continuous integration systems. It is an online service and integrates well with GitHub. It can be used securely with code that is private. It is free, and you can pay for more features and support. You can host Travis yourself, offline.

Package managers

Whatever programming language you choose, you will need to extend it by installing packages or libraries. A package is code that someone else has written and published to help other people solve a particular problem. Packages are a bit like apps on your phone: you can manage without them, but you will be much slower and make more mistakes.

Unless your analysts are allowed to install any package they like, you will need a way to control what they may install, without delaying them. There are no good ways to do this yet. You are encouraged to ask some RAP champions for advice, or ask on Gov Data Science Slack in the #rap_collaboration or #analytical-platforms channels.
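No single tool solves this yet. Purely to illustrate one possible ingredient, the sketch below compares the packages installed in a Python environment against an approved list, using the standard-library `importlib.metadata` module (Python 3.8 or later). The allowlist `APPROVED` and the function names are hypothetical; a real process would also need a quick way for analysts to request new packages.

```python
import re
from importlib.metadata import distributions

# Hypothetical allowlist; in practice it might live in a file your team reviews.
APPROVED = {"numpy", "pandas", "pip", "setuptools"}


def normalise(name):
    """Normalise a package name (PEP 503), so 'Scikit_Learn' matches 'scikit-learn'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unapproved_packages(installed, approved=APPROVED):
    """Return the sorted, normalised names of installed packages not on the list."""
    allowed = {normalise(name) for name in approved}
    return sorted({normalise(name) for name in installed} - allowed)


def installed_package_names():
    """List the names of distributions installed in the current environment."""
    names = []
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip any distribution with missing metadata
            names.append(name)
    return names


if __name__ == "__main__":
    for name in unapproved_packages(installed_package_names()):
        print("Not on the approved list:", name)
```

A check like this only reports what is already installed; it does not stop installation, and it does not remove the need for the advice channels mentioned in the text.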