sipgate blog in English?
This article and future articles in the blog series „Continuous Delivery @sipgate“ are written in English to match the common language of the target audience and so be useful to more readers.
Introduction to the series of blog articles
We decided to write and publish these blog articles because we want them to cover what we could not find publicly ourselves when we started our attempt to implement continuous delivery. Our goal with these articles is to help other people and make their journey shorter. There will be code, there will be numbers and there will be configuration files. So expect rather technical articles after this introduction.
Here at sipgate, we have produced a lot of program code over the years. About a decade ago, we started out with fewer than a dozen people and used a relatively small set of open source software on a small number of servers stacked in a cupboard. These computers provided our first product, an affordable and flexible voice over IP telephony service, for a number of private customers.
This grew to over a hundred people working with a larger set of open source software, some inevitable proprietary components and our own software on hundreds of servers in modern data center environments. These systems still run the private customer product and additionally provide a scalable business phone solution for thousands of customers.
While the company, its technical infrastructure, the number of customers and the number of lines of code grew, the software deployment process did not really keep up and has caused a number of problems over the last couple of months. Specifically, these problems occurred when it came to correcting or enhancing features in software written a long time ago by past sipgate employees. Fixing things in that kind of code is one thing. Getting the fix to the production system often introduced new challenges.
Moreover, since we adopted the concept of cross-functional teams, developers no longer work in a development-only environment but are split up into several product-oriented teams. This has also made safe (manual) software deployment more difficult, because developers suddenly had to coordinate deployments and could not easily draw on other people’s deployment knowledge for a certain piece of software.
All this adds up to a situation where we cannot deliver software as quickly as we would like to. This series will cover our ongoing journey towards the promising concept of „continuous delivery“, which we believe will help us regain the responsiveness we had in the early days.
What is continuous delivery?
There are plenty of lengthy explanations (there’s even a book) of what is meant by the term „continuous delivery“, and people are debating all over the place what is considered a part of it and what is not. We will not try to rephrase that here, but instead briefly describe the „sipgate vision“ of what we want to achieve by implementing it.
We want to be able to quickly provide new features to customers in order to establish a fast feedback loop. The goal of that loop is to be able to rapidly learn about and adapt to customer demands.
The first step towards that is implementing an automated, safe and fast software deployment process. Developers should be able to concentrate on developing awesome new features rather than spend a lot of time worrying about how to deploy the new code.
Classical software deployment abstract
Deploying a new version of a piece of software is usually a manual process. In its simplest form it involves:
- logging in to a server
- stopping the running version of the software
- installing the new version
- starting the new version
- verifying that the new version works as expected
For however long it takes between stopping the running version and starting the new one, the service is offline and customers cannot use it.
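The steps above can be sketched in a few lines of Python. This is a minimal illustration, not sipgate's actual tooling: the function name `deploy_single_server`, the `run(host, command)` callback and the concrete `service` and `apt-get` command lines are assumptions chosen for the example. Logging in to the server is hidden behind `run`, which in a real setup might wrap SSH.

```python
from typing import Callable, List

# Hypothetical callback type: executes `command` on `host`, returns True on success.
Runner = Callable[[str, str], bool]


def deploy_single_server(host: str, service: str, package: str, run: Runner) -> List[str]:
    """Execute the manual steps in order and return the commands that were run.

    Raises RuntimeError on the first failing step and leaves the service in
    whatever state that step left it in, just as a manual deployment would.
    """
    commands = [
        f"service {service} stop",        # stop the running version
        f"apt-get install -y {package}",  # install the new version
        f"service {service} start",       # start the new version
        f"service {service} status",      # crude check that it is running
    ]
    executed: List[str] = []
    for command in commands:
        executed.append(command)
        if not run(host, command):        # logging in is implied by run()
            raise RuntimeError(f"{command!r} failed on {host}")
    return executed


def print_only_runner(host: str, command: str) -> bool:
    """Stand-in runner that only prints the command (assumption for the demo)."""
    print(f"[{host}] {command}")
    return True


if __name__ == "__main__":
    deploy_single_server("web1.example.com", "exampled", "exampled", print_only_runner)
```

Everything between the `stop` and `start` commands is downtime, which is why even this simple sequence is worth automating and timing.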
However, a setup this simple is rare. It’s more likely that one has to take the designated server out of a load balancing server pool, do what was described earlier, and put it back into load balancing. Then rinse and repeat for every server that runs the software. At least, in this scenario, the software is not offline at any point in time thanks to the load balancing mechanism. On the other hand, verifying the functionality of each updated instance is likely harder because the server cannot be accessed through the load balancer while it is out of the pool. In this setup one also has to consider that during the upgrade process, some users see the old version of the software through the load balancer while others already see the new one. For many, this might not be a desirable situation.
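The load-balanced variant can be sketched as a loop over the servers. Again, this is only an illustration under assumptions: `rolling_deploy` and its four callbacks are hypothetical names, and how a host is actually removed from or added to the pool depends entirely on the load balancer in use.

```python
from typing import Callable, Iterable, List


def rolling_deploy(
    hosts: Iterable[str],
    remove_from_pool: Callable[[str], object],
    deploy: Callable[[str], object],
    verify: Callable[[str], bool],
    add_to_pool: Callable[[str], object],
) -> List[str]:
    """Update hosts one at a time and return the hosts that were updated.

    A host that fails verification stays out of the pool and the rollout
    stops, so the remaining hosts keep serving the old version.
    """
    updated: List[str] = []
    for host in hosts:
        remove_from_pool(host)  # no customer traffic reaches this host now
        deploy(host)            # stop, install, start as in the simple case
        if not verify(host):    # has to talk to the host directly, not via the balancer
            raise RuntimeError(f"verification failed on {host}; host left out of pool")
        add_to_pool(host)
        updated.append(host)    # from now on old and new versions serve side by side
    return updated


if __name__ == "__main__":
    pool = {"web1", "web2", "web3"}  # stand-in for the load balancer's pool
    done = rolling_deploy(
        sorted(pool),
        remove_from_pool=pool.discard,
        deploy=lambda host: print(f"deploying on {host}"),
        verify=lambda host: True,
        add_to_pool=pool.add,
    )
    print(done, pool)
```

Stopping at the first failed verification is a design choice for this sketch: it limits the damage to one server at the cost of leaving the pool in a mixed-version state until someone intervenes.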
As you can see, these are already a lot of steps that can be messed up, and the setup can get arbitrarily complex depending on your infrastructure and the code in question. Maybe there are databases involved, maybe there are other services affected that also need handling when this particular software is restarted, maybe there is a failover cluster that needs some love during the process … you name it. And then, hopefully, there is a good testing environment in which you go through all of this before doing it again in the production environment.
Essentially, converting these manually executed steps into an automated process is what we wanted to do. People started tossing around the catchy term „continuous delivery“ …
Where did we start (how did we use to deploy software)?
Obviously, there’s a version control system for the code. As we use Debian Linux, close to every code repository has a „debian“ folder in order to build Debian packages. These packages are then installed on the servers. The rest of the repositories have a Makefile that is used to install the software on the server (yes, like „make install“). For some of our own programs the deploying person has to know about specific „quirks“ of that particular code, such as having to manually update a certain field in a database or remembering to do $thing before and/or after the update.
For package building, there is a Jenkins setup that clones the repositories, builds the packages and uploads them to our own Debian repository. Every developer can log in to any of our production servers that run our own code. There is a lengthy /etc/sudoers configuration so that for every $software, the devs have sudo access for commands to start/stop/install it and look at the relevant logfiles. No one has „sudo ALL“ on production machines except a small group of systems administrators. Overall we are talking about fifty-ish packages, each installed on its own individual set of the 400 Linux machines, in addition to around a hundred testing machines we run.
This was not that bad a setup in 2005, but it reached its limits quite some time ago and does not scale well to today’s requirements.
Continuous deployment „vision“
Whenever a developer, whether a long-term employee or a first-day newbie, implements a new feature or improves existing code, the change should be made available to the customer as quickly and safely as possible.
In more technical detail: A developer pushes a change to the code repository. This triggers a build process that automatically runs unit tests against the code to ensure it does not introduce any regressions and generally „works“. Then the deployment in the testing environment is triggered, which involves installing the new version and running integration and acceptance tests against that new version. If the deployment is unsuccessful on any of the involved machines, the previously installed version of the software is re-installed so that everything is in a known-to-work state. Only after success in the testing environment is the same process triggered for the production environment. While the deployment process is running, the person who pushed the code can look at progress bars and at a central logging system that has all the logs from all the machines involved. They should never need to manually log in to any server.
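To make the intended flow more concrete, here is a rough Python sketch of such a pipeline. It describes the vision only, not an existing implementation, and all names (`deploy_environment`, `run_pipeline`, the `install` and `run_tests` callbacks, the host-to-version dictionaries) are assumptions made up for this example. It also assumes that a failed test run counts as an unsuccessful deployment and triggers the same rollback as a failed install.

```python
from typing import Callable, Dict, List, Tuple

Install = Callable[[str, str], bool]  # (host, version) -> success
Tests = Callable[[str], bool]         # (environment name) -> success


def deploy_environment(name: str, installed: Dict[str, str], new_version: str,
                       install: Install, run_tests: Tests) -> bool:
    """Install new_version on every host; roll back all touched hosts on failure."""
    previous = dict(installed)        # remember the known-to-work state
    touched: List[str] = []
    ok = True
    for host in installed:
        touched.append(host)
        if not install(host, new_version):
            ok = False
            break
        installed[host] = new_version
    if ok:
        ok = run_tests(name)          # integration and acceptance tests
    if not ok:
        for host in touched:          # re-install the previous version
            install(host, previous[host])
            installed[host] = previous[host]
    return ok


def run_pipeline(unit_tests: Callable[[], bool],
                 environments: List[Tuple[str, Dict[str, str]]],
                 new_version: str, install: Install, run_tests: Tests) -> str:
    """Run unit tests, then deploy to each environment in the given order."""
    if not unit_tests():
        return "unit tests failed"
    for name, installed in environments:  # e.g. testing first, then production
        if not deploy_environment(name, installed, new_version, install, run_tests):
            return f"rolled back in {name}"
    return "deployed"


if __name__ == "__main__":
    testing = {"test1": "1.0"}
    production = {"prod1": "1.0", "prod2": "1.0"}
    result = run_pipeline(lambda: True,
                          [("testing", testing), ("production", production)],
                          "1.1", lambda host, version: True, lambda env: True)
    print(result, testing, production)
```

Because production only comes after testing in the list of environments, a failure in the testing environment stops the pipeline before any production machine is touched.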
For now, we’ll leave you with this introduction. The upcoming articles of this series will cover the following steps of our journey towards a situation that can hopefully be regarded as some sort of continuous delivery some day:
- Set up central logging
- Automate software deployment
- Implement rollback
- Implement automatic database changes
- Implement more automated testing