At CloudSlang, we’re passionate about all things open source. Integrating with other open source projects is a key part of what we’re all about, and we focus on integrating with many cloud native technologies, such as CoreOS, Docker, and Consul.
As such, it is important for us to test these integrations automatically as part of our CI. But setting up a complex cluster environment (like a CoreOS/Swarm cluster) for testing using popular CI solutions such as Travis or CircleCI can be tricky.
In this post we’ll outline how we test our content integrations using a combination of DigitalOcean, CircleCI and CloudSlang. The same formula can be used to set up your own complex testing environments.
Recently CloudSlang introduced a testing framework to write tests for CloudSlang content and run them with the Build Tool. The Build Tool’s features include grouping tests into test suites and validating flow results, expected outputs or exceptions.
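For example, a test case in the test folder might look roughly like the following simplified sketch; the flow path and values are for illustration only, and the key names reflect the Build Tool's test case format at the time of writing, so they may differ in later versions:
test_print_text_success:
  description: Verifies that the print flow finishes successfully for a simple input   # illustrative test case
  testFlowPath: io.cloudslang.base.print.print_text
  testSuites: [default]
  inputs:
    - text: 'Hello, CloudSlang!'
  result: SUCCESS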
CloudSlang has a fast-growing repository of content, which is housed on GitHub. When a flow in the repository is changed, it can affect other parts of the content that depend on it. Therefore, it was crucial for us to come up with a proper CI solution.
After investigating the available CI providers and our configuration needs, we decided to split our test suites into two categories and distribute them between CircleCI and Travis. Both solutions use the CloudSlang Build Tool to validate our content: CircleCI runs the cluster-related test suites (e.g. CoreOS, Docker Swarm) and the base content (which does not require SSH access), while Travis handles most of the Docker content and the flows that require access to the environment via SSH.
Logical Steps of the Build Process
The process begins with working on a local content repository. The content repository contains two main folders: content (regular CloudSlang files) and test (test-related CloudSlang files). The changes could be adding new content and tests or just updating existing files.
After the changes are completed they are committed to a working branch. When all the commits are ready, they are pushed to GitHub and associated with a pull request.
Any change pushed to the pull request triggers a CircleCI build, which is configured via a circle.yml file. CircleCI builds the code using Docker containers. Currently we use a parallelism of two containers, each of them running the Build Tool. By combining CircleCI’s parallel modifier with environment variables (e.g. the node index), you can specify whether a command should be executed on every machine (e.g. downloading the Build Tool) or only on a specific one (e.g. creating resources that are used by only one machine).
circle.yml
test:
  pre:
    ### machine 0
    - ? >
        if [ "${CIRCLE_NODE_INDEX}" = "0" ]; then
          docker run -d -p 49165:8080 jenkins &&
          docker run -d -p 8500:8500 -p 8600:8600/udp fhalim/consul;
        fi
      :
        parallel: true
    ### machine 1
    - if eval "${RULE_DROPLET_MACHINE_NOT_FORK}"; then chmod +x ci-env/create_droplets.sh && ci-env/create_droplets.sh; fi:
        parallel: true
    ### every machine
    - ? >
        wget https://github.com/CloudSlang/cloud-slang/releases/download/cloudslang-0.8.0/cslang-builder.zip &&
        unzip cslang-builder.zip -d cslang-builder &&
        chmod +x cslang-builder/bin/cslang-builder &&
        mkdir cslang-builder/lib/Lib &&
        pip install -r python-lib/requirements.txt -t cslang-builder/lib/Lib
      :
        parallel: true
    ### machine 1
    - ? >
        if eval "${RULE_DROPLET_MACHINE_NOT_FORK}"; then
          chmod +x ci-env/wait_for_droplets_and_update_test_inputs.sh &&
          ci-env/wait_for_droplets_and_update_test_inputs.sh;
        fi
      :
        parallel: true
On the first machine (machine 0) the default test suite is activated. It verifies the base content, which includes things like working with files or string operations. This test suite does not require any additional environment configuration.
Tests running on the second machine (machine 1) require additional configuration. In this case we use a cluster of CoreOS machines as the infrastructure for the test flows. We use DigitalOcean to create and manage the CoreOS machines (droplets). Each build has its own set of droplets, named according to the convention ci-{build_number}-coreos-{machine_number}, so builds run completely independently of one another.
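As a rough illustration, the droplet names can be derived from CircleCI’s built-in build number; in the following sketch only CIRCLE_BUILD_NUM is a real CircleCI variable, everything else is illustrative (COREOS_MACHINE matches the variable used later in the creation request):
# CIRCLE_BUILD_NUM is provided by CircleCI; build one droplet name per cluster machine
for MACHINE_NUMBER in 1 2 3; do
  COREOS_MACHINE="ci-${CIRCLE_BUILD_NUM}-coreos-${MACHINE_NUMBER}"
  echo "${COREOS_MACHINE}"   # e.g. ci-1439-coreos-1
done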
Once all the parallel executions finish running, CircleCI collects and evaluates the build result. CircleCI also publishes an artifact called execution.log which is created by the Build Tool and contains relevant logging information about the execution – which can be very useful when debugging flow executions.
Running Tests Against the Droplets
The second container’s execution phases are presented in the following image:
These steps use bash scripts to execute the relevant commands. The scripts and configuration files can be found under the ci-env directory. Droplets are created and managed using DigitalOcean’s REST API. The first step (create_droplets.sh) generates a discovery URL for the CoreOS cluster and updates the cloud-config file with the newly created URL. The cloud-config file contains the regular CoreOS configuration data for the cluster (private networking configuration, starting up services like etcd and fleet) and a write_files section where we prepare a unit file (docker-tcp.socket) for the Swarm use cases (this socket will be used to enable the Docker Remote API). Once the cloud-config file is ready, we can create the droplets by sending POST requests to DigitalOcean:
CURL_OUTPUT=$(curl -i -s -X POST https://api.digitalocean.com/v2/droplets \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer ${DO_API_TOKEN}" \
  -d "{
        \"name\":\"${COREOS_MACHINE}\",
        \"ssh_keys\":[${DO_DROPLET_SSH_PUBLIC_KEY_ID}],"'
        "region":"ams3",
        "size":"512mb",
        "image":"coreos-stable",
        "backups":false,
        "ipv6":false,
        "private_networking":true,
        "user_data": "'"$(cat ci-env/cloud-config.yaml | sed 's/"/\\"/g')"'"
      }')
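For context, the discovery URL step that precedes this request could look roughly like the following sketch; the etcd discovery endpoint is the public one, while the <discovery_url> placeholder in cloud-config.yaml is an assumption about how the file is templated:
# Request a fresh discovery URL for a 3-machine cluster and inject it into cloud-config
DISCOVERY_URL=$(curl -s "https://discovery.etcd.io/new?size=3")
sed -i "s|<discovery_url>|${DISCOVERY_URL}|g" ci-env/cloud-config.yaml   # placeholder name is illustrative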
The creation request is dynamically generated using global environment variables defined in the project’s properties page (DO_API_TOKEN, DO_DROPLET_SSH_PUBLIC_KEY_ID), environment variables defined in the script (COREOS_MACHINE: the name of the machine, which depends on the build number) and the cloud-config file. The script also verifies that the requests were accepted by DigitalOcean (status code 202), storing the droplet IDs in an array:
DISCOVERY_URL: https://discovery.etcd.io/5f30dd91642ae055ff24efd6b2359f3d
ci-1439-coreos-1 (ID: 6748964) droplet creation request accepted - status code: 202
ci-1439-coreos-2 (ID: 6748965) droplet creation request accepted - status code: 202
ci-1439-coreos-3 (ID: 6748966) droplet creation request accepted - status code: 202
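A minimal sketch of how that verification might look, assuming CURL_OUTPUT holds the full response (headers plus a single-line JSON body) as above; the DROPLET_ID_ARRAY name is illustrative:
# The HTTP status code is the second field of the response's first line
STATUS_CODE=$(echo "$CURL_OUTPUT" | head -n 1 | awk '{print $2}')
if [ "$STATUS_CODE" = "202" ]; then
  # The JSON body is the last line; pull the droplet ID out of it with Python
  DROPLET_ID=$(echo "$CURL_OUTPUT" | tail -n 1 | python -c \
    'import json,sys; print json.load(sys.stdin)["droplet"]["id"]')
  DROPLET_ID_ARRAY+=("$DROPLET_ID")
  echo "${COREOS_MACHINE} (ID: ${DROPLET_ID}) droplet creation request accepted - status code: ${STATUS_CODE}"
else
  echo "droplet creation request failed - status code: ${STATUS_CODE}"
  exit 1
fi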
The second step (which is executed in parallel on both build machines) downloads and prepares the Build Tool. The related commands are read from the circle.yml file.
The third step (wait_for_droplets_and_update_test_inputs.sh) waits for the droplets to become operational, adds the necessary post-startup configuration (e.g. activating the TCP socket for Docker Swarm) and updates the test inputs with the actual information (droplet IP addresses, private key file location). The script periodically checks the droplet status by sending GET requests to DigitalOcean:
Droplet(6748964) information retrieved successfully
Droplet(6748964) status: new
Droplet(6748964) information retrieved successfully
Droplet(6748964) status: new
Droplet(6748964) information retrieved successfully
Droplet(6748964) status: new
Droplet(6748964) information retrieved successfully
Droplet(6748964) status: active
Droplet(6748964) IPv4 address: 188.166.45.76
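A stripped-down sketch of that polling loop, assuming DROPLET_ID is one of the IDs stored earlier (the sleep interval and variable names are illustrative):
STATUS="new"
while [ "$STATUS" != "active" ]; do
  sleep 10
  # GET /v2/droplets/{id} returns the droplet's current state as JSON
  RESPONSE_BODY_JSON=$(curl -s -X GET "https://api.digitalocean.com/v2/droplets/${DROPLET_ID}" \
    -H "Authorization: Bearer ${DO_API_TOKEN}")
  STATUS=$(echo "$RESPONSE_BODY_JSON" | python -c \
    'import json,sys; print json.load(sys.stdin)["droplet"]["status"]')
  echo "Droplet(${DROPLET_ID}) status: ${STATUS}"
done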
Here we combine regular bash scripting (grep, awk) with Python (for complex data processing):
IP_ADDRESS=$(\
  echo "$RESPONSE_BODY_JSON" | python -c \
  'if True:
     import json,sys;
     obj = json.load(sys.stdin);
     ipv4_container_list = obj["droplet"]["networks"]["v4"];
     public_ipv4_container_list = filter(lambda x : x["type"] == "public", ipv4_container_list);
     print public_ipv4_container_list[0]["ip_address"] if len(public_ipv4_container_list) > 0 else "";'\
)
Two of the CoreOS machines will be used as Swarm agents (the remaining one will host the Swarm manager), so we need to enable a Docker socket on these machines. The unit file which defines the socket was created at droplet startup time based on the cloud-config file. At this stage we need to enable the socket with the following command:
LAST_LINE=$(ssh -i ${SSH_KEY_PATH} \
  -o UserKnownHostsFile=/dev/null \
  -o StrictHostKeyChecking=no \
  core@${DROPLET_IP} \
  'sudo systemctl enable docker-tcp.socket \
  && sudo systemctl stop docker \
  && sudo systemctl start docker-tcp.socket \
  && sudo systemctl start docker \
  && echo -e "\nSUCCESS"' | tail -n 1)
Now the droplets are ready to be accessed. While the CoreOS cluster is already properly configured, the Swarm cluster will be set up by the flows themselves, since Swarm is based on Docker containers (the manager and agent containers need to be run on the machines). The last step before starting the Build Tool is updating the test input files with the actual data (e.g. the IP addresses of the droplets):
find test -type f -exec sed -i "s/<coreos_host_1>/${DROPLET_IP_ARRAY[0]}/g" {} +
At this point everything is prepared to run the tests. The Build Tool is executed with the relevant test suites activated. The flows in these suites interact with the droplets using SSH calls. The Build Tool displays relevant statistics about the execution – coverage, how many test cases passed and, in case of test failures, the cause of the problem.
02:41:58 [INFO] 97% of the content has tests
02:41:58 [INFO] Out of 158 executables, 154 executables have tests
02:41:58 [INFO]
02:41:58 [INFO] ------------------------------------------------------------
02:41:58 [INFO] BUILD SUCCESS
02:41:58 [INFO] ------------------------------------------------------------
02:41:58 [INFO] Found 153 slang files under directory: "/home/ubuntu/cloud-slang-content" and all are valid.
02:41:58 [INFO] 11 test cases passed
02:41:58 [INFO] 145 test cases skipped
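For reference, the invocation that produces output like the above might look roughly like this; the -ts argument for selecting test suites and the suite names shown are assumptions and may differ between Build Tool versions:
# Run the builder against the content repository, activating only the cluster-related suites
cslang-builder/bin/cslang-builder ~/cloud-slang-content -ts coreos,swarm   # suite names are examples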
Once the tests finish, the last step is to delete the droplets used for this build by sending DELETE requests to DigitalOcean (cleanup_env.sh).
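The cleanup itself can be as simple as looping over the stored droplet IDs and calling DigitalOcean’s delete endpoint; a sketch, reusing the illustrative DROPLET_ID_ARRAY name from above:
for DROPLET_ID in "${DROPLET_ID_ARRAY[@]}"; do
  # DELETE /v2/droplets/{id} destroys the droplet (DigitalOcean answers with 204 No Content)
  curl -s -X DELETE "https://api.digitalocean.com/v2/droplets/${DROPLET_ID}" \
    -H "Authorization: Bearer ${DO_API_TOKEN}"
  echo "Droplet(${DROPLET_ID}) delete request sent"
done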
Managing Sensitive Data
Since our build system is based on heterogeneous components, we need a secure way to store sensitive data (e.g. the DigitalOcean API token used for authentication). CircleCI provides the ability to define global environment variables where we can store this kind of data. These environment variables are not included in builds that come from external pull requests (pull requests created by users who are not part of the CloudSlang organization), otherwise they could easily be obtained – someone could simply print them by modifying the circle.yml file inside the pull request.
Conclusions
In this post we’ve shown you how CloudSlang developed its own continuous integration mechanism by combining technologies like the Build Tool (the worker that executes the tests), GitHub (which hosts the codebase), CircleCI (the CI solution that defines the skeleton of the build system) and DigitalOcean (the IaaS provider that hosts the testing machines).
If you would like to learn more about the build system or just check out the project, visit us at CloudSlang’s website or GitHub page.
And remember – keep the bar green …