How to build Graal-enabled JDK8 on CircleCI?

Citation: the feature image on the blog can be found on Flickr and was created by Luca Galli. The image in one of the sections below can also be found on Flickr and was created by fklv (Obsolete hipster).


The GraalVM compiler is a replacement for HotSpot’s server-side JIT compiler, widely known as the C2 compiler. It is written in Java with the goal of better performance (among other goals) compared to the C2 compiler. Changes introduced from Java 9 onwards mean that we can now plug our own hand-written JIT compiler into the JVM in place of C2, thanks to JVMCI. The researchers and engineers at Oracle Labs have created a variant of JDK8 with JVMCI enabled, which can be used to build the GraalVM compiler. The GraalVM compiler is open source and is available on GitHub (along with the HotSpot JVMCI sources needed to build the GraalVM compiler). This gives us the ability to fork/clone it and build our own version of the GraalVM compiler.

In this post, we are going to build the GraalVM compiler with JDK8 on CircleCI. The resulting artifacts are going to be:

– JDK8 embedded with the GraalVM compiler, and
– a zip archive containing Graal & Truffle modules/components.

Note: we are not covering how to build the whole of the GraalVM suite in this post; that can be done in another post. These scripts can nevertheless be used to do that, and there exists a branch which contains the rest of the steps.

Why use a CI tool to build the GraalVM compiler?

Continuous integration (CI) and continuous deployment (CD) tools have many benefits. One of the greatest is the ability to check the health of the code-base. Seeing why your builds are failing provides you with an opportunity to make a fix faster. For this project, it is important that we are able to verify and validate the scripts required to build the GraalVM compiler for Linux and macOS, both locally and in a Docker container.

A CI/CD tool lets us add automated tests to ensure that we get the desired outcome from our scripts when every PR is merged. In addition to ensuring that our new code does not introduce a breaking change, another great feature of CI/CD tools is that we can automate the creation of binaries and the automatic deployment of those binaries, making them available for open source distribution.

Let’s get started

During the process of researching CircleCI as a CI/CD solution to build the GraalVM compiler, I learned that we could run builds via two different approaches, namely:

– A CircleCI build with a standard Docker container (longer build time, longer config script)
– A CircleCI build with a pre-built, optimised Docker container (shorter build time, shorter config script)

We will now go through the two approaches mentioned above and see the pros and cons of both of them.

Approach 1: using a standard Docker container

For this approach, CircleCI requires a docker image that is available in Docker Hub or another public/private registry it has access to. We will have to install the necessary dependencies in this available environment in order for a successful build. We expect the build to run longer the first time and, depending on the levels of caching, it will speed up.

To understand how this is done, we will go through the CircleCI configuration file (stored in .circleci/config.yml) section by section; see config.yml in .circleci for the full listing and commit df28ee7 for the source changes.

Explaining sections of the config file

The below lines in the configuration file will ensure that our installed applications are cached (referring to the two specific directories) so that we don’t have to reinstall the dependencies each time a build occurs:

    dependencies:
      cache_directories:
        - "vendor/apt"
        - "vendor/apt/archives"

We will be referring to the docker image by its full name (as available on http://hub.docker.com under the account name used – adoptopenjdk). In this case, it is a standard docker image containing JDK8 made available by the good folks behind the Adopt OpenJDK build farm. In theory, we can use any image as long as it supports the build process. It will act as the base layer on which we will install the necessary dependencies:

        docker:
          - image: adoptopenjdk/openjdk8:jdk8u152-b16

Next, in the pre-Install Os dependencies step, we restore the cache if it already exists (this may look a bit odd, but for unique key labels, the below implementation is recommended by the docs):

          - restore_cache:
              keys:
                - os-deps-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
                - os-deps-{{ arch }}-{{ .Branch }}
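
The fallback between the two keys above can be pictured with a small shell sketch. This is not CircleCI's implementation: the function name pick_cache and the sample cache names are made up for illustration, and the sketch simply takes the first stored name that starts with a key, whereas CircleCI picks the most recently saved match. It shows how the shorter, branch-level key can restore a cache saved by an earlier commit when no cache exists yet for the current commit.

```shell
# Illustrative only: keys are tried in order and match by prefix.
# pick_cache and the sample cache names below are hypothetical.
pick_cache() {
  stored="$1"; shift                 # space-separated list of saved cache names
  for key in "$@"; do                # keys are tried in the order given
    for name in $stored; do          # relies on names containing no spaces
      case "$name" in
        "$key"*) printf '%s\n' "$name"; return 0 ;;
      esac
    done
  done
  return 1                           # nothing matched: nothing is restored
}

stored_caches="os-deps-x86_64-master-aaa111 os-deps-x86_64-feature-bbb222"

# No cache exists for commit ccc333 yet, so the branch-level prefix is used.
pick_cache "$stored_caches" "os-deps-x86_64-master-ccc333" "os-deps-x86_64-master"
```

Because the save_cache key ends with the commit SHA, every commit writes its own cache entry, and the shorter key is what lets a new commit start from an older one.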

Then, in the Install Os dependencies step, we run the respective shell script to install the dependencies needed. We have set this step to time out if the operation takes longer than 2 minutes to complete (see docs for timeout):

          - run:
              name: Install Os dependencies
              command: ./build/x86_64/linux_macos/osDependencies.sh
              timeout: 2m

Then, in the post-Install Os dependencies step, we save the results of the previous step, i.e. the layer from the above run step (the key name is formatted to ensure uniqueness, and the specific paths to save are included):

          - save_cache:
              key: os-deps-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
              paths:
                - vendor/apt
                - vendor/apt/archives

Then, in the pre-Build and install make via script step, we restore the cache, if one already exists:

          - restore_cache:
              keys:
                - make-382-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
                - make-382-{{ arch }}-{{ .Branch }}

Then, in the Build and install make via script step, we run the shell script to install a specific version of make; the step is set to time out if it takes longer than 1 minute to finish:

          - run:
              name: Build and install make via script
              command: ./build/x86_64/linux_macos/installMake.sh
              timeout: 1m

Then, in the post Build and install make via script step, we save the results of the above action to the cache:

          - save_cache:
              key: make-382-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
              paths:
                - /make-3.82/
                - /usr/bin/make
                - /usr/local/bin/make
                - /usr/share/man/man1/make.1.gz
                - /lib/

Then, we define environment variables to update JAVA_HOME and PATH at runtime. Here the environment variables are sourced so that they are remembered for all the subsequent steps until the end of the build process (please keep this in mind):

          - run:
              name: Define Environment Variables and update JAVA_HOME and PATH at Runtime
              command: |
                echo '....'     # a number of echo statements displaying environment variable values
                source ${BASH_ENV}

Then, in the step to Display Hardware, Software, Runtime environment and dependency versions, as best practice we display environment-specific information and record it into the logs for posterity (also useful during debugging when things go wrong):

          - run:
              name: Display HW, SW, Runtime env. info and versions of dependencies
              command: ./build/x86_64/linux_macos/lib/displayDependencyVersion.sh

Then, we run the step to set up MX, which is important from the point of view of the GraalVM compiler (mx is a specialised build system created to facilitate compiling and building Graal/GraalVM and its components):

          - run:
              name: Setup MX
              command: ./build/x86_64/linux_macos/lib/setupMX.sh ${BASEDIR}

Then, we run the important step to Build JDK JVMCI (we build the JDK with JVMCI enabled here), which times out if the process takes longer than 15 minutes without any output or longer than 20 minutes in total to finish:

          - run:
              name: Build JDK JVMCI
              command: ./build/x86_64/linux_macos/lib/build_JDK_JVMCI.sh ${BASEDIR} ${MX}
              timeout: 20m
              no_output_timeout: 15m
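
The idea of an overall time limit can also be tried outside CircleCI. The sketch below is not part of the build scripts: it assumes the GNU coreutils timeout command is installed, run_with_limit is a hypothetical helper, and echo stands in for a build script. It covers only the total limit, not the "no output" limit.

```shell
# Hypothetical helper: run a command with an overall time limit in seconds.
# GNU coreutils `timeout` exits with status 124 when the limit is reached.
run_with_limit() {
  limit="$1"; shift
  timeout "$limit" "$@"
  status=$?
  if [ "$status" -eq 124 ]; then
    echo "step timed out after ${limit}s" >&2
  fi
  return "$status"
}

# A command that finishes well inside its limit behaves as usual.
run_with_limit 5 echo "finished in time"
```

Running something like run_with_limit 1200 followed by a build script locally gives a rough feel for whether a 20 minute limit is realistic before committing it to the configuration file.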

Then, we run the step Run JDK JVMCI Tests, which runs tests as part of the sanity check after building the JDK JVMCI:

          - run:
              name: Run JDK JVMCI Tests
              command: ./build/x86_64/linux_macos/lib/run_JDK_JVMCI_Tests.sh ${BASEDIR} ${MX}

Then, we run the step Setting up environment and Build GraalVM Compiler, to set up the build environment with the necessary environment variables which will be used by the steps to follow:

          - run:
              name: Setting up environment and Build GraalVM Compiler
              command: |
                echo ">>>> Currently JAVA_HOME=${JAVA_HOME}"
                JDK8_JVMCI_HOME="$(cd ${BASEDIR}/graal-jvmci-8/ && ${MX} --java-home ${JAVA_HOME} jdkhome)"
                echo "export JVMCI_VERSION_CHECK='ignore'" >> ${BASH_ENV}
                echo "export JAVA_HOME=${JDK8_JVMCI_HOME}" >> ${BASH_ENV}
                source ${BASH_ENV}

Then, we run the step Build the GraalVM Compiler and embed it into the JDK (JDK8 with JVMCI enabled), which times out if the process takes longer than 7 minutes without any output or longer than 10 minutes in total to finish:

          - run:
              name: Build the GraalVM Compiler and embed it into the JDK (JDK8 with JVMCI enabled)
              command: |
                echo ">>>> Using JDK8_JVMCI_HOME as JAVA_HOME (${JAVA_HOME})"
                ./build/x86_64/linux_macos/lib/buildGraalCompiler.sh ${BASEDIR} ${MX} ${BUILD_ARTIFACTS_DIR}
              timeout: 10m
              no_output_timeout: 7m

Then, we run the simple sanity checks to verify the validity of the artifacts created once a build has been completed, just before archiving the artifacts:

          - run:
              name: Sanity check artifacts
              command: |
                ./build/x86_64/linux_macos/lib/sanityCheckArtifacts.sh ${BASEDIR} ${JDK_GRAAL_FOLDER_NAME}
              timeout: 3m
              no_output_timeout: 2m

Then, we run the step Archiving artifacts (this means compressing and copying the final artifacts into a separate folder), which times out if the process takes longer than 2 minutes without any output or longer than 3 minutes in total to finish:

          - run:
              name: Archiving artifacts
              command: |
                ./build/x86_64/linux_macos/lib/archivingArtifacts.sh ${BASEDIR} ${MX} ${JDK_GRAAL_FOLDER_NAME} ${BUILD_ARTIFACTS_DIR}
              timeout: 3m
              no_output_timeout: 2m

For posterity and debugging purposes, we capture the generated logs from the various folders and archive them:

          - run:
              name: Collecting and archiving logs (debug and error logs)
              command: |
                ./build/x86_64/linux_macos/lib/archivingLogs.sh ${BASEDIR}
              timeout: 3m
              no_output_timeout: 2m
              when: always
          - store_artifacts:
              name: Uploading logs
              path: logs/

Finally, we store the generated artifacts at a specified location – the below lines will make the location available on the CircleCI interface (we can download the artifacts from here):

          - store_artifacts:
              name: Uploading artifacts in jdk8-with-graal-local
              path: jdk8-with-graal-local/

Approach 2: using a pre-built optimised Docker container

For approach 2, we will be using a pre-built docker container that has been created and built locally with all the necessary dependencies; the docker image is saved and then pushed to a remote registry, e.g. Docker Hub. We then reference this docker image in the CircleCI environment via the configuration file. This saves us the time and effort of running all the commands that install the necessary dependencies (see the detailed steps in the Approach 1 section).

We expect the build to run for a shorter time compared to the previous build; this speedup is a result of the pre-built docker image (see the Steps to build the pre-built docker image section for how this is done). An additional speed benefit comes from the fact that CircleCI caches the docker image layers, which in turn results in a quicker startup of the build environment.

We will go through the CircleCI configuration file (stored in .circleci/config.yml) section by section for this approach; see config.yml in .circleci for the full listing and commit e5916f1 for the source changes.

Explaining sections of the config file

Here again, we will be referring to the docker image by its full name. It is a pre-built docker image, neomatrix369/graalvm-suite-jdk8, made available by neomatrix369. It was built and uploaded to Docker Hub before the CircleCI build was started. It contains the necessary dependencies for the GraalVM compiler to be built:

        docker:
          - image: neomatrix369/graal-jdk8:${IMAGE_VERSION:-python-2.7}
        steps:
          - checkout

All the sections below do the exact same tasks (and for the same purpose) as in Approach 1, see Explaining sections of the config file section.

Except, we have removed the below sections as they are no longer required for Approach 2:

          - restore_cache:
              keys:
                - os-deps-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
                - os-deps-{{ arch }}-{{ .Branch }}
          - run:
              name: Install Os dependencies
              command: ./build/x86_64/linux_macos/osDependencies.sh
              timeout: 2m
          - save_cache:
              key: os-deps-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
              paths:
                - vendor/apt
                - vendor/apt/archives
          - restore_cache:
              keys:
                - make-382-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
                - make-382-{{ arch }}-{{ .Branch }}
          - run:
              name: Build and install make via script
              command: ./build/x86_64/linux_macos/installMake.sh
              timeout: 1m
          - save_cache:
              key: make-382-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
              paths:
                - /make-3.82/
                - /usr/bin/make
                - /usr/local/bin/make
                - /usr/share/man/man1/make.1.gz

In the following section, I will go through the steps to build the pre-built docker image. This involves running the bash scripts ./build/x86_64/linux_macos/osDependencies.sh and ./build/x86_64/linux_macos/installMake.sh to install the necessary dependencies as part of building a docker image, and finally pushing the image to Docker Hub (it can be pushed to any other remote registry of your choice).

Steps to build the pre-built docker image

– Run build-docker-image.sh (see bash script source), which depends on the presence of a Dockerfile (see docker script source). The Dockerfile does all the necessary work of installing the dependencies inside the container, i.e. it runs the bash scripts ./build/x86_64/linux_macos/osDependencies.sh and ./build/x86_64/linux_macos/installMake.sh:

    $ ./build-docker-image.sh

– Once the image has been built successfully, run push-graal-docker-image-to-hub.sh after setting USER_NAME and IMAGE_NAME (see source code); otherwise it will use the default values set in the bash script:

    $ USER_NAME="[your docker hub username]" IMAGE_NAME="[any image name]" \
        ./push-graal-docker-image-to-hub.sh
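
As a rough illustration of how such defaults can work, here is a minimal shell sketch. It is not taken from push-graal-docker-image-to-hub.sh: the function image_ref and the placeholder defaults are assumptions, and only the variable names USER_NAME, IMAGE_NAME and IMAGE_VERSION come from this post.

```shell
# Hypothetical sketch: build an image reference, falling back to defaults
# when the variables are unset. The default values are placeholders, not
# the ones used by the real script.
image_ref() {
  user="${USER_NAME:-example-user}"
  image="${IMAGE_NAME:-example-image}"
  version="${IMAGE_VERSION:-latest}"
  printf '%s/%s:%s\n' "$user" "$image" "$version"
}

# With nothing set, the defaults apply; inside the subshell, the overrides do.
image_ref
( USER_NAME="alice"; IMAGE_NAME="graal-jdk8"; IMAGE_VERSION="1.0"; image_ref )
```

The same ${VAR:-default} form appears in the image name used in the configuration file above.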

CircleCI config file statistics: Approach 1 versus Approach 2

| Areas of interest | Approach 1 | Approach 2 |
|---|---|---|
| Config file (full source list) | build-on-circleci | build-using-prebuilt-docker-image |
| Commit point (sha) | df28ee7 | e5916f1 |
| Lines of code (loc) | 110 lines | 85 lines |
| Source lines (sloc) | 110 sloc | 85 sloc |
| Steps (steps: section) | 19 | 15 |
| Performance (see Performance section) | Some speedup due to caching, but slower than Approach 2 | Speed-up due to pre-built docker image, and also due to caching at different steps. Faster than Approach 1 |

Ensure Docker Layer Caching (DLC) is enabled (it’s a paid feature)

What not to do?

Approach 1 issues

I came across things that wouldn’t work initially, but were later fixed with changes to the configuration file or the scripts:

  • please make sure the .circleci folder (containing config.yml) is always in the root directory of the project
  • when using the store_artifacts directive in the .circleci/config.yml file setting, set the value to a fixed folder name i.e. jdk8-with-graal-local/ – in our case, setting the path to ${BASEDIR}/project/jdk8-with-graal didn’t create the resulting artifact once the build was finished hence the fixed path name suggestion.
  • environment variables: when working with environment variables, keep in mind that each command runs in its own shell, hence values set to environment variables inside one shell execution environment aren’t visible outside it; follow the method used in the context of this post. Set the environment variables such that all the commands can see the values they require, to avoid misbehaviour or unexpected results at the end of each step.
  • caching: use the caching functionality after reading about it, for more details on CircleCI caching refer to the caching docs. See how it has been implemented in the context of this post. This will help avoid confusions and also help make better use of the functionality provided by CircleCI.

Approach 2 issues

  • Caching: check the docs before trying to use the Docker Layer Caching (DLC) option, as it is a paid feature. Once this is known, any doubts about “why CircleCI keeps downloading all the layers during each build” will be clarified; for Docker Layer Caching details, refer to the docs. It also explains why, in non-paid mode, a build may still not be as fast as we would like it to be.

General note:

  • Light-weight instances: to avoid the pitfall of thinking we can run heavy-duty builds, check the documentation on the technical specifications of the instances. If we run the standard Linux commands to probe the technical specifications of the instance, we may be misled into thinking that they are high-specification machines. See the step that lists the hardware and software details of the instance (see the Display HW, SW, Runtime env. info and versions of dependencies step). The instances are actually virtual machines or container-like environments with resources like 2 CPUs/4096 MB. This means we can’t run long-running or heavy-duty builds like building the GraalVM suite. Maybe there is another way to handle these kinds of builds, or maybe such builds need to be decomposed into smaller parts.
  • Global environment variables: each run step in config.yml runs in its own shell context, so it cannot see environment variables set by other steps. To overcome this, we have adopted two methods:
    • pass the values as parameters to the bash/shell scripts being called, so that the scripts are able to access them
    • use the source command in a run step to make the environment variables accessible globally
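
The second method can be tried locally with a minimal shell sketch. It only simulates the behaviour: each sh -c call stands in for one build step, ENV_FILE plays the role of CircleCI's ${BASH_ENV}, and the names LOST and DEMO_HOME are made up for illustration.

```shell
# Simulates two separate build steps, each in its own shell process.
# ENV_FILE stands in for ${BASH_ENV}; LOST and DEMO_HOME are hypothetical.
ENV_FILE="$(mktemp)"

# Step 1: a plain export disappears when the step's shell exits,
# but a line appended to the shared file survives.
sh -c "export LOST=1; echo 'export DEMO_HOME=/opt/demo-jdk' >> '$ENV_FILE'"

# Step 2: a fresh shell sees only what it loads from the shared file.
step2_output="$(sh -c ". '$ENV_FILE'; echo \"LOST=\${LOST:-unset} DEMO_HOME=\${DEMO_HOME}\"")"
echo "$step2_output"

rm -f "$ENV_FILE"
```

The second step reports LOST as unset while DEMO_HOME carries over, which is why the build steps in this post append export lines to ${BASH_ENV} rather than relying on a plain export.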

End result and summary

We see the below screen (the last step, i.e. Uploading artifacts, lists where the artifacts have been copied) after a build has finished successfully:

The artifacts are now placed in the right folder for download. We are mainly concerned about the jdk8-with-graal.tar.gz artifact.

Performance

Before writing this post, I ran multiple passes of both the approaches and jotted down the time taken to finish the builds, which can be seen below:

Approach 1: standard CircleCI build (caching enabled)
– 13 mins 28 secs
– 13 mins 59 secs
– 14 mins 52 secs
– 10 mins 38 secs
– 10 mins 26 secs
– 10 mins 23 secs
Approach 2: using pre-built docker image (caching enabled, DLC feature unavailable)
– 13 mins 15 secs
– 15 mins 16 secs
– 15 mins 29 secs
– 15 mins 58 secs
– 10 mins 20 secs
– 9 mins 49 secs

Note: Approach 2 should show better performance when using a paid tier, as Docker Layer Caching is available as part of this plan.
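
To compare the two sets of timings at a glance, the averages can be worked out with a few lines of shell. This is just arithmetic on the figures listed above; the helper name mean_mmss is made up, and it expects times written as minutes:seconds without leading zeros.

```shell
# Average a list of build times given as minutes:seconds.
# mean_mmss is a hypothetical helper; the mean is rounded down to whole seconds.
mean_mmss() {
  total=0; count=0
  for t in "$@"; do
    total=$(( total + ${t%%:*} * 60 + ${t##*:} ))   # convert to seconds
    count=$(( count + 1 ))
  done
  mean=$(( total / count ))
  printf '%dm%02ds\n' $(( mean / 60 )) $(( mean % 60 ))
}

echo "Approach 1: $(mean_mmss 13:28 13:59 14:52 10:38 10:26 10:23)"
echo "Approach 2: $(mean_mmss 13:15 15:16 15:29 15:58 10:20 9:49)"
```

On these runs the averages come to about 12m17s for Approach 1 and 13m21s for Approach 2, so without Docker Layer Caching the pre-built image did not yet show a clear advantage, even though its two fastest runs were quicker.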

Sanity check

In order to be sure that by using both the above approaches we have actually built a valid JDK embedded with the GraalVM compiler, we perform the following steps with the created artifact:

– Firstly, download the jdk8-with-graal.tar.gz artifact from under the Artifacts tab on the CircleCI dashboard (needs sign-in):

– Then, extract the .tar.gz file as follows:

    tar xvf jdk8-with-graal.tar.gz

– Thereafter, run the below command to check the JDK binary is valid:

    cd jdk8-with-graal
    ./bin/java -version

– And finally check if we get the below output:

    openjdk version "1.8.0-internal"
    OpenJDK Runtime Environment (build 1.8.0-internal-jenkins_2017_07_27_20_16-b00)
    OpenJDK 64-Bit Graal:compiler_ab426fd70e30026d6988d512d5afcd3cc29cd565:compiler_ab426fd70e30026d6988d512d5afcd3cc29cd565 (build 25.71-b01-internal-jvmci-0.46, mixed mode)

– Similarly, to confirm if the JRE is valid and has the GraalVM compiler built in, we do this:

    ./jre/bin/java -version

– And check if we get a similar output as above:

    openjdk version "1.8.0-internal"
    OpenJDK Runtime Environment (build 1.8.0-internal-jenkins_2017_07_27_20_16-b00)
    OpenJDK 64-Bit Graal:compiler_ab426fd70e30026d6988d512d5afcd3cc29cd565:compiler_ab426fd70e30026d6988d512d5afcd3cc29cd565 (build 25.71-b01-internal-jvmci-0.46, mixed mode)
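
The manual comparison above can also be scripted. The following is a minimal sketch rather than part of the build scripts: reports_graal is a hypothetical helper, and the banner string is a shortened sample of the output shown above.

```shell
# Hypothetical check: succeed only if a `java -version` banner names the
# Graal compiler. `java -version` writes to stderr, hence 2>&1 when capturing.
reports_graal() {
  printf '%s\n' "$1" | grep -q 'Graal:compiler'
}

# Shortened sample banner, based on the output shown above.
banner='OpenJDK 64-Bit Graal:compiler_ab426fd70e30026d6988d512d5afcd3cc29cd565 (build 25.71-b01-internal-jvmci-0.46, mixed mode)'

if reports_graal "$banner"; then
  echo "Graal compiler present"
fi

# Against a real build (run from inside the extracted jdk8-with-graal folder):
#   reports_graal "$(./bin/java -version 2>&1)"
```

A check of this kind returns a non-zero exit status for a stock JDK, so it can be used to fail a build step.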

With this, we have successfully built JDK8 with the GraalVM compiler embedded in it and also bundled the Graal and Truffle components in an archive file, both of which are available for download via the CircleCI interface.

Note: you will notice that we do perform sanity checks of the binaries built just before we pack them into compressed archives, as part of the build steps (see the last few steps in the Explaining sections of the config file section).

Nice badges!

We all like to show-off and also like to know the current status of our build jobs. A green-colour, build status icon is a nice indication of success, which looks like the below on a markdown README page:

We can very easily embed status badges displaying the build status of our project (branch-specific, i.e. master or another branch you have created) built on CircleCI (see the docs on how to do that).

Conclusions

We explored two approaches to build the GraalVM compiler using the CircleCI environment. They were good experiments to compare performance between the two approaches and also how we can do them with ease. We also saw a number of things to avoid or not to do and also saw how useful some of the CircleCI features are. The documentation and forums do good justice when trying to make a build work or if you get stuck with something.

Once we know the CircleCI environment, it’s pretty easy to use and always gives us the exact same response (consistent behaviour) every time we run it. Its ephemeral nature means we are guaranteed a clean environment before each run and a clean up after it finishes. We can also set up checks on build time for every step of the build, and abort a build if the time taken to finish a step surpasses the threshold time-period.

The ability to use pre-built docker images coupled with Docker Layer Caching on CircleCI can be a major performance boost (saves us build time needed to reinstall any necessary dependencies at every build). Additional performance speedups are available on CircleCI, with caching of the build steps – this again saves build time by not having to re-run the same steps if they haven’t changed.

There are a lot of useful features available on CircleCI with plenty of documentation, and everyone on the community forum is helpful; questions are answered pretty much instantly.

Next, let’s build the same and more on another build environment/build farm. Hint, hint, are you thinking the same as me? The Adopt OpenJDK build farm? We can give it a try!

Thanks and credits to Ron Powell from CircleCI and Oleg Šelajev from Oracle Labs for proof-reading and giving constructive feedback. 

Please do let me know if this is helpful by dropping a line in the comments below or by tweeting at @theNeomatrix369. I would also welcome feedback (see how you can reach me), and above all please check out the links mentioned above.

Useful resources

– Links to useful CircleCI docs
About Getting started | Videos
About Docker
Docker Layer Caching
About Caching
About Debugging via SSH
CircleCI cheatsheet
CircleCI Community (Discussions)
Latest community topics
– CircleCI configuration and supporting files
Approach 1: https://github.com/neomatrix369/awesome-graal/tree/build-on-circleci (config file and other supporting files i.e. scripts, directory layout, etc…)
Approach 2: https://github.com/neomatrix369/awesome-graal/tree/build-on-circleci-using-pre-built-docker-container (config file and other supporting files i.e. scripts, directory layout, etc…)
Scripts to build Graal on Linux, macOS and inside the Docker container
Truffle served in a Holy Graal: Graal and Truffle for polyglot language interpretation on the JVM
Learning to use Wholly GraalVM!
Building Wholly Graal with Truffle!

Digital Catapult | Machine Intelligence Garage: the best-kept secret yet in the open

Introduction

I was at a meetup where Simon Knowles, CTO of Graphcore, was giving his talk on the latest developments at Graphcore, and that is also where I met Peter Bloomfield from Digital Catapult (@digicatapult). Peter was spreading the word about Machine Intelligence Garage, which is an amazing opportunity created by a collaboration between the government and industry leaders like Google, Nvidia, AWS, etcetera to help startups and small businesses access compute resources which they would otherwise never have been able to get hold of.

Digital Catapult - Get Involved

Our conversation

Sometime later, we decided to have a chat to discuss the usual questions: what is Machine Intelligence Garage, what is its history, why, how, who, when, what stage is the programme at, and how do people get involved? As you would imagine, I found our conversation interesting and informative, and hence decided to write about it and share it with everyone.

Mani: Hey Peter, great meeting you and learning about the initiative from Digital Catapult. Can you please share with me the history of this initiative?

Peter: Hi Mani, thanks for dropping in! Our programme, Machine Intelligence Garage, was born from a piece of research we conducted last summer. In this report, we explore barriers facing AI startups, and we wanted to test the hypothesis that access to the right data, technical talent and adequate computational resources were the main things holding startups back. We collaborate a bit on the first two barriers with government and academic institutes, like the Turing, but the Machine Intelligence Garage programme was designed to provide startups with access to cloud computing vouchers, novel chipsets and HPC facilities. It is through our brilliant partners that we can offer these resources.

Mani: Are there other sister and daughter initiatives or programmes, related to Digital Catapult, that everyone would benefit from knowing about?

Peter: We have a whole range of digital programmes across three core tech layers (future networks, AI/ML and immersive tech), as well as some cybersecurity and blockchain initiatives. The full details of our opportunities can be found through the ‘Get Involved’ section of our website.

Mani: Who have been and are going to benefit from this setup that you have in place?

Peter: Our Machine Intelligence Garage programme is designed to benefit early-stage startups who are data ready and need compute power to scale faster. Our collaborative programmes provide opportunities for larger corporations to get involved with startups to address industry-specific challenges.

Mani: How much does this access cost, and for how long are the resources available?

Peter: Everything we provide on the Machine Intelligence Garage programme is free to use. The programme was set up using public funding from InnovateUK and CAP.AI. We are able to deliver the compute resources through our work with a wonderful set of partners.

Mani: Can you tell us the process from start to finish?

Peter: When I meet a new company, I have a chat with them about the things they are trying to do, the infrastructure they currently use and the sorts of ML approaches they are using to solve the problem. If the company needs our support, is developing a product with commercial viability, and has both strong technical skills and domain expertise, I encourage them to apply. The application form asks questions about the product, training data and compute power requirements. If we like the idea, we invite the company to an interview and, if successful, onboard them with the most suitable resource. The process generally takes 3 weeks from the close of the application call to onboarding with a resource.

Mani: What other benefits do startups get from being involved? Are you able to introduce them to partners who can help them run trials and give feedback on the products or services they are building?

Peter: We have a large network that we encourage all our companies to take advantage of. Digital Catapult is an innovation centre and meeting the right people at the right time is key to success for many startups. We run a range of workshops, from business growth and pitch training to deep dives to learn more about technical resources and we make sure our startups benefit from all of these. If a startup wants to put some of the new knowledge into practice we are always very keen to facilitate it!

Mani: This is a lot of information. Are there any resources on your website that can help? Anything to sign up to, to keep in touch?

Peter: We have a general technology Digital Catapult newsletter (sign-up form at the bottom of the page) and an AI-specific Machine Intelligence Garage newsletter (sign-up form at the bottom of the page). If a startup wants to chat about the programme, they can send me an e-mail: peter.bloomfield@digicatapult.org.uk. We announce all our calls and opportunities through our Twitter account too, @DigiCatapult.

Mani: Can you please touch on the specifics of what the startups will get access to and how it can benefit them?

Peter: We have three main resources available:

  • Cloud Computing vouchers, either through AWS or Google Cloud Platform

To find out the exact amounts and specifics of access, please do get in touch.

Mani: If someone needed to find out about the benchmarks between different compute resources available to the participants, who or where would they look for the information? Do you have a team that does these measurements on a daily basis?

Peter: We do our own benchmarking of the facilities available and our data engineer on the programme can advise on this, as well as point you in the right direction for more literature!

Mani: I have been to the Digital Catapult HQ at 101 Euston Road, and was blown away by all the tech activities happening there. Can you please share details about it with our readers?

Peter: Digital Catapult was set up four years ago and is the UK’s leading innovation centre for advanced digital technologies.  We have seen a number of changes over the years but our core values of opening up markets and making businesses more competitive and productive remain. We are incredibly lucky to have some amazing facilities to help companies develop new products and services and get their products to market faster, including a nationwide network of Immersive Labs [see launch photos, photos in 2018], an LPWAN network and the new 5G Brighton Testbed.

Mani: Can you name a few startups that are currently going through your programs and the ones who have already been through it?

Peter: We currently have 25 start-ups on our programme. The teams range in size: some are just 2 people, others 20+. The thing they all have in common is that they are developing some really exciting commercial products/solutions with deep learning and have an immediate need for the computational resources we offer. The full list of start-ups can be found on our cohort page on the Machine Intelligence Garage website.

Mani: I really appreciate the time you have taken to answer my questions and this has definitely helped the readers know more about what you do and how they can benefit from this great government-driven initiative.

Peter: Thank you very much for coming in. It’s great to be able to reach a wider audience and grow our community! See you soon!


Closing note

I was shown around a number of facilities at their centre (two floors): the Immersive Lab (yes, plenty of VR headsets to play with), the server area where all the HPC hardware is kept (at a low room temperature), a spacious conference room where meetups are held, a small library full of interesting books, and a hot-desking area shared by internal staff, partners and friends of Digital Catapult/Machine Intelligence Garage. Looking at the two websites, I found the news and views, events and workshops, and Digital Catapult | MI Garage blog sites useful for keeping track of activities in this space.


I’m sure that after reading this conversation you are wondering how you, and others in your network, could take advantage of these facilities.

Follow the links mentioned above to learn about the programme and how to apply, or recommend it to friends in the community for whom it would be a better fit.

Please let me know if this was helpful by dropping a line in the comments below. I also welcome feedback (see how you can reach me). Above all, please check out the links mentioned above and reach out to the folks behind this great initiative.

 

Truffle served in a Holy Graal: Graal and Truffle for polyglot language interpretation on the JVM

03 Hotspot versus GraalVM

Reblogging from ZeroTurnaround’s Rebellabs blog site

One of the most fascinating additions to Java 9 is JVMCI, the Java-Level JVM Compiler Interface: a Java-based compiler interface that allows us to plug a dynamic compiler into the JVM. One of the main inspirations for including it in Java 9 was project Graal, a state-of-the-art dynamic compiler written in Java.

In this post we look at why Graal is such a fascinating project, its advantages, the general code optimization ideas behind it, some performance comparisons, and why you would even bother tinkering with a new compiler.

Like everyone else we were inspired by the vJUG session by Chris Seaton on Graal – it looks like a great tool and technology and so we decided to play with the technology and share it with the community.

…you can read the rest at ZeroTurnaround’s Rebellabs blogs


 

In case you are wondering what the ASCII-art images in one of the paragraphs are about, here’s a bit of explanation that will hopefully clear up any doubts.

How does it actually work?

A typical flow would look like this:

02-a Program to machine code diagram (excludes expansion)
AST → Abstract Syntax Tree  (explicit data structures in memory)

We all know that a JIT is embedded inside HotSpot or the JVM. It’s old, complicated, written in C++ and assembly and is fairly hard to understand. It is a black box and there is no way to hook or link into the JIT.  All the JVM languages have to go through the same route:  

02-b Program to machine code diagram (via byte-code)

(ASM = assembly)

The flow or route when dealing with traditional compilers and VM would be:

02-c Program to machine code diagram (via JIT)
But with Graal, we get the below route or flow:

02-d Program to machine code diagram (via AST)
(notice Graal skips the steps that create byte-code by directly generating platform specific machine code)

Graal basically helps move the control flow from the code to the JIT, bypassing the JVM (HotSpot, in our case). It means we will be running faster and more performant applications on the JVM. These applications will no longer be interpreted but compiled to machine code on the fly, or even natively.


I hope you enjoyed the read, please feel free to share any constructive feedback, so we can improve the material for the community as a whole. We learnt a lot while drafting this post and hope the same for you.

Original post by @theNeomatrix369 and  @shelajev !

(Part 3 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al

This is a continuation of the previous post titled (Part 2 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al.

In our first review, The Atlassian guide to GC tuning is an extensive post covering the methodology and things to keep in mind when tuning GC; practical examples are given and important resources are referenced along the way. Next, in How NOT to measure latency, Gil Tene discusses some common pitfalls encountered in measuring and characterizing latency, demonstrates false assumptions and measurement techniques that lead to dramatically incorrect reporting results, and covers simple ways to sanity check and correct these situations. Finally, Kirk Pepperdine, in his post Poorly chosen Java HotSpot Garbage Collection Flags and how to fix them!, throws light on some JVM flags: he starts with some 700 flags and boils them down to merely 7. He also cautions you not to draw conclusions or take action on a whim but to consult and examine, i.e. measure, don’t guess!

….read more (reblogged from the Java Advent Calendar)

(Part 2 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al

This is a continuation of the previous post titled (Part 1 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al.

Without any further ado, let’s get started with our next set of blogs and videos, chop…chop…! This time it’s Martin Thompson’s blog posts and talks. Martin’s first post, Java Garbage Collection Distilled, distils the GC process and its underlying components, and throws light on a number of interesting GC flags (-XX:…). In his next talk he does his myth-busting shebang about mechanical sympathy: what people correctly believe in, and the misconceptions. In the talk on performance testing, Martin takes it further and fuses Java, the OS and the hardware to show how an understanding of all these aspects can help you write better programs.


Java Garbage Collection Distilled by Martin Thompson

There are too many flags to allow tuning the GC to achieve the throughput and latency your application requires. There’s plenty of documentation on the specifics of the bells and whistles around them but none to guide you through them.

(Part 1 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al

I have been contemplating for a number of months reviewing a cache of articles and videos on topics like Performance tuning, JVM, GC in Java, Mechanical Sympathy, etc… and finally took the time to do it. Maybe this was the point in my intellectual progress when I was required to do such a thing!

Thanks to Attila-Mihaly for giving me the opportunity to write a post for his yearly newsletter Java Advent Calendar, hence a review of various Java-related topics fits the bill! The selection of videos and articles is purely random, and based on the order in which they came to my knowledge. My hidden agenda is mainly to go through them to understand and broaden my own knowledge, and at the same time share any insight with others along the way….read more (reblogged from the Java Advent Calendar)

My experience of learning R – from basic graphs to performance tuning

Background

R, as some of you may know, is a statistical and graphics programming language (see Wikipedia [1]) used by academia and, more recently, by IT professionals in our ever-growing software industry. There is a sudden demand for Data Scientists, Data Analysts and Statisticians with a background in R, among other data- and development-related subjects.

I have been fortunate to work with such a programming language, even though I had no prior experience of working with it or, moreover, with Data Scientists. My interest in Mathematics and affinity for numbers drew me to learning it, and with the further help of Herve Schnegg, our in-house Senior Data Scientist, I was able to pick up a fair bit of the subject.

 

R is a mix of object-oriented programming, Clojure-like functional programming, a JavaScript-like style of writing code and a Smalltalk-like programming interface. It also offers a REPL, like many functional programming environments. The fundamental units of data we manipulate are usually objects like lists, vectors, data frames, tables, etc…
 
Initial baby-steps
 
I went through a few hours of tutoring to get an understanding of the R environment, how to install it, and an overview of RStudio and how amazing it is! What fascinates me is that you can load objects into memory and play with them, and when you shut down your environment your data need not be lost! Rather, you can save it (into the .RData file) and it retains such information per project!
You are able to remove individual objects from memory, view them, modify them, and reload them from the command line or by executing single lines of code from your R script file (they have the obvious extension of .r) in an IDE like RStudio.
R gives developers access to a REPL (read–eval–print loop [2]) environment, and that’s how you are able to do the above actions seamlessly! A number of other popular languages have a similar environment, e.g. Clojure, Haskell, Python, Ruby, Scala and Smalltalk.

More about R
The order in which functions are defined is important in R. You can’t call a function unless it has been defined in a package/library you have loaded, like:

library([name of library])

or pulled in from another file using the source() function, like:

source("./Utils.LoadAndVerify.r")

or defined earlier in the script file, before the point at which it is called. I had to learn this by the trial-and-error-then-ask-the-experts-around-you method.
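To make the define-before-call rule concrete, here is a minimal sketch. The function name greet() is hypothetical and used only for illustration; tryCatch() is there simply so the script keeps running after the failed call.

```r
# greet() is a hypothetical function name, used only for illustration.
callBeforeDefinition <- tryCatch(
  greet("R"),                               # greet() has not been defined yet
  error = function(e) conditionMessage(e)   # capture the error text instead of stopping
)
print(callBeforeDefinition)                 # R reports that it cannot find the function

# Once the definition has been evaluated, the same call succeeds.
greet <- function(name) {
  paste("Hello,", name)
}
print(greet("R"))
```

The first call fails only because the definition had not been evaluated at that point; nothing else about the call changes between the two attempts.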

 

The contents of any object can be viewed by referring to the object at the REPL CLI, that’s kind of easy!

>  someObject <- "contents"
>  someObject [press enter]
[1] "contents" <==== output

 

I discovered another way to view the contents of an object, especially when it’s a list, vector, data frame, etc. and its output is a bit cumbersome to read on the console. I learnt that the View() function displays the contents of the object in tabular form in a separate floating window:

 

> View(table(someList))

 

(the object is displayed in a grid like table in a separate window, which could look like the below)

Plotting graphs from a set of numeric values contained in a list or vector in R is like doing 1..2..3…:

 > counts <- table(mtcars$vs, mtcars$gear)  # assumed definition: VS by gear counts from the built-in mtcars data
 > par(bg = "white");
 > barplot(counts, main="Car Distribution by Gears and VS",
   xlab="Number of Gears", col=c("darkblue","maroon"),
   legend = rownames(counts), beside=TRUE)

And voila, you get a nice simple looking bar graph!

Thanks to a helpful R blogger who has put together some resource for us: Using R to plot data [4].

We can do something more advanced by running the below commands:

> x <- …
> y <- …
> f <- …
> z <- …
> par(bg = "white");
> persp(x,y,z,zlim=c(0,0.25), theta=50, phi=10);

…and we have the below nice-looking 3D mesh (wireframe), from an angle:
Note: the par(bg="white") command sets the background colour of the canvas for the current graphics device, so it applies to every plot drawn on that device afterwards.
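The definitions of x, y, f and z are not shown in the listing above, so the following is only a sketch under an assumption: it uses a bivariate standard normal density as a stand-in surface, chosen because its peak (about 0.16) fits inside the zlim range used in the persp() call. The original surface may well have been something else.

```r
# Assumed stand-in surface; the original definitions of x, y, f and z are not known.
x <- seq(-3, 3, length.out = 40)
y <- x
f <- function(x, y) dnorm(x) * dnorm(y)   # bivariate standard normal density
z <- outer(x, y, f)                       # evaluate f at every (x, y) grid point

# In headless runs, draw to a null device rather than writing a plot file.
if (!interactive()) pdf(NULL)

par(bg = "white")
persp(x, y, z, zlim = c(0, 0.25), theta = 50, phi = 10)
```

outer() builds the matrix of heights that persp() expects: one row per value of x and one column per value of y.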

 

Logging
I wrote my own suite of very simple logging functions that log messages to the console depending on the nature of the message. These messages can, of course, be redirected into a text file at run-time.
log.INFO <- function(message) {
  print(paste(date(), "[INFO]", message))
}

log.WARNING <- function(message) {
  print(paste(date(), "[WARNING]", message))
}

log.DEBUG <- function(message) {
  print(paste(date(), "[DEBUG]", message))
}

log.ERROR <- function(message) {
  print(paste(date(), "[ERROR]", message))
}
Of course the above block of code could have been written like this:
MSG_TYPE_INFO <- "[INFO]"
MSG_TYPE_WARNING <- "[WARNING]"
MSG_TYPE_DEBUG <- "[DEBUG]"
MSG_TYPE_ERROR <- "[ERROR]"

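# Shared helper: prefixes the message with a timestamp and its type.
log.ANY <- function(typeOfMessage, message) {
  print(paste(date(), typeOfMessage, message))
}

log.INFO <- function(message) {
  log.ANY(MSG_TYPE_INFO, message)
}

log.WARNING <- function(message) {
  log.ANY(MSG_TYPE_WARNING, message)
}

log.DEBUG <- function(message) {
  log.ANY(MSG_TYPE_DEBUG, message)
}

log.ERROR <- function(message) {
  log.ANY(MSG_TYPE_ERROR, message)
}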

As you will know, given the way R is, it is wise to have logging functions to hand to dump the values of variables when running scripts, because the error messages thrown by R can sometimes be obscure, which has been my finding during my pursuits. Hence I resorted to the above functions and relieved myself of annoyances when errors occurred. Later someone passed me a link to an R logging library (an implementation of log4j in R) [7].

What’s up!
At the moment I’m refactoring bits of code I wrote during the last two weeks, and I still have many blocks of code to go through to find suitable functions to place them into. Our purpose is to make the code more readable, scalable and maintainable.
I am currently in the process of replacing the slow and verbose for-loop-like constructs with their equivalent xapply() functions (the apply() family: lapply(), sapply(), mapply() and so on). By doing this we will gain in speed and in compactness of the code.

 

R gives us a number of MapReduce like functions to play with, here’s a blog [3] that covers the topic on the xapply() functions.
 
Performance measurement and performance tuning
As R is an interpreted language, if you don’t write efficient functions you could end up waiting a bit longer than expected before any results are thrown back onto the console. It is not verbose and does not usually tell you what it is up to.
We spent most of our two weeks performing this action as we came across performance bottlenecks in our scripts and could do with using the xapply() like functions. Applying them improved the performance of certain tasks from several hours to a reasonable number of minutes per execution.

 

“Measure, don’t guess.” was the motto!

 

Thanks go to the proc.time() function, which we called voraciously in pairs to measure the performance of the different blocks of code we thought needed attention:

 

startTimer <- proc.time()
and
proc.time() - startTimer
 
This paid off at the end of the process as we were able to determine how much time it would take for the script to transform and validate the heaps of data we have been playing with.
At the end of each such iteration we saw the stats in the below format. It got us excited if it was a low number and dejected if it wasn’t to our liking:

 

   user  system elapsed
 87.085   0.694  87.877
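Put together, the measurement pattern looks roughly like the sketch below. slowSum() is a hypothetical stand-in for whichever block of code needs attention; system.time() is base R's shorthand for the same start/stop pattern around a single expression.

```r
# slowSum() is a hypothetical stand-in for a block of code worth measuring.
slowSum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

startTimer <- proc.time()               # snapshot before the work
result <- slowSum(1e6)
elapsed <- proc.time() - startTimer     # difference of the two snapshots
print(elapsed)                          # prints user, system and elapsed seconds

# system.time() wraps the same pattern around one expression.
print(system.time(slowSum(1e6)))
```

The elapsed column is wall-clock time, which is usually the figure to watch when deciding whether a change actually helped.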
 
We tried various methods to bring down the total elapsed time. Some of the things we did before we came to a final resolution:
 – used a for-loop to iterate through a list or vector, perform the same action repeatedly and accumulate the results
 – noticed that the for-loop slowed down after a number of iterations, and that this was a standard pattern. To relieve that, we split the for-loop into an inner and an outer loop. The outer loop split the work into batches of 40-50 inner iterations, followed by a call to gc() at the end of each batch. This wasn’t a decent solution from an algorithms or language point of view
 – finally settled on refactoring the for-loop into a mapply() call, which looked like:
result <- unlist(mapply(FUN=transposeColumnAsRow, rangeOfIndices, SIMPLIFY=TRUE))
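The refactoring can be sketched on a toy example. The real transposeColumnAsRow() and the data it worked on are not shown in this post, so the versions below are hypothetical stand-ins that only illustrate the shape of the change.

```r
# Hypothetical stand-ins for the real data and transformation.
rawData <- data.frame(a = 1:5, b = 6:10)
transposeColumnAsRow <- function(index) {
  paste(unlist(rawData[index, ]), collapse = ",")   # one row flattened to "a,b"
}
rangeOfIndices <- seq_len(nrow(rawData))

# Before: a for-loop that grows the result vector on every pass.
loopResult <- character(0)
for (i in rangeOfIndices) {
  loopResult <- c(loopResult, transposeColumnAsRow(i))
}

# After: the same work expressed as a single mapply() call.
applyResult <- unlist(mapply(FUN = transposeColumnAsRow, rangeOfIndices, SIMPLIFY = TRUE))

print(identical(loopResult, applyResult))
```

Growing a vector with c() inside a loop copies it on every pass, which is one plausible reason the loop version slowed down as the iterations went on.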
 
The last action gave us a better grip over the performance and we were confident that if we had to run all the data we had through the script, we would be able to finish transposing it within several hours as opposed to a few days, previously.
Here’s the equation we used to benchmark our functions each time we improved them. Its main purpose was to tell us whether we would be able to meet our goals: if the estimate was acceptable we moved on, otherwise we needed to investigate further to find a better method:

 

nm = (ns / nr) * tnr / nsm

 

 nm – no. of minutes it would take to process the whole raw file
 ns – no. of seconds taken to process the batch of records
 nr – total number of records processed in the batch
tnr – grand total of the number of records in the raw file
nsm – number of seconds in a minute

 

The method we settled for gave us the below result, based on processing a sample of 100 records and extrapolating to 11200+* records, which was pretty acceptable at the time:

 

 9.315 / 100 * 11250 / 60 = 17.465 minutes per raw data file

* – each row was made up of 1300+ columns which added to the processing time


We had about 24 files in total to process, which works out to

17.465 * 24 / 60 = 6.986 hours if run one file per session
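The same arithmetic can be wrapped in a small function so the estimate is easy to recompute after each improvement. The function name estimateMinutes() is made up for this sketch; the parameter names follow the equation above.

```r
# estimateMinutes() is a hypothetical helper implementing nm = (ns / nr) * tnr / nsm.
estimateMinutes <- function(ns, nr, tnr, nsm = 60) {
  (ns / nr) * tnr / nsm
}

minutesPerFile <- estimateMinutes(ns = 9.315, nr = 100, tnr = 11250)
hoursForAllFiles <- minutesPerFile * 24 / 60   # 24 raw files, 60 minutes per hour

print(minutesPerFile)
print(hoursForAllFiles)
```

Unrounded, this gives roughly 17.4656 minutes per file and just under 7 hours for all 24 files, in line with the figures quoted above.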

The task of processing each file was split into 3 to 4 sessions, processing 1000 records per session.

But it wasn’t as easy as it sounds. We had a number of sessions running, doing the above on different pieces of raw data, but they never got to committing the data into the database, and we wondered why. We thought it was down to hardware/software limitations on our systems. But after further investigation and experimentation we found out that no system can handle writing mega-tons of data from memory into the database system without creating giga-tons of swap files. And these swap files are a catch-22, because now the OS needs resources to manage its own resources, so our resource requirements take a back seat!

After a couple of discussions and trials, we finally decided to write data back into the database system in smaller blocks at a time, which means we can still have multiple sessions running in the background and have each one of them write smaller blocks of data into the database.
Everyone is happy, as processes can handle smaller blocks much better than bigger blocks. Didn’t we already know this? Maybe we re-learnt it by facing a bottleneck. Our script learnt from it as well, and was modified to process smaller blocks of data by splitting the work into smaller batches of records per execution:

$ Rscript IncrementalLoad.r [filename] [starting record no.] [ending record no.]

The verification script imitated this and was also run in batches:

$ Rscript VerifyData.r [filename] [starting record no.] [ending record no.]
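As a rough illustration of how such a script might read its three arguments and slice the requested range into smaller blocks, here is a sketch. The real IncrementalLoad.r is not shown in this post, so parseBatchArgs(), blockRanges() and the block size of 250 are all assumptions made for the example.

```r
# Hypothetical helpers; the real IncrementalLoad.r is not shown in this post.
parseBatchArgs <- function(args) {
  stopifnot(length(args) == 3)
  list(filename    = args[1],
       startRecord = as.integer(args[2]),
       endRecord   = as.integer(args[3]))
}

# Split [startRecord, endRecord] into consecutive blocks of at most blockSize records.
blockRanges <- function(startRecord, endRecord, blockSize) {
  starts <- seq(startRecord, endRecord, by = blockSize)
  ends <- pmin(starts + blockSize - 1L, endRecord)
  data.frame(start = starts, end = ends)
}

# Only runs when the script is given exactly three arguments on the command line.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) == 3) {
  batch <- parseBatchArgs(args)
  print(blockRanges(batch$startRecord, batch$endRecord, blockSize = 250L))
}
```

Each row of the returned data frame is one block that could be transformed and written to the database on its own, which keeps the amount of data held in memory per write small.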

Once the data had been transformed and written to the database, we wrote a script to validate what had been written. We chose Postgres as a trial, and found it to be a pretty good database system with an intuitive SQL dialect.

The verification process was run in parallel in the same manner and took a similar amount of time, so at the end of the 7th hour we had the data both written into the database and verified.

 

R can write to such a database system easily. We were further helped by the primary, simple and compound indices that we created to speed up the searches and selections made by our SQL statements. Postgres also has an efficient caching mechanism, which helps speed things up further.

 

Tweaking the R environment to get efficiency out of it
What I didn’t mention was that before we returned to the R script to tweak it and improve its performance, we thought it was the environment, and the way R is, that made it slow. So we wanted to speed up our scripts using the below methods to get the maximum out of the R environment:
  • JIT compiling R scripts – on the thinking that it’s no longer slow once it isn’t interpreted
  • Converting R scripts into C/C++ code and compiling and running it instead
  • Running R scripts using parallel processing (need some library for it)
  • Learning how to use GPUs via R to get that extra performance (need some library for it)
  • Investigating other methods of High Performance Computing in R
We have parked these ideas for now, but it will be a great experience to be able to explore them at a later date.
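For the first idea on the list, base R ships a compiler package whose cmpfun() byte-compiles a function ahead of time. The sketch below only shows the mechanics, with a made-up sumOfSquares() function. As far as I know, recent R versions byte-compile functions automatically, so the measured difference may be small or absent.

```r
library(compiler)   # ships with base R

# sumOfSquares() is a hypothetical example function.
sumOfSquares <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i * i
  total
}

compiledSumOfSquares <- cmpfun(sumOfSquares)   # byte-compiled copy of the same function

# Compare timings; both versions must return the same value.
print(system.time(sumOfSquares(1e6)))
print(system.time(compiledSumOfSquares(1e6)))
print(identical(sumOfSquares(1000), compiledSumOfSquares(1000)))
```

Byte-compiling does not change what the function computes, which is why checking the two results against each other is a cheap sanity test before trusting any timing.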
But once again it was techniques over technology that made our day. Rory Gibson rightly said, “It’s not surprising to know how game developers produce some of the best pieces of work under restricted environments”. Such situations are a good nudge to everyone, especially developers, when faced with a performance bottleneck: look at your code first, not your machine!

 

At the end of it all, it feels we did what Hadoop or Cloudera would do to our jobs – split, slice, execute, verify, put together and bring back the results at an efficient speed.

 

Hurdles
 
During the time I spent learning and applying R, I had to get familiar with its syntax, which is unique, or rather different from that of other programming languages. One example is the use of <- (the arrow, or assignment operator) instead of the usual = (equals sign). How you point the arrow makes a difference in R: instead of assigning a value to a variable or function, you might end up doing something else if you are not careful.
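A few lines show how easily the arrow can mean something else. The variable names are arbitrary and the example is only a sketch of the pitfall.

```r
x <- 5                  # left-pointing arrow: assigns 5 to x
10 -> y                 # right-pointing arrow: assigns 10 to y
isSmaller <- x < -5     # with a space in between, this compares x with -5 instead

print(c(x, y))
print(isSmaller)
```

In the third line x keeps its value of 5, because the space turns the arrow into a less-than sign followed by a negative number, and the comparison evaluates to FALSE.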

 

You need to define your functions at the top first, otherwise you can’t refer to them. And all identifiers are case-sensitive. Please pay careful attention, or else you will only be notified when you least expect it, in the middle of the execution of a block. Remember it’s an interpreted language: no compile-time warnings or error messages are available.
There was one more hurdle which threw a spanner in the works for us: we bumped into an encoding/decoding issue when reading data from the raw file. The ESS plugin [5] in Emacs was reading the data literally at one stage and not evaluating the escape codes. When we switched to RStudio, or even the R REPL, we were immediately free of the issue. We noticed this because Herve and I were using different development environments: he used Emacs to develop in R while I used RStudio. Why this was happening is not known to us. We think there might be a bug in the plugin, but at this stage that is still speculation and, more importantly, we don’t have to investigate the issue anymore.
Our raw file was written by an application called SPSS, which writes data in a proprietary format. Such files can be read in a few ways, and using R is one of them. There is also a Java library [6] that facilitates reading such files, but it remains unexplored at this time.
Test driven development in R
 
This is where I am still hovering with regard to R. I came across two libraries that enable writing unit tests in R, i.e. RUnit and svUnit. See the External resources section below for a number of links I have put together while searching for TDD methodologies in R.
This still needs to be investigated further, but it is a promising start, since test-first development is a great way to start working on any problem in any programming language of choice.
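To give a flavour of what a test-first workflow could look like before committing to either library, here is a sketch that uses only base R (stopifnot() and tryCatch()) rather than RUnit or svUnit. The function under test, the test functions and the tiny runTests() runner are all hypothetical.

```r
# Function under test (hypothetical).
minutesToHours <- function(minutes) minutes / 60

# Each test is a plain function; stopifnot() raises an error when a check fails.
testConvertsWholeHours <- function() stopifnot(minutesToHours(120) == 2)
testHandlesZero        <- function() stopifnot(minutesToHours(0) == 0)

# Minimal runner: TRUE for a test that passes, FALSE for one that raises an error.
runTests <- function(tests) {
  vapply(tests, function(test) {
    tryCatch({ test(); TRUE }, error = function(e) FALSE)
  }, logical(1))
}

results <- runTests(list(wholeHours = testConvertsWholeHours,
                         zero       = testHandlesZero))
print(results)
```

A dedicated library adds reporting, test discovery and richer assertions on top of this idea, which is the part that still needs investigating.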

 

 

Refactoring
Another action which has been a continuous process since the start. We have applied it to generalise the code base and make it more compact and manageable.

 

We moved common functions into another .r file and pulled it into our main script using the source() function. This makes our work more maintainable and re-usable; basically, it keeps our code base clean and tidy.

 

I learnt that refactoring is a continuous effort: it’s a journey, not a destination.

 

What we took away…
Herve and I both took away a lot of learning, technical and non-technical, from the whole process: pair-programming and pair-investigation of a problem space. As the old adage goes, “Two minds are better than one.” It also reveals another reason why pair-programming is encouraged as part of a development process.

 

One important point: when we started working on this project we learnt, the hard way, that slicing a problem into simple, smaller, atomic chunks was effective and efficient. Also, when we had a solution to apply to a dataset, we had already decided to apply any experimental solution to a smaller subset of the dataset first, verify the results, and then scale it incrementally till the dataset was exhausted. Both these working methods came to our rescue and reduced the combinations and permutations of trial-and-error!
We exchanged ideas that were new to each of us, incorporated many of them into our work methods, and were able to come up with a fine, re-usable solution.
I have taken this project further by documenting the work, continuing to refactor the script files, writing this blog post, and tidying up the project space as a whole.
This blog comes about as documentation of our trial to check the viability of different platforms that could serve us for ETL (Extract, Transform, Load), for which we have made good use of R. Whether R is brilliant at it, or another tool serves better, is debatable. R is good at what it does, and it does it well, and it can be used for light/medium-weight ETL work.
So what would be the tool or platform of choice for our next ETL project? We can’t tell which one is better till we have tried a few and weighed up their pros and cons.


Thanks

Herve Schnegg – for a good partnership during our R sessions over the last couple of weeks, and for all the input and learning.

Rory Gibson – for lending his reviewer’s eyes to the R work Herve and I did, and also for reviewing this post.

External resources
This blog has also been published on the web’s popular R blogging site: http://www.R-bloggers.com.
During my quest, while learning and applying R, I came across the below links, which could come in useful to anyone interested in furthering their knowledge.
JIT for R
R to CPP
Parallels in R
High Performance Computing using R
GPU programming with R
Test driven development in R
Other useful topics
PSPP
