Exploring NLP concepts using Apache OpenNLP

Introduction

After looking at a lot of Java/JVM-based NLP libraries listed on Awesome AI/ML/DL, I decided to pick the Apache OpenNLP library. One of the reasons is that another developer (who had a look at it previously) recommended it. Besides, it’s an Apache project, and they have been great supporters of F/OSS Java projects for the last two decades or so (see Wikipedia). It also goes without saying that Apache OpenNLP is backed by the Apache 2.0 license, read more…

Originally published at blogs.valohai.com: Exploring NLP concepts using Apache OpenNLP

Applying NLP in Java, all from the command-line

Introduction

We are all aware of Machine Learning tools and cloud services that work via the browser and give us an interface we can use to perform our day-to-day data analysis, model training, evaluation, and other tasks, to various degrees of efficiency….

Originally published at blogs.valohai.com: Applying NLP in Java, all from the command-line

Running Apache Zeppelin on the cloud

Introduction
This post is a composition of the practical parts of two earlier posts: Apache Zeppelin: stairway to notes* haven! (late Dec 2018) and Running your JuPyTer notebooks on Oracle Cloud Infrastructure (early September 2019). This time, though, we are going to make Apache Zeppelin run on the Oracle Cloud Infrastructure….

Originally published on Medium.com: Running Apache Zeppelin on the cloud

Running your JuPyTeR notebooks on the cloud

Introduction


On the back of my previous post on how to build and run a docker container with Jupyter, I’ll be taking this further and showing how we can make it run on a cloud platform.
We’ll do this on Oracle Cloud Infrastructure (OCI). In theory, we should be able to do everything in this post on any VM or Bare Metal instance. If you are new to Oracle Cloud, I would suggest getting familiar with the docs and the Getting Started sections of the docs. You will also find several informative links at the bottom of this post, in the Resources section…

Originally published on Medium.com: Running your JuPyTeR notebooks on the cloud

How to do Deep Learning for Java?

Introduction

Some time ago I came across this life-cycle management tool (or cloud service) called Valohai, and I was quite impressed by its user interface and the simplicity of its design and layout. I had a good chat about the service at that time with one of the…

Originally published at blogs.valohai.com: How to do Deep Learning for Java on the Valohai Platform?

How to build Graal-enabled JDK8 on CircleCI?

Citation: the feature image on the blog can be found on Flickr and was created by Luca Galli. The image in one of the sections below can also be found on Flickr and was created by fklv (Obsolete hipster).


The GraalVM compiler is a replacement for HotSpot’s server-side JIT compiler, widely known as the C2 compiler. It is written in Java with the goal of better performance (among other goals) compared to the C2 compiler. Changes introduced in Java 9 mean that we can now plug our own hand-written compiler into the JVM, thanks to JVMCI. The researchers and engineers at Oracle Labs have created a variant of JDK8 with JVMCI enabled, which can be used to build the GraalVM compiler. The GraalVM compiler is open source and is available on GitHub (along with the HotSpot JVMCI sources needed to build it). This gives us the ability to fork/clone it and build our own version of the GraalVM compiler.

In this post, we are going to build the GraalVM compiler with JDK8 on CircleCI. The resulting artifacts are going to be:

– JDK8 embedded with the GraalVM compiler, and
– a zip archive containing Graal & Truffle modules/components.

Note: we are not covering how to build the whole of the GraalVM suite in this post; that can be covered in another post, although these scripts can be used to do that, and there exists a branch which contains the rest of the steps.

Why use a CI tool to build the GraalVM compiler?


Continuous integration (CI) and continuous deployment (CD) tools have many benefits. One of the greatest is the ability to check the health of the code-base. Seeing why your builds are failing provides you with an opportunity to make a fix faster. For this project, it is important that we are able to verify and validate the scripts required to build the GraalVM compiler for Linux and macOS, both locally and in a Docker container.

A CI/CD tool lets us add automated tests to ensure that we get the desired outcome from our scripts when every PR is merged. In addition to ensuring that our new code does not introduce a breaking change, another great feature of CI/CD tools is that we can automate the creation of binaries and the automatic deployment of those binaries, making them available for open source distribution.

Let’s get started

During the process of researching CircleCI as a CI/CD solution to build the GraalVM compiler, I learned that we could run builds via two different approaches, namely:

– A CircleCI build with a standard Docker container (longer build time, longer config script)
– A CircleCI build with a pre-built, optimised Docker container (shorter build time, shorter config script)

We will now go through the two approaches mentioned above and see the pros and cons of both of them.

Approach 1: using a standard Docker container

For this approach, CircleCI requires a docker image that is available on Docker Hub or another public/private registry it has access to. We have to install the necessary dependencies in this environment for the build to succeed. We expect the first build to run longer; depending on the level of caching, subsequent builds will speed up.

To understand how this is done, we will be going through the CircleCI configuration file section-by-section (stored in .circleci/config.yml); see config.yml in .circleci for the full listing and commit df28ee7 for the source changes.

Explaining sections of the config file

The below lines in the configuration file will ensure that our installed applications are cached (referring to the two specific directories) so that we don’t have to reinstall the dependencies each time a build occurs:

    dependencies:
      cache_directories:
        - "vendor/apt"
        - "vendor/apt/archives"

We will be referring to the docker image by its full name (as available on http://hub.docker.com under the account name used – adoptopenjdk). In this case, it is a standard docker image containing JDK8 made available by the good folks behind the Adopt OpenJDK build farm. In theory, we can use any image as long as it supports the build process. It will act as the base layer on which we will install the necessary dependencies:

        docker:
          - image: adoptopenjdk/openjdk8:jdk8u152-b16

Next, in the pre-Install Os dependencies step, we restore the cache if it already exists. This may look a bit odd, but for unique key labels, the below implementation is what the docs recommend:

          - restore_cache:
              keys:
                - os-deps-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
                - os-deps-{{ arch }}-{{ .Branch }}

Then, in the Install Os dependencies step, we run the respective shell script to install the dependencies needed. We have set this step to time out if the operation takes longer than 2 minutes to complete (see the docs for timeout):

          - run:
              name: Install Os dependencies
              command: ./build/x86_64/linux_macos/osDependencies.sh
              timeout: 2m

Then, in the post-Install Os dependencies step, we save the results of the previous step – the layer from the above run step (the key name is formatted to ensure uniqueness, and the specific paths to save are included):

          - save_cache:
              key: os-deps-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
              paths:
                - vendor/apt
                - vendor/apt/archives

Then, in the pre-Build and install make via script step, we restore the cache, if one already exists:

          - restore_cache:
              keys:
                - make-382-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
                - make-382-{{ arch }}-{{ .Branch }}

Then, in the Build and install make via script step, we run the shell script to install a specific version of make; it is set to time out if the step takes longer than 1 minute to finish:

          - run:
              name: Build and install make via script
              command: ./build/x86_64/linux_macos/installMake.sh
              timeout: 1m

Then, in the post Build and install make via script step, we save the results of the above action to the cache:

          - save_cache:
              key: make-382-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
              paths:
                - /make-3.82/
                - /usr/bin/make
                - /usr/local/bin/make
                - /usr/share/man/man1/make.1.gz
                - /lib/

Then, we define environment variables to update JAVA_HOME and PATH at runtime. Here the environment variables are sourced so that they are remembered in the subsequent steps, until the end of the build process (please keep this in mind):

          - run:
              name: Define Environment Variables and update JAVA_HOME and PATH at Runtime
              command: |
                echo '....'     <== a number of echo-es displaying env variable values
                source ${BASH_ENV}
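
To give an idea of what the elided echo lines above do (the exact variable names and values live in the repository’s config; the ones below are illustrative assumptions), the pattern is to append export statements to ${BASH_ENV} and then source it, so that later commands in the build see those values:

    # Illustrative only – the real config exports its own set of variables
    echo "export BASEDIR=${PWD}" >> ${BASH_ENV}
    echo "export JAVA_HOME=/path/to/jdk8" >> ${BASH_ENV}
    echo "export PATH=${JAVA_HOME}/bin:${PATH}" >> ${BASH_ENV}
    source ${BASH_ENV}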

Then, in the step to Display Hardware, Software, Runtime environment and dependency versions, as a best practice we display environment-specific information and record it in the logs for posterity (it is also useful during debugging when things go wrong):

          - run:
              name: Display HW, SW, Runtime env. info and versions of dependencies
              command: ./build/x86_64/linux_macos/lib/displayDependencyVersion.sh

Then, we run the step to set up mx – this is important because mx is a specialised build system created to facilitate compiling and building Graal/GraalVM and its components:

          - run:
              name: Setup MX
              command: ./build/x86_64/linux_macos/lib/setupMX.sh ${BASEDIR}

Then, we run the important Build JDK JVMCI step (we build the JDK with JVMCI enabled here), which times out if the process produces no output for longer than 15 minutes or takes longer than 20 minutes in total to finish:

          - run:
              name: Build JDK JVMCI
              command: ./build/x86_64/linux_macos/lib/build_JDK_JVMCI.sh ${BASEDIR} ${MX}
              timeout: 20m
              no_output_timeout: 15m

Then, we run the step Run JDK JVMCI Tests, which runs tests as part of the sanity check after building the JDK JVMCI:

          - run:
              name: Run JDK JVMCI Tests
              command: ./build/x86_64/linux_macos/lib/run_JDK_JVMCI_Tests.sh ${BASEDIR} ${MX}

Then, we run the step Setting up environment and Build GraalVM Compiler, to set up the build environment with the necessary environment variables which will be used by the steps to follow:

          - run:
              name: Setting up environment and Build GraalVM Compiler
              command: |
                echo ">>>> Currently JAVA_HOME=${JAVA_HOME}"
                JDK8_JVMCI_HOME="$(cd ${BASEDIR}/graal-jvmci-8/ && ${MX} --java-home ${JAVA_HOME} jdkhome)"
                echo "export JVMCI_VERSION_CHECK='ignore'" >> ${BASH_ENV}
                echo "export JAVA_HOME=${JDK8_JVMCI_HOME}" >> ${BASH_ENV}
                source ${BASH_ENV}

Then, we run the Build the GraalVM Compiler and embed it into the JDK (JDK8 with JVMCI enabled) step, which times out if the process produces no output for longer than 7 minutes or takes longer than 10 minutes in total to finish:

          - run:
              name: Build the GraalVM Compiler and embed it into the JDK (JDK8 with JVMCI enabled)
              command: |
                echo ">>>> Using JDK8_JVMCI_HOME as JAVA_HOME (${JAVA_HOME})"
                ./build/x86_64/linux_macos/lib/buildGraalCompiler.sh ${BASEDIR} ${MX} ${BUILD_ARTIFACTS_DIR}
              timeout: 10m
              no_output_timeout: 7m

Then, we run simple sanity checks to verify the validity of the artifacts created once the build has completed, just before archiving the artifacts:

          - run:
              name: Sanity check artifacts
              command: |
                ./build/x86_64/linux_macos/lib/sanityCheckArtifacts.sh ${BASEDIR} ${JDK_GRAAL_FOLDER_NAME}
              timeout: 3m
              no_output_timeout: 2m

Then, we run the Archiving artifacts step (which compresses and copies the final artifacts into a separate folder); it times out if the process produces no output for longer than 2 minutes or takes longer than 3 minutes in total to finish:

          - run:
              name: Archiving artifacts
              command: |
                ./build/x86_64/linux_macos/lib/archivingArtifacts.sh ${BASEDIR} ${MX} ${JDK_GRAAL_FOLDER_NAME} ${BUILD_ARTIFACTS_DIR}
              timeout: 3m
              no_output_timeout: 2m

For posterity and debugging purposes, we capture the generated logs from the various folders and archive them:

          - run:
              name: Collecting and archiving logs (debug and error logs)
              command: |
                ./build/x86_64/linux_macos/lib/archivingLogs.sh ${BASEDIR}
              timeout: 3m
              no_output_timeout: 2m
              when: always
          - store_artifacts:
              name: Uploading logs
              path: logs/

Finally, we store the generated artifacts at a specified location – the below lines will make the location available on the CircleCI interface (we can download the artifacts from here):

          - store_artifacts:
              name: Uploading artifacts in jdk8-with-graal-local
              path: jdk8-with-graal-local/

Approach 2: using a pre-built optimised Docker container

For Approach 2, we will be using a pre-built docker container that has been created and built locally with all the necessary dependencies, saved as a docker image and then pushed to a remote registry, e.g. Docker Hub. We then reference this docker image in the CircleCI environment via the configuration file. This saves us the time and effort of running all the commands to install the necessary dependencies and create the required environment (see the detailed steps in the Approach 1 section).

We expect the build to run for a shorter time compared to the previous approach, a speedup that comes from the pre-built docker image (see the Steps to build the pre-built docker image section for how this is done). An additional speed benefit comes from the fact that CircleCI caches the docker image layers, which in turn results in a quicker startup of the build environment.

We will be going through the CircleCI configuration file section-by-section (stored in .circleci/config.yml) for this approach; see config.yml in .circleci for the full listing and commit e5916f1 for the source changes.

Explaining sections of the config file

Here again, we will be referring to the docker image by its full name. It is a pre-built docker image, neomatrix369/graalvm-suite-jdk8, made available by neomatrix369. It was built and uploaded to Docker Hub in advance, before the CircleCI build was started, and it contains the necessary dependencies for the GraalVM compiler to be built:

        docker:
          - image: neomatrix369/graal-jdk8:${IMAGE_VERSION:-python-2.7}
        steps:
          - checkout

All the sections below do the exact same tasks (and for the same purpose) as in Approach 1, see Explaining sections of the config file section.

Except, we have removed the below sections as they are no longer required for Approach 2:

          - restore_cache:
              keys:
                - os-deps-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
                - os-deps-{{ arch }}-{{ .Branch }}
          - run:
              name: Install Os dependencies
              command: ./build/x86_64/linux_macos/osDependencies.sh
              timeout: 2m
          - save_cache:
              key: os-deps-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
              paths:
                - vendor/apt
                - vendor/apt/archives
          - restore_cache:
              keys:
                - make-382-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
                - make-382-{{ arch }}-{{ .Branch }}
          - run:
              name: Build and install make via script
              command: ./build/x86_64/linux_macos/installMake.sh
              timeout: 1m
          - save_cache:
              key: make-382-{{ arch }}-{{ .Branch }}-{{ .Environment.CIRCLE_SHA1 }}
              paths:
                - /make-3.82/
                - /usr/bin/make
                - /usr/local/bin/make
                - /usr/share/man/man1/make.1.gz

In the following section, I will go through the steps that show how to build the pre-built docker image. This involves running the bash scripts ./build/x86_64/linux_macos/osDependencies.sh and ./build/x86_64/linux_macos/installMake.sh to install the necessary dependencies as part of building the docker image, and finally pushing the image to Docker Hub (it can be pushed to any other remote registry of your choice). A short sketch of what these scripts boil down to follows the steps below.

Steps to build the pre-built docker image

– Run build-docker-image.sh (see bash script source), which depends on the presence of a Dockerfile (see docker script source). The Dockerfile does all the necessary work of installing the dependencies inside the container, i.e. it runs the bash scripts ./build/x86_64/linux_macos/osDependencies.sh and ./build/x86_64/linux_macos/installMake.sh:

    $ ./build-docker-image.sh

– Once the image has been built successfully, run push-graal-docker-image-to-hub.sh after setting the USER_NAME and IMAGE_NAME environment variables (see source code); otherwise it will use the default values set in the bash script:

    $ USER_NAME="[your docker hub username]" IMAGE_NAME="[any image name]" \
        ./push-graal-docker-image-to-hub.sh
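
Under the hood, these scripts boil down to standard docker CLI calls; a minimal sketch of the idea (the real scripts in the repository handle naming, tagging and error checking, and the tag below is an assumption based on the config file):

    # Sketch only – see build-docker-image.sh and push-graal-docker-image-to-hub.sh for the real logic
    docker build -t neomatrix369/graal-jdk8:python-2.7 .   # build the image from the Dockerfile in the current folder
    docker login                                           # authenticate with Docker Hub
    docker push neomatrix369/graal-jdk8:python-2.7         # publish the image to the registry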

CircleCI config file statistics: Approach 1 versus Approach 2

Areas of interest                      | Approach 1                                               | Approach 2
Config file (full source list)         | build-on-circleci                                        | build-using-prebuilt-docker-image
Commit point (sha)                     | df28ee7                                                  | e5916f1
Lines of code (loc)                    | 110 lines                                                | 85 lines
Source lines (sloc)                    | 110 sloc                                                 | 85 sloc
Steps (steps: section)                 | 19                                                       | 15
Performance (see Performance section)  | Some speedup due to caching, but slower than Approach 2  | Speed-up due to the pre-built docker image and caching at different steps; faster than Approach 1

Note: ensure DLC (Docker Layer Caching) is enabled (it’s a paid feature).

What not to do?

Approach 1 issues

I came across things that wouldn’t work initially, but were later fixed with changes to the configuration file or the scripts:

  • please make sure the .circleci/config.yml is always in the root directory of the project
  • when using the store_artifacts directive in the .circleci/config.yml file, set the value to a fixed folder name, i.e. jdk8-with-graal-local/ – in our case, setting the path to ${BASEDIR}/project/jdk8-with-graal didn’t create the resulting artifact once the build was finished, hence the suggestion to use a fixed path name
  • environment variables: when working with environment variables, keep in mind that each command runs in its own shell, so values set on environment variables inside one shell execution environment aren’t visible outside it; follow the method used in the context of this post and set the environment variables such that all the commands can see the required values, to avoid misbehaviours or unexpected results at the end of each step
  • caching: read about the caching functionality before using it; for more details refer to the CircleCI caching docs, and see how it has been implemented in the context of this post. This will help avoid confusion and make better use of the functionality provided by CircleCI

Approach 2 issues

  • Caching: check the docs before trying to use the Docker Layer Caching (DLC) option, as it is a paid feature. Once this is known, doubts about “why CircleCI keeps downloading all the layers during each build” are cleared up; for Docker Layer Caching details refer to the docs. It also explains why, on the free tier, my build is still not as fast as I would like it to be.

General note:

  • Light-weight instances: to avoid the pitfall of thinking we can run heavy-duty builds, check the documentation on the technical specifications of the instances. If we run the standard Linux commands to probe the technical specifications of the instance, we may be misled into thinking that they are high-specification machines. See the step that lists the hardware and software details of the instance (see the Display HW, SW, Runtime env. info and versions of dependencies section). The instances are actually virtual machines or container-like environments with resources of the order of 2 CPUs/4096 MB RAM. This means we can’t run long-running or heavy-duty builds, like building the whole GraalVM suite. Maybe there is another way to handle these kinds of builds, or maybe such builds need to be decomposed into smaller parts.
  • Global environment variables: as each run block in the config.yml runs in its own shell context, environment variables set in one context are not visible to the others. To overcome this, we have adopted two methods:
    • pass the values as parameters to the called bash/shell scripts, so the scripts can access the values they need
    • use the source command in a run step to make environment variables accessible globally

End result and summary

We see the below screen after a build has finished successfully (the last step, i.e. Uploading artifacts, lists where the artifacts have been copied):

The artifacts are now placed in the right folder for download. We are mainly concerned about the jdk8-with-graal.tar.gz artifact.

Performance

Before writing this post, I ran multiple passes of both the approaches and jotted down the time taken to finish the builds, which can be seen below:

Approach 1: standard CircleCI build (caching enabled)
– 13 mins 28 secs
– 13 mins 59 secs
– 14 mins 52 secs
– 10 mins 38 secs
– 10 mins 26 secs
– 10 mins 23 secs
Approach 2: using pre-built docker image (caching enabled, DLC feature unavailable)
– 13 mins 15 secs
– 15 mins 16 secs
– 15 mins 29 secs
– 15 mins 58 secs
– 10 mins 20 secs
– 9 mins 49 secs

Note: Approach 2 should show better performance on a paid tier, as Docker Layer Caching is available as part of that plan.

Sanity check

In order to be sure that by using both the above approaches we have actually built a valid JDK embedded with the GraalVM compiler, we perform the following steps with the created artifact:

– Firstly, download the jdk8-with-graal.tar.gz artifact from under the Artifacts tab on the CircleCI dashboard (needs sign-in):

– Then, extract the .tar.gz file and do the following:

    tar xvf jdk8-with-graal.tar.gz

– Thereafter, run the below command to check the JDK binary is valid:

    cd jdk8-with-graal
    ./bin/java -version

– And finally check if we get the below output:

    openjdk version "1.8.0-internal"
    OpenJDK Runtime Environment (build 1.8.0-internal-jenkins_2017_07_27_20_16-b00)
    OpenJDK 64-Bit Graal:compiler_ab426fd70e30026d6988d512d5afcd3cc29cd565:compiler_ab426fd70e30026d6988d512d5afcd3cc29cd565 (build 25.71-b01-internal-jvmci-0.46, mixed mode)

– Similarly, to confirm if the JRE is valid and has the GraalVM compiler built in, we do this:

    ./bin/jre/java -version

– And check if we get a similar output as above:

    openjdk version "1.8.0-internal"
    OpenJDK Runtime Environment (build 1.8.0-internal-jenkins_2017_07_27_20_16-b00)
    OpenJDK 64-Bit Graal:compiler_ab426fd70e30026d6988d512d5afcd3cc29cd565:compiler_ab426fd70e30026d6988d512d5afcd3cc29cd565 (build 25.71-b01-internal-jvmci-0.46, mixed mode)
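
As an extra, optional check, we can ask the JVM to print its final flag values and look for JVMCI-related entries; a JVMCI-enabled build should list flags such as EnableJVMCI and UseJVMCICompiler (treat the exact flag names as an assumption, as they depend on the JVMCI version):

    ./bin/java -XX:+PrintFlagsFinal -version | grep -i jvmci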

With this, we have successfully built JDK8 with the GraalVM compiler embedded in it and also bundled the Graal and Truffle components in an archive file, both of which are available for download via the CircleCI interface.

Note: you will notice that we also perform sanity checks of the built binaries just before we pack them into compressed archives, as part of the build steps (see the bottom section of the CircleCI configuration files).

Nice badges!

We all like to show off and also like to know the current status of our build jobs. A green build-status icon is a nice indication of success, and it looks like the below on a markdown README page:

We can very easily embed both of these status badges, which display the build status of our project (branch-specific, i.e. master or another branch you have created) built on CircleCI, in a markdown README page (see the docs on how to do that).

Conclusions

We explored two approaches to build the GraalVM compiler in the CircleCI environment. They were good experiments for comparing the performance of the two approaches and for seeing how easily we can set them up. We also saw a number of things to avoid, and how useful some of the CircleCI features are. The documentation and forums serve you well when trying to make a build work or when you get stuck with something.

Once we know the CircleCI environment, it’s pretty easy to use, and it gives us consistent behaviour every time we run it. Its ephemeral nature means we are guaranteed a clean environment before each run and a clean-up after it finishes. We can also set up checks on build time for every step of the build, and abort a build if the time taken to finish a step surpasses a threshold.

The ability to use pre-built docker images coupled with Docker Layer Caching on CircleCI can be a major performance boost (saves us build time needed to reinstall any necessary dependencies at every build). Additional performance speedups are available on CircleCI, with caching of the build steps – this again saves build time by not having to re-run the same steps if they haven’t changed.

There are a lot of useful features available on CircleCI, with plenty of documentation, and everyone on the community forum is helpful; questions are answered pretty much instantly.

Next, let’s build the same and more on another build environment/build farm – hint, hint, are you thinking the same as me? The Adopt OpenJDK build farm? We can give it a try!

Thanks and credits to Ron Powell from CircleCI and Oleg Šelajev from Oracle Labs for proof-reading and giving constructive feedback. 

Please do let me know if this is helpful by dropping a line in the comments below or by tweeting at @theNeomatrix369; I would also welcome feedback (see how you can reach me above). Above all, please check out the links mentioned above.

Useful resources

– Links to useful CircleCI docs
About Getting started | Videos
About Docker
Docker Layer Caching
About Caching
About Debugging via SSH
CircleCI cheatsheet
CircleCI Community (Discussions)
Latest community topics
– CircleCI configuration and supporting files
Approach 1: https://github.com/neomatrix369/awesome-graal/tree/build-on-circleci (config file and other supporting files i.e. scripts, directory layout, etc…)
Approach 2: https://github.com/neomatrix369/awesome-graal/tree/build-on-circleci-using-pre-built-docker-container (config file and other supporting files i.e. scripts, directory layout, etc…)
Scripts to build Graal on Linux, macOS and inside the Docker container
Truffle served in a Holy Graal: Graal and Truffle for polyglot language interpretation on the JVM
Learning to use Wholly GraalVM!
Building Wholly Graal with Truffle!

Building Wholly Graal with Truffle!


Citation: credit for the feature image goes to David Luders; it is reused under a CC license, and the original image can be found on this Flickr page.

Introduction

It has been some time since the two posts [1][2] on Graal/GraalVM/Truffle, and a general request has been: when are you going to write something about “how to build” this awesome thing called Graal? Technically, we will be building a replacement for HotSpot’s C2 compiler (look for C2 in the glossary list), called Graal. This binary is different from the GraalVM suite you download from OTN via http://graalvm.org/downloads.

I wasn’t just going to stop at the first couple of posts on this technology. In fact, one of the best ways to learn and get an in-depth idea about any piece of tech is to know how to build it.

Getting Started

Building JVMCI for JDK8, Graal and Truffle is fairly simple, and the instructions are available on the graal repo. We will be running them in both local (Linux, MacOS) and container (Docker) environments. To capture the process as code, the steps have been written in bash; see https://github.com/neomatrix369/awesome-graal/tree/master/build/x86_64/linux_macos.

During the process of writing the scripts and testing them in various environments, there were some issues, but these were soon resolved with the help of members of the Graal team (thanks, Doug).

Running scripts

Documentation on how to run the scripts is provided in the README.md of awesome-graal. For each of the build environments it is merely a single command:

Linux & MacOS

$ ./local-build.sh

$ RUN_TESTS=false           ./local-build.sh

$ OUTPUT_DIR=/another/path/ ./local-build.sh

Docker

$ ./docker-build.sh

$ DEBUG=true                ./docker-build.sh

$ RUN_TESTS=false           ./docker-build.sh

$ OUTPUT_DIR=/another/path/ ./docker-build.sh

Both the local and docker scripts pass in the environment variables i.e. RUN_TESTS and OUTPUT_DIR to the underlying commands. Debugging the docker container is also possible by setting the DEBUG environment variable.

For a better understanding of how they work, best to refer to the local and docker scripts in the repo.

Build logs

I have provided build logs for the respective environments in the build/x86_64/linux_macos folder in https://github.com/neomatrix369/awesome-graal/

Once the build is completed successfully, the following messages are shown:

[snipped]

>> Creating /path/to/awesome-graal/build/x86_64/linux/jdk8-with-graal from /path/to/awesome-graal/build/x86_64/linux/graal-jvmci-8/jdk1.8.0_144/linux-amd64/product
Copying /path/to/awesome-graal/build/x86_64/linux/graal/compiler/mxbuild/dists/graal.jar to /path/to/awesome-graal/build/x86_64/linux/jdk8-with-graal/jre/lib/jvmci
Copying /path/to/awesome-graal/build/x86_64/linux/graal/compiler/mxbuild/dists/graal-management.jar to /path/to/awesome-graal/build/x86_64/linux/jdk8-with-graal/jre/lib/jvmci
Copying /path/to/awesome-graal/build/x86_64/linux/graal/sdk/mxbuild/dists/graal-sdk.jar to /path/to/awesome-graal/build/x86_64/linux/jdk8-with-graal/jre/lib/boot
Copying /path/to/awesome-graal/build/x86_64/linux/graal/truffle/mxbuild/dists/truffle-api.jar to /path/to/awesome-graal/build/x86_64/linux/jdk8-with-graal/jre/lib/truffle

>>> All good, now pick your JDK from /path/to/awesome-graal/build/x86_64/linux/jdk8-with-graal :-)

Creating Archive and SHA of the newly JDK8 with Graal & Truffle at /home/graal/jdk8-with-graal
Creating Archive jdk8-with-graal.tar.gz
Creating a sha5 hash from jdk8-with-graal.tar.gz
jdk8-with-graal.tar.gz and jdk8-with-graal.tar.gz.sha256sum.txt have been successfully created in the /home/graal/output folder.

Artifacts

All the Graal and Truffle artifacts are created in the graal/compiler/mxbuild/dists/ folder and copied to the newly built jdk8-with-graal folder, both of these will be present in the folder where the build.sh script resides:

jdk8-with-graal/jre/lib/jvmci/graal.jar
jdk8-with-graal/jre/lib/jvmci/graal-management.jar
jdk8-with-graal/jre/lib/boot/graal-sdk.jar
jdk8-with-graal/jre/lib/truffle/truffle-api.jar

In short, we started off with a vanilla JDK8 (JAVA_HOME) and, via the build script, created an enhanced JDK8 with Graal and Truffle embedded in it. At the end of a successful build process, the script creates a .tar.gz archive file in the jdk8-with-graal-local folder; alongside this file you will also find the SHA checksum of the archive (the .sha256sum.txt file).

In the case of a Docker build, the same folder is called jdk8-with-graal-docker and, in addition to the above-mentioned files, it will also contain the build logs.
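
Assuming the .sha256sum.txt file produced by the script follows the standard sha256sum format (an assumption based on its name), the archive can be verified like this:

$ cd jdk8-with-graal-local
$ sha256sum -c jdk8-with-graal.tar.gz.sha256sum.txt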

Running unit tests

Running unit tests is a simple command:

mx --java-home /path/to/jdk8 unittest

This step should follow once we have a successfully built artifact in the jdk8-with-graal-local folder. The below messages indicate a successful run of the unit tests:

>>>> Running unit tests...
Warning: 1049 classes in /home/graal/mx/mxbuild/dists/mx-micro-benchmarks.jar skipped as their class file version is not supported by FindClassesByAnnotatedMethods
Warning: 401 classes in /home/graal/mx/mxbuild/dists/mx-jacoco-report.jar skipped as their class file version is not supported by FindClassesByAnnotatedMethods
WARNING: Unsupported class files listed in /home/graal/graal-jvmci-8/mxbuild/unittest/mx-micro-benchmarks.jar.jdk1.8.excludedclasses
WARNING: Unsupported class files listed in /home/graal/graal-jvmci-8/mxbuild/unittest/mx-jacoco-report.jar.jdk1.8.excludedclasses
MxJUnitCore
JUnit version 4.12
............................................................................................
Time: 5.334

OK (92 tests)

JDK differences

So what have we got that’s different from the JDK we started with? If we compare the boot JDK with the final JDK, here are the differences:

A combination of diff between $JAVA_HOME and jdk8-with-graal, and meld, gives the differences shown in the screenshots above (JDKversusGraalJDKDiff-01 and JDKversusGraalJDKDiff-02); the commands used are:

diff -y --suppress-common-lines $JAVA_HOME jdk8-with-graal | less
meld $JAVA_HOME ./jdk8-with-graal

Note: $JAVA_HOME points to your JDK8 boot JDK.

Build execution time

The build execution time was captured on both Linux and MacOS and there was a small difference between running tests and not running tests:
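
The numbers below have the shape of output from the standard time utility; one way to capture such timings (an assumption about how these figures were gathered) is to prefix the build command with time:

$ time RUN_TESTS=false ./local-build.sh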

Running the build with or without tests on a quad-core, with hyper-threading:

 real 4m4.390s
 user 15m40.900s
 sys 1m20.386s
 ^^^^^^^^^^^^^^^
 user + sys = 17m1.286s (17 minutes 1.286 second)

Similarly, running the build with or without tests on a dual-core MacOS with 4GB RAM and an SSD drive differs little:

 real 9m58.106s
 user 18m54.698s 
 sys 2m31.142s
 ^^^^^^^^^^^^^^^ 
 user + sys = 21m25.84s (21 minutes 25.84 seconds)

Disclaimer: these measurements can certainly vary across the different environments and configurations. If you have a more accurate way to benchmark such running processes, please do share back.

Summary

In this post, we saw how we can build Graal and Truffle for JDK8 on both local and container environments.

The next thing we will do is build them on the build farm provided by Adopt OpenJDK. We will be able to run the builds across multiple platforms and operating systems, including inside docker containers. This binary is different from the GraalVM suite you download from OTN via http://graalvm.org/downloads; hopefully we will be able to cover GraalVM in a future post.

Thanks to Julien Ponge for making his build script available for re-use, and to the Graal team for their support during the writing of this post.

Feel free to share your feedback at @theNeomatrix369. Pull requests with improvements and best-practices are welcome at https://github.com/neomatrix369/awesome-graal.

Containers all the way through…

In this post I will attempt to cover the fundamentals of Bare Metal Systems, Virtual Systems and Container Systems. The purpose for doing so is to learn about these systems as they stand, and also the differences between them, focusing on how they execute programs in their respective environments.

Bare metal systems

Let’s think of our Bare Metal Systems as desktops and laptops we use on a daily basis (or even servers in server rooms and data-centers), and we have the following components:

  • the hardware (outer physical layer)
  • the OS platform (running inside the hardware)
  • the programs running on the OS (as processes)

Programs are stored on the hard drive in the form of executable files (a format understandable by the OS) and loaded into memory via one or more processes. Programs interact with the kernel, which forms a core part of the OS architecture, and with the hardware. The OS coordinates communication between the hardware, i.e. CPU, I/O devices, memory, etc., and the programs.
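
As a quick illustration on a Linux system, we can see the same program both as an executable file on disk and as a running process (a minimal sketch using standard commands):

$ file $(which bash)    # the executable file the OS knows how to load and execute
$ ps -p $$              # the same program running as a process (our current shell)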

 

Bare Metal Systems

A more detailed explanation of what programs or executables are, how programs execute, and where an Operating System comes into play can be found on this Stackoverflow page [2].

Virtual systems

On the other hand, Virtual Systems, with the help of Virtual System controllers like VirtualBox, VMware or a hypervisor [1], run an operating system on top of a bare metal system. These systems emulate bare-metal hardware as software abstraction(s) inside which we run the real OS platform. Such systems, also referred to as Virtual Machines (VMs), can be made up of the following layers:

  • a software abstraction of the hardware (Virtual Machine)
  • the OS platform running inside the software abstraction (guest OS)
  • one or more programs running in the guest OS (processes)

It’s like running a computer (abstracted as software) inside another computer. The rest of the fundamentals from the Bare Metal System apply to this abstraction layer as well. When a process is created inside the Virtual System, the host OS which runs the Virtual System might also spawn one or more processes.

Virtual Systems

Container systems

Now looking at Container Systems we can say the following:

  • they run on top of OS platforms running inside Bare Metal Systems or Virtual Systems
  • containers isolate processes from each other while sharing the kernel between them (such isolation from other processes and resources is possible in some OSes, like Linux, thanks to OS kernel features such as cgroups [3] and namespaces [4] – a quick way to observe these features is shown after the Container Systems diagram below)

A container creates an OS-like environment inside which one or more programs can be executed. Each of these executions could result in one or more processes on the host OS. Container Systems are composed of these layers:

  • hardware (accessible via kernel features)
  • the OS platform (shared kernel)
  • one or more programs running inside the container (as processes)

Container Systems
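
On a Linux host, the kernel features mentioned above can be observed directly; a minimal sketch (assuming a Linux shell) that inspects the namespaces and cgroups of the current process:

$ ls -l /proc/$$/ns     # the namespaces the current shell process belongs to
$ cat /proc/$$/cgroup   # the cgroup hierarchy the current shell process is placed in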

Summary

Looking at these enclosures or rounded rectangles within each other, we can already see how it is containers all the way through.

Bare Metal Systems > Virtual Systems > Container Systems (each nested inside the previous)

There are a number of distinctions between Bare Metal Systems, Virtual Systems and Container Systems. While Virtual Systems encapsulate the Operating System inside a thick hardware virtualisation layer, Container Systems do something similar but with a much thinner virtualisation layer.

There are a number of pros and cons between these systems when we look at them individually, i.e. portability, performance, resource consumption, time to recreate such systems, maintenance, et al.

Word of thanks and stay in touch

Thank you for your time; feel free to send your queries and comments to @theNeomatrix369. Big thanks to my colleague and our DevOps craftsman Robert Firek from Codurance for proof-reading my post and steering me in the right direction.

Resources

Why not build #OpenJDK 9 using #Docker ? – Part 2 of 2

…continuing from Why not build #OpenJDK 9 using #Docker ? – Part 1 of 2.

I ran into a number of issues, and you can see from my commits how I pulled myself out of them. To run this Dockerfile from the command-line, I used this instruction:

$ docker build -t neomatrix369/openjdk9 .

You can also do it using the below if you have not set up your permissions:

$ sudo docker build -t neomatrix369/openjdk9 .

and get the below (summarised) output:

Sending build context to Docker daemon 3.072 kB
Sending build context to Docker daemon 
Step 0 : FROM phusion/baseimage:latest
 ---> 5a14c1498ff4
Step 1 : MAINTAINER Mani Sarkar (from @adoptopenjdk)
 ---> Using cache
 ---> 95e30b7f52b9
Step 2 : RUN apt-get update &&   apt-get install -y     libxt-dev zip pkg-config libX11-dev libxext-dev     libxrender-dev libxtst-dev libasound2-dev libcups2-dev libfreetype6-dev &&   rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 1ea3bbb15c2d
Step 3 : RUN apt-get update
 ---> Using cache
 ---> 6c3938f4d23d
Step 4 : RUN apt-get install -y mercurial ca-certificates-java build-essential
 ---> Using cache
 ---> e3f99b5a3bd3
Step 5 : RUN cd /tmp &&   hg clone http://hg.openjdk.java.net/jdk9/jdk9 openjdk9 &&   cd openjdk9 &&   sh ./get_source.sh
 ---> Using cache
 ---> 26cfaf16b9fa
Step 6 : RUN apt-get install -y wget &&   wget --no-check-certificate --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u45-b14/jdk-8u45-linux-x64.tar.gz
 ---> Using cache
 ---> 696889250fed
Step 7 : RUN tar zxvf jdk-8u45-linux-x64.tar.gz -C /opt
 ---> Using cache
 ---> c25cc9201c1b
Step 8 : RUN cd /tmp/openjdk9 &&   bash ./configure --with-cacerts-file=/etc/ssl/certs/java/cacerts --with-boot-jdk=/opt/jdk1.8.0_45
 ---> Using cache
 ---> 4e425de379e6
Step 9 : RUN cd /tmp/openjdk9 &&   make clean images
 ---> Using cache
 ---> 2d9e17c870be
Step 10 : RUN cd /tmp/openjdk9 &&   cp -a build/linux-x86_64-normal-server-release/images/jdk     /opt/openjdk9
 ---> Using cache
 ---> 9250fac9b500
Step 11 : RUN cd /tmp/openjdk9 &&   find /opt/openjdk9 -type f -exec chmod a+r {} + &&   find /opt/openjdk9 -type d -exec chmod a+rx {} +
 ---> Using cache
 ---> d0c597d045d4
Step 12 : ENV PATH /opt/openjdk9/bin:$PATH
 ---> Using cache
 ---> 3965c3e47855
Step 13 : ENV JAVA_HOME /opt/openjdk9
 ---> Using cache
 ---> 5877e8efd939
Successfully built 5877e8efd939

The above action creates an image which is stored in your local repository (use docker images to list the images in the repo). If you want to load the image into a container and access the files it has built, or look around inside it, do the below:

$ sudo docker run -it --name openjdk9 neomatrix369/openjdk9 /bin/bash

This will take you to a bash prompt inside the container, where you can run any of your Linux commands and access the file system.

Explaining docker run

$ sudo docker run -it --name openjdk9 neomatrix369/openjdk9 java -version

will show you this

openjdk version "1.9.0-internal"
OpenJDK Runtime Environment (build 1.9.0-internal-_2015_06_04_06_46-b00)
OpenJDK 64-Bit Server VM (build 1.9.0-internal-_2015_06_04_06_46-b00, mixed mode)

Here’s a breakdown of the docker run command:

docker run – the command to create and start a new Docker container.
-it – run in interactive mode, so you can see the output after running the container.
neomatrix369/openjdk9 – a reference to the image, by name, which we created above.
java -version – runs the java command asking for its version, inside the container; this is assisted by the two environment variables PATH and JAVA_HOME which were set in the Dockerfile above.

Footnotes

You might have noticed that I grouped very specific instructions with each step, especially the RUN commands. That’s because each time I got one of these wrong, Docker would re-execute the whole step, including the parts that ran fine and didn’t need re-executing. Not only is this unnecessary, it also doesn’t use our resources efficiently, which is one of the things Docker brings us. With granular steps, any addition, edit or deletion to a step will only result in that step being re-executed, and not the other steps that are fine.

So one of the best practices is to keep the steps granular enough, and to pre-load files and data beforehand and give them to docker. It has amazing caching and archiving mechanisms built in.
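
A handy way to see the layers each step produces (and hence what can be cached and reused) is to inspect the image’s history; for example, with the image we built above:

$ sudo docker history neomatrix369/openjdk9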

Save our work

As we know, if we do not save the container’s changes into an image, they are lost.

If I hadn’t used the docker build command shown earlier, I could have, after the build process completed and the image was created, used the below command:

$ sudo docker commit [sha of the image] neomatrix369/openjdk9

Sharing your docker image on Docker hub

Once you are happy with your changes and want to share them with the community at large, do the below:

$ sudo docker push neomatrix369/openjdk9

and you will see output like the below, depending on which of your layers have already been found in the repo and which ones are new (this is an example snapshot of the process):

The push refers to a repository [neomatrix369/openjdk9] (len: 1)
5877e8efd939: Image already exists 
3965c3e47855: Image already exists 
d0c597d045d4: Image already exists 
9250fac9b500: Image already exists 
2d9e17c870be: Buffering to Disk
.
.
.

There is plenty of room for development and improvement of this Docker script. So happy hacking; I would love to hear your feedback, and contributions are welcome.

BIG Thanks

Big thanks to the below two who proof-read my post and added value to it, whilst enjoying the #Software #Craftsmanship developer community (organised and supported by @LSCC):
Oliver Nautsch – @ollispieps (JUG Switzerland)
Amir Bazazi (@Codurance) – @amirbazazi

Special thanks to Roberto Cortez (@radcortez) for your Docker posts, these inspired and helped me write my first Docker post.

Resources

[1] Docker
[2] Get into Docker – A Guide for Total Newbies
[3] Docker for Total Newbies Part 2: Distribute Your Applications with Docker Images
[4] Docker posts on Voxxed
[5] OpenJDK
[6] Building OpenJDK
[7] Building OpenJDK on Linux, MacOs and Windows
[8] Virtual Machines (OpenJDK)
[9] Build your own OpenJDK
[10] Vagrant script (OpenJDK)
[11] YOUR DOCKER IMAGE MIGHT BE BROKEN without you knowing it
[12] Dockerfile on github
[13] Adopt OpenJDK: Getting Started Kit
[14] London Java Community

Why not build #OpenJDK 9 using #Docker ? – Part 1 of 2

Introduction

I think I have joined the Docker [1] party a bit late, but that means by now everyone knows what Docker is and all the other basic fundamentals, which I can very well skip. If you are still interested, please check these posts: Get into Docker – A Guide for Total Newbies [2] and Docker for Total Newbies Part 2: Distribute Your Applications with Docker Images [3]. And if you still want to know more about this widely spoken-about topic, check out these Docker posts on Voxxed [4].

Why ?

Since everyone has been doing some sort of provisioning or spinning up of dev, pre-prod or test environments using Docker [1], I decided to do the same, but with my favourite project, i.e. OpenJDK [5].

So far you can natively build OpenJDK [6] across Linux, MacOS and Windows [7], or do the same via virtual machines or vagrant instances; see more on them via these resources: Virtual Machines [8], Build your own OpenJDK [9] and this vagrant script [10]. All of this is part of the Adopt OpenJDK initiative led by the London Java Community [14] and supported by JUGs all over the world.

Requirements

Most of this post is for those using Linux distributions (this one was created on Ubuntu 14.04). Linux, MacOS and Windows users please refer to Docker’s Linux, MacOS and Windows instructions respectively.

Hints: MacOS and Windows users will need to install Boot2Docker and remember to run the below commands (and check your Docker host environment variables):

$ boot2docker init
$ boot2docker up 
$ boot2docker shellinit 

For MacOS, if the above commands throw FATA[…] error messages, please try the below:

$ sudo boot2docker init
$ sudo boot2docker up 
$ sudo boot2docker shellinit 

For the rest of the details please refer to the links provided above. Once you have the above in place for the Windows or MacOS platform, merely executing the docker build and docker run commands against the Dockerfile lets you create/update a container and run it, respectively.

*** Please refer to the above links and ensure Docker works for you on the above platforms – try out tutorials or steps proving that Docker runs as expected before proceeding further. ***

Building OpenJDK 9 using Docker

Now I will show you how to do the same things as mentioned above using Docker.

I read the first two resources I shared above (and wrote the last ones). So let’s get started: I’m going to walk you through what the Dockerfile looks like, taking you through each section of the Dockerfile code.

*** Please note the steps below are not meant to be executed at your command prompt; they form an integral part of the Dockerfile, which you can download from here at the end of this post. ***
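
The only commands meant for your prompt are the docker build and docker run invocations themselves, which look like the below (they are covered in detail in Part 2 of 2):

$ sudo docker build -t neomatrix369/openjdk9 .
$ sudo docker run -it --name openjdk9 neomatrix369/openjdk9 java -version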

You may have noticed that, unlike everyone else, I have chosen a different OS image, i.e. phusion/baseimage – why? Read YOUR DOCKER IMAGE MIGHT BE BROKEN without you knowing it [11] to learn more about it.

FROM phusion/baseimage:latest

Each of the RUN steps below, when executed, becomes an isolated Docker layer and gets assigned a SHA, e.g. 95e30b7f52b9.

RUN \
  apt-get update && \
  apt-get install -y \
    libxt-dev zip pkg-config libX11-dev libxext-dev \
    libxrender-dev libxtst-dev libasound2-dev libcups2-dev libfreetype6-dev && \
  rm -rf /var/lib/apt/lists/*

The base image is updated and a number of dependencies are installed, i.e. Mercurial (hg) and build-essential:

RUN \
  apt-get update && \
  apt-get install -y mercurial ca-certificates-java build-essential

Clone the OpenJDK 9 sources and download the latest sources from mercurial. You will notice that each of these steps is prefixed by cd /tmp &&; this is because each instruction runs in its own layer, as if it does not remember where it was when the previous instruction ran. Nothing to worry about – all your changes are still intact in the container.

RUN \
  cd /tmp && \
  hg clone http://hg.openjdk.java.net/jdk9/jdk9 openjdk9 && \
  cd openjdk9 && \
  sh ./get_source.sh

Install only what you need when you need it: see below, where I download wget and then the jdk binary. I also learnt how to use wget by passing the necessary params and headers to make the server give us the binary we request. Finally, un-tar the file using the famous tar command.

RUN \
  apt-get install -y wget && \
  wget --no-check-certificate --header "Cookie: oraclelicense=accept-securebackup-cookie" \ 
http://download.oracle.com/otn-pub/java/jdk/8u45-b14/jdk-8u45-linux-x64.tar.gz

RUN \
  tar zxvf jdk-8u45-linux-x64.tar.gz -C /opt

Run configure with the famous --with-boot-jdk=/opt/jdk1.8.0_45 to point the bootstrap JDK to jdk1.8.0_45.

RUN \
  cd /tmp/openjdk9 && \
  bash ./configure --with-cacerts-file=/etc/ssl/certs/java/cacerts --with-boot-jdk=/opt/jdk1.8.0_45

Now run the most important command:

RUN \  
  cd /tmp/openjdk9 && \
  make clean images

Once the build is successful, the artefacts i.e. jdk and jre images are created in the build folder.

RUN \  
  cd /tmp/openjdk9 && \
  cp -a build/linux-x86_64-normal-server-release/images/jdk \
    /opt/openjdk9

Below are some chmod ceremonies across the files and directories in the openjdk9 folder.

RUN \  
  cd /tmp/openjdk9 && \
  find /opt/openjdk9 -type f -exec chmod a+r {} + && \
  find /opt/openjdk9 -type d -exec chmod a+rx {} +

Two environment variables, i.e. PATH and JAVA_HOME, are created with the respective values assigned to them.

ENV PATH /opt/openjdk9/bin:$PATH
ENV JAVA_HOME /opt/openjdk9

You can find the entire source for the entire Dockerfile on github [12].

…more of this in the next post, Why not build #OpenJDK 9 using #Docker ? – Part 2 of 2, where we will use the docker build and docker run commands and some more docker stuff.