How to harness the Powers of Cloud TPUs?

Introduction

About a couple of months ago, I got introduced to Google Colab (an enhanced version of JuPyteR notebooks) and since then didn’t look back. It’s got everything and more you need as a researcher. The initial idea was to load up python notebooks and run them on it. Soon we realised we can actually run those notebooks not just on CPUs available on GCP but also GPUs and TPUs. Also, read up a bit about it from other sources, see Google Reveals Technical Specs and Business Rationale for TPU Processor (slightly dated, but definitely helpful).

Here we go now…

What is a TPU?

Just like a GPU is a graphics accelerator ASIC (to help create graphics images quickly for output to a display device) – which since long have been discovered that can be taken advantage of by using it for massive number crunching. Similarly, a TPU is an AI accelerator ASIC developed by Google specifically for Neural Network Machine Learning. One of the differences being TPUs are more about processing high-volume low-precision computation while GPUs for high-volume high-precision computation (please check the Wikipedia links for the interesting differences).

Then what?

TL;DR — how the notebooks and slides came about

To understand how these devices work, what better approach can we adopt than to benchmark them and then compare the results. Which is why we had a number of notebooks that came out of these experiments. And then we were also coincidentally preparing for Google Cloud Next 2018 at the Excel Center, London. And I spent an evening and a weekend preparing the slides for the talk Yaz and I were asked to give – Harnessing the Powers of Cloud TPU.

What happened first, before the other thing happened?

TL;DR — how we work at the GDG Cloud meetup events

It was a mere coincidence while we were meeting regularly at the GDG Cloud meetups in the London chapter, where Yaz would find interesting things to look at during the hack sessions (Pomodoro sessions – as he called them), suggested one session that we play with TPUs and benchmark them. Actually, I remember suggesting we do this during our sessions as an idea, but then we all got distracted with other equally interesting ideas (all saved somewhere on GitLab). And then I frantically started playing with two notebooks related to GPU and TPU respectively, provided as examples by Colab, which I then got to work on the TPU and then adapted it to work on the GPU (can’t remember anymore which way first). They both were doing slightly different things and measuring the performance of the GPU and TPU and I decided to make them do the same thing and measure the time taken to do it on the different device. Also, display details about the devices themselves (you will see towards the top or bottom of each notebook).

CPU v/s GPU v/s TPU –  Simple benchmarking example via Google Colab

CPU v/s GPU – Simple benchmarking

The CPU v/s GPU – Simple benchmarking notebook finish processing with the below output:

TFLOP is a bit of shorthand for “teraflop”, which is a way of 
measuring the power of a computer based more on mathematical 
capability than GHz. A teraflop refers to the capability of a 
processor to calculate one trillion floating-point operations
per second.

CPU TFlops: 0.53 
GPU speedup over CPU: 29x                    TFlops: 15.70

I was curious about the internals of the CPU and GPU so I ran some Linux commands (via Bash) that the notebooks allow (thankfully) and got these bits of info to share:

GPU-simple-benchmark-conclusion

You can find all those commands and the above output in the notebook as well.

CPU v/s TPU – Simple benchmarking

The TPU – Simple benchmarking notebook finish processing with the below output:

TFLOP is a bit of shorthand for “teraflop” which is a way of 
measuring the power of a computer based more on mathematical 
capability than GHz. A teraflop refers to the capability of a 
processor to calculate one trillion floating-point operations
per second.

CPU TFlops: 0.47
TPU speedup over CPU (cold-start): 75x        TFlops: 35.47 
TPU speedup over CPU (after warm-up): 338x    TFlops: 158.91

Unfortunately, I haven’t had a chance to play around with the TPU profiler yet to learn more about the internals of this fantastic device.

While there is room for errors and inaccuracies in the above figures, you might be curious about the tasks used for all the runs – it’s the below piece of code that has been making the CPU, GPU and TPU circuitry for the Simple benchmarking notebooks:

def cpu_flops():
  x = tf.random_uniform([N, N])
  y = tf.random_uniform([N, N])
  def _matmul(x, y):
    return tf.tensordot(x, y, axes=[[1], [0]]), y

  return tf.reduce_sum(
    tf.contrib.tpu.repeat(COUNT, _matmul, [x, y])
  )

CPU v/s GPU v/s TPU – Time-series prediction via Google Colab

Running the TPU version of the Timeseries notebook gave us some issues initially, which was reported on a StackOverflow post and a couple of good folks from the Google Cloud TPU team stepped in to help. But we managed to get the GPU version of the Time-Series Prediction notebook to work which clearly showed a much better response than the CPU version of Time-Series Prediction notebook – this version just choked half-way through the CPU-cycles and Colab asked me if I wanted to stop the process because we needed more resources (more memory)!!!

Time series: GPU version

Here snapshots of the notebook (Train the Recurrent Neural Network section), the full notebook can be found on Google Colab, it’s free to download, share and extract the python code in it.

Epoch 1/10
 9/10 [==========================>...] - ETA: 4s - loss: 0.0047WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_loss` which is not available. Available metrics are: loss,lr
10/10 [==============================] - 42s 4s/step - loss: 0.0048
Epoch 2/10
 9/10 [==========================>...] - ETA: 4s - loss: 0.0041WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_loss` which is not available. Available metrics are: loss,lr
10/10 [==============================] - 42s 4s/step - loss: 0.0041
Epoch 3/10
 9/10 [==========================>...] - ETA: 4s - loss: 0.0047WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_loss` which is not available. Available metrics are: loss,lr
10/10 [==============================] - 42s 4s/step - loss: 0.0046
Epoch 4/10
 9/10 [==========================>...] - ETA: 4s - loss: 0.0039WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_loss` which is not available. Available metrics are: loss,lr
10/10 [==============================] - 43s 4s/step - loss: 0.0039
Epoch 5/10
 9/10 [==========================>...] - ETA: 4s - loss: 0.0048WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_loss` which is not available. Available metrics are: loss,lr
10/10 [==============================] - 44s 4s/step - loss: 0.0048
Epoch 6/10
 9/10 [==========================>...] - ETA: 4s - loss: 0.0036WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_loss` which is not available. Available metrics are: loss,lr
10/10 [==============================] - 44s 4s/step - loss: 0.0037
Epoch 7/10
 9/10 [==========================>...] - ETA: 4s - loss: 0.0042WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_loss` which is not available. Available metrics are: loss,lr
10/10 [==============================] - 44s 4s/step - loss: 0.0041
Epoch 8/10
 9/10 [==========================>...] - ETA: 4s - loss: 0.0037WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_loss` which is not available. Available metrics are: loss,lr
10/10 [==============================] - 44s 4s/step - loss: 0.0038
Epoch 9/10
 9/10 [==========================>...] - ETA: 4s - loss: 0.0040WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_loss` which is not available. Available metrics are: loss,lr
10/10 [==============================] - 44s 4s/step - loss: 0.0039
Epoch 10/10
 9/10 [==========================>...] - ETA: 4s - loss: 0.0036WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
WARNING:tensorflow:Reduce LR on plateau conditioned on metric `val_loss` which is not available. Available metrics are: loss,lr
10/10 [==============================] - 44s 4s/step - loss: 0.0035
CPU times: user 10min 8s, sys: 1min 22s, total: 11min 30s
Wall time: 7min 12s

The above run takes about ~7 mins (or total time of ~8 mins) on the Google Colab GPU:

CPU times: user 10min 8s, sys: 1min 22s, total: 11min 30s

Wall time: 7min 12s

I’m still unsure on how to interpret this time-related stats but I will take ~7 mins as our execution time till this point.

We finish with the following stats:

Screen Shot 2018-11-27 at 19.52.09

So now you can see why I earlier chose ~7 mins as the execution time. So it takes about 7 minutes to process this notebook – giving new predictions of temperature, pressure and wind speed and comparing it with the actual values (true values gather from post observations).

Time series: TPU version

Here snapshots of the notebook (Train the Recurrent Neural Network section), the full notebook can be found on Google Colab, it’s free to download, share and extract the python code in it.

Found TPU at: grpc://10.118.17.162:8470
INFO:tensorflow:Querying Tensorflow master (b'grpc://10.118.17.162:8470') for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 11845881175500857789)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 5923571607183194652)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 11085218230396215841)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 12636361223481337501)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 14151025931657390984)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 16816909163217742616)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 4327750408753767066)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 504271688162314774)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 14356678784461051119)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 6767339384180187426)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 1879489006510593388)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 17850015066511710434)
WARNING:tensorflow:tpu_model (from tensorflow.contrib.tpu.python.tpu.keras_support) is experimental and may change or be removed at any time, and without warning.
Epoch 1/10
INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(32,), dtype=tf.int32, name='core_id0'), TensorSpec(shape=(32, 1344, 20), dtype=tf.float32, name='input_10'), TensorSpec(shape=(32, 1344, 3), dtype=tf.float32, name='Dense-2_target_30')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for input
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 3.394456386566162 secs
INFO:tensorflow:Setting weights on TPU model.
 9/10 [==========================>...] - ETA: 1s - loss: 0.0112WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
10/10 [==============================] - 14s 1s/step - loss: 0.0115
Epoch 2/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.0183WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
10/10 [==============================] - 5s 501ms/step - loss: 0.0187
Epoch 3/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.0260WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
10/10 [==============================] - 5s 497ms/step - loss: 0.0264
Epoch 4/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.0324WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
10/10 [==============================] - 5s 496ms/step - loss: 0.0327
Epoch 5/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.0374WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
10/10 [==============================] - 5s 470ms/step - loss: 0.0376
Epoch 6/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.0378WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
10/10 [==============================] - 5s 471ms/step - loss: 0.0372
Epoch 7/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.0220WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
10/10 [==============================] - 5s 489ms/step - loss: 0.0211
Epoch 8/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.0084WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
10/10 [==============================] - 5s 493ms/step - loss: 0.0081
Epoch 9/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.0044WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
10/10 [==============================] - 5s 495ms/step - loss: 0.0044
Epoch 10/10
 9/10 [==========================>...] - ETA: 0s - loss: 0.0040WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are: loss
WARNING:tensorflow:Can save best model only with val_loss available, skipping.
10/10 [==============================] - 5s 491ms/step - loss: 0.0040
CPU times: user 10.8 s, sys: 2.7 s, total: 13.5 s
Wall time: 1min 6s

There are a few of things to still work on the notebook, one of them being getting rid of the warnings appearing during training on the TPU. As per our previous analysis, running on the TPUs should be way faster than GPUs, correction: TPUs are faster than GPUs even when running the Timeseries TPU version (I aligned the two notebooks i.e. GPU and TPU versions of the Timeseries notebooks and re-ran the two experiments). We haven’t been successful in being able to execute code cells till the end of the notebook due to errors in input shape which needs fixing and re-running the notebook. All of these look like good learning opportunities for me and everyone else. Our new results are more promising as you can see from above. And the whole notebook just took ~2 minutes to finish running, that’s so many times speed up over the GPU version (from the below).

Screen Shot 2018-11-27 at 19.44.15

Observations

We can now see the final outcome of the TPU version of the Timeseries notebook, the initial Simple Benchmarking related examples did help us make the below observations:

TPUs are ~85x to ~312x faster than CPUs, and GPUs are ~30x faster than CPUs

which also means that

TPUs are ~3x to ~10x faster than GPUs, which in turn are ~30x faster than CPUs

Some graphs plotting the speeds in teraflops and times between CPU v/s GPU v/s TPU:

CPU-GPU-TPU-Teraflopschart

CPU-GPU-TPU-Speed-times

Note: that during some runs on GCP, the above numbers showed higher than noted here so please beware of that as well. Maybe the notebooks took advantage of some improvements in the GCP infrastructure.

The different devices ran this simple task (block of code mentioned in the previous section) at different speeds, for a more complex task, the numbers would definitely be different, although we believe that their relative performances shouldn’t deviate too much.

Conclusion

Also, want to thank Yaz for being super-encouraging at all times during this whole process including for the presentation at Google Cloud Next 2018. Not forgetting Claudio who contributed quite a bit to the TPU version of the Timeseries notebook, whilst we have been debugging it.

In the end, it was all great and we will solve all the problems of humanity but for now, we have pretty much finished working on the TPU version of the Timeseries notebook, I welcome everyone to take a jab at it and see if you want to improve it further or experiment with it. Please hit back with your feedback and/or contributions in any case.

(Third generation TPU at the Google Data centre: TPU 3.0)

Have a read of what others (i.e. Jeff Hale’s article) are talking about the different PUs on various cloud providers and you can see GCPCloud leading in many such areas.

Be ready for some more notebook fests coming up in the form of more blog posts in the near to distant future. Please share your comments, feedback or any contributions to @theNeomatrix369, you can find more about me via the About me page.

Resources

Citations

Credits to all the images embedded (including the feature image) on this post go to the respective authors/ creators/owners of the images.

 

Digital Catapult | Machine Intelligence Garage: the best-kept secret yet in the open

Introduction

I was at a meetup when Simon Knowles, CTO of Graphcore was giving his talk on the latest development at Graphcore and that is also where I met with Peter Bloomfield from Digital Catapult(@digicatapult). Peter was spreading the word about Machine Intelligence Garage which is an amazing opportunity created by a collaboration between the government and industry leaders like Google, Nvidia, AWS, etcetera to help startups and small businesses access compute resources which they would have otherwise never been able to get hold of.

Digital Catapult - Get Involved

Our conversation

Sometime later, we decided to have a chat discussing the usual questions like what is Machine Intelligence Garage, what is the history, why, how, who, when, and at what stage is the programme at and how do people get involved? As you would imagine, I found our conversation interesting and informative and hence decided to write about it and share it with the rest of us.

Mani: Hey, Peter great meeting you and learning about the initiative from Digital Catapult, can you please share with me the history about this initiative?

Peter: Hi Mani, thanks for dropping in! Our programme, Machine Intelligence Garage was born from a piece of research we conducted last summer. In this report, we explore barriers facing AI startups, and we wanted to test the hypotheses that: Access to the right data, technical talent and adequate computational resources were the main things holding startups back. We collaborate a bit on the first two barriers with government and academic institutes, like the Turing, but the Machine Intelligence Garage programme was designed to provide startups with access to cloud computing vouchers, novel chipsets and HPC facilities. It is through our brilliant partners that we can offer these resources.

Mani: Are there other sister and daughter initiatives or programs, related to Digital Catapult, that everyone would benefit knowing about?

Peter: We have a whole range of digital programmes, across three core tech layers (future networks, AI/ML and Immersive tech as well as some cybersecurity and blockchain initiatives). The full details of our opportunities can be found through the ‘Get Involved’ section of our website.

Mani: Who have been and are going to benefit from this setup that you have in place?

Peter: Our Machine Intelligence Garage programme is designed to benefit early-stage startups who are data ready and need compute power to scale faster. Our collaborative programmes provide opportunities for larger corporations to get involved with startups to address industry-specific challenges.

Mani: How much does this access cost and for how long are they available for access?

Peter: Everything we provide on the Machine Intelligence Garage programme is free to use. The programme was set up using public funding from InnovateUK and CAP.AI. We are able to deliver the compute resources through our work with a wonderful set of partners.

Mani: Can you tell us the process from start to finish?

Peter: When I meet a new company, I have a chat with them about the things they are trying to do, the infrastructure they currently use and the sorts of ML approaches they are using to solve the problem. If the company needs our support and are developing a product with commercial viability and have both strong technical skills and domain expertise I encourage them to apply. The application form asks questions about the product, training data, compute power requirements. If we like the idea, we invite a company to an interview and if successful onboard them with the most suitable resource. The process generally takes 3 weeks from the close of the application call to onboarding with a resource.

gathered-around-table-classic

Mani: What other benefits do startups get from being involved? Are you able to introduce them to partners who can help them run trials and give feedback on the products or services they are building?

Peter: We have a large network that we encourage all our companies to take advantage of. Digital Catapult is an innovation centre and meeting the right people at the right time is key to success for many startups. We run a range of workshops, from business growth and pitch training to deep dives to learn more about technical resources and we make sure our startups benefit from all of these. If a startup wants to put some of the new knowledge into practice we are always very keen to facilitate it!

Mani: This is a lot of information, are there any resources on your website that can help. Anything to sign up to, to keep in touch?

Peter: We have a general technology Digital Catapult newsletter (sign-up form at the bottom of the page) and an AI specific Machine Intelligence Garage newsletter (sign-up form at the bottom of the page). If a startup wants to chat about the programme, they can send me an e-mail: peter.bloomfield@digicatapult.org.uk. We announce all our calls and opportunity through our twitter account too @DigiCatapult.

Mani: Can you please touch on the specifics of what the startups will get access to and how it can benefit them?

Peter: We have three main resources available:

  • Cloud Computing vouchers, either through AWS or Google Cloud Platform

To find out the exact amounts and specifics of access, please do get in touch.

Mani: If someone needed to find out about the benchmarks between different compute resources available to the participants, who or where would they look for the information? Do you have a team that does these measurements on a daily basis?

Peter: We do our own benchmarking of the facilities available and our data engineer on the programme can advise on this, as well as point you in the right direction for more literature!

immersive-lab-entrance

Mani: I have been to the Digital Catapult HQ at 101 Euston Road, and was blown away with all the tech activities happening there, can you please share details about it with our readers

Peter: Digital Catapult was set up four years ago and is the UK’s leading innovation centre for advanced digital technologies.  We have seen a number of changes over the years but our core values of opening up markets and making businesses more competitive and productive remain. We are incredibly lucky to have some amazing facilities to help companies develop new products and services and get their products to market faster, including a nationwide network of Immersive Labs [see launch photos, photos in 2018], an LPWAN network and the new 5G Brighton Testbed.

Mani: Can you name a few startups that are currently going through your programs and the ones who have already been through it?

Peter: We currently have 25 start-ups on our programme. The reams are a range of sizes, some just 2 people, others 20+. The thing they all have in common is that they are developing some really exciting commercial products/solutions with deep learning and have an immediate need for the computational resources we offer. The full list of start-ups can be found on our cohort page on the Machine Intelligence Garage website.

Mani: I really appreciate the time you have taken to answer my questions and this has definitely helped the readers know more about what you do and how they can benefit from this great government-driven initiative.

Peter: Thank you very much for coming in. It’s great to be able to reach a wider audience and grow our community! See you soon!

15307393344_f6881df22a_k

Closing note

I was shown around a number of facilities at their centre (two floors) i.e. the Immersive lab (yes plenty of VR headsets to play with), the server area where all the HPC hardware is kept (at low room temperature), a spacious conference room where meetups are held, a small library full of interesting books and also a hot-desking area shared by both internal staff, partners and friends of Digital Catapult/Machine Intelligence Garage. Looking at the two websites I found the news and views, events and workshops and Digital Catapult | MI Garage blog sites interesting to keep track of activities in this space.

immersive-lab-man-headgear

I’m sure after reading about the conversation, you must be wondering how you could take advantage of these facilities out there meant for you and ones in your network who could benefit from it.

Readers should go to the links mentioned above to learn about this program and how they can go about taking advantage of it, or recommend it to their friends in the community who would be more suitable for it.

Please do let me know if this is helpful by dropping a line in the comments below, and I would also welcome feedback, see how you can reach me, above all please check out to the links mentioned above and also reach out to the folks behind this great initiative.

 

Containers all the way through…

In this post I will attempt to cover fundamentals of Bare Metal Systems, Virtual Systems and Container Systems. And the purpose for doing so is to learn about these systems as they stand and also the differences between them, focusing on how they execute programs in their respective environments.

Bare metal systems

Let’s think of our Bare Metal Systems as desktops and laptops we use on a daily basis (or even servers in server rooms and data-centers), and we have the following components:

  • the hardware (outer physical layer)
  • the OS platform (running inside the hardware)
  • the programs running on the OS (as processes)

Programs are stored on the hard drive in the form of executable files (a format understandable by the OS) and loaded into memory via one or more processes. Programs interact with the kernel, which forms a core part of the OS architecture and the hardware. The OS coordinate communication between hardware i.e. CPU, I/O devices, Memory, etc… and the programs.

 

Bare Metal Systems

A more detailed explanation of what programs or executables are, how programs execute and where an Operating System come into play, can be found on this Stackoverflow page [2].

Virtual systems

On the other hand Virtual Systems, with the help of Virtual System controllers like, Virtual Box or VMWare or a hypervisor [1] run an operating system on a bare metal system. These systems emulate bare-metal hardware as software abstraction(s) inside which we run the real OS platform. Such systems can be made up of the following layers, and also referred to as a Virtual Machines (VM):

  • a software abstraction of the hardware (Virtual Machine)
  • the OS platform running inside the software abstraction (guest OS)
  • one or more programs running in the guest OS (processes)

It’s like running a computer (abstracted as software) inside another computer. And the rest of the fundamentals from the Bare Metal System applies to this abstraction layer as well. When a process is created inside the Virtual System, then the host OS which runs the Virtual System might also be spawning one or more processes.

Virtual Systems

Container systems

Now looking at Container Systems we can say the following:

  • they run on top of OS platforms running inside Bare Metal Systems or Virtual Systems
  • containers which allow isolating processes and sharing the kernel between each other (such isolation from other processes and resources are possible in some OSes like say Linux, due to OS kernel features like cgroups[3] and namespaces)[4]

A container creates an OS like environment, inside which one or more programs can be executed. Each of these executions could result in a one or more processes on the host OS. Container Systems are composed of these layers:

  • hardware (accessible via kernel features)
  • the OS platform (shared kernel)
  • one or more programs running inside the container (as processes)

Container Systems

Summary

Looking at these enclosures or rounded rectangles within each other, we can already see how it is containers all the way through.

Bare Metal Systems
Virtual SystemsContainer Systems

There is an increasing number of distinctions between Bare Metal Systems, Virtual Systems and Container Systems. While Virtual Systems encapsulate the Operating System inside a thick hardware virtualisation, Container Systems do something similar but with a much thinner virtualisation layer.

There are a number of pros and cons between these systems when we look at them individually, i.e. portability, performance, resource consumption, time to recreate such systems, maintenance, et al.

Word of thanks and stay in touch

Thank you for your time, feel free to send your queries and comments to @theNeomatrix369. Big thanks to my colleague, and  our DevOps craftsman  Robert Firek from Codurance for proof-reading my post and steering me in the right direction.

Resources

(Part 3 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al

This is a continuation of the previous post titled (Part 2 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al.

In our first review, The Atlassian guide to GC tuning is an extensive post covering the methodology and things to keep in mind when tuning GC, practical examples are given and references to important resources are also made in the process. The next one How NOT to measure latency by Gil Tene, he discusses some common pitfalls encountered in measuring and characterizing latency, demonstrating and discussing some false assumptions and measurement techniques that lead to dramatically incorrect reporting results, and covers simple ways to sanity check and correct these situations.  Finally Kirk Pepperdine in his post Poorly chosen Java HotSpot Garbage Collection Flags and how to fix them! throws light on some JVM flags – he starts with some 700 flags and boils it down to merely 7 flags. Also cautions you to not just draw conclusions or to take action in a whim but consult and examine – i.e. measure don’t guess!

….read more (reblogged from the Java Advent Calendar)

(Part 2 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al

This is a continuation of the previous post titled (Part 1 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al.

Without any further ado, lets get started with our next set of blogs and videos, chop…chop…! This time its Martin Thompson’s blog posts and talks. Martin’s first post on Java Garbage collection distilled basically distils the GC process and the underlying components including throwing light on a number of interesting GC flags (-XX:…). In his next talk he does his myth busting shaabang about mechanical sympathy, what people correctly believe in and the misconceptions. In the talk on performance testing, Martin takes its further and fuses Java, OS and the hardware to show how understanding of all these aspects can help write better programs.


Java Garbage Collection Distilled by Martin Thompson

There are too many flags to allow tuning the GC to achieve the throughput and latency your application requires. There’s plenty of documentation on the specifics of the bells and whistles around them but none to guide you through them.

(Part 1 of 3): Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al

I have been contemplating for a number of months about reviewing a cache of articles and videos on topics like Performance tuning, JVM, GC in Java, Mechanical Sympathy, etc… and finally took the time to do it – may be this was the point in my intellectual progress when was I required to do such a thing!

Thanks to Attila-Mihaly for giving me the opportunity to write a post for his yearly newsletter Java Advent Calendar, hence a review on various Java related topics fits the bill! The selection of videos and articles are purely random, and based on the order in which they came to my knowledge. My hidden agenda is to mainly go through them to understand and broaden my own knowledge at the same time share any insight with others along the way….read more (reblogged from the Java Advent Calendar)