Learning to use Wholly GraalVM!

I'm still learning by michelangelo

Citation: credits to the feature image goes to Anne Davis  and reused under a CC license, the original image can be found on this Flickr page.

Introduction

In the post Truffle served in a Holy Graal: Graal and Truffle for polyglot language interpretation on the JVM, we got a brief introduction and a bit of deep dive into Graal, Truffle and some of the concepts around it. But no technology is fun without diving deep into its practicality, otherwise its like Theoretical Physics or Pure Maths — abstract for some, boring for others (sorry the last part was just me ranting).

In this post we will be taking a look into the GraalVM, by installing it, comparing SDK differences and looking at a some of the examples that illustrate how different languages can be compiled and run on the GraalVM, and also how they can be run in the same context and finally natively (more performant).

GraalVM is similar to any Java SDK (JDK) that we download from any vendor, except that it has JVMCI: Java-level JVM Compiler Interface support and Graal is the default JIT compiler. It can, not just execute Java code but also languages like JS, Ruby, Python and R. It can also enable building ahead-of-time (AOT) compiled executable (native images) or share library for Java programs and other supported languages. Although we won’t be going through every language but only a selected few of them.

Just to let you know, that all of the commands and actions have been performed on a Ubuntu 16.04 operating system environment (should work on the MacOSX with minor adaptations, on Windows a bit more changes would be required – happy to receive feedback with the differences, will update post with them).

Practical hands-on

We can get our hands on the GraalVM in more than one way, either build it on our own or download a pre-built version from a vendor website:

  • build on our own: some cloning and other magic (we can see later on)
  • download a ready-made JVM: OTN download site
  • hook up a custom JIT to an existing JDK with JVMCI support (we can see later on)

As we are using a Linux environment, we it would be best to download the linux (preview) version of GraalVM based on JDK8 (> 500MB file, need to Accept the license, need to be signed in on OTN or you will be taken to https://login.oracle.com/mysso/signon.jsp) and install it.

Follow the installation information on the download page after unpacking the archive, you will find a folder by the name graalvm-0.30 (at the time of the writing of this post), after executing the below command:

$ tar -xvzf graalvm-0.30-linux-amd64-jdk8.tar.gz

Eagle eyeing: compare SDKs

We will quickly check the contents of the SDK to gain familiarity, so let’s check the contents of the GraalVM SDK folder:

$ cd graalvm-0.30
$ ls

GraamVM 0.30 SDK folder contents

which looks familiar, and has similarities, when compared with the traditional Java SDK folder (i.e. JDK 1.8.0_44):

$ cd /usr/lib/jdk1.8.0_44
$ ls

JDK 1.8.0_44-folder-contents

Except we have quite a few additional artifacts to learn about, i.e. the launchers on the VM for the supported languages, like FastR, JS (GraalJS), NodeJS (GraalNodeJS), Python, Ruby and Sulong (C/C++, Fortran).

Comparing the bin  folder between the GraalVM SDK and say JDK 1.8.0_44 SDK, we can see that we have a handful of additional files in there:

graal-0.30-bin-folder

(use tools like meld or just diff to compare directories)

Similarly we can see that the jre folder has interesting differences, although semantically similar to the traditional Java SDKs. A few items that look interesting in the list are Rscript, lli and ployglot.

Now we haven’t literally compared the two SDKs to mark elements that are different or missing in one or the other, but the above gives us an idea about what is offered with the pre how to use the features it provides – well this SDK has them baked into it the examples folder.

$ tree -L 1 examples

examples-folder-content

(use the tree command – sudo apt-get tree to see the above, available on the MacOSX & Windows)

Each of the sub-folders contain examples for the respective languages supported by the GraalVM, including embed and native-image which we will also be looking at.

Exciting part: hands-on using the examples

Let’s get to the chase, but before we can execute any code and see what the examples do, we should move the graalvm-0.30 to where the other Java SDKs reside, lets say under /usr/lib/jvm/ and set an environment variable called GRAAL_HOME to point to it:

$ sudo mv -f graalvm-0.30 /usr/lib/jvm
$ export GRAAL_HOME=/usr/lib/jvm/graalvm-0.30
$ echo "export GRAAL_HOME=/usr/lib/jvm/graalvm-0.30" >> ~/.bashrc
$ cd examples

R language

Let’s pick the R and run some R scripts files:

$ cd R
$ $GRAAL_HOME/bin/Rscript --help    # to get to see the usage text

Beware we are running Rscript and not R, both can run R scripts, the later is a R REPL.

Running hello_world.Rusing Rscript:

$ $GRAAL_HOME/bin/Rscript hello_world.R
[1] "Hello world!"

JavaScript

Next we try out some Javascript:

$ cd ../js/
$ $GRAAL_HOME/bin/js --help         # to get to see the usage text

Running hello_world.js with js:

$ $GRAAL_HOME/bin/js hello_world.js
Hello world!

Embed

Now lets try something different, what if you wish to run code written in multiple languages, all residing in the same source file, on the JVM — never done before, which is what is meant by embed.

$ cd ../embed

We can do that using the org.graalvm.polyglot.context  class. Here’s a snippet of code from  HelloPolyglotWorld.java:

import org.graalvm.polyglot.*;

public class HelloPolyglotWorld {

public static void main(String[] args) throws Exception {
 System.out.println("Hello polyglot world Java!");
 Context context = Context.create();
 context.eval("js", "print('Hello polyglot world JavaScript!');");
 context.eval("ruby", "puts 'Hello polyglot world Ruby!'");
 context.eval("R", "print('Hello polyglot world R!');");
 context.eval("python", "print('Hello polyglot world Python!');");
 }
}

Compile it with the below to get a.class file created:

$ $GRAAL_HOME/bin/javac HelloPolyglotWorld.java

And run it with the below command to see how that works:

$ $GRAAL_HOME/bin/java HelloPolyglotWorld
Hello polyglot world Java!
Hello polyglot world JavaScript!
Hello polyglot world Ruby!
[1] "Hello polyglot world R!"
Hello polyglot world Python!

You might have noticed a bit of sluggishness with the execution when switching between languages and printing the “Hello polyglot world….” messages, hopefully we will learn why this happens, and maybe even be able to fix it.

Native image

The native image feature with the GraalVM SDK helps improve startup time of Java applications and give it smaller footprint. Effectively its converting byte-code that runs on the JVM (on any platform) to native code for a specific OS/platform — which is where the performance comes from. It’s using aggressive ahead-of-time (aot) optimisations to achieve good performance.

Let’s see how that works.

$ cd ../native-image

Lets take a snippet of Java code from  HelloWorld.java  in this folder:

public class HelloWorld {
public static void main(String[] args) {
System.out.println("Hello, World!");
}
}

Compile it into byte-code:

$ $GRAAL_HOME/bin/javac HelloWorld.java

Compile the byte-code (HelloWorld.class) into native code:

$ $GRAAL_HOME/bin/native-image HelloWorld
 classlist: 740.68 ms
 (cap): 1,042.00 ms
 setup: 1,748.77 ms
 (typeflow): 3,350.82 ms
 (objects): 1,258.85 ms
 (features): 0.99 ms
 analysis: 4,702.01 ms
 universe: 288.79 ms
 (parse): 741.91 ms
 (inline): 634.63 ms
 (compile): 6,155.80 ms
 compile: 7,847.51 ms
 image: 1,113.19 ms
 write: 241.73 ms
 [total]: 16,746.19 ms

Taking a look at the folder we can see the Hello World source and the compiled artifacts:

3.8M -rwxrwxr-x 1 xxxxx xxxxx 3.8M Dec 12 15:48 helloworld
 12K -rw-rw-r-- 1 xxxxx xxxxx     427 Dec 12  15:47 HelloWorld.class
 12K -rw-rw-r-- 1 xxxxx xxxxx     127 Dec 12  13:59 HelloWorld.java

The first file helloworld is the native binary that runs on the platform we compiled it on, using the native-image command, which can be directly executed with the help of the JVM:

$ helloworld
Hello, World!

Even though we gain performance, we might be loosing out on other features that we get running in the byte-code form on the JVM — the choice of which route to take is all a matter of what is the use-case and what is important for us.

It’s a wrap up!

That calls for a wrap up, quite a lot to read and try out on the command-line, but well worth the time to explore the interesting  GraalVM.

To sum up, we went about downloading the GraalVM from Oracle Lab’s website, unpacked it, had a look at the various folders and compared it with our traditional looking Java SDKs, noticed and noted the differences.

We further looked at the examples provided for the various Graal supported languages, and picked up a handful of features which gave us a taste of what the GraalVM can offer. While we can run our traditional Java applications on it, we now also have the opportunity to write applications that expressed in multiple supported languages in the same source file or the same project. This also gives us the ability to do seamlessly interop between the different aspects of the application written in a different language. Ability to even re-compile our existing applications for native environments (native-image) for performance and a smaller foot-print.

Feel free to share your thoughts with me on @theNeomatrix369.

Advertisements

Truffle served in a Holy Graal: Graal and Truffle for polyglot language interpretation on the JVM

03 Hotspot versus GraalVM

Reblogging from ZeroTurnaround’s Rebellab blog site

One of the most fascinating additions to Java 9 is the JVMCI: Java-Level JVM Compiler Interface, a Java based compiler interface which allows us to plug in a dynamic compiler into the JVM. One of the main inspirations for including it into Java 9 was due to project Graal — a dynamic state-of-the-art compiler written in Java.

In this post we look at the reasons Graal is such a fascinating project, its advantages, what are the general code optimization ideas, some performance comparisons, and why would you even bother with tinkering with a new compiler.

Like everyone else we were inspired by the vJUG session by Chris Seaton on Graal – it looks like a great tool and technology and so we decided to play with the technology and share it with the community.

…you can read the rest at ZeroTurnaround’s Rebellab blogs


 

In case, you are wondering what some of the ASCII-art images in one of the paragraphs is about, here’s a bit of explanation, hopefully it will clear up any doubts.

How does it actually work?

A typical flow would look like this:

02-a Program to machine code diagram (excludes expansion)
AST → Abstract Syntax Tree  (explicit data structures in memory)

We all know that a JIT is embedded inside HotSpot or the JVM. It’s old, complicated, written in C++ and assembly and is fairly hard to understand. It is a black box and there is no way to hook or link into the JIT.  All the JVM languages have to go through the same route:  

02-b Program to machine code diagram (via byte-code)

(ASM = assembly)

The flow or route when dealing with traditional compilers and VM would be:

02-c Program to machine code diagram (via JIT)
But with Graal, we get the below route or flow:

02-d Program to machine code diagram (via AST)
(notice Graal skips the steps that create byte-code by directly generating platform specific machine code)

Graal basically helps moving the control-flow from Code to the JIT bypassing the JVM (HotSpot, in our case). It means we will be running faster and more performant applications, on the JVM. These applications will not be interpreted anymore but compiled to machine code on fly or even natively.


I hope you enjoyed the read, please feel free to share any constructive feedback, so we can improve the material for the community as a whole. We learnt a lot while drafting this post and hope the same for you.

Original post by @theNeomatrix369 and  @shelajev !

My experience of learning R – from basic graphs to performance tuning

Background

R as some of you may know is a statistical and graphics programming language (see Wikipedia [1]) used by academia and recently by IT professionals of our ever growing software industry. There is a sudden demand for Data Scientists, Data Analysts and Statisticians with a background in R among other things data and development related subjects.

I have been fortunate to work with such a programming language, even though I haven’t had any prior experience working with such a programming language and moreover with Data Scientists. My interest in Mathematics and affinity for numbers drew me to learning it, and with further help of Herve Schnegg our in-house Senior Data Scientist, I was able to pick a fair bit of the subject.

 

R is a mix of a object-oriented programming, Clojure-like functional programming, Javascript-like style of writing code and a Smalltalk-like programming interface. And it offers REPL like many functional programming environments. The fundamental units of the data we manipulate are usually objects like lists, vectors, data-frames, tables, etc…
 
Initial baby-steps
 
I went through a few hours of tutoring by getting an understanding of the R environment, how to install it, and an overview of RStudio and how amazing it is! What fascinates me, is that you can load objects into memory and play with it and when you shutdown your environment your data is not cleared! Rather you can save it (into the .Rdata file) and it retains such information per project!
You are able to remove individual objects from memory, view them, modify them, and reload them from the command-line or by just executing single lines of code in your R script file (they have the obvious extension of .r) in an IDE like RStudio.
R gives developers access to a REPL (stands for Read–eval–print loop [2]) environment and thats how you are able to do the above actions seamlessly! A number of other popular languages have a similar environment i.e. Clojure, Haskell, Python, Ruby, Scala, and Smalltalk, and so forth.

More about R
The order of precedence with regards to declaring a function is important in R, you can’t just call a function unless it has been defined in the package/library you have loaded like:

library([name of library])

or included a resource using the source() function like:source(“./Utils.LoadAndVerify.r”)

or defined the function in the beginning of the script file before referring to it, at a later stage! I had to learn this by the trial-an-error-then-ask-the-experts-around-you method.

 

Contents of any object can be viewed by referring to the object at the REPL CLI, that’s kind of easy!

 

>  someObject <- “contents”
>  someObject [press enter]
[1] “contents” <==== output

 

I discovered another way to view the contents of an object especially when its a list, vector, data-frame, etc…, and is a bit cumbersome to read its output on the console. I learnt that the View() function displays the contents of the object in a tabular form in a separate floating window:

 

> View(table(someList))

 

(the object is displayed in a grid like table in a separate window, which could look like the below)

Plotting graphs from a set of numeric values contained in a list or vector in R is like doing 1..2..3…:

 > counts par(bg = "white");
 > barplot(counts, main="Car Distribution by Gears and VS",
   xlab="Number of Gears", col=c("darkblue","maroon"),
   legend = rownames(counts), beside=TRUE)

And voila, you get a nice simple looking bar graph!

Thanks to a helpful R blogger who has put together some resource for us: Using R to plot data [4].

We can do something more advance by running the below commands:

> x  y  f  z  par(bg = "white");
> persp(x,y,z,zlim=c(0,0.25), theta=50, phi=10);

…and we have the below nice looking 3D mesh (wireframe), from an angle:
Note: the par (bg=”white”) command sets the colour of the canvas for the entirety of your session.

 

Logging
I wrote my own suite of very simple logging functions that log messages to the console depending on the nature of the message, these messages can of course be piped into a text file at run-time.
log.INFO print(paste(date(), "[INFO]", message))
}

log.WARNING print(paste(date(), "[WARNING]", message))
}

log.DEBUG print(paste(date(), "[DEBUG]", message))
}

log.ERROR print(paste(date(), "[ERROR]", message))
}
Of course the above block of code could have been written like this:
MSG_TYPE_INFO <- "[INFO]"
MSG_TYPE_WARNING <- "[WARNING]"
MSG_TYPE_DEBUG <- "[DEBUG]"
MSG_TYPE_ERROR <- "[ERROR]"

log.ANY print(paste(date(), typeOfMessage, message))
}

log.INFO log.ANY(MSG_TYPE_INFO, message)
}

log.WARNING log.ANY(MSG_TYPE_WARNING, message)
}

log.DEBUG log.ANY(MSG_TYPE_DEBUG, message)
}

log.ERROR log.ANY(MSG_TYPE_ERROR, message)
}

As you will know, the way R is, it is wise to have logging functions to hand, to dump values of variables when running scripts. Just because sometimes the error messages thrown by R can be obscure, which has been my finding during my pursuits. Hence I resorted to the above functions and relieved myself from annoyances during exceptions.Later some passed me a link to an R Logging library (an implementation of log4j in R) [7].

What’s up!
At the moment I’m refactoring bits of code I wrote during the last two weeks and still have many blocks of code to go through to find suitable method functions to place them into – our purpose is to make the code more readable, scalable and maintainable.
Just now in the process of replacing the slow and verbose for-loop like constructs with their equivalent xapply() functions. By doing this we will gain in speed and compactness with regards to the lines of code.

 

R gives us a number of MapReduce like functions to play with, here’s a blog [3] that covers the topic on the xapply() functions.
 
Performance measurement and performance tuning
As R is an interpreted language, if you don’t write efficient functions, you could end up waiting a bit longer than expected, before any results are thrown back onto the console. It is not verbose and does not usually tell you what it is upto.
We spent most of our two weeks performing this action as we came across performance bottlenecks in our scripts and could do with using the xapply() like functions. Applying them improved the performance of certain tasks from several hours to a reasonable number of minutes per execution.

 

“Measure, don’t guess.” was the motto!

 

Thanks to the sequence of calls to the proc.time() function, which we used voraciously to measure performances of the different blocks of code we thought needed attention.

 

startTimer <- proc.time()
and
proc.time() – startTimer
 
This paid off at the end of the process as we were able to determine how much time it would take for the script to transform and validate the heaps of data we have been playing with.
At the end of each such iteration we saw the stats in the below format. It got us excited if it was a low number and dejected if it wasn’t to our liking:

 

   user  system elapsed
 87.085   0.694  87.877
 
We tried various methods to bring down the total elapsed time. Some of the things we did even before we came to a final resolution:
 – used for-loop to iterate through a list or vector and perform the same action repeatedly and accumulate results
  – we noticed the for-loop slowed down after a number of iterations and this was a standard pattern. To relieve that we split the for-loop into an inner and outer loop. The outer loop split the inner loop into batches of 40-50 iterations followed by a call to gc() at the end of the iteration. This wasn’t a decent solution from an algorithms or language point of view
finally we settled to refactoring the for-loop into a mapply() which looked like:
result &lt;- unlist(mapply(FUN=transposeColumnAsRow, rangeOfIndices, SIMPLIFY=TRUE))
 
The last action gave us a better grip over the performance and we were confident that if we had to run all the data we had through the script, we would be able to finish transposing it within several hours as opposed to a few days, previously.
Here’s the equation we used to benchmark our functions each time we improved it. It was more to find out for us if we would be able to meet our goals. If the action was acceptable, otherwise we needed to investigate further to find a better method:

 

nm = (ns / nr) * tnr / nsm

 

 nm – no. of minutes it would take to process the whole raw file
 ns – no. of seconds taken to process the batch of records
 nr – total number of records process in the batch
tnr – grand total of the number of records in raw file
nsm – number of seconds in a minute

 

The method we settled for gave us the below results, which was a great benchmark based on processing a sample of 100 records, and when extrapolated on 11200+* records gave the below – which was pretty acceptable at the time:

 

 9.315 / 100 * 11250 / 60 = 17.465 minutes per raw data file

* – each row was made up of 1300+ columns which added to the processing time


We had about 24 files in total to process, which compute to

17.465 * 24 / 60 = 6.986 hours if run one file per session

The tasks of processing each file was split into 3 to 4 sessions processing 1000 records per session.

But it wasn’t as easy as said, we had a number of sessions running doing the above on different pieces of raw data, but never got to committing the data into the database and wondered why? We thought it was hardware/software limitations on our systems. But after further investigations and experimentations found out that no system can handle writing mega-tons of data from memory into the database system without creating giga-tons of swap files. And these swap files are a catch-22, because now the OS needs resources to manage its own resources so our resource requirements would take a back seat!

After a couple discussions, and trials we finally decided to write data back into the database system, in smaller blocks at a time, which means we can still have multiple sessions running in the background and have each one of them write smaller blocks of data into the database.
Everyone is happy as processes can handle smaller blocks much better than bigger blocks – didn’t we already know this, maybe we re-learnt it by facing a bottleneck?   Our script learnt from it as well and got modified to be able to accept and handle processing smaller blocks of data by splitting the processes into smaller batches of records per execution.$ RScript IncrementalLoad.r [filename] [starting record no.] [ending record no.]

The verification script also imitated the same and elected to be run in batches:$ RScript VerifyData.r [filename] [starting record no.] [ending record no.]

Once data had been transformed and written to a database, we wrote a script to validate the data written into the database, we chose Postgres as a trial, and found it was a pretty good database system with an intuitive SQL language.

The verification process was run in the same manner in parallel which took similar amount of time, so at the end of the 7th hour we had both the data written into the database and verified.

 

R can write to such a database system easily. We were further helped with the primary, simple and compound indices that we created to facilitate the searching and selecting processes that our SQL statements would make it do. Postgres also has an efficient caching mechanism, which helps further speed things up.

 

Tweaking the R environment to get efficiency out of it
What I didn’t mention was that before we returned to the R script to tweak it and improve its performance, we thought it was the environment and the way R was, that made it slow – so we wanted to speed up our scripts using the below methods to get the maximum out of the R environment:
  • JIT compiling R scripts – thinking its not slow when interpreted anymore
  • Converting R scripts into C/C++ code and compiling and running it instead
  • Running R scripts using parallel processing (need some library for it)
  • Learning how to use GPUs via R to get that extra performance (need some library for it)
  • Investigating other methods of High Performance Computing in R
We have parked these ideas for now, but it will be a great experience to be able to explore them at a later date.
But once again it was techniques over technology that made our day. Rory Gibson, rightly said “Its not surprising to know how game developers produce some of the best pieces of work under restricted environments”. Such situations are a good nudge to everyone especially developers when faced with performance bottleneck – look at your code not your machine first!

 

At the end of it all, it feels we did what Hadoop or Cloudera would do to our jobs – split, slice, execute, verify, put together and bring back the results at an efficient speed.

 

Hurdles
 
The time I spend learning and applying R, I had to get familiar with its unique or rather say different from other programming language syntax. Like the use of the <- (arrow sign or indirection operator) instead of the usual = (equal to sign). How you point the arrow makes a difference in R, instead of assigning a value to a variable or function you might end up doing something else if you are not careful.

 

You need to define your functions at the top first, otherwise you can’t refer to it. And all entities are case-sensitive, please pay careful attention or else you will only be notified when you least expect it and in the middle of an execution of a block – remember its an interpreted language, no compile time warnings / error messages are available.
There was one more hurdle which put the spanners at work for us – we bumped into an encoding/decoding issue with reading data from the raw file. The ESS plugin [5] in Emacs was reading the data literally at a stage and not evaluating the escape codes. When we switched to RStudio or even the R repl, we immediately became free from the issue – this was also because both I and Herve were using different development environments. He used Emacs to develop in R while I used RStudio. Why this was happening is not known to us, we think there might be a bug in the plugin – at this stage its still a speculation, but more importantly we don’t have to investigate the issue anymore.Our raw file was written using the application called SPSS which writes data in a proprietary format. Such files can be read via a few ways, and using R is one way to achieve that. There is also a Java library [6] that facilitates reading such files, but remains unexplored at this time.
Test driven development in R
 
This is where I have still been hovering around with regards to R, I came across two libraries that enables writing unit tests in R, i.e. RUnit and svUnit. See below in the External Resources section for a number of links I have put together while searching for TDD methodologies in R.
It still needs to be investigated further but a promising start – since test-first driven development is a great way to start working on any piece of problem in any programming language of choice.

 

 

Refactoring
Another action which has been a continuous process since the start. We have applied it to generalise, and make the code base more compact and manageable.

 

Move away common function calls into another .r file and called it into our main script using the source() function. Make our work more maintainable and re-usable – basically keep our code-base clean and tidy.

 

I learnt that refactoring is a continuous effort – its a journey not a destination.

 

What we took away…
I and Herve both took away a lot of learning both technical and non-technical from the whole process  – pair-programming and pair-investigation of a problem space, as the old adage goes “Two minds, are better than one.” Also reveals another reason why pair-programming is encouraged as part of a development process.

 

One important point: we learnt that when we started working on this project, slicing it into simple smaller atomic chunks when solving a problem was effective and efficient, and learnt the hard way. Also when we had a solution to apply to a dataset,  we had already decided to only apply any experimental solution to a smaller subset of the dataset first, verify the results and then scale it incrementally till the dataset was exhausted. Both these working methods came to our rescue and reduced the combination and permutations of trial-and-error!
We both exchanged ideas that we were new to and very well incorporated many of them into our work methods and was able come up with a fine, and a re-usable solution.
I have taken this project further by documenting the work, continuing with refactoring the script files, writing this blog post, and tidying up the project space as a whole.This blog comes about as a documentation of our trial to check the viability of different platforms that could serve us as an ETL (Extract, Transform, Load) – of which we have made good use of RWhether is brilliant at it, or another tools serves betters is debatable. R stands good at what it does, and it does it well – but can be used to do light/medium weight ETL work.So what would be the next tool or platform of choice for our next ETL project. We can’t tell which one is better till we have tried a few and benchmarked them against their pro-s and con-s.


Thanks

Herve Schnegg – for a good partnership during our R session the last couple of weeks, and all the input and learning.

Rory Gibson – for lending his reviewer eyes, for reviewing our R work I and Herve did and also for reviewing this post.

External resources
This blog has also been published on the web’s popular R blogging site: http://www.R-bloggers.com.
During my quest, while learning and applying R, I came across the below links that could come useful to anyone who is interested in furthering their knowledge.
JIT for R
R to CPP
Parallels in R
High Performance Computing using R
GPU programming with R
Test driven development in R
Other useful topics
PSPP

Read more….