Update: Next Generation Big Data Lab V2 in the Cloud

Recently, I presented the first version of the Big Data Lab in the cloud. I have now extended it, keeping most of the features of the previous version while upgrading important software components. It still runs on Amazon EMR, but with the newest Amazon AMI (including Amazon Linux). It now features Hadoop 2.4, Spark 1.1.1, R 3 and, for the first time, SparkR, so you can do in-memory analytics in R by leveraging your whole Big Data cluster.

You can find the new version here.

Attention: It may not yet work in all AWS regions, but it has been tested successfully in Ireland.

In future blog posts, I will show how to write R scripts that distribute the machine learning computations of R libraries across the nodes of your Big Data cluster by leveraging Apache Spark in-memory analytics.

