Schlagwort: R
-
Collaborative Data Science: About Storing, Reusing, Composing and Deploying Machine Learning Models
Why is this important? Machine Learning has re-emerged in recent years as new Big Data platforms provide means to use them with more data, make them more complex as well as allowing combining several models to make an even more intelligent predictive/prescriptive analysis. This requires storing as well as exchaning machine learning models to enable…
-
Big Data Lab in the Cloud with Hadoop+Spark+R+Python
This is an update of the second big data lab for the cloud. Similar to previous versions, this document described how you can create a Big Data Lab in the cloud on Amazon EMR. Besides some major upgrades to the newest Amazon Hadoop AMI (3.6.0) Spark (1.3.0) and R, it includes now also the possibility…
-
Update: Next Generation Big Data Lab V2 in the Cloud
Recently, I presented the first version of the Big Data Lab in the cloud. Now I extended this version and kept most of the features of the previous version. However, I provide upgrades for important software components. It still runs on Amazon EMR, but with the newest Amazon AMI (including Amazon Linux). It now features…
-
Creating a Big Data lab in the Cloud using Amazon EMR
This first blog post is about creating your own Big Data lab in the Cloud using Amazon EMR. Follow my instructions here. These instructions allow you within 15 minutes the following: You can use the analytics language R in a browser to access the full functionality of Hadoop/Spark, Hive/Shark (data warehouse), Rhipe (MapReduce for R),…
-
Big Data: Bring Computation to Data
Big Data is the topic of the coming years. Even today large Internet companies store exabytes of data and their revenue model is based on selling products as well as services around this data. Consequently, they need to process data using advanced statistical methods, such as machine learning. Hence, they need to think about how…