Big Data – What is next? OLTP, OLAP, Predictive Analytics, Sampling and Probabilistic Databases

Big Data has matured over the last few years and is increasingly becoming a standard technology across industries. Starting from established concepts in the context of Big Data, such as OLAP and OLTP, I go beyond them in this blog post and describe what is needed for next-generation applications, such as autonomous cars, industry…

Big Data Lab in the Cloud with Hadoop+Spark+R+Python

This is an update of the second Big Data Lab for the cloud. As with previous versions, this document describes how you can create a Big Data Lab in the cloud on Amazon EMR. Besides major upgrades to the newest Amazon Hadoop AMI (3.6.0), Spark (1.3.0), and R, it now also includes the possibility…

Master Data Management and the Internet of Things

Master Data Management (MDM) has matured and grown significantly over the last few years. The main motivation for MDM is to have a complete and accurate view of the master data objects in your organization. Master data objects describe key assets, such as machines or customers, that generate value for your organization. Hence, MDM fosters processes…

Update: Next Generation Big Data Lab V2 in the Cloud

Recently, I presented the first version of the Big Data Lab in the cloud. I have now extended it, keeping most of the features of the previous version while upgrading important software components. It still runs on Amazon EMR, but with the newest Amazon AMI (including Amazon Linux). It now features…

Example projects for using various NoSQL and Big Data technologies

Recently, I published several example Java projects on github.com for using various NoSQL technologies: cassandra-tutorial: Apache Cassandra tutorial (column-oriented database); mongodb-tutorial: MongoDB tutorial (document database); neo4j-tutorial: Neo4j (graph database); redis-tutorial: Redis (key/value store); solr-tutorial: Apache SolrCloud (search technology). Other example Java projects aim at standardized big data processing platforms:…

The Lambda Architecture for Big Data in your Enterprise

In this blog post, I present the Lambda architecture for Big Data. This architecture is about integrating historical Big Data with “live” streaming Big Data. Afterwards, I explain the concept of a large data lake within your enterprise or among enterprises in a B2B scenario. This data lake, based on the Lambda architecture…

Creating a Big Data lab in the Cloud using Amazon EMR

This first blog post is about creating your own Big Data lab in the cloud using Amazon EMR. Follow my instructions here. Within 15 minutes, these instructions enable the following: you can use the analytics language R in a browser to access the full functionality of Hadoop/Spark, Hive/Shark (data warehouse), Rhipe (MapReduce for R),…

DevOps for your business? – About Uniting Development and Operations

In recent years, DevOps has become a term for a new paradigm of integrating and managing the development and operation of software within and across organizations. In this blog entry, I describe what DevOps is and relate it to existing methodologies, such as agile development, and to organizational structures. Basically, DevOps is a broad…

Big Data: Bring Computation to Data

Big Data is the topic of the coming years. Even today, large Internet companies store exabytes of data, and their revenue models are based on selling products and services around this data. Consequently, they need to process data using advanced statistical methods, such as machine learning. Hence, they need to think about how…