Apache Drill with ODI 12c

Here we will see how to implement Drill with ODI using simple joins between different sources. Prerequisites: some familiarity with the Hadoop ecosystem (for more about Drill, refer to the docs); VirtualBox with the CDH or Oracle Big Data Lite appliance already imported and running; Hadoop and Hive services up and running. Drill is an Apache open-source SQL query…

ODI with GIT

ODI 12.2 introduced the use of Subversion as an integrated VCS tool within ODI. To set up Git access in ODI, go to the Team menu, use ‘Switch Versioning Application’ to select Git, then create a connection under ‘Settings’. Avoid selecting the Auto Version checkbox. Selecting this feature will commit and push a new version…

sparklyr — R interface for Apache Spark

Original post: Connect to Spark from R. The sparklyr package provides a complete dplyr backend. Filter and aggregate Spark datasets, then bring them into R for analysis and visualization. Orchestrate distributed machine learning from R using either Spark MLlib or H2O Sparkling Water. Create extensions that call the full Spark API and provide interfaces to Spark packages.

Toree (Spark Kernel) in OS X El Capitan

Apache Spark is topping the charts as a reference for Big Data, Advanced Analytics and a “fast engine for large-scale computing”. In an earlier post, we saw how to use PySpark through the Jupyter notebook interactive interface. Here we will see how to use the Apache Toree multi-interpreter with Spark-Kernel, SparkR and SparkQL as well. The GitHub docs for Toree are still in incubator mode and a work in progress.…