sparklyr — R interface for Apache Spark

Connect to Spark from R: the sparklyr package provides a complete dplyr backend. Filter and aggregate Spark datasets, then bring them into R for analysis and visualization. Orchestrate distributed machine learning from R using either Spark MLlib or H2O Sparkling Water. Create extensions that call the full Spark API and provide interfaces to Spark packages.

R with Jupyter Notebook on OS X El Capitan

Jupyter Notebook is a perfect tool for combining code, text, and visuals in one document. Here we will see how to set up Jupyter to use R on OS X; the same steps can be used on Linux and Windows as well. Installing Anaconda: Anaconda is a free Python distribution (free even for commercial use and redistribution!). You can download it here, then install it as below. Installing…

Clustering

Clustering is an unsupervised learning method, meaning that it is not used to predict an outcome, or dependent variable. The main goal is to segment a set of observations into groups of similar observations, based on the available data. Although clustering is not designed to predict anything, it can be useful to improve the accuracy of…
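To make "segmenting observations into similar groups" concrete, here is a minimal sketch in Python. The excerpt does not name an algorithm; k-means (Lloyd's algorithm) is an assumed, commonly used example. The helper names, the farthest-first initialization (a simple deterministic choice made for this sketch; k-means++ or random restarts are common alternatives), and the toy data are all hypothetical and not taken from the original post.

```python
import math


def farthest_first_centers(points, k):
    """Hypothetical initialization: start at the first point, then keep adding
    the point that is farthest from every center chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iterations=100):
    """Hypothetical helper: group 2-D points into k clusters with Lloyd's k-means."""
    centers = farthest_first_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each observation joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of the observations assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center where it was
        if new_centers == centers:
            break  # converged: the centers stopped moving
        centers = new_centers
    return labels, centers


# Toy data (made up for illustration): two well-separated groups of observations.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(data, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No outcome variable is involved anywhere: the groups come only from distances between observations, which is what makes the method unsupervised.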

Linear Regression

Linear regression is used to determine how an outcome variable, called the dependent variable, depends linearly on a set of known variables, called the independent variables. The dependent variable is typically denoted by y and the independent variables by x1, x2, …, xk, where k is the number of independent variables. We are interested in…
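With k independent variables the model can be written y = b0 + b1*x1 + b2*x2 + … + bk*xk + error, where the coefficient names b0…bk are an assumed notation. For the special case k = 1, fitting by ordinary least squares (assumed here as the fitting criterion) has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point (mean of x, mean of y). The Python sketch below uses a hypothetical helper name and made-up data.

```python
def fit_simple_linear_regression(x, y):
    """Hypothetical helper: return (b0, b1) minimizing the squared errors of y ~ b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/(n - 1) factors cancel.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx
    # The fitted line always passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Toy data (made up): y grows by roughly 2 for each unit of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_simple_linear_regression(x, y)
print(round(b0, 2), round(b1, 2))  # 1.09 1.97
```

The slope reads as "expected change in y per one-unit change in x"; with several independent variables the same least-squares idea applies, but the coefficients are found by solving a system of linear equations instead of a single ratio.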

Data Visualization in R

Data visualization is often useful for finding hidden patterns and trends in data, for visualizing and understanding the results of analytical models, and for communicating analytics to the public. It is defined as a mapping of data properties to visual properties. Data properties are usually numerical or categorical, like the mean of a variable, the…
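As a toy illustration of mapping a data property to a visual property, the sketch below encodes the mean of each group (the data property) as the length of a text bar (the visual property). It is written in Python, assumes non-negative values, and uses a hypothetical helper name `text_bar_chart`; it is not the plotting approach used in the original post.

```python
from statistics import mean


def text_bar_chart(groups, width=20):
    """Hypothetical helper: draw each group's mean as a bar of '#' characters.
    Assumes all values are non-negative."""
    means = {name: mean(values) for name, values in groups.items()}
    largest = max(means.values())
    lines = []
    for name, m in means.items():
        # Bar length is proportional to the mean; the largest mean gets the full width.
        length = round(width * m / largest) if largest > 0 else 0
        lines.append(f"{name:<8}{'#' * length} {m:.1f}")
    return lines


# Toy data (made up for illustration).
groups = {"A": [2, 4, 6], "B": [8, 10, 12], "C": [1, 1, 1]}
chart = text_bar_chart(groups)
print("\n".join(chart))
```

Swapping the visual property (for example colour intensity instead of length) changes how the same data property is perceived, which is the design choice every chart makes implicitly.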

Logistic Regression

Logistic regression extends the idea of linear regression to cases where the dependent variable, y, has only two possible outcomes, called classes. Examples of applications include predicting whether a new business will succeed or fail, predicting the approval or disapproval of a loan, and predicting whether a stock…
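Logistic regression models the probability of class 1 as the logistic (sigmoid) function of a linear combination, P(y = 1 | x) = 1 / (1 + e^-(b0 + b1*x)). The Python sketch below fits a single independent variable by plain batch gradient descent on the log-loss. The choice of gradient descent, the learning rate, the epoch count, the function names, and the data are assumptions for illustration; statistical software typically fits this model with iteratively reweighted least squares instead.

```python
import math


def sigmoid(z):
    """Map any real number to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(x, y, learning_rate=0.1, epochs=20000):
    """Hypothetical helper: fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        # Differences between predicted probabilities and the observed 0/1 classes.
        errors = [sigmoid(b0 + b1 * xi) - yi for xi, yi in zip(x, y)]
        # Step against the gradient of the average log-loss.
        b0 -= learning_rate * sum(errors) / n
        b1 -= learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return b0, b1


def predict_class(b0, b1, xi, threshold=0.5):
    """Assign class 1 when the predicted probability reaches the threshold, else class 0."""
    return 1 if sigmoid(b0 + b1 * xi) >= threshold else 0


# Toy data (made up): larger x makes class 1 more likely, with some overlap.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic_regression(x, y)
print(predict_class(b0, b1, 1), predict_class(b0, b1, 8))
```

A threshold of 0.5 is a common default; it can be moved when one kind of mistake (for example approving a bad loan) is costlier than the other.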

Text Analytics

Text analytics is a set of techniques that model and structure the information content of textual sources, which are frequently loosely structured and complex. The ultimate goal is to convert text into data for analysis. One popular text analytics technique is called “bag of words”. While fully understanding text is difficult, this approach…
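A minimal bag-of-words sketch in Python: each document is reduced to word counts, discarding word order, and the counts are lined up over a shared vocabulary to form a document-term matrix. The tokenization rule (lower-casing plus a simple regular expression), the helper names, and the example sentences are assumptions for illustration.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Hypothetical helper: count each lower-cased word, ignoring order and grammar."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def document_term_matrix(documents):
    """Hypothetical helper: one count vector per document over a shared, sorted vocabulary."""
    bags = [bag_of_words(doc) for doc in documents]
    vocabulary = sorted(set().union(*bags))
    # Counter returns 0 for words a document does not contain.
    rows = [[bag[word] for word in vocabulary] for bag in bags]
    return vocabulary, rows


docs = ["The loan was approved.", "The loan was not approved, the loan was denied."]
vocab, matrix = document_term_matrix(docs)
print(vocab)   # ['approved', 'denied', 'loan', 'not', 'the', 'was']
print(matrix)  # [[1, 0, 1, 0, 1, 1], [1, 1, 2, 1, 2, 2]]
```

Each row is now ordinary numeric data that a model such as logistic regression can use. Real pipelines often also remove very common "stop words" and reduce words to their stems before counting.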