In the last article of this series, we discussed the multivariate linear regression model. Fernando created a model that estimates the price of a car based on five input parameters. Fernando indeed has a better model. Yet, he wants to select the best set of input variables. This article will elaborate on model selection […]

# Data Science Simplified Part 2: Key Concepts of Statistical Learning

In the first article of this series, I touched upon the key concepts and processes of Data Science. In this article, I will dive in a bit deeper. First, I will define what Statistical learning is. Then, we will dive into key concepts in Statistical learning. Believe me; it is simple. As per Wikipedia, Statistical […]

# Data Science Simplified Part 5: Multivariate Regression Models

In the last article of this series, we discussed the story of Fernando, a data scientist who wants to buy a car. He uses a Simple Linear Regression model to estimate the price of the car. The regression model created by Fernando predicts price based on the engine size. One dependent variable predicted using one independent […]

# Data Science Simplified Part 4: Simple Linear Regression Models

In the previous posts of this series, we discussed the concepts of statistical learning and hypothesis testing. In this article, we dive into linear regression models. Before we dive in, let us recall some important aspects of statistical learning. Independent and Dependent variables: In the context of Statistical learning, there are two types of data: […]

# Data Science Simplified Part 3: Hypothesis Testing

Edward Teller, the famous Hungarian-American physicist, once said: “A fact is a simple statement that everyone believes. It is innocent, unless found guilty. A hypothesis is a novel suggestion that no one wants to believe. It is guilty, until found effective.” Hypothesis testing is widely applied in Data Science. It is imperative to simplify […]

# Data Science Simplified Part 1: Principles and Process

In 2006, Clive Humby, a UK mathematician and the architect of Tesco’s Clubcard, coined the phrase “Data is the new oil.” He said the following: “Data is the new oil. It’s valuable, but if unrefined it cannot be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable […]

# Demystifying Data Lake Architecture

According to Gartner, 80% of successful CDOs will have value creation or revenue generation as their Number 1 priority through 2021. To create the maximum value out of the organization’s data landscape, traditional decision support system architectures are no longer adequate. New architectural patterns need to be developed to harness the power of data. To fully […]