Data Scientist

Job Description
Data Collection:
  • Develop a model to collect data from sources such as databases (MySQL, Oracle), web APIs, Hadoop, and cloud data storage (Azure, AWS) using SQL and data APIs
  • ETL (Extract-Transform-Load) and data modeling
Data Preparation:
  • Data cleaning and imputation of missing data in the dataset using mean imputation, single imputation, and stochastic imputation techniques
  • Data transformation and manipulation using R packages (dplyr, reshape2, tidyr, lubridate) and Python packages (SciPy, NumPy, mlpy, Theano)
Data Analysis:
  • Exploratory analysis of the data to provide meaningful insights using plyr, data.table, pandas, statsmodels, and PyTables
  • Perform statistical analysis to understand the patterns and distributions in the data using WEKA and Rattle
Predictive Model Development:
  • Perform feature engineering using the SMAC method to determine the important features in the dataset for predictive modelling
  • Analyze the data and design preliminary machine learning models in RapidMiner, Azure ML, Rattle, and WEKA
  • Apply supervised machine learning techniques – linear regression, decision trees, random forests, neural networks, support vector machines – using e1071, rpart, nnet, caret, glmnet, scikit-learn, and milk
  • Implement unsupervised learning techniques – k-means clustering, hierarchical clustering, hidden Markov models – using kmeans, hclust, scikit-learn, and pattern
  • Develop front-end user interfaces and dashboards using Django, Shiny, and Tableau
Cross-Validation & Testing:
  • Test the application in a production environment (Docker) using cross-validation data and determine the efficiency of the prediction models using statistical methods
Data Visualization & Deployment:
  • Report the results in dashboards based on Tableau, Shiny, D3, and ggplot2
  • Deploy the R- and Python-based applications on Shiny Server and cloud infrastructure (AWS, Azure)
Requirements:
  • Bachelor's degree in CS or Engineering

Please submit your resume to: