Data Scientist
Job Description
Data Collection:
- Develop a model to collect data from sources such as databases (MySQL, Oracle), web APIs, Hadoop, and cloud data storage (Azure, AWS) using SQL and data APIs
- Perform ETL (Extract, Transform, Load) and data modeling on the collected data
Data Preparation:
- Clean data and impute missing values in datasets using mean imputation, single imputation, and stochastic imputation techniques
- Transform and manipulate data using R packages (dplyr, reshape2, tidyr, lubridate) and Python libraries (SciPy, NumPy, mlpy, Theano)
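As an illustration of the imputation techniques named above, here is a minimal Python sketch using only the standard library. It is a simplified stand-in, not a prescribed implementation: stochastic imputation normally adds a random residual to a regression prediction, and with no predictor variables (the assumption made here) that reduces to the observed mean plus Gaussian noise. The function names and the sample `ages` list are hypothetical.

```python
import random
import statistics

def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

def stochastic_impute(values, seed=0):
    """Replace each missing value with the observed mean plus Gaussian noise.

    The noise is scaled by the sample standard deviation of the observed
    values, so imputed entries keep some of the original variability.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [rng.gauss(mean, sd) if v is None else v for v in values]

# Hypothetical column with two missing entries.
ages = [23.0, None, 31.0, 27.0, None, 35.0]
print(mean_impute(ages))
print(stochastic_impute(ages))
```

Mean imputation fills every gap with the same value, which shrinks the variance of the column; the stochastic variant trades a little accuracy per value for a more realistic spread.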
Data Analysis:
- Perform exploratory data analysis to provide meaningful insights using plyr, data.table, pandas, statsmodels, PyTables
- Perform statistical analysis to understand the patterns and distributions in the data using WEKA and Rattle
Predictive Model Development:
- Perform feature engineering using the SMAC method to identify the important features in the dataset for predictive modelling
- Analyze the data and design preliminary machine learning models in RapidMiner, Azure ML, Rattle, WEKA
- Apply supervised machine learning techniques – linear regression, decision trees, random forests, neural networks, support vector machines – using e1071, rpart, nnet, caret, glmnet, scikit-learn, milk
- Implement unsupervised learning techniques – k-means clustering, hierarchical clustering, hidden Markov models – using kmeans, hclust, scikit-learn, pattern
- Develop front-end user interfaces and dashboards using Django, Shiny, Tableau
Cross Validation & Testing:
- Test the application in a production environment (shinyapps.io, Docker) using cross-validation data, and determine the efficiency of the prediction models using statistical methods
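To make the cross-validation step above concrete, here is a minimal k-fold sketch in plain Python. It is illustrative only: the helper names are hypothetical, the "model" is a baseline that predicts the training mean, and mean absolute error is assumed as the evaluation statistic. In practice a library routine such as scikit-learn's `KFold` would normally be used instead.

```python
import statistics

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_validate_mean_model(y, k=5):
    """Return the per-fold mean absolute error of a predict-the-training-mean baseline."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        prediction = statistics.mean(y[i] for i in train_idx)
        mae = statistics.mean(abs(y[i] - prediction) for i in test_idx)
        scores.append(mae)
    return scores

# Hypothetical target values.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
scores = cross_validate_mean_model(y, k=5)
print(scores, statistics.mean(scores))
```

Each sample is held out exactly once, so the average of the fold scores estimates how the model performs on data it was not fitted to.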
Data Visualization & Deployment:
- Report the results in dashboards built with Tableau, Shiny, D3, and ggplot2
- Deploy the R- and Python-based applications on Shiny Server and cloud infrastructure (AWS, Azure)
Education:
- Bachelor's degree in CS or Engineering
Please submit your resume to: hr@devrabbit.com