Remote Data Mining And Management Job In Data Science And Analytics

Assignment maker needed to complete the INF30030 – Business Analytics assignment using RStudio


- Assignment name: Business Analytics Project Report for predicting employee absenteeism
- Assignment type: Group Assessment (4 members per group; all members of a group must be from the same tutorial)
- Due date: one month
Overview
From time to time, employees miss work. Expecting employees to come in every day throughout the year is likely unrealistic. But when employees make a habit of missing work unexpectedly, their absences can set your business back. Learn how to calculate your business's absenteeism rate to reduce absenteeism in the workplace.
Your task
In this assignment, your task is to design a predictive model using clustering and/or classification to predict the likelihood of an employee being absent from work. In particular, we ask you to apply the tools and techniques that can help you to predict employees with high likelihood of absenteeism. The final deliverable of your assignment task should be a report containing the following sections:
- Defining Business Objectives
The project report should start with a description of a well-defined business objective. The model is supposed to address a business question; clearly stating that objective will allow you to define the scope of your project and will give you an exact test by which to measure its success.
Swinburne University of Technology

- Exploring data
Once you have addressed missing values and duplicate records, you will need to explore the inherent relationships between the different variables. The focus variable for this study is the Result column (since you are asked to predict it), so this section should show your efforts to identify which of the remaining columns in the dataset are likely to have high predictive power on the Result column. You may use basic statistical analyses such as correlations and present them as visual graphs or as tables (raw data).
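As a concrete illustration of the correlation check described above, here is a minimal Python sketch (the assignment itself uses RStudio, where the built-in `cor()` function computes the same quantity). The column names and values below are hypothetical, not from the actual dataset:

```python
# Sketch: Pearson correlation between a candidate predictor column and
# the target. "ages" and "absents" are made-up example columns.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ages    = [25, 32, 47, 51, 38]   # hypothetical predictor column
absents = [2, 3, 8, 9, 5]        # hypothetical target column
r = pearson(ages, absents)       # close to +1 => strong positive relationship
```

A column whose correlation with the target is near zero is a weak candidate predictor; columns with correlations near +1 or -1 are worth keeping.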
- Preparing Data
You'll use historical data to train your model. Data may contain duplicate records and outliers; depending on the analysis and the business objective, you decide whether to keep or remove them. The data could also have missing values, may need to undergo some transformation, and may be used to generate derived attributes that have more predictive power for your objective. Overall, the quality of the data determines the quality of the model. You need to provide a data dictionary of all data items used in your analysis and a justification for including each of them in your model.
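The two cleaning decisions mentioned above (duplicates and missing values) can be sketched as follows; this is an illustrative Python outline with a made-up `hours_absent` field, not the actual dataset schema, and mean imputation is only one of several reasonable choices:

```python
# Sketch of the cleaning step: drop exact duplicate rows, then
# mean-impute missing values. Rows and field names are illustrative.
rows = [
    {"id": 1, "hours_absent": 4},
    {"id": 1, "hours_absent": 4},     # duplicate record
    {"id": 2, "hours_absent": None},  # missing value
    {"id": 3, "hours_absent": 8},
]

# Remove exact duplicates while preserving the original order.
seen, unique_rows = set(), []
for row in rows:
    key = tuple(sorted(row.items()))
    if key not in seen:
        seen.add(key)
        unique_rows.append(row)

# Replace missing values with the mean of the known values.
known = [r["hours_absent"] for r in unique_rows if r["hours_absent"] is not None]
mean = sum(known) / len(known)
for r in unique_rows:
    if r["hours_absent"] is None:
        r["hours_absent"] = mean
```

Whether to impute, drop, or flag missing values should be justified in the data dictionary against the business objective.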
- Sampling Your Data
After preparing the data, the next step is data sampling. The data needs to be split into two sets: a training dataset and a test dataset. While splitting, consider the percentage split between training and test data - it's always good to have more training data than test data (a common rule of thumb is 70% training and 30% test). Also make sure that the splitting process produces a stratified sample rather than a pure random sample. You need to build the model using the training dataset, and the test dataset should be used to verify the accuracy of the model's output. Doing so is absolutely crucial; otherwise you run the risk of overfitting your model: training the model on a limited dataset to the point that it picks up all the characteristics (both the signal and the noise) that are only true for that particular dataset.
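A stratified split differs from a pure random split in that it shuffles and cuts each class separately, so the class proportions in the training and test sets match the full dataset. A minimal Python sketch of the 70/30 stratified split (in R, packages such as caret provide this via `createDataPartition`); the `"absent"`/`"present"` labels are illustrative:

```python
# Sketch: stratified 70/30 train/test split using only the standard library.
import random
from collections import defaultdict

def stratified_split(records, label_key, train_frac=0.7, seed=42):
    # Group records by class label, then split each group separately
    # so both output sets keep the original class proportions.
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)
    train, test = [], []
    rng = random.Random(seed)
    for label, group in by_label.items():
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Hypothetical data: 20% "absent", 80% "present".
data = [{"label": "absent" if i % 5 == 0 else "present"} for i in range(100)]
train, test = stratified_split(data, "label")
```

Here both `train` and `test` retain the 20% "absent" rate of the full dataset, which a pure random split only approximates.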
- Building the Model
Sometimes the data or the business objectives lend themselves to a specific algorithm or model. Other times the best approach is not so clear-cut. As you explore the data, run as many algorithms as you can, and use techniques such as cross-validation and ensembles to see whether your modelling improves.
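The cross-validation idea mentioned above can be sketched in a few lines: split the data into k folds, hold out one fold at a time for scoring, and average the scores. This Python outline uses a deliberately trivial majority-class "model" as a stand-in for whatever classifier you actually fit (in R, caret's `train()` can run cross-validation for you):

```python
# Sketch: k-fold cross-validation with a trivial majority-class model
# standing in for a real classifier.
import random

def kfold_accuracy(labels, k=5, seed=0):
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        # "Fit" the stand-in model: predict the majority training label.
        train_labels = [labels[j] for j in train_idx]
        majority = max(set(train_labels), key=train_labels.count)
        correct = sum(labels[j] == majority for j in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k  # mean held-out accuracy

labels = ["present"] * 80 + ["absent"] * 20  # hypothetical class balance
cv_score = kfold_accuracy(labels)
```

The majority-class baseline is also a useful yardstick: any real model you build should beat its cross-validated accuracy.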
- Evaluating the Model
Each model iteration has to be evaluated and improved upon. To compare them, models need to be evaluated on metrics such as the confusion matrix, accuracy, precision, and recall. The final model should be the best-performing model on those metrics. Finally, you have to be smart about how you present your results to the business stakeholders in an understandable and convincing way (such as reports, charts and/or dashboards) so that they adopt your model.
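All four metrics named above derive from the same four confusion-matrix counts (true/false positives and negatives). A minimal Python sketch, treating "absent" as the positive class; the example label lists are made up:

```python
# Sketch: confusion-matrix counts and derived metrics for a binary
# absent-vs-present prediction task.
def evaluate(actual, predicted, positive="absent"):
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return {
        "accuracy":  (tp + tn) / len(actual),          # all correct / all
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # flagged correctly
        "recall":    tp / (tp + fn) if tp + fn else 0.0,  # absentees caught
    }

actual    = ["absent", "absent", "present", "present", "absent"]
predicted = ["absent", "present", "present", "absent", "absent"]
metrics = evaluate(actual, predicted)
```

For this business question, recall is worth highlighting to stakeholders: it is the share of actually absent employees the model manages to flag in advance.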

Datasets
To assist you with your assignment task, you are provided with a dataset to help you build a model. At step 4 you should split your data into training and test sets: the train.csv file should be used to train your model, whereas the test.csv file should be used to evaluate/test your prediction model.
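Loading the CSV files is the natural first step (in R this is `read.csv()`). A Python sketch using the standard library's csv module; the column names here are illustrative, since the actual schema of train.csv is not given in this posting:

```python
# Sketch: reading a CSV file of employee records. An in-memory string
# stands in for train.csv; the columns shown are hypothetical.
import csv
import io

sample = (
    "employee_id,age,hours_absent,result\n"
    "1,25,4,low\n"
    "2,47,12,high\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
# Each row is a dict keyed by the header, e.g. rows[0]["result"].
```

Note that `csv.DictReader` yields every field as a string, so numeric columns need explicit conversion before any statistical analysis.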
Deliverables
Submit a softcopy of the project report covering the six phases of model building (as described above), along with the R code, the reports/charts, and any other relevant inputs. You will also be required to present your project after the submission of your assignment. See the marking rubric for more details.
About the recruiter
Gunada Made, from Iasi, Romania
Member since Mar 14, 2020

Skills & Expertise Required

Data Science & Analytics, Data Mining & Management

Open for hiring. Apply before Nov 19, 2024.

Work from Anywhere

40 hrs / week

Fixed Type

Remote Job

Cost: $47.91


