Remote Data Mining And Management Job In Data Science And Analytics

Data Cleaning with Trifacta and R

I have a continuous flow of data extracted from school websites that needs to be checked and validated before it is made available to an R analytics platform.

I am looking at using a combination of R (we already have quite a lot of code) and Trifacta. The data sets are small but they need to be joined together very accurately. Often the data contains errors or is missing the fields needed to link records across sources. In those cases we either retrieve the required data from previously ingested data or ask the schools for it.
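As a rough illustration of the linking problem, the sketch below joins two small data sets on a shared key and reports the keys that fail to link from either side. It is written in plain Python rather than R or Trifacta, and the field names (`school_id`, `name`, `enrolment`) and the function `join_on_key` are hypothetical, not taken from the actual sources.

```python
def join_on_key(left, right, key):
    """Join two lists of dicts on `key` and report keys that fail to link.

    Assumes `key` is unique within `right`; a later duplicate would
    silently overwrite an earlier row in the lookup table.
    """
    # Index the right-hand rows by key, skipping rows with no usable key.
    right_by_key = {row[key]: row for row in right if row.get(key)}

    matched, left_only = [], []
    for row in left:
        k = row.get(key)
        if k in right_by_key:
            # On a field-name clash the left-hand value wins.
            matched.append({**right_by_key[k], **row})
        else:
            left_only.append(k)

    left_keys = {row.get(key) for row in left}
    right_only = [k for k in right_by_key if k not in left_keys]
    return matched, left_only, right_only


# Hypothetical sample data for illustration only.
left = [{"school_id": "A1", "name": "North"}, {"school_id": "B2", "name": "South"}]
right = [{"school_id": "A1", "enrolment": 120}, {"school_id": "C3", "enrolment": 80}]
matched, left_only, right_only = join_on_key(left, right, "school_id")
```

The two unmatched lists are the records that would need to be filled in from previously ingested data or by asking the school.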

The first task in the process is to identify all issues of validity and completeness in each data set, followed by implementing a strategy to fix any issues.
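A minimal sketch of what such a first pass could look like, assuming each data set arrives as a list of records. The required fields and the specific checks are hypothetical placeholders; the real rules would depend on each source.

```python
# Hypothetical field names; real sources will differ.
REQUIRED_FIELDS = ("school_id", "school_name", "enrolment")


def _text(value):
    """Normalise a raw value to a stripped string ('' for None)."""
    return "" if value is None else str(value).strip()


def find_issues(rows):
    """Return a list of (row_index, field, problem) tuples."""
    issues = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Completeness: every required field must be present and non-blank.
        for field in REQUIRED_FIELDS:
            if _text(row.get(field)) == "":
                issues.append((i, field, "missing"))

        # Validity: enrolment, when present, must be a non-negative integer.
        enrolment = _text(row.get("enrolment"))
        if enrolment and not (enrolment.isascii() and enrolment.isdigit()):
            issues.append((i, "enrolment", "not a non-negative integer"))

        # Validity: the linking key must be unique within a source.
        school_id = _text(row.get("school_id"))
        if school_id:
            if school_id in seen_ids:
                issues.append((i, "school_id", "duplicate"))
            seen_ids.add(school_id)
    return issues


# Hypothetical sample data for illustration only.
rows = [
    {"school_id": "A1", "school_name": "North", "enrolment": "120"},
    {"school_id": "A1", "school_name": "", "enrolment": "12x"},
]
issues = find_issues(rows)
```

Returning the issues as data rather than fixing them immediately keeps the detection step separate from the repair strategy.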

I am seeking a consultant who is familiar with Trifacta and/or R to build a strategy that targets each data source with a series of analyses that locate the issues in the data that is drawn from that source. In total there could be up to 100 sources for which we need to develop recipes in this cleaning and validation stage.

We want to automate the process as much as possible, by adding additional rules/procedures to each recipe until it contains all the steps required for the data that comes from each specific source.
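One way to picture this incremental approach is a recipe stored as an ordered list of small rule functions per source, with new rules appended as new problems are found. The sketch below is plain Python and is not Trifacta's recipe format or API; the source names, rule names, and helper functions are all hypothetical.

```python
def strip_whitespace(record):
    """Rule: trim leading and trailing whitespace from every string field."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def uppercase_school_id(record):
    """Rule: normalise the (hypothetical) linking key to upper case."""
    out = dict(record)
    if isinstance(out.get("school_id"), str):
        out["school_id"] = out["school_id"].upper()
    return out


# One recipe (ordered list of rules) per source; source names are made up.
RECIPES = {
    "source_a": [strip_whitespace],
    "source_b": [strip_whitespace, uppercase_school_id],
}


def add_rule(source, rule):
    """Append a rule to a source's recipe, creating the recipe if needed."""
    RECIPES.setdefault(source, []).append(rule)


def apply_recipe(source, records):
    """Run every rule of the source's recipe, in order, over each record."""
    cleaned = []
    for record in records:
        for rule in RECIPES.get(source, []):
            record = rule(record)
        cleaned.append(record)
    return cleaned


cleaned_b = apply_recipe("source_b", [{"school_id": " ab12 ", "enrolment": 30}])
```

Calling `add_rule("source_a", uppercase_school_id)` later extends that source's recipe without touching the others, which mirrors building each recipe up step by step.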
About the recruiter
Member since May 20, 2018
Sunil Raj
from Maramures, Romania

Skills & Expertise Required

R 

Open for hiring

Apply before Dec 12, 2024

Work from Anywhere

40 hrs / week

Hourly Type

Remote Job

$26.68

Cost

Offer to work on this project closes in 89 days!

Similar Projects

Interview Preparation help for Python, R and Go Language needed

I need to work on interview preparation questions on R, Python and Go language.

Need a statistician for a new clinical trial proposal

Looking for a statistician (preferably with prior experience in the medical/scientific research field) to work on creating data charts and graphs using employer-provided data. Clinical trial proposal due in 10 days.

Convert R functions to Python

Convert approximately 1,200 lines of R function(s) to Python.
Urgent requirement.

Analyze Google BigQuery data using R and GCE cluster

We are looking for an expert in Google Cloud Platform and R to advise and build a framework for analyzing large public BigQuery data tables (and storing the results in GCP). The goal will be to set up a reusable GCP environment that will scale flexib...

Webscrape Korean Trade Data (Data processing and mining)

This project requires an outstanding understanding of web-scraping techniques and tools. A candidate with some knowledge of Korean is preferred (not required).

This should be a fairly quick project. I am happy to discuss more details and com...