Remote Data Mining And Management Job In Data Science And Analytics

Filter price from order book level 2 summary data


Data:
- The raw data consists of level 2 order book snapshots and incremental updates at tick frequency (each update is a new observation) for a single asset.
- To simplify the task and limit the scope, you will work with a summarized dataset in the form of CSV files with the following columns: timestamp, bidPrice_x1, bidPrice_x2, ..., askPrice_x1, askPrice_x2, ..., where bidPrice_x1 is the average price at which a market sell order of size x1 would be executed if it arrived at that instant (askPrice_x1 is the analogous price for a market buy order of size x1).
- The scope of this task is limited to the summarized dataset. If you believe you could do much better by calculating different features from the raw order book data, we can discuss that as a separate job.
- Expect 10M-100M rows and 10-20 columns, possibly subsampled.
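A minimal Python sketch (R is preferred, but Python is the allowed alternative) of turning the summarized snapshots into basic features. It assumes the header names start with `bidPrice_`/`askPrice_`, that sizes increase from left to right on both sides, and that the smallest size x1 is small enough for bidPrice_x1/askPrice_x1 to stand in for the best bid and ask; the function names are hypothetical. The "depth mid" deviation is one simple imbalance proxy, not the only possible one.

```python
import numpy as np


def split_book_columns(header):
    """Indices of bid and ask price columns, kept in header order.

    Assumes (hypothetically) names start with "bidPrice_" / "askPrice_" and
    that sizes increase from left to right on both sides (x1 < x2 < ...).
    """
    bid_idx = [i for i, name in enumerate(header) if name.startswith("bidPrice_")]
    ask_idx = [i for i, name in enumerate(header) if name.startswith("askPrice_")]
    if not bid_idx or len(bid_idx) != len(ask_idx):
        raise ValueError("expected matching, non-empty bid and ask size levels")
    return bid_idx, ask_idx


def book_features(header, rows):
    """Return (mid, imbalance) for a 2-D float array of snapshots.

    mid: midpoint of the smallest-size bid and ask prices (best bid/ask proxy).
    imbalance[:, k-1]: size-x_k mid minus mid, for k = 2, 3, ...
    """
    bid_idx, ask_idx = split_book_columns(header)
    bids = rows[:, bid_idx]
    asks = rows[:, ask_idx]
    mid = 0.5 * (bids[:, 0] + asks[:, 0])
    # If the ask side is thinner than the bid side, ask execution prices rise
    # faster with size than bid prices fall, so the size-x mid sits above mid.
    depth_mid = 0.5 * (bids + asks)
    imbalance = depth_mid[:, 1:] - mid[:, None]
    return mid, imbalance
```

For example, a row whose bid price drops from 99 to 98 between the two sizes while the ask price only rises from 101 to 101.5 gives a mid of 100 and a size-x2 deviation of -0.25, pointing to a thinner bid side.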

Goal:
For each row, output a summary price P_t such that P_t = E[ (best bid at t + dt + best ask at t + dt) / 2 | data available at t ], i.e. the expected mid price at t + dt given the data available at t. The horizon dt is on the scale of 1-10 minutes (exact value tbd).
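To fit a model to the expectation above, you first need its target: the mid price in effect at t + dt. Because ticks arrive irregularly, the shift has to be by time, not by row count. A NumPy sketch, assuming sorted timestamps in seconds and treating the book at t + dt as the last update at or before t + dt; `future_mid` is a hypothetical helper name.

```python
import numpy as np


def future_mid(timestamps, mid, dt):
    """Mid price in effect at t + dt for every row (NaN past the end of the data).

    timestamps: sorted 1-D array (assumed seconds); ticks are irregular, so the
    book state at t + dt is the last update at or before t + dt.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    mid = np.asarray(mid, dtype=float)
    # Index of the last observation with timestamp <= t + dt (always >= own row).
    j = np.searchsorted(timestamps, timestamps + dt, side="right") - 1
    out = mid[j]  # fancy indexing returns a copy
    # Rows whose horizon runs past the last observation have no known target.
    out[timestamps + dt > timestamps[-1]] = np.nan
    return out
```

With dt = 300 (5 minutes), `future_mid(ts, mid, 300) - mid` gives the mid change a model would be trained to predict; the NaN rows at the end of the file should be dropped from training.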

For example, the simplest summary price would just be the mid price between the best bid and best ask, but it misses the information in the order book imbalance (if there is more volume on the bid than on the ask, the price will on average go up) and in the momentum/mean-reversion dynamics of the time series. You need to take the shape of the order book and the time series into account in some basic fashion. The goal is not to outperform the market with this prediction, but to reasonably capture about 80% of the information content of the L2 order book dynamics that is essentially common knowledge among market participants. Obviously, you can only use past data for prediction.
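One minimal way to combine the order book shape with the time-series dynamics is a single OLS regression of the mid change over the horizon, y_t = mid_{t+dt} - mid_t, on imbalance features and a past return (a positive coefficient on the past return indicates momentum, a negative one mean reversion). The sketch assumes the features are already in a 2-D NumPy array `X`; the helper names are hypothetical. To respect the past-data-only constraint, estimate `beta` on an earlier period and apply it to later rows.

```python
import numpy as np


def past_return(timestamps, mid, lookback):
    """mid_t - mid_(t - lookback), using only observations at or before t.

    Rows without `lookback` seconds of history get NaN.
    """
    timestamps = np.asarray(timestamps, dtype=float)
    mid = np.asarray(mid, dtype=float)
    # Index of the last update at or before t - lookback (-1 if none exists).
    j = np.searchsorted(timestamps, timestamps - lookback, side="right") - 1
    out = np.full(mid.shape, np.nan)
    ok = j >= 0
    out[ok] = mid[ok] - mid[j[ok]]
    return out


def fit_ols(X, y):
    """OLS with an intercept, skipping rows with any missing value."""
    A = np.column_stack([np.ones(len(X)), X])
    ok = np.isfinite(A).all(axis=1) & np.isfinite(y)
    beta, *_ = np.linalg.lstsq(A[ok], y[ok], rcond=None)
    return beta


def summary_price(mid, X, beta):
    """P_t = mid_t + predicted mid change; falls back to mid_t if X is missing."""
    A = np.column_stack([np.ones(len(X)), X])
    pred = A @ beta
    return np.where(np.isfinite(pred), mid + pred, mid)
```

Even with tens of millions of rows this is one `lstsq` call on a design matrix with a handful of columns (or the small normal equations accumulated in chunks if memory is tight). Overlapping horizons make consecutive targets strongly correlated, so the usual OLS standard errors should not be trusted, even though the point estimates remain usable.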

Deliverable:
You should deliver a script that reads the data and outputs the summary price for each input row, and explain to me how it works. You can use R (preferred) or Python on a single server; no cluster solutions. Please stick to the simplest and fastest algorithms, essentially linear models only, and discuss with me before going for anything more complicated than OLS or a Kalman filter.
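The posting allows OLS or a Kalman filter. If the coefficients should adapt over time, one option within that range is recursive least squares (RLS) with exponential forgetting, a swapped-in technique closely related to a Kalman filter on the regression coefficients. Below is a Python sketch; the class and function names, `lam` (forgetting factor) and `delta` (prior scale) are hypothetical choices, and `X`, `y` are assumed to be NumPy arrays of features and mid changes over the horizon. The detail that matters most is causality: the target for row t is only known at t + dt, so that row's update is delayed until then.

```python
import numpy as np


class RecursiveLeastSquares:
    """Online linear regression with exponential forgetting (lam < 1).

    Closely related to a Kalman filter whose state is the coefficient
    vector; the forgetting factor plays the role of process noise.
    """

    def __init__(self, n_features, lam=0.9999, delta=1e3):
        self.lam = lam
        self.beta = np.zeros(n_features)
        # Large initial covariance = weak prior centered on zero coefficients.
        self.P = np.eye(n_features) * delta

    def predict(self, x):
        return float(x @ self.beta)

    def update(self, x, y):
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        self.beta = self.beta + gain * (y - x @ self.beta)
        self.P = (self.P - np.outer(gain, Px)) / self.lam


def online_summary_prices(timestamps, mid, X, y, dt, lam=0.9999):
    """P_t = mid_t + RLS forecast, updating only with targets already observed.

    y[j] = mid at (t_j + dt) minus mid at t_j is only known once t_j + dt has
    passed, so row j is fed to the model no earlier than that.
    """
    model = RecursiveLeastSquares(X.shape[1] + 1, lam=lam)
    out = np.empty(len(mid))
    pending = 0  # first row whose target has not been used yet
    for i in range(len(mid)):
        while pending < i and timestamps[pending] + dt <= timestamps[i]:
            x_j = np.r_[1.0, X[pending]]
            if np.isfinite(x_j).all() and np.isfinite(y[pending]):
                model.update(x_j, y[pending])
            pending += 1
        x_i = np.r_[1.0, X[i]]
        # Fall back to the plain mid when features are missing.
        pred = model.predict(x_i) if np.isfinite(x_i).all() else 0.0
        out[i] = mid[i] + pred
    return out
```

The explicit loop is written for clarity, not speed; at 10M-100M rows it would need subsampling, a compiled loop, or periodic batch OLS refits instead.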

I'll provide access to an RStudio Server for R; the Python environment is tbd.

About you:
You have experience working with level 2 order book and time series data, or at least a solid understanding of the relevant methods. You value simplicity and don't throw fancy machine learning at the problem just because it is cool and makes you look more sophisticated.

I would like to hire several people for this job for different assets and exchanges. Feel free to ask questions and discuss the task and conditions.
About the recruiter
Member since May 20, 2018
Abhishek Mishra
from Kayseri, Turkey

Skills & Expertise Required

Quantitative Analysis, R, Python

Open for hiring. Apply before: Oct 30, 2024

Work from Anywhere

40 hrs / week

Fixed Type

Remote Job

Cost: $477.61

Offer to work on this project closes in 90 days!


Similar Projects

Power BI expert required to assist with visualizing data from 3 different ERPs

Consolidate data from 3 ERP systems
Visualize data together
Segment data for better decision making ability

System architect for tech platform

We have a site and 3 mobile apps. Infrastructure is growing. Site is getting slower. Currently monolithic PHP...we need someone that gets current tech and can help us plan. We have programmers but we just need a roadmap.


I think this ap...

Reverse Engineer APIs of 1 website that can be wrapped into Lambda functions

We are building a data aggregation SDK that would allow external developers to call our APIs to programmatically fetch data from certain websites that requires login.

Essentially, we would like to create Lambda Functions that can generate p...

Data Scientist needed to create online courses

We are looking for Data Scientists who are willing to share their knowledge to our community by building courses on specific fields in Data Science.

Send us a message for more details!

API for importing contents of another system into our SQL database

We are looking for a developer that can help create a method for our clients to submit data to our database by sending the data via an API to our systems.

Environment:
We have a Microsoft Server 2012 environment with SQL Server 2012 (wit...