- multithreading/multiprocessing
- rotating IPs (I will provide proxies)
- working with virtual machines (Google Compute Engine)
- scraping with Selenium (requests or urllib are very welcome if you can handle JavaScript)
- working with databases (more than 200 GB expected by the end; speed is important)
The project is about scraping one website for page sources and inserting them into a database. Since there are a few million queries, I want to run it on a virtual machine, or even on several of them. The exception: if you can handle JavaScript with requests or urllib, then I can run it locally. Everything comes down to total run time and getting at least 90% of the results from the website. The website uses a third-party anti-scraping service, Distil Networks (https://www.distilnetworks.com/). You must be able to handle it!
I already have code from a previous developer, but it is not working as I expected. You can modify this code or write your own.
Write "apple" on the first line when you apply so I know you read the full post. Replies without this word will be ignored.
I need all buildings and their units from this page: https://streeteasy.com/buildings/nyc
There are four tabs for units: active listings, past sales, past rentals and all units. The "all units" tab doesn't list all units, so you must check every tab (each tab requires a click() with Selenium; otherwise the data will not be in the page source).
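The tab-by-tab collection described here could be sketched roughly as follows. This is only an illustration under assumptions: the tab link texts are guessed from the tab names in this post, `is_loaded` is a hypothetical caller-supplied check, and `driver` is any object that behaves like a Selenium WebDriver (the string "link text" is the value of Selenium's `By.LINK_TEXT` locator).

```python
import time

# Assumption: the tabs are links whose visible text matches these labels.
TAB_LABELS = ("Active Listings", "Past Sales", "Past Rentals", "All Units")


def collect_tab_sources(driver, tab_labels=TAB_LABELS, is_loaded=None,
                        timeout=15.0, poll=0.5):
    """Click each tab in turn and return {label: page_source}.

    `is_loaded(html, label)` is a hypothetical check supplied by the caller;
    when it is None the page source is taken immediately after the click.
    """
    sources = {}
    for label in tab_labels:
        # "link text" is the string behind selenium's By.LINK_TEXT.
        driver.find_element("link text", label).click()
        deadline = time.monotonic() + timeout
        while True:
            html = driver.page_source
            if is_loaded is None or is_loaded(html, label):
                sources[label] = html
                break
            if time.monotonic() >= deadline:
                raise TimeoutError(f"tab {label!r} did not finish loading")
            time.sleep(poll)  # poll until the tab content shows up
    return sources
```

With a real browser, `driver` would be something like `selenium.webdriver.Chrome()` after `driver.get(building_url)`; the right locator and load check depend on the actual page markup.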
I need four scripts:
1) collect_building_urls
- The first script should collect all building URLs from the website and insert them into the database without duplicates.
2) collect_building_page_sources
- The second script should collect the page source for each building that is not already done and insert it into the database.
3) parse_unit_urls
- The third script should parse each building page source that is not already done, find all unit links and insert them into the database without duplicates.
4) collect_unit_page_sources
- The fourth script should collect the page source for each unit that is not already done and insert it into the database.
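The "without duplicates" and "not already done" requirements in the four scripts above could be handled at the database level. Below is a minimal sketch, assuming SQLite only for illustration (at 200 GB and with several workers, a server database would likely be a better fit); the table layout and function names are hypothetical. A units table would follow the same pattern.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS buildings (
    url         TEXT PRIMARY KEY,  -- primary key rejects duplicate URLs
    page_source TEXT               -- NULL until the page has been collected
);
"""


def open_db(path=":memory:"):
    """Open the database and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_urls(conn, urls):
    """Insert URLs, silently skipping the ones already stored."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO buildings (url) VALUES (?)",
            [(u,) for u in urls],
        )


def pending_urls(conn, limit=100):
    """Return URLs whose page source has not been collected yet."""
    rows = conn.execute(
        "SELECT url FROM buildings WHERE page_source IS NULL "
        "ORDER BY url LIMIT ?",
        (limit,),
    )
    return [row[0] for row in rows]


def save_source(conn, url, html):
    """Store the page source, which marks the URL as done."""
    with conn:
        conn.execute(
            "UPDATE buildings SET page_source = ? WHERE url = ?",
            (html, url),
        )
```

Because a script only asks for `pending_urls`, it can be stopped and restarted without redoing finished pages.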
If you are interested in this project, you'll need to create the four scripts, set them up on a virtual machine, test them, and make sure they can finish the job.
Finding page sources and URLs is very easy, just a few lines of code, but a lot of checking is needed to make sure the script is on the right page and all content has loaded.
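One way to picture that checking is a small validation step run on every page source before it is saved. The sketch below is an assumption-laden illustration: the marker strings are hypothetical and would have to be chosen by looking at real building pages and real block pages.

```python
def check_page(html, required=(), blocked=()):
    """Return (ok, reason) for a page source.

    `required` holds strings that must appear on a fully loaded page;
    `blocked` holds strings that only appear on block or captcha pages.
    Both lists are hypothetical and site-specific.
    """
    if not html or not html.strip():
        return False, "empty page source"
    lowered = html.lower()
    for marker in blocked:
        if marker.lower() in lowered:
            return False, f"block marker found: {marker}"
    for marker in required:
        if marker.lower() not in lowered:
            return False, f"missing marker: {marker}"
    if "</html>" not in lowered:
        return False, "document looks truncated"
    return True, "ok"
```

A page that fails the check would stay "not done" in the database and be retried later, and the reason string can go straight into the log.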
There must be a log for each thread/process.
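A possible shape for that logging, using only the Python standard library: every log line carries the process and thread name, so each worker's activity can be filtered out afterwards. The logger name, the `fetch` callable and the worker count are assumptions made for this sketch.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

# processName and threadName are standard LogRecord attributes.
LOG_FORMAT = "%(asctime)s %(processName)s %(threadName)s %(levelname)s %(message)s"


def build_logger(name="scraper", stream=None, logfile=None):
    """Create a logger writing to `logfile`, or to `stream` (stderr if None)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.FileHandler(logfile) if logfile else logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    logger.propagate = False
    return logger


def run_jobs(urls, fetch, logger, workers=4):
    """Run fetch(url) in a thread pool and log the outcome of every job."""
    def job(url):
        try:
            result = fetch(url)
            logger.info("done %s", url)
            return url, result
        except Exception:
            logger.exception("failed %s", url)
            return url, None

    # thread_name_prefix makes the thread names readable in the log
    with ThreadPoolExecutor(max_workers=workers, thread_name_prefix="worker") as pool:
        return list(pool.map(job, urls))
```

For example, `run_jobs(urls, fetch_page, build_logger(logfile="scraper.log"))` would write one line per URL, where `fetch_page` stands for whatever function actually downloads a page.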
I need this finished in the next few days.
I have already wasted a lot of time, so please apply only if you can finish this on time, and let me know your price for the project.
Conversation via Skype will be required, and screen sharing if needed.
Remote Data Mining And Management Job In Data Science And Analytics
Work from Anywhere
40 hrs / week
Hourly Type
Remote Job
$143.85 Cost