Remote Web Development Job In IT And Programming

Web scraping English conversations


I need someone to web-scrape some English conversations.

I'm building a chatbot for learning English and want someone to scrape a batch of English conversations from various websites to use as training material.
I'm looking for short, simple conversations.

There are two parts to the task:
- some googling to find basic, relevant conversations
- writing scraping code for the different sites


Most of these sites are fairly simple plain-text pages. Here are some example sites, but there are hundreds of resources.
I'd like the scraped results in TSV or CSV format with these columns:

convoId | line | url | topic | who | text

convoId - an ID for each conversation so the rows can be sorted later
line - a simple incrementing count for each line within that conversation
url - the page it came from, for attribution later
topic - please try to get a topic from the page; if this is a LOT more work, it may not be needed
who - the speaker; the conversations are usually role-plays in the form "A: xxx", "B: reply"
text - the scraped line of text
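A minimal Python sketch of turning one scraped dialog into rows with the columns above, assuming each turn sits on its own line as "Speaker: text". The function names, the speaker-label regex, and the sample URL are illustrative assumptions rather than part of the brief; real pages will likely need per-site tweaks.

```python
import csv
import io
import re

# Assumption: a turn looks like "A: Hello!" or "Tom: Hi there.", i.e. a short
# speaker label (letters, spaces, dots) followed by a colon and the spoken text.
SPEAKER_RE = re.compile(r"^\s*([A-Za-z][A-Za-z .]{0,20}?)\s*:\s*(.+?)\s*$")

FIELDS = ["convoId", "line", "url", "topic", "who", "text"]


def parse_conversation(raw_text, convo_id, url, topic=""):
    """Turn one scraped dialog into rows matching the requested columns."""
    rows = []
    for raw_line in raw_text.splitlines():
        match = SPEAKER_RE.match(raw_line)
        if not match:
            continue  # skip headings, blank lines, and other non-dialog text
        who, text = match.groups()
        rows.append({
            "convoId": convo_id,
            "line": len(rows) + 1,  # 1-based counter within this conversation
            "url": url,
            "topic": topic,
            "who": who,
            "text": text,
        })
    return rows


def write_tsv(rows, out):
    """Write rows as tab-separated values with a header line."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample page text and URL, for illustration only.
    sample = "Greetings\nA: Hi, how are you?\nB: I'm fine, thanks.\n"
    rows = parse_conversation(sample, 1, "https://example.com/d1", "Greetings")
    buf = io.StringIO()
    write_tsv(rows, buf)
    print(buf.getvalue())
```

Passing `delimiter="\t"` to `csv.DictWriter` produces TSV; dropping that argument produces CSV with the same columns.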

You can use Node.js or Python.

Let me know what experience you have with scraping, although this should not be a challenging scraping task: most of these are amateur sites with no logins or other blockers.
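Since most target pages are plain HTML with no logins, a standard-library sketch may be enough for a first pass: fetch a page, take its `<title>` as a candidate topic, and collect the text of block-level elements so lines like "A: ..." come out one per entry. The tag sets, the `User-Agent` string, and the helper names are assumptions; Scrapy or Beautiful Soup would be natural replacements once the site list grows.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class DialogPageParser(HTMLParser):
    """Collect the page <title> and the visible text of block-level elements."""

    # Assumption: dialog turns are separated by these tags (including <br>).
    BLOCK_TAGS = {"p", "li", "div", "td", "br", "h1", "h2", "h3"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.lines = []
        self._buffer = []
        self._in_title = False
        self._skip_depth = 0

    def _flush(self):
        # Collapse whitespace and keep the accumulated text as one line.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.lines.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in self.BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style contents
        if self._in_title:
            self.title += data
        else:
            self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()  # emit any trailing text


def extract_page(html):
    """Return (title, text_lines) for an HTML string."""
    parser = DialogPageParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title.split()), parser.lines


def fetch_page(url, timeout=10):
    """Download a page and decode it using the charset the server declares."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (conversation-scraper sketch)"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    html = "<title>At the Cafe</title><p>A: Can I get a coffee?<br>B: Sure!</p>"
    print(extract_page(html))
```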

If you're trying to improve your English, this might also be an interesting project!

If you're into machine learning: I've also looked at the various online corpora for dialog training but haven't found anything great yet.
Those datasets don't work for basic language-learning conversations.

I'd like to start with a small sample task and then run this as an ongoing project with some regular work each month as we refine the idea. There will also be ongoing cleanup of the dataset for training.
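One possible shape for the recurring cleanup, sketched in Python under stated assumptions: rows are dicts keyed by the column names listed earlier, "short and simple" is approximated by hypothetical line and word limits (`max_lines`, `max_words`), and two conversations count as duplicates when their turns match after lowercasing and whitespace normalization.

```python
from collections import defaultdict


def clean_rows(rows, max_lines=12, max_words=20):
    """Drop empty turns, long conversations, and duplicates; renumber lines.

    max_lines and max_words are illustrative thresholds, not agreed values.
    """
    convos = defaultdict(list)
    for row in rows:
        text = " ".join(str(row["text"]).split())  # collapse stray whitespace
        if text:
            convos[row["convoId"]].append({**row, "text": text})

    seen = set()
    cleaned = []
    for lines in convos.values():
        lines.sort(key=lambda r: int(r["line"]))  # works for ints or numeric strings
        if len(lines) > max_lines:
            continue  # keep only short conversations
        if any(len(r["text"].split()) > max_words for r in lines):
            continue  # keep only short, simple turns
        # Fingerprint ignores case and speaker labels, so the same dialog
        # scraped from two sites is treated as a duplicate.
        key = tuple(r["text"].lower() for r in lines)
        if key in seen:
            continue
        seen.add(key)
        for i, r in enumerate(lines, start=1):
            cleaned.append({**r, "line": i})
    return cleaned
```

Keeping the first occurrence of a duplicate preserves its `url` for attribution; a different rule (e.g. preferring a specific site) would only change which copy survives.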

Reply with some information on the kinds of scraping tasks you've done before and how many sites you think you can cover for the initial budget I've proposed.
About the recruiter
Member since May 20, 2018
Naresh Yadav
from New York, United States

Skills & Expertise Required

Web Scraping, Node.js, Scrapy, Beautiful Soup, Python

Open for hiring. Apply before Nov 22, 2024.

Work from Anywhere
40 hrs / week
Type: Hourly
Remote Job
Cost: $13.34

Offer to work on this project closes in 90 days!

Similar Projects

Looking for a developer who's worked with the YouTube API!

I need a YouTube API key. I want to find a developer who's worked with the YouTube API before and who can sell me a YouTube API key with a usage quota of 1 million per day (which used to be the base, before they lowered it) or higher. If you have 50 million u...

Need to create a machine learning model for CT-scanned images of brain tumors

First, you must process the CT-scanned image, take tumor measurements as machine learning features, and save them to a CSV; then apply the k-means algorithm to assign each scan to one cluster.

Librosa freelancer

We have some Python librosa scripts that need to be tuned; we're looking for a freelancer with librosa experience to help with this.

Need help with EC2 instance + firewall. Tutorial/Lesson wanted

Hello!

I'm running a side project and need some quick help. I'm learning about the world of AWS and currently have my EC2 instance set up to run a Python script daily via a cron job.

However, my Python script won't connect to an...

Algorithm Developer/ Mathematician

The ideal candidate will have an active interest in applying math/statistics/physics/engineering concepts to solve multi-disciplinary problems. The candidate should be familiar with improving/optimizing/tuning existing algorithms as well as developme...