Remote Web Development Job In IT And Programming

WhatsApp Web Scrape Dynamic/Updating/Changing Page: Download & sync messages + media per chat/group


We need a script, executable, or process that automates browsing to WhatsApp Web, opens selected chats & groups, and syncs down the full message history from the live/changing page, in sequential order and data-consistent, into a saved format (database / structured HTML data files), with media files downloaded to a related file storage directory.
We need the source code of the working project: a clean, readable, well-designed code base, properly indented & spaced, following good coding practices and using up-to-date libraries & dependencies.

Target Browser: Google Chrome. Optional: Also other/cross browser.

Possible technologies:
- Selenium WebDriver with Java, Python, Node.js JavaScript etc.
- Chrome Extension using HTML/CSS/JavaScript (lacking database/file-system facilities)
- Tampermonkey Userscript using JavaScript (lacking database/file-system facilities)
- Other - Any appropriate technology stack you might recommend

Data Wanted: Any & all available data that can be obtained out of the webpage.
Some example data points: group/chat name; message date & time; all author information (phone number and/or name, when available); full message content including emojis, Unicode content & links; all media (audio, image, video, documents etc.); in-reply-to message references; ordinal position relative to the previous and next message; author/group/status & quoted content of the referenced message; whether the message is deleted; whether the message is forwarded (and forwarding details); and ANY other data obtainable from the webpage.
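To make the data points above concrete, here is a minimal sketch of one possible record shape in Python. The class and field names (`MessageRecord`, `MediaRef`, `sent_at` and so on) are illustrative assumptions, not a required schema; the final structure would depend on what the page actually exposes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MediaRef:
    """One attachment; all names here are hypothetical."""
    kind: str                         # e.g. "audio", "image", "video", "document"
    local_path: Optional[str] = None  # filled in once the file has been saved


@dataclass
class MessageRecord:
    """One scraped message, mirroring the data points listed above."""
    chat_name: str
    ordinal: int                      # position within the chat history
    sent_at: str                      # date & time as shown on the page
    author_name: Optional[str]
    author_phone: Optional[str]
    text: str                         # full content, emojis and links included
    reply_to_ordinal: Optional[int] = None
    quoted_text: Optional[str] = None
    is_deleted: bool = False
    is_forwarded: bool = False
    media: List[MediaRef] = field(default_factory=list)

    def to_json(self) -> str:
        # ensure_ascii=False keeps emojis and other Unicode readable on disk.
        return json.dumps(asdict(self), ensure_ascii=False)
```

A record built this way can be written to a database row or a JSON/HTML data file without losing emoji or other Unicode content.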

Some requirements descriptions and known complications/challenges:
- The script processing the webpage must communicate with the already persisted database & file store, in order to identify messages and determine what has already been downloaded and is up to date, and what still needs to be fetched from the webpage.
- Data consistency; uniquely identifying messages correctly (perhaps by hashing a digest based on message group + date & time + message content + ordinal position relative to previous/next messages + other identifying factors etc.)
- Progressively & continuously syncing history from the WhatsApp Web interface down to the database/file store, keeping track of where the process currently stands, which messages are already synced, which need updating/re-downloading etc.
- Indexing & searching/seeking into message history
- Will probably need a lot of deferred & asynchronous processing: waiting for media to download into the browser, detecting timeouts/failures, retrying, and saving state about what needs to be retried in future processing
- Possibly using the group info/chat info Media panel to access & load media files
- Apparently the only way to get full message text with emojis etc. in whole is by selecting the full text content and copying to clipboard.
- Dealing with long messages whose content is collapsed behind "Read more"
- Scrolling up/down the message history and waiting for messages to load in infinite-scroll batches.
- Downloading media, waiting for media to load, retry failures, store in organized folder structure with relations saved to database/HTML data files.
- How to download video clips
- Other complications that might be discovered in the effort to sync down WhatsApp per-chat history
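
As a rough illustration of the identification, sync-tracking and retry requirements above, the following Python sketch hashes each message together with the digest of its predecessor and records it in SQLite. Everything here is an assumption for illustration: the function names, the table layout, and the `(sent_at, author, text, has_media)` tuple shape are hypothetical, and the chaining only works if batches are processed oldest to newest from a known starting digest. Backfilling older history would need a different scheme.

```python
import hashlib
import sqlite3
from typing import Iterable, List, Tuple

# Assumed shape of one scraped message: (sent_at, author, text, has_media).
Scraped = Tuple[str, str, str, bool]


def message_digest(chat: str, msg: Scraped, prev_digest: str) -> str:
    """Hash the message with its predecessor's digest, so repeated identical
    messages in the same minute still get distinct identifiers."""
    sent_at, author, text, _ = msg
    h = hashlib.sha256()
    for part in (chat, sent_at, author, text, prev_digest):
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(4, "big"))  # length prefix avoids ambiguous joins
        h.update(data)
    return h.hexdigest()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               digest TEXT PRIMARY KEY,
               chat TEXT NOT NULL,
               sent_at TEXT NOT NULL,
               author TEXT NOT NULL,
               text TEXT NOT NULL,
               media_status TEXT NOT NULL DEFAULT 'none',
               retries INTEGER NOT NULL DEFAULT 0)"""
    )
    return conn


def sync_batch(conn: sqlite3.Connection, chat: str, batch: Iterable[Scraped],
               prev_digest: str = "") -> Tuple[int, str]:
    """Store messages not seen before; return (new_count, last_digest)."""
    new_count = 0
    for msg in batch:
        digest = message_digest(chat, msg, prev_digest)
        seen = conn.execute(
            "SELECT 1 FROM messages WHERE digest = ?", (digest,)
        ).fetchone()
        if seen is None:
            sent_at, author, text, has_media = msg
            conn.execute(
                "INSERT INTO messages (digest, chat, sent_at, author, text, media_status)"
                " VALUES (?, ?, ?, ?, ?, ?)",
                (digest, chat, sent_at, author, text,
                 "pending" if has_media else "none"),
            )
            new_count += 1
        prev_digest = digest
    conn.commit()
    return new_count, prev_digest


def pending_media(conn: sqlite3.Connection, max_retries: int = 3) -> List[str]:
    """Digests whose media still needs downloading and may be retried."""
    rows = conn.execute(
        "SELECT digest FROM messages"
        " WHERE media_status = 'pending' AND retries < ? ORDER BY rowid",
        (max_retries,),
    )
    return [row[0] for row in rows]


def mark_media_result(conn: sqlite3.Connection, digest: str, ok: bool) -> None:
    """Record a download outcome so later runs know what to retry."""
    if ok:
        conn.execute(
            "UPDATE messages SET media_status = 'done' WHERE digest = ?", (digest,)
        )
    else:
        conn.execute(
            "UPDATE messages SET retries = retries + 1 WHERE digest = ?", (digest,)
        )
    conn.commit()
```

Re-running `sync_batch` on the same batch with the same starting digest inserts nothing, which is the property the sync loop would rely on to resume after an interruption. The browser-side work (scrolling, expanding "Read more", triggering media downloads) is deliberately left out of this sketch.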

About the recruiter
Member since May 20, 2018
Jaswir Stela
from Panevezhio, Lithuania

Open for hiring. Apply before: Oct 19, 2024

Work from Anywhere

40 hrs / week

Fixed Type

Remote Job

Cost: $478.93

Offer to work on this project closes in 105 days!
