Basic web scraper for [login to view URL]
$250-750 USD
Paid on delivery
If you have previous experience developing a web scraper script on [login to view URL], I would like to hear from you. Please state in your reply whether you have worked with [login to view URL] before. I will only consider offers from developers who have previous experience developing against the Apify API and platform.
I need a basic web scraper that can scrape all pages on a WordPress website. It needs to find an element with a specific class or id and get the text content from that element. If that element is not found, then grab the content of the body.
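The lookup order described above could be sketched roughly as follows. This is only an illustration: buildSelectors and extractText are hypothetical names, the lookup callback stands in for whatever DOM query the chosen scraping library provides, and treating a leading "#" or "." as an explicit id or class is an assumption rather than part of the spec. It also assumes the identifier is a simple name that needs no CSS escaping.

```typescript
// Hypothetical helper: turns the identifier into CSS selectors, most specific first.
// An empty identifier means no specific element was requested, so only "body" is used.
function buildSelectors(contentIdentifier: string): string[] {
  const name = contentIdentifier.trim();
  if (name === "") {
    return ["body"];
  }
  // Assumption: a leading "#" or "." means the caller already chose id or class.
  if (name.startsWith("#") || name.startsWith(".")) {
    return [name, "body"];
  }
  // Otherwise try the name as an id, then as a class, then fall back to the body.
  return [`#${name}`, `.${name}`, "body"];
}

// Hypothetical helper: returns the trimmed text of the first selector that matches.
// `lookup` returns the text content for a selector, or null when nothing matches.
function extractText(
  selectors: string[],
  lookup: (selector: string) => string | null,
): string {
  for (const selector of selectors) {
    const text = lookup(selector);
    if (text !== null && text.trim() !== "") {
      return text.trim();
    }
  }
  return "";
}
```

Keeping the DOM access behind a callback means the fallback logic can be checked without loading a real page.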
As it will be used as part of an automated process, it must be operated entirely from the CLI.
Additional requirements:
- Scrape the text content of PDF files.
- Report progress to a webhook at intervals.
- Post the content in batches to a webhook.
- Must be written in JavaScript or TypeScript.
Specifications
===========
It takes 5 input parameters:
- starturl [required]
- contentidentifier (default: “”, type:string)
- maxpages (default: 10, zero = all pages, type:string)
- issitemap (default: false, type:string)
- batchsize (default: 25, type:integer)
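One possible way to read these five parameters and apply the listed defaults is sketched below. The ScraperInput interface and parseInput function are hypothetical names. Because maxpages and issitemap are declared as strings, the sketch converts them to a number and a boolean once at the start, and the validation rules (non-negative integers, batchsize of at least 1) are assumptions.

```typescript
// Hypothetical normalized form of the five input parameters.
interface ScraperInput {
  starturl: string;
  contentidentifier: string;
  maxpages: number; // 0 means "all pages"
  issitemap: boolean;
  batchsize: number;
}

// Hypothetical helper: applies the documented defaults and rejects unusable values.
function parseInput(raw: Record<string, unknown>): ScraperInput {
  const starturl = String(raw.starturl ?? "").trim();
  if (starturl === "") {
    throw new Error("starturl is required");
  }
  new URL(starturl); // throws a TypeError if the URL is malformed

  const maxpages = Number(raw.maxpages ?? 10);
  if (!Number.isInteger(maxpages) || maxpages < 0) {
    throw new Error("maxpages must be a non-negative integer");
  }

  const batchsize = Number(raw.batchsize ?? 25);
  if (!Number.isInteger(batchsize) || batchsize < 1) {
    throw new Error("batchsize must be a positive integer");
  }

  return {
    starturl,
    contentidentifier: String(raw.contentidentifier ?? ""),
    maxpages,
    // issitemap arrives as a string, so only the literal "true" enables sitemap mode.
    issitemap: String(raw.issitemap ?? "false").toLowerCase() === "true",
    batchsize,
  };
}
```

For example, parseInput({ starturl: "https://example.com" }) would fall back to maxpages 10, issitemap false and batchsize 25.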
Crawling behavior
=============
- If the “issitemap” param is set to "true", only scrape the links on the sitemap. Otherwise, follow all links that point to the same domain as the starturl.
- Respect [login to view URL]
- If maxpages has been set to 0 we must crawl all pages. Otherwise only scrape the number of pages set by the maxpages parameter.
- I need the content of the element specified by the “contentidentifier” parameter. For example, if “ContentArea” is specified, get the text inside the div/span that has that id or class.
Suggested value format:
“[login to view URL]” e.g. “[login to view URL]”
“[login to view URL]” e.g. “[login to view URL]”
- Clean up the output and strip all HTML
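The same-domain rule and the maxpages limit from the list above might look something like this. All three function names are hypothetical, and the sketch makes two assumptions: "same domain" is taken to mean an exact hostname match (so www.example.com and example.com would count as different sites), and URL fragments are dropped when de-duplicating. Sitemap handling and HTML clean-up are not covered here.

```typescript
// Hypothetical helper: true when `link` resolves to an http(s) URL on the start URL's host.
function isSameDomain(startUrl: string, link: string): boolean {
  try {
    const start = new URL(startUrl);
    const target = new URL(link, start); // resolves relative links against the start URL
    const isWeb = target.protocol === "http:" || target.protocol === "https:";
    return isWeb && target.hostname === start.hostname;
  } catch {
    return false; // unparsable links are skipped
  }
}

// Hypothetical helper: maxpages of 0 means no limit, otherwise stop at the given count.
function limitReached(scrapedCount: number, maxpages: number): boolean {
  return maxpages > 0 && scrapedCount >= maxpages;
}

// Hypothetical helper: keeps only new same-domain links and records them in `seen`.
function selectLinks(startUrl: string, links: string[], seen: Set<string>): string[] {
  const accepted: string[] = [];
  for (const link of links) {
    if (!isSameDomain(startUrl, link)) {
      continue;
    }
    const url = new URL(link, startUrl);
    url.hash = ""; // "#section" variants point at the same page
    if (seen.has(url.href)) {
      continue;
    }
    seen.add(url.href);
    accepted.push(url.href);
  }
  return accepted;
}
```

A crawl loop would call limitReached before fetching each page and selectLinks on the links found in it.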
Output format
============
I need the output as JSON:
[{
“url”: “<fully qualified url1 : [login to view URL]>”,
“content”: $scrapedContent1
},{
“url”: “<[login to view URL]>”,
“content”: $scrapedContent2
}]
Webhook: content
=================
A webhook needs to be called when a page has been scraped. Instead of calling the webhook every time a page has been scraped, the content must be sent in batches. The size of the batch is set by the “batchsize” input parameter.
Development of these webhooks is not part of this task.
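A minimal sketch of the batching described above, assuming the scraper hands over one page at a time. BatchSender and ScrapedPage are hypothetical names, and the send callback is a placeholder for the actual HTTP POST to the content webhook, which is left out because the webhook itself is outside this task.

```typescript
// Shape of one output item, following the JSON format in the spec.
interface ScrapedPage {
  url: string;
  content: string;
}

// Hypothetical helper: collects pages and hands them to `send` in groups of `batchSize`.
class BatchSender {
  private pending: ScrapedPage[] = [];
  private readonly batchSize: number;
  private readonly send: (batch: ScrapedPage[]) => Promise<void>;

  constructor(batchSize: number, send: (batch: ScrapedPage[]) => Promise<void>) {
    this.batchSize = batchSize;
    this.send = send;
  }

  // Queue one page; a full batch is sent immediately.
  async add(page: ScrapedPage): Promise<void> {
    this.pending.push(page);
    if (this.pending.length >= this.batchSize) {
      await this.flush();
    }
  }

  // Send whatever is queued; call this once at the end for the last partial batch.
  async flush(): Promise<void> {
    if (this.pending.length === 0) {
      return;
    }
    const batch = this.pending;
    this.pending = [];
    await this.send(batch);
  }
}
```

With batchsize 25 and 60 scraped pages, this would produce two full batches during the crawl and one batch of 10 from the final flush.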
Webhook: Progress report
====================
Progress reports containing statistics about the progress of the crawling process must be sent to a webhook.
This is my first scraper and I am not entirely sure about my options in this regard, but the wish list of info I would like to receive is:
- Number of pages indexed / Total number of pages found.
- Event: CRAWLER_RUN_STARTED
- Event: CRAWLER_RUN_SUCCEEDED
- Event: CRAWLER_RUN_FAILED
- Event: CRAWLER_RUN_TIMED_OUT
- Event: CRAWLER_RUN_ABORTED
- The total cost of the task when the crawl is over.
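The wish list above could translate into a payload along these lines. The event names are taken from the list; everything else (the ProgressReport shape, its field names, and the buildProgressReport helper) is a hypothetical suggestion, and how the total cost would be obtained is left open.

```typescript
// Event names as listed in the wish list.
type CrawlerEvent =
  | "CRAWLER_RUN_STARTED"
  | "CRAWLER_RUN_SUCCEEDED"
  | "CRAWLER_RUN_FAILED"
  | "CRAWLER_RUN_TIMED_OUT"
  | "CRAWLER_RUN_ABORTED";

// Hypothetical payload: field names are illustrative, not a fixed contract.
interface ProgressReport {
  pagesIndexed: number;
  pagesFound: number;
  event?: CrawlerEvent; // absent on plain interval reports
  totalCost?: number; // only known once the crawl is over
}

// Hypothetical helper: builds one report, leaving out fields that do not apply.
function buildProgressReport(
  pagesIndexed: number,
  pagesFound: number,
  event?: CrawlerEvent,
  totalCost?: number,
): ProgressReport {
  const report: ProgressReport = { pagesIndexed, pagesFound };
  if (event !== undefined) {
    report.event = event;
  }
  if (totalCost !== undefined) {
    report.totalCost = totalCost;
  }
  return report;
}
```

An interval report would then carry only the two counters, while the final report adds the closing event and, if available, the cost.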
Please note that this is my first experience with running an actor and with Apify, and I will be happy to hear any suggestions you might have.
Best regards
Tony
Project ID: #37124002
About the project
66 freelancers are bidding an average of $474 for this job
Hi, good evening. How are you? I just saw your job posting. I see you have been looking for someone experienced with these technologies: TypeScript, web scraping and JavaScript. I believe this is something I can help…
I understand you need a basic web scraper that can scrape all pages on a WordPress website. It needs to find an element with a specific class or id and get the text content from that element. If that element is not fou…
Hello there! My name is Abhishek and I am a Full Stack Developer with over 12 years of experience in the tech industry. I specialize in the MEAN/MERN/LAMP (Laravel, Codeigniter, CakePHP) tech stack and have worked on…
Hi there, I am the best here! Please check out my profile and see what others have to say about the work I've done related to the skills you're looking for. Hope to work together soon. Thanks!
Hi there! I am Md Ashrak, a highly skilled and experienced data entry, data collection, web scraping, Python scripts, lead generation, human translation, and WordPress specialist with over 10 years of experience. I und…
Hi tony, I can make web scraper for apify.com. I hope you will give me a chance to work on this project. Please initiate a message for further discussion.
Hello, I am Waqas, a web developer with 3 years of experience in the field. I understand you are looking for a web scraper that can scrape all pages on a WordPress website and get the text content from that element. If…
Hello, my name is Murad, and I am a qualified JavaScript developer with three years of freelance marketplace experience. I understand you need a basic web scraper that can scrape all pages on a WordPress website. It ne…
Hello there, my name is Manpreet and I'm a web developer with extensive experience in the field of software development. I noticed you are looking for a basic web scraper that can scrape all pages on a WordPress websit…
★★★ Hi Tony F.★★★ Going through your description, it seems like you might be looking for a senior web developer for your project - Basic web scraper for apify.com. As I have worked on similar projects previously, I am…
"Ihor K was very cooperative, listened to my feedback & successfully finished the task I gave him. I will definitely hire him again for any new projects I will have." Dear Tony F. I'm thrilled to submit my applicati…
Dear Client, I am a full-stack developer with 6+ years of experience in developing web applications. I have a strong understanding of Apify and have developed several web scrapers using the platform. I am confident tha…
Hello there! My name is Mst Amrin Nahar and I am a JavaScript expert, freelancer with over 10 years of experience. I understand that you are looking for a basic web scraper that can scrape all pages on a WordPress webs…