Application to run automated deep search of targeted domains
$750-1500 USD
Paid on delivery
OVERVIEW
Goal of application:
A Windows 7 through 10 native application that allows the user to run scheduled searches of targeted domains, including linked documents, for a series of keywords and phrases with a deeper search than is provided by Google Alerts, then publish any new results to an RSS feed or send to an email address. The application will have a simple user interface that can be used by anyone with average computer skills. The application will run in the background with an icon in the system tray for easy access.
I have detailed my anticipated approach, function and interface below. Other approaches are welcome, but submitter should justify their approach and demonstrate the functionality of their approach in both ease of use and ability to return search results.
Anticipated solution:
Use Scrapy to scrape the domains and pyPDF to search through linked PDFs.
USER INTERFACE
Enter Domains:
Ability to input at least 10,000 domains. This can be a simple text box with one domain per line.
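As a rough illustration of how the text box contents might be turned into a clean domain list, here is a minimal sketch. The function name parse_domain_list is hypothetical, and accepting full URLs as well as bare hostnames is an assumption, not a stated requirement.

```python
from urllib.parse import urlsplit


def parse_domain_list(raw_text):
    """Return de-duplicated hostnames, one per non-empty input line.

    Assumption: users may paste either "example.com" or a full URL.
    """
    seen = set()
    domains = []
    for line in raw_text.splitlines():
        entry = line.strip().lower()
        if not entry:
            continue  # skip blank lines
        # urlsplit only recognises a host when a scheme or "//" is present.
        if "://" not in entry:
            entry = "//" + entry
        host = urlsplit(entry).hostname
        if host and host not in seen:
            seen.add(host)
            domains.append(host)
    return domains
```

Input order is preserved and repeated entries are dropped, which should keep a pasted list of 10,000 lines manageable.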
Search Words & Phrases:
I want to be able to input as many as 30 individual keywords or phrases that will be searched for on these domains.
Activity Indicator:
I want a button to turn the scraper on/off, along with a dialogue that indicates that the scraper is turned on, turned off, or actively running.
Run-time Selector:
The ability to input a start time and an end time between which the scraper is allowed to run. The scraper will pause at the end time and resume where it left off at the start time the next day.
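One possible way to decide whether the scraper is allowed to run at a given moment is sketched below. The function name within_run_window is hypothetical, and support for windows that cross midnight (for example 22:00 to 06:00) is an assumption. Resuming where the scraper left off would also need a persisted queue of pending URLs, which is not shown here.

```python
from datetime import time


def within_run_window(now, start, end):
    """Return True if `now` falls in the half-open window [start, end)."""
    if start <= end:
        # Same-day window, e.g. 09:00 to 17:00.
        return start <= now < end
    # Window crosses midnight, e.g. 22:00 to 06:00 (assumed to be allowed).
    return now >= start or now < end
```

For example, within_run_window(time(23, 0), time(22, 0), time(6, 0)) is True, while the same window at noon gives False.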
Output selector:
The ability to select between publishing content as an RSS feed or distributing through a daily email, with the ability to enter an email address for distribution.
FUNCTIONALITY
The program will work as a native Windows 7+ application that can be installed, uninstalled, and operated by someone with average computer skills. If additional resources are required to run the application, they should be included in the installer. Once activated, the program will run in the background and sit in the system tray for easy access. The program will start and run in the background every time the computer boots.
There will be a user interface that does not have to be fancy, but must work as described above. There will be no command line operations required.
The scraper will run during the time window set in the user interface.
The scraper will scrape all web pages and documents published to all domains within the list, along with any linked PDFs. This data will then be searched through using the keywords and phrases and all new results since the previous run will be published to an RSS feed or emailed to an address that can be entered through the UI.
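The crawl step could be approached in many ways (Scrapy is the anticipated tool). Purely as a standard-library sketch of the idea, the code below pulls links out of a fetched page and separates in-domain pages to crawl next from linked PDFs to search. The names LinkCollector and classify_links are hypothetical. Keeping PDFs even when they are hosted on another domain is an assumption about what "any linked PDFs" means.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def classify_links(html, base_url, domain):
    """Return (pages, pdfs): in-domain pages to crawl and linked PDFs."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    pages, pdfs = [], []
    for url in collector.links:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue  # ignore mailto:, javascript:, and similar
        if parts.path.lower().endswith(".pdf"):
            pdfs.append(url)  # assumption: PDFs are kept regardless of host
        else:
            host = parts.hostname or ""
            if host == domain or host.endswith("." + domain):
                pages.append(url)
    return pages, pdfs
```

A real crawler would also need fetching, politeness delays, and a visited set. Those are left out to keep the sketch short.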
There will be a list of keywords and phrases that can be entered through the user interface. Keywords will work as single words, returning search results for pages and documents on which they are found. Key phrases will work as “and” functions, with any page or document that includes all of the words entered returning a search result. That is to say, key phrases should not look for exact word ordering. Any page that contains all of the words within the key phrase should be returned as a result.
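The matching rule described above can be sketched as a set comparison: a term matches when every one of its words appears somewhere in the document, in any order. The names tokenize and matching_terms are hypothetical. Case-insensitive, whole-word matching is an assumption, since the brief does not say how case or partial words should be handled.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of distinct words."""
    return set(re.findall(r"\w+", text.lower()))


def matching_terms(document_text, terms):
    """Return the keywords and key phrases that the document satisfies.

    A single keyword is a one-word term. A key phrase matches when all
    of its words are present, regardless of order or position.
    """
    words = tokenize(document_text)
    hits = []
    for term in terms:
        term_words = tokenize(term)
        if term_words and term_words <= words:  # subset test = logical AND
            hits.append(term)
    return hits
```

With the text "The council approved a new budget for road repair." and the terms "budget", "repair road", and "school budget", the first two match and the third does not, because "school" is missing.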
OUTPUT
All pages and documents that meet the keyword / key phrase selection criteria and that are new or changed since the last run should be returned. Search results need only be compared to the previous run, not to all previous results, to be considered new or changed.
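One way to detect "new or changed since the last run" is to keep a content fingerprint per URL from the previous run only and compare against it. The sketch below assumes that approach. The names fingerprint and new_or_changed are hypothetical, and hashing the extracted text with SHA-256 is one choice among several.

```python
import hashlib


def fingerprint(text):
    """Return a stable hash of a page's extracted text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def new_or_changed(current, previous):
    """Compare this run's matching pages with the previous run's snapshot.

    current:  dict of url -> extracted text for pages that matched this run
    previous: dict of url -> fingerprint saved by the previous run
    Returns (urls to report, snapshot to save for the next run).
    """
    results = []
    snapshot = {}
    for url, text in current.items():
        digest = fingerprint(text)
        snapshot[url] = digest
        # A missing URL is new; a different digest means the page changed.
        if previous.get(url) != digest:
            results.append(url)
    return results, snapshot
```

Because only the latest snapshot is kept, a page that matched two runs ago and matches again now would be reported as new, which is consistent with comparing against the previous run only.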
Whether outputting through RSS or email, the output should be similar to Google search results in that it shows the title of the page or document as a heading and link, and below should be an excerpt from the document showing the use of the keywords or phrases as a paragraph or span.
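To make the output format concrete, here is a minimal sketch that cuts an excerpt around the first occurrence of a search word and writes results as RSS 2.0 items with a title, link, and description. The names make_excerpt and build_rss, the radius parameter, and the result dictionary keys are all hypothetical. For a key phrase, the caller would pass one of its words, which is an assumption about how excerpts should be chosen.

```python
import re
import xml.etree.ElementTree as ET


def make_excerpt(text, term, radius=60):
    """Return the text surrounding the first case-insensitive hit of `term`."""
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        return text[: 2 * radius].strip()  # fall back to the opening text
    start = max(0, match.start() - radius)
    end = min(len(text), match.end() + radius)
    return text[start:end].strip()


def build_rss(feed_title, feed_link, results):
    """Serialise results (dicts with title, url, excerpt) as an RSS 2.0 string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "New or changed search results"
    for result in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = result["title"]
        ET.SubElement(item, "link").text = result["url"]
        ET.SubElement(item, "description").text = result["excerpt"]
    # ElementTree escapes characters such as & and < in element text.
    return ET.tostring(rss, encoding="unicode")
```

The same list of result dictionaries could be rendered into an email body instead, so both output options can share one data structure.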
Project ID: #12885079
About the project
18 freelancers are bidding an average of $1266 for this job
I am ready to get started right away.... Can we discuss the project details? My distinction, payment after your complete satisfaction with the resulted task.
Hello! We are a professional team of web developers with huge experience in using python for custom webapps based on django and odoo. We are available and will be happy to help you with the project. Looking forward fo...
What experience do you have that is relevant to this project? Hi sir, I am scraping expert, I have did too many similar projects, please provide me website list that i can give you exact time frame and sample. Check...
What experience do you have that is relevant to this project? Hi we are a software development company from india in past 8 years me & my Team developed approx 950 projects for global market , we try to satisfy cli...