Crawler

Closed. Posted Apr 22, 2007. Paid on delivery.

I require a crawler/spider with the following features.

Project Requirements:

1. Database creation
2. A routine for entering the websites to be crawled
3. A means of tracking which site each record came from, and of deleting or archiving a record once it has been removed from the original site
4. Multithreaded crawling
5. Ability to recognise data fields and store them in the appropriate tables
6. Ability to identify when we are being blocked and then go through proxies if required (a list of proxies is needed)
7. Needs to be able to get around [url removed, login to view] telling it that it can't crawl
8. Needs to be able to be slowed down if it is being firewalled because sites are rejecting too many queries
9. Needs a means to identify whether the same record appears on two sites
10. Needs a means to report its success, whether it is being blocked, and updates
11. Ability to follow links to other sites and run the routine again
12. Must use open source software

It's a condition of accepting the project that the programmer must assign copyright to us for this and any subsequent work performed for us. All work must be done on our servers. Payment is by escrow.
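To give bidders a rough idea of what requirements 8 and 9 might look like in practice, here is a minimal Python sketch. It is only an illustration under my own assumptions, not a prescribed design: the function names `record_fingerprint` and `next_delay`, the field names, and the delay limits are all hypothetical, and a real solution would need fuzzier matching than an exact hash of normalized fields.

```python
import hashlib
import re


def record_fingerprint(fields: dict) -> str:
    """Return a stable hash of a record's normalized fields.

    Assumption: two records are "the same" when their field names and
    values match after lowercasing and collapsing whitespace.
    """
    normalized = sorted(
        (str(name).strip().lower(),
         re.sub(r"\s+", " ", str(value)).strip().lower())
        for name, value in fields.items()
    )
    payload = "\n".join(f"{name}={value}" for name, value in normalized)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def next_delay(current: float, blocked: bool,
               minimum: float = 1.0, maximum: float = 300.0) -> float:
    """Return the number of seconds to wait before the next request to a site.

    Doubles the delay after a rejected request and eases it back by 10%
    after a successful one. The limits are illustrative values only.
    """
    if blocked:
        return min(maximum, max(minimum, current) * 2)
    return max(minimum, current * 0.9)


if __name__ == "__main__":
    site_a = {"Title": "Blue  Widget ", "city": "Leeds"}
    site_b = {"city": "leeds", "title": "blue widget"}
    # Identical fingerprints flag the record as a cross-site duplicate.
    print(record_fingerprint(site_a) == record_fingerprint(site_b))
    # A rejected request doubles the wait before the next query.
    print(next_delay(2.0, blocked=True))
```

Storing the fingerprint in an indexed database column would let the crawler check for an existing copy of a record before inserting it, and keeping one delay value per site would let a single slow or hostile site be throttled without slowing the other crawl threads.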

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

## Platform

I don't care what you use.

Amazon Web Services, Engineering, MySQL, PHP, Software Architecture, Software Testing, Web Hosting, Website Management, Website Testing

Project ID: #2935482

About the project

2 proposals. Remote project. Active May 13, 2007.

2 freelancers are bidding an average of $159 for this job

askacodervw

See private message.

$170 USD in 14 days
(29 reviews)
5.5
microsmnet

See private message.

$148.75 USD in 14 days
(3 reviews)
0.0