Python/Kubernetes/Docker Custom task
₹600-1500 INR
Paid on delivery
There are 4 tasks.
1. Extract and manipulate data
Using the lookup data in [login to view URL], you should extract information about each node's tags in the HTML trees. In particular, for each node in each HTML page, we need its tag, the tags of its left and right siblings, and the tag of its parent. The utility function load_single_warc_record will allow you to download the HTML, and the get_* functions should help you extract the relevant columns (but you will have to implement one of those functions yourself).
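As a rough illustration of the four columns requested above, here is a minimal sketch that builds a small tree with Python's standard html.parser and reports each element's tag, left sibling, right sibling and parent. It does not use the supplied load_single_warc_record or get_* helpers, whose signatures are not shown in this brief; the names Node, TreeBuilder and extract_tag_rows are hypothetical, and only element nodes (not text nodes) are counted as siblings, which is an assumption.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """Hypothetical minimal tree node: a tag name, a parent and child elements."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple element tree; text nodes and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("[document]")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1]
        node = Node(tag, parent)
        parent.children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Close back to the nearest open element with this tag;
        # a stray end tag with no matching open element is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def extract_tag_rows(html):
    """Return one dict per element: tag, left_sibling, right_sibling, parent."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    rows = []
    pending = [builder.root]
    while pending:  # iterative walk, so deep pages cannot hit the recursion limit
        node = pending.pop()
        kids = node.children
        for i, child in enumerate(kids):
            rows.append({
                "tag": child.tag,
                "left_sibling": kids[i - 1].tag if i > 0 else None,
                "right_sibling": kids[i + 1].tag if i + 1 < len(kids) else None,
                "parent": None if node is builder.root else node.tag,
            })
        pending.extend(reversed(kids))
    return rows
```

A missing sibling or parent is recorded as None here, which maps naturally to NULL in SQLite; a real solution may prefer a different convention or a more forgiving parser for malformed pages.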
2. Store in a database
Record all this information in an SQLite3 database. As a minimum, you should create and populate these tables:
1. webpage, for storing data about the website / HTML: namely the URL, but also anything else you find important
2. tags, for storing the four extracted tag columns and anything else you find important
As part of your assessment, we ask that you supply the SQLite3 database file containing extracted data in the relevant tables.
Note
The script used to upload the data to the database should be able to deal with new data that has been extracted by the script in part 1. The requirements are:
1. It should not upload duplicate data again.
2. If the tags of a URL change it should not overwrite existing data.
3. New URLs and corresponding tags should be inserted if found.
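One possible reading of the three requirements above is "a URL is written once and never touched again". The sketch below follows that reading with the standard sqlite3 module. The column names, the node_index column and the functions init_db and upload_page are assumptions made for illustration, not part of the brief, and a solution that instead keeps a history of changed tags would need a different schema.

```python
import sqlite3

# Hypothetical schema: one row per URL, one row per extracted node.
SCHEMA = """
CREATE TABLE IF NOT EXISTS webpage (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS tags (
    id            INTEGER PRIMARY KEY,
    webpage_id    INTEGER NOT NULL REFERENCES webpage(id),
    node_index    INTEGER NOT NULL,
    tag           TEXT NOT NULL,
    left_sibling  TEXT,
    right_sibling TEXT,
    parent        TEXT,
    UNIQUE (webpage_id, node_index)
);
"""


def init_db(conn):
    """Create the tables if they do not exist yet."""
    conn.executescript(SCHEMA)


def upload_page(conn, url, rows):
    """Store a URL and its tag rows once.

    rows is a list of (tag, left_sibling, right_sibling, parent) tuples.
    Returns True if the URL was new, False if it was already stored
    (in which case nothing is inserted or overwritten).
    """
    with conn:  # single transaction: commit on success, roll back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO webpage (url) VALUES (?)", (url,)
        )
        if cur.rowcount == 0:
            # URL already present: skip it, even if its tags have changed.
            return False
        webpage_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO tags (webpage_id, node_index, tag, left_sibling,"
            " right_sibling, parent) VALUES (?, ?, ?, ?, ?, ?)",
            [(webpage_id, i, *row) for i, row in enumerate(rows)],
        )
    return True
```

Running the upload twice for the same URL leaves the database unchanged the second time, so re-processing an input file does not create duplicates, while a URL that has not been seen before is inserted together with its tags.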
3. Dockerize
Please write a Dockerfile that can be used to run your code end-to-end. That is, it must perform steps 1) and 2) above. To test your solution, we will run your Dockerfile with multiple files like [login to view URL] to make sure duplicates and new data are being handled correctly.
Write an accompanying script containing the exact docker build and docker run commands for that Dockerfile.
4. CI/CD
4a. Docker container
Write a CI workflow to build and deploy the Docker container from the Dockerfile in step 3. You can use GitHub Actions for this.
4b. Orchestration
The Docker container should be run daily. We use Kubernetes for orchestration, and if you have experience with Kubernetes, please write a manifest that will run this Docker container on a daily basis.
Result
You should make a GitHub repository containing the code you developed for this task, structuring it in a sensible way. If you choose not to commit the file containing your SQLite database, please send it to us as an attachment along with the link to your GitHub repo.
Good luck!
Supplied to you:
• [login to view URL]
• [login to view URL]
• [login to view URL] (this file)
Required by us:
• Data:
– SQLite3 database file produced by your code
• Code:
– Extraction / storage script(s)
– Dockerfile
– Script with the docker build and docker run commands
– Kubernetes manifest
– CI workflow
Resources:
• [login to view URL]
• [login to view URL]
• [login to view URL]
• [login to view URL]
• [login to view URL]
• [login to view URL]
Project ID: #31636331
About the project
2 freelancers are bidding an average of ₹1050 for this job
I have extensive knowledge and 12 years of experience in Python, Statistics and Probability, Machine Learning (Unsupervised Learning), Machine Learning (Supervised Learning), Natural Language Processing, Deep Learning, Artif
I have hands-on experience in Python, SQLite, Docker, Kubernetes, and GitHub. I am very interested in this project. Please let us discuss it in detail in chat.