Python/Kubernetes/Docker Custom task

Closed. Posted 2 years ago. Paid on delivery.

There are 4 tasks.

1. Extract and manipulate data

Using the lookup data in [login to view URL], you should extract information about each node's tags in the HTML trees. In particular, for each node in each HTML page, we need its tag, the tags of its left and right siblings, and the tag of its parent. The utility function load_single_warc_record will allow you to download the HTML, and the get_* functions should help you extract the relevant columns (but you will have to implement one of those functions yourself).
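As a rough illustration of the four columns described above, here is a minimal sketch that uses only Python's standard html.parser module. It does not use the supplied load_single_warc_record or get_* helpers, whose signatures are not shown in this brief. The names TagTreeBuilder and extract_tag_rows are hypothetical, and the sketch assumes that "siblings" means neighbouring element nodes (text nodes are ignored).

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed on the open-tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagTreeBuilder(HTMLParser):
    """Builds a minimal element tree; each node is a dict with "tag" and "children"."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": None, "children": []}  # artificial root above <html>
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break


def extract_tag_rows(html):
    """Return one (tag, left_sibling, right_sibling, parent) tuple per element."""
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    rows = []
    pending = [builder.root]  # explicit stack avoids recursion limits on deep pages
    while pending:
        node = pending.pop()
        siblings = node["children"]
        for i, child in enumerate(siblings):
            left = siblings[i - 1]["tag"] if i > 0 else None
            right = siblings[i + 1]["tag"] if i + 1 < len(siblings) else None
            rows.append((child["tag"], left, right, node["tag"]))
        pending.extend(reversed(siblings))
    return rows
```

Missing siblings or parents are reported as None, which maps naturally to NULL in a database. Real-world HTML with unclosed tags may nest differently here than in a browser, so a more forgiving parser may be preferable in practice.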

2. Store in a database

Record all this information in an SQLite3 database. As a minimum, you should create and populate these tables:

1. webpage for storing data about the website / HTML. Namely, the URL, but also anything else you find important

2. tags for storing the four extracted tag columns and anything else you find important

As part of your assessment, we ask that you supply the SQLite3 database file containing extracted data in the relevant tables.

Note

The script used to upload the data to the database should be able to deal with new data that has been extracted by the script in part 1. The requirements are:

1. It should not upload duplicate data again.

2. If the tags of a URL change it should not overwrite existing data.

3. New URLs and corresponding tags should be inserted if found.
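One way to satisfy the three requirements above is a "first write wins" policy keyed on the URL. The sketch below is only one possible reading of the brief: the column names, the position column, and the functions open_db and store_page are assumptions made for illustration. It treats requirement 2 as "leave the stored rows untouched and skip the changed version".

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS webpage (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE          -- uniqueness is what blocks duplicate uploads
);
CREATE TABLE IF NOT EXISTS tags (
    id            INTEGER PRIMARY KEY,
    webpage_id    INTEGER NOT NULL REFERENCES webpage(id),
    position      INTEGER NOT NULL,   -- order of the node in the extracted rows
    tag           TEXT NOT NULL,
    left_sibling  TEXT,
    right_sibling TEXT,
    parent        TEXT,
    UNIQUE (webpage_id, position)
);
"""


def open_db(path):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_page(conn, url, rows):
    """Insert url and its (tag, left, right, parent) rows if url is new.

    Returns True if the page was inserted, False if it was already stored.
    """
    with conn:  # single transaction: commit on success, roll back on error
        cur = conn.execute("INSERT OR IGNORE INTO webpage (url) VALUES (?)", (url,))
        if cur.rowcount == 0:
            return False  # known URL: keep existing data, insert nothing
        page_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO tags (webpage_id, position, tag, left_sibling,"
            " right_sibling, parent) VALUES (?, ?, ?, ?, ?, ?)",
            [(page_id, i, *row) for i, row in enumerate(rows)],
        )
    return True
```

If changed pages should be kept alongside the old ones rather than skipped, a content hash or timestamp column on webpage (with uniqueness on URL plus hash) would be a natural extension.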

3. Dockerize

Please write a Dockerfile that can be used to run your code end-to-end. That is, it must perform steps 1) and 2) above. To test your solution, we will run your Dockerfile with multiple files like [login to view URL] to make sure duplicates and new data are being handled correctly.

Write an accompanying script containing the exact docker build and docker run commands for that Dockerfile.
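A Dockerfile for this step might look roughly like the sketch below. The file names main.py and requirements.txt, the image name tag-extractor, and the /data paths are all hypothetical placeholders, not part of the brief. The build and run commands are shown as comments; in a submission they would go in the accompanying script.

```dockerfile
# Hypothetical layout: main.py is the entry point that runs extraction and storage.
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached between code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Arguments given to "docker run" after the image name are appended here.
ENTRYPOINT ["python", "main.py"]

# Example commands (assumed arguments: input file, then SQLite database path):
#   docker build -t tag-extractor .
#   docker run --rm -v "$(pwd)/data:/data" tag-extractor /data/lookup.csv /data/tags.db
```

Mounting a host directory at /data keeps the SQLite file outside the container, so repeated runs with different input files write to the same database and the duplicate handling can actually be exercised.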

4. CI/CD

4a. Docker container

Write a CI workflow to build and deploy the Docker container from the Dockerfile in step 3. You can use GitHub Actions for this.
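As an illustration only, a GitHub Actions workflow that builds the image and pushes it to the GitHub Container Registry could be sketched as follows. The branch name, the workflow and job names, and the image name ghcr.io/example/tag-extractor are assumptions; "deploy" is interpreted here as pushing the image to a registry.

```yaml
# Hypothetical file: .github/workflows/docker.yml
name: build-and-push
on:
  push:
    branches: [main]

jobs:
  docker:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write        # needed to push to ghcr.io with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/example/tag-extractor:latest   # placeholder image name
```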

4b. Orchestration

The Docker container should be run daily. We use Kubernetes for orchestration, and if you have experience with Kubernetes, please write a manifest that will run this Docker container on a daily basis.
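A daily run maps naturally onto a Kubernetes CronJob. The manifest below is a sketch under assumptions: the image name, the container arguments, the 02:00 schedule, and the persistent volume claim tag-extractor-data are placeholders chosen for illustration.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: tag-extractor
spec:
  schedule: "0 2 * * *"        # once a day at 02:00 (cluster time zone)
  concurrencyPolicy: Forbid    # do not start a new run while one is still going
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: tag-extractor
              image: ghcr.io/example/tag-extractor:latest   # placeholder image
              args: ["/data/lookup.csv", "/data/tags.db"]   # placeholder paths
              volumeMounts:
                - name: data
                  mountPath: /data
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: tag-extractor-data               # placeholder claim
```

The persistent volume matters because the SQLite file has to survive between runs; without it, every daily job would start from an empty database and the duplicate checks would never trigger.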

Result

You should make a GitHub repository containing the code you developed for this task, structuring it in a sensible way. If you choose not to commit the file containing your SQLite database, please send it to us as an attachment along with the link to your GitHub repo.

Good luck!

Supplied to you:

• [login to view URL]

• [login to view URL]

• [login to view URL] (this file)

Required by us:

• Data:

– SQLite3 database file produced by your code

• Code:

– Extraction / storage script(s)

– Dockerfile

– Script with the docker build and docker run commands

– Kubernetes manifest

– CI workflow

Kubernetes Python Docker SQLite GitHub

Project ID: #31636331

About the project

2 proposals. Remote project. Active 2 years ago.

2 freelancers are bidding an average of ₹1050 for this job

prachetasoftware

I have extensive knowledge and 12 years of experience in Python, statistics and probability, machine learning (unsupervised and supervised learning), natural language processing, deep learning, Artif...

₹1050 INR in 7 days
(0 reviews)
0.0
kprakasheee35

I have hands-on experience in Python, SQLite, Docker, Kubernetes, and GitHub. I am very interested in this project. Please let us discuss it in detail in chat.

₹1050 INR in 7 days
(0 reviews)
0.0