R or Python project
$10-30 USD
Paid on delivery
SUBJECT / AREA: "Azerbaijan Old Literature"
You are expected to create a CORPUS of text in the Azerbaijani language that belongs to the category specified in SUBJECT.
REQUIREMENTS:
- Only Azerbaijani text is allowed (text in any other language must be excluded);
- All final text must be in a plain-text file;
- Sentences shorter than 3 words must be excluded;
- Only complete sentences should be used;
- Poems and poetry are not allowed;
- Each sentence must start with a letter (the first character can't be a number or any other symbol such as "-", "_", "(", ")", "...", etc.);
- Format is one sentence per line: each sentence must start on a new line and end with ".";
- Broken sentences (where the sentence has an EOF in the middle) are not allowed;
- Only a single space between words;
- All page numbers, headers, titles, etc. must be excluded - just sentences.
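The mechanical parts of the requirements above could be checked with a filter along these lines. This is only a minimal sketch, not a complete solution: the function names (normalize, is_valid_sentence, extract_sentences) are hypothetical, the sentence splitter is naive (it breaks on abbreviations), and the language check assumes the sources use the modern Azerbaijani Latin alphabet, so it only rejects sentences containing letters outside that alphabet. Detecting poetry, headers, titles and page numbers is not covered here and would still need manual review or extra rules.

```python
import re
from typing import List

# Assumption: sources use the modern Azerbaijani Latin script.
# Upper and lower case are listed explicitly because str.lower() turns
# the dotted capital "İ" into two code points, which would break a lookup.
AZ_LETTERS = set(
    "abcçdeəfgğhxıijkqlmnoöprsştuüvyz"
    "ABCÇDEƏFGĞHXIİJKQLMNOÖPRSŞTUÜVYZ"
)


def normalize(text: str) -> str:
    """Collapse every run of whitespace (including newlines) into one space."""
    return " ".join(text.split())


def is_valid_sentence(sentence: str) -> bool:
    """Apply the mechanical rules from the REQUIREMENTS list to one sentence."""
    if not sentence or not sentence[0].isalpha():
        return False  # must start with a letter
    if not sentence.endswith("."):
        return False  # must end with "."; this also drops cut-off fragments
    if len(sentence.split()) < 3:
        return False  # sentences shorter than 3 words are excluded
    # Crude language check: any letter outside the Azerbaijani Latin
    # alphabet (for example "w") rejects the sentence. This is necessary
    # but not sufficient to prove that the text is Azerbaijani.
    return all(ch in AZ_LETTERS for ch in sentence if ch.isalpha())


def extract_sentences(text: str) -> List[str]:
    """Split raw text into candidate sentences and keep only the valid ones."""
    flat = normalize(text)
    # Naive split: break after ".", "!" or "?" when followed by a space.
    candidates = re.split(r"(?<=[.!?])\s+", flat)
    return [s for s in candidates if is_valid_sentence(s)]


if __name__ == "__main__":
    sample = (
        "Bu gün hava çox gözəldir.  O   gəldi. 3 nəfər evə getdi. "
        "This is what we want. Kitab stolun üstündə"
    )
    # One sentence per line, as the output format requires.
    print("\n".join(extract_sentences(sample)))
```

On the sample text only the first sentence survives: the others are too short, start with a digit, contain a letter outside the alphabet, or lack the final ".". The result can be written to the .TXT file with UTF-8 encoding, one sentence per line.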
DELIVERABLES:
1) Final text file (.TXT) with all sentences
2) List of book titles used as sources
3) Source files (.PDF, .DOC electronic books) from which the text was extracted
TEXT SOURCE: It is entirely your responsibility to find a publicly available text source (.PDF, .DOC, etc.)
Project ID: #16941470