create_docs
Prepare documents for topic extraction.
DocStudy
Bases: TypedDict
Data container for a study that will be used to generate a doc.
Attributes:
| Name | Type | Description |
|---|---|---|
| `title` | `str` | Title of the study. |
| `abstract` | `str` | Abstract of the study. |
| `keywords` | `str` | Keywords of the study. |
Examples:
>>> study: DocStudy = {
... "title": "machine learning",
... "abstract": "machine learning is often used in the industry with the goal of...",
... "keywords": "machine learning, code smells, defect detection"
... }
>>> study
{'title': 'machine learning', 'abstract': 'machine learning is often used in the industry with the goal of...', 'keywords': 'machine learning, code smells, defect detection'}
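As a minimal sketch, the attribute table above corresponds to a `TypedDict` declaration along the following lines. The class body is reconstructed from the documented attributes and is an assumption, not a copy of the code in `create_docs.py`.

```python
from typing import TypedDict


class DocStudy(TypedDict):
    """Assumed shape, reconstructed from the documented attributes."""

    title: str
    abstract: str
    keywords: str


study: DocStudy = {
    "title": "machine learning",
    "abstract": "machine learning is often used in the industry with the goal of...",
    "keywords": "machine learning, code smells, defect detection",
}

# A TypedDict is an ordinary dict at runtime; the field types are only
# enforced by static type checkers, not when the dict is created.
print(type(study) is dict)  # True
print(sorted(study))        # ['abstract', 'keywords', 'title']
```

Because the value is a plain dict, a missing or misspelled key is not rejected at runtime; a type checker has to catch it.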
Source code in src/sesg/topic_extraction/create_docs.py
concat_study_info(study)
Concatenates the title, abstract, and keywords of a study into a single newline-separated string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `study` | `DocStudy` | Study with title, abstract and keywords. | required |
Returns:
| Type | Description |
|---|---|
| `str` | A string with the following format: `"{title}\n{abstract}\n{keywords}"`. |
Examples:
>>> study: DocStudy = {
... "title": "machine learning",
... "abstract": "machine learning is often used in the industry with the goal of...",
... "keywords": "machine learning, code smells, defect detection"
... }
>>> concat_study_info(study)
'machine learning\nmachine learning is often used in the industry with the goal of...\nmachine learning, code smells, defect detection'
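The documented return format can be reproduced with a single f-string. The sketch below is one possible implementation consistent with that format; the actual function body in `create_docs.py` is not shown here and may differ. `DocStudy` is redeclared (as an assumption) so the block runs on its own.

```python
from typing import TypedDict


class DocStudy(TypedDict):
    # Assumed declaration, mirroring the documented attributes.
    title: str
    abstract: str
    keywords: str


def concat_study_info(study: DocStudy) -> str:
    """Join title, abstract and keywords with newlines (illustrative sketch)."""
    return f"{study['title']}\n{study['abstract']}\n{study['keywords']}"


example: DocStudy = {
    "title": "machine learning",
    "abstract": "machine learning is often used in the industry with the goal of...",
    "keywords": "machine learning, code smells, defect detection",
}

doc = concat_study_info(example)
# The three fields end up on three separate lines, in a fixed order.
print(doc.split("\n")[0])  # machine learning
print(doc.count("\n"))     # 2
```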
Source code in src/sesg/topic_extraction/create_docs.py
create_docs(studies_list)
Creates a list of documents, where each document is a string containing the title, abstract, and keywords of one study.
The resulting list can be used with extract_topics_with_lda or extract_topics_with_bertopic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `studies_list` | `list[DocStudy]` | List of studies with title, abstract and keywords. | required |
Returns:
| Type | Description |
|---|---|
| `list[str]` | List of documents. |
Examples:
>>> s1: DocStudy = {
... "title": "machine learning",
... "abstract": "machine learning is often used in the industry with the goal of...",
... "keywords": "machine learning, code smells, defect detection"
... }
>>> s2: DocStudy = {
... "title": "artificial intelligence",
... "abstract": "artificial intelligence is often used in the industry with the goal of...",
... "keywords": "artificial intelligence, code smells, defect detection"
... }
>>> create_docs([s1, s2])
['machine learning\nmachine learning is often used in the industry with the goal of...\nmachine learning, code smells, defect detection', 'artificial intelligence\nartificial intelligence is often used in the industry with the goal of...\nartificial intelligence, code smells, defect detection']
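Judging by the example output, each document has the same format that `concat_study_info` returns, so `create_docs` can be sketched as a list comprehension over the studies. This is an assumption about the structure, not the source in `create_docs.py`; `DocStudy` and `concat_study_info` are redeclared here only to keep the block self-contained.

```python
from typing import TypedDict


class DocStudy(TypedDict):
    # Assumed declaration, mirroring the documented attributes.
    title: str
    abstract: str
    keywords: str


def concat_study_info(study: DocStudy) -> str:
    # Illustrative stand-in producing the documented "{title}\n{abstract}\n{keywords}" format.
    return f"{study['title']}\n{study['abstract']}\n{study['keywords']}"


def create_docs(studies_list: list[DocStudy]) -> list[str]:
    """Build one document string per study, preserving input order (sketch)."""
    return [concat_study_info(study) for study in studies_list]


s1: DocStudy = {"title": "t1", "abstract": "a1", "keywords": "k1, k2"}
s2: DocStudy = {"title": "t2", "abstract": "a2", "keywords": "k3"}

docs = create_docs([s1, s2])
print(len(docs))        # 2
print(docs[0])          # t1, a1 and "k1, k2" on three lines
print(create_docs([]))  # []
```

In this sketch the output has one entry per input study, in the same order, and an empty input yields an empty list.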
Source code in src/sesg/topic_extraction/create_docs.py