Research Trends on Cancer-Related Cognitive Impairment in Patients with Non-Central Nervous System Cancer: Text Network Analysis and Topic Modeling
Abstract
Purpose
This study aimed to understand the knowledge structure and trends in research on cancer-related cognitive impairment (CRCI) in patients with non-central nervous system (non-CNS) cancer through text network analysis and topic modeling.
Methods
Studies on CRCI in patients with non-CNS cancer published from 2011 to 2021 and registered in databases including Ovid-MEDLINE, EMBASE, Cochrane, CINAHL, CENTRAL, and PsycInfo were retrieved, and their text was cleaned into words using Python's Natural Language Toolkit package. Text network analysis was performed using the NetworkX library, and topic modeling analysis based on the latent Dirichlet allocation algorithm was carried out using the Gensim library.
Results
In total, 24,030 keywords were extracted from the abstracts of 490 selected papers, of which “chemotherapy,” “breast cancer,” and “quality of life” showed high frequency and centrality. As a result of the topic modeling analysis, four subject groups were derived, including cognitive impairment due to chemotherapy, breast cancer and cognitive impairment, factors related to cognitive impairment, and symptom experience.
Conclusion
These findings will help cancer researchers to understand the trends and insights of research on CRCI in patients with non-CNS cancer and suggest important areas and directions for future studies.
INTRODUCTION
Early detection of cancer and advances in treatment technology have greatly improved the survival rate of patients with cancer [1]. However, side effects of cancer treatment continue to negatively affect these patients' quality of life [2]. Studies have reported that changes in cognitive function in patients treated for non-central nervous system (non-CNS) cancer negatively affect their daily activities and quality of life [3].
Cancer-related cognitive impairment (CRCI) mainly affects memory and concentration and is one of the most common health problems in patients with non-CNS cancer [4,5]. Although the exact mechanism of CRCI is unknown, it is reportedly related to various complex causes, such as decreased nervous tissue development, hormonal abnormalities, changes in blood flow, and cerebral atrophy [4,5]. CRCI was reported to occur in 12% to 82% of patients with non-CNS cancer receiving systemic chemotherapy [4]; among them, about 17% to 35% experienced severe or worse cognitive impairment that persisted for several years after the end of treatment [5]. Furthermore, 45.7% of patients with cancer experienced cognitive impairment for an average of 4.6 years after cancer diagnosis [6]. CRCI not only directly interferes with hospital visits, follow-up appointments, and taking prescribed medications during treatment but also reduces patients' ability to perform their roles in the family, in the community, and at work. Moreover, patients have difficulty maintaining a social life because of psychological withdrawal, which negatively affects their interpersonal relationships. CRCI is therefore a critical health problem that adversely affects overall quality of life [7-9].
Previous studies of CRCI research trends have mostly been systematic literature reviews and meta-analyses that estimated effect sizes for short-term memory, long-term memory, information processing speed, language ability, spatial ability, and motor function [7,10,11]. Although meta-analyses and systematic reviews are commonly used to examine previous studies on CRCI, they are unsuitable for macro-level analyses because they aim to answer specific research questions [12]. A comprehensive examination of trends in CRCI-related research can provide an overview of the field, inform how existing findings are used, and suggest directions for future research.
Studies have recently adopted data mining, a method for analyzing unstructured data, to identify research trends and changes in a specific field. Social network analysis (SNA) is an analytical method commonly used to identify the contextual meaning of words and their relationships. Text network analysis (TNA) is based on network theory and can be used to identify knowledge structures and research topic trends from the frequency, centrality, and co-occurrence of keywords [13]. Its advantage lies in its ability to examine more intuitively both relational phenomena, such as the interconnections and progression of research topics, and structural phenomena, including cohesion, diffusion, and relational patterns [14]. Topic modeling is a probabilistic statistical method that finds latent topics in documents and analyzes how the topics are connected, under the assumption that each document is a set of specific words. Its advantage is that it can reveal multiple topics and explain their structure by clustering many unstructured words according to their relationships [15,16].
In the field of nursing, recent studies have used data mining techniques to identify social issues and changes of perception on social media [17] and to statistically analyze the inter-relationships between symptoms and symptom clusters [18]. Studies analyzing research trends can identify potential meanings and directions by forming a network of concepts from keywords or abstracts and pinpointing the logical connections between core concepts [19,20]. Moreover, topic modeling shows the distribution of each topic and the frequency of words across numerous documents, thus simplifying subject matter and the process of gaining knowledge from it [21]. Therefore, this study aimed to identify (i) the relationships between keywords in existing CRCI-related studies using TNA and (ii) the topic groups of CRCI-related studies using topic modeling analysis. These methods can be used to describe trends in CRCI-related research and to suggest future research topics from a new perspective.
METHODS
1. Subject of the Study
CRCI-related studies were selected by searching international academic databases, including Ovid-MEDLINE, EMBASE (Excerpta Medica Database), Cochrane, CINAHL (Cumulative Index to Nursing and Allied Health Literature), CENTRAL (Cochrane Central Register of Controlled Trials), and PsycInfo. Studies published between January 1, 2011 and December 31, 2021 were included. Studies related to CRCI have been actively conducted since 2011 [4], and the search period was limited to about 10 years to identify the research trends current at the time of this study.
2. Data Collection
After the search strategy was developed under the supervision of a librarian, data were collected from January 30, 2022 to February 3, 2022. The keywords used for the search were as follows: “cancer (keywords: cancer, neoplasm, tumor, carcinoma, malignant; MeSH term: Neoplasms)”; “cognition disorders (keywords: cognitive disorder, cognitive impairment, cognitive decline, cognitive deterioration, cognitive deficit, cognitive function, cognitive problem, cognitive dysfunction, memory, chemo fog; MeSH term: cognition disorders, neurobehavioral manifestations, cognition, executive function, learning, spatial navigation, attention, neuropsychological tests)”; and “patient (keywords: patient, person, individual, client, survivor; MeSH term: patients).” These terms were combined with “AND,” and all possible variations were included to enhance the search. Studies published up to December 31, 2021 were searched and selected. We excluded studies whose participants had received treatments such as cranial radiation or had primary or metastatic cancer of the brain or CNS. Studies were also excluded if they did not have an abstract or if the abstract was not written in English.
Of the 33,175 papers retrieved from the databases, 22,848 were excluded because of duplication, and 10,327 were reviewed based on their titles and abstracts. Next, we excluded 7,792 papers that did not examine CRCI as a variable in patients with cancer, 235 papers that did not target patients with cancer, 1,587 papers about patients with CNS cancer, 35 duplicate studies, and 188 papers for which abstracts were not provided. To select papers, one researcher (HJK) and one nursing professor (JHP) independently reviewed the titles and abstracts of both the included and excluded papers. Disagreements were resolved through discussion of whether the papers met the selection criteria. A total of 490 papers on which consensus was reached were used for data analysis in this study.
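The screening counts reported above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the dictionary keys are our own shorthand labels, not terms from the study.

```python
# Counts reported in the data collection paragraph.
retrieved = 33_175
duplicates_removed = 22_848
screened = retrieved - duplicates_removed

# Exclusions at title/abstract screening (labels are our own shorthand).
excluded = {
    "CRCI not examined as a variable": 7_792,
    "not patients with cancer": 235,
    "patients with CNS cancer": 1_587,
    "duplicate studies": 35,
    "abstract not provided": 188,
}
included = screened - sum(excluded.values())

print(screened)  # 10327
print(included)  # 490
```

Both results agree with the 10,327 screened papers and the 490 analyzed papers stated in the text.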
3. Data Analysis
In this study, the Natural Language Toolkit (NLTK) package, a natural language processing tool for the Python programming language, was used for word extraction and refinement, followed by TNA. The TNA was visualized with NetworkX, which is also a Python library. Finally, topic modeling analysis based on the latent Dirichlet allocation (LDA) model was performed to derive the major topics using the Python package Gensim. The analysis followed the tutorials on the official websites of the NLTK, NetworkX, and Gensim libraries [22].
1) Word extraction and refinement
Abstracts of CRCI-related studies on patients with non-CNS cancer were analyzed to extract words using the NLTK Python package. For morpheme separation, lists of synonyms, defined words, and exception words were created and applied using the stopwords function and the multi-word expression tokenizer (MWETokenizer) function, and the refinement was applied iteratively. Synonyms were words or phrases with similar meanings that were used interchangeably; they were unified at the authors' discretion, as were singular and plural forms and letter case (converted to lowercase). Defined words were phrases expressing a single meaning, such as “quality of life,” “social support,” or “cancer survivors.” Exception words, which were removed, were single-letter words that are difficult to interpret and words expressing general concepts, such as “result,” “score,” and “behavior.”
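The refinement steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the three rule lists are hypothetical examples (the study's actual lists are not given), and the standard library stands in for NLTK's stop-word list and MWETokenizer.

```python
import re

# Hypothetical rule lists for illustration only.
SYNONYMS = {"scores": "score", "survivors": "survivor", "tumours": "tumor"}
DEFINED_WORDS = [("quality", "of", "life"), ("social", "support"), ("breast", "cancer")]
EXCEPTION_WORDS = {"the", "of", "were", "result", "score", "behavior"}


def merge_defined_words(tokens):
    """Join tokens that form a defined multi-word phrase into one keyword."""
    merged, i = [], 0
    while i < len(tokens):
        for phrase in DEFINED_WORDS:
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                merged.append(" ".join(phrase))
                i += len(phrase)
                break
        else:  # no phrase starts at position i
            merged.append(tokens[i])
            i += 1
    return merged


def refine(abstract):
    """Lowercase, unify synonyms, merge phrases, drop exception words."""
    tokens = re.findall(r"[a-z]+", abstract.lower())
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    tokens = merge_defined_words(tokens)
    return [t for t in tokens if len(t) > 1 and t not in EXCEPTION_WORDS]


print(refine("The quality of life scores of breast cancer survivors were low."))
# ['quality of life', 'breast cancer', 'survivor', 'low']
```

Phrases are merged before exception words are removed so that the "of" inside "quality of life" survives as part of the defined word.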
2) Text network analysis and keyword centrality analysis
Term frequency refers to the number of occurrences of a word across all documents and indicates the keywords used most often in the papers. We used the most_common function of Python's Counter class to extract the 30 most frequent words, each occurring more than 50 times, a number chosen for the readability of the visualization. For TNA, links were created using the bigrams function in the NLTK package and visualized using the pandas_adjacency function in the NetworkX package. A centrality measure is a number indicating how central a role each word plays relative to other words in the generated network and how much influence it can have on them. It is a relative rather than an absolute number and takes a value between 0 and 1. The higher the value, the more central the word is to the network and the larger its node is drawn. Representative centrality measures include degree centrality, closeness centrality, and betweenness centrality [22].
Degree centrality is defined as the number of links incident upon a node. It is high when many other words are directly connected to a word in the text. Closeness centrality measures how close a word is to the other words in the network [22]. The higher the closeness centrality, the closer the word is to other words, the more central its position in the entire network, and the easier it is to form relationships with them. Betweenness centrality measures how well a word plays a mediating role in connecting other words in the network. When betweenness centrality is high, the word's role in conveying meaning across words increases, such that sentence comprehension becomes difficult when the word is excluded [14]. Together, these measures provide an integrated and intuitive identification of the words located at the center of the entire network and their degree of connection in CRCI-related studies, and they represent the quantitative importance and influence of each word.
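As a rough illustration of the frequency and link-building steps, the sketch below counts keywords with `collections.Counter` and forms links from adjacent keyword pairs. The three toy abstracts are hypothetical, `zip(doc, doc[1:])` stands in for NLTK's bigram helper, and treating links as undirected is our assumption.

```python
from collections import Counter

# Hypothetical refined abstracts: one list of keywords per abstract.
docs = [
    ["chemotherapy", "breast cancer", "quality of life"],
    ["chemotherapy", "breast cancer", "fatigue"],
    ["breast cancer", "quality of life", "depression"],
]

# Term frequency over the whole corpus.
term_freq = Counter(word for doc in docs for word in doc)
print(term_freq.most_common(1))  # [('breast cancer', 3)]

# Links from adjacent keyword pairs (bigrams) within each abstract.
# Each pair is sorted so that (a, b) and (b, a) count as one undirected link.
links = Counter()
for doc in docs:
    for a, b in zip(doc, doc[1:]):
        links[tuple(sorted((a, b)))] += 1

print(links[("breast cancer", "chemotherapy")])  # 2
```

The link counts can then serve as edge weights when the keyword network is built.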
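The three centrality measures can be computed with NetworkX, the library named in this study. The sketch below is illustrative only: the small keyword network is hypothetical, and the functions are called with their default normalization, which may differ from the settings the authors used.

```python
import networkx as nx

# Hypothetical keyword links, for illustration only.
edges = [
    ("breast cancer", "chemotherapy"),
    ("breast cancer", "quality of life"),
    ("breast cancer", "fatigue"),
    ("quality of life", "depression"),
]
G = nx.Graph(edges)

# Each function returns a dict mapping node -> centrality in [0, 1].
degree = nx.degree_centrality(G)            # share of other nodes directly linked
closeness = nx.closeness_centrality(G)      # inverse of mean shortest-path distance
betweenness = nx.betweenness_centrality(G)  # share of shortest paths passing through

for word in G.nodes:
    print(f"{word:16s} {degree[word]:.2f} {closeness[word]:.2f} {betweenness[word]:.2f}")
```

In this toy network "breast cancer" ranks highest on all three measures because it is linked directly to three of the four other keywords and lies on most shortest paths between them.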
3) Topic modeling analysis
Based on the data refined in the previous step, topic modeling was performed using an LDA model. A document (abstract)-word matrix was created to apply the LDA model, and topic modeling was performed using the Gensim library [22]. To select the number of topics (a hyperparameter), the number of topics was increased one at a time, and the coherence score was reviewed at each step. The higher the coherence score, the more appropriate the number of topics. The coherence score was highest, at 0.34, when the number of topics was 4. The α value, the parameter governing the topic distribution within each document, was set to 0.05, and the β value, the parameter governing the word distribution within each topic, was set to 0.01. The number of iterations used in deriving the optimal number of topics was set to 100. In addition, the λ value, which adjusts the composition of the words shown for each topic, was varied to identify the words that represented each topic and to name the topics; λ was set to 0.8.
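For readers unfamiliar with LDA, the roles of α and β can be read from the model's standard generative description, written out below. The block assumes symmetric Dirichlet priors, which is our reading of the single scalar values reported; the text does not state this explicitly. The indices d (abstract), n (word position), and k (topic) are our notation.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), \quad \alpha = 0.05
  && \text{topic distribution of abstract } d \\
\varphi_k &\sim \mathrm{Dirichlet}(\beta), \quad \beta = 0.01
  && \text{word distribution of topic } k,\ k = 1, \dots, 4 \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d)
  && \text{topic assigned to the } n\text{th word of abstract } d \\
w_{d,n} \mid z_{d,n} &\sim \mathrm{Categorical}(\varphi_{z_{d,n}})
  && \text{observed word}
\end{align*}
```

Values of α and β well below 1 favor sparse distributions, so each abstract is dominated by a few topics and each topic by a limited set of words. In Gensim's `LdaModel`, these two priors correspond to the `alpha` and `eta` arguments.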
4. Ethical Consideration
Our institution does not require ethical approval for reporting text analysis.
RESULTS
1. Analysis of Keywords and Their Frequency in Cancer-Related Cognitive Impairment Research
A total of 24,030 words were extracted by refining the selected 490 abstracts. Figure 1 shows the top 15 words by simple frequency in the CRCI-related studies. Led by “Chemotherapy,” which appeared 843 times, the top 15 keywords also included “Breast cancer,” “Quality of life,” “Age,” “Women,” “Acute lymphoblastic leukemia,” “Depression,” “Fatigue,” “Symptom,” “Older,” “Executive function,” “Attention,” “Anxiety,” “Radiotherapy,” and “Working memory.”
2. Analysis of Network Connection Structure and Centrality of Cancer-Related Cognitive Impairment Research
As a result of network analysis of the top 30 frequent keywords among the 24,030 extracted keywords, a network connection structure consisting of 6,093 nodes and 18,959 links was confirmed (Figure 2). The density of the analyzed network was 0.001, and the average path length of the network was 3.771. Table 1 shows the degree, closeness, and betweenness centrality of the keywords in the CRCI-related studies. First, the top 5 keywords with high degree centrality, which indicates the degree to which one keyword is related to other keywords, were “Breast cancer,” “Chemotherapy,” “Quality of life,” “Age,” and “Acute lymphoblastic leukemia.” The top 5 keywords with high closeness centrality, which indicates proximity to other keywords, were “Breast cancer,” “Chemotherapy,” “Quality of life,” “Older,” and “Age.” The top 5 keywords with high betweenness centrality, which indicates a keyword's mediating role, were “Breast cancer,” “Chemotherapy,” “Quality of life,” “Age,” and “Acute lymphoblastic leukemia” (Table 1). Meanwhile, as a result of centrality analysis, the main keywords included in the top 30 were “Breast cancer,” “Chemotherapy,” “Quality of life,” “Age,” “Acute lymphoblastic leukemia,” “Symptom,” “Women,” “Older,” “Fatigue,” “Depression,” “Executive function,” “Attention,” “Stress,” “Neuropsychological test,” “Working memory,” “Radiotherapy,” “Surgery,” and “Verbal memory.”
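The reported density can be related to the node and link counts with the usual formula for a simple undirected graph, density = 2m / (n(n - 1)). The sketch below is a consistency check under that assumption; the text does not say whether the network was treated as directed or weighted.

```python
# Node and link counts reported for the keyword network.
nodes = 6_093
links = 18_959

# Density of a simple undirected graph: realized links / possible links.
density = 2 * links / (nodes * (nodes - 1))

# Mean number of links per node.
mean_degree = 2 * links / nodes

print(round(density, 3))      # 0.001
print(round(mean_degree, 2))  # 6.22
```

Under this assumption the computed density rounds to the reported value of 0.001, meaning that only about one in a thousand possible keyword pairs is directly linked.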
3. Topic Modeling Analysis
Four groups of topics were extracted as a result of the topic modeling (Figure 3). Topic 1 accounted for 41.2% of all papers, and included keywords such as “Chemotherapy,” “Breast cancer,” “Women,” “Age,” “Executive function,” “Acute lymphoblastic leukemia,” “Older,” “Quality of life,” “Neuropsychological test,” “Anxiety,” “Working memory,” “Fatigue,” “Attention,” “FACT-COG,” and “Testicular cancer.” Topic 1 was named “Chemotherapy-induced cognitive impairment” as it included major cancers that cause cognitive decline due to chemotherapy and papers measuring cognitive tests and cognitive domains.
Topic 2 accounted for 23.8% of all papers, and constituted keywords such as “Breast cancer,” “Quality of life,” “Chemotherapy,” “Fatigue,” “Symptom,” “Women,” “Physical activity,” “Age,” “Radiotherapy,” “Attention,” “Acute lymphoblastic leukemia,” “Androgen Deprivation Therapy,” “Depression,” “Older,” and “Psychological distress.” Topic 2 included papers measuring the cognitive function of patients with breast cancer and was thus named “Cognitive impairment in patients with breast cancer.”
Topic 3 accounted for 18.1% of all papers, and comprised keywords such as “Chemotherapy,” “Breast cancer,” “Age,” “Acute lymphoblastic leukemia,” “Quality of life,” “Depression,” “Older,” “Adolescent and young adult,” “Attention,” “Radiotherapy,” “Anesthesia,” “Surgery,” “Anxiety,” “HRQOL,” and “Insomnia.” Topic 3 included papers measuring the cognitive decline in patients with various cancers, including breast cancer, and was named “Factors related to cognitive decline.”
Topic 4 accounted for 16.9% of all papers and included keywords such as “Breast cancer,” “Chemotherapy,” “Quality of life,” “Work,” “Symptom,” “Prospective memory,” “Fatigue,” “Depression,” “Lung Cancer,” “Retrospective memory,” “Cognitive limitations,” “Minimental state examination,” “Age,” “Acute lymphoblastic leukemia,” and “Older.” As this topic included papers measuring factors such as “Symptoms,” “Fatigue,” “Depression,” and papers measuring “Prospective memory,” “Retrospective memory,” and “Cognitive limitations,” it was named “Experience symptoms of cognitive decline.”
DISCUSSION
TNA and topic modeling can serve as useful research methods for producing evidence for the formation of bodies of knowledge in nursing. This study identified the core keywords in the literature through TNA and topic modeling, and determined the relevance between keywords, their inter-relationships, and the context of core research topics [23]. Furthermore, we developed a network of keywords on CRCI-related studies and classified the outcomes into the following four groups.
The first group, namely “chemotherapy-induced cognitive impairment,” consisted of keywords related to subjective and objective cognitive decline in cancer patients receiving chemotherapy, accounting for 41.2% of the analyzed papers. “Chemotherapy” showed a notably high frequency of occurrence and high degree, closeness, and betweenness centrality, which means that many studies have been conducted on cognitive functions related to chemotherapy. In a previous study [3] that identified research trends on cognitive impairment in cancer survivors, the incidence of cognitive decline due to chemotherapy was reported to be the highest among those receiving hormone therapy, targeted therapy, and chemotherapy.
Keywords derived through topic modeling analysis were cognitive function tests such as “neuropsychological test” and cognitive domains such as “attention” and “working memory.” Cognitive function is divided into objective cognitive function, measured by neuropsychological tests, and subjective cognitive function, measured by self-reported questionnaires (e.g., Functional Assessment of Cancer Therapy Cognitive Scale). Objective cognitive function is divided into attention, memory, executive function, psychomotor processing speed, and language fluency [24]. Chemotherapy-related cognitive impairment was reported mainly in memory, processing speed, attention, and executive function, although there were differences between studies [3,7,10,11]. Subjective cognitive function is measured by a patient's perceived cognitive complaints, and its use remains controversial because of its low correlation with objective cognitive function tests [3]. However, subjective cognitive function should be considered important given that it can identify subtle cognitive changes in daily life perceived by patients and evaluate their impact on the patients’ quality of life.
The second group, namely “breast cancer and cognitive impairment,” consisted of keywords related to cognitive decline in patients with breast cancer, which accounted for 23.8% of the analyzed papers. In CRCI studies, breast cancer showed high frequency and high degree, closeness, and betweenness centrality. Keywords derived through topic modeling analysis were related to cognitive decline in patients with breast cancer, such as “chemotherapy,” “radiotherapy,” and “androgen deprivation therapy.” These therapies, individually or in combination, are commonly provided as adjuvant therapies to patients with breast cancer [11]. Cognitive complaints are common adverse effects for patients with breast cancer, with potential negative impacts on quality of life [9]. Nurses and nursing researchers should pay attention to the cognitive functions of patients with breast cancer and to interventions for them. However, considering that CRCI frequently affects not only breast cancer patients but also those suffering from colorectal cancer and hematologic tumors such as leukemia [25], further research is necessary to identify the pattern of occurrence and related factors of CRCI in patients with cancer.
The third group, namely “related factors for cognitive decline,” consisted of keywords related to demographics, diseases, and treatments for various cancer types and cognitive decline, which accounted for 18.1% of the analyzed papers. This result reflects that most CRCI-related research aimed to identify the characteristics, predictors, influencing factors, and mechanisms of cognitive impairment [26,27]. Factors related to subjective or objective cognitive impairment in patients with cancer include demographic characteristics such as age, education level, and intelligence (IQ); treatment-related characteristics such as chemotherapy, radiotherapy, and surgery; psychological factors such as depression, anxiety, stress, and worry; and other factors such as sleep disturbance and fatigue [26,28,29]. CRCI is a complex phenomenon mediated by various factors and cancer-related therapies [28]. Therefore, it is necessary to continuously identify factors affecting the occurrence of CRCI to identify high-risk groups for cognitive decline due to cancer and develop effective intervention strategies.
Most studies on CRCI focused on the occurrence of cognitive impairment and its related factors, whereas keywords on prevention, non-pharmacological therapies such as rehabilitation and training to improve cognitive impairment, and drug therapies had low frequency and centrality. Previous studies reported that cognitive stimulation, cognitive training, and physical activity effectively improve subjective cognitive function, memory, and concentration [30]. Therefore, in the future, it is necessary to develop and implement programs for improving cognitive impairment to help patients with cancer improve their quality of life, and to undertake studies to examine the effects of such programs.
The fourth group, namely “symptom experience of cognitive decline,” consisted of keywords on cognitive decline and symptom experience in patients with cancer, which accounted for 16.9% of the analyzed papers. Keywords derived through topic modeling analysis were identified as “quality of life,” “fatigue,” and “depression.” For “quality of life,” frequency and centrality were found to be high, along with those for “chemotherapy” and “breast cancer”; in the topic modeling analysis, it was evenly distributed across all subject groups. This could mean that quality of life was used as an outcome in CRCI research, and that cognitive decline has a significant impact on the quality of life of patients with cancer. In general, patients with cancer have a lower quality of life than the general population, and those with cognitive impairment experience various complex problems, such as psychosocial problems and issues with returning to work, compared to those without cognitive impairment, resulting in a lower quality of life [2,31]. Considering that a deterioration in the quality of life of patients with cancer has a negative impact on cancer recurrence and return to daily life, it is necessary to determine high-risk groups for cognitive decline at the time of cancer diagnosis and apply customized cognitive rehabilitation and training programs to improve the quality of life of patients with cancer [32].
In addition, keywords related to symptom clusters such as “fatigue,” “depression,” and “anxiety,” were derived from topic modeling analysis. Patients with cancer experience fatigue, depression, psychological distress, and anxiety even after the end of treatment [33]. Moreover, these symptoms, namely fatigue, depression, and anxiety, potentially cause cognitive impairment in patients with cancer [9,26,29], thus having a negative impact on their treatment and quality of life [33,34]. To improve cognitive decline of patients with cancer and help them return to daily life, individualized intervention and management is required in areas of “physical activity,” “education,” “strategies,” and “cognitive rehabilitation” as derived from the subject groups [9,34].
The significance of this study lies in its identification of the latest research trends through TNA and topic modeling analysis of previous CRCI studies. Cancer-related cognitive impairment is a common problem among patients with cancer. Therefore, nursing researchers should pay greater attention to conducting research on CRCI in patients with cancer. This study's findings enable a quantitative visualization of the current status of CRCI research and can be used by researchers to identify areas for future nursing studies. However, this study has several limitations. First, although the literature on cognitive impairment in patients with cancer was searched with a librarian's advice, only the papers that the two reviewers agreed met the selection criteria were included in the analysis. Second, the data require cautious interpretation because only English and Korean papers were selected for analysis. Third, although the NLTK package was used to extract and refine the words, the words used in the study were unified at the authors’ discretion, which may have introduced bias in keyword extraction from the abstracts. Finally, because this study focused on keywords with high frequencies and centralities, the results should be generalized with care.
CONCLUSION
This study conducted TNA and topic modeling of previous CRCI studies in patients with non-CNS cancer to identify the core keywords and network structure. This led to a discussion of the knowledge structure among the core keywords of cancer-related cognitive impairment. Based on the results, research on CRCI has primarily been conducted on chemotherapy-induced cognitive decline and on patients with breast cancer, to the extent that the term “chemobrain” has emerged. In addition, many studies have investigated the effects of CRCI and its related factors on the quality of life of patients with cancer. Nevertheless, further research is required on developing interventions to help patients with cancer recover from CRCI and adapt to and enjoy a good quality of life.
Notes
CONFLICTS OF INTEREST
Jin-Hee Park is currently an editorial board member of the Journal of Korean Academy of Fundamentals of Nursing. She was not involved in the review process of this manuscript. Otherwise, there was no conflict of interest.
AUTHORSHIP
Conceptualization - Park J-H; design - Park J-H; resources and acquisition of data - Kim H-J and Park J-H; analysis and interpretation of data - Kim H-J and Park J-H; writing-original draft preparation - Kim H-J, Bae SH, and Park J-H; writing-review and editing - Kim H-J, Bae SH, and Park J-H; supervision - Park J-H; funding acquisition - Park J-H. All authors have read and agreed to the published version of the manuscript.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author upon reasonable request.