Applying Social Network Analysis to Understand the Percentages of Keywords within Abstracts of Journals: A System Review of Three Journals
Tsair Wei Chien1,2, Yang Shao3 and Willy Chou4,5*
1Research Departments, Chi-Mei Medical Center, Taiwan
2Department of Hospital and Health Care Administration, Chia-Nan University of Pharmacy and Science, Tainan, Taiwan
3Department of Electronics and Information Engineering, Tongji Zhejiang College, China
4Department of Sports Management, Chia Nan University of Pharmacy and Science, Taiwan
5Rehabilitation Department, Chi-Mei Medical Center, Taiwan
Submission: May 02, 2018; Published: July 02, 2018
*Corresponding author: Willy Chou, Department of Sports Management, College of Leisure and Recreation Management, Chia Nan University of Pharmacy and Science, Tainan, Taiwan, Email: firstname.lastname@example.org
How to cite this article: Tsair Wei Chien, Yang Shao, Willy Chou. Applying Social Network Analysis to Understand the Percentages of Keywords within
Abstracts of Journals: A System Review of Three Journals. Curr Trends Biomedical Eng & Biosci. 2018; 16(1): 555926.
Background: Academic literature suggests that keywords retrieved from a paper’s title and abstract represent important concepts in that study. The percentage of keywords within an abstract (PKWA), however, has yet to be investigated.
Objective: To compare the PKWA across journals of medical informatics and to examine their keyword network relationships, in order to develop a self-examining policy for the journal.
Methods: We selected 5,985 abstracts and their corresponding keywords from three journals (JMIR, JAMIA, and BMC Med Inform Decis Mak.) published between 1995 and April 2017 and indexed by the US National Library of Medicine, National Institutes of Health (Pubmed.org). We computed the PKWA for each journal using MS Excel modules and compared the percentage differences across journals and years via a two-way ANOVA. Social Network Analysis (SNA) was performed to explore the relations among keywords in the journals.
Results: The PKWA values are 48.81, 41.59, and 56.84 for the three journals, respectively. A statistically significant difference (p<0.05) was found in the percentages among the selected journals. In contrast, no differences (p>0.05) were found (1) between years (2016 and 2017) or (2) in the interaction effects between journals and years. The three journals display significantly different patterns in keyword networks and major cohesion measures.
Conclusion: A computer module is needed to inspect whether keywords appear within abstracts. The cohesion measure provides journal editors with a method of examining keywords within the abstract of a paper under review.
Keywords: Cohesion measure; Journal; Social network analysis; Abstract
Authors are required to provide three to ten keywords that represent the main content of the article when submitting it to a journal [1-5]. Keywords or short phrases published with an abstract can assist indexers in cross-indexing the article. However, few studies have investigated whether keywords are substantially associated with the abstract and what percentages of keywords truly exist in the corresponding abstract.
Meanwhile, some computer scientists have placed high hopes in machine-learning algorithms, data mining, and artificial intelligence. These methods build on recently developed technologies of Natural Language Processing (NLP) and Text Prediction to process natural spoken language, to read unstructured data in Big Biomedical Data (BBD), to comprehend the intent of physicians, to quantify research information, and even to create a structured database [6-16]. Furthermore, informal patient data on the Web are increasing, accessible, inexpensive, available in real time, and seem likely to cover a significant proportion of the population. Accordingly, extracting the intent of authors from unstructured journal papers may become feasible in the near future. The suitability of keywords for use in an index should therefore be examined.
In the literature, keywords retrieved from a paper’s title and abstract as words important to a study help readers find the article. We expect keywords to be specific enough to represent the manuscript content. Whether each keyword appears in the accompanying abstract requires analysis. The Percentage of Keywords (PKW) within an abstract for a paper can be used to
Searching PubMed for the keywords “internet OR Internet” on 2017/04/24 returned 84,069 published papers, of which 2,073 were published in J Med Internet Res. Which keyword in these papers is most closely associated with “internet” is still unknown. An apocryphal story often told to illustrate data mining concepts is about beer and diaper sales, which were strongly correlated [17-19]. We are interested in using Social Network Analysis (SNA) [20-22] to analyze keywords in relation to a journal’s aims and scope, as some studies have done for co-authorship relations within and between papers [23-25].
The SNA approach [26-28] treats each keyword as a “node” of a keyword network (e.g., a square box) connected to another node by a relation represented as an edge (e.g., an arrow line) [20,24]. For instance, the string 4 3 5 denotes that keyword 4 is associated with keyword 3 five times (i.e., with a weight of 5) within a specific period, displayed graphically as γ4γ3. Several algorithms and measures have been applied to SNA. When the aim is to investigate the status of an actor in the network, centrality measures should be applied; that is, an actor is generally analyzed by its centrality [29,30].
In this study, we selected three journals (i.e., J Med Internet Res. [JMIR], J Am Med Inform Assoc. [JAMIA], and BMC Med Inform Decis Mak. [MIDM]) from the category of medical informatics to compare their PKWA and their centrality (which takes into account three measures, namely Degree centrality, Closeness centrality, and Betweenness centrality, for the papers published in each journal).
Our aims are to
a) Compare the PKWA among journals.
b) Show the pattern of a journal according to the keywords’
association and compute the macro cohesion measure.
c) Apply SNA to identify whether authors’ papers target
the journal’s scope and aims according to the minor cohesion
measure of the journal.
d) Evaluate the equality of centrality for a journal using
Ferguson’s delta coefficient [31-34].
We selected 5,985 abstracts and their corresponding keywords from three journals (JMIR, JAMIA, and MIDM) published between 1995 and 2017 (April 15th) and indexed by the US National Library of Medicine, National Institutes of Health (Pubmed.com), and computed the PKWA using an Excel module made by the author. All papers with one or more keywords were included. As shown in Multimedia 1, keywords are available from 2013 for both JMIR and JAMIA but only from 2016 for MIDM. Figure 1 presents the study flowchart.
We used two ways to show each journal’s PKWA: (i) the MML (Method of Maximum Likelihood) with a diagram comparison and (ii) a mean comparison using a two-way ANOVA across journals and years. The former takes as the journal’s PKWA the value with the maximal count among all possible PKWA values (from 0 to 1.0 at intervals of 0.1, plus nil for articles with no keyword). The latter computes the total count within the respective abstracts over the total count across all abstracts for a specific journal in 2016 and 2017, the only overlapping years, because MIDM keywords were available only from 2016.
Percentages of keywords within an abstract across years and journals.
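The first approach can be illustrated with a minimal sketch. It is not the authors' Excel module: the function names `pkwa` and `modal_bin` are hypothetical, and a case-insensitive substring match is assumed as the test of whether a keyword "appears" in an abstract (so a short keyword would also match inside a longer word).

```python
from collections import Counter


def pkwa(keywords, abstract):
    """Fraction of a paper's keywords found in its abstract (None if no keywords)."""
    if not keywords:
        return None  # the "nil" category: the paper lists no keywords
    text = abstract.lower()
    # Assumption: a keyword counts as present if it occurs as a substring.
    hits = sum(1 for kw in keywords if kw.strip().lower() in text)
    return hits / len(keywords)


def modal_bin(values):
    """Round each PKWA to the nearest 0.1 and return the most frequent bin."""
    bins = Counter("nil" if v is None else round(v * 10) / 10 for v in values)
    return bins.most_common(1)[0][0]


# Made-up papers, for illustration only.
papers = [
    (["internet", "social media", "eHealth"],
     "We surveyed internet users about social media habits."),
    (["telemedicine", "privacy"], "A telemedicine pilot study."),
    ([], "An editorial without keywords."),
]
scores = [pkwa(kws, text) for kws, text in papers]
print(scores)
```

Applying `modal_bin` to all scores of one journal gives the peak of its line chart, which is how the maximal-count value is read here.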
For ease of display, we selected the keyword pairs with the strongest association, i.e., those most often listed together in an article: we extracted the top 100 pairs with the highest linkage count, using Pajek SNA software to draw the visualized representations. The wider and darker the linkage line between two keywords (i.e., the edge between nodes in SNA), the stronger the association. A larger bubble represents a higher probability of a keyword’s occurrence in the journal. Nodes with an identical color belong to a similar category of keyword occurrence number. We chose the weighted degree centrality measure to draw the keyword pattern and selected the separate component algorithm to plot the drawing. For detailed information, interested readers are advised to refer to “Extracting data using an author-made MS Excel module” at http://www.healthup.org.tw/marketing/course/marketing/
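The pair-counting step can be sketched as follows. This is an illustration only, not the authors' Excel or Pajek workflow: `top_keyword_pairs` is a hypothetical name, keywords are assumed to be compared after lower-casing and trimming, and each pair is counted once per paper.

```python
from collections import Counter
from itertools import combinations


def top_keyword_pairs(papers, n=100):
    """Count how often two keywords are listed in the same paper; return the n heaviest pairs."""
    counts = Counter()
    for keywords in papers:
        # Normalize and de-duplicate so that each pair is counted once per paper.
        unique = sorted({kw.strip().lower() for kw in keywords if kw.strip()})
        counts.update(combinations(unique, 2))
    return counts.most_common(n)


# Made-up keyword lists, one per paper.
papers = [
    ["internet", "social media", "eHealth"],
    ["Internet", "social media"],
    ["electronic health records", "internet"],
]
for (first, second), weight in top_keyword_pairs(papers, n=2):
    # Each line is a "node node weight" triple, like the string 4 3 5 in the text.
    print(first, second, weight)
```

Each resulting triple corresponds to one weighted edge of the keyword network, which is the form of input a network-drawing tool needs.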
Three centrality measures are usually applied in SNA:
1. Degree centrality of a node is defined as the total number of edges that are adjacent to this node. It measures how many linkages directly connect a keyword to its neighbours in the network.
2. Closeness centrality focuses on how close an actor is to all other actors. It is measured as a function of mean geodesic (shortest) distances. Closeness centrality thus extends the description of degree centrality by identifying the keyword that is relatively closest to all the other keywords.
3. Betweenness centrality operationalizes centrality by specifying how often a node is found on the shortest route between each pair of nodes in the network.
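The three measures above can be sketched in plain Python. The sketch makes simplifying assumptions that the study may not share: the network is treated as undirected and unweighted (the study's drawings use weighted degree), closeness is taken as the number of reachable nodes divided by the sum of shortest distances to them, and betweenness is accumulated with Brandes' algorithm. The function name `centralities` is hypothetical.

```python
from collections import deque


def centralities(edges):
    """Return (degree, closeness, betweenness) dicts for an undirected, unweighted graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    closeness = {}
    betweenness = dict.fromkeys(adj, 0.0)

    for s in adj:
        # Breadth-first search from s: distances, shortest-path counts, predecessors.
        dist = {s: 0}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Closeness: reachable nodes divided by the sum of distances to them.
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total else 0.0

        # Brandes' accumulation: credit nodes lying on shortest paths from s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]

    # Every undirected pair was visited from both of its endpoints.
    betweenness = {v: b / 2 for v, b in betweenness.items()}
    return degree, closeness, betweenness


# Made-up star network: "internet" is linked to three other keywords.
edges = [("internet", "social media"), ("internet", "ehealth"), ("internet", "telemedicine")]
degree, closeness, betweenness = centralities(edges)
print(degree["internet"], closeness["internet"], betweenness["internet"])  # 3 1.0 3.0
```

In this toy network the hub keyword scores highest on all three measures, which is the kind of core category a dominant keyword forms.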
Because the three measures are on different scales, we standardized each of them to follow N(0,1). The cohesion measure for examining the extent to which a paper targets a journal’s scope is obtained by averaging the three standardized centrality measures. A higher cohesion measure means a stronger keyword association with the journal’s features. For detailed information, interested readers are recommended to consult “Computing major cohesion measure for a journal using Pajek SNA software” at http://www.healthup.org.tw/marketing/course/
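A minimal sketch of this averaging step follows. It assumes that "standardized" means ordinary z-scores computed with the population standard deviation, which the text does not specify, and it works keyword by keyword; how keyword scores are aggregated to the paper or journal level is not shown. The names `zscores` and `cohesion` are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list to mean 0 and SD 1 (all zeros if there is no spread)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd if sd else 0.0 for v in values]


def cohesion(degree, closeness, betweenness):
    """Average the three standardized centralities, position by position."""
    columns = [zscores(degree), zscores(closeness), zscores(betweenness)]
    return [mean(triple) for triple in zip(*columns)]


# Made-up centralities for four keywords; the first is the hub of a star network.
scores = cohesion([3, 1, 1, 1], [1.0, 0.6, 0.6, 0.6], [3.0, 0.0, 0.0, 0.0])
print(scores)
```

Standardizing first keeps betweenness, which can be large, from swamping closeness, which is at most 1 under the normalization assumed here.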
Ferguson’s delta [31-34] is an index of discrimination measured by the proportion of discriminations (i.e., the degree to which scores follow a uniform distribution). A normal distribution is reported to be expected to have a delta above 0.90. We applied it to examine whether the journals have an identical delta coefficient. A higher value means a more uniform distribution of cohesion measures among the journal’s papers.
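For reference, Ferguson’s delta for n scores that can fall into k categories with frequencies f_i is k(n² − Σf_i²) / ((k − 1)n²). The sketch below implements that formula. It assumes the scores have already been discretized into k categories; how the continuous cohesion measures were binned in this study is not stated, so any binning is an assumption. The function name `fergusons_delta` is hypothetical.

```python
from collections import Counter


def fergusons_delta(scores, k):
    """Ferguson's delta for discrete scores that can take k possible values (k >= 2)."""
    n = len(scores)
    sum_sq = sum(f * f for f in Counter(scores).values())
    return k * (n * n - sum_sq) / ((k - 1) * n * n)


print(fergusons_delta([0, 1, 2, 3], k=4))  # 1.0: every category used equally
print(fergusons_delta([2, 2, 2, 2], k=4))  # 0.0: no discrimination at all
```

The two extremes show why a higher delta is read as a more uniform spread of scores.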
Summarizing the data from Multimedia 1, we examined the top point on the line chart for each journal in Figure 2 (i.e., 30% for JMIR, 80% for MIDM, and nil for JAMIA) and found that JAMIA has many articles without any keyword in the period from 2013 to April 2017. If the nil portion is ignored (e.g., non-research articles such as perspectives, reviews, and editorials), JAMIA’s PKW by the MML approach is 30%, equal to that of JMIR.
From Table 1, we can see that a significant difference exists among the journals, but there is no difference between years (i.e., 2016 and 2017). The PKWA means are 56.84% (MIDM), 48.82% (JMIR), and 41.59% (JAMIA), respectively.
The keywords most frequently listed by authors in papers (with keywords in the period from 2013 to 2017) are internet (JMIR), electronic health records (JAMIA), and area under the curve (MIDM); see Table 2. By contrast, the most frequent keywords are information (JMIR), ONC (JAMIA), and clinical (MIDM) when the journal keywords (2,051 in JMIR, 2,688 in JAMIA, and 1,246 in MIDM) are used to search the abstracts of all papers since each journal began publication.
We traced the keyword patterns of the three journals. Internet and electronic health records form a significant core category in the networks of JMIR (Figure 3A) and JAMIA (Figure 3B), respectively. MIDM, on the other hand, shows no core category in its network (Figure 3C). The most closely associated keyword pairs are internet and social media for JMIR in Figure 3A, electronic health records and health information technology policy for JAMIA in Figure 3B, and decision aids and shared decision making for MIDM in Figure 3C.
Excluding papers without any keyword, the macro cohesion measures [= mean of standardized centrality = (Weighted all degree + Closeness + Betweenness)/3] are 7.73 (SD=0.24) for JMIR, 4.47 (SD=0.25) for JAMIA, and 1.01 (SD=0.21) for MIDM, respectively, indicating that JMIR earns the greatest
cohesion measure. The Ferguson’s delta coefficients are 0.96 for JMIR, 0.90 for JAMIA, and 0.97 for MIDM, respectively, implying that JAMIA suffers from less equality in the macro cohesion measure; see Figure 4.
The journal with the most cohesion is JMIR, with a measure of 7.73 (SD=0.24). Both JMIR and MIDM earn a high Ferguson’s delta coefficient (0.96 and 0.97). Although MIDM attains the highest PKWA among the three journals, its keyword count begins in 2016, later than its two counterparts, which start in 2013.
Many studies have reported co-authorship relations within and between papers using SNA [23-25]. The association between beer and diaper sales [17-19] can be easily found by the SNA approach. However, we have not seen any paper that uses keywords to investigate a journal’s cohesion tendency and its PKWA, even though keywords are required to be extracted from a paper’s title and abstract to help readers interested in its topic find the article in the future.
Through this study, we suggest that the journal editor’s assistant be able to (i) objectively measure the extent of paper cohesion in accordance with the journal’s scope and aims, as in Figure 4, (ii) efficiently examine keywords appearing in each paper’s abstract, and (iii) graphically depict a journal’s keyword associations, as in Figure 3.
Machine-learning algorithms and data mining have
incorporated artificial intelligence based on Natural Language
Processing (NLP) and Text Predictions to interpret natural spoken
language [6-16], which could be applied to an article and its
abstract. Before reaching this milestone, we are looking forward
to seeing more papers that analyze keywords among similar
journals using SNA.
In statistics, Exploratory Data Analysis (EDA) is an approach to analyzing data in order to summarize their main characteristics, often with visual methods. Thus, EDA discovers what the data can tell us beyond formal modelling or hypothesis testing. The information shown in Figure 3 can help us understand a journal’s image through keyword SNA. Furthermore, journal editors and reviewers will focus more effort on keywords and their PKW in the future. As a result, a journal’s aims and scope will be clearly recognizable from its keywords’ alignment with the abstracts and titles of its contents.
Readers may be curious about the relations between the centrality measures. We conducted a small study on the correlations among Degree, Closeness, and Betweenness. Closeness centrality is less correlated with the other two measures (i.e., corr.≈0.30), whereas Degree is closely associated with Betweenness (i.e., corr.>0.90). For simplicity’s sake, either Degree or Betweenness can be selected as a measure in the future; see Multimedia 3. In addition, a keyword should be a noun rather than an adjective. Yet some adjectives, such as medical and clinical, along with abbreviations and acronyms, were found in Table 2. Journal editors and reviewers should put more emphasis on keyword correction and are advised to use the MeSH term checking system in the future.
We present two videos in Multimedia 2 and 3 for interested readers: (i) how to extract data from internet cloud databases such as the US National Library of Medicine, National Institutes of Health (Pubmed.org), and (ii) how to compute the cohesion measure using Pajek SNA software. Future researchers are encouraged to apply this approach to other journals’ keywords using SNA, which is somewhat different from the search and extraction methods in the literature.
We used SNA to analyze keyword associations in journals, which differs from other studies that applied it to health report issues [21,41]. In Figure 3, we can see that JMIR is dominated by the keyword “internet” and JAMIA by “electronic health records”, because the closest association pairs in both journals are centred on these keywords. As for MIDM, no particular term dominates the journal. This illustrates how EDA differs from initial data analysis (IDA), which focuses more narrowly on checking the assumptions required for model fitting and hypothesis testing, handling missing values, and transforming variables as needed. EDA thus encompasses IDA to help us in policy making.
This study has several limitations. First, all data were extracted from Pubmed.com. Some keywords were originally saved incorrectly in the dataset, for example with commas, asterisks, or periods acting as separators between keywords, which will affect the results and inferences of the study. Second, many algorithms are used for SNA. We applied only the separate components algorithm shown in Figure 3. Any change of algorithm will produce a different pattern and judgment. Third, we applied Ferguson’s delta as a uniform distribution index, which cannot indicate better or worse performance for a journal when the cohesion measure is the indicator used in the study. The major cohesion measure (i.e., the mean of the minor cohesion measures of papers) is suggested for determining how well a journal’s aims and scope are attained. A cut-off point will need to be determined in the future for any specific journal. Fourth, social network analysis is not limited to the Pajek software we used in this study. Others, such as Ucinet and Gephi, are recommended to readers for future studies on journal keyword analysis.
A computer module is needed to inspect whether keywords appear within abstracts. The cohesion measure provides journal editors with a way to examine keywords accurately within an abstract before reviewing the paper.