although we could).
# get first matched group
sub(sig_pattern, "\\1", sig_hrefs)
## [1] "https://stat.ethz.ch/mailman/listinfo/r-sig-mac"
## [2] "https://stat.ethz.ch/mailman/listinfo/r-sig-db"
## [3] "https://stat.ethz.ch/mailman/listinfo/r-sig-debian"
## [4] "https://stat.ethz.ch/mailman/listinfo/r-sig-dynamic-models"
## [5] "https://stat.ethz.ch/mailman/listinfo/r-sig-epi"
## [6] "https://stat.ethz.ch/mailman/listinfo/r-sig-ecology"
## [7] "https://stat.ethz.ch/mailman/listinfo/r-sig-fedora"
## [8] "https://stat.ethz.ch/mailman/listinfo/r-sig-finance"
## [9] "https://stat.ethz.ch/mailman/listinfo/r-sig-geo"
## [10] "https://stat.ethz.ch/mailman/listinfo/r-sig-gr"
## [11] "https://stat.ethz.ch/mailman/listinfo/r-sig-gui"
## [12] "https://stat.ethz.ch/mailman/listinfo/r-sig-hpc"
## [13] "https://stat.ethz.ch/mailman/listinfo/r-sig-jobs"
## [14] "https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models"
## [15] "https://stat.ethz.ch/mailman/listinfo/r-sig-mediawiki"
## [16] "https://stat.ethz.ch/mailman/listinfo/r-sig-networks"
## [17] "https://stat.ethz.ch/mailman/listinfo/r-sig-phylo"
## [18] "https://stat.ethz.ch/mailman/listinfo/r-sig-qa"
## [19] "https://stat.ethz.ch/mailman/listinfo/r-sig-robust"
## [20] "https://stat.ethz.ch/mailman/listinfo/r-sig-s"
## [21] "https://stat.ethz.ch/mailman/listinfo/r-sig-teaching"
## [22] "https://stat.ethz.ch/mailman/listinfo/r-sig-wiki"
As you can see, we are using the replacement pattern "\\1" in the sub() function. Generally
speaking, \\N is replaced with the N-th group specified in the regular expression. The first
matched group is referenced by \\1. In our example, the first group is everything contained in
the parentheses, that is: (https.*), which are in fact the links we are looking for.
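To make the group replacement concrete, here is a minimal sketch on a small toy input. The vector toy_hrefs and the pattern toy_pattern are hypothetical stand-ins made up for illustration; they are not the sig_hrefs and sig_pattern objects used above, whose exact contents may differ.

```r
# toy anchor tags (hypothetical sample, not the real sig_hrefs)
toy_hrefs <- c(
  '<a href="https://stat.ethz.ch/mailman/listinfo/r-sig-mac">R-SIG-Mac</a>',
  '<a href="https://stat.ethz.ch/mailman/listinfo/r-sig-db">R-SIG-DB</a>'
)

# hypothetical pattern: match the whole string and capture
# the URL inside the double quotes as group 1
toy_pattern <- '^.*href="(https[^"]*)".*$'

# replace the entire match with the contents of the first group
links <- sub(toy_pattern, "\\1", toy_hrefs)
links

# with two groups, \\1 and \\2 refer to them in order of appearance
swapped <- sub("(\\w+)-(\\w+)", "\\2-\\1", "sig-mac")
swapped
```

Because the toy pattern is anchored with ^ and $, the whole string is matched, so only the captured URL remains after the substitution.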
7.4 Text Analysis of BioMed Central Journals
For our last application we will analyze some text data: the catalog of journals from
BioMed Central (BMC), a scientific publisher that specializes in open access journal
publication. You can find more information about BMC at:
http://www.biomedcentral.com/about/catalog
The data with the journal catalog is available in csv format at:
http://www.biomedcentral.com/journals/biomedcentraljournallist.txt
CC BY-NC-SA 3.0 Gaston Sanchez
Handling and Processing Strings in R