Barrón-Cedeño et al.
Plagiarism Meets Paraphrasing
2. Paraphrase Typology
Typologies are a precise and efficient way to draw the boundaries of a certain
phenomenon, identify its different manifestations, and, in short, go into its characterization
in depth. Also, typologies constitute the basis of many corpus annotation processes,
which have their own effects on the typologies themselves: The annotation process
tests the adequacy of the typology for the analysis of the data, and allows for the
identification of new types and the revision of the existing ones. Moreover, an annotated
corpus following a typology is a powerful resource for the development and evaluation
of computational linguistics systems. In this section, after setting out a brief state of the
art on paraphrase typologies and the weaknesses they present, the typology used for
the annotation of the P4P corpus is described.
Paraphrase typologies have been addressed in different fields, including discourse
analysis, linguistics, and computational linguistics, which has given rise to typologies
that are very different in nature. Typologies coming from discourse analysis classify
paraphrases according to the reformulation mechanisms or communicative intention
behind them (Gülich 2003; Cheung 2009), but without focusing on the linguistic
nature of paraphrases themselves, which, in contrast, is our main focus of interest.
From the perspective of linguistic analysis, some typologies are strongly tied to
concrete theoretical frameworks, as is the case with Meaning–Text Theory (Mel’čuk 1992;
Milićević 2007). In this field, typologies of transformations and diathesis alternations
can be considered indirect approaches to paraphrasing in the sense that they deal
with equivalent expressions (Chomsky 1957; Harris 1957; Levin 1993). They do
not cover paraphrasing as a whole, however, but focus on lexical and syntactic
phenomena. Other typologies come from linguistics-related fields like editing (Faigley
and Witte 1981), which is interesting in our analysis because it is strongly tied to
paraphrasing.
A number of paraphrase typologies have been built from the perspective of
computational linguistics. Some of these typologies are simple lists of paraphrase types
useful for a specific system or application, or the most common types found in a corpus.
They are oriented toward a specific task and far from comprehensive: Barzilay, McKeown,
and Elhadad (1999), Dorr et al. (2004), and Dutrey et al. (2011), among others. Other
typologies classify paraphrases in a very generic way, setting out only two or three
types (Barzilay 2003; Shimohata 2004); these classifications do not reach the category of
typologies sensu stricto. Finally, there are more comprehensive typologies, such as the
ones by Dras (1999), Fujita (2005), and Bhagat (2009). They usually take the shape of very
fine-grained lists of paraphrase types grouped into bigger classes following different
criteria. They generally focus on these lists of specific paraphrase mechanisms, which
can never be exhaustive.
Our paraphrase typology is based on the paraphrase concept defined in
Recasens and Vila (2010) and Vila, Martí, and Rodríguez (2011), and consists of an
upgraded version of the one presented in the latter. Our paraphrase concept is based
on the idea that paraphrases should have the same or an equivalent propositional
content, that is, the same core meaning. This conception opens the door to
paraphrases sometimes disregarded in the literature, which has mainly focused on lexical
and syntactic mechanisms.
The paraphrase typology attempts to capture the general linguistic phenomena
of paraphrasing, rather than presenting a long, fine-grained, and inevitably
incomplete list of concrete mechanisms. In this sense, it also attempts to be
comprehensive of paraphrasing as a whole: It was contrasted with, and sometimes inspired by,