Creating an Online Dictionary of Abbreviations
from MEDLINE
Jeffrey T. Chang
Stanford Biomedical Informatics
jchang@smi.stanford.edu
Hinrich Schütze, Ph.D.
Novation Biosciences
hinrich@novationbio.com
Russ B. Altman, M.D., Ph.D.¹
Stanford Biomedical Informatics
Stanford School of Medicine
Medical School Office Building, X-215
251 Campus Dr.
Stanford, CA 94305
(650) 725-3394
russ.altman@stanford.edu
¹ To whom correspondence should be addressed.
Abstract
The immense volume and rapid growth of biomedical literature present special challenges
for humans as well as computer programs analyzing it. One such challenge comes from the
common use of abbreviations, which effectively augments the size of the vocabulary for the field. To
cope with this, we have developed an algorithm to identify abbreviations in text. It uses a
statistical learning algorithm, logistic regression, to score abbreviations based on their resemblance
to previously identified ones, achieving up to 84% recall at 81% precision. We then scanned all of
MEDLINE and found 781,632 high-scoring abbreviation definitions. We are making these
available as a public abbreviation server at http://abbreviation.stanford.edu/.
Chang, Creating an Online Dictionary... Page 3 of 23 pages
1 Introduction
The amount of literature in biomedicine is exploding as MEDLINE grows by 400,000 citations
each year. With biomedical knowledge expanding so quickly, professionals must acquire new
strategies to cope with it. To alleviate this, the biomedical informatics community is investigating
methods to organize ([1]), summarize ([2]), and mine ([3]) the literature.
Understanding biomedical literature is particularly challenging due to its expanding
vocabulary, including the unfettered introduction of new abbreviations. An automatic method to
define abbreviations would help researchers by providing a self-updating abbreviation dictionary
and would also facilitate computer analysis of text.
In this paper, we define abbreviation broadly to include all strings that are shortened forms of
a sequence of words (the long form). Although the term acronym appears more commonly in the
literature, it is typically defined more strictly as a conjunction of the initial letters of words; some
authors also require acronyms to be pronounceable.
Using such a strict definition excludes many types of abbreviations that appear in biomedical
literature. Authors create abbreviations in many different ways, as summarized here:
| Abbreviation | Definition | Description |
|---|---|---|
| VDR | vitamin D receptor | The letters align to the beginnings of the words. |
| PTU | propylthiouracil | The letters align to a subset of syllable boundaries. |
| JNK | c-Jun N-terminal kinase | The letters align to punctuation boundaries. |
| IFN | interferon | The letters align to some other place. |
| SULT | sulfotransferase | The abbreviation contains contiguous characters from a word. |
| ATL | adult T-cell leukemia | The long form contains words not in the abbreviation. |
| CREB-1 | CRE binding protein | The abbreviation contains letters not in the long form. |
| beta-EP | beta-endorphin | The abbreviation contains complete words. |
Nevertheless, the numerous lists of abbreviations covering many domains attest to broad
interest in identifying them. Opaui, a web portal for abbreviations, contains links to 152 lists
alone ([4]). Some are compiled by individuals or groups ([5], [6]). Others accept submissions
from users over the internet ([7], [8]). For the medical domain, a manually collected published
dictionary contains over 10,000 entries ([9]).
Because of the biomedical literature's size and growth, manual compilations of abbreviations
suffer from problems of completeness and timeliness. Automated methods for finding
abbreviations are therefore of great potential value. In general, these methods scan text for
candidate abbreviations and then apply an algorithm to match them with the surrounding text.
Most abbreviation finders fall into one of three types.
The simplest type of algorithm matches an abbreviation's letters to the initial letters of the
words around it. The algorithm for recognizing this is relatively straightforward, although it must
perform some special processing to ignore common words. Taghva gives the example Office
of Nuclear Waste Isolation (ONWI), where the O can be matched with the initial
letter of either Office or of ([10]).
More complex methods relax the first-letter requirement and allow matches to other
characters. These typically use heuristics to favor matches on the first letter or syllable
boundaries, uppercase letters, length of acronym, etc. ([11]). However, Yeates notes the challenge
in finding optimal weights for each heuristic and further posits that machine learning approaches
may help ([12]).
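A minimal sketch of this second type, assuming a toy set of heuristics: every in-order match of the abbreviation's letters inside the long form is enumerated, and each is scored by a weighted sum that rewards word-boundary and uppercase matches. The weights and function names are hypothetical; hand-picking such weights is the difficulty Yeates describes. The exhaustive enumeration is exponential in the worst case, which is tolerable only for short strings.

```python
from typing import Iterator, List, Optional

# Hypothetical heuristic weights chosen by hand for illustration.
WEIGHTS = {"word_start": 2.0, "uppercase": 1.0, "inside_word": -0.5}


def alignments(abbrev: str, long_form: str) -> Iterator[List[int]]:
    """Yield every in-order, case-insensitive match of abbrev's characters in long_form."""
    a, text = abbrev.lower(), long_form.lower()

    def rec(i: int, start: int) -> Iterator[List[int]]:
        if i == len(a):
            yield []
            return
        for j in range(start, len(text)):
            if text[j] == a[i]:
                for rest in rec(i + 1, j + 1):
                    yield [j] + rest

    yield from rec(0, 0)


def heuristic_score(positions: List[int], long_form: str) -> float:
    """Weighted sum of simple heuristics over the matched character positions."""
    score = 0.0
    for j in positions:
        # A boundary is the start of the string or any position after a non-alphanumeric.
        at_boundary = j == 0 or not long_form[j - 1].isalnum()
        score += WEIGHTS["word_start"] if at_boundary else WEIGHTS["inside_word"]
        if long_form[j].isupper():
            score += WEIGHTS["uppercase"]
    return score


def best_alignment(abbrev: str, long_form: str) -> Optional[List[int]]:
    """Return the highest-scoring alignment, or None if the characters cannot all be matched."""
    candidates = list(alignments(abbrev, long_form))
    if not candidates:
        return None
    return max(candidates, key=lambda p: heuristic_score(p, long_form))


print(best_alignment("JNK", "c-Jun N-terminal kinase"))
print(best_alignment("IFN", "interferon"))
```

For c-Jun N-terminal kinase, the weights steer the N onto the capital letter after the space rather than the n inside Jun. A pair such as CREB-1 / CRE binding protein gets no alignment at all, because the abbreviation contains characters absent from the long form.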
Another approach recognizes that the alignment between an abbreviation and its long form
often follows a set of patterns ([13], [14]). Thus, a set of carefully and manually crafted rules
governing allowed patterns can recognize abbreviations. Furthermore, one can control the
performance of the system by adjusting the set of rules, trading off between the leniency with which
a rule allows matches and the number of errors that it introduces.
In their rule-based system, Pustejovsky et al. introduced an interesting innovation by
including lexical information ([14]). Their insight is that abbreviations are often composed from
noun phrases, and that constraining the search to definitions in the noun phrases closest to the
abbreviation will improve precision. With the search constrained, they found that they could
further tune their rules to also improve recall.
Finally, there is one completely different approach to abbreviation search based on
compression ([15]). The idea here is that a correct abbreviation gives better clues to the best
compression model for the surrounding text than an incorrect one. Thus, a normalized
compression ratio built from the abbreviation gives a score capable of distinguishing
abbreviations.
In this paper, we present two contributions: a novel algorithm for identifying abbreviations,
and a publicly accessible abbreviation server containing all abbreviation definitions found in
MEDLINE.
2 Methods
We decompose the abbreviation-finding problem into four components: 1) scanning text for
occurrences of possible abbreviations, 2) aligning the candidates to the preceding text,
3) converting the abbreviations and alignments into a feature vector, and 4) scoring the feature
vector using a statistical machine learning algorithm (Figure 1).
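Steps 3 and 4 can be pictured with a small sketch. The two features below (the fraction of abbreviation letters aligned to word starts, and the number of long-form words left unused) and the weights are hypothetical stand-ins, not the feature set or fitted model used in this work; they only show how an alignment becomes a vector and how logistic regression maps that vector to a score between 0 and 1. In practice the weights would be learned from labeled examples.

```python
import math
import re
from typing import List, Sequence

# Hypothetical weights and bias, set by hand for illustration only.
WEIGHTS = [4.0, -1.5]
BIAS = -1.0


def feature_vector(long_form: str, positions: Sequence[int]) -> List[float]:
    """Turn one alignment into a small, hypothetical feature vector.

    positions[i] is the index in long_form matched by the i-th abbreviation letter.
    """
    # Indices where an alphanumeric run begins (after a space, hyphen, etc.).
    word_starts = {
        i for i, ch in enumerate(long_form)
        if ch.isalnum() and (i == 0 or not long_form[i - 1].isalnum())
    }
    frac_word_start = sum(p in word_starts for p in positions) / len(positions)
    # Whitespace-delimited words that no abbreviation letter aligned to.
    spans = [m.span() for m in re.finditer(r"\S+", long_form)]
    unused_words = sum(not any(s <= p < e for p in positions) for s, e in spans)
    return [frac_word_start, float(unused_words)]


def logistic_score(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic regression: sigmoid(bias + w . x), a value in (0, 1)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


x = feature_vector("vitamin D receptor", [0, 8, 10])  # VDR aligned to word starts
print(x, round(logistic_score(x, WEIGHTS, BIAS), 3))  # [1.0, 0.0] 0.953
```

A learned model has the advantage that the relative importance of each feature comes from data rather than from hand tuning, which addresses the weighting problem raised in the introduction.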
2.1 Finding Abbreviation Candidates
We searched for possible abbreviations inside parentheses, assuming that they followed the
pattern:
long form (abbreviation)
For every pair of parentheses, we retrieved the words up to a comma or semicolon. We
rejected candidates longer than two words, candidates without any letters, and candidates that
exactly matched the words in the preceding text.
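The candidate filters can be sketched as follows, assuming the input is a single sentence and parentheses are not nested. The function name is hypothetical, and the exact-match rule is read here as "the candidate repeats the words immediately before the parenthesis", which is an interpretation rather than a detail stated in the text.

```python
import re
from typing import List, Tuple

# Non-nested parenthesized spans only; nested parentheses are not handled.
PAREN = re.compile(r"\(([^()]*)\)")


def find_candidates(sentence: str) -> List[Tuple[str, List[str]]]:
    """Return (candidate, prefix_words) pairs for each acceptable parenthesized span."""
    results = []
    for m in PAREN.finditer(sentence):
        # Keep only the text up to the first comma or semicolon.
        inside = re.split(r"[,;]", m.group(1), maxsplit=1)[0].strip()
        words = inside.split()
        if not words or len(words) > 2:
            continue  # empty, or longer than two words
        if not any(ch.isalpha() for ch in inside):
            continue  # no letters at all, e.g. "(1998)"
        prefix_words = sentence[: m.start()].split()
        # Assumption: reject a candidate that repeats the words just before it.
        if prefix_words[-len(words):] == words:
            continue
        results.append((inside, prefix_words))
    return results


s = ("Binding of the vitamin D receptor (VDR) was measured (1998; ref 2) "
     "as before (see Figure 1 for details).")
print(find_candidates(s))
```

Only VDR survives: the year has no letters after truncation at the semicolon, and the figure reference is longer than two words. The saved prefix words are what the later alignment step searches for a long form.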
For each abbreviation candidate, we saved the words before the open parenthesis (the prefix)
so that we could search them for the abbreviation's long form. Although we could have included
every word from the beginning of the sentence, as a computational optimization, we only used
substrings
Acromed. The gold standard is publicly available as an XML file at
http://www.medstract.org/gold-standards.html.
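We ran our algorithm against the Medstract gold standard (after correcting 6 typographical
errors in the XML file) and generated a list of the predicted abbreviations, definitions, and their
scores. With these predictions, we calculated the recall and precision at every possible score
cutoff, generating a recall/precision curve. Recall,

$$\text{Recall} = \frac{\#\,\text{correct abbreviations}}{\text{all correct abbreviations}} \tag{5}$$

measures how thoroughly the method finds all the abbreviations. Precision,

$$\text{Precision} = \frac{\#\,\text{correct abbreviations}}{\text{all predictions}} \tag{6}$$

measures the fraction of predictions that are correct and therefore indicates how many errors the method produces.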
We counted an abbreviation/long form pair correct if it matched the gold standard exactly,
considering only the highest-scoring pair for each abbreviation. To be consistent with Acromed's
evaluation on Medstract, we allowed mismatches in 10 cases where the long form contained
words not indicated in the abbreviation. For example, we accepted protein kinase A for
PKA and did not require the full cAMP-dependent protein kinase A indicated in the
gold standard.
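The evaluation in Equations (5) and (6) can be sketched as below, under simplifying assumptions: the gold standard is a hypothetical dictionary with one long form per abbreviation, correctness is an exact string match (the 10 accepted mismatches are not modeled), and the function name is illustrative. Only the highest-scoring prediction per abbreviation is kept, and recall and precision are computed at every distinct score cutoff.

```python
from typing import Dict, List, Tuple

Prediction = Tuple[str, str, float]  # (abbreviation, long form, score)


def recall_precision_curve(
    predictions: List[Prediction], gold: Dict[str, str]
) -> List[Tuple[float, float, float]]:
    """Return (cutoff, recall, precision) at every distinct score cutoff."""
    # Consider only the highest-scoring long form for each abbreviation.
    best: Dict[str, Tuple[str, float]] = {}
    for abbrev, long_form, score in predictions:
        if abbrev not in best or score > best[abbrev][1]:
            best[abbrev] = (long_form, score)

    curve = []
    for cutoff in sorted({s for _, s in best.values()}, reverse=True):
        kept = [(a, lf) for a, (lf, s) in best.items() if s >= cutoff]
        # A pair is correct only if it matches the gold standard exactly.
        correct = sum(gold.get(a) == lf for a, lf in kept)
        curve.append((cutoff, correct / len(gold), correct / len(kept)))
    return curve


gold = {"VDR": "vitamin D receptor", "IFN": "interferon", "PKA": "protein kinase A"}
preds = [
    ("VDR", "vitamin D receptor", 0.9),
    ("VDR", "D receptor", 0.4),
    ("IFN", "interferon", 0.7),
    ("ATL", "adult leukemia", 0.6),
    ("PKA", "kinase A", 0.2),
]
for point in recall_precision_curve(preds, gold):
    print(point)
```

Lowering the cutoff can only keep recall the same or raise it, while precision drops as incorrect pairs such as the truncated PKA long form are admitted; that trade-off is what the recall/precision curve traces.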
In addition, we evaluated the coverage of the database against a list of abbreviations from the
China Medical Tribune, a weekly Chinese-language newspaper covering medical news from
Chinese journals [18]. The website includes a dictionary of 452 commonly used English medical
abbreviations with their long forms. We searched the database for these abbreviations (after
correcting 21 spelling errors) and calculated the recall as
$$\frac{\#\,\text{long forms identified}}{\#\,\text{abbreviations (}}$$