Open Babel Documentation, Release 2.3.1
MOL_00000083
Tanimoto from MOL_00000067 = 0.810811
MOL_00000105
Tanimoto from MOL_00000067 = 0.833333
MOL_00000296
Tanimoto from MOL_00000067 = 0.425926
MOL_00000320
Tanimoto from MOL_00000067 = 0.534884
MOL_00000328
Tanimoto from MOL_00000067 = 0.511111
MOL_00000338
Tanimoto from MOL_00000067 = 0.522727
MOL_00000354
Tanimoto from MOL_00000067 = 0.534884
MOL_00000378
Tanimoto from MOL_00000067 = 0.489362
MOL_00000391
Tanimoto from MOL_00000067 = 0.489362
10 molecules converted
Large datasets
On larger datasets it is necessary to first build a fastsearch index. This is a new file that stores a database of fingerprints for the files indexed. You will still need to keep both the new .fs fastsearch index and the original files. However, the new index will allow significantly faster searching and similarity comparisons. The index is created with the following command:
babel mymols.sdf -ofs
This builds mymols.fs with the default fingerprint (unfolded). The following command uses the index to find the 5 most similar molecules to the molecule in query.mol:
babel mymols.fs results.sdf -squery.mol -at5
or to get the matches with Tanimoto > 0.6 to 1,2-dicyanobenzene:
babel mymols.fs results.sdf -sN#Cc1ccccc1C#N -at0.6
5.1.2 Substructure searching
Small datasets
This command will find all molecules containing 1,2-dicyanobenzene and return the results as SMILES strings:
babel mymols.sdf -sN#Cc1ccccc1C#N results.smi
If all you want output are the molecule names, then adding -xt will return just the molecule names:
babel mymols.sdf -sN#Cc1ccccc1C#N results.smi -xt
The parameter of the -s option in these examples is actually SMARTS, which allows a richer matching specification, if required. It does mean that the aromaticity of atoms and bonds is significant; use [#6] rather than C to match both aliphatic and aromatic carbon.
The -s option's parameter can also be a file name with an extension. The file must contain a molecule, which means only substructure matching is possible (rather than full SMARTS). The matching is also slightly more relaxed with respect to aromaticity.
Large datasets
First of all, you need to create a fastsearch index (see above). The index is created with the following command:
babel mymols.sdf -ofs
Substructure searching is as for small datasets, except that the fastsearch index is used instead of the original file. This command will find all molecules containing 1,2-dicyanobenzene and return the results as SMILES strings:
babel mymols.fs -ifs -sN#Cc1ccccc1C#N results.smi
If all you want output are the molecule names, then adding -xt will return just the molecule names:
babel mymols.fs -ifs -sN#Cc1ccccc1C#N results.smi -xt
5.1.3 Case study: Search ChEMBLdb
This case study uses a combination of the techniques described above for similarity searching using large databases and using small databases. Note that we are using the default fingerprint for all of these analyses. The default fingerprint is FP2, a path-based fingerprint (somewhat similar to the Daylight fingerprints).
1. Download Version 2 of ChEMBLdb from ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/.
2. After unzipping it, make a fastsearch index (this took 18 minutes on my machine for the 500K+ molecules):
babel chembl_02.sdf -ofs
3. Let's use the first molecule in the sdf file as a query. Using Notepad (or on Linux, head -79 chembl_02.sdf) extract the first molecule and save it as first.sdf. Note that the molecules in the ChEMBL sdf do not have titles; instead, their IDs are stored in the "chebi_id" property field.
4. This first molecule is 100183. Check its ChEMBL page. It's pretty weird, but is there anything similar in ChEMBLdb? Let's find the 5 most similar molecules:
babel chembl_02.fs mostsim.sdf -s first.sdf -at5
5. The results are stored in mostsim.sdf, but how similar are these molecules to the query?
babel first.sdf mostsim.sdf -ofpt
>
>
Tanimoto from first mol = 1
Possible superstructure of first mol
>
Tanimoto from first mol = 0.986301
>
Tanimoto from first mol = 0.924051
Possible superstructure of first mol
>
Tanimoto from first mol = 0.869048
Possible superstructure of first mol
>
Tanimoto from first mol = 0.857143
6 molecules converted
76 audit log messages
6. That's all very well, but it would be nice to show the ChEBI IDs. Let's set the title field of mostsim.sdf to the content of the "chebi_id" property field, and repeat step 5:
babel mostsim.sdf mostsim_withtitle.sdf --append "chebi_id"
babel first.sdf mostsim_withtitle.sdf -ofpt
>
>100183
Tanimoto from first mol = 1
Possible superstructure of first mol
>124893
Tanimoto from first mol = 0.986301
>206983
Tanimoto from first mol = 0.924051
Possible superstructure of first mol
>207022
Tanimoto from first mol = 0.869048
Possible superstructure of first mol
>607087
Tanimoto from first mol = 0.857143
6 molecules converted
76 audit log messages
7. Here are the ChEMBL pages for these molecules: 100183, 124893, 206983, 207022, 607087. I think it is fair to say that they are pretty similar. In particular, the output states that 206983 and 207022 are possible superstructures of the query molecule, and that is indeed true.
8. How many of the molecules in the dataset are superstructures of the molecule in first.sdf? To do this and to visualize the large numbers of molecules produced, we can output to SVG format (see SVG 2D depiction (svg)):
obabel chembl_02.fs -O out.svg -s first.sdf
Note that obabel has been used here because of its more flexible option handling.
This command does a substructure search and puts the 47 matching structures in the file out.svg. This can be viewed in a browser like Firefox, Opera or Chrome (but not Internet Explorer). The display will give an overall impression of the set of molecules but details can be seen by zooming in with the mouse wheel and panning by dragging with a mouse button depressed.
9. The substructure that is being matched can be highlighted in the output molecules by adding another parameter to the -s option. Just for variety, the display is also changed to a black background, 'uncolored' (no element-specific coloring), and terminal carbon not shown explicitly. (Just refresh your browser to see the modified display.)
obabel chembl_02.fs -O out.svg -s first.sdf green -xb -xu -xC
This highlighting option also works when the -s option is used without fastsearch on small datasets.
10. The substructure search here has two stages. The indexed fingerprint search quickly produces 62 matches from the 500K+ molecules in the dataset. Each of these is then checked by a slow detailed isomorphism check. There are 15 false positives from the fingerprint stage. These are of no significance, but you can see them using:
obabel chembl_02.fs -O out.svg -s ~first.sdf
The fingerprint search is unaffected but the selection in the second stage is inverted.
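The two-stage search can be sketched in a few lines of Python. This is only an illustration of the idea, not Open Babel's implementation: fingerprints are modelled as sets of "on" bit positions, the screen assumes that every bit set in the query must also be set in a matching target, and exact_match is a hypothetical stand-in for the detailed isomorphism check. All names and bit values are invented for the example.

```python
def screen(query_bits, index):
    """Stage 1: keep entries whose fingerprint contains every bit set in the query."""
    return [name for name, bits in index.items() if query_bits <= bits]


def substructure_search(query_bits, index, exact_match, invert=False):
    """Stage 2: run the slow check only on the screened candidates.

    With invert=True the second-stage selection is inverted, so the
    result is the list of false positives from the fingerprint stage.
    """
    candidates = screen(query_bits, index)
    return [name for name in candidates if exact_match(name) != invert]


# Toy index: fingerprints as sets of bit positions (hypothetical values).
index = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 9},
    "mol_c": {2, 4, 7},
}
query = {1, 4}
true_hits = {"mol_a"}  # pretend only mol_a passes the detailed check


def exact_match(name):
    """Hypothetical stand-in for the isomorphism check."""
    return name in true_hits


hits = substructure_search(query, index, exact_match)
false_positives = substructure_search(query, index, exact_match, invert=True)
print(hits, false_positives)
```

The cheap set comparison discards most of the dataset, so the expensive check runs on only a handful of candidates, which is the reason for building the index in the first place.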
5.2 Spectrophores™
5.2.1 Introduction
Spectrophores [1] are one-dimensional descriptors generated from the property fields surrounding the molecules. This technology allows the accurate description of molecules in terms of their surface properties or fields. Comparison of molecules' property fields provides a robust structure-independent method of aligning actives from different chemical classes. When applied to molecules such as ligands and drugs, Spectrophores can be used as powerful molecular descriptors in the fields of chemoinformatics, virtual screening, and QSAR modeling.
[1] Spectrophore is a registered trademark of Silicos NV.
Commercial Support for Spectrophores
Commercial support for Spectrophores is available from Silicos NV, the developers of the Spectrophore technology.
Silicos is a fee-for-service company empowering open source chemo-informatics virtual screening technologies for the discovery of novel lead compounds and database characterization. Silicos fully endorses the concept of open innovation and open source software development, and provides its clients with a wide variety of computational chemistry-based lead discovery services, including Open Babel support, training and code development. Please visit Silicos for more details.
The computation of Spectrophores is independent of the position and orientation of the molecule, and this enables easy and fast comparison of Spectrophores between different molecules. Molecules having similar three-dimensional properties and shapes always yield similar Spectrophores. A Spectrophore is calculated by surrounding the three-dimensional conformation of the molecule by a three-dimensional arrangement of points, followed by calculating the interaction between each of the atom properties and the surrounding points. The three-dimensional arrangement of the points surrounding the molecule can be regarded as an 'artificial' cage or receptor, and the interaction calculated between the molecule and the cage can be regarded as an artificial representation of an affinity value between molecule and cage. Because the calculated interaction is dependent on the relative orientation of the molecule within the cage, the molecule is rotated in discrete angles and the most favorable interaction value is kept as the final result. The angular step size at which the molecule is rotated along its three axes can be specified by the user and influences the accuracy of the method.
The Spectrophore code was developed by Silicos NV, and donated to the Open Babel project in July 2010 (see sidebar for information on commercial support). Spectrophores can be generated either using the command-line application obspectrophore (see next section) or through the API (OBSpectrophore, as described in the API documentation).
5.2.2 obspectrophore
Usage
obspectrophore -i <input file> [options]
Parameter details
-i <input file>
Specify the input file
Spectrophores will be calculated for each molecule in the input file. The file type is automatically detected from the file extension.
-n <type>
The type of normalization that should be performed
Valid values are (without quotes):
• No (default)
• ZeroMean
• UnitStd
• ZeroMeanAndUnitStd
-a <accuracy>
The required accuracy expressed as the angular step size
Only the following discrete values are allowed: 1, 2, 5, 10, 15, 20 (default), 30, 36, 45, 60
-s <type>
The kind of cages that should be used
The cage type is specified in terms of the underlying point group: P1 or P-1. Valid values are (without quotes):
• No (default)
• Unique
• Mirror
• All
-r <resolution>
The required resolution expressed as a real positive number
The default value is 3.0 Angstrom. Negative values or a value of 0 generate an error message.
-h
Displays help
5.2.3 Implementation
Atomic properties
The calculation of a Spectrophore™ starts by calculating the atomic contributions of each property from which one wants to calculate a Spectrophore. In the current implementation, four atomic properties are converted into a Spectrophore; these four properties are the atomic partial charges, the atomic lipophilicities, the atomic shape deviations and the atomic electrophilicities. The atomic partial charges and atomic electrophilicity properties are calculated using the electronegativity equalisation method (EEM) as described by Bultinck and coworkers [bll2002] [blc2003]. Atomic lipophilic potential parameters are calculated using a rule-based method. Finally, the atomic shape deviation is generated by calculating, for each atom, the atom's deviation from the average molecular radius. This is done in a four-step process:
• The molecular center of geometry (COG) is calculated
• The distances between each atom and the molecular COG are calculated
• The average molecular radius is calculated by averaging all the atomic distances
• The distances between each atom and the COG are then divided by the average molecular radius and centered on zero
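The four steps of the shape deviation can be followed in a short Python sketch. It is not the Open Babel code: the function name is made up, atoms are plain (x, y, z) tuples, and "centered on zero" is read here as subtracting the mean of the scaled distances (which equals 1), an interpretation the text does not spell out.

```python
import math


def shape_deviations(coords):
    """Return one shape-deviation value per atom for a list of (x, y, z) tuples."""
    n = len(coords)
    # Step 1: center of geometry (unweighted mean of the coordinates).
    cog = tuple(sum(c[k] for c in coords) / n for k in range(3))
    # Step 2: distance of each atom from the COG.
    dists = [math.dist(c, cog) for c in coords]
    # Step 3: average molecular radius.
    # Note: this is zero for a single atom, which this sketch does not handle.
    avg_radius = sum(dists) / n
    # Step 4: scale by the average radius and center on zero.
    # Assumption: the scaled distances average to 1, so 1 is subtracted.
    return [d / avg_radius - 1.0 for d in dists]


# Four atoms in a plane: two close to the center, two far from it.
atoms = [(1, 0, 0), (-1, 0, 0), (0, 3, 0), (0, -3, 0)]
print(shape_deviations(atoms))
```

Atoms closer to the center than the average radius get negative values and atoms further out get positive ones, so the values describe shape independently of the overall size of the molecule.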
Interaction between the atoms and cage points
Following the calculation of all required atomic properties, the next step in the calculation of a Spectrophore consists of determining the total interaction value V(c,p) between each of the atomic contributions of property p with a set of interaction points on an artificial cage c surrounding the molecular conformation.
For this purpose, each of these interaction points i on cage c is assigned a value P(c,i) which is either +1 or -1, with the constraint that the sum of the values of all interaction points on a particular cage should be zero. In a typical Spectrophore calculation, a cage is represented as a rectangular box encompassing the molecular conformation in all three dimensions, with the centers of the box edges being the interaction points. Such a configuration gives twelve interaction points per cage and, in the case of a non-stereospecific distribution of the interaction points, leads to 12 different cages. Although there are no particular requirements as to the dimensions of the rectangular cage, the distance between the interaction points and the geometrical extremes of the molecule should be such that a meaningful interaction value between each cage point and the molecular entity can be calculated. In this respect, the default dimensions of the cage are constantly adjusted to enclose the molecule at a minimum distance of 3 Angstrom along all dimensions. This cage size can be modified by the user and influences the resolution of the Spectrophore.
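The cage geometry described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the actual implementation: the box is taken to be axis-aligned in the current coordinate frame, the function names are invented, and margin plays the role of the resolution distance.

```python
from itertools import product


def cage_points(coords, margin=3.0):
    """Return the 12 edge midpoints of an axis-aligned box around the atoms.

    The box encloses all atoms with at least `margin` (in Angstrom) to spare
    along every dimension.
    """
    lo = [min(c[k] for c in coords) - margin for k in range(3)]
    hi = [max(c[k] for c in coords) + margin for k in range(3)]
    mid = [(a + b) / 2 for a, b in zip(lo, hi)]
    points = []
    for axis in range(3):  # the axis the edge runs along
        others = [k for k in range(3) if k != axis]
        # The two remaining coordinates each sit on a lower or upper face.
        for first, second in product((lo, hi), repeat=2):
            p = [0.0, 0.0, 0.0]
            p[axis] = mid[axis]
            p[others[0]] = first[others[0]]
            p[others[1]] = second[others[1]]
            points.append(tuple(p))
    return points


def is_valid_cage(signs):
    """Check the constraint on P(c,i): twelve values of +1 or -1 that sum to zero."""
    return len(signs) == 12 and all(s in (1, -1) for s in signs) and sum(signs) == 0


pts = cage_points([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
print(len(pts), is_valid_cage([1, -1] * 6))
```

A box has three groups of four parallel edges, which is where the twelve interaction points come from; the zero-sum constraint means every valid cage carries six +1 and six -1 points.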
Figure 5.1: Schematic representation of a molecule surrounded by the artificial cage
The total interaction value V(c,p) between the atomic contribution values A(j,p) of property p for a given molecular conformation and the cage interaction values P(c,i) for a given cage c is calculated according to a standard interaction energy equation. It takes into account the Euclidean distance between each atom and each cage point. This total interaction V(c,p) for a given property p and cage c for a given molecular conformation is minimized by sampling the molecular orientation along the three axes in angular steps and calculating the interaction value for each orientation within the cage.
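The text does not give the interaction equation itself, so the following Python sketch has to assume one. It uses a simple sum of P(c,i) * A(j,p) / r(i,j) over all cage points i and atoms j; the 1/r form and the sign convention are assumptions made purely for illustration, as are the function names. The orientations are passed in as already-rotated coordinate sets, so the rotation scheme is left out.

```python
import math


def interaction_value(atom_coords, atom_values, cage_points, cage_signs):
    """Sum P(c,i) * A(j,p) / r_ij over all cage points i and atoms j.

    Assumption: a 1/r dependence on the Euclidean distance r_ij.
    """
    total = 0.0
    for point, sign in zip(cage_points, cage_signs):
        for coord, value in zip(atom_coords, atom_values):
            total += sign * value / math.dist(point, coord)
    return total


def minimized_value(orientations, atom_values, cage_points, cage_signs):
    """Keep the lowest interaction value over a set of pre-rotated coordinates."""
    return min(
        interaction_value(coords, atom_values, cage_points, cage_signs)
        for coords in orientations
    )


# One atom with property value 1.0 and a toy "cage" of two points.
points = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
signs = [1, -1]
orientations = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)]]
print(minimized_value(orientations, [1.0], points, signs))
```

Whatever the exact equation, the structure is the same: one scalar per orientation, and the minimum over all sampled orientations becomes the entry for that cage and property.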
The final total interaction V(c,p) for a given cage c and property p corresponds to the lowest interaction value obtained this way, and corresponds to the c'th value in the one-dimensional Spectrophore vector calculated for molecular property p. As a result, a Spectrophore is organized as a vector of minimized interaction values V, each of these organized in order of cages and property values. Since for a typical Spectrophore implementation twelve different cages are used, the total length of a Spectrophore vector equals 12 times the number of properties. Since four different properties are used in the current implementation (electrostatic, lipophilic, electrophilic potentials, and an additional shape index as described before), this leads to a total Spectrophore length of 48 real values per molecular conformation.
Since Spectrophore descriptors are dependent on the actual three-dimensional conformation of the molecule, a typical analysis includes the calculation of Spectrophores from a reasonable set of different conformations. It is then up to the user to decide on the best strategy for processing the different Spectrophore vectors. In a typical virtual screening application, calculating the average Spectrophore vector from all conformations of a single molecule may be a good strategy; other applications may benefit from calculating a weighted average or the minimal values. For each molecule in the input file, a Spectrophore is calculated and printed to standard output as a vector of 48 numbers (in the case of a non-stereospecific Spectrophore). The 48 doubles are organised into 4 sets of 12 doubles each:
• numbers 01-12: Spectrophore values calculated from the atomic partial charges;
• numbers 13-24: Spectrophore values calculated from the atomic lipophilicity properties;
• numbers 25-36: Spectrophore values calculated from the atomic shape deviations;
• numbers 37-48: Spectrophore values calculated from the atomic electrophilicity properties.
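When post-processing such a vector it is convenient to split it back into its four property blocks. The helper below is a hypothetical convenience function, not part of Open Babel; it assumes a non-stereospecific Spectrophore of 48 values laid out in the order listed above.

```python
PROPERTY_ORDER = (
    "partial charge",
    "lipophilicity",
    "shape deviation",
    "electrophilicity",
)
CAGES_PER_PROPERTY = 12


def split_by_property(vector):
    """Split a 48-value Spectrophore into a dict of four 12-value blocks."""
    expected = len(PROPERTY_ORDER) * CAGES_PER_PROPERTY
    if len(vector) != expected:
        raise ValueError(f"expected {expected} values, got {len(vector)}")
    return {
        name: list(vector[i * CAGES_PER_PROPERTY:(i + 1) * CAGES_PER_PROPERTY])
        for i, name in enumerate(PROPERTY_ORDER)
    }


# Use the position numbers 1..48 as stand-in values to make the layout visible.
blocks = split_by_property(list(range(1, 49)))
print(blocks["shape deviation"])
```

With the position numbers as values, the "shape deviation" block prints 25 through 36, matching the third entry of the list.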
5.2.4 Choice of Parameters
Accuracy
As already mentioned, the total interaction between cage and molecule for a given property is minimized by sampling the molecular orientation in angular steps of a certain magnitude. As a typical angular step size, 20 degrees was found to be the best compromise between accuracy and computer speed. Larger step sizes are faster to calculate but risk missing the global interaction energy minimum, while smaller angular step sizes sample the rotational space more thoroughly but at a significant computational cost. The accuracy can be specified by the user using the -a option.
Resolution
Spectrophores capture information about the property fields surrounding the molecule, and the amount of detail that needs to be captured can be regulated by the user. This is done by altering the minimal distance between the molecule and the surrounding cage. The resolution can be specified by the user with the -r option. The default distance along all dimensions is 3.0 Angstrom. The larger the distance, the lower the resolution.
With a higher resolution, more details of the property fields surrounding the molecule are contained by the Spectrophore. On the other hand, low resolution settings may lead to a more general representation of the property fields, with little or no emphasis on small local variations within the fields. Using a low resolution can be the method of choice during the initial virtual screening experiments in order to get an initial, but not so discriminative, first selection. This initial selection can then be further refined during subsequent virtual screening steps using a higher resolution. In this setting, small local differences in the fields between pairs of molecules will be picked up much more easily.
The absolute values of the individual Spectrophore data points are dependent on the resolution used. Low resolutions lead to small values of the calculated individual Spectrophore data points, while high resolutions lead to larger data values. It is therefore only meaningful to compare Spectrophores that have been generated using the same resolution settings, or after some kind of normalization is performed. Computation time is not influenced by the specified resolution and hence is identical for all resolution settings.
Stereospecificity
Some of the cages that are used to calculate Spectrophores have a stereospecific distribution of the interaction points. The interaction values resulting from these cages are therefore sensitive to the enantiomeric configuration of the molecule within the cage. The fact that both stereoselective and non-stereoselective cages can be used makes it possible to include or exclude stereospecificity in the virtual screening search. Depending on the desired output, the stereospecificity of Spectrophores can be specified by the user using the -s option:
• No stereospecificity (default): Spectrophores are generated using cages that are not stereospecific. For most applications, these Spectrophores will suffice.
• Unique stereospecificity: Spectrophores are generated using unique stereospecific cages.
• Mirror stereospecificity: Mirror stereospecific Spectrophores are Spectrophores resulting from the mirror enantiomeric form of the input molecules.
The differences between the corresponding data points of unique and mirror stereospecific Spectrophores are very small and require very long calculation times to obtain a sufficiently high quality level. This increased quality level is triggered by the accuracy setting and will result in calculation times being increased by at least a factor of 100. As a consequence, it is recommended to apply this increased accuracy only in combination with a limited number of molecules, and when the small differences between the stereospecific Spectrophores are really critical. However, for the vast majority of virtual screening applications, this increased accuracy is not required as long as it is not the intention to draw conclusions about differences in the underlying molecular stereoselectivity. Non-stereospecific Spectrophores will therefore suffice for most applications.
Normalisation
It may sometimes be desired to focus on the relative differences between the Spectrophore data points rather than on the absolute differences. In these cases, normalization of Spectrophores may be required. With the -n option, the current implementation offers the possibility to normalize in four different ways:
• No normalization (default)
• Normalization towards zero mean
• Normalization towards unit standard deviation
• Normalization towards zero mean and unit standard deviation
In all these cases, normalization is performed on a 'per-property' basis, which means that the data points belonging to the same property set are treated as a single set and that normalization is only performed on the data points within each of these sets and not across all data points.
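A per-property normalization along these lines might look like the Python sketch below. It is an approximation for illustration only: the mode names mirror the values accepted by the -n option, but the function itself is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption.

```python
import statistics

MODES = ("No", "ZeroMean", "UnitStd", "ZeroMeanAndUnitStd")


def normalize(vector, mode="No", block=12):
    """Normalize each consecutive `block` of values on its own.

    Assumption: the population standard deviation is used. A block with
    zero spread cannot be scaled and is not handled by this sketch.
    """
    if mode not in MODES:
        raise ValueError(f"unknown normalization mode: {mode}")
    out = []
    for start in range(0, len(vector), block):
        values = list(vector[start:start + block])
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        if mode in ("ZeroMean", "ZeroMeanAndUnitStd"):
            values = [v - mean for v in values]
        if mode in ("UnitStd", "ZeroMeanAndUnitStd"):
            values = [v / std for v in values]
        out.extend(values)
    return out


# Two property blocks with the same pattern at very different magnitudes.
raw = [1.0, 3.0] * 6 + [10.0, 30.0] * 6
scaled = normalize(raw, "ZeroMeanAndUnitStd")
print(scaled[:2], scaled[12:14])
```

Because each block is treated separately, the two blocks in the example end up with identical values after normalization even though their raw magnitudes differ by a factor of ten.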
Normalization may be important when comparing the Spectrophores of charged molecules with those of neutral molecules. For molecules carrying a global positive charge, the resulting Spectrophore data points of the charge and electrophilicity properties will both be shifted in absolute value compared to the corresponding data points of the respective neutral species. Normalization of the Spectrophores removes the original magnitude differences for the data points corresponding to the charge and electrophilicity properties of charged and neutral species. Therefore, if the emphasis of the virtual screening consists of the identification of molecules with similar property fields without taking into account differences in absolute charge, then Spectrophores should be normalized towards zero mean. However, if absolute charge differences should be taken into account to differentiate between molecules, unnormalized Spectrophores are recommended.
Chapter 6. obabel vs Chemistry Toolkit Rosetta
The Chemistry Toolkit Rosetta is the brainchild of Andrew Dalke. It is a website that illustrates how to program various chemical toolkits to do a set of tasks. To make it easily understandable, these tasks are probably on the simpler side of those in the real world. The Rosetta already contains several examples of using the Open Babel Python bindings to carry out tasks.
Here we focus on the use of the command line application obabel to accomplish the tasks listed in the Rosetta. Inevitably we will struggle with more complicated tasks; however, this section is intended to show how far you can go simply using obabel, and to illustrate some of its less common features. Some of the tasks cannot be done exactly as specified, but they are usually close enough to be useful.
Note that, except for the examples involving piping, the GUI could also be used. Also, the copy output format at present works only for files with Unix line endings.
6.1 Heavy atom counts from an SD file
For each record from the benzodiazepine file, print the total number of heavy atoms in each record (that is, exclude hydrogens). The output is one output line per record, containing the count as an integer. If at all possible, show how to read directly from the gzip'ed input SD file.
obabel benzodiazepine.sdf.gz -otxt --title "" --append atoms -d -l5
The txt format outputs only the title but we set that to nothing and then append the result. The atoms descriptor counts the number of atoms after the -d option has removed the hydrogens. The -l5 limits the output to the first 5 molecules, in case you really didn't want to print out results for all 12386 molecules.
6.2 Convert a SMILES string to canonical SMILES
Parse two SMILES strings and convert them to canonical form. Check that the results give the same string.
obabel -:"CN2C(=O)N(C)C(=O)C1=C2N=CN1C" -:"CN1C=NC2=C1C(=O)N(C)C(=O)N2C" -ocan
giving:
Cn1cnc2c1c(=O)n(C)c(=O)n2C
Cn1cnc2c1c(=O)n(C)c(=O)n2C
2 molecules converted
6.3 Report how many SD file records are within a certain molecular weight range
Read the benzodiazepine file and report the number of records which contain a molecular weight between 300 and 400.
obabel benzodiazepine.sdf.gz -onul --filter "MW>=300 MW<=400"
3916 molecules converted
6.4 Convert SMILES file to SD file
Convert a SMILES file into an SD file. The conversion must do its best to use the MDL conventions for the SD file, including aromaticity perception. Note that the use of aromatic bond types in CTABs is only allowed for queries, so aromatic structures must be written in a Kekule form. Because the stereochemistry of molecules in SD files is defined solely by the arrangement of atoms, it is necessary to assign either 2D or 3D coordinates to the molecule before generating output. The coordinates do not have to be reasonable (i.e. it's ok if they would make a chemist scream in horror), so long as the resulting structure is chemically correct.
obabel infile.smi -O outfile.sdf --gen3D
6.5 Report the similarity between two structures
Report the similarity between "CC(C)C=CCCCCC(=O)NCc1ccc(c(c1)OC)O" (PubChem CID 1548943) and "COC1=C(C=CC(=C1)C=O)O" (PubChem CID 1183).
Two types of fingerprint are used: the default FP2 path-based one, and FP4 which is structure key based:
obabel -:"CC(C)C=CCCCCC(=O)NCc1ccc(c(c1)OC)O" -:"COC1=C(C=CC(=C1)C=O)O" -ofpt
Tanimoto from first mol = 0.360465
obabel -:"CC(C)C=CCCCCC(=O)NCc1ccc(c(c1)OC)O" -:"COC1=C(C=CC(=C1)C=O)O" -ofpt -xfFP4
Tanimoto from first mol = 0.277778
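For readers unfamiliar with the measure, the Tanimoto coefficient reported above is the number of bits set in both fingerprints divided by the number of bits set in either. The Python sketch below shows the arithmetic on fingerprints modelled as sets of bit positions; the function and the example bit sets are illustrative and do not reproduce Open Babel's internal representation. Returning 0.0 for two empty fingerprints is a choice made for this sketch.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of "on" bit positions."""
    common = len(bits_a & bits_b)
    either = len(bits_a | bits_b)
    # Assumption: two empty fingerprints are given a similarity of 0.0.
    return common / either if either else 0.0


# Hypothetical fingerprints: two shared bits out of four set overall.
print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

Different fingerprint types set different bits for the same molecules, which is why the FP2 and FP4 values above differ for the same pair of structures.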
6.6 Find the 10 nearest neighbors in a data set
The data will come from the gzip'ed SD file of the benzodiazepine data set. Use the first structure as the query structure, and use the rest of the file as the targets to find the 10 most similar structures. The output is sorted by similarity, from most similar to least. Each target match is on its own line, and the line contains the similarity score in the first column in the range 0.00 to 1.00 (preferably to 2 decimal places), then a space, then the target ID, which is the title line from the SD file.
A fastsearch index, using the default FP2 fingerprint, is prepared first:
obabel benzodiazepine.sdf -ofs
The query molecule (first in the file) is extracted:
obabel benzodiazepine.sdf -O first.sdf -l1