Chapter 2
The problem
2.1 Is Python intrinsically slow?
At the moment, there are four main implementations of Python as a language: CPython
is the reference implementation, written in C; Jython is written in Java and targets the
JVM; IronPython is written in C# and targets the CLI (i.e., the VM of .NET); finally,
PyPy is written in Python itself and targets a number of different platforms, including C,
JVM and .NET. In the following, we discuss CPython unless explicitly stated
otherwise; most of the concepts are equally applicable to the other implementations, though.
Running Python code is slow; depending on the benchmark used, Python programs can
run up to 250 times slower than the corresponding programs written in C.
There are several reasons that make Python so slow; here is a rough attempt to list some
of what we think are the most important ones:
1. Interpretation overhead
2. Boxed arithmetic and automatic overflow handling
3. Dynamic dispatch of operations
4. Dynamic lookup of methods and attributes
5. "The world can change under your feet"
6. Extreme introspective and reflective capabilities
2.1.1 Interpretation overhead
In CPython, Python programs are compiled into bytecode which is then interpreted by
a virtual machine. The interpreter consists of a main interpreter loop that fetches the
next instruction from the bytecode stream, decodes it and then jumps to the appropriate
implementation: this is called instruction dispatching. Compared to emitting directly
executable machine code, the extra work needed for instruction dispatching adds some
overhead in terms of performance, which is called interpretation overhead.
The interpretation overhead strongly depends on the nature of the bytecode language: for
languages whose bytecode instructions are fast to execute the time spent to do instruction
dispatching is relatively higher than for languages whose instructions take more time.
For Python, it has been measured that the interpretation overhead causes roughly a 2.5x
slowdown compared to executing native code [CPC+07].
Neither Jython nor IronPython suffers from this problem, since they emit bytecode for
the underlying virtual machine (either JVM or CLI), which is then translated into native
code by the Just In Time compiler.
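The fetch, decode and dispatch cycle described in this section can be illustrated with a
toy stack-based interpreter. This is only a sketch under our own assumptions: the
instruction set, the opcode names and the run function are hypothetical, they are written
for Python 3, and they are far simpler than CPython's real bytecode. The counter returned
by run makes the cost visible: every executed instruction pays for one trip through the
dispatch chain.

```python
# Hypothetical instruction set for a toy stack machine (not CPython's bytecode).
LOAD_CONST, LOAD_VAR, STORE_VAR, ADD, LESS_THAN, JUMP_IF_FALSE, JUMP, HALT = range(8)


def run(code, variables):
    """Execute `code` and return the number of instructions dispatched."""
    stack = []
    pc = 0          # index of the next instruction
    dispatched = 0
    while True:
        opcode, arg = code[pc]      # fetch and decode
        pc += 1
        dispatched += 1
        # Dispatch: select the implementation of the opcode.
        if opcode == LOAD_CONST:
            stack.append(arg)
        elif opcode == LOAD_VAR:
            stack.append(variables[arg])
        elif opcode == STORE_VAR:
            variables[arg] = stack.pop()
        elif opcode == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == LESS_THAN:
            right = stack.pop()
            left = stack.pop()
            stack.append(left < right)
        elif opcode == JUMP_IF_FALSE:
            if not stack.pop():
                pc = arg
        elif opcode == JUMP:
            pc = arg
        elif opcode == HALT:
            return dispatched
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))


# i = 0
# while i < 1000: i = i + 1
PROGRAM = [
    (LOAD_CONST, 0),       # 0
    (STORE_VAR, "i"),      # 1
    (LOAD_VAR, "i"),       # 2: loop header
    (LOAD_CONST, 1000),    # 3
    (LESS_THAN, None),     # 4
    (JUMP_IF_FALSE, 11),   # 5: leave the loop
    (LOAD_VAR, "i"),       # 6
    (LOAD_CONST, 1),       # 7
    (ADD, None),           # 8
    (STORE_VAR, "i"),      # 9
    (JUMP, 2),             # 10: close the loop
    (HALT, None),          # 11
]

env = {}
dispatch_count = run(PROGRAM, env)
```

A native-code version of the same loop needs no dispatching at all, which is the
difference that the term interpretation overhead refers to.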
2.1.2 Boxed arithmetic and automatic overflow handling
In Python all values, including numbers, are objects; the default type for signed integers is
int, which represents native integers (i.e., numbers whose length is the same as a WORD,
typically 32 or 64 bit).
Moreover, Python also provides a long type, representing arbitrary precision numbers;
recent versions of Python take care of switching automatically to long whenever the result
of an operation cannot fit into int.
The most natural way to implement these features is to allocate both values of type int
and long on the heap, thus giving them the status of "objects" to switch from one to the
other very easily.
i = 0
while i < 10000000:
    i = i + 1
Figure 2.1: Simple loop
Unfortunately, this incurs a severe performance penalty. Figure 2.1 shows a simple loop
doing only arithmetic operations, without overflows. Benchmarks show that this code runs
about 40 times slower than the corresponding program written in C and compiled by
gcc 4.4.1 without optimizations. For more information on the benchmarking machine, see
Section 8.3.
However, most of the integer objects used in Python programs do not escape the function
in which they are used and never or rarely overflow, so they do not really need to be
effectively represented as objects in the heap; a smart implementation could detect places
where numbers can be stored "on the stack" and unboxed, and emit efficient machine code
for those cases.
The same discussion applies equally well also to floating point numbers, with the difference
that in this case we do not need to check for overflows at every operation.
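The cost of boxing can be observed directly from Python. The following sketch assumes
CPython 3, where int and long have been merged into a single int type, so the switch
to arbitrary precision is no longer visible as a change of type; the variable names are
ours. It compares the memory taken by a small integer object with the size of a machine
word, and checks that crossing the native word limit does not overflow.

```python
import struct
import sys

word_size = struct.calcsize("P")   # size of a native pointer/word, in bytes

small = 1
big = sys.maxsize + 1              # one past the largest native signed word

# Size of the heap object that boxes the value 1, object header included.
boxed_size = sys.getsizeof(small)

# No overflow: the result is still an exact integer, only its storage grows.
still_exact = (big - 1 == sys.maxsize)
grew = sys.getsizeof(big) >= boxed_size

# In Python 3 both values have the same type; in Python 2 `big` would be a long.
same_type = type(small) is type(big)
```

On CPython, boxed_size is several times larger than word_size, which gives an idea of
what is wasted when a loop counter that never overflows is stored on the heap.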
2.1.3 Dynamic dispatch of operations
# while i < 10000000
 9 LOAD_FAST       0 (i)
12 LOAD_CONST      2 (10000000)
15 COMPARE_OP      0 (<)
18 JUMP_IF_FALSE  14 (to 35)
21 POP_TOP
# i = i + 1
22 LOAD_FAST       0 (i)
25 LOAD_CONST      3 (1)
28 BINARY_ADD
29 STORE_FAST      0 (i)
# close the loop
32 JUMP_ABSOLUTE   9
Figure 2.2: Bytecode for the loop in Figure 2.1
Most operations in Python are dynamically overloaded, i.e., they can be dynamically
applied to values of different types. Consider for example the + operator: depending
on the types of its arguments, it can either add two numbers, concatenate two strings,
append two lists, or call some user-defined method.
Consider again the loop in Figure 2.1 and an excerpt of its bytecode, shown in Figure
2.2. The BINARY_ADD operation corresponds to the i+1 expression.
Since Python is dynamically typed, the virtual machine does not know in advance the types
of the operands of BINARY_ADD, hence it has to check them at runtime to select the proper
implementation. This could lead to poor performance, especially inside a loop: for the
loop in question, the virtual machine always dispatches BINARY_ADD to the same
implementation. In this example, it is very clear that an efficient implementation could
detect that there are a fast and a slow path, and emit efficient code for the former.
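The distinction between a fast and a slow path can be modelled in a few lines of
Python 3. This is a sketch under our own assumptions, not the code of any real virtual
machine: binary_add and Money are hypothetical names, and the "fast" branch is only
labelled as such, since in a real implementation it would be a native integer addition.

```python
class Money:
    """User-defined type that overloads + through __add__."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)


def binary_add(left, right):
    """Model of a dispatching add: check the operand types, then pick a path."""
    if type(left) is int and type(right) is int:
        # Fast path: both operands are plain integers.
        return "fast", left + right
    # Slow path: generic dispatch, which may end up in strings, lists or
    # a user-defined __add__ method.
    return "slow", left + right


results = [
    binary_add(1, 2),
    binary_add("ab", "cd"),
    binary_add([1], [2]),
]
path, total = binary_add(Money(150), Money(50))
```

In the loop of Figure 2.1 every call would take the first branch, so the type check is
repeated ten million times with the same outcome.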
2.1.4 Dynamic lookup of methods and attributes
Python is an attribute based language: the forms obj.x and obj.x = y are used to get and
set the value of the attribute x of obj, respectively. In contrast to other object oriented
languages, message sending or method invocation is not a separate operation from attribute
access: methods are simply a special kind of attribute that can be called.
 1 # an empty class
 2 class MyClass:
 3     pass
 4
 5 # arguments passed to foo can be of any type
 6 def foo(target, flag):
 7     # the attribute x is set only if flag is True
 8     if flag:
 9         target.x = 42
10
11 obj = MyClass()
12 # if we pass False, obj would not have any attribute x
13 foo(obj, True)
14 print obj.x
15 print getattr(obj, "x")  # use a string for the attribute name
Figure 2.3: Dynamic lookup of attributes
The process to get the value of an attribute is called attribute lookup. The semantics of
the attribute lookup differs depending on the dynamic type of the object in question: for
example, if the object is an instance of a class the attribute will be first searched in the
object, then in its class, then in its superclasses. Moreover, classes can override the default
behavior by defining the special method __getattribute__ which is called whenever an
attribute lookup occurs.
Since its semantics depends on the dynamic type of the objects, in Python the lookup of
attributes is performed at run-time. Moreover, there are special built-in functions that
allow getting and setting attributes by specifying their name as a string. For example, the
code in Figure 2.3 is a valid Python program that prints 42 twice. Note that the behavior of
line 14 depends on the value of the flag passed to foo: if we passed False, an exception
would be raised.
As usual, while these features allow a great degree of flexibility, they are also very
difficult to implement efficiently; since the set of methods and attributes is not statically
known, it is impossible to use a fixed memory layout for objects, as happens in the
implementation of statically typed object-oriented languages based on the notion of
virtual method tables. Instead, classes and objects are usually implemented using a
dictionary (i.e. a hashtable) that maps attributes, represented as strings, to values,
whilst methods and instance variables are typically managed differently in statically typed
object oriented languages.
2.1.5 The world can change under your feet
def fn():
    return 42

def hello():
    return 'Hello world!'

def change_the_world():
    global fn
    fn = hello

print fn()  # 42
change_the_world()
print fn()  # 'Hello world!'
Figure 2.4: Changing the global state
In Python, the global state of the program can continuously and dynamically evolve: almost
everything can change during the execution, including for example the definition of
functions and classes, or the inheritance relationship between classes.
Consider, for example, the program in Figure 2.4: the function associated to the name fn
changes at runtime, with the result that two subsequent invocations of the same name can
lead to very different code paths.
The same principle applies to classes: we have already seen in section 2.1.4 that the
attributes of an object can dynamically grow at runtime, but things can change even deeper:
the example in Figure 2.5 demonstrates how objects can change their class at runtime, even
for unrelated classes. Note also that my_pet preserves its original attributes.
Thus, in Python it is usually unsafe to assume anything about the outside world, and we
need to access every single class, function, method or attribute dynamically, because it
might no longer be what it used to be.
class Dog:
    def __init__(self):
        self.name = "Fido"
    def talk(self):
        print "%s: Arf! Arf!" % self.name

class Cat:
    def __init__(self):
        self.name = "Felix"
    def talk(self):
        print "%s: Meowww!" % self.name

my_pet = Dog()
my_pet.talk()           # Fido: Arf! Arf!
my_pet.__class__ = Cat
my_pet.talk()           # Fido: Meowww!
Figure 2.5: Dynamically changing the class of an object
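The inheritance relationship between classes, mentioned at the beginning of this section,
can be changed in a similar way. The following Python 3 sketch is ours (the class names
Quiet, Loud and Pet are hypothetical): it rebinds the __bases__ attribute of a class, and
an instance created before the change immediately sees the methods of the new base class.

```python
class Quiet:
    def talk(self):
        return "..."


class Loud:
    def talk(self):
        return "ARF!"


class Pet(Quiet):
    pass


pet = Pet()
before = pet.talk()          # found in Quiet, the original base class

Pet.__bases__ = (Loud,)      # rewire the inheritance relationship at run time
after = pet.talk()           # the same object now finds talk in Loud
```

No new object is created and pet is not touched: only the class hierarchy changes, yet
the result of the very same call differs.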
2.1.6 Extreme introspective and reflective capabilities
Python offers a lot of ways to inspect and modify a running program. For example, you
can find out the methods of a class, the attributes of an instance, the names and the values
of all local variables defined in the current scope, and so on.
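The three kinds of inspection just listed can be sketched with the standard inspect
module and the built-in functions vars and locals. The example is written for Python 3
and the names Robot and snapshot are ours.

```python
import inspect


class Robot:
    def __init__(self):
        self.name = "R2"

    def beep(self):
        return "beep"


def snapshot():
    counter = 3
    label = "demo"
    # Names and values of all local variables defined in the current scope.
    return dict(locals())


# Methods of a class (functions defined in its body), sorted by name.
method_names = [name for name, _ in inspect.getmembers(Robot, inspect.isfunction)]

# Attributes of an instance.
instance_attrs = dict(vars(Robot()))

local_vars = snapshot()
```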
import sys

def fill_list(name):
    # get the caller frame object
    frame = sys._getframe().f_back
    # get the variable "name" in
    # the caller's context
    lst = frame.f_locals[name]
    lst.append(42)

def foo():
    mylist = []
    fill_list("mylist")
    print mylist  # prints [42]
Figure 2.6: Introspection of the stack frame
Moreover, often it is also possible to modify this information: it is possible to add,
remove or modify methods of a class, change the class of an object, etc. Since in CPython
efficiency is not considered a compelling requirement, all these features are implemented
in a straightforward way; however, it is very challenging to find alternative ways to
implement them efficiently.
In particular, CPython allows access to all the frames that compose the current execution
stack: each frame contains information such as the instruction pointer, the frame of the
caller and the local variables; moreover, under some circumstances it is possible to mutate
the value of the local variables of the callers, as shown by the example in Figure 2.6¹.
The function sys.settrace() is another example of a feature that adds overhead to the
overall execution time of programs. It allows the programmer to install a tracer to monitor
and possibly alter the execution of the program: the events notified to the tracer include
entering/leaving a function, executing a new line of code, raising an exception. Figure 2.7
shows an example of using sys.settrace, and Figure 2.8 shows its output.
These functionalities are heavily used by some applications such as debuggers or testing
frameworks, even though their abuse in "regular code" is strongly discouraged.
¹ The official documentation says that sys._getframe is a CPython implementation detail.
However, it is used by real world programs, so any real world alternative to CPython is
required to support it.
 1 import sys
 2
 3 def test(n):
 4     j = 0
 5     for i in range(n):
 6         j = j + i
 7     return j
 8
 9 def tracer(frame, event, arg):
10     fname = frame.f_code.co_filename
11     lineno = frame.f_lineno
12     pos = '%s:%s' % (fname, lineno)
13     print pos, event
14     return tracer
15
16 sys.settrace(tracer)  # install the tracer
17 test(2)               # trace the call
Figure 2.7: Example of using sys.settrace
settrace.py:3 call
settrace.py:4 line
settrace.py:5 line
settrace.py:6 line
settrace.py:5 line
settrace.py:6 line
settrace.py:5 line
settrace.py:7 line
settrace.py:7 return
Figure 2.8: Output
2.2 Interpreters vs. compilers: limits of static analysis for Python
Over the years, people have often proposed to make Python faster by implementing a compiler,
which is supposedly much faster than an interpreter. However, if you carefully
read the previous section, it is immediately clear that a naive compiler is not enough to get
good performance. Implementing a Python compiler to get good performance is extremely
challenging.
Since Python is dynamically typed, a static compiler can know little about the types of
the variables in a program: hence, the compiler has little chance to discover short paths,
thus inevitably incurring the same problems described in Section 2.1.
Note that aggressive type inference does not help much, since unfortunately Python has
not really been designed with this goal in mind, with the result that it is hard, if not
impossible, to do it effectively: as described in section 2.1.5, the majority of entities in a
Python program can change at runtime: calling function fn() could execute different code
than the previous time, objects of class Cls can suddenly get new attributes or methods
that they did not have before, and so on.
The net result is that every time we reference a global entity, either directly (e.g. by
calling a function) or indirectly (e.g. by invoking a method on an object whose class is
known), little can be assumed about the result. In the past, there have been a few attempts
to perform type inference for Python, but they did not work well [Can05] or they never
saw the light [Sal04].
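The indirect case deserves a small illustration. In the following Python 3 sketch (our
own example, with hypothetical names) the class of g is known and never changes, and
yet the call g.greet() reaches three different pieces of code, first because the method
is shadowed on the instance and then because it is replaced on the class.

```python
class Greeter:
    def greet(self):
        return "hello"


def call_greet(g):
    # Even knowing that g is a Greeter, this call cannot be bound to
    # Greeter.greet ahead of time.
    return g.greet()


g = Greeter()
first = call_greet(g)

# Shadow the method with an attribute stored on the instance itself.
g.greet = lambda: "shadowed on the instance"
second = call_greet(g)

# Remove the shadow, then replace the method on the class.
del g.greet
Greeter.greet = lambda self: "replaced on the class"
third = call_greet(g)
```

A static compiler that resolved the call once, at compile time, would return "hello" in
all three cases and thus break the semantics of the language.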
In conclusion, a static compiler would mainly solve the problem of the interpretation
overhead described in section 2.1.1: hence, we can expect it to speed up programs by a factor
of 2-2.5x, not more. Thus, to enhance performance we have to employ other techniques
that go beyond a static compiler.
2.3 Our solution: automatically generated JIT compiler
As Section 1.2 explains, the two major strategies to implement dynamic languages are
writing a fast interpreter, and writing a JIT compiler.
For the specific case of Python, writing a fast interpreter is not enough, because even if we
manage to completely remove the interpretation overhead we can gain only a limited speedup,
as discussed in section 2.1.1.
Moreover, the static JIT compilation techniques employed by Jython and IronPython
do not seem to give good performance, as they rarely outperform CPython. Instead, what
we need is a dynamic JIT compiler that behaves better than Psyco and solves its major
drawbacks.
In particular, we seek a compiler that can be easily maintained and extended when the
language is modified; moreover, it should be easy to port such a compiler to different target
platforms, e.g. x86 and CLI, and to implement it for other dynamic languages.
The solution adopted by PyPy is to generate automatically a JIT compiler instead of
writing it by hand. The language is implemented through a simple interpreter, which is
then fed into a translator which augments it with many features, including a dynamic JIT
compiler. The JIT generator is divided into a frontend and several backends, one for each
target platform. The following chapters describe in detail the solution implemented by
PyPy.
Chapter 3
Enter PyPy
3.1 What is PyPy?
The PyPy project¹ [RP06] was initially conceived to develop an implementation of Python
which could be easily portable and extensible without renouncing efficiency. To achieve
these aims, the PyPy implementation is based on a highly modular design which allows
high-level aspects to be separated from lower-level implementation details. The abstract
semantics of Python is defined by an interpreter written in a high-level language, called
RPython [AACM07], which is in fact a subset of Python where some dynamic features
have been sacrificed to allow an efficient translation of the interpreter to low-level code.
Compilation of the interpreter is implemented as a stepwise refinement by means of a
translation toolchain which performs type analysis, code optimizations and several
transformations aiming at incrementally providing implementation details such as memory
management or the threading model. The different kinds of intermediate codes which are
refined during the translation process are all represented by a collection of control flow
graphs, at several levels of abstractions.
Finally, the low-level control flow-graphs produced by the toolchain can be translated
to executable code for a specific platform by a corresponding backend. Currently, three
fully developed backends are available to produce executable C/POSIX code, Java and
CLI/.NET bytecode.
Although it has been specifically developed for Python, the PyPy infrastructure can in fact
be used for implementing other languages. Indeed, there were successful experiments of
using PyPy to implement several other languages such as Smalltalk [BKL+08], JavaScript,
Scheme and Prolog [BLR09].
¹ http://codespeak.net/pypy/
3.2 Architecture overview
PyPy is composed of two independent subsystems: the standard interpreter and the
translation toolchain.
The standard interpreter is the subsystem implementing the Python language, from the
parser to the bytecode interpreter. It is written in RPython, a language
which can easily be translated into lower-level and more efficient executables.
RPython is statically typed, and the types are deduced by type inference. The main
restrictions compared to Python come from the fact that the type system has been carefully
designed in a way that the programs can be efficiently translated to C, JVM or CLI. For
example, it is forbidden to mix integers with strings in any way, and similarly it is
forbidden to change the layout of classes at runtime, such as adding or removing methods and
attributes². RPython is a proper subset of Python, in the sense that if a Python program
is RPython, it is guaranteed to have the same semantics both when translated and when
interpreted by a standard Python interpreter.
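The two restrictions just mentioned can be made concrete by contrasting two small
functions. This is only an illustration written in ordinary Python 3: it does not invoke
the translation toolchain, the names are hypothetical, and the claim that the second
function falls outside RPython rests solely on the restrictions stated above.

```python
class Counter:
    def __init__(self):
        self.count = 0


def static_style(n):
    # Every variable holds values of a single type for its whole lifetime,
    # so type inference can assign one static type to each of them.
    total = 0
    for i in range(n):
        total += i
    return total


def dynamic_style(flag):
    # `value` is an integer or a string depending on run-time data:
    # this mixes integers with strings.
    value = 1 if flag else "one"
    # Adding a method to an existing class changes its layout at run time.
    Counter.reset = lambda self: setattr(self, "count", 0)
    return value
```

Both functions are valid Python and run unchanged on a standard interpreter, but only
the first one stays within the restrictions listed above.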
Because of this, the standard interpreter can also be directly executed on top of any
implementation of the language, e.g. CPython or PyPy itself: this is immensely useful during
the development, because debugging a Python program is much easier than debugging the
equivalent written in C. Obviously, the standard interpreter is very slow when executed
this way, because of the huge overhead of the double interpretation.
Thus, we can think of the standard interpreter as a high-level, executable and testable
specification of the language.
Moreover, since RPython is a high-level language, the standard interpreter is free of the
many low-level aspects that are usually hardcoded into the other Python implementations,
such as the garbage collection strategy or the threading model.
These aspects are woven in at translation time by the translation toolchain, whose job
is to turn the high-level specification into an efficient executable. Thus, the translation
toolchain is much more than a compiler as we usually think of it: not only does it translate
the RPython source code to the target language, but it also implements the runtime system
needed to execute a full-fledged virtual machine.
The target platforms are divided into two big categories: lltype and ootype (respectively
for low level and object oriented type system). The choice of the type system determines
the internal representation of the programs being translated. They mainly differ in their
primitives: on the one hand the building blocks of lltype are structures and pointers, on
the other hand ootype is about classes, methods and objects.
² Note that, although RPython is a limited subset of Python, it is only used internally as the
implementation language of the standard interpreter, which in turn fully supports the whole
Python semantics.