
12.5 Parsing HTML using Regular Expressions
Here is a simple web page:

    <h1>The First Page</h1>
    <p>
    If you like, you can switch to the
    <a href="http://www.dr-chuck.com/page2.htm">
    Second Page</a>.
    </p>

We can construct a well-formed regular expression to match and extract the
link values from the above text as follows:

    href="http://.+?"

Our regular expression looks for strings that start with “href="http://”,
followed by one or more characters “.+?”, followed by another double quote.
The question mark added to the “.+?” indicates that the match is to be done
in a “non-greedy” fashion instead of a “greedy” fashion. A non-greedy match
tries to find the smallest possible matching string and a greedy match tries
to find the largest possible matching string.
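
The difference between the two styles is easiest to see when two links share one line. The following is a small sketch written for Python 3 (the chapter's own listings use Python 2); the text and the example.com addresses are made up for illustration.

```python
import re

# Two links on one line, so the greedy and non-greedy patterns disagree.
text = ('<a href="http://a.example.com/1">one</a> '
        '<a href="http://b.example.com/2">two</a>')

# Greedy: .+ takes as much as it can, so the match runs to the LAST quote.
greedy = re.findall('href="http://.+"', text)

# Non-greedy: .+? stops at the FIRST closing quote, giving one match per link.
non_greedy = re.findall('href="http://.+?"', text)

print(len(greedy), greedy)
print(len(non_greedy), non_greedy)
```

The greedy pattern returns a single long string that swallows both links, while the non-greedy pattern returns the two href attributes separately.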

We need to add parentheses to our regular expression to indicate which part
of our matched string we would like to extract, and produce the following
program:

    import urllib
    import re

    url = raw_input('Enter - ')
    html = urllib.urlopen(url).read()
    links = re.findall('href="(http://.*?)"', html)
    for link in links:
        print link

The findall regular expression method will give us a list of all of the
strings that match our regular expression, returning only the link text
between the double quotes.
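
The effect of the parentheses can be checked without a network connection by running findall over the sample page as a string. This is a sketch for Python 3, and the variable names are chosen here for illustration.

```python
import re

html = '''<h1>The First Page</h1>
<p>
If you like, you can switch to the
<a href="http://www.dr-chuck.com/page2.htm">
Second Page</a>.
</p>'''

# Without parentheses, findall returns the whole matched string.
whole = re.findall('href="http://.+?"', html)

# With parentheses, findall returns only the part inside the group.
links = re.findall('href="(http://.+?)"', html)

print(whole)
print(links)
```

The first list holds the full href="..." text, while the second holds only the URL between the quotes.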

When we run the program, we get the following output:

    python urlregex.py
    Enter - http://www.dr-chuck.com/page1.htm
    http://www.dr-chuck.com/page2.htm

    python urlregex.py
    Enter - http://www.py4inf.com/book.htm
    http://www.greenteapress.com/thinkpython/thinkpython.html
    http://allendowney.com/
    http://www.py4inf.com/code
    http://www.lib.umich.edu/espresso-book-machine
    http://www.py4inf.com/py4inf-slides.zip

Regular expressions work very nicely when your HTML is well-formatted and
predictable. But since there are a lot of “broken” HTML pages out there, you
might
find that a solution only using regular expressions might either miss some
valid links or end up with bad data.

This can be solved by using a robust HTML parsing library.
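
To make both failure modes concrete, here is a sketch (Python 3) that runs the chapter's pattern over a hand-written page. The page content and the example.com addresses are invented for illustration.

```python
import re

# Three live links written in styles that browsers accept,
# plus one old link that has been commented out.
html = '''
<a href="http://www.example.com/one">one</a>
<a href='http://www.example.com/two'>two</a>
<A HREF="http://www.example.com/three">three</A>
<!-- <a href="http://www.example.com/old">old</a> -->
'''

links = re.findall('href="(http://.*?)"', html)
print(links)
```

The pattern misses the single-quoted link and the upper-case HREF, and it picks up the link inside the HTML comment, which is arguably bad data because a browser would never show it.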

12.6 Parsing HTML using BeautifulSoup

There are a number of Python libraries which can help you parse HTML and
extract data from the pages. Each of the libraries has its strengths and
weaknesses and you can pick one based on your needs.

As an example, we will simply parse some HTML input and extract links using
the BeautifulSoup library. You can download and install the BeautifulSoup
code from:

    www.crummy.com

You can download and “install” BeautifulSoup or you can simply place the
BeautifulSoup.py file in the same folder as your application.

Even though HTML looks like XML and some pages are carefully constructed to
be XML, most HTML is generally broken in ways that cause an XML parser to
reject the entire page of HTML as improperly formed. BeautifulSoup tolerates
highly flawed HTML and still lets you easily extract the data you need.
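
The contrast between a strict XML parser and a forgiving HTML parser can be sketched with the standard library alone. The example below (Python 3) does not use BeautifulSoup; it substitutes the standard html.parser module as a stand-in for a tolerant parser, and LinkCollector is a hypothetical helper class written for this illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical "broken" HTML: the <p> and <br> tags are never closed.
html = ('<p>Go to the <a href="http://www.dr-chuck.com/page2.htm">'
        'Second Page</a><br>')

# A strict XML parser rejects the whole document.
try:
    ET.fromstring(html)
    xml_ok = True
except ET.ParseError:
    xml_ok = False


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == 'a':
            for name, value in attrs:
                if name == 'href':
                    self.links.append(value)


collector = LinkCollector()
collector.feed(html)
collector.close()

print('XML parser accepted the page:', xml_ok)
print('Links found by the tolerant parser:', collector.links)
```

The XML parser gives up on the unclosed tags, while the tolerant parser still reports the link.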

We will use urllib to read the page and then use BeautifulSoup to extract
the href attributes from the anchor (a) tags.

    import urllib
    from BeautifulSoup import *

    url = raw_input('Enter - ')
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)

    # Retrieve all of the anchor tags
    tags = soup('a')
    for tag in tags:
        print tag.get('href', None)

The program prompts for a web address, then opens the web page, reads the
data and passes the data to the BeautifulSoup parser, and then retrieves all
of the anchor tags and prints out the href attribute for each tag.

When the program runs it looks as follows:

    python urllinks.py
    Enter - http://www.dr-chuck.com/page1.htm
    http://www.dr-chuck.com/page2.htm

    python urllinks.py
    Enter - http://www.py4inf.com/book.htm
    http://www.greenteapress.com/thinkpython/thinkpython.html
    http://allendowney.com/
    http://www.si502.com/
    http://www.lib.umich.edu/espresso-book-machine
    http://www.py4inf.com/code
    http://www.pythonlearn.com/

You can use BeautifulSoup to pull out various parts of each tag as follows:

    import urllib
    from BeautifulSoup import *

    url = raw_input('Enter - ')
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)

    # Retrieve all of the anchor tags
    tags = soup('a')
    for tag in tags:
        # Look at the parts of a tag
        print 'TAG:', tag
        print 'URL:', tag.get('href', None)
        print 'Content:', tag.contents[0]
        print 'Attrs:', tag.attrs

This produces the following output:

    python urllink2.py
    Enter - http://www.dr-chuck.com/page1.htm
    TAG: <a href="http://www.dr-chuck.com/page2.htm">
    Second Page</a>
    URL: http://www.dr-chuck.com/page2.htm
    Content: [u'\nSecond Page']
    Attrs: [(u'href', u'http://www.dr-chuck.com/page2.htm')]

These examples only begin to show the power of BeautifulSoup when it comes
to parsing HTML. See the documentation and samples at www.crummy.com for
more detail.

12.7 Reading binary files using urllib

Sometimes you want to retrieve a non-text (or binary) file such as an image
or video file. The data in these files is generally not useful to print out
but you can easily make a copy of a URL to a local file on your hard disk
using urllib.

The pattern is to open the URL and use read to download the entire contents
of the document into a string variable (img) and then write that information
to a local file as follows:

    import urllib

    img = urllib.urlopen('http://www.py4inf.com/cover.jpg').read()
    # 'wb' opens the file in binary mode so the image bytes are written
    # unchanged on every operating system
    fhand = open('cover.jpg', 'wb')
    fhand.write(img)
    fhand.close()

This program reads all of the data in at once across the network and stores
it in the variable img in the main memory of your computer and then opens
the file cover.jpg and writes the data out to your disk. This will work if
the size of the file is less than the size of the memory of your computer.

However if this is a large audio or video file, this program may crash or at
least run extremely slowly when your computer runs out of memory. In order
to avoid running out of memory, we retrieve the data in blocks (or buffers)
and then write each block to your disk before retrieving the next block.
This way the program can read any sized file without using up all of the
memory you have in your computer.

    import urllib

    img = urllib.urlopen('http://www.py4inf.com/cover.jpg')
    # 'wb' opens the file in binary mode so the image bytes are written
    # unchanged on every operating system
    fhand = open('cover.jpg', 'wb')
    size = 0
    while True:
        info = img.read(100000)
        if len(info) < 1 : break
        size = size + len(info)
        fhand.write(info)

    print size, 'characters copied.'
    fhand.close()

In this example, we read only 100,000 characters at a time and then write
those characters to the cover.jpg file before retrieving the next 100,000
characters of data from the web.

This program runs as follows:

    python curl2.py
    568248 characters copied.
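
The block-by-block loop does not depend on the network. It works with any object that has a read method, so it can be tried offline. The sketch below (Python 3) wraps the loop in a hypothetical function copy_in_blocks and uses in-memory io.BytesIO objects as stand-ins for the web connection and the output file.

```python
import io


def copy_in_blocks(source, dest, block_size=100000):
    """Copy source to dest one block at a time; return the bytes copied."""
    size = 0
    while True:
        info = source.read(block_size)
        if len(info) < 1:
            break               # an empty read means end of data
        size = size + len(info)
        dest.write(info)
    return size


# Stand-in for a network response: 250,000 bytes held in memory.
source = io.BytesIO(b'x' * 250000)
dest = io.BytesIO()

copied = copy_in_blocks(source, dest)
print(copied, 'bytes copied.')
```

With a block size of 100,000 the loop reads two full blocks, one block of 50,000 bytes, and then an empty block that ends the loop. At no point does the program hold more than one block of data in the variable info.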

If you have a Unix or Macintosh computer, you probably have a command built
into your operating system that performs this operation as follows:

    curl -O http://www.py4inf.com/cover.jpg

The command curl is short for “copy URL” and so these two examples are
cleverly named curl1.py and curl2.py on www.py4inf.com/code as they
implement similar functionality to the curl command. There is also a
curl3.py sample program that does this task a little more effectively in
case you actually want to use this pattern in a program you are writing.

12.8 Glossary

BeautifulSoup: A Python library for parsing HTML documents and extracting
    data from HTML documents that compensates for most of the imperfections
    in the HTML that browsers generally ignore. You can download the
    BeautifulSoup code from www.crummy.com.

port: A number that generally indicates which application you are contacting
    when you make a socket connection to a server. As an example, web
    traffic usually uses port 80 while e-mail traffic uses port 25.

scrape: When a program pretends to be a web browser and retrieves a web page
    and then looks at the web page content. Often programs are following the
    links in one page to find the next page so they can traverse a network
    of pages or a social network.

socket: A network connection between two applications where the applications
    can send and receive data in either direction.

spider: The act of a web search engine retrieving a page and then all the
    pages linked from a page and so on until they have nearly all of the
    pages on the Internet which they use to build their search index.

12.9 Exercises

Exercise 12.1 Change the socket program socket1.py to prompt the user for
the URL so it can read any web page. You can use split('/') to break the URL
into its component parts so you can extract the host name for the socket
connect call. Add error checking using try and except to handle the
condition where the user enters an improperly formatted or non-existent URL.

Exercise 12.2 Change your socket program so that it counts the number of
characters it has received and stops displaying any text after it has shown
3000 characters. The program should retrieve the entire document and count
the total number of characters and display the count of the number of
characters at the end of the document.

Exercise 12.3 Use urllib to replicate the previous exercise of (1)
retrieving the document from a URL, (2) displaying up to 3000 characters,
and (3) counting the overall number of characters in the document. Don't
worry about the headers for this exercise, simply show the first 3000
characters of the document contents.

Exercise 12.4 Change the urllinks.py program to extract and count paragraph
(p) tags from the retrieved HTML document and display the count of the
paragraphs as the output of your program. Do not display the paragraph
text - only count them. Test your program on several small web pages as well
as some larger web pages.

Exercise 12.5 (Advanced) Change the socket program so that it only shows
data after the headers and a blank line have been received. Remember that
recv is receiving characters (newlines and all) - not lines.

Chapter 13

Using Web Services

Once it became easy to retrieve documents and parse documents over HTTP
using programs, it did not take long to develop an approach where we started
producing documents that were specifically designed to be consumed by other
programs (i.e. not HTML to be displayed in a browser).

The most common approach when two programs are exchanging data across the
web is to exchange the data in a format called the “eXtensible Markup
Language” or XML.

13.1 eXtensible Markup Language - XML

XML looks very similar to HTML, but XML is more structured than HTML. Here
is a sample of an XML document:

    <person>
      <name>Chuck</name>
      <phone type="intl">
         +1 734 303 4456
       </phone>
       <email hide="yes"/>
    </person>

Often it is helpful to think of an XML document as a tree structure where
there is a top tag person and other tags such as phone are drawn as children
of their parent nodes.

[Figure: the XML document drawn as a tree. The top node person has three
children: name (text "Chuck"), phone (attribute type="intl", text
"+1 734 303 4456") and email (attribute hide="yes").]

13.2 Parsing XML

Here is a simple application that parses some XML and extracts some data
elements from the XML:

    import xml.etree.ElementTree as ET

    data = '''
    <person>
      <name>Chuck</name>
      <phone type="intl">
         +1 734 303 4456
       </phone>
       <email hide="yes"/>
    </person>'''

    tree = ET.fromstring(data)
    print 'Name:', tree.find('name').text
    print 'Attr:', tree.find('email').get('hide')

Calling fromstring converts the string representation of the XML into a
'tree' of XML nodes. When the XML is in a tree, we have a series of methods
which we can call to extract portions of data from the XML.

The find function searches through the XML tree and retrieves a node that
matches the specified tag. Each node can have some text, some attributes
(i.e. like hide) and some “child” nodes. Each node can be the top of a tree
of nodes.

    Name: Chuck
    Attr: yes
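
The three parts of a node (text, attributes and children) can be inspected directly. The following sketch, written for Python 3, walks the children of the top node of the same person document. The summary list is just a name chosen for this illustration.

```python
import xml.etree.ElementTree as ET

data = '''
<person>
  <name>Chuck</name>
  <phone type="intl">
     +1 734 303 4456
   </phone>
   <email hide="yes"/>
</person>'''

tree = ET.fromstring(data)

# Iterating over a node yields its direct children in document order.
summary = []
for child in tree:
    text = (child.text or '').strip()   # text is None for an empty element
    summary.append((child.tag, text, child.attrib))

for row in summary:
    print(row)
```

Each row shows the tag name, the text with surrounding whitespace removed, and a dictionary of the attributes for that child node.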

Using an XML parser such as ElementTree has the advantage that while the XML
in this example is quite simple, it turns out there are many rules regarding
valid XML and using ElementTree allows us to extract data from XML without
worrying about the rules of XML syntax.
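
One of those rules concerns special characters: an ampersand or a less-than sign inside text must be written as an escape such as &amp; or &lt;. The sketch below (Python 3, with made-up sample strings) shows the parser decoding the escapes and refusing input that breaks the rule.

```python
import xml.etree.ElementTree as ET

# &amp; &lt; and &gt; are XML escapes; the parser decodes them for us.
node = ET.fromstring('<name>Chuck &amp; Brent &lt;staff&gt;</name>')
print(node.text)

# A bare & is not valid XML, so the parser reports it instead of guessing.
try:
    ET.fromstring('<name>Chuck & Brent</name>')
    valid = True
except ET.ParseError as err:
    valid = False
    print('Not well-formed:', err)
```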
13.3 Looping through nodes

Often the XML has multiple nodes and we need to write a loop to process all
of the nodes. In the following program, we loop through all of the user
nodes:

    import xml.etree.ElementTree as ET

    input = '''
    <stuff>
        <users>
            <user x="2">
                <id>001</id>
                <name>Chuck</name>
            </user>
            <user x="7">
                <id>009</id>
                <name>Brent</name>
            </user>
        </users>
    </stuff>'''

    stuff = ET.fromstring(input)
    lst = stuff.findall('users/user')
    print 'User count:', len(lst)

    for item in lst:
        print 'Name', item.find('name').text
        print 'Id', item.find('id').text
        print 'Attribute', item.get('x')

The findall method retrieves a Python list of sub-trees that represent the
user structures in the XML tree. Then we can write a for loop that looks at
each of the user nodes, and prints the name and id text elements as well as
the x attribute from the user node.

    User count: 2
    Name Chuck
    Id 001
    Attribute 2
    Name Brent
    Id 009
    Attribute 7
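
The path given to findall is interpreted relative to the node it is called on, which is why the program asks for users/user and not just user. A short sketch (Python 3, same sample document) compares three path spellings. The variable names are chosen for illustration.

```python
import xml.etree.ElementTree as ET

data = '''
<stuff>
    <users>
        <user x="2">
            <id>001</id>
            <name>Chuck</name>
        </user>
        <user x="7">
            <id>009</id>
            <name>Brent</name>
        </user>
    </users>
</stuff>'''

stuff = ET.fromstring(data)

# 'user' only looks at direct children of <stuff>, so nothing matches.
direct = stuff.findall('user')

# 'users/user' spells out the path from the top node.
by_path = stuff.findall('users/user')

# './/user' searches the whole subtree, whatever the nesting depth.
anywhere = stuff.findall('.//user')

print(len(direct), len(by_path), len(anywhere))
```

An empty list from findall is therefore often a sign that the path does not start where the search starts, not that the data is missing.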
13.4 Application Programming Interfaces (API)

We now have the ability to exchange data between applications using
HyperText Transfer Protocol (HTTP) and a way to represent complex data that
we are sending back and forth between these applications using eXtensible
Markup Language (XML).

The next step is to begin to define and document “contracts” between
applications using these techniques. The general name for these
application-to-application contracts is Application Programming Interfaces
or APIs. When we use an API, generally one program makes a set of services
available for use by other applications and publishes the APIs (i.e. the
“rules”) that must be followed to access the services provided by the
program.

When we begin to build our programs where the functionality of our program
includes access to services provided by other programs, we call the approach
a Service-Oriented Architecture or SOA. A SOA approach is one where our
overall application makes use of the services of other applications. A
non-SOA approach is where the application is a single stand-alone
application which contains all of the code necessary to implement the
application.

We see many examples of SOA when we use the web. We can go to a single web
site and book air travel, hotels, and automobiles all from a single site.
The data for hotels is not stored on the airline computers. Instead, the
airline computers contact the services on the hotel computers and retrieve
the hotel data and present it to the user. When the user agrees to make a
hotel reservation using the airline site, the airline site uses another web
service on the hotel systems to actually make the reservation. And when it
comes to charge your credit card for the whole transaction, still other
computers become involved in the process.

[Figure: a Travel Application that calls the Auto Rental Service, the Hotel
Reservation Service and the Airline Reservation Service, each through its
own API.]

A Service-Oriented Architecture has many advantages including: (1) we always
maintain only one copy of data - this is particularly important for things
like hotel reservations where we do not want to over-commit and (2) the
owners of the data can set the rules about the use of their data. Even with
these advantages, a SOA system must be carefully designed to have good
performance and meet the user's needs.

When an application makes a set of services in its API available over the
web, we call these web services.
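
The division of labour in a service-oriented design can be sketched in a few lines. Everything in the example below is hypothetical: hotel_service stands in for a program running on the hotel's computers, the room data and the XML layout are invented, and an ordinary function call takes the place of the HTTP request a real web service would use. It is written for Python 3.

```python
import xml.etree.ElementTree as ET


def hotel_service(city):
    """Hypothetical hotel-side service: returns available rooms as XML."""
    # The hotel owns this data; the caller never sees how it is stored.
    rooms = {'Ann Arbor': [('101', '120'), ('204', '95')]}
    root = ET.Element('rooms', {'city': city})
    for number, price in rooms.get(city, []):
        room = ET.SubElement(root, 'room', {'number': number})
        ET.SubElement(room, 'price').text = price
    return ET.tostring(root, encoding='unicode')


def travel_application(city):
    """Client side: relies only on the agreed XML format (the "contract")."""
    reply = ET.fromstring(hotel_service(city))
    return [(room.get('number'), int(room.find('price').text))
            for room in reply.findall('room')]


offers = travel_application('Ann Arbor')
print(offers)
```

The travel application keeps no copy of the room list. It asks the service each time and depends only on the shape of the XML reply, which is the role the published API plays in the description above.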
13.5 Twitter web services

Note: Since this section was written, Twitter has dramatically changed the
format and rules for the use of its API. So the code that uses the Twitter
API will no longer work. It still shows how one would work with an XML-based
API in general.