6 How can I remove white margins from PDF pages?
I would like to know a way to remove white margins from a PDF file, just like Adobe Acrobat X Pro does. I understand it will not work with every PDF file.
I would guess that the way to do it is by getting the text margins, then cropping to those margins.
PyPdf is preferred.
iText finds text margins based on this code:
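import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.TextMarginFinder;

// Draws a rectangle around the text area of every page (iText 5 API)
public void addMarginRectangle(String src, String dest)
        throws IOException, DocumentException {
    PdfReader reader = new PdfReader(src);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    // Write the stamped copy to dest
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    TextMarginFinder finder;
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        // Collect the bounding box of all text drawn on page i
        finder = parser.processContent(i, new TextMarginFinder());
        // Stroke that box on top of the existing page content
        PdfContentByte cb = stamper.getOverContent(i);
        cb.rectangle(finder.getLlx(), finder.getLly(),
                finder.getWidth(), finder.getHeight());
        cb.stroke();
    }
    stamper.close();
    reader.close();
}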
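The question's plan (find the text margins, then crop to them) comes down to a little rectangle arithmetic once TextMarginFinder has reported llx, lly, width and height for a page. The following is only a sketch in plain Java with no iText dependency: the class name CropMath and the 10 pt padding are made up for illustration, and it assumes you want the crop box clamped to the page's MediaBox. Writing the rectangle back into the file (for example as the page's /CropBox entry) is not shown.

```java
import java.util.Arrays;

/** Hypothetical helper: grows a text box by some padding and clamps it to the page. */
class CropMath {
    /**
     * Takes the text box as reported by TextMarginFinder (llx, lly, width, height),
     * adds pad points on every side and clamps the result to the media box
     * {llx, lly, urx, ury}. Returns the crop box as {llx, lly, urx, ury}.
     */
    static double[] paddedCropBox(double llx, double lly, double width, double height,
                                  double pad, double[] media) {
        double cropLlx = Math.max(media[0], llx - pad);
        double cropLly = Math.max(media[1], lly - pad);
        double cropUrx = Math.min(media[2], llx + width + pad);
        double cropUry = Math.min(media[3], lly + height + pad);
        return new double[] {cropLlx, cropLly, cropUrx, cropUry};
    }

    public static void main(String[] args) {
        // Illustrative values: text area at (57, 29), 505 x 735 pt, on a Letter page
        double[] box = paddedCropBox(57, 29, 505, 735, 10, new double[] {0, 0, 612, 792});
        System.out.println(Arrays.toString(box));
    }
}
```

With these example values it prints [47.0, 19.0, 572.0, 774.0].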
6.1 Answer
I’m not too familiar with PyPDF, but I know Ghostscript will be able to do this for you. Here are links to some other answers on similar questions:
1. Convert PDF 2 sides per page to 1 side per page¹ (SuperUser.com)
2. Freeware to split a pdf’s pages down the middle?² (SuperUser.com)
3. Cropping a PDF using Ghostscript 9.01³ (StackOverflow.com)
The third answer is probably what made you say ‘I understand it will not work with every PDF file’. It uses the pdfmark command to try to set the /CropBox in the PDF page objects.
¹ http://superuser.com/a/189109/40894
² http://superuser.com/a/235401/40894
³ http://stackoverflow.com/a/6184547/359307
The method of the first two answers will most likely succeed where the third one fails. This method uses a PostScript command snippet of <</PageOffset [NNN MMM]>> setpagedevice to shift and place the PDF pages on a (smaller) media size defined by the -gNNNNxMMMM parameter (which defines device width and height in pixels).
If you understand the concept behind the first two answers, you’ll easily be able to adapt the method used there to crop margins on all 4 edges of a PDF page.
An example command to crop a letter-sized PDF (8.5x11 in == 612x792 pt) by half an inch (== 36 pt) on each of the 4 edges (command is for Windows):
gswin32c.exe ^
  -o cropped.pdf ^
  -sDEVICE=pdfwrite ^
  -g5400x7200 ^
  -c "<</PageOffset [-36 -36]>> setpagedevice" ^
  -f input.pdf
The resulting page size will be 7.5x10 in (== 540x720 pt). To do the same on Linux or Mac, use:
gs \
  -o cropped.pdf \
  -sDEVICE=pdfwrite \
  -g5400x7200 \
  -c "<</PageOffset [-36 -36]>> setpagedevice" \
  -f input.pdf
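The numbers in these commands are simple arithmetic, and the same recipe works for unequal margins. Assuming pdfwrite's default resolution of 720 dpi (10 device pixels per point, which is why 540x720 pt becomes -g5400x7200), the canvas is the page size minus the margins, times 10, and PageOffset moves the page left and down by the left and bottom margins. The sketch only builds the argument list; GsCropArgs and cropArgs are hypothetical names, and actually running Ghostscript (e.g. with new ProcessBuilder(args).inheritIO().start()) is left to you.

```java
import java.util.List;

/** Hypothetical helper that computes Ghostscript arguments for cropping. */
class GsCropArgs {
    // Assumption: pdfwrite's default resolution of 720 dpi, i.e. 10 device pixels per point
    static final int PIXELS_PER_POINT = 10;

    /** Page size and margins are in PostScript points (72 pt == 1 in). */
    static List<String> cropArgs(String gsExe, String input, String output,
                                 int pageWidth, int pageHeight,
                                 int left, int bottom, int right, int top) {
        // The smaller canvas cuts off the right and top margins
        int widthPx = (pageWidth - left - right) * PIXELS_PER_POINT;
        int heightPx = (pageHeight - bottom - top) * PIXELS_PER_POINT;
        // Shifting the page left and down pushes the left and bottom margins off the canvas
        String offset = "<</PageOffset [" + (-left) + " " + (-bottom) + "]>> setpagedevice";
        return List.of(gsExe, "-o", output, "-sDEVICE=pdfwrite",
                "-g" + widthPx + "x" + heightPx, "-c", offset, "-f", input);
    }

    public static void main(String[] args) {
        // Letter page (612 x 792 pt), half an inch (36 pt) off every edge
        List<String> gsArgs = cropArgs("gs", "input.pdf", "cropped.pdf", 612, 792, 36, 36, 36, 36);
        System.out.println(String.join(" ", gsArgs));
    }
}
```

For the letter example this yields the same arguments as the Linux command above, minus the shell quoting, since each list element is passed to Ghostscript as one argument.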
Update: How to determine ‘margins’ with Ghostscript
A comment asked for ‘automatic’ determination of the white margins. You can use Ghostscript for this too. Its bbox device can determine the area covered by the (virtual) ink on each page (and hence, indirectly, the white space for each edge of the canvas).
Here is the command:
gs \
  -q -dBATCH -dNOPAUSE \
  -sDEVICE=bbox \
  input.pdf
Output (example):
%%BoundingBox: 57 29 562 764
%%HiResBoundingBox: 57.265030 29.347046 560.245045 763.649977
%%BoundingBox: 57 28 562 667
%%HiResBoundingBox: 57.265030 28.347046 560.245045 666.295011
The bbox device renders each PDF page in memory (without writing any output to disk) and then prints the BoundingBox and HiResBoundingBox info to stderr. You may modify the command like this to make the results easier to parse:
gs \
  -q -dBATCH -dNOPAUSE \
  -sDEVICE=bbox \
  input.pdf \
  2>&1 \
  | grep -v HiResBoundingBox
Output (example):
%%BoundingBox: 57 29 562 764
%%BoundingBox: 57 28 561 667
This would tell you…
• …that the lower left corner of the content rectangle of Page 1 is at coordinates [57 29], with the upper right corner at [562 764]
• …that the lower left corner of the content rectangle of Page 2 is at coordinates [57 28], with the upper right corner at [561 667]
This means:
• Page 1 uses a white space of 57 pt on the left edge (72 pt == 1 in == 25.4 mm).
• Page 1 uses a white space of 29 pt on the bottom edge.
• Page 2 uses a white space of 57 pt on the left edge.
• Page 2 uses a white space of 28 pt on the bottom edge.
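Reading these lines is easy to automate. The sketch below assumes you have captured the combined output of the bbox command as a string (for example with ProcessBuilder and redirectErrorStream(true), mirroring the 2>&1 above). It keeps only the %%BoundingBox: lines, skips the HiRes variants, and reports the left and bottom white space per page, assuming each page's MediaBox starts at 0 0. BBoxParser is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the output of Ghostscript's bbox device. */
class BBoxParser {
    // "%%BoundingBox: llx lly urx ury"; the %%HiResBoundingBox: lines do not match
    private static final Pattern BBOX = Pattern.compile(
            "%%BoundingBox:\\s+(-?\\d+)\\s+(-?\\d+)\\s+(-?\\d+)\\s+(-?\\d+)");

    /** Returns one {llx, lly, urx, ury} array per page, in page order. */
    static List<int[]> parse(String gsOutput) {
        List<int[]> boxes = new ArrayList<>();
        for (String line : gsOutput.split("\\R")) {
            Matcher m = BBOX.matcher(line.trim());
            if (m.matches()) {
                boxes.add(new int[] {
                        Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                        Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4))});
            }
        }
        return boxes;
    }

    public static void main(String[] args) {
        String output = "%%BoundingBox: 57 29 562 764\n"
                + "%%HiResBoundingBox: 57.265030 29.347046 560.245045 763.649977\n"
                + "%%BoundingBox: 57 28 561 667\n";
        List<int[]> boxes = parse(output);
        for (int i = 0; i < boxes.size(); i++) {
            int[] b = boxes.get(i);
            // Assumes the page origin is at 0 0: left = llx, bottom = lly
            System.out.println("Page " + (i + 1) + ": left " + b[0] + " pt, bottom " + b[1] + " pt");
        }
    }
}
```

For the example output it prints "Page 1: left 57 pt, bottom 29 pt" and "Page 2: left 57 pt, bottom 28 pt".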
As you can see from this simple example already, the white space is not exactly the same for each page. Depending on your needs (you likely want the same size for each page of a multi-page PDF, no?), you have to work out the minimum margin for each edge across all pages of the document.
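When all pages share the same size, the minimum margin for each edge follows from the union of the per-page content rectangles: the smallest llx and lly and the largest urx and ury. A minimal sketch, assuming the boxes are already parsed into {llx, lly, urx, ury} arrays; UnionBox is an illustrative name, and for mixed page sizes you would compare margins edge by edge instead.

```java
import java.util.List;

/** Hypothetical helper: one rectangle that holds the content of every page. */
class UnionBox {
    /** Union of {llx, lly, urx, ury} boxes; assumes a non-empty list. */
    static int[] union(List<int[]> boxes) {
        int[] u = boxes.get(0).clone();
        for (int[] b : boxes) {
            u[0] = Math.min(u[0], b[0]); // leftmost content
            u[1] = Math.min(u[1], b[1]); // lowest content
            u[2] = Math.max(u[2], b[2]); // rightmost content
            u[3] = Math.max(u[3], b[3]); // highest content
        }
        return u;
    }

    public static void main(String[] args) {
        int[] u = union(List.of(new int[] {57, 29, 562, 764}, new int[] {57, 28, 561, 667}));
        // The left and bottom white space that is safe to remove on every page
        System.out.println("left " + u[0] + " pt, bottom " + u[1] + " pt");
    }
}
```

For the two example pages the union is {57, 28, 562, 764}, so at most 57 pt can be cropped from the left and 28 pt from the bottom of every page.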
Now what about the right and top edge white space? To calculate that, you need to know the original page size for each page. The simplest way to determine this is the pdfinfo utility. Example command for a 5-page PDF:
pdfinfo \
  -f 1 \
  -l 5 \
  input.pdf \
  | grep "Page "
Output (example):
Page 1 size: 612 x 792 pts (letter)
Page 2 size: 612 x 792 pts (letter)
Page 3 size: 595 x 842 pts (A4)
Page 4 size: 842 x 1191 pts (A3)
Page 5 size: 612 x 792 pts (letter)
This will help you determine the required canvas size and the required (maximum) white margins of the top and right edges of each of your new PDF pages. These calculations can all be scripted too, of course.
But if your PDFs all have a uniform page size, or if they are 1-page documents, it is all much easier to get done…
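Scripted, the right white space is the page width minus the bounding box's upper right x, and the top white space is the page height minus its upper right y. The sketch below parses "Page N size:" lines in the format shown above and assumes the sizes and bounding boxes are in the same page order; EdgeMargins is a hypothetical name. It ignores page rotation, which a real script would have to take into account.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper combining pdfinfo page sizes with bbox results. */
class EdgeMargins {
    // e.g. "Page    1 size: 612 x 792 pts (letter)"
    private static final Pattern SIZE = Pattern.compile(
            "Page\\s+\\d+\\s+size:\\s+([\\d.]+)\\s+x\\s+([\\d.]+)\\s+pts.*");

    /** Returns {width, height} in points for each "Page N size:" line, in order. */
    static List<double[]> pageSizes(String pdfinfoOutput) {
        List<double[]> sizes = new ArrayList<>();
        for (String line : pdfinfoOutput.split("\\R")) {
            Matcher m = SIZE.matcher(line.trim());
            if (m.matches()) {
                sizes.add(new double[] {
                        Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2))});
            }
        }
        return sizes;
    }

    /** White space {right, top} for a page size {w, h} and a bbox {llx, lly, urx, ury}. */
    static double[] rightAndTop(double[] size, int[] bbox) {
        return new double[] {size[0] - bbox[2], size[1] - bbox[3]};
    }

    public static void main(String[] args) {
        List<double[]> sizes = pageSizes("Page    1 size: 612 x 792 pts (letter)\n"
                + "Page    2 size: 612 x 792 pts (letter)\n");
        double[] rt = rightAndTop(sizes.get(0), new int[] {57, 29, 562, 764});
        System.out.println("Page 1: right " + rt[0] + " pt, top " + rt[1] + " pt");
    }
}
```

Using page 1 of the earlier bbox example (upper right corner at 562 764 on a 612 x 792 pt page), it prints "Page 1: right 50.0 pt, top 28.0 pt".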
7 Using Ghostscript to get page size
Is it possible to get the page size (from e.g. a PDF document page) using Ghostscript?
I have seen the bbox device, but it returns the bounding box (it differs per page), not the TrimBox (or CropBox) of the PDF pages. (See the Prepressure website for info about page boxes.) Any other possibility?
http://www.prepressure.com/pdf/basics/page_boxes
7.1 Answer 1
Unfortunately it doesn’t seem quite easy to get the (possibly different) page sizes (or *Boxes for that matter) inside a PDF with the help of Ghostscript.
But since you asked for other possibilities as well: a rather reliable way to determine the media sizes for each page (and even each one of the embedded {Trim,Media,Crop,Bleed}Boxes) is the command line tool pdfinfo.exe. This utility is part of the XPDF tools from http://www.foolabs.com/xpdf/download.html¹.
You can run the tool with the -box parameter and tell it with -f 3 to start at page 3 and with -l 8 to stop processing at page 8.
Example output
C:\downloads>pdfinfo -box -f 1 -l 3 _IXUS_850IS_ADVCUG_EN.pdf
Creator:        FrameMaker 6.0
Producer:       Acrobat Distiller 5.0.5 (Windows)
CreationDate:   08/17/06 16:43:06
ModDate:        08/22/06 12:20:24
Tagged:         no
Pages:          146
Encrypted:      no
Page    1 size: 419.535 x 297.644 pts
Page    2 size: 297.646 x 419.524 pts
Page    3 size: 297.646 x 419.524 pts
Page    1 MediaBox:     0.00     0.00   595.00   842.00
Page    1 CropBox:     87.25   430.36   506.79   728.00
Page    1 BleedBox:    87.25   430.36   506.79   728.00
Page    1 TrimBox:     87.25   430.36   506.79   728.00
Page    1 ArtBox:      87.25   430.36   506.79   728.00
Page    2 MediaBox:     0.00     0.00   595.00   842.00
Page    2 CropBox:    148.17   210.76   445.81   630.28
Page    2 BleedBox:   148.17   210.76   445.81   630.28
Page    2 TrimBox:    148.17   210.76   445.81   630.28
Page    2 ArtBox:     148.17   210.76   445.81   630.28
Page    3 MediaBox:     0.00     0.00   595.00   842.00
Page    3 CropBox:    148.17   210.76   445.81   630.28
Page    3 BleedBox:   148.17   210.76   445.81   630.28
Page    3 TrimBox:    148.17   210.76   445.81   630.28
Page    3 ArtBox:     148.17   210.76   445.81   630.28
File size:      6888764 bytes
Optimized:      yes
PDF version:    1.4
¹ http://www.foolabs.com/xpdf/download.html
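Output like this can also be parsed into numbers. In this example the size reported for page 1 (419.535 x 297.644 pts) matches the width and height of its CropBox (506.79 - 87.25 and 728.00 - 430.36) up to rounding, so the size line reflects the CropBox here rather than the 595 x 842 MediaBox. The sketch below, under the hypothetical name PdfInfoBoxes, assumes the "Page N BoxName: llx lly urx ury" line format shown above, collects every box line into a map and computes a box's width and height.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the page box lines printed by "pdfinfo -box". */
class PdfInfoBoxes {
    // e.g. "Page    1 CropBox:     87.25   430.36   506.79   728.00"
    private static final Pattern BOX = Pattern.compile(
            "Page\\s+(\\d+)\\s+(MediaBox|CropBox|BleedBox|TrimBox|ArtBox):"
            + "\\s+([-\\d.]+)\\s+([-\\d.]+)\\s+([-\\d.]+)\\s+([-\\d.]+)");

    /** Maps "page:BoxName" (e.g. "1:CropBox") to {llx, lly, urx, ury}. */
    static Map<String, double[]> parse(String pdfinfoOutput) {
        Map<String, double[]> boxes = new LinkedHashMap<>();
        for (String line : pdfinfoOutput.split("\\R")) {
            Matcher m = BOX.matcher(line.trim());
            if (m.matches()) {
                double[] b = new double[4];
                for (int i = 0; i < 4; i++) {
                    b[i] = Double.parseDouble(m.group(i + 3));
                }
                boxes.put(m.group(1) + ":" + m.group(2), b);
            }
        }
        return boxes;
    }

    /** Width and height of a {llx, lly, urx, ury} box. */
    static double[] size(double[] box) {
        return new double[] {box[2] - box[0], box[3] - box[1]};
    }

    public static void main(String[] args) {
        Map<String, double[]> boxes = parse(
                "Page    1 MediaBox:     0.00     0.00   595.00   842.00\n"
                + "Page    1 CropBox:     87.25   430.36   506.79   728.00\n");
        double[] crop = size(boxes.get("1:CropBox"));
        System.out.printf(Locale.ROOT, "Page 1 CropBox: %.2f x %.2f pts%n", crop[0], crop[1]);
    }
}
```

It prints "Page 1 CropBox: 419.54 x 297.64 pts".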
7.2 Answer 2
Meanwhile I found a different method. This one uses Ghostscript only (just as you required). No need for additional third-party utilities.
This method uses a little helper program, written in PostScript, that ships with the source code of Ghostscript. Look in the toolbin subdir for the pdf_info.ps file.
The included comments say you should run it like this in order to list the fonts used and the media sizes used:
gswin32c -dNODISPLAY ^
  -q ^
  -sFile=____.pdf ^
  [-dDumpMediaSizes] ^
  [-dDumpFontsUsed [-dShowEmbeddedFonts]] ^
  toolbin/pdf_info.ps
I ran it on a local example file, with command line parameters that ask for the media sizes only (not the fonts used). Here is the result:
C:\> gswin32c ^
  -dNODISPLAY ^
  -q ^
  -sFile=c:\downloads\_IXUS_850IS_ADVCUG_EN.pdf ^
  -dDumpMediaSizes ^
  C:/gs8.71/lib/pdf_info.ps

c:\downloads\_IXUS_850IS_ADVCUG_EN.pdf has 146 pages.
Creator: FrameMaker 6.0
Producer: Acrobat Distiller 5.0.5 (Windows)
CreationDate: D:20060817164306Z
ModDate: D:20060822122024+02'00'

Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]
Page 2 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
Page 3 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
Page 4 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
[....]
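The pdf_info.ps lines report sizes (width and height) rather than coordinates, so comparing the CropBox with the MediaBox quickly shows which pages are cropped. This sketch assumes exactly the line format shown above; treating a line without a CropBox as uncropped is an assumption, and PdfInfoPsSizes is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the media size lines printed by pdf_info.ps. */
class PdfInfoPsSizes {
    // e.g. "Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]"
    private static final Pattern LINE = Pattern.compile(
            "Page\\s+(\\d+)\\s+MediaBox:\\s*\\[\\s*([\\d.]+)\\s+([\\d.]+)\\s*\\]"
            + "(?:\\s+CropBox:\\s*\\[\\s*([\\d.]+)\\s+([\\d.]+)\\s*\\])?.*");

    /** One entry per page: {page, mediaWidth, mediaHeight, cropWidth, cropHeight}. */
    static List<double[]> parse(String output) {
        List<double[]> pages = new ArrayList<>();
        for (String line : output.split("\\R")) {
            Matcher m = LINE.matcher(line.trim());
            if (!m.matches()) {
                continue; // header lines, "[....]" and the like
            }
            double mediaW = Double.parseDouble(m.group(2));
            double mediaH = Double.parseDouble(m.group(3));
            // Assumption: no CropBox printed means the visible area equals the MediaBox
            double cropW = m.group(4) != null ? Double.parseDouble(m.group(4)) : mediaW;
            double cropH = m.group(5) != null ? Double.parseDouble(m.group(5)) : mediaH;
            pages.add(new double[] {Integer.parseInt(m.group(1)), mediaW, mediaH, cropW, cropH});
        }
        return pages;
    }

    public static void main(String[] args) {
        String output = "Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]\n"
                + "Page 2 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]\n";
        for (double[] p : parse(output)) {
            boolean cropped = p[3] != p[1] || p[4] != p[2];
            System.out.println("Page " + (int) p[0] + ": "
                    + (cropped ? "cropped to " + p[3] + " x " + p[4] : "not cropped"));
        }
    }
}
```

For the two sample lines it reports "Page 1: cropped to 419.535 x 297.644" and "Page 2: cropped to 297.646 x 419.524".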
II Fonts
8 How can I extract embedded fonts from a PDF as valid font files?
I’m aware of the pdftk.exe utility that can indicate which fonts are used by a PDF, and whether they are embedded or not.
Now the problem: given I had PDF files with embedded fonts, how can I extract those fonts in a way that they are re-usable as regular font files? Are there (preferably free) tools which can do that? Also: can this be done programmatically with, say, iText?
You have several options. All these methods work on Linux as well as on Windows or Mac OS X. However, be aware that most PDFs do not include the full, complete font face when they have a font embedded. Mostly they include just the subset of glyphs used in the document.
8.1 Method 1: Using pdftops
One of the most frequently used methods to do this on *nix systems consists of the following steps:
1. Convert the PDF to PostScript, for example by using XPDF’s pdftops¹ helper program (on Windows: pdftops.exe).
2. Now the fonts will be embedded in .pfa (PostScript) format, and you can extract them using a text editor.
3. You may need to convert the .pfa (ASCII) to a .pfb (binary) file using tools such as t1utils or pfa2pfb.
4. In PDFs there are never .pfm or .afm files (font metric files) embedded (because PDF viewers have internal knowledge about these). Without these, font files are hardly usable in a visually pleasing way.
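The text-editor part of step 2 can be scripted. A sketch, assuming the PostScript file wraps each embedded font in DSC resource comments (%%BeginResource: font NAME … %%EndResource), which pdftops normally emits for embedded fonts; verify this in your own output first. It also assumes the font programs are plain ASCII, as .pfa files are. PsFontExtractor is a hypothetical name, and the .pfa extension is only a guess for fonts that were not Type 1 in the original PDF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical extractor for fonts wrapped in DSC resource comments. */
class PsFontExtractor {
    private static final String BEGIN = "%%BeginResource: font ";
    private static final String END = "%%EndResource";

    /** Maps each font name to the PostScript text between its resource comments. */
    static Map<String, String> extract(String postscript) {
        Map<String, String> fonts = new LinkedHashMap<>();
        String name = null;
        StringBuilder body = new StringBuilder();
        for (String line : postscript.split("\\R")) {
            if (name == null && line.startsWith(BEGIN)) {
                name = line.substring(BEGIN.length()).trim(); // start of a font resource
                body.setLength(0);
            } else if (name != null && line.startsWith(END)) {
                fonts.put(name, body.toString());            // end of the current font
                name = null;
            } else if (name != null) {
                body.append(line).append('\n');              // font program text
            }
        }
        return fonts;
    }

    /** Usage: java PsFontExtractor input.ps  (writes NAME.pfa for each font found) */
    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps every byte to one char; line endings are written back as \n
        String ps = Files.readString(Path.of(args[0]), StandardCharsets.ISO_8859_1);
        for (Map.Entry<String, String> font : extract(ps).entrySet()) {
            Files.writeString(Path.of(font.getKey() + ".pfa"), font.getValue(),
                    StandardCharsets.ISO_8859_1);
        }
    }
}
```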
8.2 Method 2: Using fontforge
Another method is to use the Free font editor FontForge²:
1. Use the “Open Font” dialog box used when opening files.
2. Then select “Extract from PDF” in the filter section of the dialog.
3. Select the PDF file with the font to be extracted.
4. A “Pick a font” dialog box opens; select here which font to open.
Check the FontForge manual. You may need to follow a few specific steps, which are not necessarily straightforward, in order to save the extracted font data as a re-usable file.
¹ http://www.foolabs.com/xpdf/download.html
² http://fontforge.sourceforge.net/
8.3 Method 3: Using mupdf
Next, MuPDF³. This application comes with a utility called pdfextract (on Windows: pdfextract.exe) which can extract fonts and images from PDFs. (In case you don’t know about MuPDF, which is still relatively new and unknown: “MuPDF is a Free lightweight PDF viewer and toolkit written in portable C.” It is written by Artifex Software developers, the same company that gave us Ghostscript.)
Note: pdfextract.exe is a command-line program. To use it, do the following:
c:\> pdfextract.exe c:\path\to\filename.pdf     # (on Windows)
$>   pdfextract /path/to/filename.pdf          # (on Linux, Unix, Mac OS X)
This command will dump all of the extractable files from the referenced PDF file into the current directory. Generally you will see a variety of files: images as well as fonts. These include PNG, TTF, CFF, CID, etc. The image names will be like img-0412.png if the PDF object number of the image was 412. The font names will be like FGETYK+LinLibertineI-0966.ttf if the font’s PDF object number was 966.
CFF (Compact Font Format) files are a recognized format that can be converted to other formats via a variety of converters for use on different operating systems.
Again: be aware that most of these font files may have only a subset of characters and may not represent the complete typeface.
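Since the object number and any subset prefix are encoded in the file names, the extracted files can be sorted without opening them. The sketch below assumes the naming pattern described above (img-0412.png, FGETYK+LinLibertineI-0966.ttf) and relies on the PDF specification's convention that a subset font's name starts with a tag of six uppercase letters followed by “+”. ExtractedName and its methods are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helpers for the file names written by pdfextract. */
class ExtractedName {
    // Object number is the digit run just before the extension: "-0966.ttf"
    private static final Pattern OBJ = Pattern.compile(".*-(\\d+)\\.[A-Za-z0-9]+$");
    // Subset fonts carry a tag of six uppercase letters plus "+": "FGETYK+"
    private static final Pattern SUBSET = Pattern.compile("^[A-Z]{6}\\+.*");

    /** Returns the PDF object number encoded in the name, or -1 if there is none. */
    static int objectNumber(String fileName) {
        Matcher m = OBJ.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** True if the name starts with a subset tag, i.e. the font is likely incomplete. */
    static boolean isSubsetFont(String fileName) {
        return SUBSET.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        for (String f : new String[] {"img-0412.png", "FGETYK+LinLibertineI-0966.ttf"}) {
            System.out.println(f + ": object " + objectNumber(f)
                    + (isSubsetFont(f) ? ", subset font" : ""));
        }
    }
}
```

It prints "img-0412.png: object 412" and "FGETYK+LinLibertineI-0966.ttf: object 966, subset font".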
Update: (Jul 2013) Recent versions of mupdf have seen an internal reshuffling and renaming of their binaries, not just once, but several times. The main utility used to be a ‘swiss knife’-alike binary called mubusy (name inspired by busybox?), which more recently was renamed to mutool. These support the sub-commands info, clean, extract, poster and show. Unfortunately, the official documentation for these tools isn’t up to date (yet). If you’re on a Mac using ‘MacPorts’: there the utility was renamed in order to avoid name clashes with other utilities using identical names, and you may need to use mupdfextract.
To achieve (roughly) the same results with mutool as its predecessor pdfextract did, just run mutool extract ... (with the older binary: mubusy extract ...).
So to extract fonts and images, you may need to run one of the following command lines.
On Windows:
³ http://mupdf.com/