itextsharp pdf to image c# : Convert pdf to editable form Library application class asp.net azure .net ajax bioinf-py-21-part1744

genesonoroff. Thesemoleculesbindpreferentially tospecificDNA sequences,
andthisbindingpreferencepatterncanberepresentedbyatableof frequencies
ofgiven symbolsateachpositionofthe pattern. More precisely, eachrowin
thetable correspondsto thebasesA,C,G,andT,while columnjreflectshow
many timesthebase appearsinpositionjinthe DNAsequence.
Forexample, ifoursetofDNAsequencesare TAG,GGT, andGGG, the
table becomes
base 0 1 2
A
0 1 0
C
0 0 0
G
2 2 2
T
1 0 1
FromthistablewecanreadthatbaseAappearsonce inindex1intheDNA
strings,baseCdoesnotappearatall,baseGappearstwiceinallpositions,and
baseTappearsonceinthe beginningandendofthestrings.
Inthefollowingweshallpresentdifferentdatastructurestoholdsuchatable
anddifferentwaysof computingthem. Thetableisknownasafrequencymatrix
inbioinformaticsandthisisthe termusedheretoo.
Separate FrequencyLists. Sinceweknowthatthere areonlyfourrowsin
thefrequencymatrix,anobviousdatastructurewouldbefourlists,eachholding
arow. A functioncomputingtheselistsmaylooklike
def freq_lists(dna_list):
n = len(dna_list[0])
A = [0]*n
T = [0]*n
G = [0]*n
C = [0]*n
for dna in dna_list:
for index, base in enumerate(dna):
if base == ’A’:
A[index] += 1
elif base == ’C’:
C[index] += 1
elif base == ’G’:
G[index] += 1
elif base == ’T’:
T[index] += 1
return A, C, G, T
Weneedtoinitializethelistswiththerightlengthandazeroforeachelement,
since each listelementisto be usedasa counter. Creating a listoflengthn
withobjectx in all positionsisdone by[x]*n. Finding the proper lengthis
here carriedoutbyinspecting the lengthofthefirstelementindna_list,the
listofallDNAstringstobecounted,assumingthatallelementsinthislisthave
thesame length.
11
Convert pdf to editable form - C# PDF Field Edit Library: insert, delete, update pdf form field in C#.net, ASP.NET, MVC, Ajax, WPF
Online C# Tutorial to Insert, Delete and Update Fields in PDF Document
add image to pdf form; add text fields to pdf
Convert pdf to editable form - VB.NET PDF Field Edit library: insert, delete, update pdf form field in vb.net, ASP.NET, MVC, Ajax, WPF
How to Insert, Delete and Update Fields in PDF Document with VB.NET Demo Code
create a form in pdf; change font size in pdf form
Intheforloopwe applytheenumeratefunction,whichisusedtoextract
boththeelementvalue and the elementindexwheniterating overa sequence.
Forexample,
>>> for index, base in enumerate([’t’, ’e’, ’s’, ’t’]):
...
print index, base
...
0 t
1 e
2 s
3 t
Here isatest,
dna_list = [’GGTAG’, ’GGTAC’, ’GGTGC’]
A, C, G, T = freq_lists(dna_list)
print A
print C
print G
print T
withoutput
[0, 0, 0, 2, 0]
[0, 0, 0, 0, 2]
[3, 3, 0, 1, 1]
[0, 0, 3, 0, 0]
Nested List. ThefrequencymatrixcanalsoberepresentedasanestedlistM
suchthatM[i][j]isthe frequencyofbasei in positionjin the set ofDNA
strings. Herei isaninteger, where 0correspondsto A,1toT,2to G,and3
to C.Thefrequencyis the numberoftimesbasei appearsinpositionj in a
setofDNAstrings. SometimesthisnumberisdividedbythenumberofDNA
stringsinthe setso thatthe frequencyisbetween 0and 1. Note thatallthe
DNA stringsmusthavethesamelength.
The simplestwayto make a nestedlististo inserttheA,C,G, andT lists
intoanotherlist:
>>> frequency_matrix = [A, C, G, T]
>>> frequency_matrix[2][3]
2
>>> G[3] # same element
2
Alternatively, we can illustrate how to compute this type of nested list
directly:
def freq_list_of_lists_v1(dna_list):
# Create empty frequency_matrix[i][j] = 0
# i=0,1,2,3 corresponds to A,T,G,C
# j=0,...,length of dna_list[0]
frequency_matrix = [[0 for v in dna_list[0]] for x in ’ACGT’]
12
C# PDF Convert to Text SDK: Convert PDF to txt files in C#.net
RasterEdge PDF document conversion SDK provides reliable and effective .NET solution for Visual C# developers to convert PDF document to editable & searchable
allow saving of pdf form; edit pdf form
C# PDF Convert to Word SDK: Convert PDF to Word library in C#.net
hardly edit PDF document. Under this situation, you need to convert PDF document to some easily editable files like Word document.
change font size in fillable pdf form; best way to create pdf forms
for dna in dna_list:
for index, base in enumerate(dna):
if base == ’A’:
frequency_matrix[0][index] +=1
elif base == ’C’:
frequency_matrix[1][index] +=1
elif base == ’G’:
frequency_matrix[2][index] +=1
elif base == ’T’:
frequency_matrix[3][index] +=1
return frequency_matrix
Asinthecasewithindividuallistsweneedtoinitializeallelementsinthe
nestedlisttozero.
Acallandprintout,
dna_list = [’GGTAG’, ’GGTAC’, ’GGTGC’]
frequency_matrix = freq_list_of_lists_v1(dna_list)
print frequency_matrix
resultsin
[[0, 0, 0, 2, 0], [0, 0, 0, 0, 2], [3, 3, 0, 1, 1], [0, 0, 3, 0, 0]]
DictionaryforMore Convenient Indexing. Theseriesofiftestsinthe
Pythonfunctionfreq_list_of_lists_v1aresomewhatcumbersome,especially
ifwewanttoextendthecodetootherbioinformaticsproblemswherethealphabet
islarger. Whatwewantisamappingfrombase,whichisa character,tothe
corresponding index 0, 1, 2, or 3. A Python dictionarymay representsuch
mappings:
>>> base2index = {’A’: 0, ’C’: 1, ’G’: 2, ’T’: 3}
>>> base2index[’G’]
2
Withthebase2index dictionarywedonotneedthe seriesof if testsandthe
alphabet’ATGC’couldbe muchlargerwithoutaffectingthelengthofthecode:
def freq_list_of_lists_v2(dna_list):
frequency_matrix = [[0 for v in dna_list[0]] for x in ’ACGT’]
base2index = {’A’: 0, ’C’: 1, ’G’: 2, ’T’: 3}
for dna in dna_list:
for index, base in enumerate(dna):
frequency_matrix[base2index[base]][index] += 1
return frequency_matrix
13
C# Create PDF Library SDK to convert PDF from other file formats
file. Create and save editable PDF with a blank page, bookmarks, links, signatures, etc. Create fillable PDF document with fields.
convert pdf to editable form; change font in pdf fillable form
VB.NET PDF Convert to Word SDK: Convert PDF to Word library in vb.
Convert PDF document to DOC and DOCX formats in Visual Basic control to export Word from multiple PDF files in Create editable Word file online without email.
build pdf forms; cannot save pdf form in reader
NumericalPython Array. Aslongaseachsublistinalistoflistshasthe
samelength,alistoflistscanbereplacedbyaNumericalPython(numpy)array.
Processing ofsucharrays is oftenmuch more efficient than processingofthe
nestedlistdatastructure. Toinitializeatwo-dimensionalnumpyarrayweneedto
knowitssize,here4timeslen(dna_list[0]). Onlythefirstlineinthefunction
freq_list_of_lists_v2needstobechangedinordertoutilizeanumpyarray:
import numpy as np
def freq_numpy(dna_list):
frequency_matrix = np.zeros((4, len(dna_list[0])), dtype=np.int)
base2index = {’A’: 0, ’C’: 1, ’G’: 2, ’T’: 3}
for dna in dna_list:
for index, base in enumerate(dna):
frequency_matrix[base2index[base]][index] += 1
return frequency_matrix
Theresultingfrequency_matrixobjectcanbeindexedas[b][i]or[b,i],
withintegersb andi. Typically,b willbe somethinglinebase2index[’C’].
Dictionaryof Lists. Insteadofgoingfromacharactertoanintegerindex
viabase2index,we mayprefertoindexfrequency_matrix bythebasename
andthe position indexdirectly, like in[’C’][14]. This isthe mostnatural
syntaxforauserofthefrequencymatrix. TherelevantPythondatastructureis
thenadictionary of lists. Thatis,frequency_matrixisadictionary withkeys
’A’,’C’,’G’,and ’T’.Thevalueforeachkeyisalist. Letusnowalsoextend
the flexibility such thatdna_list can have DNAstrings of different lengths.
Thelistsinfrequency_listwillhave lengthsequaltothelongestDNAstring.
Arelevantfunctionis
def freq_dict_of_lists_v1(dna_list):
n = max([len(dna) for dna in dna_list])
frequency_matrix = {
’A’: [0]*n,
’C’: [0]*n,
’G’: [0]*n,
’T’: [0]*n,
}
for dna in dna_list:
for index, base in enumerate(dna):
frequency_matrix[base][index] += 1
return frequency_matrix
Runningthe testcode
frequency_matrix = freq_dict_of_lists_v1(dna_list)
import pprint
# for nice printout of nested data structures
pprint.pprint(frequency_matrix)
resultsinthe output
14
VB.NET PDF Convert to Text SDK: Convert PDF to txt files in vb.net
to editable & searchable text formats. Support .NET WinForms, ASP.NET MVC in IIS, ASP.NET Ajax, Azure cloud service, DNN (DotNetNuke), SharePoint. Convert PDF
add text field pdf; add email button to pdf form
VB.NET Create PDF from PowerPoint Library to convert pptx, ppt to
Link: Edit URL. Bookmark: Edit Bookmark. Metadata: Edit, Delete Metadata. Form Process. Convert multiple pages PowerPoint to fillable and editable PDF documents.
change font pdf form; add text field to pdf acrobat
{’A’: [0, 0, 0, 2, 0],
’C’: [0, 0, 0, 0, 2],
’G’: [3, 3, 0, 1, 1],
’T’: [0, 0, 3, 0, 0]}
Theinitializationoffrequency_matrixintheabovecodecanbemademore
compactbyusingadictionary comprehension:
dict = {key: value for key in some_sequence}
Inourexample weset
frequency_matrix = {base: [0]*n for base in ’ACGT’}
Adoptingthisconstructioninthefreq_dict_of_lists_v1functionleadstoa
slightlymorecompactversion:
def freq_dict_of_lists_v2(dna_list):
n = max([len(dna) for dna in dna_list])
frequency_matrix = {base: [0]*n for base in ’ACGT’}
for dna in dna_list:
for index, base in enumerate(dna):
frequency_matrix[base][index] += 1
return frequency_matrix
Asanadditionalcommentoncomputingthemaximumlengthofthe DNA
strings canbe made asthere are severalalternative waysofdoing this. The
classical useof maxistoapplyittoalistasdoneabove:
n = max([len(dna) for dna in dna_list])
However,forverylonglistsitispossibletoavoidthememorydemandsofstoring
the resultofthe listcomprehension,i.e., the listoflengths. Insteadmax can
workwiththe lengthsasthey are computed:
n = max(len(dna) for dna in dna_list)
Itisalsopossible towrite
n = max(dna_list, key=len)
Here,len is appliedto each element indna_list, and the maximum of the
resultingvaluesisreturned.
DictionaryofDictionaries. Thedictionaryoflistsdatastructurecanalter-
nativelybe replacedbya dictionaryofdictionariesobject, oftenjustcalleda
dictofdictsobject. Thatis,frequency_matrix[base]isadictionarywithkey
iandvalueequaltotheaddednumberofoccurrencesof basein dna[i]for
alldnastringsinthelistdna_list. The indexingfrequency_matrix[’C’][i]
15
VB.NET Create PDF from Word Library to convert docx, doc to PDF in
Link: Edit URL. Bookmark: Edit Bookmark. Metadata: Edit, Delete Metadata. Form Process. Convert multiple pages Word to fillable and editable PDF documents.
create a pdf form online; adding text to a pdf form
VB.NET Create PDF from Excel Library to convert xlsx, xls to PDF
Create fillable and editable PDF documents from Excel in Create searchable and scanned PDF files from Excel in VB Convert to PDF with embedded fonts or without
convert word doc to pdf with editable fields; adding a signature to a pdf form
andthevaluesareexactly asinthelastexample;theonlydifferenceiswhether
frequency_matrix[’C’]isa listordictionary.
Ourfunctionworkingwithfrequency_matrixasadictofdictsiswrittenas
def freq_dict_of_dicts_v1(dna_list):
n = max([len(dna) for dna in dna_list])
frequency_matrix = {base: {index: 0 for index in range(n)}
for base in ’ACGT’}
for dna in dna_list:
for index, base in enumerate(dna):
frequency_matrix[base][index] += 1
return frequency_matrix
UsingDictionarieswithDefaultValues. Themanualinitializationofeach
subdictionary tozero,
frequency_matrix = {base: {index: 0 for index in range(n)}
for base in ’ACGT’}
canbesimplifiedby usingadictionarywithdefaultvaluesforanykey. Thecon-
structiondefaultdict(lambda: obj) makesadictionarywithobjasdefault
value. Thisconstructionsimplifiesthepreviousfunctionabit:
from collections import defaultdict
def freq_dict_of_dicts_v2(dna_list):
n = max([len(dna) for dna in dna_list])
frequency_matrix = {base: defaultdict(lambda: 0)
for base in ’ACGT’}
for dna in dna_list:
for index, base in enumerate(dna):
frequency_matrix[base][index] += 1
return frequency_matrix
Remark. Dictionary comprehensionswerenewinPython2.7and3.1,but
can be simulated in earlier versions by making (key, value) tuples via list
comprehensions. Adictionarycomprehension
d = {key: value for key in sequence}
isthenconstructedas
d = dict([(key, value) for key in sequence])
16
C# Create PDF from Excel Library to convert xlsx, xls to PDF in C#
Create fillable and editable PDF documents from Excel in both .NET WinForms and ASP.NET. Create searchable and scanned PDF files from Excel. Convert to PDF with
adding text to pdf form; acrobat create pdf form
Using Arrays andVectorization. Thefrequency_matrixdictoflistsfor
caneasilybe changedtoadictof numpyarrays: justreplacetheinitialization
[0]*n bynp.zeros(n, dtype=np.int). The indexingremainsthesame:
def freq_dict_of_arrays_v1(dna_list):
n = max([len(dna) for dna in dna_list])
frequency_matrix = {base: np.zeros(n, dtype=np.int)
for base in ’ACGT’}
for dna in dna_list:
for index, base in enumerate(dna):
frequency_matrix[base][index] += 1
return frequency_matrix
Havingfrequency_matrix[base]asanumpyarrayinsteadofalistdoesnot
give anyimmediate advantage, asthestorageandCPUtimeisaboutthe same.
Theloopoverthedna stringandthe associatedindexingiswhatconsumesall
the CPUtime. However, thenumpy arraysprovide a potential for increasing
efficiencythroughvectorization,i.e., replacingtheelement-wiseoperationson
dna andfrequency_matrix[base] byoperationsontheentirearraysatonce.
LetususetheinteractivePythonshelltoexplore thepossibilitiesofvector-
ization. Wefirstconvertthestringtoanumpy arrayofcharacters:
>>> dna = ’ACAT’
>>> dna = np.array(dna, dtype=’c’)
>>> dna
array([’A’, ’C’, ’A’, ’T’],
dtype=’|S1’)
Foragivenbase,sayA,wecaninone vectorizedoperationfindwhichlocations
indnathatcontainA:
>>> b = dna == ’A’
>>> b
array([ True, False, True, False], dtype=bool)
Byconvertingbtoanintegerarrayiwecanupdatethefrequencycountsforall
indicesbyaddingi tofrequency_matrix[’A’]:
>>> i = np.asarray(b, dtype=np.int)
>>> i
array([1, 0, 1, 0])
>>> frequency_matrix[’A’] = frequency_matrix[’A’] + i
Thisrecipecanberepeatedforallbases:
for dna in dna_list:
dna = np.array(dna, dtype=’c’)
for base in ’ACGT’:
b = dna == base
i = np.asarray(b, dtype=np.int)
frequency_matrix[base] = frequency_matrix[base] + i
17
It turns out that we do not need to convert the boolean array b to an
integerarrayi,becausedoing arithmeticswithb directlyispossible: False is
interpretedas0andTrueas1inarithmeticoperations. Wecanalsousethe+=
operatorto update allelementsoffrequency_matrix[base]directly,without
firstcomputingthe sumoftwoarraysfrequency_matrix[base] + i andthen
assigningthisresulttofrequency_matrix[base]. Collectingalltheseideasin
one functionyieldsthe code
def freq_dict_of_arrays_v2(dna_list):
n = max([len(dna) for dna in dna_list])
frequency_matrix = {base: np.zeros(n, dtype=np.int)
for base in ’ACGT’}
for dna in dna_list:
dna = np.array(dna, dtype=’c’)
for base in ’ACCT’:
frequency_matrix[base] += dna == base
return frequency_matrix
Thisvectorizedfunctionrunsalmost10timesasfastasthe(scalar)counterpart
freq_list_of_arrays_v1!
1.5 Analyzing the Frequency Matrix
Havingbuiltafrequencymatrix outof acollectionof DNA strings,itistimeto
useitforanalysis. TheshortDNA stringsthatafrequency matrixisbuiltout
of,istypicallyasetofsubstringsofalargerDNAsequence,whichsharessome
commonpurpose. Anexampleofthisistohaveasetofsubstringsthatservesas
akindofanchors/magnetsatwhichgivenmoleculesattachtoDNAandperform
biologicalfunctions(like turninggenesonoroff). Withthefrequencymatrix
constructedfromalimitedsetofknownanchorlocations(substrings),we can
now scan forothersimilarsubstrings that have the potential toperformthe
samefunction. Thesimplestwaytodothisistofirstdeterminethemosttypical
substringaccordingtothefrequency matrix,i.e.,the substringhavingthe most
frequentnucleotideateachposition. Thisisreferredtoastheconsensusstring
ofthe frequency matrix. We can thenlook for occurrencesof the consensus
substringin alargerDNAsequence, and considerthese occurrencesaslikely
candidatesforservingthesamefunction(e.g.,asanchorlocationsformolecules).
Forinstance, giventhree substringsACT, CCAandAGA, the frequency
matrixwouldbe(listoflists,withrowscorrespondingtoA,C,G,andT):
[[2, 0, 2]
[1, 2, 0]
[0, 1, 0]
[0, 0, 1]]
We see thatforposition0, which corresponds tothe left-mostcolumninthe
table, the symbolAhasthe highestfrequency(2). The maximumfrequencies
for the other positionsare seen to be C for position 1, and Afor position 2.
Theconsensusstringistherefore ACA.Notethattheconsensusstringdoesnot
18
needtobeequaltoanyofthe substringsthatformedthebasisof thefrequency
matrix (thisisindeedthecasefortheabove example).
List of Lists FrequencyMatrix. Letfrequency_matrixbealistoflists.
Foreachpositioni werunthroughtherowsinthefrequencymatrixandkeep
trackofthemaximumfrequencyvalueandthe correspondingletter. Iftwo or
more lettershavethesame frequencyvalueweuseadashtoindicatethatthis
positionintheconsensusstringisundetermined.
The followingfunctioncomputesthe consensusstring:
def find_consensus_v1(frequency_matrix):
base2index = {’A’: 0, ’C’: 1, ’G’: 2, ’T’: 3}
consensus = ’’
dna_length = len(frequency_matrix[0])
for i in range(dna_length): # loop over positions in string
max_freq = -1
# holds the max freq. for this i
max_freq_base = None
# holds the corresponding base
for base in ’ATGC’:
if frequency_matrix[base2index[base]][i] > max_freq:
max_freq = frequency_matrix[base2index[base]][i]
max_freq_base = base
elif frequency_matrix[base2index[base]][i] == max_freq:
max_freq_base = ’-’ # more than one base as max
consensus += max_freq_base # add new base with max freq
return consensus
Since thiscode requiresfrequency_matrix to be a listoflists we should
inserta testandraiseanexceptionifthetypeiswrong:
def find_consensus_v1(frequency_matrix):
if isinstance(frequency_matrix, list) and \
isinstance(frequency_matrix[0], list):
pass # right type
else:
raise TypeError(’frequency_matrix must be list of lists’)
...
Dict of Dicts FrequencyMatrix. Howmustthefind_consensus_v1func-
tionbealterediffrequency_matrixisadictofdicts?
1. Thebase2indexdictisnolongerneeded.
2. Accessofsublist,frequency_matrix[0],totestfortypeandlengthof
the strings,mustbereplacedby frequency_matrix[’A’].
Theupdatedfunctionbecomes
19
def find_consensus_v3(frequency_matrix):
if isinstance(frequency_matrix, dict) and \
isinstance(frequency_matrix[’A’], dict):
pass # right type
else:
raise TypeError(’frequency_matrix must be dict of dicts’)
consensus = ’’
dna_length = len(frequency_matrix[’A’])
for i in range(dna_length): # loop over positions in string
max_freq = -1
# holds the max freq. for this i
max_freq_base = None
# holds the corresponding base
for base in ’ACGT’:
if frequency_matrix[base][i] > max_freq:
max_freq = frequency_matrix[base][i]
max_freq_base = base
elif frequency_matrix[base][i] == max_freq:
max_freq_base = ’-’ # more than one base as max
consensus += max_freq_base # add new base with max freq
return consensus
Here isatest:
frequency_matrix = freq_dict_of_dicts_v1(dna_list)
pprint.pprint(frequency_matrix)
print find_consensus_v3(frequency_matrix)
withoutput
{’A’: {0: 0, 1: 0, 2: 0, 3: 2, 4: 0},
’C’: {0: 0, 1: 0, 2: 0, 3: 0, 4: 2},
’G’: {0: 3, 1: 3, 2: 0, 3: 1, 4: 1},
’T’: {0: 0, 1: 0, 2: 3, 3: 0, 4: 0}}
Consensus string: GGTAC
Letustryfind_consensus_v3withthedictofdefaultdictsasinput(freq_dicts_of_dicts_v2).
Thecoderunsfine,buttheoutputstringisjustG!Thereasonisthatdna_length
is1,andthereforethatthelengthoftheAdictinfrequency_matrixis1. Print-
ingoutfrequency_matrixyields
{’A’: defaultdict(X, {3: 2}),
’C’: defaultdict(X, {4: 2}),
’G’: defaultdict(X, {0: 3, 1: 3, 3: 1, 4: 1}),
’T’: defaultdict(X, {2: 3})}
whereourXisashortformfortextlike
‘<function <lambda> at 0xfaede8>‘
Weseethatthelengthofadefaultdictwillonlycountthenonzeroentries. Hence,
touse adefaultdictourfunctionmustgetthelengthoftheDNAstringtobuild
asanextraargument:
20
Documents you may be interested
Documents you may be interested