def find_consensus_v3(frequency_matrix):
    if isinstance(frequency_matrix, dict) and \
       isinstance(frequency_matrix['A'], dict):
        pass  # right type
    else:
        raise TypeError('frequency_matrix must be dict of dicts')

    consensus = ''
    dna_length = len(frequency_matrix['A'])

    for i in range(dna_length):  # loop over positions in string
        max_freq = -1            # holds the max freq. for this i
        max_freq_base = None     # holds the corresponding base

        for base in 'ACGT':
            if frequency_matrix[base][i] > max_freq:
                max_freq = frequency_matrix[base][i]
                max_freq_base = base
            elif frequency_matrix[base][i] == max_freq:
                max_freq_base = '-'  # more than one base as max

        consensus += max_freq_base  # add new base with max freq
    return consensus
Here is a test:
frequency_matrix = freq_dict_of_dicts_v1(dna_list)
pprint.pprint(frequency_matrix)
print 'Consensus string:', find_consensus_v3(frequency_matrix)
with output
{'A': {0: 0, 1: 0, 2: 0, 3: 2, 4: 0},
 'C': {0: 0, 1: 0, 2: 0, 3: 0, 4: 2},
 'G': {0: 3, 1: 3, 2: 0, 3: 1, 4: 1},
 'T': {0: 0, 1: 0, 2: 3, 3: 0, 4: 0}}
Consensus string: GGTAC
Let us try find_consensus_v3 with the dict of defaultdicts as input
(freq_dicts_of_dicts_v2). The code runs fine, but the output string is just G!
The reason is that dna_length is 1, because the length of the A dict in
frequency_matrix is 1. Printing out frequency_matrix yields
{'A': defaultdict(X, {3: 2}),
 'C': defaultdict(X, {4: 2}),
 'G': defaultdict(X, {0: 3, 1: 3, 3: 1, 4: 1}),
 'T': defaultdict(X, {2: 3})}
where our X is a short form for text like
<function <lambda> at 0xfaede8>
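
The effect can be reproduced in isolation. The following is a minimal sketch, not part of the original program: the name row is hypothetical and stands for one row of the frequency matrix in which only position 3 has been incremented. The code is written so that it runs under both Python 2 and Python 3.

```python
from collections import defaultdict

# Hypothetical single row of a frequency matrix (assumption for illustration):
# the default value for a missing position is 0.
row = defaultdict(lambda: 0)
row[3] += 2                 # only position 3 is ever stored

len_before = len(row)       # len counts stored keys only
value_at_0 = row[0]         # a missing key yields the default value ...
len_after = len(row)        # ... and the lookup also stores that key

print(len_before)
print(value_at_0)
print(len_after)
```

Here len_before is 1 even though the underlying DNA strings have five positions. Note also that merely reading row[0] stores the key 0, so the length of a defaultdict can grow as a side effect of lookups.
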
We see that the length of a defaultdict counts only the stored entries, which
here are the nonzero ones. Hence, to use a defaultdict, our function must get
the length of the DNA string to build as an extra argument: