15.2. Example: Cleaning up a photo directory
I wanted to go through these files and figure out which of the text files were really
captions and which were junk and then delete the bad files. The first thing to do
was to get a simple inventory of how many text files I had in all of the sub-folders
using the following program:

    import os
    count = 0
    for (dirname, dirs, files) in os.walk('.'):
        for filename in files:
            if filename.endswith('.txt'):
                count = count + 1
    print 'Files:', count

    python txtcount.py
    Files: 1917

The key bit of code that makes this possible is the os.walk function in Python's os
library. When we call os.walk and give it a starting directory, it will “walk” through
all of the directories and sub-directories recursively. The string “.” indicates to start
in the current directory and walk downward. As it encounters each directory, we
get three values in a tuple in the body of the for loop. The first value is the
current directory name, the second value is the list of sub-directories in the current
directory, and the third value is a list of files in the current directory.
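
To make the three values concrete, here is a small sketch that builds a throwaway directory tree and records what os.walk hands us for each directory. The folder and file names (trip, day1, notes.txt and so on) and the helper names make_sample_tree and describe_walk are made up for illustration; they are not part of the photo directory described above.

```python
import os
import shutil
import tempfile

def make_sample_tree():
    """Create a small throwaway directory tree and return its root path."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, 'trip', 'day1'))
    sample_files = [('notes.txt',), ('trip', 'a.jpg'),
                    ('trip', 'a.txt'), ('trip', 'day1', 'b.txt')]
    for parts in sample_files:
        with open(os.path.join(root, *parts), 'w') as handle:
            handle.write('sample')
    return root

def describe_walk(root):
    """Return one (dirname, dirs, files) entry per directory that os.walk visits."""
    visited = []
    for (dirname, dirs, files) in os.walk(root):
        # Show the directory name relative to the starting point.
        relative = os.path.relpath(dirname, root)
        # Sort the lists because the order os.walk reports them in can vary.
        visited.append((relative, sorted(dirs), sorted(files)))
    return sorted(visited)

root = make_sample_tree()
walk_result = describe_walk(root)
for (dirname, dirs, files) in walk_result:
    print('%s dirs=%s files=%s' % (dirname, dirs, files))
shutil.rmtree(root)   # remove the throwaway tree again
```

Each entry corresponds to one trip through the for loop: the starting directory appears as “.”, and every sub-directory listed in one entry shows up later as a directory name of its own.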
We do not have to explicitly look into each of the sub-directories because we can
count on os.walk to visit every folder eventually. But we do want to look at
each file, so we write a simple for loop to examine each of the files in the current
directory. We check each file to see if it ends with “.txt” and then count the number
of files through the whole directory tree that end with the suffix “.txt”.
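
The same counting logic can also be wrapped in a function so that it can be tried on any folder. The sketch below is one possible way to do that, not the program used above; the function name count_files_with_suffix and the sample file names are assumptions made for the example. It also shows a detail worth knowing: endswith is case-sensitive, so a file named b.TXT is not counted as “.txt”.

```python
import os
import shutil
import tempfile

def count_files_with_suffix(top, suffix):
    """Count files under top (including sub-directories) whose names end with suffix."""
    count = 0
    for (dirname, dirs, files) in os.walk(top):
        for filename in files:
            if filename.endswith(suffix):
                count = count + 1
    return count

# Build a throwaway tree so the sketch can run anywhere (names are made up).
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'sub'))
for parts in [('a.txt',), ('b.TXT',), ('c.jpg',), ('sub', 'd.txt')]:
    with open(os.path.join(top, *parts), 'w') as handle:
        handle.write('sample')

txt_count = count_files_with_suffix(top, '.txt')
print('Files: %d' % txt_count)   # a.txt and sub/d.txt match; b.TXT does not
shutil.rmtree(top)               # remove the throwaway tree again
```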
Once we have a sense of how many files end with “.txt”, the next thing to do is try
to automatically determine in Python which files are bad and which files are good.
So we write a simple program to print out the files and the size of each file:

    import os
    from os.path import join
    for (dirname, dirs, files) in os.walk('.'):
        for filename in files:
            if filename.endswith('.txt'):
                thefile = os.path.join(dirname,filename)
                print os.path.getsize(thefile), thefile

Now instead of just counting the files, we create a file name by concatenating the
directory name with the name of the file within the directory using os.path.join.
It is important to use os.path.join instead of string concatenation because on
Windows we use a backslash (\) to construct file paths and on Linux or Apple
systems we use a forward slash (/) to construct file paths. The os.path.join function
knows these differences and knows what system we are running on and it does the proper
concatenation depending on the system. So the same Python code runs on either
Windows or UNIX-style systems.
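
To see the difference without switching computers, we can call the two flavors of the join function directly. This is only an illustrative sketch: the standard library modules posixpath and ntpath hold the UNIX-style and Windows-style path rules, and os.path is whichever of the two matches the system the program is running on. The names photos and caption.txt are made up for the example.

```python
import ntpath
import os
import posixpath

dirname = 'photos'
filename = 'caption.txt'

# The same call, using the rules of each kind of system.
unix_style = posixpath.join(dirname, filename)    # photos/caption.txt
windows_style = ntpath.join(dirname, filename)    # photos\caption.txt

# os.path.join picks the rules for the system we are running on.
native = os.path.join(dirname, filename)

# join also avoids doubling the separator when the directory already ends with one.
no_double_slash = posixpath.join('photos/', filename)   # photos/caption.txt

print(unix_style)
print(windows_style)
print(native)
```

A path built by hand with dirname + '/' + filename would match only the UNIX-style result, which is why the program above lets os.path.join choose the separator.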