
Processing all teh Files in Directory


[Cross-posted from ICanHazDataScience]

Okay. So we’ve talked a bit about getting set up in Python, and about how to read in different types of file (cool visualisation tools, streams, pdfs, apis and webpages next, promise!). But what if you’ve got a whole directory of files and no handy way to read them all into your program?

Well, actually, you do.  It’s this:

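import glob
import os

datadir = "dir1/dir2"

# Full names (directory included) of every CSV file in datadir
csvfiles = glob.glob(os.path.join(datadir, "*.csv"))

for infile_fullname in csvfiles:
    # Drop the directory name and the separator after it
    filename = infile_fullname[len(datadir)+1:]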


That’s it. “os.path.join” sticks your directory name (“dir1/dir2”) to the filename pattern you’re looking for (“*.csv” here to pull in all the CSV files in the directory, but you could ask for anything, like “a*.*” for files starting with the letter “a”, “*.xls” for Excel files, etc.). “glob.glob(filepath)” uses the glob library to get the names of all the files in the directory that match that pattern, each with the directory name still attached. And “infile_fullname[len(datadir)+1:]” gives you just the names of the files, without the directory name attached: it skips the directory name plus the one separator character after it, so it assumes datadir is written without a trailing slash.
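If you'd rather not count characters, here's a minimal sketch of the same idea using "os.path.basename" to get the bare filename. The function name list_matching_files and its default pattern are made up for illustration; they aren't part of glob or os.

```python
import glob
import os


def list_matching_files(datadir, pattern="*.csv"):
    """Return (full_path, bare_filename) pairs for files in datadir matching pattern.

    Hypothetical helper for illustration only.
    """
    # sorted() gives a predictable order; glob itself makes no ordering promise.
    full_paths = sorted(glob.glob(os.path.join(datadir, pattern)))
    # os.path.basename strips the directory part whether or not datadir
    # was written with a trailing slash.
    return [(path, os.path.basename(path)) for path in full_paths]
```

Calling list_matching_files("dir1/dir2", "a*.*") would give you the pairs for files whose names start with "a" and contain a dot.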

And at this point, you can use the full names to open the files (the short names are handy for labelling your results), and do whatever it was that you wanted to do to them all. Have fun!
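As one example of "whatever it was", here's a rough sketch that opens every CSV file in a directory and counts its rows. It assumes Python 3 and the standard csv module, and the function name count_rows_per_file is invented for this example; swap the counting line for your own processing.

```python
import csv
import glob
import os


def count_rows_per_file(datadir):
    """Map each CSV filename in datadir to its number of rows (header included).

    Hypothetical helper for illustration only.
    """
    row_counts = {}
    for infile_fullname in glob.glob(os.path.join(datadir, "*.csv")):
        filename = os.path.basename(infile_fullname)
        # Open with the full name; newline="" is what the csv module
        # documentation recommends for files passed to csv.reader.
        with open(infile_fullname, newline="") as infile:
            row_counts[filename] = sum(1 for _ in csv.reader(infile))
    return row_counts
```

Note that the file is opened with the full name and the result is stored under the short name, which is usually the split you want.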