Using Pandas to bulk process .CSV files

Aug 20

Aug 20 Using Pandas to bulk process .CSV files

Recently a client asked me to analyze all of the credits taken on checks they had received in the last year. They received details of each remittance in a CSV file, of which there were about 250 for the preceding year. I wrote a script utilizing the Pandas Python library to extract the data without having to manually process each file.

import pandas as pd
import numpy as np
import glob

#Create a list of all the file names in a folder

path ='C:/Users/Files/Remittance'
filenames = glob.glob(path + "/*.csv")

#Import each CSV file as a dataframe by calling the list of file names
#Add a column to each new dataframe with the value being pulled from the filename
#Note the upse of Latin1 encoding - this is due to a quirk of the files provided and UTF-8 #is preferred
#Add dataframe to list of dataframes

dfs = [] for filename in filenames:

filename_df = pd.read_csv(filename,encoding='latin1',names=['InvoiceNum','InvoiceDate','ShipDate','PO','Business','Order','Amount'])

filename_df['voucher'] = re.split('[\W+]', filename)[13][:7] * filename_df.shape[0]

dfs.append(filename_df)

#Concatenate all dataframe in list into one dataframe