11 May 2013

Calculating a File Change List Using SHA256

Realised today that I needed a good way to create a change list document for a small app that I am writing. The app has a few hundred screens, each backed by a single data document containing the information that the screen shows.

This data is downloaded from the web as individual JSON files the first time the user launches the application. Any of these JSON files can be amended after the app is released, so I wanted a quick and easy way for the app to download change list information and detect whether anything needs updating.

I wanted the data to be separated in such a way that if only one file has changed, the user only needs to download that one file and not the rest. This is crucial, as the entire dataset is just shy of 100 MB whereas each individual file averages around 400 KB.
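
To make the idea concrete, here is a rough sketch of the comparison the app could do once it has downloaded a fresh change list. It assumes the change list is a flat mapping of file name to hash. The function name `files_to_download` and the sample values are made up for illustration, and the app itself need not be written in Python.

```python
def files_to_download(local_changelist, remote_changelist):
    """Return the names whose hash is new or different in the remote change list."""
    return sorted(
        name
        for name, remote_hash in remote_changelist.items()
        if local_changelist.get(name) != remote_hash
    )


# Hypothetical change lists; real values would be 64-character SHA-256 hex digests
local = {"screen_001": "aaa", "screen_002": "bbb"}
remote = {"screen_001": "aaa", "screen_002": "ccc", "screen_003": "ddd"}

print(files_to_download(local, remote))  # ['screen_002', 'screen_003']
```

Only the changed file and the brand new one come back, so the unchanged bulk of the dataset never has to be fetched again.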

Python to the rescue

Wrote a little script that walks my data files and creates a single changelist.json containing one entry per file: the file name mapped to its SHA256 hash, calculated with the hashlib module that ships with Python. So simple and quick:

import hashlib
import json
import os

out_file_path = r"C:\outfiles"
jdata = {}
for path, subdirs, files in os.walk(r"C:\myfiles"):
    # Skip the output folder so changelist.json never ends up in its own list
    if path == out_file_path:
        continue

    for name in files:
        if not name.endswith(".json"):
            continue

        json_file_path = os.path.join(path, name)

        # Key is the file name without ".json", value is the SHA256 hex digest of the file's bytes
        with open(json_file_path, 'rb') as json_file:
            jdata[os.path.splitext(name)[0]] = hashlib.sha256(json_file.read()).hexdigest()

out_file_name = os.path.join(out_file_path, "changelist.json")
print("Writing " + str(len(jdata)) + " change list items to output file '" + out_file_name + "'")

# Now write the file to the out location and quit
with open(out_file_name, 'w') as outfile:
    json.dump(jdata, outfile, indent=4, separators=(',', ': '))
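
Since the change list holds a hash for every file, the same hashes could also be used to check that a download arrived intact. This is only a sketch of that idea, not something the script above does: `matches_changelist` is a hypothetical helper, and the payload and change list entry are built on the spot for the demo.

```python
import hashlib


def matches_changelist(name, data, changelist):
    """True if the SHA-256 of data equals the hash recorded for name."""
    return changelist.get(name) == hashlib.sha256(data).hexdigest()


# Hypothetical payload and change list entry, built here just for the demo
payload = b'{"title": "Example screen"}'
changelist = {"screen_001": hashlib.sha256(payload).hexdigest()}

print(matches_changelist("screen_001", payload, changelist))         # True
print(matches_changelist("screen_001", payload + b" ", changelist))  # False
```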

Python, gotta love it :)