back up files
Here is another approach for creating backup files in Dropbox on a regular basis with cron. It is quite a simple script, but here are some remarks.
hostname
There are multiple approaches for obtaining the hostname; here I use socket.gethostname(), which appears to be a thin wrapper around the gethostname() system call.
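For example, all three of the following return the machine's hostname on a typical Unix-like Python installation; platform.node() and os.uname() are mentioned here only as alternatives and are not used in the script below:

import os
import platform
import socket

print(socket.gethostname())  # the one used in this script
print(platform.node())       # usually the same value
print(os.uname()[1])         # nodename field; Unix only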
compression
Here I use bzip2 as the compression algorithm, which tends to achieve a better compression ratio on files like plain text than alternatives such as gzip. As a trade-off for the better ratio, compression takes longer.
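A rough way to see this trade-off is to compress the same data with the standard zlib and bz2 modules and compare sizes and timings; /etc/services is just a convenient plain-text file on most Unix systems, and the exact numbers will vary with the input:

import bz2
import time
import zlib

with open("/etc/services", "rb") as f:  # any sizeable plain-text file will do
    data = f.read()
for name, compress in (("gzip/zlib", zlib.compress), ("bzip2", bz2.compress)):
    start = time.time()
    out = compress(data)
    print("%s: %d -> %d bytes in %.3f s"
          % (name, len(data), len(out), time.time() - start))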
temporary file
I first create the backup file under a temporary directory and then move it into the Dropbox directory only if anything has changed, since creating the backup file directly under the Dropbox directory is quite slow due to sync traffic.
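In isolation, the create-then-move pattern looks like the sketch below; here the tempfile module picks the scratch location instead of the hard-coded /tmp used later, and dest_dir is a placeholder:

import os
import shutil
import tempfile

dest_dir = "<DROPBOX_DIRECTORY>"  # placeholder for the synced directory
fd, temp_file = tempfile.mkstemp(suffix=".tar.bz2")
os.close(fd)
# ... write the archive into temp_file here ...
shutil.copy(temp_file, dest_dir)  # one fast local copy at the end
os.remove(temp_file)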
checksum
Finally I compare the temporary backup file and the existing file by their MD5 checksums (this works because bzip2, unlike gzip, does not record a timestamp in its output, so archiving unchanged content twice yields byte-identical files). If the checksums do not match, some file has changed and the new archive replaces the old one. If they are identical, nothing under the target directory has changed, so nothing is done.
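For large archives it may be preferable to feed the hash in chunks instead of reading the whole file into memory at once; a small helper along these lines (a sketch, not part of the script below):

import hashlib

def md5sum(path, chunk_size=1 << 20):
    # read the file in 1 MiB chunks and return the MD5 hex digest
    h = hashlib.md5()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()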
import datetime
import hashlib
import os
import os.path
import shutil
import socket
import tarfile
if __name__ == "__main__":
    # base directory, which is one level above the target directory
    base = "<BASE_DIRECTORY_OF_YOUR_CHOICE>"
    # backup target directory under the base directory
    target = "<BACK_UP_TARGET_DIRECTORY>"
    # directories to exclude from the backup file
    exclude = ["<DIRECTORY_TO_EXCLUDE>"]
    # backup file name: HOSTNAME_DAY.tar.bz2
    backup = socket.gethostname() + "_" + \
             datetime.date.today().strftime("%A") + \
             ".tar.bz2"
    # temporary directory & temporary file
    temp_dir = "/tmp"
    temp_file = os.path.join(temp_dir, backup)
    # destination directory & file
    dest_dir = os.path.join(base, "Dropbox/<TARGET_DIRECTORY>")
    dest_file = os.path.join(dest_dir, backup)
    # let's start the backup
    # first create the archive under the temporary directory
    tar = tarfile.open(temp_file, "w:bz2")
    os.chdir(base)
    for root, dirs, files in os.walk(target):
        for name in files:
            # skip files whose path contains an excluded directory;
            # the else clause runs only if the loop ends without break
            for d in root.split('/'):
                if d in exclude:
                    break
            else:
                tar.add(os.path.join(root, name))
    tar.close()
    # backup file creation has finished
    # now determine whether the existing file must be replaced
    if os.path.exists(dest_file):
        # dictionary to store the MD5 checksum of each file
        md5 = {}
        for f in temp_file, dest_file:
            # read in binary mode; the archive is not a text file
            with open(f, "rb") as file_to_check:
                data = file_to_check.read()
                md5[f] = hashlib.md5(data).hexdigest()
        # compare the checksum of each file and copy the temporary
        # file into the destination directory only if they differ
        if md5[temp_file] != md5[dest_file]:
            shutil.copy(temp_file, dest_dir)
    else:
        # no backup of the same name exists yet, so just copy
        shutil.copy(temp_file, dest_dir)
If you would like to take a backup, say, every hour, then add a line like the following to your cron table (edit it with crontab -e; crontab -l lists the current entries):
$ crontab -l
0 * * * * python <PATH_TO_SCRIPT>/backup.py
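Note that cron runs jobs with a minimal environment, so depending on your setup it may be safer to spell out absolute paths and keep a log; the interpreter and log paths below are placeholders:

0 * * * * /usr/bin/python <PATH_TO_SCRIPT>/backup.py >> /tmp/backup.log 2>&1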