Archive for July, 2007

Rename Files With Spaces or Changing From Uppercase to lowercase

Thursday, July 5th, 2007

This info has been around for a while, but I thought I’d post it so I wouldn’t have to go hunting for the info again.

Rename files with capital letters in the filename to all lowercase -

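#!/bin/bash
# Quoting "$@" and "$i" keeps names with spaces in one piece.
for i in "$@"; do
    new=`echo "$i" | tr 'A-Z' 'a-z'`;
    # only rename when lowercasing actually changed the name
    if [ "$i" != "$new" ]; then
        mv "$i" "$new";
    fi;
done;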
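As a variation on the script above, here is a minimal sketch that assumes bash 4 or newer, where the `${name,,}` expansion lowercases a string without calling tr. The function name `lowercase_files` is made up for illustration. Unlike the script above, it lowercases only the last part of the path, so uppercase letters in directory names are left alone.

```shell
#!/bin/bash
# Sketch only: requires bash 4+ for the ${var,,} expansion.
# lowercase_files is a hypothetical helper name, not from the original script.
lowercase_files() {
    local f dir base new
    for f in "$@"; do
        dir=$(dirname -- "$f")
        base=$(basename -- "$f")
        new="$dir/${base,,}"      # lowercase only the final path component
        if [ "$dir/$base" != "$new" ]; then
            mv -- "$f" "$new"
        fi
    done
}
```

Something like `lowercase_files *.JPG` would then rename every matching file in the current directory.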

Or if you’d rather not use a shell script file to do it -

files="file1 file2"; \
for i in $files; do \
    new=`echo "$i" | tr 'A-Z' 'a-z'`; \
    if [ "$i" != "$new" ]; then \
        mv "$i" "$new"; \
    fi; \
done;

If you’re dealing with files that have spaces in the names you can do something like this -

for i in *.xls; do \
    new=`echo "$i" | sed -r "s/ +/-/g"`; \
    mv "$i" "$new"; \
done;
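One thing the loop above doesn't guard against is a collision: if both 'a b.xls' and 'a-b.xls' exist, the mv overwrites the second file. Here is a rough sketch of a safer version. The name `dash_spaces` is my own invention, and it uses `sed -E`, which both GNU and BSD sed accept, in place of the GNU-only `-r`.

```shell
#!/bin/bash
# Sketch only: dash_spaces is a hypothetical helper, not from the loop above.
# Replaces each run of spaces with one dash and skips names that would collide.
dash_spaces() {
    local f new
    for f in "$@"; do
        new=$(printf '%s\n' "$f" | sed -E 's/ +/-/g')
        [ "$f" = "$new" ] && continue        # nothing to rename
        if [ -e "$new" ]; then
            echo "skipping '$f': '$new' already exists" >&2
            continue
        fi
        mv -- "$f" "$new"
    done
}
```

It is meant to be run from inside the directory, as in `dash_spaces *.xls`, since it rewrites spaces anywhere in the path it is given.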

If you’re needing something a bit more complex or you’re pulling data out of a text file via cat you’ll need to convert spaces prior to doing a loop. In this example I’m doing a find on a directory tree and converting names as I go -

files=`find . -name '*.xls' | sed -r "s/ /\*-/g"`; \
for i in $files; do \
    file=`echo "$i" | sed -r "s/\*-/ /g"`; \
    dir=`echo "$file" | sed -r "s/\/[^\/]+$/\//"`; \
    newname=`echo "$file" | sed -r "s/.+\/([^\/]+)$/\1/" | sed -r "s/ +/-/g"`; \
    if [ "$file" != "$dir$newname" ]; then \
        mv "$file" "$dir$newname"; \
    fi; \
done;

As far as I know the check isn’t strictly necessary, but it keeps mv from printing an error for every file whose name doesn’t change, stating that ‘filex’ and ‘filex’ are the same file.
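The placeholder trick above works, but it relies on '*-' never showing up in a real filename. A different approach, sketched here on the assumption that you have bash and a find that supports -print0 (GNU and BSD find both do), is to let find separate names with NUL bytes so spaces never need escaping. The function name `rename_tree` and its two arguments are hypothetical.

```shell
#!/bin/bash
# Sketch only: rename_tree is a hypothetical helper; assumes bash and find -print0.
rename_tree() {
    local root=$1 pattern=$2 file dir name newname
    # NUL-delimited output keeps names containing spaces in one piece.
    find "$root" -type f -name "$pattern" -print0 |
    while IFS= read -r -d '' file; do
        dir=${file%/*}            # everything before the last slash
        name=${file##*/}          # final path component
        newname=$(printf '%s\n' "$name" | sed -E 's/ +/-/g')
        if [ "$name" != "$newname" ]; then
            mv -- "$file" "$dir/$newname"
        fi
    done
}
```

Calling `rename_tree . '*.xls'` walks the current directory tree. As in the loop above, only the filename is changed; directories with spaces in their names stay as they are.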


Friends & Family Fileserver & Backup Idea

Monday, July 2nd, 2007

This was originally started somewhere around 2007-06-27, but it has taken several days of digging through information and sorting ideas to put together.

I’ve been wanting to set up a raid5 array on a local file server for a while. I’m wanting something somewhat small and hopefully with a lower power consumption than a standard ATX. Something I can haul along with me fairly easily if and when I move again. So I’ve been looking for ARM, MIPS, and mini-ITX motherboards again, but considering I’ve not really found much for ARM or MIPS I think I may just go with a mini-ITX. Something similar to the Tux Server Project is my target.

And then the web/file/etc server at my parents’ house went down last Monday or Tuesday. Long story short - 1 of the drives in the raid array is dying and there’s no spare drive.

Back to my local file server… I started thinking about how I could get some sort of backup or redundancy for my file server so I wouldn’t lose data. I could set up a 2nd raid5 array and then set both raid arrays as a raid1 array (mirrored). While that’s a possibility, it works against being portable and having low power consumption. Not to mention what if - flood, tornado, stolen, thrown out a window - there goes everything.

I’ve thought about remote backups before, but I don’t really want to be paying somebody else some sort of fee (monthly? yearly?) and maybe get stuck with how often I can upload/download data… plus, do they backup the data as well or is it just a second location for the data? Plus, why pay a monthly fee for something if I can set it up myself?

I’ve looked at rsync before and thought it might be a great idea for keeping stuff backed up between computers on a LAN. For those that don’t know, rsync allows you to keep 2 locations in sync. 2 directories, 2 computers, 2 whatever. One thing that makes rsync preferable over ftp, sftp, scp, etc is that rsync skips files that already match on both sides (by default it compares size and modification time; with --checksum it compares file checksums instead), and for files that did change it can use checksums to send only the parts that differ over the network. Unfortunately, again with storing backups in 1 location, the idea of keeping rsync’ed copies on a LAN is prone to flood, tornado, etc.
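To see what rsync thinks needs transferring before letting it copy anything, something along these lines should work. It is only a sketch: `preview_sync` is a made-up wrapper name and the paths in the usage note are placeholders. The flags themselves (--archive, --dry-run, --itemize-changes, --checksum) are standard rsync options.

```shell
#!/bin/bash
# Sketch only: preview_sync is a hypothetical wrapper name.
# --dry-run reports what would change without copying anything.
preview_sync() {
    local src=$1 dest=$2
    shift 2
    # Any extra arguments (for example --checksum) are passed through to rsync.
    rsync --archive --dry-run --itemize-changes "$@" "$src/" "$dest/"
}
```

For example, `preview_sync /home/patrick /mnt/backup/patrick` lists what the default size and time comparison would transfer, and adding `--checksum` as a third argument makes rsync compare file contents instead, which is slower but doesn't trust timestamps.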

The mirrored raid5 solution got me to thinking about all of my previous ideas, but I’m wanting to make things easier… I’ve tried working out various backup ideas before, but the only 1 I’ve really implemented is Subversion when dealing with code. It’s one of those "if I have to do too much every single time to get it to work, then I might not get around to it" kind of issues. Sure, I’ve done tape, CD, DVD, flash drive backups at different times, but these require me to sit down and waste my time in preparation for it. Most of the time it also means it’s running while I’m awake so I can switch the backup media… Which means all I get to accomplish during that period of time is to watch TV and tap my finger as I wait for the process to get done.

I’m a geek gosh darn it! Not only that I’m lazy! Surely there’s a way for me to set something up and just leave it to run by itself. Sure, at some point I’ll probably have to intervene and probably do some maintenance, but every time I need a backup? I just want something that works without me messing with it except to fix hardware issues on occasion.

I’m thinking of fixing up my parents’ file server in Kansas City, MO and then setting up my own locally (currently Tulsa, OK) - both with about the same amount of disk space. After setting them up I’ll do an initial rsync on my parents’ LAN. With a file server in 2 different locations I can then set them up to do a daily rsync. Unfortunately I’ve run into a couple of snags with the idea.

In looking at the various raid controllers, hard drive enclosures and hard drives some information is fairly apparent by reading the detailed information. Other stuff I’m not sure where to look for answers…

  • If a drive in the raid array goes bad, will the computer beep (this would be useful as my parents could call and say that the computer is beeping)? Or at the least, will the hard drive enclosure light up in a certain way or do something else to alert you to the fact that a drive has gone bad?
  • Will some brands of RAID cards work together? I.e. if I get a 4-port RAID controller today and later I want 8 hard drives in my array, can I buy another 4-port RAID controller or am I going to have to toss the 4-port and get an 8-port?
  • Are there any available hardware RAID cards that support raid6?
  • How close do the hard drives in a RAID array have to match? Do they only have to match in size or do they have to match in the number of cylinders, heads, and/or sectors as well? So far everyone I’ve talked to simply uses the exact same brand and model for all the hard drives… which works fine, but what happens in 3+ years after the warranty is up on the hard drive and you can no longer find any more of that specific brand and model?

Another issue is that rsync is great for adding files that exist on 1 computer to another computer. However, the reverse (deletion) is where I’m having issues, especially in trying to maintain 2+ locations. If 1 of them is allowed to delete (even if it’s only once a week or during a maintenance cycle)… which 1 is allowed to delete? And how much is it allowed to delete?

What I’m looking at is that the users whose fileserver is local will determine which server is the 1 allowed to delete files and the other servers simply act as storage devices. The problem area is the public storage… I think at a certain point I may announce (by calling or emailing them) that a certain day is a maintenance day and this will be a deletion day… If there are certain files in public storage that need to be deleted, they should send me a list or something… Otherwise I’ll simply go through on my server and delete public files that no longer need to be around (do I really need the last 6 versions of Firefox installation files for windows?). After that is done I can run an rsync --delete command pushing deletions out to the other computers.
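On the question of how much a server should be allowed to delete, rsync has a --max-delete option that caps the number of deletions in a single run. Here is a rough sketch of a deletion-day helper built on it. The function names and the default limit of 50 are my own assumptions, not anything rsync prescribes.

```shell
#!/bin/bash
# Sketch only: both function names and the default limit are hypothetical.

# List what --delete would remove from dest, without changing anything.
preview_deletions() {
    local src=$1 dest=$2
    rsync --archive --delete --dry-run --itemize-changes "$src/" "$dest/" \
        | grep '^\*deleting' || true
}

# Apply the sync, refusing to delete more than $limit files in one run.
apply_deletions() {
    local src=$1 dest=$2 limit=${3:-50}
    rsync --archive --delete --max-delete="$limit" "$src/" "$dest/"
}
```

On maintenance day that might look like `preview_deletions /home/public backupuser@some.dynamic.domain.com:/home/public`, a look over the list, and then `apply_deletions` with the same two arguments.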

My plan is to have a cron job that will execute a shell script once a day from 1 of the computers that will look something like this -

#!/bin/bash

# all of the users along with @domain to show which
# server is their local domain.
users="patrick@my.dynamic.domain.com
    mom@some.dynamic.domain.com
    dad@some.dynamic.domain.com
    sister@another.dynamic.domain.com
    bro-in-law@another.dynamic.domain.com";

# the server that this script is being run at
localserver="my.dynamic.domain.com";

# the other servers in the rsync backup queue.
remoteservers="some.dynamic.domain.com
    another.dynamic.domain.com";

for user in `echo $users`; do
    # the username is everything before the @
    username=`echo $user | awk -F\@ '{print $1}'`;
    # the servername is everything after the @
    servername=`echo $user | awk -F\@ '{print $2}'`;

    # If the servername is not the same as the
    # local server, then we need to sync the
    # local server with the remote server.
    # Anything that the user deleted on their
    # local file server should be deleted on the
    # local server as well.
    #
    # we skip the local server because there's
    # no need to sync the local server with itself.
    if [ "$servername" != "$localserver" ]; then
        rsync --delete --compress --archive \
            backupuser@$servername:/home/$username/ \
            /home/$username/;
    fi;

    # local server is not in the remoteservers list
    # so we don't have to worry about excluding
    # it here.
    #
    # sync the remote servers with the remaining
    # user home dirs.
    for remote in `echo $remoteservers`; do
        if [ "$servername" != "$remote" ]; then
            rsync --delete --compress --archive \
                /home/$username/ \
                backupuser@$remote:/home/$username/;
        fi
    done;
done;

# sync the public storage space
for remote in `echo $remoteservers`; do
    # copy files from remote to local
    rsync --compress --archive \
        backupuser@$remote:/home/public/ \
        /home/public/;
    # copy files from local to remote
    rsync --compress --archive \
        /home/public/ \
        backupuser@$remote:/home/public/;
done;

for remote in `echo $remoteservers`; do
    # copy files from local to remote
    rsync --compress --archive \
        /home/public/ \
        backupuser@$remote:/home/public/;
done;

The biggest issue I have with this is the public storage. My understanding of the way rsync works is that you have to do all the calculations for what files are different on the 1 computer then copy those over. After doing that you have to redo the same calculation to discover the reverse in order to be able to copy those back. The biggest problem, that I could see anyways, is that the files on the local server and the first remote server will be backed up on all of the remaining remote servers. The following remote servers wouldn’t be replicated on preceding remote servers until the next time the script was run.

One option might be, after the initial public storage is updated to the local server from all of the remote servers, to then drop the last remote server from the list (it already has a complete copy of all the other remote servers). Then reverse the list of remote servers and run the rsync again. However, each time rsync is run the servers have to calculate the file differences between them. For a couple hundred megabytes this is probably not that big of a deal, but I’m currently looking at having file servers with between 500GB and 1TB worth of storage space available.
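The "drop the last server, then reverse the list" step can be sketched in plain shell. The helper name `reverse_without_last` is hypothetical, and third.dynamic.domain.com is a made-up third server added only to make the reversal visible.

```shell
#!/bin/bash
# Sketch only: reverse_without_last is a hypothetical helper name.
# Prints its arguments in reverse order, leaving out the last one.
reverse_without_last() {
    local result="" count=$# item i=0
    for item in "$@"; do
        i=$((i + 1))
        [ "$i" -eq "$count" ] && break       # drop the final server
        result="$item${result:+ $result}"    # prepend to reverse the order
    done
    printf '%s\n' "$result"
}

# third.dynamic.domain.com is invented for this example.
remoteservers="some.dynamic.domain.com another.dynamic.domain.com third.dynamic.domain.com"
second_pass=$(reverse_without_last $remoteservers)
# second_pass is now "another.dynamic.domain.com some.dynamic.domain.com"
```

The second rsync pass would then loop over `$second_pass` in place of `$remoteservers`.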

Even after going through several more pages on rsync calculations I still haven’t figured out which server does what calculations. It did, however, give my subconscious some more time to think about the rsync controller/round robin issue and I think I’ve got a bit better of an idea. The shell script above was modified to reflect my new thoughts on how I could keep all of the servers in sync without trying to figure out a way to do multiple loops making rsync calls: the local-to-remote copy inside the first public storage loop is the old version, and the separate loop after it, which pushes public storage out only once every remote has been pulled in, is its replacement.
