Rusty's Blog

Thoughts and musings of someone who's not sure what 'normal' is…

Monday, August 24, 2009

DroidTrack – A tracks collector for Android phones.

Ok, first of all DroidTrack probably won’t be the final name for this collection of scripts. It would not surprise me if someone else already had the name. That’s OK, we’ll work around that as the issue crops up.

This article will discuss a couple of methods for capturing location information and making it available.

If you went through college before Java became the popular Comp Sci language for writing programs in, you probably encountered Modula-2 or, earlier, Pascal. Both are highly structured languages used to help teach structured programming. Lua has some similarities with Pascal. There are some good references out there, some of which have worked for me in the past.

I tend to write more in Python these days, and what I will present in a bit is more along those lines, but to show that most of this can be done in either Python or Lua, let’s start with a sample of Lua.

require "android"
android.startLocating("coarse")  -- Obtain location from the network.

android.sleep(5)  -- Give the location sensor a moment to come online.

while true do
  local l = android.readLocation()
  android.makeToast("lat " .. l.result.latitude .. "\nlon " ..
    l.result.longitude .. "\nalt " .. l.result.altitude)
  android.sleep(5)
end

OK, starting with the first line. This is needed in every script that is going to interface with the Android phone.

The second line starts the locating intent. In other words, it tells the Android phone to start asking the network or the GPS receiver where the phone is, and to be ready to report it.

It takes a little bit of time for the service to kick in. For ‘network’ location that is because we are asking the cell phone towers to approximate where we are, and they simply take some time to get back to us. For the GPS receiver it is a bit more complicated. First the GPS receiver needs to determine what time it is. Oh, not in minutes and seconds, but at a much finer resolution. So it listens for GPS satellites and, based on the time stamps that they are providing, determines approximately where all of them are relative to the receiver. That initial estimate places the phone within a certain distance of its true position. Let’s estimate that to be the better part of a mile. Not really much help when trying to put us on a map, but it’s a start. Now that we have that estimate, the system improves the accuracy of the clock and gets a much tighter resolution of where it is on the globe (or above or below it). This goes back and forth a few times. Ultimately, using just this technique, the GPS can estimate where it is to within about 80 feet. Well, that could put me on either side of this freeway. A major reason for this level of inaccuracy is that satellites do not have perfectly maintained orbits. So while we know roughly where a satellite is in its orbit, that knowledge has limited accuracy as well. The satellite may be hundreds of feet away from its predicted location.
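
To put a number on that ‘much finer resolution’: GPS signals travel at the speed of light, so any timing error turns directly into a distance error of about one foot per nanosecond. A small Python sketch of that conversion (the function name is just for illustration, and it ignores satellite geometry, which can make the final position error larger or smaller than a single range error):

```python
SPEED_OF_LIGHT = 299792458  # metres per second (exact, by definition)
FEET_PER_METRE = 3.28084


def range_error_feet(clock_error_seconds):
    """Distance error (feet) caused by a timing error (seconds)."""
    return clock_error_seconds * SPEED_OF_LIGHT * FEET_PER_METRE


for label, seconds in [("1 millisecond", 1e-3),
                       ("1 microsecond", 1e-6),
                       ("80 nanoseconds", 80e-9)]:
    print("%15s: %12.0f feet" % (label, range_error_feet(seconds)))
```

A clock that is off by just one microsecond already puts the range estimate nearly 1,000 feet out, and an error of about 80 nanoseconds is roughly the 80 feet mentioned above.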

Since we do like a little bit better resolution than that, a better way was needed of working out where the satellites were, so that we could get a better idea of where we are. The standard method of doing this is to use two GPS receivers, one at a fixed, known location, which tells the mobile receiver what inaccuracies the satellites are giving us. The problem is that this requires a lot of extra hardware to get the two GPS receivers to talk to each other, and it also requires that both GPS receivers work out their location information from the same satellites. That is not a given. Likewise this really only works when both receivers are seeing roughly the same sky. If you are over the horizon or around the world, you will probably see something different, in addition to having a bit of difficulty getting the current variation information.
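
The two-receiver idea can be sketched in a few lines. This is a simplified, position-domain version that assumes both receivers used the same satellites and so suffer the same error: the base station compares its computed fix with its surveyed position, and the mobile (rover) receiver subtracts that same error. Real differential GPS applies its corrections per satellite to the range measurements, and the coordinates below are made up.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    lat: float  # decimal degrees
    lon: float  # decimal degrees


def differential_correction(base_known, base_measured, rover_measured):
    """Apply the base station's observed error to the rover's fix.

    Simplification: assumes both receivers see the same satellites and
    therefore the same error, so the offset can be subtracted directly
    in latitude/longitude.
    """
    err_lat = base_measured.lat - base_known.lat
    err_lon = base_measured.lon - base_known.lon
    return Fix(rover_measured.lat - err_lat, rover_measured.lon - err_lon)


# Hypothetical numbers: the base station's own fix is slightly off.
base_known = Fix(45.0000, -93.0000)
base_measured = Fix(45.0002, -93.0001)
rover_measured = Fix(45.0102, -93.0201)
print(differential_correction(base_known, base_measured, rover_measured))
```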

The next best thing is to use a variation of this built into the GPS system. There are several satellites in orbit that provide ‘enhanced’ information to receivers that are designed to receive it. The enhanced information is part of a feature called WAAS, the Wide Area Augmentation System. A collection of GPS receivers at known locations collects drift calculations for the satellites they ‘see’. This includes clock drift and the like. That information is sent to stations on both the east and west coasts of the US, where it is crunched and new calculations are made regarding where each satellite is in its orbit, as well as how far off from the standard time each clock actually is. A package of that information is sent to the enhanced-capability satellites, which then include it in their data stream. A WAAS-enabled GPS receiver then uses that information to make improved calculations regarding its location. At best this gives a GPS receiver the ability to get its location to within a 2 foot circle, though on average 16 feet seems to be more common. (I’m at 45 degrees north. My experience is that the further south I get the better the accuracy is, and it may get down to 5 feet or so near the equator, at which point I would suspect the resolution gets worse again.)

The big problem with GPS is that it requires a reasonably clear view of the sky. If you are in a building, or in a metro area with lots of tall buildings (hey, being in a canyon has an effect as well), you can expect that GPS will not get you a usable location. As a result the Android phone, and most other 2nd generation and later phones, also collect location information based on cell towers. Here the situation is a little bit different. Since the cell towers are not in orbit, and the phone is in contact with at least one, it can get an estimate of where it is by knowing which tower it is talking to, with the tower telling everyone where it is located. With 3G phones, the phone is talking to multiple cell phone towers and can get an improved estimate of where it is by recognizing that it is probably somewhere between the towers it can talk with. The phone can also get some information based on how fast it gets a response from a tower. That said, the accuracy is significantly lower than you see with good visibility of GPS satellites. However, if I can figure out what block you are on, or what building you may be in, it makes getting emergency services to you that much easier.
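
The ‘somewhere between the towers’ estimate can be illustrated with a weighted centroid. This is only a sketch of the idea, not how Android or the carriers actually compute a network fix; the tower coordinates and weights are invented, with a larger weight (say, a stronger signal or faster response) pulling the estimate toward that tower.

```python
def weighted_centroid(towers):
    """Estimate a position from (lat, lon, weight) tuples.

    A larger weight pulls the estimate toward that tower. Averaging
    raw latitude/longitude is good enough for an illustration over the
    few miles a cell covers.
    """
    total = sum(w for _, _, w in towers)
    if total <= 0:
        raise ValueError("need at least one tower with a positive weight")
    lat = sum(la * w for la, _, w in towers) / total
    lon = sum(lo * w for _, lo, w in towers) / total
    return lat, lon


towers = [
    (45.010, -93.010, 3.0),  # strongest signal
    (45.000, -93.000, 1.0),
    (45.020, -92.990, 1.0),
]
print(weighted_centroid(towers))
```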

So we’ve told the phone to start gathering location information, and we’ve put the application to sleep for 5 seconds while the phone works out where it is. Now let’s start a loop of what we want to do. That begins with “while true do” and ends with “end”. (Purists will say that’s not like Pascal; where’s the ‘begin’? I didn’t say it was Pascal, just similar.)

Within the loop we need to do four things: get the current location information, format what we want to present to the user, present it, and go to sleep for some period of time. “l = android.readLocation()” does the first thing, putting the result into the variable ‘l’. ‘l’ is a data structure with several blocks of information. We’re primarily interested in our longitude and latitude, which place us on the globe. In some cases we may be interested in our altitude as well. So we’re going to construct a string, "lat " .. l.result.latitude .. "\nlon " .. l.result.longitude .. "\nalt " .. l.result.altitude, and wrap that up in a command to present it to the user, android.makeToast(). Finally we let the script sleep again for 5 seconds.

Thank you to Shanjaq for the Lua script above. Lua will also allow you to ‘print’ the output. So if you want the output within the shell window, rather than as a pop-up that may obstruct whatever else you may be doing, you could replace “android.makeToast()” with “print()”.

As I say, I don’t know all that much Lua, and while it is very likely simple to edit and append to files in it, I hadn’t originally started down that line. I tend to work in Python. ASE also supports Ruby and BeanShell (not to be confused with the Bourne Shell or the Bourne Again Shell), and both may also have similar capabilities.

So my script looks a bit different:

import android, time

droid = android.Android()
droid.startLocating()
time.sleep(10)  # give the location service time to get a first fix
while True:
    l = droid.readLocation()
    # Longitude first: the same order KML uses for coordinates.
    outstr = (str(l['result']['longitude']) + ',' + str(l['result']['latitude'])
              + ',' + str(l['result']['altitude']))
    # Keep cell-tower fixes and GPS fixes in separate files.
    if l['result']['provider'] == 'network':
        droidfile = '/sdcard/droidtrack.cell'
    else:
        droidfile = '/sdcard/droidtrack.gps'
    with open(droidfile, 'a') as fh:
        fh.write(outstr + '\n')
    time.sleep(10)

Note that you can see the labels of each component of the data structure that droid.readLocation() provides by simply printing the result. So if ‘l = droid.readLocation()’, you will get all the labels by doing ‘print l’. You may wish to modify the output based on what is in each portion. For example, I decided that it was important to distinguish between GPS-provided output and cell phone tower results. Cell phone tower results are tagged with provider having a value of ‘network’. Since there are really only two ways that location information will be provided, that means that the alternative is ‘gps’.

Ok, there is also the possibility that all location information will be ‘wrong’ or ‘missing’. For the purposes of this script I’m not going to worry about those possibilities.

What I am looking for, however, is output in CSV format. That’s because I will be wrapping it up in a KML wrapper.
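
A minimal sketch of that KML wrapping step, assuming the longitude,latitude,altitude lines written by the script above (longitude first is also the order KML uses). The function name is hypothetical, and it simply wraps the whole track in a single LineString placemark, skipping any line it can’t parse.

```python
from xml.sax.saxutils import escape

KML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>{name}</name>
      <LineString>
        <coordinates>
{coords}
        </coordinates>
      </LineString>
    </Placemark>
  </Document>
</kml>
"""


def csv_track_to_kml(lines, name="DroidTrack"):
    """Wrap 'longitude,latitude,altitude' lines in a KML LineString."""
    points = []
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        try:
            lon, lat, alt = (float(p) for p in parts)
        except ValueError:
            continue  # e.g. an altitude of 'None' from a fix without one
        points.append("          %s,%s,%s" % (lon, lat, alt))
    return KML_TEMPLATE.format(name=escape(name), coords="\n".join(points))


# Example with two made-up points:
print(csv_track_to_kml(["-93.1,45.0,250.0", "-93.2,45.1,255.5"]))
```

To convert the GPS track, pass it the open file, e.g. csv_track_to_kml(open('/sdcard/droidtrack.gps')), and write the result to a .kml file that Google Earth or a similar viewer can open.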

If you have been reading my earlier posts, you might notice that you can also use resources on the internet to track this information. I can very likely make this script significantly more complex if I want to use those resources. As an example, I can use XMPP to report my position to a Jabber server, where the information can update an “I’m located ‘here’” page if I wanted to do that. I could update the information with whether I was walking, riding the motorcycle or bicycle, whether I have a radio with me, etc., based on other parameters.

APRS, the Automatic Packet Reporting System used in amateur radio, also has a ‘smart beacon’ mode available. Currently this script records my position every 10 seconds, whether I’m traveling or not. That’s not quite enough data for OpenMap, but probably far more information than I need for APRS. One of the things to remember with APRS is that it is shared radio spectrum. Try not to monopolize it. Since there is more information I can work with here, we can use some of the same rules that APRS uses. If I am stationary or moving at less than 5 miles per hour, a beacon every 30 minutes is probably sufficient. If I am moving at 5 to 40 miles per hour, let’s give a position report every 10 minutes. Over 40 miles per hour, once every 2 minutes. Since we can also see when those rates change, let’s grab a report any time we come to a full stop. Finally, we probably want to know when our course changes. So if we see our bearing change by more than 15 degrees since the last time we had a bearing, we will send a report as well. If we want to be pedantic, we can also suppress new bearing updates until we see a change in bearing of less than 14 degrees. That way we won’t send a report every second while we are navigating three right turns on the cloverleaf because we missed the right turn we needed to take earlier. It’s a feature we can add later.
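
The ‘feature we can add later’ might look something like this. It is a sketch only: the class and its names are hypothetical, and it uses the 15 and 14 degree figures from the paragraph above. A turn triggers one report, further reports are held back until two successive readings differ by less than 14 degrees, and the settled course can then trigger one more report.

```python
def bearing_change(a, b):
    """Smallest angle in degrees (0 to 180) between two bearings."""
    d = abs(a - b) % 360
    return 360 - d if d > 180 else d


class TurnDetector:
    """Hypothetical helper implementing the 15/14 degree rule.

    A course change of more than `report_threshold` degrees from the last
    reported bearing triggers one report. After that, reports are held
    back until two successive readings differ by less than
    `settle_threshold` degrees, i.e. the turn is over.
    """

    def __init__(self, report_threshold=15, settle_threshold=14):
        self.report_threshold = report_threshold
        self.settle_threshold = settle_threshold
        self.last_reported = None  # bearing at the last report
        self.previous = None       # bearing at the previous reading
        self.suppressed = False    # True while we are still turning

    def update(self, bearing):
        """Feed one bearing reading; return True if it should be reported."""
        report = False
        if self.last_reported is None:
            report = True  # first reading ever
        elif self.suppressed:
            if bearing_change(self.previous, bearing) < self.settle_threshold:
                self.suppressed = False  # course has steadied
        elif bearing_change(self.last_reported, bearing) > self.report_threshold:
            report = True
            self.suppressed = True
        if report:
            self.last_reported = bearing
        self.previous = bearing
        return report


# Heading east, a right turn through a cloverleaf, then steady on 183.
detector = TurnDetector()
print([detector.update(b) for b in [90, 92, 120, 150, 180, 182, 183]])
```

With those readings it reports the first fix, the start of the turn and the settled new heading, rather than one report for every 15 degrees of the turn.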

So we start with the ‘speed’ option. This will give us a data collection interval of 30 min, 10 min or 2 min:

def getRate(speed):
    # Seconds between beacons: 30 min, 10 min or 2 min.
    if speed < 5:
        rate = 1800
    elif speed < 40:
        rate = 600
    else:
        rate = 120
    return rate

Now we need a couple of functions to deal with major changes in speed, direction, etc.: the conditions where we want to report new position information. We want a report if the speed crosses one of the thresholds for changing the reporting rate, changes by more than 20 mph since the last report, or we come to a complete stop (or start moving from one).

def change_in_speed(old, new):
    beacon = False
    # Starting or stopping, a jump of more than 20 mph, or crossing the
    # 5 mph or 40 mph threshold used by getRate() all trigger a beacon.
    if ((old == 0) != (new == 0) or abs(old - new) > 20
            or (old < 5) != (new < 5) or (old < 40) != (new < 40)):
        beacon = True
    return beacon

How about a function to deal with a change in bearing? This is a bit more interesting. First of all, bearing is a number from 0 to 359, so a swing from 355 to 5 degrees is really only a 10 degree turn. To handle this we will pick a ‘greater’ and a ‘lesser’ value to compare, then subtract the lesser from the greater. If the result is greater than 15 and less than 345, it’s time to beacon.

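def change_in_bearing(old, new):
    if old < new:
        lesser = old
        greater = new
    else:
        lesser = new
        greater = old
    difference = greater - lesser
    beacon = False
    # A difference of 345 or more is really a small turn across north
    # (e.g. 355 -> 5 degrees), so only values strictly between 15 and 345 count.
    if 15 < difference < 345:
        beacon = True
    return beacon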

Ok, as part of the loop we will keep a counter that is decremented by 1 every second. We’ll start it at 0, and any time it is 0 we will send a beacon. So let’s use the ‘beacon_now’ variable to store this value.

import android, time

droid = android.Android()
droid.startLocating()
time.sleep(10)  # as before, give the location service time to get a fix

beacon_now = 0
while True:
    l = droid.readLocation()
    if beacon_now == 0:
        Send_position_report(l['result']['longitude'], l['result']['latitude'],
                             l['result']['altitude'], l['result']['speed'],
                             l['result']['bearing'])
        beacon_now = getRate(l['result']['speed'])
        oldPosition = l  # Note that we only set this if we have had to beacon.
    if change_in_speed(oldPosition['result']['speed'], l['result']['speed']):
        beacon_now = 1
    if change_in_bearing(oldPosition['result']['bearing'], l['result']['bearing']):
        beacon_now = 1
    beacon_now -= 1
    time.sleep(1)

Now all we have to do is write the ‘Send_position_report()’ function. For that we can simply format a string and dump it to a text file, send an XMPP message to a tracking account, or update a Twitter page.
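
The simplest of those options, dumping a line to a text file, might look like this hypothetical version. The file path and the column layout are assumptions; the parameter order follows the call in the loop above, and missing values (for instance no speed on a network fix) are written as empty fields.

```python
import time

TRACK_FILE = "/sdcard/droidtrack.beacons"  # hypothetical path


def Send_position_report(longitude, latitude, altitude, speed, bearing,
                         path=TRACK_FILE):
    """Append one timestamped report to `path` as a CSV line.

    Column order: UTC time, longitude, latitude, altitude, speed, bearing.
    Returns the line that was written.
    """
    fields = [time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
              longitude, latitude, altitude, speed, bearing]
    line = ",".join("" if f is None else str(f) for f in fields)
    with open(path, "a") as fh:
        fh.write(line + "\n")
    return line
```

Swapping the body for an XMPP or Twitter call later would not change the loop, as long as the function keeps the same parameters.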

I have taken a few liberties with the syntax for the test in the change_in_speed() function. I will post an updated script as a comment. One thing I haven’t checked is how long it will take to run through the functions or comparisons. I suspect that if ASE converts this to bytecode, the response time should not be bad. Another thing to check before trusting the thresholds: Android’s own Location.getSpeed() reports metres per second, not miles per hour, so if ASE passes that value through unchanged it needs converting (1 m/s is about 2.24 mph) before it is compared with the numbers above. Additionally I would suspect that some people will be looking for different thresholds and rates. If you are a runner, you probably would not be all that pleased to get your rate of speed and location every 10 minutes. So perhaps you would rather set the rate to 120 for any speed over 5 mph. If you are a pilot, perhaps you want to set a threshold of 30 mph for ground speed, 200 for takeoff/landings and above that for cruising, with changes in speed of 50 mph to trigger track collection. You also may not be interested in getting samples as often at cruising speed as during takeoff and landings.
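
One way to make those thresholds easy to change is to drive getRate() from a table of profiles. A sketch: the profile names are hypothetical, the default profile reproduces getRate(), and the runner profile is the ‘120 for any speed over 5 mph’ example from the paragraph above. The pilot example doesn’t give rates, so it is left out rather than guessed.

```python
import math

# Each profile is a list of (speed limit in mph, seconds between beacons),
# checked in order: the first limit the speed is below wins.
PROFILES = {
    "default": [(5, 1800), (40, 600), (math.inf, 120)],  # same as getRate()
    "runner": [(5, 1800), (math.inf, 120)],  # every 2 minutes above 5 mph
}


def get_rate(speed, profile="default"):
    """Seconds until the next beacon for `speed` (mph) under `profile`."""
    for limit, rate in PROFILES[profile]:
        if speed < limit:
            return rate
    raise ValueError("a profile must end with an infinite speed limit")


print(get_rate(20), get_rate(20, "runner"))  # prints: 600 120
```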

Ok, I’ve gone a bit afield of just a simple demonstration of how to get your location information to display on your phone. You may be looking to take this in an entirely different direction. If so, have a good time.

posted by Rusty at 11:04 pm  

Tuesday, August 18, 2009

Removing ‘dupes’

If you read my blog from yesterday, you know that I have a rather substantial collection of files sitting on my server. Over the years I’ve written a few stories, taken some pictures, subscribed to a few news feeds, and so on.

But that’s only a small part of why my server has such a large collection of files on it. The real reason is that I’ve backed up systems that I needed to do some work on to my server in the past, and it’s accumulated a few duplicate files. I’ve had at least three laptops that I’ve created a folder on the server for, then dumped the entire content of my home folder from the laptop into the folder on the server. Including all subdirectories.

I have also backed up what I considered to be important folders on the system the same way. In my user account’s home folder there are folders ‘etc’, ‘oldetc’ and the like that are copies of what I had in the folder at the time I needed to do something significant to the system. Oh, it has saved me a few times. But I really don’t need a lot of that sitting around anymore. And it does take time to parse when I do make backups.

Well, just merge the folders together then, and throw away the source folders for any duplicates. Right?

In some cases that might be workable. However, it would cause a few issues as well. Let’s take a couple of examples. Say I had a laptop Able, another Betsy, and a third Chuck. At one point or another I made backups of each to my server. However, when I was using Betsy, some time after I was using Able, I created a file ‘Meeting Notes.txt’ in the home folder. Oddly enough I have a similar file in Able, and another in Chuck. How do I ‘merge’ the three folders? In this case I don’t really want to. What I want to do is keep the three folders, and their unique files, separate. On the other hand, after I had taken a picture of my Lab when I was setting up Able, I decided that it would be a great background image, so I tossed it in a folder Backgrounds, and copied that from system to system. That is a prime image to remove the duplicates of.

Sounds simple right? Bring up a list of the files in each folder, and delete the duplicates. Right?

I mentioned that this home folder has almost 200 gig of content, right? It turns out that there are a bit over 380,000 files. For the moment I’m presuming that there are at least 100,000 duplicates. Additionally, some of the duplicates may not have the same name.

Say I take a photo and store a copy in my backup folder. Well, the camera gives it a positively useless name like ‘IMG00010.PEF’, and I use a tool called UFRaw to convert this to img00010.jpg, but the name is still hardly descriptive. The picture is of a friend’s Samoyed named Doug. So I copy the image to doug010.jpg for my friend, and just happen to leave it on the laptop. Ok, she was in the image too, and I was using the image as a background for a while.

Now I don’t mind having the jpg dupe of the pef file. They are in different formats, and the jpg takes up significantly less space. However, I probably don’t need both the img00010 and doug010 images hanging around at the same time. Since the doug010 image is really the same as the img00010 image, I can use a feature of many Linux file systems and ‘link’ the two file names to the same file. In fact that is already in use on some folders, as a couple of tools such as web servers have changed which folder they pointed at for the user account. At one point Apache was pointing at WWW, at another it was pointing at public_html, and at another time it was pointing at www. Rather than delete and recreate the folder each time, I created a link to the original folder, and pretty much forgot about it.
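
For the img00010.jpg / doug010.jpg case, here is a sketch of replacing one copy with a hard link to the other, so both names remain but the data is stored only once. The function name is hypothetical; it only works when both paths are on the same file system, and it compares the contents byte for byte before touching anything.

```python
import filecmp
import os


def replace_with_hardlink(keep, duplicate):
    """Make `duplicate` a hard link to `keep` if their contents match.

    Both paths must be on the same file system. The link is created
    under a temporary name first and then renamed over the duplicate,
    so the duplicate's name is never missing, even if a step fails.
    Returns True if a link was made.
    """
    if os.path.samefile(keep, duplicate):
        return False  # already the same file
    if not filecmp.cmp(keep, duplicate, shallow=False):
        return False  # contents differ: not really a duplicate
    tmp = duplicate + ".linktmp"
    os.link(keep, tmp)
    os.replace(tmp, duplicate)
    return True
```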

In any case there are a lot of possibilities for why a duplicate file may exist, and in some cases what appears to be a duplicate file may not be one.

So how to clean this up.

First of all, let’s find a way of identifying duplicates that has nothing to do with the file name. There’s a handy tool available for Linux, and I believe for MacOS and Windows as well, called sha256sum. This tool is primarily used to generate checksums of files that are going to be distributed on the Internet, so that once you have completed your download you can check whether the resulting file matches what the distributor says they sent out. It’s likely to be overkill for what I intend to do with it. I could probably get away with using md5sum, but considering that I have over a quarter of a million files to go through, I might just as well use the ‘best’ tool for the job. What the tool does is go through a file and spit out a hash of the file contents: a 64-character string that is, for all practical purposes, unique for each distinct file. For my purposes it is close enough to being unique. I can survive deleting some files.

What I am doing now is using the command

find . -type f -exec sha256sum "{}" \; > sha256sums.txt

to generate a file that I can then process. The format of the file is each line contains first a sha256sum then the filename.
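
For anyone who would rather do this step in Python, here is a rough equivalent using hashlib. It reads each file in chunks so large images and videos don’t have to fit in memory, and it writes the same ‘hash, two spaces, file name’ layout that sha256sum produces, so the later steps work on its output. The function names are illustrative.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read 1 MiB at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(top, out_path):
    """Write one 'hash  path' line per regular file under `top`.

    Like `find -type f`, symbolic links are skipped. The output file
    itself is skipped too, in case it lives under `top`.
    """
    out_abs = os.path.abspath(out_path)
    with open(out_path, "w") as out:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.abspath(path) == out_abs:
                    continue
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                out.write("%s  %s\n" % (sha256_of(path), path))
```

Calling write_checksums('.', 'sha256sums.txt') would stand in for the find command above.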

Next up is to sort the file by the first 64 characters. Actually just sorting will be fine. I’ll probably use the command

sort <sha256sums.txt >sha256sums.srt

which should spit out a file of the same number of lines.  A quick check for that

wc -l sha256sums.txt sha256sums.srt

to verify that. Now we are going to want to weed out all the lines that represent unique files. Essentially we are going to do the reverse of what the tool uniq was originally intended to do. By default, if you pipe the contents of a file through ‘uniq’ you get all the lines that are not duplicates of the line before them. It does include the first line of a run of duplicates, but not any subsequent copies. Well, that’s close to what we want. Time to take a look at its options. Hmm. -d: only print duplicate lines. OK, we’re closer. But as I said, the format for each line is ‘hash filename’, and that means that, even sorted, each line is going to be unique. Oh, wait, -w N: compare no more than the first N characters. Bingo.

uniq -d -w64 <sha256sums.srt > sha256sums.dups

should take care of the job. Well, maybe.

Until I take a look at the output I won’t really know if it will list both (or all) files with that checksum. OK, let’s presume for the moment that it won’t. What to do?

Well, let’s use the cut command to cut out the hash, then use grep -F (fixed-string matching, what fgrep does) to find all the lines that have the resulting hashes in them. Since there are possibly 3 or 4 copies of some files (or more), let’s also run the hashes through sort -u to make sure we have a reasonably clean list.

uniq -d -w64 <sha256sums.srt | cut -c1-64 | sort -u > sha256sums.hashs
for A in $(cat sha256sums.hashs) ; do grep -F "${A}" sha256sums.txt >> sha256sums.dups ; done

OK, that should get the list of duplicate files. What I need to do now is get rid of the hashes and sort the list of files. sha256sum puts two spaces between the 64-character hash and the name, so the name starts at character 67. Cutting from there also keeps file names that contain spaces intact, and sort -u drops the names that appear twice in sha256sums.dups.

cut -c67- sha256sums.dups | sort -u > duplicatefiles.txt
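
The sort, uniq, grep and cut sequence can also be done in one pass in Python, which sidesteps both the question of what uniq -d prints and any trouble with spaces in file names. A sketch that reads the sha256sums.txt layout (64 hex characters, two spaces, then the name) and returns every group of files that share a hash; the function name is illustrative.

```python
from collections import defaultdict


def find_duplicates(lines):
    """Group 'hash  path' lines by hash; return {hash: [paths]} for repeats."""
    groups = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if len(line) < 67:
            continue  # too short to hold a hash, two spaces and a name
        digest, path = line[:64], line[66:]
        groups[digest].append(path)
    return {h: sorted(p) for h, p in groups.items() if len(p) > 1}
```

Called as find_duplicates(open('sha256sums.txt')), it returns each group sorted by path, which makes it easier to pick the copy to keep before deleting or linking the rest.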

Now at this point let’s get rid of those sha256sums files. We will also need to edit duplicatefiles.txt. What I want to do is limit the files that I delete to just those in ‘backup’ folders for devices that I am not worried about having a duplicate file from. While we are in there, it would be a good idea to look for file names with odd characters in them. Reading the list one line at a time and quoting each name, as in the loop below, copes with spaces, but a name containing a newline would still trip it up.

rm sha256sums.*
gedit duplicatefiles.txt
while IFS= read -r A ; do rm -- "${A}" ; done < duplicatefiles.txt ; rm duplicatefiles.txt

And we are ‘done.’

Well sort of. In reality we ended up deleting about 600 files. Not quite the experience in space savings I was looking for.

But it’s a start. And we have to start some place. I suppose. Now to figure out what else to get rid of….

posted by Rusty at 12:54 am  

Monday, August 17, 2009

I am a little curious…

…about where people draw the line for the phrase ‘That was a waste of time!’

By way of explanation for why I am curious, I just spent a significant part of today first working on, then ‘upgrading’ my server.

I have several ‘servers’, I suppose: my video server, my SIP server, and I suppose I should include my print server as well, but when I think of ‘my server’ I’m not really thinking of these. I’m thinking of my file, web, jabber, mail and so on server. The box has gone through a few upgrades in the past. In fact I don’t think there is a single component in it that was in the original box that bore its name. That was my first Linux box, built from an old 386SX-16 motherboard that I had pulled out of my OS/2 server and put into an old Northgate 286 case that I picked up cheap. I do remember compiling my own kernel at the time. I wasn’t interested in ‘X’ yet, but what I was interested in was setting up a dial on demand gateway.

I went through a few different modems with that box. I think the earliest one I used was a 14.4kbps internal modem. I think I tried a US Robotics 14.4 HST DS modem that I had picked up a couple of years earlier for running a BBS, but by then it was showing its age, and half the time I wasn’t getting a stable connection. Oh, there were other reasons for that sort of situation, but the important part was that I didn’t trust it.

For a few years I would try one modem, use that until it stopped working, then go get another modem. I finally found a stable modem in an ATI Supra 33.6 that is actually still working. I’ve seen this modem connect to a modem that wouldn’t accept a connection from a peer that was a speed and model duplicate of it. I’m sure that there are situations where this modem won’t work well. But it’s worked reliably now for over 10 years, while modems that were considered commercial grade, with support, don’t even come close.

A few years after I got the Supra, I ended up with a cable modem. No more need for dial on demand routing, but while I did need NAT routing, my server wasn’t going to do that. Oh, I could have done that, but I really wanted to set this box up to do some different things. Actually, it had picked up the task of being my mail server along the way, and because I was doing dial on demand routing in the past, and I didn’t want every web page query to initiate a dial command if it was pulling back stuff that I already had, I had been running a squid cache. I don’t think I had started running a web server of my own yet. But I was playing around with file serving using a fairly early edition of Samba.

About this time I was working with BeOS as my primary desktop platform. I still have that box, all set to go. It’s got an early Pentium in it, but for analog video, I think it would outperform the boxes I’m using today. I don’t know about digital video though. I think it would probably run into some throughput issues. That said, I was doing things with it in the audio realm that I have yet to see done on Linux, MacOS or Windows. I wouldn’t be too surprised if there were something similar feature-wise on MacOS and Windows, but on BeOS those tools either came with the OS or were a free download.

For that matter, it’s only been in the last three years that I’ve seen Linux handle multiple audio streams at once. BeOS did that out of the box. And from POST completion to a usable desktop was 20 seconds. Linux is getting there now, but it’s a safe bet you’ll have time to brew a cup of coffee with just about anything else.

In any case I had decided that I needed an actual firewall, and I didn’t want it running on my file server at the same time. After several years, and a few iterations with cases and motherboards, I finally gave up and bought a broadband router. I haven’t looked back there. Oh, I’ve thought a couple of times that it would be nice to have a multi-port box, with different internal and external networks in the mix. However, for the moment the boxes that I’m using are working, and I don’t really see a good reason to replace them. OK, I have one reason I might want to replace what I have, but it would need a significant amount of permanent storage, as I would like to be able to block certain traffic sources as far out in my network as possible. Oh well.

Well, over the years I added a web server, print services, Jabber, and so on. I’m not using the mail feature quite as much any more. Well, I take that back. In the past couple of days I’ve started using it a bit more, but I’m wondering if I want to do that long term. The current network infrastructure that I’m working with says that I should be able to run my own mail server for incoming and outgoing e-mail. However, since I am in a DHCP scope from my cable provider, there are many mail servers that will blackhole e-mail sent directly from here. If I can figure out how to relay my own outgoing e-mail through g-mail or something like that, perhaps, but for the moment I’m comfortable with g-mail getting all my e-mail. So as you might imagine, the box, as a server, has become pretty important to me. In fact if you are reading this, it either came directly from my blog, or was hosted there till someone decided to republish it for me. Hopefully there is attribution attached. If not, I certainly wouldn’t mind if the malicious ‘publisher’ were to find his servers mysteriously going through a DDOS from time to time. Though I’m not suggesting you should do that.

So a month or so ago I decided that it was time to start looking at some higher availability solutions. The easiest would be to set up a small box to pick up my web stuff. In reality, it might not be a bad idea to move the database for the blog, and other stuff, off of the server and onto a dedicated database server system. From a database demands perspective, I would be surprised if all of my database environment put together was using more than a couple hundred megabytes. That includes information about all the recordings I’ve made, the next 2 weeks of TV schedules, the metadata on all my audio recordings, and so on. If you throw in my e-mail, perhaps a couple of gigabytes. Yes, things get different once you add in all the music itself, and if you were to add the video itself things would be even more different. However, I’m not quite ready to consider collections of photos, music and video to be database object collections yet, even if they are at some level. But I do have a substantial amount of information in my home directory on my server: just under 200 gigabytes. And I figured it would be a good idea to get copies of the data stored elsewhere as well.

So I picked up one of my favorite little machines, a v.50, at Microcenter, upgraded the memory and hard drive, installed Ubuntu Linux Server on it, and started rsyncing from my server.

That was about a week ago. I took a look at where I was today and realized that I had gotten a bit over a quarter of the way through my collection. I wasn’t very happy about that, or about the fact that I had to reload my server several times over the past week. So I went and picked up a blank 320 gig 3.5″ drive, put it into a case I already had set up, plugged it into the server, and started the process all over again.

The server didn’t immediately crash. Not that that’s all that much of a positive comment on what the server did do. Over the next three or four hours it locked up about a dozen times. So I pulled out SpinRite, thinking that perhaps I was dealing with a corrupted hard drive, and ran it against my internal drives. Nope, they seemed to be OK. Over the next 2 hours SpinRite ran, and gave the drives a clean bill of health. Well, OK. So I grabbed a copy of the Ubuntu Server LTS install CD, popped it into the drive, and tried doing a backup from within that environment. No luck. I still ended up with lockups randomly occurring. So I knew the problem was not specifically with the operating system.

I then ran memtest86. Problems. Before it had completed a single pass I knew what the problem was: my server had bad memory. So I looked around. No matching memory. In fact the next closest memory was almost twice as fast, and the system would not boot with it in the memory slots. Well, for that matter, the system wouldn’t boot when I put the old memory back in, so either the problem with the memory module got worse, or I also damaged the motherboard. I wouldn’t put it past either option.

About a month ago I decommissioned my primary workstation and moved to a much smaller box. (Yes, another of those v.50s.) As a result I had a ready solution sitting right next to me. Well, mostly. There were a couple of differences between the boxes. First of all, the old box had a single SATA drive in it, which was hanging off of a SATA card. It also had two IDE/PATA hard drives in it along with a PATA CD-ROM drive. Now, some time ago I picked up a module that allows me to plug a SATA port on a motherboard into a PATA hard drive. I’ve got a couple of them. Nice little devices. In any case that device pretty much made everything else work together well. The new motherboard had a single PATA port on it, along with just two SATA ports. I added the PCI SATA card from the old server into the new box (and also pulled the gigabit ethernet card and included that), and I was able to plug everything into at least one usable port. Actually I had 1 internal and theoretically an external interface as well. I added another hard drive to the internal side. Long term I expect that will be my first stage backup location. In other words, back up the transient information there, then back that up to a different location without having to take the box offline for long periods of time while backing that information up.

Of course watching the stream of file names passing by, I have a very strong suspicion that I could merge a few directories and save myself nearly a quarter of the space I have allocated.

Of course once I do that I have to tidy up the backup collections as well. Oh well.

In any case, back to my original question. Obviously by now I consider the steps I’ve taken to get a stable server and backups made to be important time spent. In the next couple of months I expect to build a server pair that should have even faster processors, perhaps quad cores, more memory, and probably more storage. Hopefully I can find a way to get it into less space. Of course if I do everything ‘right’ I’ll also offload the video storage from my media center. When contemplating doing that, I think it might be a good idea to add a completely separate network dedicated to storage that is back ended through my server and one or two platforms that will store data to the network. The clients all go through those platforms, which make it look as if there is a single contiguous storage device that all the content is stored on.

The down side is figuring out how to handle ‘off-site’ backups for something like that. Oh, I suppose a couple of 2T drives for now, then a couple of 4T drives down the line, and so on. But even that seems like overkill. When it comes down to it, the video data is all stuff that if I lost all of it tonight, I could probably end up going a couple of days before I found out that I had lost anything at all. And I might not even notice it then.

So was this a waste of time? The initial issue, where bad memory was causing problems during the backup process, probably. That said, I think that the interim solution of moving the data to a ‘larger’ platform is going to be helpful. And since I’ve done another ‘upgrade’, I have a bit of an idea of where things stand as far as the next upgrade. And I can start making plans.

And before someone suggests ‘Carbonite’, remember I’ve been archiving hundreds of gigabytes of content. I get about a 128 kB/s upload speed through my ISP. A 2.2 gig video takes at least 2 hours to upload across my connection. So we’re talking about 4 days of continuous full network utilization to upload my archive for every 100 gig, or about a week to a week and a half for the entire collection of my data. OK, longer than that, because pretty close to half way through I would probably be getting a call from my ISP asking what the heck I’m doing tying up all that bandwidth for so long.
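
Taking the 128 kB/s figure at face value (128,000 bytes per second, with decimal gigabytes and no protocol overhead), the arithmetic can be checked with a few lines of Python; the function name is just for illustration.

```python
UPLOAD_BYTES_PER_S = 128 * 1000  # 128 kB/s, as quoted above


def upload_hours(gigabytes, rate=UPLOAD_BYTES_PER_S):
    """Hours of continuous, full-speed upload for `gigabytes` (10**9 bytes)."""
    return gigabytes * 10**9 / rate / 3600


for size_gb in (2.2, 100, 200):
    hours = upload_hours(size_gb)
    print("%6.1f GB: %6.1f hours (%.1f days)" % (size_gb, hours, hours / 24))
```

At that rate a 2.2 GB video is closer to five hours than two, 100 GB is about nine days, and a collection of just under 200 GB would take around two and a half weeks, which only strengthens the case against pushing it all over the wire.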

I have other reasons for not wanting to dump that much data into a Carbonite storage block, though most of them I won’t go into now. Now? I’m thinking sleep is a really good idea.

To give an answer to the ‘best use’ question. I don’t know. I didn’t have anything else planned for today. It rained through a good portion of the day, so I would not have been out on the bike. I got some reading done. But if there was someone who I could have sat and talked with, that probably would have been a much better use of my time.

Mixed.

posted by Rusty at 3:33 am  
