Building a Data Archive
By Jonathan Morgan
In a series of short blog posts I’d like to look at what it really means to build a data archive that will last into the future: what forms the basis for the need, how the industry is responding to that need, and what straightforward conclusions can be drawn from what’s really going on…
Whether you look at the top 3 IDC storage predictions: IDC serves up top 10 Storage Predictions for 2008
- Storage services models for data backup, archiving and replication will be more appealing to businesses.
- New role-based storage systems will demand tighter integration between the storage layer and content-generating applications.
- Vendors will build object-based storage systems to classify data and add policies closer to the point of creation.
Or general industry movements: 2008 Cluster Storage goes mainstream
It is clear that you are not alone if you are thinking about your data archiving needs.
Solution vendors are often keener to tell you what you want than you might like (I had to laugh at this: Reach out to Customers). It’s a bit of a shouting match out there!
Amidst all the noise, with vendors shouting about this or that “selling point”, it is all too easy to make a very expensive mistake. Contracts are being signed that tie companies to ridiculous payment schemes, and solutions are being sold that make some sense this year but none at all as little as 2 years down the line.
This blog series will be as unbiased and as free of jargon and noise as I can make it (of course I’m a huge fan of our own solution, but I know it isn’t relevant to everyone). Tell me if I mess up on that!
I believe you should make today’s decisions based on a 10-year time span. Thinking 10 years ahead is a stretch, but when building an archive of re-usable data, it’s common to think about at least that period:
- When asked how long they want to archive their data for, the broadcaster EO replied: “forever”. It’s a pretty common answer; we’ve heard the same from scientists and a library as well, and why not? Is it worthwhile in your organisation to identify data and force it to be defunct? Being able to re-use data is an amazing modern way to grow a company’s intellectual heritage and underlying value. Ten years of re-usable data is a balance sheet asset in itself.
What should be taken into account in today’s data-archive purchase decisions if the archive is going to be available for a 10 year+ period?
Key Observations
- Disk $/GB has fallen by roughly 50% a year for the past 10 years+ and will continue to do so. We’ll have disks at least 8 times bigger in 3 years’ time; in 10 years we’ll have 1 Petabyte disks (whatever “disks” will mean).
- In 10 years you’ll barely recognise the software packages you currently use
- In 10 years there’ll be a significant staff turnover
- Your data is valuable and you want to re-use it, even if today you cannot fully predict all the ways it will be re-used over the years
- There’ll always be a “best of type” archive solution at any one time – but that “best of type” will continually shift over that time
Those observations form the foundation for all the arguments made throughout this series.
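As a rough sanity check on the first observation, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes the 50% fall in $/GB is an annual rate, that capacity per disk doubles each year as a result, and that the starting point is a 1 TB disk. The starting size and the function names are assumptions made for the example, not figures from any vendor roadmap.

```python
def projected_capacity_tb(start_tb: float, years: int, annual_growth: float = 2.0) -> float:
    """Disk capacity after `years`, assuming it multiplies by `annual_growth` every year."""
    return start_tb * annual_growth ** years


def projected_cost_per_gb(start_cost: float, years: int, annual_decline: float = 0.5) -> float:
    """Cost per GB after `years`, assuming it falls by `annual_decline` (a fraction) every year."""
    return start_cost * (1.0 - annual_decline) ** years


if __name__ == "__main__":
    start_tb = 1.0  # assumed size of a large disk today (hypothetical starting point)
    for years in (3, 10):
        capacity = projected_capacity_tb(start_tb, years)
        relative_cost = projected_cost_per_gb(1.0, years)
        print(f"{years:>2} years: {capacity:g} TB per disk, "
              f"$/GB at {relative_cost:.4%} of today's price")
```

Under those assumptions, three doublings give the 8 times bigger disk, and ten doublings give 1,024 TB, which is where the 1 Petabyte figure comes from.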
Conclusion: Assume Storage Technology will Continually Improve
To marry today’s requirements for a data archive to a plan that takes into account a future with 5TB, 20TB, 1PB disks… does not need to be difficult.
The primary concern is to realise that data archiving technology will not stand still; the second is to factor that into your buying decision. Ask yourself whether the solution you are considering will be able to cope with that change.

Our advice?
If going with tape – assume a fixed period for your archive on that media, and assume that the data will be moved off tape at the end of that period.
If going with a proprietary disk-based solution – be very aware that the solution may not cope well with growing node sizes, and that the company has to continue fully supporting that product line for a long time to come.
If going with a home-made/virtualisation solution, or non-proprietary solution, you are ready to take advantage of new hardware types as and when they become available, potentially, without any vendor tie-in.
If you are going with an Online or SaaS solution, make sure that you aren’t signing up to a deal that will look like a rip-off within 3 years’ time! Costs should be compared over a time frame of 5 years of data storage.
Next up, we’ll look at some of the ways to organise the data in the archive… with 10 years+ in mind.
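One way to run that 5-year comparison is sketched below. Every number in it is hypothetical: it assumes a fixed archive size, a contract priced per GB per year, and a “market” rate that halves every year in line with the $/GB observation. The function name and the prices are made up for illustration and do not describe any real vendor’s pricing.

```python
def cumulative_cost(size_gb: float, first_year_rate: float, years: int,
                    annual_decline: float) -> float:
    """Total cost of storing `size_gb` for `years`.

    `first_year_rate` is the price per GB per year in year one;
    the rate then falls by `annual_decline` (a fraction) each following year.
    """
    total = 0.0
    rate = first_year_rate
    for _ in range(years):
        total += size_gb * rate
        rate *= 1.0 - annual_decline
    return total


if __name__ == "__main__":
    archive_gb = 10_000  # hypothetical 10 TB archive
    rate = 1.0           # hypothetical price per GB per year in year one

    locked_in = cumulative_cost(archive_gb, rate, years=5, annual_decline=0.0)
    tracks_market = cumulative_cost(archive_gb, rate, years=5, annual_decline=0.5)

    print(f"Fixed-rate contract over 5 years:  {locked_in:,.0f}")
    print(f"Rate that tracks the market:       {tracks_market:,.0f}")
    print(f"Premium paid for being locked in:  {locked_in / tracks_market:.1f}x")
```

The two deals cost the same in year one; the gap only shows up when the whole period is added together, which is why a single-year quote is a poor basis for the decision.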
We’ll introduce the various approaches to archiving data that are available today, and give our view about how data should and shouldn’t be organised within that archive for re-use throughout the years to come.
- Should data be organised at all?
- What’s the lifespan of a user group?
- Do data storage policies apply to you?
Basic overview of storage clustering: Storage Clustering
About this entry
You’re currently reading “Building a Data Archive,” an entry on MatrixStore
- Published: 11.02.08 / 5am
- Category: Archives, Technology

