Data Loss Prevention
Leading Causes of Data Loss:
Natural Disasters 3%
Viruses 7%
Human Errors 32%
Software Malfunction 14%
Hardware & System Malfunction 44%
Computers are relied upon now more than ever, or more to the point, the data they contain is. In nearly every instance the system itself can be easily repaired or replaced, but the data, once lost, may not be recreatable. That's why the Data Recovery Clinic stresses the importance of regular system backups and the implementation of some preventative measures.
The chart above lists the most common reasons data recovery is needed. In all cases there are steps that you, the user, can take to minimize your risk of data loss.
1. Natural disasters
While the least likely cause of data loss, a natural disaster can have a devastating effect on the physical drive. However, Data Recovery Clinic has rescued data from fires, floods, lightning strikes and the subsequent power surges.
In instances of severe housing damage, such as scored platters from fire, water emulsion due to flood, or broken or crushed platters, the drive may become unrecoverable.
The best way to prevent data loss from a natural disaster is an off-site backup. Since it is nearly impossible to predict the arrival of such an event, there should be more than one copy of the system backup kept, one on site and one off. The type of media you back up to will depend on your system, your software, and how frequently you need to back up. Can you proceed with a day's data loss? A week's? A month's? Also be sure to check your backups to be certain that they have properly backed up. There's nothing worse than attempting to restore data from a blank medium.
2. Viruses
The number of threats grows at a rate of roughly 200-300 new trojans, exploits and viruses every month. There are approximately 56,712 "wild" or risk-posing viruses and about 105,000 total known viruses, some of which are considered non-threatening. With those numbers growing every day, you are at an ever-increasing risk of becoming infected with a virus.
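One way to check that a backup "properly backed up" is to compare checksums of the original files against the copies. The following Python sketch is only an illustration under simple assumptions: the backup is a plain directory copy, and the names `file_digest` and `verify_backup` are hypothetical, not part of any particular backup product.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """Return a list of files that are missing from, or differ in, the backup."""
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative}")
        elif file_digest(source_file) != file_digest(backup_file):
            problems.append(f"differs: {relative}")
    return problems
```

An empty list means every source file has an identical copy in the backup; a blank or truncated copy shows up as "differs".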
There are several ways to protect yourself against a viral threat:
a. Install a firewall on your system to keep hackers from accessing your data.
b. Install an anti-virus program on your system and use it regularly to scan for infection. Many viruses will lie dormant or perform many minor alterations that can cumulatively disrupt how your system works. Be sure to check for updates on a regular basis.
c. Back up, and be sure to test your backups for infection as well. There is no use in removing the virus only to restore it again from your backup.
d. Be wary of any email containing an attachment. If you don't know where it came from or what it is, then don't open it.
e. If you have contracted a "wild" virus that there is no known cure for, quarantine it to that system and contact the Data Recovery Clinic for further information and assistance.
3. Human Errors
Even in today's era of highly trained, certified, and computer-literate staff, accidents still happen. Sometimes referred to as the U.S.E.R virus, human mistakes are made daily all over the world. There is not much we can do as users to prevent the intervention of Murphy's Law, except to be cautious. Here are a few things you might want to try:
a. Be aware. It sounds simple enough to say, but it is not so easy to do. When transferring data, be sure it is going to the destination you had in mind. If asked "Would you like to replace the existing file", make sure you really want to before clicking "yes".
b. If you are even a little bit uncertain about a task you are about to carry out, make sure there is a copy of the data to restore from.
c. Take extra care when using any software that may manipulate your drive's data storage, such as partition mergers, format changes, or even disk checkers.
d. Before upgrading to a new operating system, back up your most important files or directories in case there is a problem during the install. Keep in mind that if you have a slaved data drive, it may become formatted as well.
e. Never shut the system down while programs are running. The open files will more than likely become truncated and nonfunctional.
4. Software Malfunction
Software malfunction is a necessary evil when using a computer. Even the world's top programmers cannot anticipate every error that may occur in any given program. There are still a few things you can do to lessen the risk:
a. Be sure you are using the software ONLY for its intended purpose. Misusing a program may cause it to malfunction.
b. Using pirated copies of a program may cause the software to malfunction, resulting in corruption of your data files.
c. Be sure that you have the proper amount of memory installed if you plan to run multiple programs simultaneously. If a program shuts down or freezes up you may lose or corrupt what you were working on.
d. Back up, back up, back up. A tedious task, but you will be glad you did if the software corrupts your customer database.
5. Hardware Malfunction
The most common cause of data loss, hardware malfunction or hard drive failure, is another necessary evil inherent to computing. There is usually little to no warning that your drive will fail, but some steps can be taken to minimize the need for data recovery from a hard drive failure:
a. Do not stack drives on top of each other; leave space for ventilation. An overheated drive is likely to fail. Be sure to keep the computer away from heat sources and make sure it is well ventilated.
b. Purchase a UPS (Uninterruptible Power Supply) to lessen malfunction caused by power surges.
c. NEVER open the casing on a hard drive. Even the smallest grain of dust settling on the platters in the interior of the drive can cause it to fail.
If you need hard drive recovery do one of the following:
Fill out an online data recovery quote form - a representative will get back to you within an hour of submittal.
Call 727-642-5521 (our toll-free number is at the top of every page) to speak with a representative and receive your quote over the phone. We answer our phones 24 hours a day, 7 days a week.
Fill out a data recovery request form and ship us your drive. Please follow any instructions on how to package and ship a hard drive.
Introduction
The capacity planner's role is critical for efficient backup and recovery for any Datacenter. This white paper is intended to provide a capacity planner with detailed information and guidelines for performing effective capacity planning. This paper includes two main sections:
Overview of Backup Technology
Capacity Planning
For those who are not highly familiar with current backup technology, the first section, Overview of Backup Technology, provides a useful foundation for the Capacity Planning section.
Overview of Backup Technology
The need to reliably backup and retrieve data has reached a new level of importance as companies are realizing the importance of saving and accessing large volumes of data. Today's corporate databases and on-line applications routinely manipulate hundreds of gigabytes (GB) of data, and databases one terabyte (TB) and larger are becoming increasingly common. The amount of corporate data collected electronically is growing dramatically each year. And companies are realizing the value in saving un-sifted data, for example, to glean information about market trends that can make or break their future success.
This reliance on full-time availability of data means the time to backup data is shrinking, and the demands for 100% availability of important data and for frequent backups are growing. These trends are placing enormous pressure on Information Technology organizations to increase the speed of backups while reducing the degree to which they intrude on day-to-day operations. Equally important is the need to recover files quickly and efficiently. Thus scheduled backups and rapid recoveries are activities that must be predictable, stable, reliable, and fast.
Basics of Backup and Recovery Technology
Current backup technology allows most of the backup process to be automated, with the exception of initial configuration and subsequent adjustment as storage requirements expand.
Physical and Logical Backups
There are two basic backup and recovery processes: physical and logical backups. Physical backups copy a byte-for-byte image of all of the database disk storage to a backup device. Logical backups copy all of the logical entities in the database to a backup device. Each process presents a different configuration problem. Physical backups are usually much faster than logical backups, because the source is read sequentially and the data can be retrieved at full device speed. The drawback is that the entire volume must be backed up as a single entity. Thus raw device backups are most useful when the entire device must be backed up. In contrast, a logical backup program reads the superblock to obtain the names of all the directories in the file system, and then reads logical entities such as directory entries one by one, almost always not in device order. While slower, a benefit of logical backups is their ability to inspect the last-modified date of each file and decide whether or not the file has been updated since the most recent backup.
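The last-modified check that a logical backup can perform might look like the following Python sketch. It is an illustration only: the function name `files_changed_since` is hypothetical, and the sketch assumes the time of the most recent backup is available as a POSIX timestamp.

```python
import os
from pathlib import Path


def files_changed_since(root: Path, last_backup_time: float) -> list[Path]:
    """Return files under root modified after last_backup_time (a POSIX timestamp)."""
    changed = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = Path(directory) / name
            # Logical backups can inspect each file's last-modified date.
            if path.stat().st_mtime > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

A physical backup has no equivalent of this test, which is why it must copy the entire volume each time.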
Fully-Consistent Dumps
Two backup strategies can be implemented when fully-consistent dumps are required. One is to make the file system inaccessible to modifications by unmounting it before the dump; the file system can then be remounted read-only if such access is required during the backup. The other is to lock the file system against updates while the backup is being performed. Because both strategies prevent the file system from being modified during backup, they are nearly always used off-hours. This is usually not a problem unless user batch jobs are run overnight, as they can be substantially degraded during the backup.
Full-Time Availability
Datacenters that require full-time availability of data can use software or hardware mirroring to replicate crucial data onto two or more separate disks. By itself, mirroring does not solve the real backup problem (nor do other protected storage mechanisms, such as RAID-5), because mirrored data is also susceptible to application bugs and operator or user error, and mirrored disks must also be backed up. When full-time availability is required, a number of options are available, for example, hot database backups, and the use of snapshot images--read-only copies of data for backups.
Database Backup Technology
There are three basic types of full database backups: on-line, off-line and raw device backups. On-line backups are logical backups of a database that can be simultaneously handling transactions. Off-line backups are logical backups of a database that is quiet and is not available for transactions. Raw device backups are physical backups of the raw disk devices.
On-line Backups
On-line backups are the least-intrusive strategy, and they are a popular solution for databases that must be available 24 hours per day. On-line database backups are facilitated with software such as Oracle Enterprise Backup Utility (EBU), which can provide a consistent snapshot of all database tablespaces to backup utilities such as Sun StorEdge Enterprise NetBackup software. With several parallel streams of data provided by the database, Sun StorEdge Enterprise NetBackup software utilizes the backup drives to their maximum capacity, multiplexing multiple streams onto single devices where feasible.
Because transactions must be logged during the backup process, database performance may be degraded while on-line backups are performed. One way to backup a database that must sustain high transaction rates is to mirror the database and perform a physical backup of the mirror. This requires first altering the database to begin backup, which establishes a quiescent database image. Then the mirror is detached so that a static image of the database is maintained on the detached mirror. The database is then altered to end backup, which allows logged transactions to be rolled forward into the tablespaces while a raw device backup of the mirror is done. When the backup is complete, the mirror is re-attached and the mirroring mechanism synchronizes the two disk images once again.
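The ordering of the split-mirror steps can be sketched as a small Python routine. This is a hedged outline, not a real backup tool: the five callables are hypothetical hooks standing in for whatever database and volume manager commands a site actually uses, and the error handling shown is one possible choice.

```python
from typing import Callable


def split_mirror_backup(
    begin_backup: Callable[[], None],
    detach_mirror: Callable[[], None],
    end_backup: Callable[[], None],
    dump_mirror: Callable[[], None],
    reattach_mirror: Callable[[], None],
) -> None:
    """Run the split-mirror sequence; all five callables are hypothetical hooks."""
    begin_backup()           # establish a quiescent database image
    try:
        detach_mirror()      # keep a static image on the detached mirror
    finally:
        end_backup()         # let logged transactions roll forward again
    try:
        dump_mirror()        # raw device backup of the detached mirror
    finally:
        reattach_mirror()    # resynchronize the two disk images
```

The `finally` clauses reflect an assumption that the database should leave backup mode, and the mirror should be re-attached, even if an intermediate step fails.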
Off-line Backups
For very large databases that can be taken out of use for short periods of time, off-line backups are often the choice. This approach uses a utility such as Oracle EBU to make the database unavailable for normal transactions. It synchronizes the state of all its tables and provides a consistent view of the database to Sun StorEdge Enterprise NetBackup software. Off-line backups typically outperform on-line backups because of the lack of contention for system resources, and the fact that they have no impact on transaction rates once the database is back in use again. Today, with high-performance backup solutions such as Sun StorEdge Enterprise NetBackup software, off-line backups are once again being considered viable solutions.
Raw Device Backups
Raw device backups are the simplest way to backup a database, as they directly copy the raw disk devices to tape. This requires the database to be in a quiescent state, and uses a utility such as Sun StorEdge Enterprise NetBackup software to manage the high-speed transfer of disk data onto tape. Raw device backups are fast because the database itself is not involved in the process, eliminating all but the essential overhead. They are also fast because the disk devices are read sequentially, providing data to Sun StorEdge Enterprise NetBackup software at high speeds.
Advances in Backup Technology
In the past, IT organizations have turned to mainframes as the solution to large database and high-speed backup needs. While UNIX® systems have typically delivered a 50-70 gigabyte per hour backup throughput, mainframes and their high-speed tape drives have managed throughput nearly six times faster. Several recent developments have turned the tables on this equation and have enabled sustained backup rates of more than one terabyte per hour on Sun servers--while at the same time decreasing the intrusiveness of backup operations.
Faster Throughput Rates
Tape drive technology has seen dramatic improvements in throughput rates. The familiar Sun 7 GB 8 mm tape drive provides native (uncompressed) throughput of 1 MB/second. The Sun StorEdge DLT tape 4000 tape drive almost doubles this rate by managing 1.5 MB/second, and the familiar IBM 3490E manages three times the throughput of the 7 GB drive with a rate of 3 MB/second. Newer drives that significantly change the character of database backup capabilities include the Sun 20 GB 8 mm AME tape drive with a transfer rate of 3 MB/second, the Sun StorEdge DLT tape 7000 tape drive that transfers 5 MB/second, and the Storage Technology RedWood SD-3 tape drive that, with 12 MB/second throughput, outperforms the IBM 3490E by a factor of four.
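To put these native rates in perspective, the following Python sketch converts them into backup times and drive counts. The rates are the ones quoted above; the function names, the 1 GB = 1024 MB convention, and the assumption that multiple drives scale linearly are illustrative simplifications.

```python
import math

# Native (uncompressed) rates quoted in the text, in MB/second.
DRIVE_RATES_MB_PER_S = {
    "Sun 7 GB 8 mm": 1.0,
    "Sun StorEdge DLT 4000": 1.5,
    "IBM 3490E": 3.0,
    "Sun 20 GB 8 mm AME": 3.0,
    "Sun StorEdge DLT 7000": 5.0,
    "StorageTek RedWood SD-3": 12.0,
}


def hours_to_back_up(data_gb: float, rate_mb_per_s: float, drives: int = 1) -> float:
    """Hours to stream data_gb to tape, assuming linear scaling across drives."""
    seconds = data_gb * 1024 / (rate_mb_per_s * drives)
    return seconds / 3600


def drives_for_rate(target_gb_per_hour: float, rate_mb_per_s: float) -> int:
    """Smallest number of drives whose combined native rate meets the target."""
    required_mb_per_s = target_gb_per_hour * 1024 / 3600
    return math.ceil(required_mb_per_s / rate_mb_per_s)


for name, rate in DRIVE_RATES_MB_PER_S.items():
    print(f"{name:24s} {hours_to_back_up(100, rate):6.1f} hours per 100 GB")
```

Under these assumptions, `drives_for_rate(1024, 12.0)` suggests that a sustained one terabyte per hour needs on the order of 25 SD-3 drives at native speed, before any gain from compression.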
Greater Capacities
Along with these improvements in speed have come improvements in capacity. Sun's StorEdge 20 GB 8 mm AME tape drive stores almost three times the native capacity of the 7 GB previous generation. The Sun StorEdge DLT tape 4000 and DLT tape 7000 tape drives have native capacities of 20 GB and 35 GB, respectively. With a capacity of up to 50 GB on a single data cartridge, the StorageTek SD-3 drive can hold up to 250 times more data than a standard 18-track cartridge, and 125 times more data than a 36-track cartridge. The result is smooth, high-performance backups because less tape handling is required.
Automated Backup and Recovery Management Procedures
Another important development that is changing the character of backups is the advent of management software that automates backup policies and optimally feeds data to tape devices--ensuring integrity and speeding the backup process. After all, raw tape speed and high-capacity drives are meaningless without the ability to effectively manage the transfer of data.
New Approaches to On-line Backups Using Database Technology
Recognizing the need for high-speed backups that require no down time, database vendors have developed approaches to on-line backups that enable specialized backup software such as Sun StorEdge Enterprise NetBackup software to transfer data from the database management system (DBMS) to backup devices using parallel streams of data. One example is Oracle's Enterprise Backup Utility (EBU). This utility is responsible for managing the creation of a consistent database snapshot and feeding parallel data streams to the Sun StorEdge Enterprise NetBackup software server for multiplexing onto tape devices. Whereas once this process required dumping database tables to separate ASCII files and then backing up the files, EBU now provides convenient interfaces that can be effectively utilized by third-party utilities.
All of these developments in database backup technology require processing power and I/O bandwidth in order to work in concert to speed the backup process. Sun's Ultra Enterprise servers provide scalable, symmetric multi-processing, scaling from one to 64 high-performance UltraSPARC processors, up to 64 GB of memory, and supporting up to 20 TB of disk storage. The advent of scalable I/O platforms such as these allows DBMSs to be configured with the optimal balance of processing power and I/O bandwidth--enabling on-line backups to proceed without impacting database performance.
Capacity Planning
Capacity planning is critical to the success of efficient backup and recovery for any Datacenter. Bad performance is usually the result of unrealistic expectations and poor planning. Realistic expectations and good planning must consider current and future needs. Planning must also allow for the time and skill needed to configure the Datacenter, and for training personnel to operate it and fix problems as they arise.
Capacity planning is part science and part art. The capacity planner must account for numerous variables and virtually unlimited configuration permutations. Systems are often underconfigured and the wrong products are often selected for the job. Because installation and configuration are complex, there is much room for error. Furthermore, because there are always interrelated bottlenecks, a major aspect of capacity planning is choosing the preferred bottleneck.
The main role of the capacity planner is to choose hardware and software for efficient backup and recovery in the Datacenter. To do this, the planner must first determine the following:
Volume of data the Datacenter will be managing
Availability of that data
How the data will be spread out across the network
Policies for backing up the data
The capacity planner can use this information to derive the following types of requirements:
Backup servers
Network
Storage
Backup device
Finally, the planner can determine the configuration requirements.
Understanding the Enterprise
Perhaps the single most important factor the capacity planner needs to assess is the environment to be backed up. This section presents the information the planner needs to assess the environment.
Dataset Size
The planner's first step is to determine how much data there is to backup or archive on a regular basis. Two main factors the planner needs to determine are the total size of the data and the size of the dataset that changes.
Total Dataset Size
The total size of existing data is an indication of the following:
Minimum amount of storage capacity required
Amount of data to be backed up during a full backup
Predictor of total required capacity
Total data size is often one of the easiest pieces of information to obtain, and tends to be specified as part of the requirements. In addition to obtaining the total data size, the planner must know or estimate the following factors:
Number of separate files. The total volume of data may be composed of a few large files or millions of small files. Certain types of data (e.g., databases) may not reside in files at all, but be built on top of raw volumes. In filesystem backups, there is often a small fixed overhead per file: the file record needs to be added to the backup database, the directory information must be read, and the disk needs to perform a seek to the beginning of the file.
Knowing the number of files also helps the planner determine the size of the backup index database retained by the backup software. On average, Sun StorEdge Enterprise NetBackup software suggests planning for 150 bytes in the database per file revision retained on media. That works out to over seven million file records per gigabyte of index database.
Average file size. By knowing the above two pieces of information, the capacity planner can calculate the average file size in the enterprise. If there is a large skew in file size distribution (e.g., many small files and a couple of very large files that throw off the average), the average may not be a good predictor of behavior. Therefore the planner must plan for slightly different performance when backing up small files versus large files.
Average directory depth. The directory structure into which the files are organized may also have an effect on the performance of the backup system. This is partly because long directory paths result in multiple seeks to the disk. Longer paths also result in larger records, because each filepath backed up is recorded in the database as a variable size entry. Therefore, longer paths tend to make the backup index databases grow faster.
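The index-size guideline and the warning about skewed file sizes can be checked with a few lines of Python. The 150-byte figure comes from the text; the function names, the binary gigabyte (2**30 bytes), and the sample file-size distribution are assumptions made for illustration.

```python
import statistics

BYTES_PER_FILE_REVISION = 150  # planning figure quoted in the text


def index_size_bytes(file_count: int, revisions_retained: int) -> int:
    """Estimated backup index size for file_count files kept for N revisions."""
    return file_count * revisions_retained * BYTES_PER_FILE_REVISION


def records_per_gigabyte() -> int:
    """File records that fit in one gigabyte (taken here as 2**30 bytes) of index."""
    return 2**30 // BYTES_PER_FILE_REVISION


# Hypothetical skewed dataset: 998 files of 4 KB and 2 files of 2 GB.
sizes = [4096] * 998 + [2 * 2**30] * 2
mean_size = statistics.mean(sizes)
median_size = statistics.median(sizes)
```

With these numbers, `records_per_gigabyte()` is a little over seven million, matching the guideline, and the mean file size in the hypothetical dataset is roughly a thousand times the median, which is why the average alone can mislead.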
Size of the Dataset that Changes
The size of the dataset that changes determines the volume of data that needs to be saved during incremental backups. As the number of changed files or blocks increases, the volume of data that needs to be written to tape grows. The capacity planner must know or estimate the following factors:
The frequency of the dataset change. The frequency of the dataset change determines the frequency for performing backups. The frequency with which datasets change can vary widely. For example, some directories never change, some change only when something is upgraded, some change only at the end of the month, and some, like user mailboxes, typically change on a minute-by-minute basis. In addition, the frequency of dataset change, in part, determines the volume of data written during incremental backups, because incremental backups only save the files that have changed.
Amount of data to be backed up. The planner needs to decide whether to back up all the data or only the changed portions. While it is usually faster to save only the changed portions, it is also usually faster to restore whole directories and filesystems from full backups than from incremental ones. This is because of the restore process: restores from incremental backups need to first restore from the full backup, and then from all the incremental backups, until the latest versions of all files have been restored. This multi-step process often results in numerous tape mount requests and multiple retrieves of the same piece of data. The choice of performing full backups or incremental ones tends to be a matter of which case is most important: a regularly scheduled backup or an emergency after data has been lost on disk. While the former is done much more frequently, the latter tends to be a more time-critical situation.
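The multi-step restore described above can be made concrete with a short Python sketch that lists which backups must be read to recover a given day. The `Backup` record and `restore_chain` function are hypothetical, and the sketch assumes each incremental holds the changes since the previous backup of either kind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int    # day number on which the backup was taken
    kind: str   # "full" or "incremental"


def restore_chain(backups: list[Backup], target_day: int) -> list[Backup]:
    """Backups to read, in order, to restore the data as of target_day."""
    usable = sorted((b for b in backups if b.day <= target_day), key=lambda b: b.day)
    last_full = max(
        (i for i, b in enumerate(usable) if b.kind == "full"), default=None
    )
    if last_full is None:
        raise ValueError("no full backup on or before the target day")
    # The latest full backup, then every incremental taken after it.
    return usable[last_full:]
```

With a weekly full and daily incrementals, a restore on the day before the next full reads seven backups, while a restore on the day of a full reads only one. That difference is the trade-off between fast backups and fast restores.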
Data Type
The type of data to be backed up relates mostly to the level of compression that could be expected from the backup hardware or software. There is no guarantee that the types of data to be compressed will exhibit similar properties, so it is safest to assume the data will not be previously compressed, and to compress all the data to be backed up.
Database or higher-level application data plays a special role in effective capacity planning. Unless the enterprise has relatively simple availability requirements for their data, backup will require special modules to save the data in a consistent state for restore. These modules are available for many popular database and application environments for both Sun StorEdge Enterprise NetBackup and Solstice Backup software packages.
The following are types of data the planner needs to consider. The various data types mentioned below include an example compression ratio for the DLT tape 7000 tape drive.
Text or natural language. Text or natural language tends to have a lot of redundancy, and can therefore be well compressed by both software and hardware. For example, in tests using sample English texts, the DLT tape 7000 hardware compressed the data at a ratio of approximately 1.4:1.
Databases and high-level applications. Many popular database packages and application environments have corresponding backup modules for Sun StorEdge Enterprise NetBackup and Solstice Backup software packages. For example, backup modules exist for Oracle, Informix, and Sybase database packages as well as for application environments like SAP. These modules enable backing up and restoring data in a consistent state, without taking the database off-line and making it unavailable to users.
Additionally, while databases and high-level applications tend to have widely varying contents and structure, they often contain text or numeric data with a lot of redundancy. This makes them more compressible. For example, in tests with sample databases from a TPC-C benchmark, the DLT tape 7000 hardware compressed the data at a ratio of approximately 1.6:1.
Graphics. Many applications require manipulating numerous large graphical objects. The fact that graphic files tend to be larger than text files does not imply the filesystem will consist of a few large files. This is because applications create composite objects from a myriad of smaller isolated objects.
In general, graphic objects tend to be previously compressed, making further compression in hardware or software unlikely. Indeed, hardware compression algorithms often inflate files that are already optimally compressed. For example, in tests with Motion JPEG data, the DLT tape 7000 hardware compression showed a compression ratio of approximately 0.93:1.
Combined file types. Data residing on network file servers and internet servers, the most common server types, is usually a mix of text, graphics, and binary files. Because these datasets often consist of many small files, the capacity planner must also evaluate system performance. These mixed file types compress well. For example, in tests with files from network file servers and internet servers, the DLT tape 7000 hardware had a compression ratio of approximately 1.6:1.
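The effect of these compression ratios on tape planning can be estimated with the Python sketch below. The ratios and the 35 GB native capacity of the DLT tape 7000 are taken from the text; the function names and the assumption that effective capacity scales linearly with the compression ratio are simplifications for illustration.

```python
import math

# DLT tape 7000 hardware compression ratios quoted in the text.
COMPRESSION_RATIOS = {
    "text": 1.4,
    "database": 1.6,
    "motion_jpeg": 0.93,
    "mixed_file_server": 1.6,
}
NATIVE_CAPACITY_GB = 35.0  # DLT tape 7000 native capacity


def effective_capacity_gb(data_type: str) -> float:
    """Approximate data held per cartridge, assuming linear scaling with the ratio."""
    return NATIVE_CAPACITY_GB * COMPRESSION_RATIOS[data_type]


def cartridges_needed(data_gb: float, data_type: str) -> int:
    """Cartridges required for data_gb of the given data type."""
    return math.ceil(data_gb / effective_capacity_gb(data_type))
```

For 500 GB, this estimate gives 9 cartridges for database data but 16 for Motion JPEG data, one more than the 15 that would be needed with compression switched off, which illustrates why already-compressed data is best written uncompressed.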
File Structure
Another factor the planner must consider is the structure of the files: will they be backed up using a filesystem or dumped from a raw device?
As mentioned previously, raw dumps copy all the bits from the storage volume to the backup media. This captures the bits for any filesystem or database metadata, as well as the actual application data written on that volume. However, the metadata may be out of sync with the data in the volume. This is because the metadata on the volume is not interpreted, and the volume cannot differentiate the backup from any other access. To prevent this problem, the volume is typically taken off-line to prevent updates to both data and metadata. Another solution is to mark all entities on that volume read-only for the duration of the backup.
The level of this problem varies depending on the types of filesystems and databases to be backed up. On-line filesystems maintain consistency, and do not require periods of unavailability. However, some higher-level applications may keep their data and metadata in the filesystem, and may need to be taken off-line or otherwise prevented from updating their files during the backup. Prevention from file updates during backups is required so that all the application data can be simultaneously saved and restored in a consistent state.
Another consideration in choosing between raw volume and filesystem backup is the atomicity of the data. The raw volume is treated as one large entity, while filesystems are divided into many small logical pieces. To recover even one portion of data (e.g., a file or database row) from a raw volume dump in a consistent state, the entire dataset must be restored. Restoring the entire dataset not only takes longer, but it also overwrites any changes to all the other data that had been made since the dump. In addition, incremental backups are currently impossible with raw volumes, because an update to any part of the volume compromises the integrity of the whole. In this case, the whole volume needs to be dumped again. With filesystems, only those files that changed since the last backup need be saved again.
The main advantage of raw volume dumps is the sheer efficiency of dumping raw bits without further interpretation by the system. The disk accesses tend to be large and sequential, minimizing the overhead of system calls and eliminating seeks by the disk drive arms (which are orders of magnitude slower than data transfers).
In contrast, filesystems add additional overhead. The data from file accesses is, by default, buffered in the virtual memory system, and this incurs copies in the kernel. In addition, files are read from disk in directory order and may be scattered in various areas of the disk, causing seeks to pass from one file to the next. This process may reduce the data rate from the disk volume. To perform closer to the level of raw dumps, the filesystem inefficiencies can be minimized through careful configuration and tuning. Nevertheless, there are certain situations where raw dumps are superior, if only for their sheer simplicity.
Filesystems can also offer a number of features that benefit effective backup configuration and planning. Chief among these is the ability to turn on Direct I/O. Direct I/O is a method of accessing files in the filesystem as though they were raw devices. Its main effect is to bypass the virtual memory buffering, which can result in a large saving in CPU time, memory usage, and overall wall time. (Despite the benefits of Direct I/O, seeking to various positions on the disk to reach the beginning of each file cannot be avoided.) A recent study showed that Direct I/O saved an average of approximately 20% CPU cycles, and kept the system from thrashing during extraordinarily heavy loads.
Direct I/O is available in both VxFS and UFS (starting with the Solaris 2.6 Operating Environment software). VxFS provides various mechanisms for engaging Direct I/O, including a per-I/O option. The most common method, however, is to use a mount-time option to enable this feature for the entire filesystem. UFS also allows Direct I/O to be turned on for the entire filesystem. One additional benefit of VxFS is that a filesystem can be remounted with different options without first unmounting the filesystem. This allows users to remain on-line and active, even when Direct I/O is toggled. This can be a benefit in enterprises where continuous operation is necessary.
Lastly, the VxFS filesystem provides a quick snapshot capability that can mount an additional filesystem as a read-only snapshot of the original. This is done while the original is still active and available. This feature is implemented via a copy-on-write mechanism that makes sure any blocks from the original filesystem are copied out to a special area before the block is changed on disk. A filesystem snapshot requires much less additional disk space than a snapshot made through the logical volume manager, because only blocks that change while the snapshot is active need to be duplicated.
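The copy-on-write idea behind such a snapshot can be illustrated with a toy Python model. This is not how VxFS is implemented; the `Volume` class and its methods are hypothetical and model a volume as a simple list of blocks with at most one active snapshot.

```python
class Volume:
    """Toy block volume supporting one copy-on-write snapshot at a time."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.saved = None  # block number -> original contents while a snapshot is active

    def start_snapshot(self):
        """Begin a snapshot; no data is copied at this point."""
        self.saved = {}

    def write(self, index, data):
        """Write a block, preserving the original the first time it changes."""
        if self.saved is not None and index not in self.saved:
            self.saved[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        """Read a block as it was when the snapshot started."""
        if self.saved is None:
            raise RuntimeError("no snapshot is active")
        return self.saved.get(index, self.blocks[index])
```

Only blocks written after `start_snapshot()` occupy extra space in `saved`, which mirrors the point that the snapshot area needs to hold only the blocks that changed.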
Data Origin
Knowing where the data is coming from will help the planner to plan an appropriate configuration. The configuration needed for a local backup at high speed is very different from that needed to backup hundreds of small PCs over a metropolitan area network. The considerations below explore this issue in more depth.
Is the Server Where the Data Resides the One Doing the Backups?
When the server where the data resides does the backups, the complication of configuring networks is eliminated, and the planning focus is narrowed to the disk and tape subsystems and server processing capabilities. The server needs to have sufficient tape bandwidth to meet the backup window requirements--the available time period for backing up a specified quantity of data. To ensure capacity for multiple backups of the data (e.g., daily differential, weekly cumulative, and monthly fulls), tape capacity should be configured for at least three to five times the dataset size.
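The sizing arithmetic in the paragraph above can be sketched as follows. This is a rough planning aid under stated assumptions, not a vendor formula: the function names are hypothetical, the 500 GB dataset, 8-hour window, and 5 MB/second per-drive rate are made-up example inputs, and only the three-to-five-times capacity guideline comes from the text.

```python
import math


def required_rate_mb_s(dataset_gb, window_hours):
    """Aggregate throughput needed to move the dataset within the backup window."""
    return dataset_gb * 1024 / (window_hours * 3600)


def drives_needed(dataset_gb, window_hours, drive_rate_mb_s):
    """Smallest number of drives whose combined rate meets the window."""
    return math.ceil(required_rate_mb_s(dataset_gb, window_hours) / drive_rate_mb_s)


def tape_capacity_range_gb(dataset_gb, low=3, high=5):
    """Three to five times the dataset size, per the guideline in the text."""
    return dataset_gb * low, dataset_gb * high


# Hypothetical example: 500 GB dataset, 8-hour window, drives sustaining 5 MB/second.
rate = required_rate_mb_s(500, 8)
drives = drives_needed(500, 8, 5.0)
capacity = tape_capacity_range_gb(500)
```

The estimate ignores compression, tape changes, and start-up overhead, so it is best treated as a lower bound on the equipment needed.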
Disk bandwidth should be configured to meet the backup window requirements and keep the tapes streaming. (To keep from back-hitching, the DLT tape 7000 tape drive needs to receive data at a rate no less than 3.5 MB/second.) This may be difficult to ensure, because the server and disk subsystem are often already in place and tuned to perform a specific set of tasks. In this case, to determine if the desired backup window is feasible before planning for a specific set of tape devices, it often helps to measure the sequential rate of the disk subsystem. If the backup window is feasible, but backup performance still suffers due to slow disks, the planner needs to consider reconfiguring or upgrading the disk subsystem as part of the system upgrade path.
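Once the sequential rate of the disk subsystem has been measured, a quick check along these lines can show whether the window is feasible and how many drives the disks can keep streaming. It is a minimal sketch: the function names and the 12 MB/second measured rate are assumptions for illustration, and the 3.5 MB/second threshold is the DLT 7000 figure quoted above.

```python
DLT7000_MIN_STREAM_MB_S = 3.5  # minimum feed rate quoted in the text


def max_streaming_drives(disk_rate_mb_s, min_stream_mb_s=DLT7000_MIN_STREAM_MB_S):
    """Number of drives the disks can feed without dropping below the streaming rate."""
    return int(disk_rate_mb_s // min_stream_mb_s)


def window_feasible(dataset_gb, window_hours, disk_rate_mb_s):
    """True if the measured disk rate alone can move the dataset within the window."""
    needed_mb_s = dataset_gb * 1024 / (window_hours * 3600)
    return disk_rate_mb_s >= needed_mb_s


# Hypothetical measurement: the disk subsystem sustains 12 MB/second sequentially.
streaming = max_streaming_drives(12.0)
small_ok = window_feasible(200, 6, 12.0)
large_ok = window_feasible(500, 6, 12.0)
```

With these example numbers the disks can keep three drives streaming and can finish 200 GB in six hours, but not 500 GB. That second case is where a disk reconfiguration or upgrade would have to be considered.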
Lastly, the planner needs to consider the CPU resources necessary for local backup. Fortunately, these tend to be minimal, especially if Direct I/O is used to access the filesystems. For example, with Direct I/O, a single 250 MHz CPU should be sufficient to back up at 50 MB/second from local disk to tape. If the backups will be concurrent with regular operation and the system is already fully loaded, additional CPU resources may need to be added for backup.
There are some additional factors the planner must consider. If the system has spare processing capacity, the planner must determine how much head-room exists and whether it will be sufficient to meet demands. Secondly, if the backups will be performed at off-peak hours, the planner must determine if there are any other scheduled processes to be run concurrently with the backup, and how much CPU is available for both. The planner also needs to consider sizing and tuning memory, especially if Direct I/O is not used. The main consideration in that case is the shared memory buffers used to coordinate between various backup processes, although memory is needed for essentially all system activities.
Is the Data on Remote Clients?
The planner must consider the requirements for backup of remote clients. This involves planning for the networking requirements to meet the backup window and other considerations. There is no recipe solution because of the virtually boundless varieties and configuration possibilities of enterprise networks. The planner must carefully plan for a successful network backup infrastructure, and have a good knowledge of network performance.
Even with the latest networking technologies, network bandwidth tends to lag behind the bandwidth of storage subsystems. Gigabit Ethernet is theoretically 100 times faster than 10 Mbit/second Ethernet, but at the same time, Fibre Channel Arbitrated Loop (FC-AL) offers twice the available bandwidth of Gigabit Ethernet. This discrepancy in bandwidth is unlikely to change anytime soon, because the tolerances in network connectivity tend to be much tighter than for storage. Network bandwidth issues are further complicated by the relatively high cost of upgrading the network infrastructure. While new storage devices can just be plugged in, adding network capacity may mean re-wiring parts of the enterprise. Such infrastructure tends to be very expensive and needs to be planned years in advance. Therefore, even if the upgrade is committed, there is often a period of time where the backup solution needs to work around inadequate network bandwidth.
Because of these network bandwidth issues, a frequent challenge when planning backup solutions is to find ways to satisfy backup requirements within the confines of a given network bottleneck. To understand the overall situation and to obtain a satisfactory solution, the planner needs to find the answers to the following five key questions:
How many clients are there? Knowing the number of clients helps the planner understand the overall scale of the enterprise. It also helps the planner determine aspects of backup planning such as level of multiplexing. Knowing the number of clients is also important because it ties in with the clients' location in the network in relation to the backup server.
What types of clients are there? To understand the client processing capabilities, the planner needs to know the types (i.e., the architecture and operating system) of clients that need to be backed up. For example, if a client has powerful processing capabilities but little network bandwidth to the server, software compression may be a good choice in backing up that client. Both Sun StorEdge Enterprise NetBackup and Solstice Backup software packages offer client-side modules for most platforms.
Do the clients have their own backup devices? If the clients have their own backup devices, the best configuration may be a hierarchical master-slave configuration. In this configuration, the master server initiates and tracks backups, but data goes to the local device. This configuration saves network bandwidth, and can often be significantly faster. The master-slave configuration is recommended on large clients connected to the backup server by a slow network. The backup server is often less powerful than the client it controls, and the main backup devices are attached to the slave clients.
How are the clients distributed? Knowing where in the enterprise network various clients reside helps the planner determine the available network bandwidth between the clients and server. This is necessary information for predicting backup times and data rates available from the clients to disk. Because the network bandwidth is often inadequate, a hybrid solution is most appropriate, in which both network backup of some clients and master-slave configurations are used.
How autonomous are the client systems? Sometimes the client systems are located in remote offices connected to the backup server via WAN (wide-area network) links. These systems often do not have dedicated technical support, and hence need to be managed remotely. By centralizing management, Sun StorEdge Enterprise NetBackup software helps make that task easier. However, certain tasks are necessarily manual, and involve personnel at the remote site. These people will need to be trained to carry out specific tasks associated with backup (e.g., changing tapes in stand-alone drives).
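The answers to the questions above feed directly into a backup-time estimate for each client. The following sketch shows one way to do that arithmetic. The function name, the 70% usable-bandwidth factor, and the 2:1 compression ratio are all assumptions chosen for illustration, not measured values.

```python
def backup_hours(data_gb, link_mbit_s, efficiency=0.7, compression_ratio=1.0):
    """Estimated hours to send a client's data over its link to the backup server.

    efficiency is an assumed fraction of the nominal link rate that is usable;
    compression_ratio models client-side software compression (1.0 means none).
    """
    usable_mb_s = link_mbit_s / 8 * efficiency
    return data_gb * 1024 / compression_ratio / usable_mb_s / 3600


# Hypothetical client: 20 GB of data behind a 10 Mbit/second link.
plain = backup_hours(20, 10)
compressed = backup_hours(20, 10, compression_ratio=2.0)
```

Under these assumptions the uncompressed backup takes about six and a half hours, and client-side compression at 2:1 halves that. This is the trade-off described for clients with spare processing power but little network bandwidth.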
What Does the Disk Subsystem Look Like?
It is critical to obtain the optimal disk subsystem for good backup performance with modern tape technologies. This is because the disk becomes the next most likely bottleneck, assuming the network bandwidth is sufficient or the backups are being performed locally. The performance of the disk subsystem depends on numerous factors. To plan backup solutions, the planner can use the questions below as guidelines for addressing some of the more important disk-related performance issues:
How are the data on the disks laid out? The data layout on the disk affects throughput rate, because it determines whether access to the disk is mostly sequential or random. If the access pattern requires frequent seeks between portions of the disk, the overall throughput rate of data from the disk will dramatically decrease.
There are three reasons that the access pattern may require frequent seeks. The most common one is that the data on the disk was created over a long period of time. In this case, deleted files leave gaps on scattered parts of the disk, and those gaps are subsequently filled by newer files. A seek may then occur to get the next file, because the disk is backed up in directory order. (In this case, one way to obtain mostly sequential access to the existing files--albeit not an ideal process--is to back up all the files once, recreate the filesystem on the device, and then restore all the files from tape.)
Another common cause for this access pattern is that multiple processes are accessing different regions of the disk simultaneously. This results in seeks between the various regions. This can occur, for example, if two different filesystems on the same disk are being backed up simultaneously. In this case, it may be possible to serialize the access by scheduling the backups differently.
A third reason for this pattern is that outer regions of the disk (lower numbered cylinders) tend to be faster than inner regions. Data that needs to be accessed more quickly may be laid out on the outer cylinders.
How are the disks arranged into logical volumes? The logical volume configuration significantly affects performance. To add levels of performance or reliability to the disk subsystem, most enterprise server environments will involve some level of logical volume management, using software or hardware RAID.
RAID-0 (striped) volumes tend to increase overall performance, but significantly reduce overall volume reliability. Various combinations of RAID-1 (mirroring) and RAID-0 increase performance while also increasing reliability. RAID-5 also tends to increase both performance and reliability. However, RAID-5 has performance characteristics which slightly complicate backup planning. Approximately two to three times more time should be planned for restoring data to a RAID-5 volume than it took to back it up, because RAID-5 writes (especially small random writes) take significantly longer than reads. The expected reliability of the logical volumes plays a role in determining backup frequency. The RAID volume should probably be backed up more frequently if the following are all the case: the volume has poor reliability (e.g., RAID-0), it is updated often, and it contains valuable data.
How are the disks managed? Another important consideration is the mechanism by which the individual disks are managed or configured into logical volumes. Two possible mechanisms are host-based and hardware RAID. Host-based RAID imposes slightly more overhead on the server system than hardware RAID, but tends to be more flexible. Various volume managers offer different RAID configuration options (e.g., RAID 1+0 vs. RAID 0+1). Some volume managers also offer additional features (e.g., snapshot) that are attractive for backup solutions. A large number of server clients and most workstation/PC clients do not implement logical volume management at all, and are limited to the performance and reliability characteristics of the individual component disks (i.e., JBOD).
What are the disk capabilities? The capabilities of the individual disks also affect disk subsystem performance and reliability. Newer disks tend to be faster and more reliable than older disks. This is not only because of age, but also because of rapid advances in disk technologies. Each disk tends to be capable of a certain data rate when doing sequential I/O, and a certain seek rate when doing random I/O. When the disks are managed as RAID volumes, these capabilities place limitations on the overall logical volume performance. Additionally, different disks have different MTBF (mean-time-between-failures).
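The effect of data layout on throughput, raised in the first question above, can be approximated with a simple model in which every contiguous run of data costs one seek. This is a sketch only: the function name is hypothetical, and the 10 ms seek time and 10 MB/second sequential rate are made-up example figures, not specifications of any particular disk.

```python
def effective_rate_mb_s(run_kb, seek_ms, sequential_mb_s):
    """Throughput when each contiguous run of run_kb kilobytes costs one seek."""
    run_mb = run_kb / 1024
    transfer_s = run_mb / sequential_mb_s
    total_s = seek_ms / 1000 + transfer_s
    return run_mb / total_s


# Hypothetical disk: 10 ms average seek, 10 MB/second sequential rate.
fragmented = effective_rate_mb_s(8, 10, 10)     # seek after every 8 KB
contiguous = effective_rate_mb_s(1024, 10, 10)  # seek after every 1 MB
```

With these example figures, a disk that seeks after every 8 KB delivers well under 1 MB/second, while the same disk reading 1 MB runs delivers about 9 MB/second. A gap of that size can decide whether a tape drive keeps streaming.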
Data Destination
Several key questions below provide guidelines for the planner to plan for factors related to the tape subsystem, the data's target location.
What Does the Tape Subsystem Look Like?
The tape subsystem is another critical consideration, but tends to be slightly less complex than the disk subsystem. Overall, tape devices tend to be relatively predictable and generally behave as advertised. The most difficult task associated with a high-performance tape subsystem tends to be in terms of installation and configuration rather than planning. Planning tape subsystem capabilities is often a matter of using the device specifications to amass the required storage capacity and throughput. The planner can use the following questions to consider related issues:
Where do the tape devices reside? The planner needs to determine whether the tape devices are stand-alone desktop or rack-mounted units that need to be loaded by hand, or if they are mounted in a robotic library. If they are the former, the planner needs to consider planning for the human interaction required to implement an effective backup solution.
The robotic library is a superior choice for enterprise-level backup solutions. There are many variations of tape libraries, but most commonly they offer multiple tape drives and internal storage capacities in the hundreds of gigabytes.
By knowing the required data capacities, the planner can plan for a sufficient number of libraries to house all the data and to have room to grow. It may be more reliable to purchase a number of smaller libraries than a single very large library, because most tape libraries have only a single robot mechanism.
How many tape drives are there? The planner needs to determine the number of tape drives needed to meet the throughput requirements, and to configure at least that many as part of the libraries. The planner must also remember the SCSI or FC-AL slots on the server needed to connect the tape robotic devices. If there is an existing tape subsystem, they must determine its capabilities and supplement them with new equipment, if necessary. They must also be aware of any forward or backward compatibility issues with the media, because tape formats change almost as frequently as the underlying hardware.
What are the drive capabilities? Each individual type of tape drive has its own characteristics and capabilities. These include native-mode throughput, tape capacity, effectiveness of compression, compatibility of tape formats, and recording inertia. While throughput and capacity are relatively simple, the others also need to be carefully considered.
The actual compression ratio achieved depends mostly on the type of data, but it also depends on the compression algorithm implemented by the drive hardware. For example, the DLT tape 7000 algorithm prefers to trade throughput for compaction, while the EXB-8900 Mammoth 8 mm drive prefers the opposite. Not all tape drives are capable of using older media, even if the form-factor is identical. Most can read tapes written with older formats but cannot write in the older format.
If the backup images are to be archived for a number of years, the upgrade path is also important. The drive technology will chiefly determine the recording inertia. For example, linear recording technologies like the DLT tape 7000 and STK Redwood drives tend to have a stationary read/write head and quickly moving tape. To perform well, these drives need to be fed data above a specific rate. Helical-scan technologies like 8 mm and 4 mm tapes have lower recording inertia and are thus less sensitive to data input rates, but have overall lower throughput capabilities. It is difficult to balance all these factors, but as long as some minimal requirements are met, a suboptimal choice usually has little real effect on the overall performance.
How Are the Tape Devices Distributed?
It is also important to optimally position tape devices throughout the enterprise. This mainly depends on where it is advantageous to make the extra effort and attach backup devices directly to servers where the data resides. The following questions can help the planner examine the relationship between the tape devices and data, and may help them to focus on the relevant considerations:
Are all tapes on the master server? If all tape devices reside on the master server and the bulk of the data is elsewhere, the network needs to support the transfer rates necessary to move data from the remote clients to the centralized backup server. This configuration often simplifies day-to-day management at the cost of a complex networking infrastructure. As noted previously, networks are traditional bottlenecks for backup applications, and need to be configured for optimal performance.
Are tape libraries attached to important servers? An effective backup architecture is to add tape devices to servers where large quantities of data reside, and task them with being backup slave servers, centrally managed from the master server. With this architecture, the only information that is communicated over the network between master and slaves is the file record information, about 200 bytes per file backed up. Both Sun StorEdge Enterprise NetBackup and Solstice Backup software packages support this option.
How close are the tape drives to the data? The proximity of the tape drives to the data is usually an issue of network bandwidth. This is because shorter network distances tend to be covered by higher speed network links. If the tape devices and data are separated by hundreds of kilometers, the link bandwidth is likely to be low. In contrast, if they are located in the same data center, it may be simple to configure a point-to-point link, dedicated for backups, between the two. This is mainly important when deciding where to locate the master server in a widely distributed enterprise, because the network architecture and data locations tend to be fixed. A general guideline is to locate the master server as close as possible to the bulk of the data, and hopefully close to a central location in the network topology.
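To see why attaching tape devices to the large data servers relieves the network, it helps to compare the file record traffic with the data itself. The sketch below uses the figure of about 200 bytes per file given above. The function names, the one million files, and the 50 KB average file size are hypothetical example values.

```python
FILE_RECORD_BYTES = 200  # approximate per-file record size quoted in the text


def catalog_traffic_mb(num_files):
    """Network traffic between slave and master when only file records are sent."""
    return num_files * FILE_RECORD_BYTES / (1024 * 1024)


def data_volume_mb(num_files, avg_file_kb):
    """Data that would cross the network if the tapes were on the master instead."""
    return num_files * avg_file_kb / 1024


# Hypothetical server: one million files averaging 50 KB each.
records_mb = catalog_traffic_mb(1_000_000)
data_mb = data_volume_mb(1_000_000, 50)
fraction = records_mb / data_mb
```

In this example the file records amount to roughly 190 MB against nearly 49,000 MB of data, which is well under one percent of what a fully centralized configuration would have to carry.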
Tape Environment
The operating environment influences the reliability and longevity of the tape subsystem. The planner can use the three questions below to address the main factors:
What are the temperature and humidity like? Tapes perform best in moderate temperatures and relatively low humidity. The operating temperature affects things like tape tension and strength, drive part tolerances, and temperature of internal electronic components in the drive. Humidity may affect the longevity of the magnetic coating on the tape. This is because high humidity causes the surface of the tape to become gummy. The ideal operating conditions tend to be listed as part of the media packaging. For example, the DLT CompacTape IV lists operating conditions as 10-40 degrees C, storage as 16-32 degrees C, and humidity between 20-80%. Long-term archive storage (20+ years) requires even more stringent conditions.
How often are the drive heads cleaned? Drive heads need to be cleaned periodically because they pick up deposits with continual use. This is usually accomplished by inserting a cleaning tape. Drives operating in dirty conditions (e.g., near printers) need to be cleaned more frequently, as do drives that operate outside of environmental specifications. Brand new tapes tend to have some manufacturing debris on the surface, and drives that frequently use brand new tapes should also be cleaned frequently. Both backup software and tape library hardware are capable of automatically inserting cleaning tapes after a certain number of uses.
How old are the drives and tapes? As they get older, tape drives tend to eventually wear out and encounter errors more frequently. Each tape technology has an associated MTBF (mean-time-between-failures), and media has a certain rated number of passes before it is expected to wear out. These statistics, available from the manufacturers, tend to be optimistic.
The Data's Path
One of the last considerations in the overall system is the path the data takes from the disks where it originates to the tape cartridges where it is destined. The planner can explore this factor through the following questions:
Are Data and Tape Local to Backup Servers?
If data and tape are local to backup servers, the planner should focus configuration and tuning on moving data quickly through the system between the devices. They should also focus on supporting the potentially large number of processes involved in managing the backup streams. These tend to fall into two areas: using memory effectively and providing local host/RPC capacity.
Is the filesystem buffer cache used? Backups are more efficient when avoiding the filesystem buffer cache. The buffer cache can be bypassed by either using Direct I/O to access individual files, or backing up the raw volume rather than the filesystem.
How much system memory exists? Backup relies on system memory in two capacities. Primarily, it is used for shared memory regions used to implement interprocess communication between various backup/restore processes. Memory is also used when buffering filesystem data in the virtual memory cache. If data is cached in virtual memory faster than old pages can be purged, the system may begin to thrash. More memory temporarily forestalls this condition. However, if the system is in a condition where data is cached faster than purged, it will likely thrash at some point during the course of a long backup.
The most elegant solution is to avoid the buffer cache in the first place, but if that is impossible, the planner needs to tune the memory reclaim rates to be more aggressive. In addition, to improve I/O to the swap device, they also need to stripe-swap across multiple spindles. This may eliminate thrashing, or at least reduce its impact.
What software is being used? The software used determines the overall efficiency with which data is moved from disk to tape. Both Sun StorEdge Enterprise NetBackup and Solstice Backup Power Edition software packages move data very efficiently, but Solstice Backup Network Edition software is a little less efficient. The Solaris Operating Environment software utilities such as tar and ufsdump are not particularly efficient and should not be used to implement enterprise backup solutions.
How much shared memory is available? The amount of shared memory the system can allocate is controlled in the /etc/system file. This file determines the memory used for interprocess communication (IPC) between the reader and writer processes in the system. For efficient backup and restore, a certain amount of shared memory should be configured per device and data stream.
What are the TCP tunings like? Tuning various parameters for the TCP kernel helps determine the buffer sizes used by the system, and the speed that clo
