09 October, 2012 by Tom Collins
Taking a DB2 backup is straightforward, but in a large DB2 environment where hundreds of mission-critical DB2 databases are backed up daily, what steps are in place to deal with failed backups?
A well-managed backup strategy is one of the DBA's secrets to stress reduction.
Backups can fail for different reasons: software failure, corruption, hardware issues, unavailable tapes and connectivity issues are just a few. From an operational perspective, how do you deal with this issue?
On a basic level: 1) attempt a backup; 2) if it succeeds, OK; if it fails, what next?
Normally, backups run overnight. In most organisations, Operations staff manage the scheduling and follow up on DB2 backup failures. Dealing with a failure depends on its cause. If it's a missing tape, Operations staff can replace it; if it's a database corruption issue, escalate to the DBA.
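As a rough illustration of that split, here is a minimal Python sketch of a routing table. The cause labels and the `route_failure` name are hypothetical, and sending every unrecognised cause to the DBA is my assumption rather than a rule; a real site would derive the causes from its own backup tool's logs.

```python
from enum import Enum


class Owner(Enum):
    OPERATIONS = "operations"
    DBA = "dba"


# Hypothetical cause labels. Only the two cases named in the text are mapped.
ROUTING = {
    "tape_unavailable": Owner.OPERATIONS,  # Operations can replace the tape
    "corruption": Owner.DBA,               # needs a subject matter expert
}


def route_failure(cause: str) -> Owner:
    """Return who should act on a failed backup.

    Assumption: anything not explicitly listed is escalated to the DBA.
    """
    return ROUTING.get(cause, Owner.DBA)
```

The point of a table like this is that Operations staff do not have to decide case by case; the decision is agreed once and then looked up.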
One of the issues for Operations staff is developing a standardised approach to backups. They might be managing a wide range of systems, such as DB2, SQL Server and flat files. How can you manage the risk but keep Operations staff effective?
Most large, scalable backup systems have a scheduling element. Backups are scheduled according to SLAs, RPO and RTO. What you don't want is Operations taking a cowboy approach and logging on to the DB2 server at any time of day, potentially creating serious contention and possible downtime. At the same time, Operations staff have to deal with backup failures.
I prefer a rescheduling approach. In other words, don't log on to the system and issue another backup.
First, assess the reason: look at the log files and report to the subject matter experts.
Second, agree on a scheduled time. It might be immediate or later on.
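The two steps above can be sketched as a small reschedule request. This is only an illustration: `RescheduleRequest` and `build_reschedule` are hypothetical names, and the request would still have to be handed to whatever scheduler the site actually uses.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RescheduleRequest:
    database: str
    failure_reason: str  # outcome of step one: the log review
    run_at: datetime     # outcome of step two: the agreed time


def build_reschedule(
    database: str,
    failure_reason: str,
    now: datetime,
    agreed_time: Optional[datetime] = None,
) -> RescheduleRequest:
    """Build a reschedule request instead of rerunning the backup by hand."""
    if not failure_reason.strip():
        # Step one has not happened yet, so refuse to reschedule blindly.
        raise ValueError("assess the failure reason before rescheduling")
    # No agreed time means "immediate", but still through the scheduler.
    run_at = agreed_time if agreed_time is not None else now
    if run_at < now:
        raise ValueError("agreed time is in the past")
    return RescheduleRequest(database, failure_reason, run_at)
```

Even the "immediate" case goes through the same path, so the rerun is recorded in the same place as every other backup.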
One question that arises: should Operations staff log directly on to the server and run the backup, bypassing the reschedule?
The questions to consider are:
1) Scheduling offers a standardised approach across multiple platforms, such as SQL Server and DB2 environments, i.e. SQL Server backup failures are rescheduled too. Would this be lost?
2) Are there other logging/monitoring benefits that come with scheduling?
For example, if Operations logged on and issued the local backup script, is there a standardised way they would assess success/failure?
3) Scheduling gives you more flexibility in planning, for example for out-of-business hours. If a backup failure is discovered at 9am Monday to Friday, is it suitable for someone to log on as root and issue the backup command?
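On the second question, one way to get a standardised success/failure check is to wrap every backup script in the same small runner. The Python sketch below is a minimal example. `BackupResult` and `run_and_assess` are hypothetical names, and it assumes the wrapped script exits with zero on success and non-zero on failure, which needs to be verified for each script before relying on it.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class BackupResult:
    succeeded: bool
    return_code: int
    output: str  # combined stdout and stderr, for the follow-up log review


def run_and_assess(command: List[str]) -> BackupResult:
    """Run a backup command and reduce its outcome to one standard record.

    Assumption: exit code 0 means success for the wrapped script.
    """
    completed = subprocess.run(command, capture_output=True, text=True)
    return BackupResult(
        succeeded=(completed.returncode == 0),
        return_code=completed.returncode,
        output=(completed.stdout + completed.stderr).strip(),
    )


if __name__ == "__main__":
    # Stand-in command; a real call would pass the site's own backup script.
    print(run_and_assess([sys.executable, "-c", "print('backup finished')"]))
```

The same record could then be produced for DB2, SQL Server or flat-file backups, so whoever follows up reads one format regardless of platform.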