I’m far from new to migrating Oracle databases to new servers, I’ve had to do it several times over the past decade and a half. Many of these migrations have been done with minimal downtime using technologies such as: Oracle Standby Database; Quest SharePlex for Oracle and basic restores of hot and cold backups using both RMAN and OS commands and between homogeneous and heterogeneous server architectures.
So, why has it taken me a while to get the method for this latest one sorted out?
The additional complications this time:
- The new servers are 50 miles away in a 3rd party data centre.
- The database files and FRA (Fast Recovery Area) are stored in ASM (Automatic Storage Management).
- There are two 600GB databases to migrate in parallel.
- Oracle Streams is running between the two databases (single source, single target thank goodness) .
- The total outage I have is around 9 hours.
So the basic plan is this:
Taking as read the existence of the the new Linux cluster (RHEL 5.9) complete with ASM installed and configured and Oracle homes created for the RDBMS instances.
- Take suitable RMAN backups – compressed of course to minimise network traffic.
- Shoot the backups down the line to the new servers.
- Restore the databases.
- Periodically copy over incremental backups and/or backups of archived redo logs.
- Carry out PITR (point in time recovery) to (almost – explanation later) the same time on both databases.
- Open the databases. Sadly this will have to be RESETLOGS which does have an impact on Streams replication.
- Reestablish Streams replication.
- Modify the tnsnames.ora files and configuration files with connection string details for the various applications and interfaces that touch these databases.
- Final testing and handover.
Before I started practicing I was totally OK with all of that plan apart from two things.
- I had no idea how to transfer files from one ASM instance to another and didn’t even know that it was possible, but I needed to avoid creating additional file systems if possible and I needed to find a way of transferring data across the network – driving down the motorway backup media in hand not being an option.
- Knowing that opening a database RESETLOGS breaks the Streams replication, I needed to find out if it was reasonably possible to repair it. (Reinstantiation from scratch would take about 24 hours and again this had to be avoided if possible).
To answer the first point, I came across the following two invaluable blog entries from John Jeffries and Talip Hakan Ozturk. Obviously I had to adapt what they contain and I created a few basic tools to help create the scripts to minimise typing errors when creating the transfer scripts.
http://johnpjeffries.wordpress.com/2009/12/07/using-rman-to-restore-a-database-to-another-server-in-an-asm-environment/
http://taliphakanozturken.wordpress.com/2012/06/23/how-to-copy-backup-files-from-an-asm-environment-to-another-asm/
The initial transfer of the full backups across the network take between 7 and 9 hours per database depending on network traffic and subsequent RMAN restores between 2 and 3 hours each depending on the number of channels.
The backups of a day’s archived redo logs transfer in less than an hour so this is more than acceptable. So the full backups will be copied over sometime during the week before cutover night and periodic transfers of archived redo log backups transferred to roll the databases forward. Fabulous!
Now to the problem about getting Streams working again after PITR and opening RESETLOGS.
I found the following helpful Oracle Documentation: http://docs.oracle.com/cd/E11882_01/server.112/e10705/man_gen_rep.htm#i1014311
John Jeffries (again), also has some good stuff in his blog, but his entries referred specifically to downstream capture, whilst we are using Combined Capture and Apply with capture at source.
Here is what I found works best in our situation:
- On the databases in our current data-centre
- Stop all user activity on the source system (apart from the streams heartbeat process where a scheduler job inserts into a ‘heartbeat’ table every 15 seconds).
- Disable the capture process in the source database.
- ALTER SYSTEM ARCHIVE LOG CURRENT in both source and target
- Enable the capture process in the source database
- ALTER SYSTEM ARCHIVE LOG CURRENT in both source and target
- Take RMAN backups of the the resulting archived redo logs.
- On the databases in the new data-centre
- We are trying to engineer the situation where the source database is slightly older than the target, so:
- Perform PITR on the new source database and recover to a point when the capture process is disabled.
- Perform PITR on the new target database and recover to a point when the capture process had been reenabled (about a minute newer than the source).
- We are trying to engineer the situation where the source database is slightly older than the target, so:
Referring to the Oracle documentation in the link above, in the section:
Performing Point-in-Time Recovery on the Source in a Single-Source Environment, having the target’s highest applied SCN (determined in step 8) greater than the FIRST_SCN (returned by the source dictionary build in step 6) made it possible to omit step 9 (I had no missing transactions I needed to apply) and step 10 – with its 12 sub-steps which frankly I had no success with (the instructions are a little vague after all).
So moving on to step 11: I didn’t like the sound of dropping the existing capture queue one little bit, but dropping the queue and recreating it as per step 12 went without a hitch – I just wish there had been an example showing the syntax.
I could completely ignore that the target database had been through PITR, from the perspective of this scenario it was irrelevant. The abridged steps I needed to perform were:
- Step 1 – Carry PITR recovery as per documentation and note RESETLOGS SCN ( 53708867168)
- Step 2 – Make database is open in restricted mode
- Step 3 -Run following on source to get names of capture process and rulesets as the capture queue will need to be recreated in a step below
COLUMN CAPTURE_NAME HEADING 'Capture|Process|Name' FORMAT A15 COLUMN RULE_SET_OWNER HEADING 'Positive|Rule Owner' FORMAT A15 COLUMN RULE_SET_NAME HEADING 'Positive|Rule Set' FORMAT A15 COLUMN NEGATIVE_RULE_SET_OWNER HEADING 'Negative|Rule Owner' FORMAT A15 COLUMN NEGATIVE_RULE_SET_NAME HEADING 'Negative|Rule Set' FORMAT A15 SELECT CAPTURE_NAME, RULE_SET_OWNER, RULE_SET_NAME, NEGATIVE_RULE_SET_OWNER, NEGATIVE_RULE_SET_NAME FROM DBA_CAPTURE; Capture Process Positive Positive Negative Negative Name Rule Owner Rule Set Rule Owner Rule Set --------------- --------------- --------------- --------------- --------------- JOHNJI$CAP STRMADMIN RULESET$_5140 STRMADMIN RULESET$_6399
- Step 4 – Skip.
- Step 5 – Capture queue should already be stopped, but check anyway.
- Step 6 – At the source database, perform a data dictionary build:
SET SERVEROUTPUT ON DECLARE scn NUMBER; BEGIN DBMS_CAPTURE_ADM.BUILD( first_scn => scn); DBMS_OUTPUT.PUT_LINE('First SCN Value = ' || scn); END; / First SCN Value = 53710388735
- Step 7 – Make sure the APPLY queues on the target are IDLE.
- Step 8 – Determine the highest SCN for a transaction that was applied at target and verify that it is higher than both the RESETLOGS SCN (Step 1) and the First SCN (Step 6).
SELECT APPLY_NAME, HWM_MESSAGE_NUMBER FROM V$STREAMS_APPLY_COORDINATOR; ApplyProcess Name HWM_MESSAGE_NUMBER -------------------- ------------------- APPLY$_JOHNJI_3956 53710391425
- Step 9 – Disable restricted session on the source.
- Step 10 – Skip.
- Step 11 – Drop the existing capture queue – Yes honestly this is safe.
BEGIN DBMS_CAPTURE_ADM.DROP_CAPTURE('johnji$cap'); END; /
- Step 12 – Create new capture queue setting FIRST_SCN and START_SCN to number returned in Step 6.
BEGIN DBMS_CAPTURE_ADM.CREATE_CAPTURE( QUEUE_NAME => 'strmadmin.johnji$capq', CAPTURE_NAME => 'johnji$cap', RULE_SET_NAME => 'ruleset$_5140', NEGATIVE_RULE_SET_NAME =>’ruleset$_6399’, START_SCN => 53710388735, FIRST_SCN => 53710388735 ); END; /
Step 13 – Start the new CAPTURE process and check that changes are being applied in target.
Hope that helps someone else!
Johnji