Data is a real challenge facing most companies and organisations. This data has to be stored, copied, backed up, archived and deleted. Storing and managing it is only half the story though; it also has to be cleansed, analysed and put to good use. Data is the 21st century's oil. My blog posts will present my view on it. Please take it as it is: a point of view.

Wednesday 18 July 2012

Oracle Data Migration


I’d like to share with you my adventures in migrating a big database. This should give you some insight into the different Oracle migration methodologies, the issues facing the DBA and the business, and hopefully some answers.




Context
A good story is nothing without a context and a scene description. I could do it the Lord of the Rings’ way and spend 200 pages describing the customer’s workplace, and there is a lot to be said (a guy tried to scalp me with a hatchet, for instance), but this is a technical blog, and I’ll try to keep it that way.
So, the context: a French company had just acquired a branch of a company some hundreds of miles away, in a different country, Germany. The German company used a large warehouse database daily for its operations. The database comprised over 7TB of critical data. As part of the acquisition, part of this database (6TB) was to be copied over to a data centre in France, with a downtime window of four days (including bank holidays).
Every word counts here, believe me.

Kick off meeting
I became aware of this acquisition when the customer nicely asked me to take care of it, mentioning that a special network link had been created between the two sites.
“You don’t have access to the source database though, it’s used in production by the German company, and as it’s a partial acquisition, we’ll be allowed to keep just part of the data anyway.”
Yep, I thought too that the statement of work was on the short side.

Dataguard
Confronted with a large database, a short migration window and a huge distance, my first reaction was to use Dataguard.
If you’re not familiar with Dataguard, it’s Oracle’s replication technology: it maintains a perfect copy of the source database at another site, based on log shipping.
Every transaction executed on the primary site is captured in files called archivelogs, copied over to the secondary site and replayed against the standby database.
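As a very rough sketch (the service name, DB_UNIQUE_NAME and values below are entirely made up), the log shipping side boils down to pointing an archive destination on the primary at the standby site, and starting redo apply on the other end:

    -- On the primary: ship redo to the (hypothetical) French standby
    ALTER SYSTEM SET LOG_ARCHIVE_DEST_2 =
      'SERVICE=whfr ASYNC VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=whfr';
    ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2 = ENABLE;

    -- On the standby: apply the incoming archivelogs as they arrive
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;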


This product is mainly used for the following tasks:
  •  To create a standby database at a DR site.
  •  To reduce the overhead of backups on the primary database, as Oracle backups can be taken at the standby database.
  •  For planned maintenance. By switching over to the standby, the roles are reversed, and the maintenance operations on the primary site can be performed with minimal service outage.
  •  For reporting. The Active Dataguard option allows the standby database to be opened in read-only mode for reporting while archivelogs are still being applied.
  •  Migrations.

This last point is the one of interest. By creating a standby database, I would have been able to get an almost perfect copy of the primary, the difference being the few minutes’ or hours’ worth of data that still needs to be copied over and applied, due to the inherent latency of a log shipping process.
The day of the final migration, I would have been able to do the following:

  1. Ask my German counterparts to shut down the application and the database, then open it in read-only mode to ensure no further transactions are applied to production.
  2. Let the latest archivelogs be copied over and applied to the server in France.
  3. Open the French database.
  4. That’s it!

Total downtime: less than 1/2 hour.
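In terms of commands, and assuming a physical standby, the cutover above would have looked something like this (a sketch only, not the exact commands):

    -- On the German primary, once the application is stopped:
    ALTER SYSTEM ARCHIVE LOG CURRENT;          -- force out the last redo

    -- On the French standby, once the final archivelogs are applied:
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
    ALTER DATABASE ACTIVATE STANDBY DATABASE;  -- it is now the new primary
    ALTER DATABASE OPEN;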

I had successfully used a similar approach on a previous engagement, and felt pretty sure I had found the perfect solution in less than 5 minutes. Being on the paranoid side, however, I decided to test it anyway, using exactly the same configuration and versions as the production servers.

Well, even the paranoid have enemies. Mine was called endian.

The German database was sitting on an HP-UX server. The destination server in France was going to run Linux. As it happens, these two platforms use different endianness.
Endianness is the byte order in which multibyte numbers are stored: a big-endian machine stores the most significant byte first (HP-UX on PA-RISC), while a little-endian machine stores the least significant byte first (Linux on x86).
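One way to check this from within Oracle is to query V$TRANSPORTABLE_PLATFORM, which lists the endian format of every supported platform, and to join it with V$DATABASE for the current one:

    SELECT platform_name, endian_format
    FROM   v$transportable_platform
    ORDER  BY platform_name;

    -- And for the database you are connected to:
    SELECT d.platform_name, tp.endian_format
    FROM   v$database d
    JOIN   v$transportable_platform tp ON tp.platform_name = d.platform_name;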
Dataguard is NOT a supported method of log shipping between servers of different endianness. End of story.

Goldengate

Frustrated at not being able to show the customer how smart I was, I decided that there HAD TO BE a way to synchronise two databases between two sites, even with different endianness. It would have been a huge oversight from Oracle if there wasn’t.

The solution I found is called GoldenGate, a recent Oracle acquisition that offered them the perfect replication tool. It’s even better than Dataguard in some instances.

Dataguard is a database replication tool, database as in the whole database (logical standbys aside). GoldenGate, on the other hand, is much more granular: it allows bi-directional replication of particular tables, and has no issue at all with endianness. It is even possible to replicate between heterogeneous databases, e.g. from SQL Server to Oracle to DB2 to PostgreSQL.
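To give a flavour of what that looks like, here is a minimal, entirely hypothetical pair of classic GoldenGate parameter files: an Extract capturing changes for a couple of warehouse tables on the source, and a Replicat applying them on the target. The user, trail and table names are made up:

    -- Extract parameter file (source side)
    EXTRACT extwh
    USERID ggadmin, PASSWORD ********
    EXTTRAIL ./dirdat/wh
    TABLE warehouse.orders;
    TABLE warehouse.stock;

    -- Replicat parameter file (target side)
    REPLICAT repwh
    USERID ggadmin, PASSWORD ********
    ASSUMETARGETDEFS
    MAP warehouse.orders, TARGET warehouse.orders;
    MAP warehouse.stock, TARGET warehouse.stock;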



It WAS the best solution. However, it wasn’t picked as THE solution by the business for a couple of reasons:
  •  The source database AND server belonged to the German part of the bank that hadn’t been bought, and they had no interest at all in installing software they hadn’t tested.
  •  The cost. GoldenGate needs to be licensed for both the source AND destination databases, and being an Oracle product, it’s far from cheap.


Export / Import
I thought then that it was time to go back to basics. Export/Import, or Data Pump as it has been called since version 10g, is the main tool used by generations of DBAs to migrate their data, as it enables very fast bulk data and metadata movement between Oracle databases.
The principle is quite easy to grasp. All data is extracted from the database into a dump file. This dump then needs to be copied to the other side and inserted. Simple.
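A minimal sketch of what that looks like with Data Pump (the directory, schema and dump file names below are hypothetical):

    # On the German side: export the schemas to dump files
    expdp system DIRECTORY=dump_dir DUMPFILE=wh_%U.dmp LOGFILE=wh_exp.log \
          SCHEMAS=warehouse PARALLEL=8

    # ... copy the dump files to the French data centre ...

    # On the French side: import them into the new database
    impdp system DIRECTORY=dump_dir DUMPFILE=wh_%U.dmp LOGFILE=wh_imp.log \
          SCHEMAS=warehouse PARALLEL=8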
Every solution needs to be tested. The results were disappointing:
    Export time      0.5 day
    Copy time        3 days (estimated)
    Import time      1.5 days
    TOTAL            5 days

Way more than the allocated 4 days, especially as the business wanted some spare time to check the application.

IP over Avian-Carrier
At this point, the business project manager became a bit worried, to say the least. The customer’s workplace atmosphere being what it is, I sent him an e-mail detailing a solution to speed up the network link using an experimental method for the encapsulation of IP datagrams in avian carriers.
As specified in RFC 1149, a Request for Comments (RFC) issued by the Internet Engineering Task Force (IETF):
Avian carriers can provide high delay, low throughput, and low altitude service.  The connection topology is limited to a single point-to-point path for each carrier, used with standard carriers, but many carriers can be used without significant interference with each other, outside of early spring.  This is because of the 3D ether space available to the carriers, in contrast to the 1D ether used by IEEE802.3.  The carriers have an intrinsic collision avoidance system, which increases availability.  Unlike some network technologies, such as packet radio, communication is not limited to line-of-sight distance.  Connection oriented service is available in some cities, usually based upon a central hub topology.
 The IP datagram is printed, on a small scroll of paper, in hexadecimal, with each octet separated by whitestuff and blackstuff. The scroll of paper is wrapped around one leg of the avian carrier. A band of duct tape is used to secure the datagram's edges. The bandwidth is limited to the leg length.  The MTU is variable, and paradoxically, generally increases with increased carrier age.  A typical MTU is 256 milligrams.  Some datagram padding may be needed.


Anyway, he didn’t take the time to read my e-mail carefully, and was about to send it to the rest of the world when I caught him on the phone, 5 seconds before he put an end to his career and mine!
Lesson learned: hope can be a dangerous thing.

Transportable tablespaces
Back to serious stuff. I still had a 6TB database to migrate, and I had one last option to test - Transportable Tablespaces.
A tablespace is a logical group of data files in a database. A database typically contains at least one tablespace, and usually two or more. Within the database, the tablespace plays a role similar to that of a folder on the hard drive of a computer.
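For instance, a quick way to see the tablespaces of a database and their sizes is to query DBA_DATA_FILES:

    -- List each tablespace, its total size and its number of datafiles
    SELECT tablespace_name,
           ROUND(SUM(bytes) / 1024 / 1024 / 1024, 1) AS size_gb,
           COUNT(*)                                  AS datafiles
    FROM   dba_data_files
    GROUP  BY tablespace_name;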
The transportable tablespace feature offers the possibility to copy the files directly from one database to the other, without the need to export the data to an intermediary file. A conversion is required, however, when the source and target databases have different endian formats.
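In outline, and with entirely made-up tablespace, directory and file names, the cross-endian procedure looks something like this:

    -- Step 1: on the source, make the tablespace read only
    SQL> ALTER TABLESPACE wh_data READ ONLY;

    -- Step 2: export its metadata only (no rows) with Data Pump
    $ expdp system DIRECTORY=dump_dir DUMPFILE=tts.dmp TRANSPORT_TABLESPACES=wh_data

    -- Step 3: copy the datafiles and tts.dmp over, then convert the endianness with RMAN
    RMAN> CONVERT DATAFILE '/stage/wh_data01.dbf'
          FROM PLATFORM 'HP-UX (64-bit)'
          FORMAT '/u02/oradata/WHFR/%N_%f.dbf';

    -- Step 4: plug the tablespace into the French database
    $ impdp system DIRECTORY=dump_dir DUMPFILE=tts.dmp TRANSPORT_DATAFILES='/u02/oradata/WHFR/wh_data01.dbf'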
    Export time                None
    Copy over network time     4 days (estimated)
    Convert and attach time    1 day
    TOTAL                      5 days

Almost fast enough is not good enough. Copying the full tablespaces instead of just the extracted data increased the size of the data to be copied, and hence the time required.


Back to IP over Avian Carrier
The main issue was the network link. It was just too slow, whatever the solution. So, let’s concentrate on reducing that time.
That’s when I got back to my great idea of using pigeons. It started as a joke, but ended up being the perfect solution, with a small modification: instead of pigeons, some physical disks in the trunk of a car would do just fine, thank you.
    Export time + copy to disk      1 day
    Transfer time                   0.5 day
    Copy from disk + import time    2 days
    TOTAL                           3.5 days

The business project manager was happy at first, until he was told that critical data was definitely NOT going to travel in the trunk of a car. A professional and secure transporter was required. It got REALLY interesting when he had to explain to the French authorities why an armed vehicle was to cross the border from Germany.


The end
It was all right in the end. The database was successfully migrated, and in time for the application tests.
This story reminds me of something I heard about the space race during the Cold War. One of the issues facing the astronauts was pens: they didn’t work in zero gravity. How can you conduct any scientific process if you can’t even write the results down?
The Americans invested $10M and 2 years in finding a way for a pen to work in zero gravity. The Russians took pencils up there.
Sometimes, we need to go back to the basics and think outside IT.
