In this and the following posts I will run through several crash and recovery scenarios and show how to recover the cluster successfully.
At the moment the following tests are planned and will be published over the next few days:
- suddenly turning off the power and restarting the node
- terminating the private network connection between the cluster nodes
- recovering an ACFS file system which will not mount automatically
(I was not able to reproduce this more than once; sorry, guys)
- overwriting the ASM disk header with the disk group being offline
- corrupting an online and active ASM disk by writing chunks of random data to random positions on the disk
- simulating disk errors by removing the device from the operating system
- corrupting the OCR
- corrupting the Voting Disk
The environment used for these posts is explained in detail here.
Useful scripts can be found here.