Three steps to troubleshooting a storage area network (SAN)

Although the storage area network (SAN) is considered a stable and reliable technology, faults are inevitable. Given the complexity of a SAN, diagnosing and fixing SAN-related problems is often seen as a daunting task. In this article I offer some methods that I hope will make that task easier.

When you investigate a problem on a SAN, you may find that most problems are not caused by the SAN itself. Let me explain why.

First, suppose you have a standalone PC with a SCSI hard drive, and one day you find that you cannot read the data on the disk. There are many possible causes. The disk drive itself may have failed, but the data cable may also be faulty, or the disk controller may be broken. It is also possible that the data on the disk has been erased, or that the partition has been deleted or damaged. My point is that being unable to access the data on a disk does not prove that the disk itself is at fault, because there are many other possible causes.

Now consider a similar situation on a SAN. A SAN is essentially a way to connect servers to disk arrays or other storage devices. It works by letting the server communicate with the storage devices using SCSI commands.

Suppose the server suddenly cannot read data through the SAN. The SAN itself may be at fault, but the cause may also lie outside the SAN, for example in errors in the data itself. Connectivity problems between the server and the storage unit, deleted data, corrupted data, or storage that has been detached from the server can all produce the same symptom. In such cases, treat the system as if the storage device were attached directly to the server, and troubleshoot it on that basis.

But what should you do if the problem really does come from the SAN itself? The best strategy is to start checking at the center of the SAN and work outward to the edge.

Step 1: start at the Fibre Channel level. The Fibre Channel switch sits at the center of the SAN, and it is the device that provides connectivity between the servers and the storage devices.

First confirm that the central switch is physically connected to the server and to each storage device. If the link indicators on every port are lit and the physical connections look sound, a physical fault in the fiber equipment is unlikely. When checking the Fibre Channel fabric, look for unstable connections, missing devices, incorrect zone configurations, and incorrect switch configurations.
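
One of the configuration checks above, zoning, can be illustrated with a minimal sketch. It assumes a simplified model in which two ports can communicate only if they are members of at least one common zone. The function name `share_a_zone`, the zone names, and the port identifiers are all hypothetical and are not taken from any switch vendor's tooling.

```python
from typing import Dict, Set


def share_a_zone(zones: Dict[str, Set[str]], port_a: str, port_b: str) -> bool:
    """Return True if both ports are members of at least one common zone."""
    return any(port_a in members and port_b in members
               for members in zones.values())


# Hypothetical zone set; the zone names and port identifiers are made up.
zones = {
    "zone_db": {"server1_hba0", "array1_port0"},
    "zone_backup": {"server2_hba0", "tape1_port0"},
}

# server1 and the array share zone_db, so they are allowed to communicate.
print(share_a_zone(zones, "server1_hba0", "array1_port0"))
# server2 is not zoned with the array, so it cannot reach it.
print(share_a_zone(zones, "server2_hba0", "array1_port0"))
```

If a server that should see a storage port does not share a zone with it in a model like this, the zone configuration is a likely suspect even when every cable is intact.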

Step 2: use diagnostic tools to test the switch's communication. This step shows whether the storage devices on the list of authorized devices are communicating normally with the switch. If they are not, you know which part of the SAN the problem is in.

The physical connection between the switch and the storage device may have failed, or the storage software may be misconfigured. If the switch can communicate with the storage device but not with the server, the problem lies in the connection between the switch and the server. This is why I recommend starting from the center of the SAN: with just a few simple tests, you can rule out half of the SAN (either the server side or the storage side) as the source of the problem.
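
The center-outward reasoning in this step can be written down as a small decision function. This is only a sketch under stated assumptions: the two boolean inputs stand for the results of whatever diagnostic tests you run from the switch, and the `Suspect` categories and the name `localize_fault` are illustrative, not part of any real tool.

```python
from enum import Enum


class Suspect(Enum):
    STORAGE_SIDE = "switch-to-storage connection or storage configuration"
    SERVER_SIDE = "switch-to-server connection, adapter, or driver"
    SWITCH_OR_BOTH = "the switch itself, or faults on both sides"
    NOT_FABRIC = "fabric connectivity looks fine; check data and host configuration"


def localize_fault(switch_sees_storage: bool, switch_sees_server: bool) -> Suspect:
    """Narrow a SAN fault to one side of the switch from two connectivity tests."""
    if not switch_sees_storage and not switch_sees_server:
        return Suspect.SWITCH_OR_BOTH
    if not switch_sees_storage:
        return Suspect.STORAGE_SIDE
    if not switch_sees_server:
        return Suspect.SERVER_SIDE
    return Suspect.NOT_FABRIC


# The switch reaches the storage device but not the server.
print(localize_fault(True, False).value)
```

Two tests are enough to discard one half of the SAN, which is the point of starting at the switch rather than at either edge.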

Step 3: if you can confirm that the problem lies between the server and the switch, your job becomes simpler. In that case, check the following possibilities:

The host bus adapter (HBA) may be faulty, its driver may be missing, or it may be misconfigured. The way the server is configured to access virtual storage devices can also be a factor. You can use the diagnostic tools provided by the hardware manufacturer, or run a protocol analyzer, to determine whether the network interface card (NIC) and its driver are working properly. If the NIC is working normally, the problem is most likely in the system configuration.
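
The server-side checks can be run in a fixed order, stopping at the first one that fails. The sketch below assumes you record each check's outcome by hand or from a vendor diagnostic tool. The check labels and the function name `first_failed_check` are hypothetical.

```python
from typing import Dict, Optional

# Checks in the order the paragraph above lists them; the labels are illustrative.
SERVER_SIDE_CHECKS = (
    "adapter hardware",
    "adapter driver",
    "adapter configuration",
    "storage access configuration",
)


def first_failed_check(results: Dict[str, bool]) -> Optional[str]:
    """Return the first check that failed or was never run, or None if all passed."""
    for name in SERVER_SIDE_CHECKS:
        if not results.get(name, False):
            return name
    return None


# Hypothetical results: the hardware passed, but the driver check failed.
results = {"adapter hardware": True, "adapter driver": False}
print(first_failed_check(results))
```

Treating a check that was never run as a failure keeps the order honest: the configuration is blamed only after the hardware and the driver have actually been verified.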

In short, troubleshooting a SAN is indeed a complex task, but you can reduce your workload by doing two things. First, determine whether the problem lies in the SAN itself or in an ordinary storage component. Second, troubleshoot from the center of the SAN outward, so that you can quickly locate most problems.

