HDFS is a distributed file system with a master-slave architecture: the NameNode is the master process and the DataNodes are the slave processes. The NameNode stores only the metadata about the file system, while the DataNodes store the actual data. To perform any file operation, such as read, create, or delete, a client must first obtain the relevant metadata from the NameNode.
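To make the split between metadata and data concrete, here is a minimal in-memory sketch in Python. It is not the HDFS client API: `ToyNameNode`, `ToyDataNode`, and `read_file` are hypothetical names invented for illustration, and real HDFS adds RPC, replication pipelines, and failure handling that are left out here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyDataNode:
    """Hypothetical DataNode: stores the actual block contents by block ID."""
    blocks: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class ToyNameNode:
    """Hypothetical NameNode: stores metadata only, never file contents."""
    # Which blocks make up each file, in order.
    file_to_blocks: Dict[str, List[str]] = field(default_factory=dict)
    # Which DataNodes hold a replica of each block.
    block_to_nodes: Dict[str, List[str]] = field(default_factory=dict)

    def get_block_locations(self, path: str) -> List[Tuple[str, List[str]]]:
        """Return (block ID, DataNode names) pairs for every block of a file."""
        return [
            (block_id, self.block_to_nodes[block_id])
            for block_id in self.file_to_blocks[path]
        ]


def read_file(path: str, namenode: ToyNameNode,
              datanodes: Dict[str, ToyDataNode]) -> bytes:
    """Read a file the way the paragraph above describes."""
    # Step 1: ask the NameNode for metadata (block IDs and their locations).
    locations = namenode.get_block_locations(path)
    # Step 2: fetch each block's bytes from a DataNode that holds it.
    chunks = []
    for block_id, node_names in locations:
        first_replica = datanodes[node_names[0]]
        chunks.append(first_replica.blocks[block_id])
    return b"".join(chunks)
```

In this sketch the NameNode never touches file contents. It only answers the question "which blocks, on which nodes?", which is why every client operation has to start there.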
When the NameNode starts, the metadata it holds in memory must reach a consistent state before clients can rely on it. Until the NameNode has all of this metadata, it remains in Safe Mode. Once it has acquired all the information that clients need to carry out their work, the NameNode moves to Active mode.
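The exit condition can be sketched as a simple threshold check. The article does not spell out the mechanism, so this sketch leans on general HDFS behavior: DataNodes send block reports to the NameNode, and the NameNode stays in Safe Mode until a configured fraction of known blocks has been reported (the `dfs.namenode.safemode.threshold-pct` property, with a documented default of 0.999). The class `ToySafeModeTracker` and its methods are hypothetical, and real HDFS applies further conditions that are omitted here.

```python
from typing import Iterable, Set


class ToySafeModeTracker:
    """Hypothetical model of the NameNode's Safe Mode exit check."""

    def __init__(self, known_blocks: Set[int], threshold_pct: float = 0.999):
        # Blocks listed in the namespace the NameNode loaded at startup.
        self.known_blocks = set(known_blocks)
        # Fraction of known blocks that must be reported before leaving
        # Safe Mode (assumed to mirror dfs.namenode.safemode.threshold-pct).
        self.threshold_pct = threshold_pct
        self.reported: Set[int] = set()

    def process_block_report(self, block_ids: Iterable[int]) -> None:
        """Record blocks reported by a DataNode, ignoring unknown block IDs."""
        self.reported.update(b for b in block_ids if b in self.known_blocks)

    def reported_fraction(self) -> float:
        """Fraction of known blocks that at least one DataNode has reported."""
        if not self.known_blocks:
            return 1.0  # nothing to wait for in an empty namespace
        return len(self.reported) / len(self.known_blocks)

    @property
    def in_safe_mode(self) -> bool:
        """True while too few blocks have been reported to serve clients."""
        return self.reported_fraction() < self.threshold_pct
```

A tracker created with 1,000 known blocks stays in Safe Mode after a report covering 500 of them and leaves it once the remaining blocks are reported. On a real cluster, the current state can be checked with `hdfs dfsadmin -safemode get`.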
The flowchart below illustrates the complete startup flow of the NameNode.
Safe Mode gives the NameNode a stable, controlled state in which to finish starting up before it enters Active mode. Without Safe Mode, the NameNode would begin serving requests in an inconsistent state. Data would be unavailable to clients, and the HDFS cluster would incur extra overhead from replicating data across the DataNodes unnecessarily. Safe Mode is therefore an important phase for both the NameNode and HDFS as a whole.
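The replication overhead is easiest to see with a small calculation. The sketch below assumes a target replication factor of 3 (the usual HDFS default) and uses a hypothetical helper, `blocks_seen_as_under_replicated`, to show how a NameNode that acted on incomplete block reports would misjudge the cluster.

```python
from collections import Counter
from typing import Dict, Iterable, List


def blocks_seen_as_under_replicated(
    all_blocks: Iterable[str],
    reports: Dict[str, List[str]],
    target_replication: int = 3,
) -> List[str]:
    """List blocks that appear to have too few replicas.

    `reports` maps a DataNode name to the block IDs it has reported so far.
    """
    replica_counts: Counter = Counter()
    for block_ids in reports.values():
        # A DataNode holds at most one replica of a given block.
        replica_counts.update(set(block_ids))
    return sorted(
        block_id for block_id in all_blocks
        if replica_counts[block_id] < target_replication
    )
```

If only one of three DataNodes has reported, every block looks under-replicated even though all replicas exist on disk. A NameNode that left startup at that point would schedule copies that are not needed, which is the overhead that waiting in Safe Mode avoids.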
Alen Frantz (Big Data Engineer)
The Definitive Guide (3rd Edition) by Tom White