This post explains the basics of distributed systems, defines what a distributed database system is, and describes its three main approaches: mirroring, replication, and fragmentation. It concludes with a simple example of a real-life distributed database system that we use every day.
The first thing that comes to mind when discussing distributed systems is a set of network-connected computers running programs to achieve a common goal. In other words, the load is distributed among the available network resources, such as storage and processing capabilities. Wikipedia defines a distributed system as “a model in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other to achieve a common goal.”
Distributed Database Systems
Using the same concept, a distributed database system can be seen as a system that stores data and manages access to it across different computers in a network. In their book, “Principles of Distributed Database Systems,” Özsu and Valduriez define a distributed database as a collection of multiple, logically interrelated databases distributed over a computer network. This collection is managed by a distributed database management system (DDMS), which makes the distribution transparent to the users.
In a typical networked environment, a central database model manages and stores all its databases on one network server, which creates a single point of failure. This approach is common when all the computers are at the same location. If the database server fails for any reason, all the databases become inaccessible, and the applications that depend on them stop working properly. In addition, all the network traffic is directed to one server, so that server must be able to handle every incoming request within its network, memory, and processing capacity.
In a distributed system, on the other hand, the databases are spread across different sites in the network. Depending on the model used, when a server goes down, all or part of the data can still be accessible. The load is also distributed across multiple servers and resources, which can improve overall system and network performance. A distributed database system offers the following promises: transparent management of distributed and replicated data, reliable access to data through distributed transactions, improved performance, and easier system expansion.
Central vs. distributed database models
Distributed database models
The main approaches used in distributed database systems are replication, mirroring, and data fragmentation. Each has its pros and cons, so each suits a different scenario. They can also be mixed and matched.
Mirroring refers to the process of copying a database to another computer. The first database is the primary, and the second is the mirror. Users and applications connect to the primary and use only the data available there, while every data update is also applied to the mirror. The mirror acts as a failover backup, so when the primary goes down, the mirror takes over. Mirroring is mainly used to provide high availability for critical systems that cannot afford to be offline. It is very common as a disaster recovery solution, where the main database is in one city and the backup database is in another city or, in some cases, another country.
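The mirroring behavior described above can be illustrated with a minimal in-memory sketch. The class name `MirroredDatabase` and its methods are hypothetical and exist only for illustration. A real mirroring setup also involves network transport, failure detection, and recovery of the failed primary, none of which is modeled here.

```python
class MirroredDatabase:
    """Toy primary/mirror pair (hypothetical, in-memory only)."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.primary_up = True

    def write(self, key, value):
        if self.primary_up:
            self.primary[key] = value
            # Every update on the primary is also applied to the mirror.
            self.mirror[key] = value
        else:
            # After failover, the mirror serves requests on its own.
            self.mirror[key] = value

    def read(self, key):
        # Clients only ever see the active database.
        active = self.primary if self.primary_up else self.mirror
        return active.get(key)

    def fail_primary(self):
        # Simulate the primary going down; the mirror takes over.
        self.primary_up = False


db = MirroredDatabase()
db.write("order:1", "paid")
db.fail_primary()
print(db.read("order:1"))  # the value is still available from the mirror
```

The point of the sketch is that clients never address the mirror directly while the primary is healthy. The mirror only becomes visible once the primary is marked as down.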
Replication is like mirroring in that two databases are involved, a primary and a replica, and updates are immediately reflected in the replica. Unlike a mirror, the replica is accessible to users and applications, either for read-only access or for both reads and writes, and updates made on the replica can be merged back into the primary database. This approach is used to provide data redundancy and performance optimization. For example, a multi-threaded critical online application will use the primary database heavily, while reports and lower-priority applications can use the replica server.
That reduces the load on the primary server and prevents reports and lower-priority applications from using up all the primary server's resources in a way that blocks other critical apps.
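As a rough sketch of the routing idea above, the following example sends critical reads to the primary and everything else to the replica. `ReplicatedDatabase`, the `critical` flag, and the read counters are assumptions made for illustration. The example copies writes synchronously and leaves out replica writes and merging.

```python
class ReplicatedDatabase:
    """Toy primary/replica pair with read routing (hypothetical)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        # Count reads per server to make the load split visible.
        self.reads = {"primary": 0, "replica": 0}

    def write(self, key, value):
        self.primary[key] = value
        # The update is reflected in the replica right away.
        self.replica[key] = value

    def read(self, key, critical=False):
        # Critical applications read from the primary;
        # reports and low-priority work read from the replica.
        target = "primary" if critical else "replica"
        self.reads[target] += 1
        store = self.primary if critical else self.replica
        return store.get(key)


db = ReplicatedDatabase()
db.write("sales:2024", 1200)
db.read("sales:2024", critical=True)   # online application
for _ in range(5):
    db.read("sales:2024")              # reporting queries
print(db.reads)
```

In this run, the five reporting queries never touch the primary, which is the load reduction the paragraph above describes.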
Another alternative is fragmentation. Instead of distributing whole databases across remote servers, the database is divided into partitions called fragments, and those fragments are then distributed, each to the server that serves the users who need it. For example, a large enterprise database with global customers would distribute horizontal fragments across multiple servers at different geographical locations, so the fragment containing the Middle East customers would be hosted on a server in that region. It is not feasible for this fragment to include the USA data, for several reasons, such as data security and the better application performance gained by using a server closer to the customers' locations. Data can also be fragmented vertically, where each record is divided among different departments and each department has access only to the fields relevant to it. For example, the table below would be split into two fragments, one for engineering and one for manufacturing.
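Both kinds of fragmentation can be sketched on small in-memory tables. All table contents, field names, and function names below are invented for illustration and are not taken from the post's original table. Horizontal fragmentation splits rows by a column value, while vertical fragmentation splits columns and keeps the key in every fragment so records can be rejoined.

```python
def horizontal_fragments(rows, column):
    """Group whole rows by the value of one column (e.g. region)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def vertical_fragment(rows, fields, key):
    """Keep only the given fields, plus the key needed to rejoin rows."""
    return [{name: row[name] for name in (key, *fields)} for row in rows]


# Hypothetical customer table, fragmented horizontally by region.
customers = [
    {"id": 1, "name": "Amal", "region": "Middle East"},
    {"id": 2, "name": "Bob", "region": "USA"},
    {"id": 3, "name": "Chen", "region": "Middle East"},
]
by_region = horizontal_fragments(customers, "region")

# Hypothetical parts table, fragmented vertically by department.
parts = [
    {"part_id": 10, "material": "steel", "tolerance_mm": 0.1,
     "plant": "A", "unit_cost": 4.5},
]
engineering = vertical_fragment(parts, ["material", "tolerance_mm"], "part_id")
manufacturing = vertical_fragment(parts, ["plant", "unit_cost"], "part_id")

print(sorted(by_region))   # the regions that received a fragment
print(engineering[0])      # engineering sees only its own fields
```

Each entry in `by_region` could then be hosted on a server in the matching region, and the `engineering` and `manufacturing` fragments could live on separate departmental servers.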
An everyday example of a fragmented distributed database serving a global audience is the content delivery network (CDN) commonly used by social networks: intelligent algorithms detect a user's location and migrate that user's data to a closer server.
This post explained the basics of a distributed database system, showed its different types, and closed with a simple CDN example. Have you worked on a distributed database or system? Please share your experience and challenges.
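One simple way to picture the "closer server" decision is to pick the server with the smallest great-circle distance to the user. This is only a sketch. The server list and its approximate coordinates are assumptions, and real CDNs typically rely on other signals such as DNS, network latency, and server load rather than raw geographic distance.

```python
import math

# Hypothetical server locations as (latitude, longitude) in degrees.
SERVERS = {
    "dubai": (25.2, 55.3),
    "frankfurt": (50.1, 8.7),
    "new_york": (40.7, -74.0),
}


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_server(user_location):
    """Return the name of the server nearest to the user."""
    return min(SERVERS, key=lambda name: haversine_km(user_location, SERVERS[name]))


# A user located roughly in Riyadh.
print(closest_server((24.7, 46.7)))
```

A user near Riyadh would be routed to the Dubai server here, since it is far closer than the other two.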