Tuesday, December 15, 2009

PostgreSQL System Architecture

PostgreSQL is an open source, object-relational database system. To date, thousands of database applications have been designed using PostgreSQL, and its wide acceptance justifies validating its conceptual architecture via systematic analysis. This document attempts to sketch the conceptual architecture of PostgreSQL based on developer documentation and source code structure. The aforementioned conceptual architecture can aid in the validation process to be conducted at a later stage, providing a concrete architecture as a result. It is our hope that such a concrete architecture can serve as the reference architecture for PostgreSQL developers.


1. Introduction


PostgreSQL is an open source, object-relational database system. It was first developed in 1977 under the name "Ingres"; in the mid-1990s, Postgres adopted the SQL standard and took on the name "PostgreSQL". To date, thousands of database applications have been designed using PostgreSQL, and its wide acceptance justifies validating its conceptual architecture via systematic analysis. This document attempts to sketch the conceptual architecture of PostgreSQL based on developer documentation and source code structure. The aforementioned conceptual architecture can aid in the validation process to be conducted at a later stage, providing a concrete architecture as a result. It is our hope that such a concrete architecture can serve as the reference architecture for PostgreSQL developers.


The paper first introduces the overall architecture of PostgreSQL as a classical client-server architecture. Each sub-system (server, client, and storage manager) is examined in detail. To illustrate the applicability of our conceptual architecture, we trace the work flow of typical SQL queries in our architectural model. Lastly, we take a critical look at the advantages and disadvantages of PostgreSQL and speculate on its evolvability. PostgreSQL uses a mixture of architectural styles. At the top level, the client and server interact in the classical client-server model, while the data access structure is strictly layered. At the server layer, query processing is structured as a pipeline, while database access by the server sub-systems is structured as a bulletin board. The interaction between the client and the server is largely request/reply driven, and each client is provided with a separate server process. All of the server processes access a commonly shared data management system.


2. Overall Architecture


Figure 1 PostgreSQL System Conceptual Architecture


The front-end of PostgreSQL is a typical client-server architecture, while the back-end is a mixed architecture of layered and pipeline design.

Sub-system descriptions



1) Libpq is responsible for handling communication with the client processes: establishing the connection to the postmaster and obtaining a postgres server process for the operational session. It forwards the user's operation requests to the back-end on an as-is basis.


2) The server side is composed of two sub-systems: the postmaster and the postgres server. The postmaster is responsible for accepting incoming connection requests from clients, performing authentication and access control on client requests, and establishing client-to-postgres-server communication. The postgres server handles all queries and commands from the client. PostgreSQL follows a "process per user" model, which means that one client process is connected to exactly one server process. Since the postmaster handles all incoming requests from clients and spawns new postgres processes locally rather than over a network connection, one implication of this architecture is that the postmaster and the postgres processes always run on the same machine (i.e. the database server), while the front-end (client) application can run anywhere. Because of this, the scalability of PostgreSQL is limited, and PostgreSQL is usually used in relatively small database applications.


3) The Storage Manager is responsible for general storage management and resource control in the back-end, including shared buffer management, file management, consistency control and lock management.

Concurrency Control

Multiple PostgreSQL server processes can execute concurrently while accessing a shared data store. We hereby define a service process performing a data read as a reader, and a service process performing a data write as a writer. In PostgreSQL, readers do not block writers and writers do not block readers. A writer only blocks another writer if they are writing to the same data entry. For that case, PostgreSQL provides two solutions (based on the ISO SQL standard): read committed and serializable. In the case of read committed, the writer will read the new values before performing its write operation. In the case of serializable, the writer will abort if the data value has been modified since it began its transaction.
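
As a minimal illustration (the accounts table and its rows are hypothetical), the two behaviors can be selected per transaction:

    -- Under the default "read committed" level, a concurrent writer's
    -- committed change is simply re-read before our update applies.
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    COMMIT;

    -- Under "serializable", if another writer modified the row after
    -- this transaction began, the update aborts and must be retried.
    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    COMMIT;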


3. The Server Subsystems


The server side of PostgreSQL largely consists of two parts: the postmaster and postgres. When a client (front-end) sends a request to access a database on the server, the postmaster spawns a new server process, called postgres, which communicates directly with the client. Hence, the postmaster is always running, waiting for requests from clients, while postgres processes start and stop as client sessions come and go. After a connection is established, the client process can send a query in plain text form to the back-end. There is no parsing done in the front-end. The server then parses the query, creates an execution plan, executes the plan, and transmits the retrieved tuples to the client over the established connection.


Query/Command Processor


Figure 2 Query/Command Processor Architecture


Figure 2 shows the conceptual architecture of the PostgreSQL server structured as a pipeline. It shows the general control and data flow within the back-end from the time the back-end receives a query to the time it sends the results. Six major subsystems exist in the server:


1) The Parser first checks the query transmitted by the application program for valid syntax. If the syntax is correct, a parse tree is built up and handed back; otherwise, an error is returned. The parse tree is then transformed into an internal form used by the database back-end.


2) The Traffic Cop identifies the query as either a utility query or a more complex query. The complex queries in PostgreSQL are select, insert, update and delete; these are sent to the next stage (i.e. the Rewriter). The utility queries are sent to the Utility Commands subsystem.


3) The Utility Commands subsystem handles queries that do not require complex handling: vacuum, copy, alter, create table, create type, and many others, as sketched below.
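
An illustrative split (table names are hypothetical):

    -- routed to the Utility Commands subsystem:
    VACUUM;
    CREATE TABLE items (id integer, name text, price numeric);
    ALTER TABLE items ADD COLUMN stock integer;

    -- routed through the full pipeline (Rewriter, Planner, Executor):
    SELECT name FROM items WHERE price < 10;
    INSERT INTO items VALUES (1, 'widget', 4.50, 7);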


4) The Query Rewriter is a subsystem between the Parser and the Planner. It processes the parse tree passed on by the Traffic Cop and, by applying any applicable rules present in the system, rewrites the tree to an alternative form. This stage enables PostgreSQL to support a powerful rule system for the specification of views and view updates.


5) The Planner provides an optimal execution plan for a given query. The basic idea of the planner is cost-estimate-based selection of the best plan for a query. It first generates all possible ways of scanning and joining the relations that appear in the query. All the created paths lead to the same result; the planner estimates the cost of executing each path, chooses the cheapest one, and passes it on to the Executor.


6) The executor takes the plan passed back by the planner and starts processing the top node. It executes a plan tree, which is a pipelined demand-pull network of processing nodes. Each node produces the next tuple in its output sequence each time it is called. Upper-level nodes call their subnodes to get input tuples, from which they compute their own output tuples. Upper-level nodes are usually join nodes. Each join node combines two input tuple streams into one. In contrast, bottom-level nodes are scans of physical tables, either sequential scans or index scans. The executor makes use of the storage system while scanning relations, performs sorts and joins, evaluates qualifications and finally hands back the tuples derived.
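
The shape of such a plan tree can be inspected with EXPLAIN. The query, tables and cost figures below are hypothetical, but the structure, a join node pulling tuples from two scan nodes, is typical:

    EXPLAIN SELECT o.id, c.name
    FROM orders o JOIN customers c ON o.customer_id = c.id;

    -- Hash Join  (cost=20.00..350.00 rows=10000 width=36)
    --   Hash Cond: (o.customer_id = c.id)
    --   ->  Seq Scan on orders o  (cost=0.00..180.00 rows=10000 width=8)
    --   ->  Hash  (cost=12.50..12.50 rows=600 width=36)
    --         ->  Seq Scan on customers c  (cost=0.00..12.50 rows=600 width=36)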


Query/Command Utility


Figure 3 Query/Command Utility Architecture


Figure 3 shows the conceptual architecture of the PostgreSQL server utilities structured as an object-oriented design. Each utility component is distinct in its functionality, as described in detail below:


1) Catalog: it provides system catalog manipulation, and contains creation and manipulation routines for all aspects of the catalog system, such as table, index, procedure, operator, type, aggregate, etc. The catalog module is used by all subsystems of the back-end.



2) Access: it defines data access for heaps, indexes and transactions. Its function is threefold: to provide common data access routines; to provide data access structures in the form of hash, heap, index, btree, etc.; and to act as the phase manager during transactions. The access module is used by all sub-systems of the back-end.

3) Nodes: the nodes/lists module defines the creation and manipulation of nodes and lists, which are containers for requests and data during query processing. The nodes/lists sub-module is used by all sub-systems of the back-end except the Traffic Cop.


4) Utils: utils provides various utilities to the back-end, such as initialization, sort routines, error logging, hash routines, etc. It is widely accessed by all sub-systems of the back-end.


5) Bootstrap: the bootstrap module is used when PostgreSQL is run for the first time on a system. This module is required because PostgreSQL commands commonly access system data tables, and such tables do not yet exist when PostgreSQL is run for the first time.


Catalog system


PostgreSQL uses catalogs to a much greater extent than other DBMSes. PostgreSQL not only uses catalogs to define tables, but also uses them to describe datatypes, functions, operators, etc. This feature provides much greater user extensibility and control. User-defined datatypes can be used to associate new data items particular to specialized databases; user-defined functions can be either standard functions or aggregate functions; user-defined operators can be used in expressions like standard operators. All of the catalog items are maintained and accessed via the catalog subsystem, providing a uniform organization.
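
A minimal sketch of this extensibility (the names are illustrative): a user-defined function is recorded in the pg_proc catalog, an operator built on it in pg_operator, and both then behave like built-ins:

    -- a user-defined function (stored in the pg_proc catalog)
    CREATE FUNCTION sum_of_squares(integer, integer) RETURNS integer
        AS 'SELECT $1 * $1 + $2 * $2;' LANGUAGE SQL;

    -- a user-defined operator wrapping it (stored in pg_operator)
    CREATE OPERATOR ## (LEFTARG = integer, RIGHTARG = integer,
                        PROCEDURE = sum_of_squares);

    SELECT 3 ## 4;   -- returns 25, usable like any built-in operator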


4. The Storage Manager


The storage manager provides uniform data storage access for the back-end. Only one storage module is active on a PostgreSQL server. The functionality of the storage module includes: providing shared memory and disk buffers, mediating access to the kernel file manager, and providing semaphores and file locks. The storage module is used by the rewrite & path generation module and the command module. PostgreSQL uses non-overwriting storage management, which means updated tuples are appended to the table and older versions are removed some time later. PostgreSQL achieves cache synchronization by using a message queue: every back-end can register a message, which then has to be read by all back-ends.


Figure 4 Storage Manager Architecture

File Manager: provides management of general files and large buffered files.

Buffer Manager: provides management of shared buffers.

Page Manager: uses an LRU (least-recently-used) algorithm to manage pages.

Lock Manager: provides "read" and "write" locks to achieve consistency.

IPC: realizes cache synchronization.

Disk Manager: provides interface for physical storage/disk.



Figure 4 shows the conceptual architecture of PostgreSQL's storage manager. In this figure, double-headed arrows indicate dependencies to or from all of the storage manager's subsystems.


5. Use Case: Query Workflow


Figure 5 Work Flow of Query


1) The SQL query string is converted into a query tree.


2) The query tree is modified by the rewriter in the following way: the rewriter looks for any rules that apply to the query tree (for example, view definitions) and expands it with the stored definitions.



3) The Planner takes the rewritten query tree and generates all possible query paths. It then evaluates the paths to determine the optimal one and establishes a query plan for that path.


4) The query plan is handed to the executor, which processes the plan tree and returns the resulting tuples to the client.
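
A concrete instance of step 2 is view expansion (view and table names are hypothetical):

    CREATE VIEW cheap_items AS
        SELECT name, price FROM items WHERE price < 10;

    SELECT name FROM cheap_items;

    -- the rewriter expands the view's stored rule, so the planner
    -- effectively receives:
    SELECT name FROM items WHERE price < 10;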


6. Conclusion


The server architecture of PostgreSQL has some advantages. First, it pre-filters incoming queries into utility and complex queries. This step reduces needless overhead for queries that don't require complex handling such as rewriting and planning. Also, the system is likely to find a good query plan without any explicit assistance from the user. The catalog system provides flexible user control for the definition of new datatypes, functions, and operators. On the other hand, the centralized nature of PostgreSQL makes it ill-suited for the management of large-scale databases. Furthermore, the database to be accessed must reside at the same location as the server process. The evolvability of PostgreSQL is two-sided. On the positive side, PostgreSQL's flexible catalog system and elegant client-server model make it well suited for home office database management and for applications requiring specialized data manipulation and definition. On the negative side, PostgreSQL's centralized distribution and lack of location transparency make it a very undesirable choice for large-scale or distributed database management.

Thursday, December 10, 2009

PostgreSQL System Architecture

PostgreSQL

1. Overview

The PostgreSQL Global Development Group is a community of companies and people co-operating to drive the development of PostgreSQL, the world's most advanced open source database software.

The PostgreSQL software itself had its beginnings in 1986 inside the University of California, Berkeley as a research prototype, and in the 16 years since has moved to its now globally distributed development model, with central servers based in Canada.

2. Why PostgreSQL?

There are SQL database systems that scale better, perform better, and offer better reliability than PostgreSQL. This is obvious given the competitive commercial market for database software. What PostgreSQL offers is a complete solution for small to mid-sized companies looking for cost-effective, reliable data storage. Other solutions such as MySQL are cost effective but don't offer the same level of reliability or features expected from an enterprise-class database system.

PostgreSQL development focuses on a rich set of tools that ensure productivity and scalability for the largest audience possible. PostgreSQL is used in a variety of corporations for various uses, including research, storage, online access and rapid-access storage.

3. Why MySQL?

For completeness I felt I should add this section. MySQL is not useless and has many applications given the correct circumstances and data requirements. Among the most evident are high read queries and low insertions. MySQL outperforms most other databases on raw read performance, so given a high read requirement for a given application, MySQL may be the ideal choice. For the most part I have found MySQL is most often used because of its simplicity to set up and run.

4. PostgreSQL vs. MySQL

The only thing I can say is that for small websites MySQL is much faster, but as your site becomes larger, needs more features, and requires better scalability, PostgreSQL becomes the undeniable choice. For further reference, refer to:

http://www.databasejournal.com/features/mysql/article.php/3288951

http://www.phpbuilder.com/columns/tim20000705.php3

http://www.webtechniques.com/archives/2001/09/jepson/

4.1. MySQL

For the novice user MySQL is more than adequate for solving most database needs. The types of tasks MySQL excels at include logging, form storage and other small web-based transactions. MySQL is also useful for very large databases, but only under certain circumstances: provided the database is accessed in a standalone manner (no transactions). Since MySQL performs fast reads for large databases, it exceeds PostgreSQL for large datasets in terms of read performance, but as soon as a large number of users start accessing the database (insertions & updates), the system does not scale at the rate expected by large companies and organizations.

4.2. PostgreSQL

PostgreSQL is by far the smartest solution for most database tasks. As a complete solution, it lies between MySQL and the large commercial database vendors in terms of scalability. For moderate to large projects PostgreSQL is almost always the ideal and safest choice. At first glance both MySQL and PostgreSQL look very similar and perform for the most part the same in light-load environments, but as soon as the requirements for the database increase, such as a need for greater query functionality or scalability, PostgreSQL quickly becomes the optimal choice. Unlike MySQL, PostgreSQL enjoys a full-featured query language similar in size and functionality to other commercial databases such as Oracle and DB2. Of course, it goes without saying that DB2 and Oracle are the premier contenders in query functionality, but they also carry a $25,000+ price tag.

4.3. Other Commercial & Non-Commercial Databases

It goes without saying that commercial databases for the most part have all of the features MySQL and PostgreSQL contain, but two main points make MySQL and PostgreSQL a viable solution for both personal and professional use. The first is cost: depending on the size of the company, it may not be willing to pay hundreds or thousands of dollars for a database system, and this is becoming a strong consideration for companies because the functionality in PostgreSQL and MySQL is all that is needed for most user applications. There is still a small section of the industry that requires commercial systems for their functionality, such as spatial analysis, but even that is quickly being implemented in current versions of MySQL and PostgreSQL. Second is ease of use and the cost of long-term administration. Every time a company buys a piece of hardware it has to consider how much it is going to cost in five to ten years to replace it. If the long-term cost outweighs the short-term cost, there is no reason to buy it at all. The same principle holds for databases. In the case of commercial database servers it is yet to be decided which costs less on a long-term scale, but it seems obvious from the recent growth in free database servers that free is definitely better.

4.4. Which do I choose?

That depends on the needs of the company. In my experience MySQL and/or PostgreSQL can handle just about any task, but the inability of MySQL to handle a persistent storage model or guarantee consistency cripples MySQL's usability under most business architectures.

5. The ACID Test

5.1. Atomicity

Atomicity is an all-or-none proposition. This guarantees that for a given request, either the entire request is completed or none of it is. This is an important feature that both PostgreSQL and MySQL support.

5.2. Consistency

Consistency guarantees that a transaction never leaves your database in a half-finished state. If part of a transaction fails, then the portion of the transaction already completed is rolled back to the original state. This is supported by PostgreSQL but not by MySQL.

5.3. Isolation

Isolation keeps transactions separated from each other until they're finished. It ensures that there is no mixing of transaction data during execution, which is important for data integrity. This is supported by PostgreSQL but not by MySQL.

5.4. Durability

Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination. This is supported by PostgreSQL but not by MySQL.
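
A sketch of these guarantees in SQL (the accounts table is hypothetical): the transfer below either happens entirely or not at all, and no other transaction sees the intermediate state.

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    -- a failure or explicit ROLLBACK before this point undoes both
    -- updates (atomicity); durability means the change survives a
    -- crash once COMMIT returns.
    COMMIT;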

6. Setup / Installation

On both Windows and UNIX-based systems, installation of the application is a straightforward process involving only the installation of a single package. PostgreSQL database data is compatible from version to version, allowing for quick upgrades and installations.

7. System Configuration

7.1. Resources

System requirements for PostgreSQL, like any other database system, are completely dependent on the size and structure of the data being stored. Most web databases, including web logs and form data, need very little storage or system resources, but for larger datasets such as equities data, spatial information or streaming data, large amounts of resources are needed to process the data in an efficient manner. When designing a database system there are three main areas that need to be addressed.

· The first is hard disk performance; hard disks are the single slowest medium in computers. Databases larger than one terabyte cannot fit all of their data into main memory, so the system must access the disk several times in order to traverse the index and find the specified data. This creates a requirement for very fast disks, which take little time seeking and locating information.

· The second is CPU resources. In most cases this does not become an issue in large database systems, because the processor spends most of its time waiting for data to be found on the disk; but it can in systems with a high number of small transactions (e.g. credit transactions), because there a majority of the time is spent moving small amounts of data in memory.

· The third and most important resource is main memory. The amount of main memory in a system can make or break the performance of a system, independent of the speed of the processor or the disks. System memory not only acts as a fast medium for processing data but also acts as the primary storage for table indexes. For large databases it is imperative that as much of the table index as possible is kept in memory to reduce the number of disk accesses.

The combination of these three components makes up most if not all database systems today. There are other components that go into designing a database system, but none of them makes an impact as great as CPU resources, hard disk performance, and main memory.

7.2. Replication

Database replication is the cornerstone of database management for systems of all sizes. The ability to back up streaming data and, more importantly, to survive a hardware failure is key to providing reliable database services. Because standard backups put a high load on the system (disk I/O), another method is needed to back up and restore data. The method of choice is replication. Replication is a method of data duplication via SQL or other proprietary communication methods.

Once set up, there is a main server and any number of replication servers that act as streaming mirrors of the main server. Every insertion or update made to the main SQL server is propagated to all of the replication servers. This ensures that upon a failure, any of the replication servers can be used as the new primary server. Other features include bi-directional propagation, which allows any of the servers to be accessed, with changes made to one node propagating to the other nodes arbitrarily. This method creates a truly autonomous system, allowing multiple failures to be handled gracefully, but it is also the hardest backup system to keep running coherently, due to decreased data stability. This is obvious, since a single bad drive could start spewing bad data to all of the other replication servers.

7.3. Scalability

Scalability is among the hardest obstacles to overcome using commodity hardware and software, but PostgreSQL offers elegant solutions that allow a database to be scaled to a virtually limitless level. The system uses a single process to receive incoming queries, but each query is then handled by its own server process. Even though there is only one receiving process, PostgreSQL still offers better or equal scalability to MySQL. In terms of multi-computer scalability, PostgreSQL does not scale at all. For large cluster-based deployments, commercial database systems are by far more mature; some good implementations include Oracle and DB2.

The most important aspect of SQL server scalability is the ability for a server to scale both with the number of users and with the size of the dataset. In both of these areas PostgreSQL performs very well. PostgreSQL scales to large numbers of users linearly with the hardware it is installed on, and it uses a modified B-tree structure that scales to an almost unlimited amount of data.

8. Authentication

8.1. Authentication Methods

PostgreSQL supports various forms of authentication including trust, password, Kerberos, ident and PAM. These are all handled in the configuration file “pg_hba.conf”.

When trust authentication is specified, PostgreSQL assumes that anyone who can connect to the server is authorized to access the database as whatever database user he specifies (including the database super user). This method should only be used when there is adequate operating system-level protection on connections to the server.

The password-based authentication methods are md5, crypt, and password. These methods operate similarly except for the way that the password is sent across the connection. If you are at all concerned about password "sniffing" attacks then md5 is preferred, with crypt a second choice if you must support pre-7.2 clients.

Kerberos is an industry-standard secure authentication system suitable for distributed computing over a public network. A description of the Kerberos system is far beyond the scope of this document; in all generality it can be quite complex (yet powerful). The Kerberos FAQ or MIT Project Athena can be a good starting point for exploration. Several sources for Kerberos distributions exist.

The ident authentication method works by inspecting the client's operating system user name and determining the allowed database user names by using a map file that lists the permitted corresponding user name pairs. The determination of the client's user name is the security-critical point, and it works differently depending on the connection type.
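
An illustrative pg_hba.conf fragment combining these methods (the addresses are hypothetical, and the exact column layout varies between versions):

    # TYPE  DATABASE  USER  IP-ADDRESS   IP-MASK        METHOD
    local   all       all                               trust
    host    all       all   192.168.1.0  255.255.255.0  md5
    host    all       all   0.0.0.0      0.0.0.0        reject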

8.2. Authentication/Privilege Granularity

One of PostgreSQL's most powerful features is the ability to allocate privileges at a fine level. This is apparent in its ability not only to grant privileges on tables and databases, but also to grant the privilege to create databases on a per-user basis, giving each user the ability to create their own databases with administration capabilities. MySQL also provides privilege-based access, but at a simpler level: it offers many of the same controls but often requires many additional settings to achieve what PostgreSQL can accomplish in a single command, as sketched below.
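
A sketch of that granularity (user and table names are hypothetical):

    -- let one user create her own databases
    CREATE USER alice CREATEDB;

    -- give another user narrow, per-table rights only
    CREATE USER bob;
    GRANT SELECT, INSERT ON orders TO bob;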

9. Managing Resources

9.1. Resource Limits

Unix-like operating systems enforce various kinds of resource limits that might interfere with the operation of your PostgreSQL server. Of particular importance are limits on the number of processes per user, the number of open files per process, and the amount of memory available to each process. Each of these has a "hard" and a "soft" limit. The soft limit is what is actually enforced, but the user may raise it up to the hard limit; the hard limit itself can only be changed by the root user.

System resources can also be controlled at the user level by Unix-like operating systems, but on Windows, resource limits built into the application are a needed addition to ensure quality of service for all users. The PostgreSQL server uses one process per connection, so you should provide for at least as many processes as allowed connections, in addition to what you need for the rest of your system. This is usually not a problem, but if you run several servers on one machine things might get tight.
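
On a Unix-like system the relevant limits can be inspected from the shell that starts the server; for example, in bash:

    ulimit -u     # maximum number of user processes
    ulimit -Sn    # soft limit on open files per process
    ulimit -Hn    # hard limit on open files per process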

9.2. Linux Memory Overcommit

In Linux 2.4 and later, the default virtual memory behavior is not optimal for PostgreSQL. Because of the way that the kernel implements memory overcommit, the kernel may terminate the PostgreSQL server (the postmaster process) if the memory demands of another process cause the system to run out of virtual memory.
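
A commonly recommended mitigation on 2.6-era kernels is to disable overcommit-style allocation via sysctl (requires root; check your kernel's documentation before applying):

    sysctl -w vm.overcommit_memory=2
    # or persistently, in /etc/sysctl.conf:
    #   vm.overcommit_memory = 2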

10. Securing Systems

10.1. Secure TCP/IP Connections with SSL

PostgreSQL has native support for using SSL connections to encrypt client/server communications for increased security. This requires that OpenSSL is installed on both client and server systems and that support in PostgreSQL is enabled at build time.

With SSL support compiled in, the PostgreSQL server can be started with SSL enabled by setting the parameter ssl to on in postgresql.conf. When starting in SSL mode, the server will look for the files server.key and server.crt in the data directory, which should contain the server private key and certificate, respectively. These files must be set up correctly before an SSL-enabled server can start. If the private key is protected with a pass phrase, the server will prompt for the pass phrase and will not start until it has been entered.

The server will listen for both standard and SSL connections on the same TCP port, and will negotiate with any connecting client on whether to use SSL.
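
Per the description above, the server-side setup reduces to one parameter and two files:

    # postgresql.conf
    ssl = on

    # expected in the data directory:
    #   server.key   (server private key)
    #   server.crt   (server certificate)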

10.2. Secure TCP/IP Connections with SSH

One can use SSH to encrypt the network connection between clients and a PostgreSQL server. Done properly, this provides an adequately secure network connection.

11. Enterprise Systems

11.1. Requirements

System requirements for enterprise-grade servers vary greatly from the machine a user would use for, say, a web log or DNS server. Enterprise systems must offer both a higher level of reliability and speed than a standard workstation can offer. These requirements usually include some sort of RAID storage for the data and some sort of autonomous system recovery. Any further information in regards to enterprise-level hardware is beyond the scope of this paper.

11.2. Performance

System performance varies from system to system based on the requirements pre-specified by the owner, company, or application. Configuration scripts are the key to creating a PostgreSQL system that is optimized for both the hardware and the software. From a hardware standpoint, the most important part of optimization is estimating the amount of resources available on the system, the resources required by the database, and the expected number of users accessing the system.

In most cases systems are dedicated to running either only the PostgreSQL database or the database with minimal or no other applications. In this case the system can be set up to allocate most if not all of the available resources to the SQL server. For PostgreSQL this configuration is located under "/var/lib/pgsql/" in the file "postgresql.conf". This file contains the options for limiting the resources available to the PostgreSQL server.
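
An illustrative fragment of postgresql.conf (the values are hypothetical, and units differ between versions; shared_buffers, for instance, was counted in 8 kB pages in the 7.x series):

    max_connections = 100     # one server process per connection
    shared_buffers  = 4096    # shared disk buffers
    sort_mem        = 8192    # per-sort memory in kB (later renamed work_mem)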

11.3. Cost

PostgreSQL's cost per system is approximately linear in the size of the dataset times the number of users expected to access the database. The cost saving from PostgreSQL lies in the licensing of the software, making it essentially free (BSD License). This in turn leaves the cost of the entire system as only the cost of the hardware.

The cost of hardware per system depends on the required resources as stated, but also on other supporting hardware (systems) for replication and backup, which act as streaming and static backups of the current data. These backup server(s) usually cost the same as or slightly less than the main server.

12. Backup & Restoration


12.1. Replication

Database replication has an undeserved reputation for being complex to set up and prone to failure. Early versions of MySQL and PostgreSQL replication were difficult to configure because the process was inadequately documented. Current versions of replication offer a more manageable and complete solution for data replication, especially for multi-point storage.

13. Cluster & High Availability Technologies

13.1. High Availability

So what is high availability? High availability is anything that gives your SQL server(s), or any server for that matter, a level of service availability greater than what could be achieved through a normal installation of an SQL server. It has become a buzzword in the recent past: the idea has been around for quite a while in mainframe technologies, but only recently has it been brought to commodity hardware.

13.2. Replication

In its most basic form, replication consists of two servers: a master and a slave. The master server records all queries that modify or add data to the databases. The slave connects to the master, reads queries from the master's binary log (a list of changes), and executes them against its local copy of the data (a configuration sketch follows the list below). Replication serves several purposes:

· Data distribution: allows data to be copied to multiple locations without any manual intervention. This can be particularly helpful where live backups must be kept at all times.

· Load balancing: replication offers a convenient method for creating a load-balancing system for select queries. It does not scale for inserts or updates, because they must all be written to the master node. Multi-master mode is possible but is prone to errors.

· High availability & failover: like other 24/7 services, SQL servers cannot afford downtime or the performance degradation caused by scheduled backups. Replication turns backups into live feeds, thereby eliminating or reducing the need for scheduled backups.
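
A rough sketch of the master/slave arrangement described above, using MySQL-style binary-log replication (host names, credentials and log coordinates are hypothetical):

    # master my.cnf
    [mysqld]
    server-id = 1
    log-bin   = mysql-bin

    # slave my.cnf
    [mysqld]
    server-id = 2

    -- then, issued on the slave:
    CHANGE MASTER TO MASTER_HOST='master.example.com',
        MASTER_USER='repl', MASTER_PASSWORD='secret',
        MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=4;
    START SLAVE;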

Replication as shown solves many problems but still leaves some unsolved. Among the largest of these issues is scalable writes: under current replication architectures, writes cannot scale with the number of nodes, creating a severe bottleneck under certain query circumstances. For these problems another system is used, called clustering.

13.3. Clusters

MySQL Cluster combines the world's most popular open source database with a fault tolerant database clustering architecture so you can deliver mission-critical database applications with 99.999% availability.

MySQL Cluster enables you to:

· Cost-effectively deliver "five nines" (99.999%) availability using a parallel server architecture with no single point of failure.

· Deliver the performance and high throughput required to meet the most demanding enterprise applications.

· Incrementally scale your applications in a linear fashion as your needs grow without having to invest in expensive hardware.

MySQL Cluster has a flexible distributed architecture, which gives you complete control over the level of performance, reliability and scalability you need to match any application's requirements.

14. Credits

14.1. Book: High Performance MySQL

· Abstract: Advanced Tools & Techniques for MySQL Administrators

· By: Jeremy D. Zawodny & Derek J. Balling

· O'Reilly Press, 2004

14.2. PostgreSQL Manual

· Abstract: Manual

· By: http://www.postgresql.org/docs/7.4/static/biblio.html

· PostgreSQL 2004

· Many of the sections in this whitepaper are copied directly from this site. I am not claiming them as my own in any way, shape or form.

14.3. New Architect: PostgreSQL vs. MySQL

· Abstract: Comparison based on building a better database

· By: http://www.webtechniques.com/archive/2001/09/jepson/

· Most of the material used is from the ACID testing section.

For more detailed information, you can download the following books:


1. PostgreSQL basic tutorial guide:

http://www.ziddu.com/download/7760529/PostgreSQL_basic_tutorial.rar.html

2. PostgreSQL programmer guide:

http://www.ziddu.com/download/7760528/PostgreSQL-programmer.rar.html

3. PostgreSQL user guide:

http://www.ziddu.com/download/7760527/PostgreSQL-userguide.rar.html