Understanding Isolation Levels in Galera Cluster


In an earlier article, we went through the process of Certification Based Replication in Galera Cluster for MySQL and MariaDB. To recall the overview, Galera Cluster is a multi-master cluster setup that uses the Galera Replication Plugin for synchronous, certification based database replication. This article examines the concept of Isolation Levels in Galera Cluster, a feature inherited from the MySQL InnoDB storage engine.

A transactional DBMS uses the concept of isolation, in which concurrent transactions are isolated from each other for performance, reliability and consistency. The “isolation level” setting determines the degree of isolation applied when multiple transactions make changes and issue queries to the database concurrently. Transaction isolation represents the “I” in the ACID model, a set of database design principles that ensure the reliability, consistency and durability needed for any transactional DBMS.

The ACID Model of Database Design
ACID stands for atomicity, consistency, isolation and durability. Database transactions should be atomic units of change that are committed upon successful completion or rolled back upon abortion of the transaction. The database state should be consistent at all times: during the progression of a transaction and after commits and rollbacks. When related data is being updated by a transaction across multiple tables, queries should see either all the updated values or all the old values, but never a mix of the two. Transactions should be isolated from one another, so that one transaction cannot interfere with the working set of another and conflicts are avoided. This is achieved through table and row locking, controlled by the isolation level setting. The stricter the isolation level, the more extensive and prolonged the locking, which causes delays and performance overhead in exchange for transaction integrity. So the isolation level should be set to balance performance against isolation. Finally, the new database state resulting from a committed transaction should be safe from data loss due to crashes, hardware or software failure, power failure and similar events. This quality represents the durability of a transaction and is typically achieved by writing data to disk storage, for example through the doublewrite buffer mechanism in InnoDB.

Transaction Isolation Levels
The MySQL InnoDB engine, and hence Galera Cluster, supports four different isolation levels:

READ-UNCOMMITTED: In this level, transactions can see data changes made by other transactions even before they are committed. These are known as dirty reads; this level provides no real isolation at all.
READ-COMMITTED: Here transactions can see only changes already committed by other transactions. Non-repeatable reads take place: each SELECT query reads the data committed just prior to that query. So when a single transaction runs multiple SELECT queries, each one sees its own snapshot of committed data, and the snapshots can differ because of changes committed by other transactions in between.
REPEATABLE-READ: This is the default isolation level for MySQL InnoDB. A snapshot of the data is taken before the first SELECT query, and all subsequent queries in the transaction see that same snapshot, making reads of the same data repeatable. So queries will not see changes committed by other transactions while the transaction is running.
SERIALIZABLE: In this level, all rows accessed by the transaction are locked and appends are blocked. Since SELECT queries see the same data snapshot, this is similar to REPEATABLE-READ, but effectively read-only for other transactions.
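The isolation level can be changed at runtime, globally or per session. A minimal sketch (in MySQL 5.7 the session variable is named tx_isolation; from MySQL 8.0 it is transaction_isolation):

mysql> SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;
mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
mysql> SELECT @@tx_isolation;

The level can also be fixed at startup by setting transaction-isolation in my.cnf.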

Isolation Levels in Galera Cluster
Isolation levels are set when a client request starts a transaction. In the cluster, they can be applied at the single node level (intra-node) and at the global cluster level (inter-node) for all nodes. Single node isolation can extend to the maximum level allowed by the InnoDB engine, while global cluster level isolation depends on the replication protocol.

In the master-slave mode of Galera Cluster, all four levels of isolation can be used, but multi-master mode supports only the REPEATABLE-READ level.

Working of Transaction Isolation Levels in Galera Cluster
Galera Cluster utilizes the transaction isolation level feature that is inherent to the MySQL InnoDB engine. So transactions running on the same node in the cluster are isolated at the level set in that node's MySQL configuration. That means if we have configured MySQL with the default REPEATABLE-READ level, transactions on the same node will be isolated at that level. For transactions issued on separate cluster nodes, Galera Cluster uses SNAPSHOT ISOLATION, also called “first committer wins” isolation. This is used to solve the “lost update” problem that generally affects plain REPEATABLE-READ isolation.
The lost update problem with plain REPEATABLE-READ typically occurs as follows when concurrent transactions operate on the same data:

1. Transaction 1 and Transaction 2, both operating on node 1, get the same data snapshot before executing their operations.
2. Transaction 1 issues a SELECT and gets the snapshot from node 1.
3. Transaction 2 issues a SELECT and gets the same snapshot from node 1.
4. Transaction 1 updates the data on node 1, locking the node 1 table.
5. Before Transaction 1 commits, Transaction 2 tries to update data on node 1 (locked for Transaction 1’s commit) based on the previous snapshot, which is no longer current: in step 4, Transaction 1 already changed the data, but the commit is pending. Since this is plain REPEATABLE-READ and Transaction 1’s commit is pending, that change is not visible to Transaction 2.
6. Transaction 1 commits the data update made in step 4 and the lock is released.
7. Transaction 2 commits its data update once the lock is released. Note that in step 5, Transaction 2 completely missed the update made by Transaction 1, which is now silently overwritten.

With Galera Cluster’s SNAPSHOT ISOLATION, the first committer wins logic ensures data integrity when concurrent transactions operate on separate nodes of the cluster (same database). It works as follows:

Transaction 1 operates on node 1 and Transaction 2 on node 2, against the same database.
1. Transaction 1 begins a transaction on node 1.
2. Transaction 2 begins a transaction on node 2.
3. Transaction 1 reads (SELECT) from node 1 and gets the initial snapshot.
4. Transaction 2 reads (SELECT) from node 2 and gets the same initial snapshot.
5. Transaction 1 updates data on node 1.
6. Transaction 2 updates the same data on node 2. Node 2 does not hold a lock, since Transaction 1 is operating on node 1; however, Transaction 2 has not seen the update from step 5.
7. Transaction 1 commits the update.
8. Transaction 2 starts to commit, but fails with a deadlock error caused by the conflicting write, because Transaction 1 committed first. The transaction is aborted and restarted.
9. Transaction 1 sees (SELECT) the updated snapshot on node 1.
10. Transaction 2 reads (SELECT) and sees the same updated snapshot that Transaction 1 saw in step 9. The first committer has won.

If the above scenario changes to concurrent transactions working on the same node with the REPEATABLE-READ isolation level, the “lost update” issue arises. To overcome it, Galera Cluster utilizes MySQL/InnoDB's SELECT ... FOR UPDATE statement for reads that precede updates. This statement places locks on the rows it reads until the transaction ends, so conflicting writes are blocked. This is depicted below:

1. Transaction 1 and Transaction 2 begin operating on the same data on node 1.
2. Transaction 1 issues a SELECT ... FOR UPDATE statement. A lock is applied to the rows read.
3. Transaction 2 issues a SELECT ... FOR UPDATE statement, but is blocked by the InnoDB lock on the node 1 data.
4. Transaction 1 updates and commits its transaction on node 1.
5. Once the lock is released, the waiting Transaction 2 sees the updated snapshot on node 1. It issues its update and commits the transaction without missing the update from Transaction 1.
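In SQL terms, the locking read pattern from the steps above looks like the following sketch (the accounts table and its columns are hypothetical):

mysql> START TRANSACTION;
mysql> SELECT balance FROM accounts WHERE id = 1 FOR UPDATE; -- row stays locked until COMMIT
mysql> UPDATE accounts SET balance = balance - 100 WHERE id = 1;
mysql> COMMIT;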
In this way, Galera Cluster can be configured to use REPEATABLE-READ isolation and overcome the lost update problem using proper read locks. As of now, Galera does not support the SERIALIZABLE isolation level in multi-master setups, because the replication protocol currently has no mechanism to prevent read locks from being overwritten by replication. SERIALIZABLE isolation works in a controlled master-slave setup.

Conclusion
Thus Galera Cluster effectively utilizes the MySQL/InnoDB isolation level capability, especially the REPEATABLE-READ isolation level coupled with proper read locks, to achieve data integrity and consistency. In this way, Galera replication upholds the ACID compliance that is an essential requirement for any modern DBMS.



Monitoring Your MySQL Database Using mysqladmin


As with any other dynamic running process, a MySQL database needs sustained management and administration, of which monitoring is an important part. Database monitoring allows you to find misconfigurations, under- or over-allocation of resources, inefficient schema design, security issues and poor performance, including slowness, connection aborts, hanging processes and partial result sets. It gives you an opportunity to fine-tune the platform server, the database server, its configuration and other performance factors. This article provides an insight into monitoring your MySQL database by tracking and analyzing important database metrics, variables and logs using mysqladmin.

Is your MySQL server running?
Obviously, availability or uptime is the primary concern of any production deployment. But uptime alone does not guarantee that the running server accepts connections or that it is not running out of space. Still, it is worth knowing the server status. There are several ways to detect a running MySQL server:
1. Ping the server
mysqladmin -u root -p ping
Returns 0 if the server is running and 1 if not. Even for access denial cases, 0 will be returned, since the server is still running. A remote server can be pinged by specifying the host name (IP or domain, if DNS resolution is available); a scripted availability check using this exit status is sketched after this list.
Eg: mysqladmin -h db.myserver.com -u root -p ping
or
Eg: mysqladmin -h 100.100.100.100 -u root -p ping
A running server will return:
mysqld is alive
2. Get the database server status
Eg: mysqladmin -h 100.100.100.100 -u root -p status
or
service mysqld status
3. Get server version
mysqladmin -u root -p version
The results will include version, uptime, threads, slow queries etc.
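Since the ping command in method 1 reports its result in the exit status, availability checks are easy to script. A minimal sketch (credentials and the alert action are placeholders):

#!/bin/bash
# Alert when the MySQL server does not answer mysqladmin ping
if ! mysqladmin -u root -pPASSWORD ping > /dev/null 2>&1; then
    echo "MySQL server is not responding"
fi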

Get the MySQL Server Metrics and Optimize Database
The important metrics (server status variables) that reflect the health and status of a MySQL server are:

Aborted_clients: the number of connections aborted because the client did not close the connection properly. Possible reasons are: the client did not close the connection using mysql_close() before exiting, the client was sleeping longer (in seconds) than the wait_timeout or interactive_timeout setting without notifying the server, or the client program ended abruptly during a data transfer. After aborting a connection, the server logs the details in the error log:
[Note] Aborted connection 854 to db: 'customers' user: 'john'
Aborted_connects: the number of failed connection attempts. Possible reasons: a client without the required privileges tried to access a database, the client used a wrong password, the connection packet was missing some required information, or it took more than connect_timeout seconds to receive a packet from the client. Check the error log to find the affected user and grant the needed privileges. Failed connections can also be due to an attack such as spamming, DoS or brute force.
A common reason for the above scenarios is that the max_allowed_packet variable, which holds the maximum packet size in bytes, is too small, or that queries needed more memory than is allocated for mysqld. You can increase this limit through any of the following methods:

[mysqld]
max_allowed_packet=32M
OR
shell> mysqld --max_allowed_packet=32M
OR
shell> mysql --max_allowed_packet=32M
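The current value can also be inspected and raised at runtime; a SET GLOBAL change applies to connections opened afterwards (33554432 bytes = 32M):

mysql> SHOW VARIABLES LIKE 'max_allowed_packet';
mysql> SET GLOBAL max_allowed_packet=33554432;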

Other reasons could be related to network protocols, faulty hardware, OS thread level issues etc.

Connections: the number of connection attempts (both successful and failed).
Uptime: the number of seconds the server has been up. A smaller value may point to intermittent shutdowns or server crashes.
Threads: the number of active threads (client connections). This should reflect the average number of concurrent connections; otherwise, Aborted_clients and Aborted_connects should be examined for high values.
Threads_connected: the number of threads currently connected to clients. Both zero and very high values can indicate issues. A very high value can be due to either a small thread_cache_size value or too many concurrent client connections, and should be investigated with database attacks, especially DoS, in mind.
Threads_created: the number of threads created to handle connections. An optimal setup ensures threads are reused through the thread cache; a very high value here can be due to a small thread_cache_size value, which should then be increased.
Questions: number of queries received from clients since last server start.
Slow queries: the number of queries that have taken more than long_query_time seconds. Identify the slow queries and optimize them to improve performance. Check whether the database uses proper indexing and whether queries utilize it.
Open tables: the number of tables that are currently open. A low value here combined with a high table_cache setting means memory is free and the table_cache size can be reduced, and vice versa.
Bytes_received: bytes of data received from clients.
Bytes_sent: bytes sent to clients.
Max_connections: maximum permitted number of simultaneous client connections.
Max_used_connections: the maximum number of concurrent connections that has occurred at any time. If the ratio max_used_connections/max_connections is high, say more than 75%, the server risks running out of connection slots and denying further connections.
Max_execution_time_exceeded: number of SELECT statements that caused a time out in execution. Check the slow query log and error log to find and optimize slow queries, if this value is high.

Also check Bytes_received and Bytes_sent for abnormal hikes in traffic.
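For live tracking of such counters, mysqladmin can repeat its report at a fixed interval and print relative differences between samples, which makes abnormal hikes easy to spot. For example, to sample every 10 seconds:

shell> mysqladmin -u root -p -i 10 -r extended-status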

The server status variables can be displayed by issuing the SHOW STATUS command:
mysql> SHOW GLOBAL STATUS;
+--------------------------+------------+
| Variable_name            | Value      |
+--------------------------+------------+
| Aborted_clients          | 0          |
| Aborted_connects         | 0          |
| Bytes_received           | 155372598  |
| Bytes_sent               | 1176560426 |
| Connections              | 30023      |
| Created_tmp_disk_tables  | 0          |
| Created_tmp_tables       | 8340       |
| Created_tmp_files        | 60         |
...
| Open_tables              | 1          |
| Open_files               | 2          |
| Open_streams             | 0          |
| Opened_tables            | 44600      |
| Questions                | 2026873    |
...
| Table_locks_immediate    | 1920382    |
| Table_locks_waited       | 0          |
| Threads_cached           | 0          |
| Threads_created          | 30022      |
| Threads_connected        | 1          |
| Threads_running          | 1          |
| Uptime                   | 80380      |
+--------------------------+------------+
OR
shell> mysqladmin -u root -p extended-status processlist
+----+-------+-----------+----+---------+------+-------+------------------+
| Id | User  | Host      | db | Command | Time | State | Info             |
+----+-------+-----------+----+---------+------+-------+------------------+
| 51 | monty | localhost |    | Query   | 0    |       | show processlist |
+----+-------+-----------+----+---------+------+-------+------------------+
Uptime: 1473624  Threads: 1  Questions: 39487  Slow queries: 0  Opens: 541  Flush tables: 1  Open tables: 19  Queries per second avg: 0.0268

To see the values of specific variables, use the LIKE clause, with or without patterns:
$ mysql -u root -p
mysql> SHOW GLOBAL STATUS LIKE 'aborted_connects';
mysql> SHOW STATUS LIKE 'Key%';
+--------------------+----------+
| Variable_name      | Value    |
+--------------------+----------+
| Key_blocks_used    | 14955    |
| Key_read_requests  | 96854827 |
| Key_reads          | 162040   |
| Key_write_requests | 7589728  |
| Key_writes         | 3813196  |
+--------------------+----------+

Check Logs Frequently
The slow query log and the error log need to be checked regularly to detect issues. An effective method is to monitor and investigate both logs together with the server status variables to identify issues and room for further optimization.

InnoDB Table Monitoring
The command SHOW ENGINE INNODB STATUS can be used to detect aborted or rolled back transactions. Their presence indicates deadlocks, and both the InnoDB configuration and the client application's deadlock handling mechanism should then be improved. Below are some InnoDB server status variables that need to be monitored:

Innodb_row_lock_waits: number of times operations on InnoDB tables had to wait for a row lock.
Innodb_buffer_pool_wait_free: the number of times InnoDB has waited for memory pages to be flushed. This happens when there are no free pages available for writing, so InnoDB flushes some pages first and waits for the operation to finish. A very high value indicates that innodb_buffer_pool_size is too small and needs to be adjusted according to the workload.
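Both counters can be read with the status query shown earlier:

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_row_lock%';
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_wait_free';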

Summary
The mysqladmin tool, along with other MySQL administrative tools, is sufficient for properly monitoring a MySQL database. Regular monitoring, investigation of issues and performance degradations, and corrective measures taken on a daily basis will make your database robust and well optimized through an evolutionary process. This is the desired and feasible way of adapting your MySQL database so that it fulfills its responsibilities like a faultless, harmoniously working engine.



Choosing between AWS RDS MySQL and Self Managed MySQL EC2

The Amazon Web Services (AWS) products AWS EC2 and AWS RDS come with sophisticated technology features and capabilities that efficiently serve the needs of Web based computing and storage. When it comes to deciding between AWS RDS and self managed MySQL on AWS EC2, or any independent MySQL DBaaS, for your MySQL database needs, there are some prominent factors to consider. These factors are crucial and affect the operational and economic aspects of your database deployment. This article provides a comprehensive overview of both products to help you understand the distinction between them and make a suitable, cost-effective decision.
AWS EC2
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides scalable cloud computing facilities. You can select either a pre-configured instance optimized for your application needs, or first deploy an instance and then configure it through provisioning and scale up/down facilities. The EC2 packages are organized according to the level of optimization and resource allocation in terms of CPUs, memory, storage, etc.
Its advantages are:
1. Pay per provisioning.
2. Quick scale up/down capabilities.
3. Self managed with root access and allows configuration of CPU, memory, storage etc.
4. Rapid commissioning of multiple server instances.
5. Choice to select instance types and OS (Linux and Windows).
6. Availability of pre-configured instances.
7. Flexibility to choose instance location.
8. Elastic IP addresses.
AWS RDS
Amazon Relational Database Service (RDS) is a managed web service that assists in setting up a relational database in the cloud. Its advantages are:
1. Resource scaling.
2. Automated database administration for hardware provisioning, database setup, patching, upgrades, backups, etc.
3. Automatic failure detection and recovery.
4. Flexibility to schedule and perform backups and restore operations.
5. Read scaling through Read Replicas.
6. Availability of AWS IAM (Identity and Access Management) and network isolation through Amazon Virtual Private Cloud (VPC) to enhance security.
7. On-demand pricing.
8. High availability through Multi-AZ DB instance deployments, with synchronized standby replicas in different Availability Zones for automatic failover and recovery.
Amazon RDS is implemented using DB instances – isolated database environments in the cloud, each of which can contain multiple databases. A DB instance can be created and modified using the AWS CLI, the AWS API or the AWS Management Console. Currently, Amazon RDS supports the following DB engines: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle and SQL Server.
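As a sketch of how little setup a managed instance needs, a MySQL DB instance can be created with a single AWS CLI call (the identifier, credentials and sizes below are illustrative values only):

shell> aws rds create-db-instance \
    --db-instance-identifier mydb \
    --db-instance-class db.m4.large \
    --engine mysql \
    --allocated-storage 100 \
    --master-username admin \
    --master-user-password mypassword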
Amazon MySQL RDS
This is the MySQL based DB instance deployment in the cloud. Like the other DB engines, MySQL RDS offers advantages such as high availability, automated DB administration, backup and recovery, provisioned storage and IOPS with scaling, read replicas in different AWS regions for load balancing and automatic failover, isolation and security, and Multi-AZ instances with synchronous standby replicas in separate Availability Zones for failover and backups.
Limitations with MySQL RDS
1. Storage is charged per provisioning, not according to consumption. Once storage is allocated, it cannot be returned without rebuilding the instance from scratch or restoring from a logical backup.
2. Access to the operating system and shell access are denied, so tasks that require OS access cannot be done.
3. MySQL user accounts with the SUPER privilege are not allowed. Operations that need this privilege, like setting certain system variables, replication configuration and operations, binary log management, InnoDB key rotation, etc., cannot be performed directly; they can only be achieved through RDS stored routines and RDS parameter groups.
4. There is no direct access to the slow query log file; instead, you need to use the RDS provided mechanism for logging slow queries.
5. There is no multi-master replication in RDS. Single Zone instances have one master and multiple read replicas with asynchronous replication and no failover, which means the replication is not fault proof and may cause data loss. Multi-AZ instances come at additional cost and have a primary instance plus a standby secondary instance in another zone. Though they use synchronous replication, the secondary database is inactive and cannot be used until failover occurs, at which point the secondary becomes the primary.
6. Scale up or down is automated but incurs downtime due to the switchover time.
7. Replicating from AWS RDS to another cloud provider is not supported.
8. RDS does not support all available MySQL replication topologies, such as multi-master, multi-source and tree replication.
Cost Comparison 
Base cost calculation for the m4.large setup (2 vCPUs, 8 GB memory), as of September 8, 2017, for one month of usage on EC2, Single Zone RDS and Multi-AZ RDS:
EC2 m4.large
$0.1 per Hour X 24 hours X 30 days = $72
Single Zone RDS db.m4.large
$0.175 per Hour X 24 hours X 30 days = $126
Multi-AZ RDS (Multi Zone) db.m4.large
$0.350 per Hour X 24 hours X 30 days = $252
We can see that the base monthly cost (without adding backup storage and bandwidth) for RDS is almost double that of EC2 for the same configuration. Multi-AZ RDS, in turn, is double the cost of Single Zone RDS.
 
The Conclusion
AWS RDS is a fully managed solution, leaving you little to do in terms of frequent, daily DBA tasks. So you are free to focus on the application side, its connectivity and its interface to the already running database. It is a good choice for critical business implementations where few customizations are needed and scalability is the main constraint.
AWS EC2, on the other side, demands real effort in configuring and managing your DB deployment. The flexibility is that, if you have reasonable DBA expertise, you can implement a tailored DB setup with your choice of configuration, optimization and other enhancements, such as specialized scripts for custom tasks. Along with the cost effectiveness, EC2 is suitable if you would otherwise need multiple RDS instances or your MySQL server needs a variety of plugins. So go for EC2 if you need the flexibility of an adaptive DB implementation that scales and evolves with your applications.



Securing MySQL NDB Cluster

As discussed in a previous article about MySQL NDB Cluster, it is a database cluster setup based on the shared-nothing clustering and database sharding architecture, built on the NDB or NDBCLUSTER storage engine for MySQL. The second article, about MySQL NDB Cluster installation, outlined setting up the cluster nodes – SQL node, data nodes and management node. This article discusses how to secure your MySQL NDB Cluster from the following perspectives:
1. Network security
2. Privilege system
3. Standard security procedures for the NDB cluster.
Detailed information is available at MySQL Development Documentation about NDB Cluster Security.
NDB Cluster Network Security
Cluster communication can be divided into client-to-cluster and node-to-node categories. Node-to-node communication (between SQL, data and management nodes) is not recommended to run over SSL or any other encryption protocol, because the extra overhead from network lag and latency can affect cluster performance. A network related security concern that needs to be addressed for the NDB Cluster is described below.
In the config.ini file of an NDB Cluster, if the HostName parameter under a [mysqld] section is blank or absent, or if there are empty slots under the [mysqld] section, then the management node does not check from which host an SQL/API node or MySQL server attempts to connect. In such a case, any SQL/API node can connect to the management node and access its data. Additionally, any management client (ndb_mgm) that knows the management node's host address can connect to it and execute administrative commands to control the cluster. To prevent this, specify a HostName parameter for all [mysqld] and [api] sections in the config.ini file.
The above issue can be resolved in several ways. One is to isolate the cluster network from the external public network from which connections to the cluster are initiated. This can be done by placing a hardware or network firewall between the public network and the SQL nodes.
Though costly, this setup is dependable and robust. It allows no access to the data and management nodes from outside, and no communication from the data and management nodes to the public network. Obviously, the SQL nodes themselves still need to be secured against hacking.
Another option is to use software firewalls on all host nodes and configure them to block packets that should not enter or leave the cluster.
The firewalls on the data and management node hosts allow connections from SQL nodes only; outside applications or clients can communicate with the cluster only through the SQL nodes. Each host machine allows and denies traffic as specified below:
SQL node: allows traffic from the management and data nodes at the cluster network's allowed IP addresses, allows client traffic from within the network on the assigned port, and denies all other traffic.
Management and data nodes: allow traffic from the other management and data nodes, allow traffic from SQL nodes at the allowed IP addresses, and deny all other traffic.
A combination of hardware and software firewalls (network and host based) can also be used to secure the cluster.
This setup keeps the cluster behind the network firewall, but allows traffic between the management/data nodes and SQL nodes.
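As an illustration, a host based firewall on the management node host could be restricted with iptables rules like the sketch below (the SQL and data node addresses 192.168.0.20/30 are hypothetical; 1186 is the default management server port):

shell> iptables -A INPUT -p tcp -s 192.168.0.20 --dport 1186 -j ACCEPT
shell> iptables -A INPUT -p tcp -s 192.168.0.30 --dport 1186 -j ACCEPT
shell> iptables -A INPUT -p tcp --dport 1186 -j DROP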
MySQL Privileges and NDB Cluster Security
Like any other MySQL storage engine, user privilege information for NDB storage engine tables is stored in the mysql system database. Since the MySQL grant tables use the MyISAM storage engine, they are not duplicated or shared across the SQL nodes in the NDB cluster. So, by default, changes to users and privileges made on one SQL node are not propagated to the other SQL nodes. To enable this propagation, known as distributed privileges, across all SQL nodes, the following steps need to be implemented:
On the MySQL server of each SQL node, import the script share/ndb_dist_priv.sql, available inside the MySQL installation directory:
shell> mysql [options] -uroot < share/ndb_dist_priv.sql
This script creates a number of stored routines on the SQL node. To verify this, run the query below; it should produce the output shown:
mysql> SELECT ROUTINE_NAME, ROUTINE_SCHEMA, ROUTINE_TYPE
    -> FROM INFORMATION_SCHEMA.ROUTINES
    -> WHERE ROUTINE_NAME LIKE 'mysql_cluster%'
    -> ORDER BY ROUTINE_TYPE;
+---------------------------------------------+----------------+--------------+
| ROUTINE_NAME                                | ROUTINE_SCHEMA | ROUTINE_TYPE |
+---------------------------------------------+----------------+--------------+
| mysql_cluster_privileges_are_distributed    | mysql          | FUNCTION     |
| mysql_cluster_backup_privileges             | mysql          | PROCEDURE    |
| mysql_cluster_move_grant_tables             | mysql          | PROCEDURE    |
| mysql_cluster_move_privileges               | mysql          | PROCEDURE    |
| mysql_cluster_restore_local_privileges      | mysql          | PROCEDURE    |
| mysql_cluster_restore_privileges            | mysql          | PROCEDURE    |
| mysql_cluster_restore_privileges_from_local | mysql          | PROCEDURE    |
+---------------------------------------------+----------------+--------------+
7 rows in set (0.01 sec)
The stored procedure mysql_cluster_move_privileges creates two copies of the mysql database, one using the MyISAM storage engine and the other using NDBCLUSTER. It then invokes mysql_cluster_move_grant_tables, which contains the ALTER TABLE statements that convert the mysql system tables to the NDB storage engine.
Once this conversion to distributed privileges is made on each SQL node, any change to users or privileges on one SQL node will be distributed to all SQL nodes in the cluster.
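Whether the conversion succeeded on a node can be checked with the helper function installed by the script:

mysql> SELECT CONCAT('Conversion ',
    -> IF(mysql.mysql_cluster_privileges_are_distributed(), 'succeeded', 'failed'))
    -> AS Result;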
As stated above under Cluster Network Security, it is important to secure the users on every SQL node, so that a compromised node cannot be used to access and corrupt data on other SQL nodes. So ensure that config.ini does not contain an empty [mysqld] block, and use the HostName parameter in every [mysqld] and [api] section. A firewall based security system is also necessary to segregate the cluster from the public network and ensure proper security.
Standard Security Procedures for NDB Cluster
The security procedures applicable to a standard MySQL Server are also valid for a MySQL NDB Cluster:
1. Always run the MySQL server (mysqld) as the mysql system user, which is part of the mysql user group.
2. Ensure that the system’s MySQL data directory is owned by the mysql user.
3. Ensure that the my.cnf of each SQL node contains user=mysql in the [mysqld] section.
4. Never run mysqld as the system root user.
5. Delete the anonymous user account created during installation.
6. Always keep in mind that the nodes inside the cluster trust each other to a high degree, so it is highly recommended to secure the cluster through tight user privilege management and network security.
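Steps 3 and 5 above can be illustrated as follows; the my.cnf fragment is standard, and the DROP USER statement assumes the anonymous account exists on localhost:

[mysqld]
user=mysql

mysql> DROP USER ''@'localhost';
mysql> FLUSH PRIVILEGES;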



How to install MySQL NDB Cluster on Linux

As discussed in the previous article giving an overview of MySQL NDB Cluster, it is built on the NDBCLUSTER storage engine for MySQL and provides high availability and scalability. This article explains the installation of MySQL NDB Cluster in a Linux environment. The detailed documentation is available at the official MySQL Developer Documentation page.
To avoid a single point of failure, the example cluster is installed on separate hosts, one for each of the four nodes needed to run the cluster. So we are setting up a four node, four host cluster. Virtual machines can also be used instead of separate physical hosts. Each node/host is assigned a separate IP address on the Ethernet network as follows:
Management node (mgmd) - 192.168.0.10
SQL node (mysqld) – 192.168.0.20
Data node “A” (ndbd) – 192.168.0.30
Data node “B” (ndbd) – 192.168.0.40
 
NDB Cluster setup on multi-computer Ethernet network
Installing the NDB Cluster Binary Release
Install the NDB Cluster binary from the MySQL NDB Cluster download page. At the time of writing, the NDB Cluster 7.5 binary archive is available as mysql-cluster-gpl-7.5.8-linux-i686-glibc23.tar.gz. Download it to /var/tmp.
SQL Node setup
A user named mysql in group mysql is needed on the system. Log in as the root user and verify that both are present; if not, create them manually.
Unpack the archive to /usr/local and create a symbolic link named mysql pointing to the unpacked NDB Cluster directory:
shell> cd /var/tmp
shell> tar -C /usr/local -xzvf mysql-cluster-gpl-7.5.8-linux-i686-glibc23.tar.gz
shell> ln -s /usr/local/mysql-cluster-gpl-7.5.8-linux-i686-glibc23 /usr/local/mysql
Now the symbolic link mysql points to the NDB Cluster directory mysql-cluster-gpl-7.5.8-linux-i686-glibc23 inside /usr/local.
After changing to the mysql directory, initialize the MySQL installation (mysqld) by creating the data directory and populating the mysql system tables.
With the --initialize option, a random password is generated for the MySQL root user. If no password generation is needed at this stage, use the --initialize-insecure option instead of --initialize and set a password later. The base directory and data directory can also be specified during initialization:
shell> mysqld --initialize --user=mysql \
          --basedir=/usr/local/mysql \
          --datadir=/usr/local/mysql/data
Set permissions for the root and mysql users over the database directories and assign the base directory to the group mysql:
shell> chown -R root .
shell> chown -R mysql data
shell> chgrp -R mysql .
In order for mysqld to start during system startup, copy the startup script to the /etc/rc.d/init.d directory, make it executable and add it as a startup service:
shell> cp support-files/mysql.server /etc/rc.d/init.d/
shell> chmod +x /etc/rc.d/init.d/mysql.server
shell> chkconfig --add mysql.server
Thus mysql.server starts the SQL node. Repeat the above steps on each host machine that will run an SQL node.
Data node setup
Data nodes do not need the mysqld binary. To set up a data node, either the single-threaded NDB Cluster data node executable ndbd or the multi-threaded executable ndbmtd is required. These binaries are also included in the NDB Cluster binary downloaded above. Download and unpack the archive on the data node host; the data node binaries are inside the bin directory of the unpacked cluster directory. From the download location (assumed to be /var/tmp), move the data node binaries to /usr/local/bin. Perform the operations below as the root user.
shell> cd /var/tmp
shell> tar -zxvf mysql-cluster-gpl-7.5.8-linux-i686-glibc23.tar.gz
shell> cd mysql-cluster-gpl-7.5.8-linux-i686-glibc23
shell> cp bin/ndbd /usr/local/bin/ndbd
shell> cp bin/ndbmtd /usr/local/bin/ndbmtd
Delete the directory that contains the unpacked NDB cluster archive.
Now make both the data node binaries (ndbd and ndbmtd) executable:
shell> cd /usr/local/bin
shell> chmod +x ndb*
Repeat the above steps for each data node host. Note that the MySQL data directory is /usr/local/mysql/data.
Management node setup
The management node does not need the mysqld binary, only the NDB Cluster management server, ndb_mgmd. The management client, ndb_mgm, is highly desirable for working with the management node. Both binaries are located in the bin folder inside the NDB Cluster archive downloaded (to /var/tmp) as mentioned above. As the root user, unpack the cluster binary, copy the management node and client binaries to /usr/local/bin and make them executable.
shell> cd /var/tmp
shell> tar -zxvf mysql-cluster-gpl-7.5.8-linux-i686-glibc23.tar.gz
shell> cd mysql-cluster-gpl-7.5.8-linux-i686-glibc23
shell> cp bin/ndb_mgm* /usr/local/bin
Delete the directory that contains the unpacked NDB cluster archive.
Now make both the management node and client binaries executable:
shell> cd /usr/local/bin
shell> chmod +x ndb_mgm*
NDB Cluster Initial Configuration
After downloading and installing the SQL, Data and Management nodes, now it is time to configure the NDB Cluster.
Data and SQL Nodes Configuration
Each of the data and SQL nodes needs a my.cnf file that specifies two things: the connection string telling the node how to reach the management node, and a parameter enabling the NDBCLUSTER storage engine on this MySQL server. Create the my.cnf file if it does not already exist, or open it with the vi editor, and add the configuration parameters:
shell> vi /etc/my.cnf
Below is the part of the my.cnf file that holds the NDB Cluster related parameters:
[mysqld]
# Options for mysqld process:
ndbcluster # run NDB storage engine
[mysql_cluster]
# Options for NDB Cluster processes:
ndb-connectstring=192.168.0.10 # location of management server
Note: the IP address of the management node above is specific to this example setup and needs to be replaced by the actual IP address of your management node.
In our example, these my.cnf settings are to be added to Data Node A, Data Node B and SQL Node hosts.
 
Management Node Configuration
The Management Node needs a config.ini file that specifies the following information:
1. How many replicas of the cluster data to maintain.
2. How much memory to allocate for data and indexes on each data node.
3. The IP addresses or domain names (with DNS resolution) of the data and SQL nodes in the cluster.
4. The data directory of each data node, where database data is saved.
Create the configuration directory for the Management Node and create config.ini file inside it, running as root user.
shell> mkdir /var/lib/mysql-cluster
shell> cd /var/lib/mysql-cluster
shell> vi config.ini
For our example setup, the config.ini of the Management Node should read as shown below:
[ndbd default]
# Options affecting ndbd processes on all data nodes:
NoOfReplicas=2 # Number of replicas
DataMemory=80M # How much memory to allocate for data storage
IndexMemory=18M # How much memory to allocate for index storage
                  # For DataMemory and IndexMemory, we have used the
                  # default values. Since the "world" database takes up
                  # only about 500KB, this should be more than enough for
                  # this example NDB Cluster setup.
ServerPort=2202 # This is the default value; however, you can use any
                  # port that is free for all the hosts in the cluster
                  # Note1: It is recommended that you do not specify the port
                  # number at all and simply allow the default value to be used
                  # instead
                  # Note2: The port was formerly specified using the PortNumber
                  # TCP parameter; this parameter is no longer available in NDB
                  # Cluster 7.5.
[ndb_mgmd]
# Management process options:
HostName=192.168.0.10 # Hostname or IP address of MGM node
DataDir=/var/lib/mysql-cluster # Directory for MGM node log files
[ndbd]
# Options for data node "A":
                                # (one [ndbd] section per data node)
HostName=192.168.0.30 # Hostname or IP address
NodeId=2 # Node ID for this data node
DataDir=/usr/local/mysql/data # Directory for this data node's data files
[ndbd]
# Options for data node "B":
HostName=192.168.0.40 # Hostname or IP address
NodeId=3 # Node ID for this data node
DataDir=/usr/local/mysql/data # Directory for this data node's data files
[mysqld]
# SQL node options:
HostName=192.168.0.20 # Hostname or IP address
                                # (additional mysqld connections can be
                                # specified for this node for various
                                # purposes such as running ndb_restore)
Note: the IP addresses are for this example setup only and need to be replaced by your actual IP addresses or Domain names.
The above are the minimal settings for the my.cnf and config.ini files.
Initial Startup of the NDB Cluster
The nodes must be started in the following order:
1. Management Nodes
2. Data Nodes
3. SQL Nodes.
Start the management node process ndb_mgmd on the management node host. During the initial startup, the node must be given the location of its configuration file, config.ini, through the -f or --config-file option:
shell> ndb_mgmd -f /var/lib/mysql-cluster/config.ini
Next start the data nodes, ndbd processes, at the data node hosts:
shell> ndbd
Now start the SQL node process, mysqld, on the SQL node host:
shell> cd mysql_installation_directory
shell> bin/mysqld_safe &
OR
shell> mysql.server start
In the example, mysql_installation_directory is /usr/local/mysql
If the setup is correct and everything has gone well, the cluster will now be operational. Use the ndb_mgm client to test it. For this example, a running cluster produces output like the below:
shell> ndb_mgm
-- NDB Cluster -- Management Client --
ndb_mgm> SHOW
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 @192.168.0.30 (Version: 5.7.19-ndb-7.5.8, Nodegroup: 0, *)
id=3 @192.168.0.40 (Version: 5.7.19-ndb-7.5.8, Nodegroup: 0)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @192.168.0.10 (Version: 5.7.19-ndb-7.5.8)
[mysqld(API)] 1 node(s)
id=4 @192.168.0.20 (Version: 5.7.19-ndb-7.5.8)
Here the [mysqld(API)] entry refers to the SQL node process, mysqld, which acts as an NDB Cluster API node accepting data access requests from clients.
Now you can create tables using the NDBCLUSTER storage engine:
CREATE TABLE tbl_name (col_name column_definitions) ENGINE=NDBCLUSTER;
Alter existing tables to NDBCLUSTER storage engine, if any:
ALTER TABLE tbl_name ENGINE=NDBCLUSTER;
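For instance, a minimal clustered table could be created like this (the table and columns are illustrative):

mysql> CREATE TABLE City (
    -> ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    -> Name CHAR(35) NOT NULL
    -> ) ENGINE=NDBCLUSTER;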
To shutdown the cluster, issue below command on the management node host:
shell> ndb_mgm -e shutdown
To restart the cluster, issue below command on the management node host:
shell> ndb_mgmd -f /var/lib/mysql-cluster/config.ini
In data node hosts:
shell> ndbd
In SQL node hosts, from the mysql installation directory:
shell> bin/mysqld_safe &
OR
shell> mysql.server start



Quest Software released Toad Edge to Simplify MySQL Database Development

Open source database management systems have increased their market share dramatically over the last couple of years. Market experts like Gartner predict a market shift from commercial, closed-source DBMSs to open source DBMSs in the coming years. According to Gartner, by 2018 more than 70 percent of in-house deployments will be on open source DBMSs, and 50 percent of existing RDBMS deployments will have been converted or will have started the conversion process.
To address this space, Toad Edge – a new commercial database management tool – has been released by Quest Software, a global system and security management software provider, to simplify MySQL development and administration. It is part of the Toad product family, a database management toolset by Quest that enables data analysts, DBAs and database developers to manage relational and non-relational databases using SQL. Toad is said to contribute to data technology management by automating processes, reducing development time, lowering development cost and enhancing performance and scalability.
Toad Edge currently supports MySQL RDBMS development, management and administration by providing the following valuable services:
1. Minimizes the MySQL learning curve through a lightweight, intuitive IDE and the automation facilities of Toad.
2. Simplifies MySQL development and administration.
3. Ensures quality and maintainability of code.
4. Supports DevOps workflows using the Jenkins Continuous Integration plugin.
 
Features of Toad Edge
1. Object Explorer and User Manager – allows efficient management of database objects and user privileges.
2. SQL Worksheet with Content Assist for writing complex SQL queries rapidly and with ease.
3. SQL Query Monitor that displays running queries to enable creation of export scripts and logs easily.
4. Quick Import/Export of data using a variety of formats – SQL, CSV, ZIP archives etc.
5. Schema Comparison and Synchronization – compares databases, creates change scripts for source and destination databases, and exports them to a file or the SQL Worksheet.
6. Database Snapshot – takes a snapshot of the current database state to compare and synchronize it with another active database (replica or clone).
7. Advanced MySQL JSON Browser and Editor – for editing JSON documents using a data browser that visualizes document segments and helps find and fix errors.
8. Supports both Windows and Mac OS.
9. Expandability to include open source, on-site or cloud databases like MySQL, Amazon RDS MySQL etc.
10. Enterprise Technical Support along with self-help tools.



An Overview of MySQL NDB Cluster

The “Shared Nothing” Architecture and Database Sharding
The “shared nothing” (SN) architecture is a distributed computing architecture with interconnected but independent computing nodes, each having its own memory and disk storage, which are not shared. Compared to a distributed architecture based on a centralized controller, the SN architecture eliminates any single point of failure. An SN architecture offers great scalability, since as many nodes as wanted can be added. The data is distributed or partitioned among the different nodes, which run on separate machines, and some kind of coordination protocol is used to respond to user queries. This is also called database sharding, where the data is horizontally partitioned. A database shard containing a partition is held on a separate database server instance to spread load and to isolate failures. In horizontal partitioning, table rows are separated, rather than columns as in vertical partitioning or normalization. Each data set is part of a shard distributed across different database servers, and also across different physical locations if needed.
MySQL Cluster and NDB Cluster
A MySQL Cluster is based on the shared-nothing clustering and database sharding architecture. It is built on the NDB or NDBCLUSTER storage engine for the MySQL DBMS. MySQL Cluster provides high availability and scalability, and high throughput with low latency. Its architecture is real-time, distributed, multi-master, ACID compliant and transactional, with no single point of failure. It adopts horizontally scalable auto-sharding on inexpensive commodity hardware and serves read/write intensive workloads through both SQL and NoSQL queries.
 
MySQL Cluster is built on top of NDB Cluster, while NDB Cluster is an independent, in-memory MySQL storage engine named NDBCLUSTER that stores tables of rows partitioned as in database sharding. It can be accessed by any application using the C++ based NDB API. For the NDB Cluster, a MySQL Server instance is just one API process that accesses it, alongside other API processes using access formats like Memcached, Node.js, Java, HTTP/REST, etc. MySQL Cluster adds capabilities to the NDB Cluster such as SQL processing, application connectors like JDBC/ODBC, a table join mechanism, user authentication and authorization, and asynchronous data replication.
The NDB Cluster eliminates the single point of failure using the shared-nothing design in which each node has its own memory and disk; network shares and file systems are not used by NDB Cluster. In technical documentation, the term NDB refers to the storage engine part of the setup, while MySQL NDB Cluster refers to the integration of one or more MySQL servers with the NDB storage engine. In an NDB Cluster, multiple hosts run their own node processes, including MySQL servers for accessing NDB data, data nodes for storing data, management servers and other coordination processes.

 
NDB Cluster Components
Image courtesy of MySQL Cluster Documentation - https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-overview.html
MySQL Cluster Manager
MySQL Cluster Manager is responsible for creating, administering, monitoring and managing a MySQL Cluster setup. It is used to automate common management tasks like scaling through node allocation, upgrades, configuration management, and backup and restore. It manages both the MySQL server nodes and the data nodes.
SQL and NoSQL APIs
In the MySQL Cluster, tables are stored in the data nodes instead of the MySQL server nodes. This data can be accessed using SQL through the MySQL Server, and also through NoSQL APIs: MySQL Cluster client libraries that any application can use to access the cluster data. The client libraries are available in the following formats:
1. Memcached
2. Node.js / JavaScript
3. Java and JPA
4. HTTP/REST
5. NDB API (C++)
MySQL Cluster Nodes
There are 3 types of nodes or processes in a MySQL Cluster:
1. Data node (ndbd/ndbmtd process) – The database tables are automatically sharded across the data nodes. Data nodes are programmed to handle replication, load balancing and failover.
2. Cluster manager node (ndb_mgmd process) – Configures, administers, monitors and performs other management activities like start/restart, arbitration, etc.
3. Application node or SQL node (mysqld process) – This MySQL Server process connects to all data nodes for data access, reads and writes. This node is optional; other applications can access the cluster data directly through the NDB API.
Each node runs on a separate host, either a physical machine, a VM or a cloud instance.
Working of the MySQL Cluster
MySQL Cluster uses synchronous replication to commit data transactions across the multi-masters in the cluster, in contrast to regular asynchronous MySQL replication. To ensure availability, replicas of the data, usually two, are created and synchronously maintained on the nodes. This ensures fast failover and protection against data loss. Sometimes nodes/clusters are hosted in geographically distant data centers, and asynchronous data replication is used between them. This ensures data safety and disaster recovery, and avoids latency by serving users from the closest data center.
Individual nodes can be independently started, stopped and restarted, and can re-join the system after a failover. For configuration changes and upgrades common to all nodes, a mechanism called Rolling Restart is used, in which all nodes are restarted in turn. Backup and restore of the MySQL Cluster is managed by the NDB Cluster management client and the ndb_restore program included in the NDB Cluster software. NDB Cluster nodes use TCP/IP over standard 100 Mbps or faster hardware for inter-node communication.
Logs and Check Points
The NDB Cluster classifies events based upon category (startup, shutdown, etc.), priority and severity. There are two types of event logs used by the cluster:
1. Cluster log - to log events for the cluster as a whole.
2. Node log – separate logs maintained by each nodes.
Nodes save data to disk, and each committed transaction is recorded with a checkpoint that identifies the last modified point. There are two types of checkpoints by which the consistency of cluster data is maintained:
1. Local Checkpoint (LCP) – a checkpoint specific to each node, at which the node's data is saved to disk. Its frequency depends on the data activity of that node as well as of the cluster.
2. Global Checkpoint (GCP) – occurs every few seconds, when transactions synchronized across all nodes are committed and the redo log (which records data changes) is flushed to disk.
