Understanding Isolation Levels in Galera Cluster


In an earlier article we went through the process of Certification Based Replication in Galera Cluster for MySQL and MariaDB. To recall the overview, Galera Cluster is a multi-master cluster setup that uses the Galera Replication Plugin for synchronous, certification based database replication. This article examines the concept of isolation levels in Galera Cluster, which is an inherent feature of the MySQL InnoDB storage engine.

A transactional DBMS uses the concept of isolation, in which concurrent transactions are isolated from each other for performance, reliability and consistency. The "isolation level" setting determines the degree of isolation applied when multiple transactions make changes and issue queries to the database at the same time. Transaction isolation represents the "I" in the ACID model, a set of database design principles that ensure the reliability, consistency and durability needed by any transactional DBMS.

The ACID Model of Database Design
ACID stands for atomicity, consistency, isolation and durability. Database transactions should be atomic units of change that are committed upon successful completion or rolled back upon abortion of the transaction. The database state should be consistent at all times: during the progression of a transaction and after commits and rollbacks. When related data across multiple tables is being updated by a transaction, queries should see either the updated values or the old values, but never a mix of the two. Transactions should be isolated from one another, so that one transaction cannot interfere with the working set of another and conflicts are avoided. This is achieved through table and row locking and is controlled by the isolation level parameter. The stricter and more prolonged the isolation level, the more extensive the locking, which causes delays and performance penalties for the sake of transaction integrity. So the isolation level has to strike a balance between performance and isolation. Finally, the new database state resulting from a committed transaction should be safe from data loss due to crashes, hardware/software failures, power failures and similar events. This quality represents the durability of a transaction and is typically achieved by writing data to disk storage, for example through the doublewrite buffer mechanism in InnoDB.

Transaction Isolation Levels
The MySQL InnoDB engine and hence Galera Cluster supports 4 different isolation levels:

READ-UNCOMMITTED: In this level, transactions can see data changes made by other transactions even before they are committed. Allowing such dirty reads, this level provides no real isolation at all.
READ-COMMITTED: Here transactions can see only committed changes made by other transactions. Non-repeatable reads can occur at this level: each SELECT query reads the data committed prior to that query, so when a single transaction runs multiple SELECT queries, each sees its own snapshot of committed data, which may differ because of changes committed by other transactions in between.
REPEATABLE-READ: This is the default isolation level for MySQL InnoDB. A snapshot of the data is taken at the first SELECT query and all subsequent queries in the transaction see that same snapshot, making reads repeatable. Queries therefore do not see changes committed by other transactions while the transaction is running.
SERIALIZABLE: In this level, all rows accessed by the transaction are locked and appends to the accessed range are blocked. Since SELECT queries keep seeing the same data snapshot, this is similar to REPEATABLE-READ, but with the accessed rows effectively read-only for other transactions.
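For quick reference, the isolation level can be set globally, per session or in the server configuration. A minimal sketch (note that the system variable is named tx_isolation in older MySQL versions and transaction_isolation in newer ones):
mysql> SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;
mysql> SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
mysql> SELECT @@transaction_isolation;
Or persistently, in the server configuration file:
[mysqld]
transaction-isolation = REPEATABLE-READ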

Isolation Levels in Galera Cluster
Isolation levels take effect when a client request starts a transaction. In the cluster they can be applied at the single-node level (intra-node) and at the global cluster level (inter-node) for all nodes. Single-node isolation can extend to the maximum level allowed by the InnoDB engine, while global cluster-level isolation depends on the replication protocol.

In the master-slave mode of Galera Cluster, all four levels of isolation can be used, but multi-master mode supports only the REPEATABLE-READ level.

Working of Transaction Isolation Levels in Galera Cluster
Galera Cluster utilizes the transaction isolation feature inherent to the MySQL InnoDB engine. Transactions running on the same node of the cluster are therefore isolated at the level set in that node's MySQL configuration; with the default REPEATABLE-READ level configured, transactions on the same node are isolated at that level. For transactions issued on separate cluster nodes, Galera Cluster uses SNAPSHOT ISOLATION, also called "first committer wins" isolation. This is used to solve the "lost update" problem that affects plain REPEATABLE-READ isolation.
The lost update problem with REPEATABLE-READ typically occurs as follows when concurrent transactions operate on the same node:

1. Transaction 1 and Transaction 2, both operating on node 1, get the same data snapshot before executing their operations.
2. Transaction 1 issues a SELECT and gets the snapshot from node 1.
3. Transaction 2 issues a SELECT and gets the same snapshot from node 1.
4. Transaction 1 updates the data on node 1, locking the affected rows.
5. Before Transaction 1 commits, Transaction 2 tries to update the same data on node 1 (locked by Transaction 1), based on its previous snapshot, which is no longer current because Transaction 1 already changed the data in step 4 with the commit pending. Since this is plain REPEATABLE-READ and Transaction 1's commit is pending, that change is not visible to Transaction 2.
6. Transaction 1 commits the data update made in step 4 and the lock is released.
7. Transaction 2 commits its own data update once the lock is released. Note that in step 5 Transaction 2 completely missed the update made by Transaction 1, which is now silently overwritten.
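To make the sequence concrete, below is a minimal sketch of the same lost update, assuming a hypothetical stock table where row id 1 starts with qty = 10 (all names and values are illustrative):
-- Transaction 1:
START TRANSACTION;
SELECT qty FROM stock WHERE id = 1;    -- reads 10
-- Transaction 2:
START TRANSACTION;
SELECT qty FROM stock WHERE id = 1;    -- also reads 10
-- Transaction 1 computes 10 - 2 in the application and writes it:
UPDATE stock SET qty = 8 WHERE id = 1; -- row locked until commit
COMMIT;
-- Transaction 2 computes 10 - 3 from its stale snapshot and writes it:
UPDATE stock SET qty = 7 WHERE id = 1; -- waits for the lock, then overwrites
COMMIT;                                -- Transaction 1's update is lost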

With Galera Cluster's SNAPSHOT ISOLATION, first committer wins logic is used to ensure data integrity when concurrent transactions operate on separate nodes of the cluster (same database). It works as follows:

Transaction 1 operates on node 1 and Transaction 2 on node 2, against the same database.
1. Transaction 1 begins a transaction on node 1.
2. Transaction 2 begins a transaction on node 2.
3. Transaction 1 reads (SELECT) from node 1 and gets the initial snapshot.
4. Transaction 2 reads (SELECT) from node 2 and gets the same initial snapshot.
5. Transaction 1 updates data on node 1.
6. Transaction 2 updates the same data on node 2. Node 2 holds no lock for Transaction 1, since Transaction 1 is operating on node 1; however, Transaction 2 has not seen the update from step 5.
7. Transaction 1 commits the update, which is replicated and certified cluster-wide.
8. Transaction 2 starts to commit, but fails with a deadlock error because its write conflicts with the already certified write from Transaction 1. The transaction is aborted and restarted.
9. Transaction 1 sees (SELECT) the updated snapshot on node 1.
10. Transaction 2, after restarting, reads (SELECT) and sees the same updated snapshot that Transaction 1 sees in step 9. The first committer has won.

If the above scenario changes to concurrent transactions working on the same node with REPEATABLE-READ isolation, the "lost update" issue arises again. To overcome it, Galera Cluster utilizes MySQL/InnoDB's SELECT ... FOR UPDATE statement for reads that precede updates. This statement takes locks on the rows it reads, so that conflicting writes from other transactions are blocked until the locks are released. The flow is depicted below (see the sketch after these steps):

1. Transaction 1 and Transaction 2 begin operating on the same data on node 1.
2. Transaction 1 issues a SELECT ... FOR UPDATE statement. A lock is applied to the rows read.
3. Transaction 2 issues a SELECT ... FOR UPDATE statement, but is blocked by the InnoDB lock on the same data.
4. Transaction 1 updates and commits its transaction on node 1, releasing the lock.
5. The waiting Transaction 2 then sees the updated snapshot on node 1. It issues its update and commits the transaction without missing the update from Transaction 1.
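The same flow done safely with read locks, again as an illustrative sketch on the hypothetical stock table:
-- Transaction 1:
START TRANSACTION;
SELECT qty FROM stock WHERE id = 1 FOR UPDATE; -- takes an exclusive row lock
UPDATE stock SET qty = 8 WHERE id = 1;
COMMIT;                                        -- lock released
-- Transaction 2 (started concurrently):
START TRANSACTION;
SELECT qty FROM stock WHERE id = 1 FOR UPDATE; -- blocks until Transaction 1 commits,
                                               -- then reads the committed value 8
UPDATE stock SET qty = 5 WHERE id = 1;         -- 8 - 3, no update lost
COMMIT;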
In this way Galera Cluster can be configured to use REPEATABLE-READ isolation and overcome the lost update problem with proper read locks. As of now, Galera does not support the SERIALIZABLE isolation level in a multi-master setup, because the replication protocol currently has no mechanism to prevent read locks from being overridden by replication. SERIALIZABLE isolation does work in a controlled master-slave setup.

Conclusion
Thus Galera Cluster effectively utilizes the MySQL/InnoDB isolation level capability, especially the REPEATABLE-READ isolation level coupled with proper read locks, to achieve data integrity and consistency. In this way Galera replication maintains and upholds the ACID compliance that is an essential requirement for any modern DBMS.



Monitoring Your MySQL Database Using mysqladmin


As with any other dynamic running process, a MySQL database needs sustained management and administration, of which monitoring is an important part. Database monitoring allows you to uncover misconfigurations, under- or over-allocation of resources, inefficient schema design, security issues and poor performance, including slowness, connection aborts, hanging processes, partial result sets and more. It gives you the opportunity to fine-tune your platform server, your database server and its configuration, and other performance factors. This article provides an insight into monitoring your MySQL database by tracking and analyzing important database metrics, variables and logs using mysqladmin.

Is your MySQL server running?
Obviously availability, or uptime, is the primary concern of any production deployment. But availability alone does not guarantee that the running server accepts connections or that it is not running out of space. Still, it is worth knowing the server status first. There are several ways to detect a running MySQL server:
1. Ping the server
mysqladmin -u root -p ping
Returns 0 if the server is running and 1 if not. Even for access-denied cases, 0 is returned, since the server is still running. A remote server can be pinged by specifying the host name (IP, or domain name if DNS resolution is available).
Eg: mysqladmin -h db.myserver.com -u root -p ping
or
Eg: mysqladmin -h 100.100.100.100 -u root -p ping
A running server will return:
mysqld is alive
2. Get the database server status
Eg: mysqladmin -h 100.100.100.100 -u root -p status
or
service mysqld status
3. Get server version
mysqladmin -u root -p version
The results will include version, uptime, threads, slow queries etc.

Get the MySQL Server Metrics and Optimize Database
The important metrics (server status variables) that reflect the health and status of a MySQL server are:

Aborted_clients: the number of connections aborted because the client did not close the connection properly. Possible reasons are: the client did not close the connection using mysql_close() before exiting, the client was sleeping longer (in seconds) than the wait_timeout or interactive_timeout variables allow without notifying the server, or the client program ended abruptly during a data transfer. After aborting a connection, the server logs the details in the error log:
[Note] Aborted connection 854 to db: 'customers' user: 'john'
Aborted_connects: the number of failed connection attempts. Possible reasons: a client without the required privileges tried to access a database, the client used a wrong password, the connection packet was missing some required information, or it took more than connect_timeout seconds to receive a connect packet from the client. Check the error log to find the affected user and grant the needed privileges. Repeated failures can also indicate an attack such as spamming, DoS or brute force.
A further reason for both of the above scenarios can be that the max_allowed_packet variable, which holds the maximum allowed packet size in bytes, is too small, or that queries needed more than the memory allocated for mysqld. You can increase this limit through any of the following methods:

[mysqld]
max_allowed_packet=32M
OR
shell> mysqld --max_allowed_packet=32M
OR
shell> mysql --max_allowed_packet=32M

Other reasons could be related to network protocols, faulty hardware, OS thread level issues etc.

Connections: the number of connection attempts (both successful and failed).
Uptime: the number of seconds the server has been up. A small value may point to intermediate shutdowns or server crashes.
Threads: the number of active threads (client connections). This should reflect the average number of concurrent connections; otherwise Aborted_clients and Aborted_connects should be examined for high values.
Threads_connected: the number of client connections currently open. Both zero and very high values can indicate issues. A very high value can be due to either a small thread_cache_size value or too many concurrent client connections, and needs to be investigated against database attacks, especially DoS.
Threads_created: the number of threads created to handle connections. An optimal setup ensures threads are reused through the thread cache. A very high value for this variable can be due to a small thread_cache_size value, which should then be increased.
Questions: the number of queries received from clients since the last server start.
Slow queries: the number of queries that have taken more than long_query_time seconds. Identify the slow queries and optimize them to improve performance. Check whether the database uses proper indexing and whether queries are utilizing it.
Open tables: the number of tables that are currently open. A low value here combined with a high table_cache setting means memory is free and the table_cache size can be reduced, and vice versa.
Bytes_received: bytes of data received from clients.
Bytes_sent: bytes sent to clients.
Max_connections: the maximum permitted number of simultaneous client connections.
Max_used_connections: the maximum number of concurrent connections that have occurred at any time. If max_used_connections/max_connections is high, say more than 75%, the server risks running out of connection slots and denying further connections.
Max_execution_time_exceeded: the number of SELECT statements that timed out during execution. If this value is high, check the slow query log and error log to find and optimize slow queries.

Check the traffic counters Bytes_received and Bytes_sent above for abnormal hikes in traffic.
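As a quick, hedged check of connection headroom (the 75% figure above is a rule of thumb, not a hard limit):
mysql> SHOW VARIABLES LIKE 'max_connections';
mysql> SHOW GLOBAL STATUS LIKE 'Max_used_connections';
-- If Max_used_connections / max_connections approaches 0.75, consider raising max_connections.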

The server status variables can be displayed by issuing the SHOW STATUS command:
mysql> SHOW GLOBAL STATUS;
+--------------------------+------------+
| Variable_name | Value |
+--------------------------+------------+
| Aborted_clients | 0 |
| Aborted_connects | 0 |
| Bytes_received | 155372598 |
| Bytes_sent | 1176560426 |
| Connections | 30023 |
| Created_tmp_disk_tables | 0 |
| Created_tmp_tables | 8340 |
| Created_tmp_files | 60 |
...
| Open_tables | 1 |
| Open_files | 2 |
| Open_streams | 0 |
| Opened_tables | 44600 |
| Questions | 2026873 |
...
| Table_locks_immediate | 1920382 |
| Table_locks_waited | 0 |
| Threads_cached | 0 |
| Threads_created | 30022 |
| Threads_connected | 1 |
| Threads_running | 1 |
| Uptime | 80380 |
+--------------------------+------------+
OR
shell> mysqladmin -u root -p extended-status processlist
+----+-------+-----------+----+---------+------+-------+------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+-------+-----------+----+---------+------+-------+------------------+
| 51 | monty | localhost | | Query | 0 | | show processlist |
+----+-------+-----------+----+---------+------+-------+------------------+
Uptime: 1473624 Threads: 1 Questions: 39487
Slow queries: 0 Opens: 541 Flush tables: 1
Open tables: 19 Queries per second avg: 0.0268

To see the values of specific variables, use the LIKE clause with or without patterns:
$ mysql -u root -p
mysql> SHOW GLOBAL STATUS LIKE 'aborted_connects';
mysql> SHOW STATUS LIKE 'Key%';
+--------------------+----------+
| Variable_name | Value |
+--------------------+----------+
| Key_blocks_used | 14955 |
| Key_read_requests | 96854827 |
| Key_reads | 162040 |
| Key_write_requests | 7589728 |
| Key_writes | 3813196 |

Check Logs Frequently
The slow query log and error log need to be checked regularly to detect issues. An effective method is to monitor and investigate both logs together with the server status variables to identify issues and room for further optimization.
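A minimal my.cnf sketch enabling both logs; the file paths and the slow query threshold are illustrative and should be adapted:
[mysqld]
log_error = /var/log/mysql/error.log
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 2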

InnoDB Table Monitoring
The command SHOW ENGINE INNODB STATUS can be used to detect aborted or rolled back transactions. Their presence indicates deadlocks, in which case the InnoDB configuration as well as the client application's deadlock handling mechanism should be improved. Below are some InnoDB server status variables that need to be monitored:

Innodb_row_lock_waits: the number of times operations on InnoDB tables had to wait for a row lock.
Innodb_buffer_pool_wait_free: the number of times InnoDB has had to wait for memory pages to be flushed. This happens when there are no free pages available for writing, so InnoDB flushes some pages first and waits for that operation to finish. A very high value indicates that innodb_buffer_pool_size is too small and needs to be adjusted according to the workload.
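These can be inspected from the client as follows:
mysql> SHOW ENGINE INNODB STATUS\G
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_row_lock%';
mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_wait_free';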

Summary
The mysqladmin tool, along with other MySQL administrative tools, is sufficient for properly monitoring a MySQL database. Regular monitoring, investigation of issues and performance degradations, and corrective measures taken on a daily basis will make your database robust and well optimized through an ongoing process of evolution. This is the desired and feasible way of adapting your MySQL database to fulfill its responsibilities like a faultless, harmoniously working engine.



Working of Certification based Replication in Galera Cluster

In earlier articles we covered the basics of Galera Cluster for MySQL and MariaDB, as well as MariaDB Galera Cluster in particular. To recall the overview of Galera Cluster: it is a synchronous multi-master cluster that uses the InnoDB storage engine (and XtraDB as well, for MariaDB). It is the Galera Replication Plugin, working through the wsrep API of the underlying DBMS (MySQL or MariaDB), that provides the replication. Galera Cluster uses certification based synchronous replication in the multi-master server setup. In this article we look into the technical aspects of Galera Cluster's certification based replication functionality.
Certification based Database Replication
Database replication allows setting up synchronized master-slave or master-master database clusters. Since data is synchronized across the individual databases, also called nodes, such a setup tolerates faults through failover: a failed node is covered by the other nodes until it recovers, thus ensuring high availability. Database replication can be either synchronous or asynchronous. In synchronous replication transactions are committed on all nodes concurrently; in asynchronous replication target nodes receive new transactions from the source node after some lag from the original transaction. Synchronous replication ensures that all nodes are in the same state (transactions being committed at the same time), giving high availability around the clock with consistent replicas and no data loss during individual node crashes. Another advantage is improved performance, because clients can perform reads and writes on any node irrespective of where a transaction originated.
The disadvantage of synchronous replication is the increased lock time and the possible conflicts arising from parallel transaction commits on all nodes. This can affect performance as the number of nodes and transactions increases. Asynchronous replication solves this problem by allowing each node to independently commit the transactions it has received from the source node. This prevents concurrent locks and ensures more availability with large numbers of nodes and transactions. The performance issues of synchronous replication are solved by certification based synchronous replication in Galera Cluster. This replication is based on:
1. A group communication model that defines a pattern for inter-node communication.
2. Write-sets that group or bundle a set of writes into a single transaction message to be applied on the individual nodes.
3. A database state machine process that treats read/write transactions on one database as its state at a given time and then broadcasts the transaction to the other nodes to bring their state to that of the source node.
4. Transaction reordering, which reorders transactions that are not yet certified or not committed due to a node failure, so that transactions are not lost.
The certification process uses a globally coordinated certification scheme in which a transaction from the source node is broadcast to the other nodes using global total ordering of concurrent transactions, to achieve global consistency. Certification based replication works with databases that have the following capabilities:
1. Transactional semantics with COMMIT and ROLLBACK capability.
2. Atomic changes, so that a transaction is either applied in its entirety on COMMIT or not at all.
3. Global ordering, so that replication events or transactions can be globally ordered.
Working of Certification based Replication
When a transaction (a series of changes) occurs at a database node and issues a COMMIT, before the commit actually takes place, all the changes (writes, updates, alters) that occurred at the node, along with the PRIMARY KEYs of the modified rows, are collected into a write-set. The source node then broadcasts this write-set to all other nodes. Each node in the cluster, including the originating node, performs a deterministic certification test on the write-set, using the PRIMARY KEYs in the write-set and the actual PRIMARY KEY values on the node. The test determines the key constraint integrity of the write-set. If the test fails, the originating node drops the write-set and the cluster rolls back the original transaction. If the certification test succeeds, the transaction commits and the write-set is applied on all nodes of the cluster, thus performing the replication.
In Galera Cluster, each transaction is assigned a global ordinal sequence number. During the deterministic certification test on a write-set, the cluster checks the current transaction against the last successful transaction. If any conflicting transaction has occurred between these two globally ordered transactions, a primary key conflict arises and the test fails. Upon a successful deterministic certification check, all replicas apply the transaction in the same order. Thus all nodes reach a consensus about the outcome of the transaction and replication happens. At that point, the originating node notifies the client about the successful transaction commit.
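On a running Galera node, the outcome of certification can be observed through the wsrep status counters; a hedged example (the counter names are from the Galera documentation, the values depend entirely on the workload):
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_cert_failures';
mysql> SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance';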
Galera Cluster Replication Plugin and wsrep API
The core of Galera replication is the Galera Replication Plugin, which implements the write-set replication functionality. The DBMS (MySQL or MariaDB) used to set up the Galera Cluster talks to the plugin through an API called the Write Set Replication API, or wsrep API, implemented by the plugin as a replication plugin interface. The Galera Replication Plugin is thus the replication or wsrep provider. It consists of 3 components:
1. The certification layer, which prepares the write-sets and performs the certification tests.
2. The replication layer, which manages the replication process and global ordering.
3. The group communication framework, which provides a plugin architecture for the group communication systems in the cluster.
 
The wsrep API considers the content of a database, including data and schema, as a state. When a client performs writes, updates or alters on the database, this is considered a transaction, which is nothing but the changes to the database represented as a series of atomic changes. To keep a consistent state across all nodes, the wsrep API uses a Global Transaction ID (GTID) for each transaction write-set. The GTID makes it possible to identify state changes and to compare two states, the current one and a previous one. All nodes in the Galera Cluster need to have the same state, and the synchronization and replication needed to maintain this consistency are performed in GTID serial order. The GTID consists of 2 parts: a State UUID, which identifies the state and the sequence of changes it went through, and an Ordinal Sequence Number (seqno), which denotes the position of the change within the sequence.
Eg: 45eec521-2f34-11e0-0800-2a36050b826b:94530586304
The Galera Replication Plugin is called through the wsrep hooks defined in the DBMS, with the dlopen() function acting as a bridge between the wsrep provider (the Galera Replication Plugin implementing the wsrep API) and the wsrep hooks in the DBMS. The state changes grouped as a transaction write-set with a GTID are the key to implementing replication. Below are the steps performed to implement replication using the wsrep API:
1. A change occurs at a node, causing a state change.
2. The database invokes the replication provider through the wsrep hooks to create the write-set from the changes.
3. dlopen() connects the wsrep provider functions with the wsrep hooks in the database.
4. The Galera Replication Plugin performs the write-set certification and replication in the cluster.
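To tie these pieces to a running server, below is a minimal, hedged my.cnf sketch that loads the wsrep provider; the library path varies by distribution and the node addresses are illustrative:
[mysqld]
wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_address = gcomm://192.168.0.1,192.168.0.2,192.168.0.3
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2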



Is Cloud Database Good for Your Business?

You have heard a lot about the cloud nowadays: cloud computing, cloud storage, cloud databases, etc. But have you thought about how cloud technology can influence your business, on both the positive and the negative side? It is also worth discussing how you can utilize the cloud for the development of your business while overcoming any disadvantages associated with it. This article tries to define what a cloud database means for your business, along with its advantages and disadvantages, and to provoke discussion about the suitability of a cloud database for you.
What is a Cloud Database?
A cloud database is a DBMS deployed and running on a cloud computing platform. The underlying infrastructure, hardware, software installation, setup etc. are provided by the cloud provider. Customers get access to the cloud database they have purchased or subscribed to through a service model that allows managing the database and requesting more computing, memory and network resources. This service model is called Database as a Service (DBaaS). The requested resources are immediately made available to the customer's database, also called a DB instance. Cloud databases thus provide high availability, scalability, rapid provisioning and automated database administration.
Advantages of Cloud Databases
A cloud database offers much needed advantages to your business in terms of cost, time, effort, expertise and planning. The cloud vendor runs costly implementations of distributed infrastructure, platforms and resources that are used to provide you a sophisticated database service. You, along with other organizations, consume this broad pool of resources on demand, and this results in the following advantages.
1. Reduction in IT expenditure: Businesses can avoid the initial investment in setting up infrastructure and the subsequent operational and administrative costs of maintaining an in-house database and networking system. With a cloud database you no longer need the hardware and platform software including the OS, server rooms with their furnishing and air conditioning, the salaries and perks needed to sustain a proactive and smart DBA team, networking costs, or the periodic costs of hardware/software/infrastructure upgrades. You pay only the consumption or utilization costs for the resources you need at the moment, scalable into the foreseeable future: CPUs, memory, storage, bandwidth and optionally the DBMS engine.
2. Go with the latest technology: Technologies change with new innovations and releases. The database industry is highly competitive, especially due to the ever growing and unavoidable demand for databases in any organization. Whether an enterprise giant or an open source global community is behind a database product, the product evolves continuously, from minor releases and patches to major version releases and even new product families that incorporate the latest technology in computing power, speed, scalability, performance and reliability. Contrary to an in-house deployment, where you would spend money, time and effort to adopt a new database technology, the cloud provider does this for you, and you only need to worry about enhancing your application to best utilize the capabilities of the upgraded or new version of the database.
3. Scalability: For an in-house database, you need money, time and effort to scale the database beyond certain thresholds. The need for scalability arises not only with business growth but also when you take on additional operational or service projects, or during a peak season when you experience a spike in traffic and incoming requests. With a cloud database provider and its distributed deployments, rich with pre-configured or raw resources, your only task is to request the provisioning of the additional resources you require. The rapid provisioning technologies behind any cloud database immediately put them at your disposal.
4. Centralized management and reliability: Since the core technology infrastructure and platform are managed by the cloud vendor, and automated programs perform the DBA tasks for your database(s), the cloud relieves you of maintenance and operational requirements that demand significant effort and expertise. Your data is accessible to your applications through the distributed or replicated cloud database instances. Failover mechanisms ensure that there is no single point of failure for your cloud database: replicas or standby databases synchronized with the primary database take charge in case of failures. This ensures reliability of operations and services for your organization.
Disadvantages with Cloud Databases
However, there are some disadvantages that you need to know and consider when going for a cloud database. They may not be significant enough to hold you off from choosing the cloud, but you can use this information to plan your consumption and configurations and, above all, to adopt policies that exploit the advantages of the cloud rather than letting the disadvantages affect you.
1. Security: A cloud system, like any other networked information system, is vulnerable to threats. While the internal threats and misconfigurations common to in-house deployments are less likely, there can be hackers and intruders who try to break into the system through application vulnerabilities or other loopholes. Moreover, for shared products, a cloud system hosts databases and other resources in shared or virtualized server environments, so your database may reside on the same machine or network as other businesses. An improper security configuration or application vulnerability of another customer could then compromise the server hosting your database.
However, this is unlikely thanks to the logical and platform based isolation the cloud vendor provides for its customers. Modern containerization and virtualization technologies provide separate, secure environments for each customer. From a business viewpoint, you can mitigate or even nullify this risk by securing the application, database configurations and setups that you control: strong passwords and hashes, secure communication options instead of cheaper raw-format data transmission, and so on. Using a private or hybrid cloud for sensitive and confidential data and a public cloud for operational and project data is also a good way to overcome these security concerns, especially for medium to big businesses.
2. Vendor lock-in and limited control: Cloud technologies differ in configuration at various levels of implementation: OS, containerization, virtualization, replication, synchronization, optimization etc. So when you want to migrate your database to another vendor, compatibility or migration issues can surface due to these differences. This typically happens when you attempt to migrate the entire database filespace, including the physical files, schema, configuration and implementation setup such as replication parameters. An experienced DBA can still migrate your database through a step-by-step or incremental approach with proper planning and skills. Also, since the cloud provider gives you a virtual environment for your databases, access to the underlying OS, shell access and other firmware-level routines are not provided to customers. Though these are rarely needed, since you are getting the most optimized configuration, this can sometimes prevent vertical configuration scaling. Dedicated or elastic cloud based database instances are exceptions and grant you complete access.
3. Need for planning to select optimal packages: Though not a disadvantage at first glance, this is a crucial factor to be considered and worked out before purchasing your first cloud database instance. Cloud database and cloud computing vendors provide a wide variety of packages, categorized and organized in hierarchies of resource availability and performance. The cost scale ranges from simple packages (a couple of GB of memory, a dual-core or quad-core virtual CPU and minimal bandwidth) to enterprise-level packages optimized for general use, storage, computing, speed and so on. While the temptation is always to save cost in the beginning, before going too far into the operational realm you need to work out the optimal resource requirements and configuration your business needs for operation and scaling. The package must be selected so as to avoid unnecessary downtime and denial of service during crucial periods. Another factor is the choice between Single Availability Zone and Multi Availability Zone (Multi-AZ) packages. While single-zone packages suit small businesses, most SMBs and larger businesses need Multi-AZ packages, which keep synchronized standby database instances deployed in multiple geo-locations to ensure high availability and failover around the clock. These packages naturally come at higher cost, but still significantly less than a privately owned multi-location cloud.
The Verdict
As is clear from the above points, the advantages of a cloud database outweigh its disadvantages, and there are provisions and methods to run your database on a carefully planned and selected cloud package. The savings in cost, time and effort, along with the opportunity to dedicate your focus to development and growth rather than infrastructure and implementation, will benefit your organization in the near future. It is a matter of blending the cloud culture with that of your IT needs; therein lies the difference between the cloud being suitable for your business or not.



Choosing between AWS RDS MySQL and Self Managed MySQL EC2

The Amazon Web Services (AWS) products EC2 and RDS come with sophisticated technology features and capabilities that efficiently serve the needs of web based computing and storage. When it comes to deciding between AWS RDS and self managed MySQL on AWS EC2 (or any independent MySQL DBaaS) for your MySQL database needs, there are some prominent factors to consider. These factors are crucial and affect the operational and economic aspects of your database deployment. This article provides a comprehensive overview of both products to help you understand the distinction between them and make a suitable, cost-effective decision.
AWS EC2
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides scalable cloud computing capacity. You can either select a pre-configured instance optimized for your application needs or first deploy an instance and then configure it through provisioning and scale-up/down facilities. The EC2 packages are organized according to the level of optimization and resource allocation in terms of CPUs, memory, storage etc.
Its advantages are:
1. Pay per provisioning.
2. Quick scale up/down capabilities.
3. Self managed with root access and allows configuration of CPU, memory, storage etc.
4. Rapid commissioning of multiple server instances.
5. Choice to select instance types and OS (Linux and Windows).
6. Availability of pre-configured instances.
7. Flexibility to choose instance location.
8. Elastic IP addresses.
AWS RDS
Amazon Relational Database Service (RDS) is a managed web service that assists in setting up a relational database in the cloud. Its advantages are:
1. Resource scaling.
2. Automated database administration for hardware provisioning, database setup, patching, upgrades, backups, etc.
3. Automatic failure detection and recovery.
4. Flexibility to schedule and perform backups and restore operations.
5. Read scaling through Read Replicas.
6. Availability of AWS IAM (Identity and Access Management) and network isolation through Amazon Virtual Private Cloud (VPC) to enhance security.
7. On-demand pricing.
8. High availability through Multi-AZ DB instance deployments with synchronized standby replicas in different Availability Zones for automatic failover and recovery.
Amazon RDS is implemented using DB instances: isolated database environments in the cloud that can each contain multiple databases. A DB instance can be created and modified using the AWS CLI, the AWS API or the AWS Management Console. Currently Amazon RDS supports the following DB engines: Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle and SQL Server.
Amazon MySQL RDS
Amazon MySQL RDS provides MySQL based DB instance deployments in the cloud. Like the other DB engines, MySQL RDS offers advantages such as high availability, automated DB administration, backup and recovery, provisioned storage and IOPS with scaling, read replicas in different AWS regions for load balancing and automatic failover, isolation and security, and Multi-AZ instances with synchronous standby replicas in separate Availability Zones for failover and backups.
Limitations with MySQL RDS
1. Storage is charged per provisioning, not according to consumption. Once storage is allocated, it cannot be returned without rebuilding the instance from scratch or restoring from a logical backup.
2. Access to the operating system and shell is denied, so tasks that need to be run with OS access cannot be done.
3. MySQL user accounts with the SUPER privilege are not allowed. Operations that need this privilege, such as setting certain system variables, replication configuration and operations, binary log management, InnoDB key rotation etc., cannot be performed directly and are only available through RDS stored routines and RDS parameter groups.
4. There is no direct slow query log; you instead need to use the RDS provided mechanism for logging slow queries.
5. There is no multi-master replication in RDS. Single-zone instances have one master and multiple read replicas with asynchronous replication and no failover, which means replication is not fault proof and may lose data. Multi-AZ instances come at additional cost and have a primary instance and a standby secondary instance in another zone. Though they use synchronous replication, the secondary database stays inactive and cannot be used until failover, at which point the secondary becomes the primary.
6. Scaling up or down is automatic but involves downtime due to the switchover.
7. Replicating from AWS RDS to another cloud provider is not supported.
8. Not all MySQL replication options are supported, e.g. multi-master, multi-source and tree replication.
Cost Comparison 
Base cost calculation for the m4.large setup (2 vCPUs, 8 GB memory) as of September 8, 2017, for 1 month of usage, for EC2, Single Zone RDS and Multi-AZ RDS:
EC2 m4.large
$0.1 per Hour X 24 hours X 30 days = $72
Single Zone RDS db.m4.large
$0.175 per Hour X 24 hours X 30 days = $126
Multi-AZ RDS (Multi Zone) db.m4.large
$0.350 per Hour X 24 hours X 30 days = $252
We can see that the base monthly cost (without backup storage and bandwidth) for RDS is almost double that of EC2 for the same configuration, and Multi-AZ RDS in turn doubles the Single Zone RDS cost.
 
The Conclusion
AWS RDS is a fully managed solution, leaving you little to do in terms of frequent, daily DBA tasks. You are free to focus on the application side and its connectivity and interface to the already running database. It is a good choice for critical business implementations where little customization is needed and scalability is the main constraint.
AWS EC2, on the other hand, demands good effort in configuring and managing your DB deployment. The flexibility is that, if you have reasonable DBA expertise, you can implement a tailored DB setup with your choice of configuration, optimization and other enhancements, such as specialized scripts for custom tasks. Along with its cost effectiveness, EC2 is suitable if you would otherwise need multiple RDS instances or if your MySQL server needs a variety of plugins. So go for EC2 if you need the flexibility of an adaptive DB implementation that scales and evolves with your applications.



MariaDB Galera Cluster

We have covered the basics about Galera Cluster in a previous article – Galera Cluster for MySQL and MariaDB. This article further goes into the Galera technology and discusses topics like:
1. Who provides Galera Cluster?
2. What is MariaDB Galera Cluster?
3. An overview of MariaDB Galera Cluster Setup.
Galera Cluster is a synchronous multi-master cluster that uses the InnoDB storage engine; in MariaDB it supports the XtraDB and InnoDB storage engines. It is the Galera Replication Plugin, working through the wsrep API of the underlying DBMS, that provides the replication. Galera Cluster uses certification based synchronous replication in the multi-master server setup. This replication eliminates many issues faced by asynchronous replication based clusters, such as write conflicts, replication lag between cluster nodes, slaves going out of sync with masters and single points of failure. Certification based replication utilizes group communication and transaction ordering techniques. Changes at one node are grouped as a write-set upon COMMIT and this write-set is broadcast to the other nodes. Each node, including the source node, performs a certification test to decide whether the write-set can be applied. Only if the certification test passes is the write-set applied as a transaction and COMMITTED on the nodes; otherwise a rollback is performed, discarding the changes. The certification test is based on global ordering of transactions: each transaction is assigned a global transaction ID, and at COMMIT time the transaction is checked against previous transactions to detect primary key conflicts. If a conflict is detected, the certification test fails; if it passes, all nodes receive and apply the transaction in the same global order.
The Galera Cluster plugin is an open source patch for MySQL developed by Codership. It is available from Codership as 2 software packages: the Galera replication library and the original MySQL version extended with the Write Set Replication (wsrep) API implementation (mysql-wsrep). Including this Codership product, there are 3 Galera variants:
1. MySQL Galera Cluster by Codership
2. MariaDB Galera Cluster by MariaDB
3. Percona XtraDB Cluster for MySQL by Percona that integrates the Percona Server and Percona XtraBackup with Codership Galera library.
The MariaDB Galera Cluster
Starting with MariaDB 10.1, the wsrep API for Galera Cluster is included in the standard MariaDB package. MariaDB Galera Cluster uses the Codership Galera library and mysql-wsrep to implement the cluster. MariaDB is available for the major Linux distributions such as openSUSE, Arch Linux, Fedora, CentOS, Red Hat, Mint, Ubuntu and Debian. As in any Galera Cluster implementation, MariaDB Galera Cluster is built from the components below:
1. DBMS – here it is MariaDB Server.
2. wsrep API – defines and implements the interface and the responsibilities of the DBMS server and the replication provider.
3. wsrep hooks – the integration points for the wsrep API inside the DBMS engine.
4. Galera plugin – implements the wsrep API for the Galera library to provide the write-set replication functionality.
5. Certification layer – prepares write-sets and performs certification.
6. Replication layer – manages the replication process and provides total ordering capabilities.
7. GCS framework – provides a plugin architecture for the various group communication systems of the Galera Cluster.
Features of the MariaDB Galera Cluster
Like any other Galera Cluster, MariaDB Galera Cluster has the following features:
1. Certification based synchronous replication.
2. Multi-master topology.
3. Clients can read/write to any node.
4. Cluster membership control and dropping of failed nodes from the cluster with re-join upon recovery.
5. Automatic joining of nodes to running cluster.
6. Row level parallel replication.
7. Direct client connections through MariaDB interface.
Downloading and Installing MariaDB Galera Cluster
It can be downloaded and installed using Yum or Apt. The Galera package is included by default in the MariaDB packages. On Ubuntu, the following installs MariaDB Server with the Galera plugin:
sudo apt-get update
sudo apt-get install mariadb-server
Apart from the memory needed by the MariaDB server itself, additional memory is needed for the certification index and uncommitted write-sets. Write-set caching takes place when a node cannot process an incoming write-set, typically while the node is undergoing a state change operation or is being dumped through utilities like mysqldump; correspondingly, the write-set cannot be applied on that node yet. The pending write-sets are then cached in memory for a catch-up phase. Reading a write-set from the in-memory cache and committing it should normally succeed, but if the system runs out of memory, the state transfer of the node will fail or the cluster will block waiting for the state transfer to end. The following Galera parameters configure write-set caching:
gcs.recv_q_hard_limit : the maximum allowed size of the receive queue (committed write-sets waiting to be applied). It should normally have a value of RAM + swap. If this limit is exceeded, Galera aborts the server.
gcs.recv_q_soft_limit : the fraction (in decimal form) of gcs.recv_q_hard_limit after which the replication rate is throttled to avoid a memory shortage and server abort. Its default value is 0.25.
gcs.max_throttle : a fraction (in decimal form) that specifies how much the replication rate can be throttled during state transfer so that running out of memory is avoided. Its default value is 0.25.
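These gcs.* parameters are passed to the provider through wsrep_provider_options; a hedged example with illustrative values:
[mysqld]
wsrep_provider_options = "gcs.recv_q_hard_limit=8G;gcs.recv_q_soft_limit=0.25;gcs.max_throttle=0.25"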
The following server requirements apply to a MariaDB Galera Cluster:
1. The general log and the slow query log need to use the FILE log output.
2. For versions before 5.5.40-galera and 10.0.14-galera, the query cache has to be disabled.
Running MariaDB Galera Cluster
A new cluster needs to be bootstrapped on the first server, indicating to the server that there is no existing cluster already running. For this, invoke mysqld with the option --wsrep-new-cluster.
$ mysqld --wsrep-new-cluster
For systems using SysV init scripts:
$ service mysql bootstrap
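On newer, systemd-based MariaDB installations, a wrapper script shipped with the server serves the same purpose:
$ galera_new_cluster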
To add a new node to a running cluster:
$ mysqld --wsrep_cluster_address=gcomm://192.168.0.1
The IP address can be replaced by a DNS name. Once started, the new node only needs to connect to one existing node; it automatically retrieves the cluster map and connects to the rest of the nodes.
MariaDB Galera Cluster Settings
The following mandatory settings apply to the cluster. They are set either in the MariaDB server configuration file or as command line options (given in brackets).
wsrep_provider : file location of the wsrep library. (--wsrep-provider=value)
wsrep_cluster_address : IP/DNS address of cluster nodes to connect to when starting up. Eg: gcomm://192.168.0.1:1234?gmcast.listen_addr=0.0.0.0:2345. It is also possible to specify addresses of all nodes - gcomm://<node1 or ip:port>,<node2 or ip2:port>,<node3 or ip3:port> (--wsrep-cluster-address=value)
binlog_format : specifies the binary log format that decides whether the replication is row-based, statement-based or mixed. Valid values are ROW, STATEMENT or MIXED. (--binlog-format=format)
default_storage_engine : the default storage engine for the server. Galera replication requires InnoDB, which is also the default value. (--default-storage-engine=name)
innodb_autoinc_lock_mode : a numeric value from 0 to 2 that specifies the locking mode used for generating auto-increment values in tables. The default is 2 for MariaDB >= 10.2.4 and 1 for MariaDB <= 10.2.3; Galera requires the interleaved mode (2). (--innodb-autoinc-lock-mode=#)
0 – Traditional lock mode, which holds a table-level lock for all INSERTs until the auto-increment value is generated.
1 – Consecutive lock mode, which holds a table-level lock for bulk INSERTs. For simple INSERTs no table lock is used; instead a lightweight mutex (a mutual exclusion object that coordinates multiple threads accessing a resource) is used.
2 – Interleaved lock mode, which does not lock tables. Though the fastest and most scalable mode, it is not suitable for statement-based replication.
innodb_doublewrite : a Boolean with a default value of 1 (ON) that makes InnoDB first store data in the doublewrite buffer before writing to the data file. Disabling it gives only a marginal performance improvement. (--innodb-doublewrite)
query_cache_size : the query cache size in bytes. By default it is 1M. (--query-cache-size=#)
wsrep_on : enables wsrep replication. It has a default Boolean value of OFF and needs to be turned ON for transactions to be replicated to the nodes of the cluster. Leaving it OFF does not prevent setting up the cluster and node membership, but Galera replication requires ON. (--wsrep-on[={0|1}])
The Galera status variables can be queried with the statement:
SHOW STATUS LIKE 'wsrep_%';
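Key variables to watch are wsrep_ready (ON), wsrep_local_state_comment (Synced) and wsrep_cluster_size; illustrative output for a healthy three-node cluster:
mysql> SHOW STATUS LIKE 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+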



Securing MySQL NDB Cluster

As discussed in a previous article about MySQL NDB Cluster, it is a database cluster setup based on the shared-nothing clustering and database sharding architecture, built on the NDB or NDBCLUSTER storage engine for MySQL. A second article about MySQL NDB Cluster installation outlined setting up the cluster's nodes: the SQL node, data nodes and management node. This article discusses how to secure your MySQL NDB Cluster from the following perspectives:
1. Network security
2. Privilege system
3. Standard security procedures for the NDB cluster.
Detailed information is available at MySQL Development Documentation about NDB Cluster Security.
NDB Cluster Network Security
Cluster communication can be divided into client-to-cluster and node-to-node categories. Node-to-node communication (between SQL, data and management nodes) is not recommended to run over SSL or any other encryption protocol, because of the extra overhead from network lag and latency, which can affect cluster performance. A network related security concern that needs to be addressed for the NDB Cluster is described below.
In the config.ini file of an NDB Cluster, if the HostName parameter under a [mysqld] section is blank or absent, or if there are empty slots under the [mysqld] section, the management node will not check from which host an SQL/API node or MySQL server is attempting a connection. In such a case any SQL/API node can connect to the management node and access its data. Additionally, any management client (ndb_mgm) that knows the management node's host address can then connect to it and execute administrative commands to control the cluster. To prevent this, specify a HostName parameter for all [mysqld] and [api] sections in the config.ini file.
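A hedged config.ini fragment with explicit HostName entries (the addresses are illustrative):
[mysqld]
HostName=192.168.0.20

[api]
HostName=192.168.0.21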
The above issue can be resolved in several ways. One is to isolate the cluster network from the external public network from which connections to the cluster are initiated. This can be done by placing a hardware or network firewall between the public network and the SQL nodes.
Though costly, this setup is dependable and robust. It does not allow any access to the data and management nodes from outside, nor any communication from the data and management nodes to the public network. Obviously the SQL nodes still need to be secured against hacking.
Another option is to use software firewalls on all hosts and configure them to block data packets that should not enter or leave the cluster.
With this option, the firewalls on the data and management node hosts allow connections from the SQL nodes only. Outside applications or clients can only communicate with the cluster through the SQL nodes. Each host machine allows and denies traffic as specified below:
SQL node: allows traffic from the management and data nodes within the cluster network whose IP addresses are permitted, allows client traffic from the network on the assigned MySQL port, and denies all other traffic.
Management and data nodes: allow traffic from the other management and data nodes, allow traffic from the SQL nodes through their permitted IP addresses, and deny all other traffic.
A combination of hardware and software firewalls (network and host based) can also be used to secure the cluster. This setup keeps the cluster behind the network firewall while allowing traffic between the management/data nodes and the SQL nodes.
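As an illustration, hedged iptables rules for a data or management node host; the subnet and the default management port 1186 are assumptions to adapt, and a real policy would also allow established connections and administrative access such as SSH:
# Allow cluster-internal traffic to the management node port (default 1186)
iptables -A INPUT -p tcp -s 192.168.0.0/24 --dport 1186 -j ACCEPT
# Deny everything else reaching this host
iptables -A INPUT -j DROP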
MySQL Privileges and NDB Cluster Security
As with other MySQL storage engines, user privilege information for NDB storage engine tables is stored in the mysql system database. Since the MySQL grant tables use the MyISAM storage engine, they are not duplicated or shared across the SQL nodes in the NDB cluster, so changes to users and privileges made on one SQL node are not propagated to other SQL nodes by default. To enable such propagation, i.e. distributed privileges across all SQL nodes, the following steps need to be implemented:
On the MySQL server of each SQL node, import the script share/ndb_dist_priv.sql, available inside the MySQL installation directory:
shell> mysql [options] -uroot < share/ndb_dist_priv.sql
This script creates a number of stored routines on the SQL node. To verify this, use the query below and check for output like the following:
mysql> SELECT ROUTINE_NAME, ROUTINE_SCHEMA, ROUTINE_TYPE
    -> FROM INFORMATION_SCHEMA.ROUTINES
    -> WHERE ROUTINE_NAME LIKE 'mysql_cluster%'
    -> ORDER BY ROUTINE_TYPE;
+---------------------------------------------+----------------+--------------+
| ROUTINE_NAME | ROUTINE_SCHEMA | ROUTINE_TYPE |
+---------------------------------------------+----------------+--------------+
| mysql_cluster_privileges_are_distributed | mysql | FUNCTION |
| mysql_cluster_backup_privileges | mysql | PROCEDURE |
| mysql_cluster_move_grant_tables | mysql | PROCEDURE |
| mysql_cluster_move_privileges | mysql | PROCEDURE |
| mysql_cluster_restore_local_privileges | mysql | PROCEDURE |
| mysql_cluster_restore_privileges | mysql | PROCEDURE |
| mysql_cluster_restore_privileges_from_local | mysql | PROCEDURE |
+---------------------------------------------+----------------+--------------+
7 rows in set (0.01 sec)
The stored procedure mysql_cluster_move_privileges creates two copies of the mysql database, one using the MyISAM storage engine and the other using NDBCLUSTER. It then invokes mysql_cluster_move_grant_tables, which contains the ALTER TABLE statements that convert the mysql system tables to the NDB storage engine.
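The conversion is then started and verified on each SQL node with calls like the following, as documented for MySQL NDB Cluster:
mysql> CALL mysql.mysql_cluster_move_privileges();
mysql> SELECT CONCAT('Conversion ',
    ->   IF(mysql.mysql_cluster_privileges_are_distributed(),
    ->      'succeeded', 'failed')) AS Result;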
Once this conversion to distributed privileges is made on each SQL node, any change to users or privileges on one SQL node is distributed to all SQL nodes in the cluster.
As stated above under cluster network security, it is important to secure the users on every SQL node, so that a compromised node cannot be used to access and corrupt data on other SQL nodes. So ensure that config.ini does not contain an empty [mysqld] block and that it uses the HostName parameter. A firewall based security system is also necessary to segregate the cluster from the public network and ensure proper security.
Standard Security Procedures for NDB Cluster
The security procedures applicable to a standard MySQL Server are also valid for a MySQL NDB Cluster.
1. Always run the MySQL server (mysqld) as the mysql system user, which is part of the mysql user group.
2. Ensure that the system’s MySQL data directory is owned by the mysql user.
3. Ensure that the my.cnf of each SQL node contains user=mysql in the [mysqld] section (a minimal fragment follows this list).
4. Never run mysqld as the system root user.
5. Delete the anonymous user account created during installation.
6. Always keep in mind that the different nodes inside the cluster are highly transparent to one another, so it is highly recommended to secure the cluster through tight user privilege management and network security.
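As an illustration of items 1 to 3, assuming the default data directory /var/lib/mysql:
shell> chown -R mysql:mysql /var/lib/mysql
and in the my.cnf of each SQL node:
[mysqld]
user=mysql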



Why Does Your Business Need Database Versioning?

How does database versioning boost your business growth?
Applications are neither ideal nor perfect. With dynamic implementation environments and rapid technology changes, certain features become obsolete and new ones are needed. Workflows and processes in the application need to change. Bugs and vulnerabilities can be reported at any time. Competitors may release a new feature that demands a matching or better response from you. So the chances are high that you will need to initiate, build and deliver a stable minor or major version, and delay or failure in doing so can hurt your reputation and/or monetary attributes like market performance and profit. In these circumstances, it is essential to give your teams tools that automate and simplify processes as much as possible.
Code and databases are integral parts of any application or service. A properly version-controlled database boosts and simplifies development and production deployment processes. It also helps in critical analysis, comparison and cross review of your application sets. The results give you new insights and directions for a promising new version release, and you can focus on your business growth, since everything is automated.
Advantages of general version control
Version Control (VC) is important to organizations due to the following factors:
1. It allows tracking changes and modifications.
2. It allows comparing any two versions and rolling back to a previous version if something goes wrong in the current one.
3. Multiple teams can share a single repository with their own branches (each representing a different functionality or feature) and can pull from the main branch or root and push into the main branch to merge their work with that of other teams’ branches.
4. Interim releases or minor versions and major version releases can be performed effectively.
5. It provides an automated and robust workflow for fixing errors or bugs in the form of patches (a new branch that fixes the bug) and hotfixes that address customers’ revised needs.
6. It enables organizations to adopt agile development and delivery, in which new features can be added efficiently as market needs dictate.
Why businesses need database versioning – technology and process perspective.
While the above advantages are clear and necessary for the application codebase, version control becomes even more significant for databases, where applications work with real data and data structures or schemas. This can be attributed to many reasons, some of which are:
1. Changes in technology, the business environment and customer requirements push businesses to alter and adapt their databases to best suit the needs of the moment. If this is ignored, businesses suffer setbacks like loss of market share, decreased product popularity and reduced relevance of the product due to missing features.
2. Even if the application codebase uses version control to implement agile delivery models that keep the product in sync with market needs through minor and major releases and patches, developers at some point get stuck due to the complexity of database modifications that occurred in tandem with feature or code changes. At certain levels of branch merges, a mismatch or disparity in database-related code or in the database itself can break the entire merging process.
3. Database changes (structure, metadata and data) made by one team or branch need to be fully managed and tracked to aid error-free merging for minor and major releases. If database versioning is not implemented along with code versioning, some branches will not merge properly, resulting in bugs and errors.
4. You might need to go back to a previous version of the code at some point. Such a rollback requires the database as it was at that time, so that the rolled-back version is bug-free. There can also be occasions when you need to audit your database versions to identify the best-performing one. Database versions kept alongside the corresponding code versions enable auditing and continuous delivery patterns.
What should be brought under database versioning?
The database schema, metadata and the data itself are the main elements that need to be version controlled. While it is not logical or recommended to version the database server’s source code or binaries, the elements mentioned above fit perfectly as candidates for version control in an agile, continuous integration and delivery scenario. All three elements can be represented as SQL files, so the task reduces to ordinary file versioning, with only the SQL files and any needed log files placed under version control. Another prominent element that needs versioning is backups. These are compressed SQL files representing physical or logical backups. They can also be incremental backups, where the binary log provides a rudimentary form of version control without much automation.
Implementing database versioning
A best practice for making your database fit for version control is to use scripts, preferably SQL (or any programming language of your choice), to create and modify the database schema. An initial script creates the first version, and all further changes to schema, metadata and data are made through scripts only. Multiple teams working on different branches of the application can maintain their own database instances, and the versions of the scripts that represent various points of the branches (pulls and pushes) are stored alongside the corresponding code. Some form of automation then runs the current version of the database scripts against the team’s database instance to bring the database to the state of the code at that level. This ensures that during a merge, the database is updated automatically to reflect the current state of the application.
Incremental versioning of the database scripts is a simple and straightforward method: changes are tracked as incremental versions of SQL scripts and logs, and during a merge all increments are applied to the database to bring it to the current state. The extra overhead here is the need for manual action from developers or the DBA to run the incremental scripts during merges.
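A minimal shell sketch of such an incremental runner; the migrations directory, the numbered file names and the connection details are hypothetical:
# scripts named 001_create_tables.sql, 002_add_index.sql, ... apply in lexical order
shell> for f in migrations/*.sql; do mysql -u app_user -p app_db < "$f"; done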
Who is responsible – DBAs or Developers?
Once an application is deployed, the DBA takes control of database administration. While developers continue to work on the application codebase repository, modifying code and database and finally merging for a release, the DBA receives the new code together with installation and upgrade scripts for the database. In this model of continuous delivery, database version control falls within the developers’ scope rather than the DBA’s. The DBA can still monitor and audit database versions by becoming part of the database versioning process. For this purpose, it is recommended to provide DBAs with their own test and QA database version repository, where they can comfortably perform activities like benchmarking, analysis, optimization and auditing, and work out the best possible schema, structure and configuration. These findings can be conveyed to the developers as SQL scripts, metadata or plain text, and serve as guidelines for the next release. In this way, DBAs become active contributors to the growth and development of the entire application, including the database.



Docker Releases Enterprise Edition with Multi-Architecture and Multi-Tenant Support

On August 16, 2017, Docker announced a new release of Docker Enterprise Edition (EE). Docker EE is a Containers-as-a-Service (CaaS) platform that provides a cross-platform, cross-architecture supply chain and deployment environment for software applications and microservices built to run in Windows, Linux and cloud environments. The Docker container platform is integrated with the underlying infrastructure to provide a native and optimized experience for applications. Docker EE includes the Docker-tested and certified infrastructure, containers and plugins needed to run applications on enterprise platforms. The new Docker EE provides a container management platform that accommodates Windows, Linux and mainframe apps and supports a broad range of infrastructure types.
According to Banjot Chanana, Head of Product Management for Docker:
“With Docker EE, organizations can shift more investment into innovation by reducing the costs of maintaining their existing applications, while increasing flexibility and security. Docker EE enables organizations to unite traditional applications and microservices built on Windows, Linux or mainframe onto a single, secure software supply chain, accelerating application delivery by 13x while reducing IT spend more than 50 percent.”
The salient features of the new release of Docker EE are:
Multi-Architecture Orchestration for Applications
Docker claims to enable organizations to modernize Windows, Linux and mainframe applications by managing all of them in the same cluster. The definite advantage of this is the centralization of access controls, security and operational policies across teams, departments and business units.
Secured Multi-tenancy for Organizations to Define Their Own IT Services Model 
With this, organizations get more flexibility and options to customize their IT services model using the new role-based access models, which allow sharing, allocation and collaboration of nodes and resources according to business requirements. Teams and projects can now be logically and physically compartmentalized without disrupting workflows, roles and processes.
Enhanced Security and Integrity through Policy based Automation
The flow of an application from development to production can be securely automated and accelerated using features like automated image promotion and immutable repositories that prevent version overrides during promotion to production. This allows teams to focus on development quality and speed while compliance with organizational standards and quality is automatically ensured and maintained.
Is Docker changing conventions?
With new features like multi-tenancy, role-based access control, integrated management and an end-to-end security model, Docker EE is changing the tedious and complex workflows and policies organizations deal with. The level of abstraction and functionality Docker offers will certainly enable organizations to focus on modernizing their software engineering and operational paradigms. This marks a shift from rigid, surveillance-heavy process implementations to a more relaxed, growth-oriented service delivery and deployment model.



How to Secure your MariaDB Database Server

At the time of writing, the current MariaDB 10.2 offers the MariaDB TX subscription for access to MariaDB MaxScale, the MariaDB database proxy. The MariaDB TX subscription provides high-end functionality such as business-critical production, test and QA deployments; management, monitoring and recovery tools; and mission-critical patching.
Following are the security related features of MariaDB MaxScale:
1. Database firewall filter.
2. Denial of Service protection.
3. Data Masking to protect sensitive data.
This article discusses steps to secure your MariaDB Server installation from the installation phase onwards.
Install MariaDB with mysql_secure_installation
mysql_secure_installation is an executable shell script available with MariaDB that applies added security measures during installation: it sets a password for the root account, removes root accounts that are accessible from outside localhost, removes anonymous user accounts, and removes the test database (along with its privileges), which can be accessed by anonymous users.
shell> mysql_secure_installation
The script walks through the following prompts:
Enter current password for root (enter for none):
OK, successfully used password, moving on...
Change the root password? [Y/n] n
Remove anonymous users? [Y/n] y
Disallow root login remotely? [Y/n] y
Remove test database and access to it? [Y/n] y
Reload privilege tables now? [Y/n] y
Avoid running mysqld as Linux root user
If the mysqld process runs with Linux system user “root” access, any database user with the MariaDB FILE privilege can read and modify arbitrary files on the system. To prevent this, always create or use the included mysql user to run the mysqld process; start mysqld with mysql as the user:
shell> mysqld_safe --user=mysql
Below is a list of user management guidelines for a secure MariaDB implementation:
1. Grant root access to local clients only, for example an Apache server installed on the same machine, or restrict the account to one specific host: 'root'@'192.250.200.25' grants access to the user root only from the host 192.250.200.25.
2. Use strong passwords for users.
3. Have separate users for each database.
4. Restrict the hosts or IPs that can access the database server.
5. Grant only required privileges to client users. Administrative privileges like GRANT etc. are not needed for client users.
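A hedged sketch of guidelines 3 to 5 in SQL; the database name, user, host and password are placeholders:
mysql> CREATE USER 'app_user'@'192.250.200.25' IDENTIFIED BY 'S7rong!Pass';
mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON app_db.* TO 'app_user'@'192.250.200.25';
mysql> FLUSH PRIVILEGES;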
Password Check and Validation
The password validation plugin API in MariaDB ensures that user passwords meet at least minimum security requirements. Two plugins use this API: the simple_password_check plugin and the cracklib_password_check plugin. Since they are not enabled by default, install them manually; at least one plugin must be enabled to enforce password validation checks. If multiple plugins are loaded, all validation checks must pass when setting or changing passwords.
To install simple_password_check plugin:
INSTALL SONAME 'simple_password_check';
OR
INSTALL PLUGIN simple_password_check SONAME 'simple_password_check';
The default values for the options of this plugin enforce the following validation rules. A password must:
1. Be at least 8 characters in length.
2. Contain at least one digit.
3. Contain at least one uppercase and one lowercase letter.
4. Contain at least one symbol or special character.
The configuration options that can change the validation requirements are set either through SQL SET GLOBAL statements or through my.cnf/my.ini. Various options are:
--simple-password-check-digits=# minimum number of digits needed in password (range: 0 to 1000, default = 1)
--simple-password-check-letters-same-case=# needs this many upper and lower case letters (range: 0 to 1000, default = 1)
--simple-password-check-minimal-length=# minimum password length (range: 0 to 1000, default = 8)
--simple-password-check-other-characters=# needs this many special characters or symbols (range: 0 to 1000, default = 1)
Examples:
SET GLOBAL simple_password_check_minimal_length = 10;
OR
[mysqld]
simple-password-check-minimal-length = 10
If a password is set directly as a hash, these validations are bypassed. To avoid this, enable the strict_password_validation variable (along with a validation plugin), which rejects passwords that cannot be validated (hashed passwords).
SET GLOBAL strict_password_validation = 1;
OR
[mysqld]
strict-password-validation = 1
The SET DEFAULT ROLE statement uses MariaDB’s user roles feature to assign a packaged set of privileges (a role) that is enabled automatically when the user connects:
SET DEFAULT ROLE new_user;
To remove the Default Role from the current user:
SET DEFAULT ROLE NONE;
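For context, a hedged sketch of the complete roles workflow; the role and user names are illustrative:
mysql> CREATE ROLE read_only;
mysql> GRANT SELECT ON app_db.* TO read_only;
mysql> GRANT read_only TO 'maria'@'localhost';
mysql> SET DEFAULT ROLE read_only FOR 'maria'@'localhost';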
Use Secure Connections to transfer data
Transport Layer Security (TLS) can be used to encrypt data in transit. In MariaDB, TLS support is compiled in but disabled by default. To enable it, start the server with the --ssl option. Configure the server for encrypted connections by adding the Certificate Authority (CA) certificate, the server public key certificate and the server private key in my.cnf:
[mysqld]
ssl-ca=ca.pem
ssl-cert=server-cert.pem
ssl-key=server-key.pem
Use the file names as applicable.
For client-side configuration, use the CA certificate, the client public key certificate and the client private key. Invoke the client like this:
mysql --ssl-ca=ca.pem
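To confirm TLS is active after restarting the server, a quick check:
mysql> SHOW VARIABLES LIKE 'have_ssl';   -- reports YES when TLS is enabled
mysql> SHOW STATUS LIKE 'Ssl_cipher';    -- non-empty within an encrypted session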
Encrypt Tables and Data
Table encryption ensures that even if someone intrudes into the database or accesses the data files, the data is protected by encryption and immune to tampering. In MariaDB, the storage engines XtraDB, InnoDB and Aria support encryption, with Aria requiring tables to be created with the (default) option ROW_FORMAT=PAGE. All tablespaces, individual tables and log files can be encrypted. Only data at rest in the database is encrypted; data in transit to the client is not, and secure connections using the TLS protocol need to be used for that.
The limitations are:
1. Disk based Galera gcache is not encrypted.
2. The Audit Plugin cannot create encrypted output. Instead, send its output to syslog and configure protection there.
3. File based general query and slow query logs cannot be encrypted.
4. Aria log is not encrypted.
5. The error log is not encrypted, even though it can contain query text, data, crashes, assertion failures etc. It too can be protected through syslog.
For table encryption, use an encryption plugin like the file_key_management plugin and configure the storage engines to use it. This plugin has the following components to be configured:
1. file_key_management_filename: the file that stores the encryption keys; the plugin reads the keys from this file.
2. file_key_management_filekey: an optional key to decrypt the key file. The value can also be a file name containing the key, preceded by the string FILE (e.g. FILE:/path/to/filekey).
3. file_key_management_encryption_algorithm: the encryption algorithm.
An example configuration in my.cnf or my.ini:
[mysqld]
file_key_management_encryption_algorithm=aes_cbc
file_key_management_filename = /home/mdb/keys.enc
file_key_management_filekey = secret
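The key file itself holds one key per line in the form <key id>;<hex key>. A hedged example that generates a random key and encrypts the key file with the filekey above (the paths and key id are illustrative):
shell> echo "1;$(openssl rand -hex 32)" > keys.txt
shell> openssl enc -aes-256-cbc -md sha1 -k secret -in keys.txt -out /home/mdb/keys.enc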
To encrypt XtraDB and InnoDB tables, use the below configuration (values are examples) in my.cnf or my.ini:
[mysqld]
plugin-load-add=file_key_management
file-key-management
file-key-management-filename = /mount/usb1/keys.txt
innodb-encrypt-tables
innodb-encrypt-log
innodb-encryption-threads=4
Then use CREATE TABLE or ALTER TABLE queries with the below options:
ENCRYPTED : YES/NO (YES)
ENCRYPTION_KEY_ID : Positive Integer
Example:
CREATE TABLE T (id int, value varchar(255)) ENCRYPTED=YES ENCRYPTION_KEY_ID=17;
ALTER TABLE T ENCRYPTED=YES ENCRYPTION_KEY_ID=18;
To disable encryption and decrypt the table:
ALTER TABLE T encrypted=NO;
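To see which InnoDB tablespaces are currently encrypted, recent MariaDB versions expose a status view (availability depends on the version):
mysql> SELECT SPACE, NAME FROM information_schema.INNODB_TABLESPACES_ENCRYPTION;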
Aria tables can be encrypted by the below configuration in my.cnf/my.ini:
[mysqld]
aria-encrypt-tables=1
To encrypt temporary tables created on disk during query execution, set the below parameter in my.cnf/my.ini:
[mysqld]
encrypt-tmp-disk-tables=1
To encrypt temporary files created on disk during binary log transaction caching, filesorts etc.:
[mysqld]
encrypt-tmp-files=1
To encrypt binary logs and relay logs:
[mysqld]
encrypt-binlog=1
Setup Auditing and Logging
The MariaDB Audit Plugin logs security- and performance-related information such as incoming connections, executed queries, accesses to individual tables, and which users read or wrote what data at what time. The logs are written to a file or to syslog. This information helps detect suspicious activities and investigate security breaches.
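Note that the plugin has to be loaded before its variables become available; a one-time step:
mysql> INSTALL SONAME 'server_audit';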
Enable logging by setting server_audit_logging to ON.
SET GLOBAL server_audit_logging=ON;
The events that can be logged are grouped as CONNECT, QUERY and TABLE events. To log these events, set the server_audit_events variable:
SET GLOBAL server_audit_events = 'CONNECT,QUERY,TABLE';
OR
[mysqld]
...
server_audit_events=connect,query
User activities are logged as well, since every activity belongs to one of the types connect, query or table. You can include or exclude users whose events are to be logged using the server_audit_incl_users variable.
To add a user without removing previous usernames:
SET GLOBAL server_audit_incl_users = CONCAT(@@global.server_audit_incl_users, ',maria');
To specifically include or exclude users:
server_audit_incl_users=robert
server_audit_excl_users=sam,rocky
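To direct the audit log to a specific file instead of syslog, a hedged my.cnf fragment (the path is illustrative):
[mysqld]
server_audit_output_type=file
server_audit_file_path=/var/log/mysql/server_audit.log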
