For organizations small and large, databases play a major role in the IT infrastructure, storing and maintaining invaluable data assets. These data comprise the company's own records plus project data of high business value. Data is what makes an organization a working business, and so it is the biggest asset an organization must protect in order to maintain the integrity and security of the business itself. The database gives that data its power through efficient storage, maintenance, accessibility, retrieval and security. The DBMS (Database Management System) is the software application that uses hardware and networking to represent, store, manage, maintain, serve and secure data, the life energy of any business.
This article discusses some important points organizations need to consider seriously when evaluating their database needs and requirements.
Selecting the database technology
Database technology is closely tied to the organization's overall IT infrastructure. It depends on the server technology, computing model, networking and distribution demands, and scalability and performance requirements. The various DBMS products on the market each have their own advantages and disadvantages.
The RDBMS has been the industry leader for decades, with a solid underlying technology based on the relational model and discrete mathematics, the expressive SQL query language, and a performance-oriented computing and logical architecture. Each of the major RDBMS products, such as Microsoft SQL Server, Oracle, MySQL and IBM's DB2, has its own market share and fan base. This is largely due to the ecosystem these systems have nourished over decades: software versioning and release platforms, support, bug reporting and fixes, announcements, documentation and tutorials, and support from resource providers such as hosting companies.
With the evolution of Big Data applications and massive web-based data requirements, NoSQL emerged as a good choice, with flexible schemas and the ability to hold and process large amounts of data without performance overhead. Many modern web-based applications deal with sparse or redundant data that needs only loose linking; the transactional capabilities, complexity and compatibility constraints of RDBMS systems are unnecessary for such data, and NoSQL is a great fit here. On the other hand, NoSQL typically lacks RDBMS features such as full ACID compliance, transactional integrity, rich indexing and ease of querying.
NoSQL supports additional data models, such as key-value pairs, documents, graphs and wide-column stores, that today's Big Data users rely on heavily; RDBMS vendors have started providing similar functionality, for example by adding column-store engines to their cores.
The in-memory DBMS improves performance because data is stored in main memory rather than on hard disk. Since no disk I/O is involved, access latency drops sharply, along with the computational overhead of processing disk operations. The obvious disadvantage here is the cost of the more expensive primary memory.
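SQLite's in-memory mode offers a convenient way to see this idea in miniature; a minimal sketch (table and column names are purely illustrative):

```python
import sqlite3

# Connect to a database held entirely in RAM: no disk I/O is involved,
# so reads and writes avoid disk latency. The trade-off is durability --
# all data is lost when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (id INTEGER PRIMARY KEY, token TEXT)")
conn.execute("INSERT INTO session (token) VALUES (?)", ("abc123",))
row = conn.execute("SELECT token FROM session WHERE id = 1").fetchone()
print(row[0])  # abc123
conn.close()
```

Production in-memory systems add replication or snapshotting to mitigate exactly that durability trade-off.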
Another consideration is the computing platform and technology stack: whether you run on Linux/Unix, Windows or a mainframe. While many DBMS solutions are evolving to be cross-platform, a significant share of them are still platform dependent. A further factor is the level of enterprise support needed: many solutions are open source, but enterprise-grade support typically requires purchasing a commercial edition.
Next is the level of implementation, which is crucial in terms of cost and scalability. With growing data and storage needs, companies are moving toward cloud-based DBaaS providers, an approach also called database outsourcing. Compared with an in-house database server setup, this offers much greater scalability with reduced cost and fewer resource needs in terms of administration, hardware and software.
The following are other areas to consider during the database decision-making process.
Database Design
It is essential for a table or entity to have a unique identifier for each row of data in the form of a primary key. This key carries a unique constraint by default and identifies its row uniquely; without a primary key, searches and sorts consume far more processing resources. A typical data schema has many dependencies and relationships among its entities, and these must be considered and enforced during database design: use foreign keys wherever necessary and preserve their integrity across tables. Indexes should be applied to columns that are sorted on or serve as keys for other operations. Keep the number of indexes reasonable, because too many indexes hurt performance through the extra overhead of maintaining them. Periodic monitoring of database performance can greatly assist in distinguishing useful indexes from unnecessary ones.
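The interplay of primary key, foreign key and index can be sketched with SQLite (the `customer`/`orders` schema is a hypothetical example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# The primary key uniquely identifies each row; the foreign key on
# orders.customer_id enforces the relationship between the two tables.
conn.execute("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        placed_on   TEXT
    )""")

# Index the column that is frequently filtered and joined on.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO orders (customer_id, placed_on) VALUES (1, '2024-01-15')")

# An order pointing at a non-existent customer is now rejected,
# preserving referential integrity.
try:
    conn.execute("INSERT INTO orders (customer_id) VALUES (99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The same DDL pattern (primary key, foreign key, targeted index) carries over to any relational product, though syntax details vary.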
Table (entity) normalization is another aspect that must be addressed from the beginning. A well-normalized database helps the application issue fast and efficient queries, and such a database can also be scaled out if more entity relationships are needed. Data must be atomic enough to organize into a well-defined entity relationship. Appropriate data types and correct data sizes are further factors that affect performance and scalability at the table level. For example, an arbitrary-length text column can be assigned the SQL VARCHAR type, which grows with the data in the field, while fixed-length data can be assigned a CHAR type of fixed size: VARCHAR allows flexibility while CHAR provides space optimization. Likewise, it is inefficient to use a DECIMAL for integer data instead of the INT type.
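As a small illustration of atomicity and normalization (again with SQLite and invented table names), repeating values such as tags belong in their own table rather than in a comma-separated column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized design would cram tags into one comma-separated column,
# making per-tag queries slow and error prone. Normalized design keeps
# one atomic value per row in a separate table.
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("""
    CREATE TABLE article_tag (
        article_id INTEGER NOT NULL,
        tag        TEXT NOT NULL,
        PRIMARY KEY (article_id, tag)
    )""")

conn.execute("INSERT INTO article VALUES (1, 'Choosing a DBMS')")
conn.executemany(
    "INSERT INTO article_tag VALUES (?, ?)",
    [(1, "rdbms"), (1, "nosql")],
)

# Per-tag lookups become simple, indexable equality queries.
rows = conn.execute(
    "SELECT a.title FROM article a "
    "JOIN article_tag t ON t.article_id = a.id WHERE t.tag = ?",
    ("nosql",),
).fetchall()
print(rows)  # [('Choosing a DBMS',)]
```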
Configure your server properly
However good the database may be, a poorly configured database or system server will create bottlenecks for productivity and performance. It is important to resolve all software dependencies on the system server well before production to avoid crashes under load. The memory allocation for the various storage elements, such as database files, table spaces, caches, logs and configuration files, must be sufficient for the smooth operation of the database. Performance statistics and query analysis can be used to optimize and fine-tune these configurations throughout the database life cycle.
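As an illustration only, here is a hedged sketch of a few MySQL (`my.cnf`) settings that are commonly tuned; the specific values below are placeholders, and the right numbers depend entirely on your workload and available RAM:

```ini
[mysqld]
# Size of InnoDB's main buffer pool; often the single most
# influential memory setting on a dedicated database server.
innodb_buffer_pool_size = 4G

# Cap concurrent client connections to what the hardware can serve.
max_connections = 200

# Record statements slower than 2 seconds for later optimization.
slow_query_log  = 1
long_query_time = 2
```

Other DBMS products expose analogous knobs under different names; the principle of sizing caches and connection limits to the hardware is the same.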
Backup your database
Your database is a software application that runs on hardware, and that physical nature alone demands solid backup and restore policies. At any point in time, the latest possible backup must be available for restoring the database in case of an expected or unexpected emergency. Backups can be automated at the system-server level using file backup tools, or at the database-server level using the database's own backup tools. Scheduling the backup also matters for performance: some backup tools lock tables during the backup to prevent changes, so that a consistent snapshot of the latest data can be taken. If this happens during peak load, while clients are actively using the database, service denials and application lag can result, so it is recommended to schedule backups during off-peak time. For multi-node database systems such as master-slave and clustered setups, this is less of an issue, since the backup tool can run against a synchronized, separate node while the other nodes remain available to clients.
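Many databases expose an online backup facility of this kind; SQLite's, for instance, can be driven from Python in a few lines (the source data and file path here are hypothetical):

```python
import os
import sqlite3
import tempfile

# A live "source" database, in memory just for this sketch.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, total REAL)")
src.execute("INSERT INTO invoice (total) VALUES (149.90)")
src.commit()

# Online backup: copies the database page by page without taking the
# source offline, so clients can keep reading while it runs.
backup_path = os.path.join(tempfile.mkdtemp(), "backup.db")
dst = sqlite3.connect(backup_path)
src.backup(dst)

# Verify the copy is restorable and complete.
count = dst.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
print(count)  # 1
dst.close()
src.close()
```

Wrapping such a script in a cron job scheduled for off-peak hours implements the policy described above.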
Secure your database continuously
Security is a top priority for any database. A breach can occur in many ways, including unauthorized system-server access, database-server access through an over-privileged client user, and query-based attacks. The first and foremost place to secure is the system server itself, followed by the database server. In the majority of cases a super-admin user named root is used to install and configure the system, and there are many avenues through which that root user can be compromised. In some databases the allowed hosts for root default to '%', meaning 'any host'. The root user should either be deleted or renamed to something an intruder will not guess the way they would guess 'root'. Next, confine database users either to the local machine (localhost) or to the specific client machines/nodes from which connections are expected.
The next part is restricting ordinary users' privileges to READ/WRITE only, so that client applications, including any an attacker controls, cannot use those accounts to perform schema changes (DDL) or administrative actions on the database. The GRANT OPTION privilege, as well as ALL PRIVILEGES, should be restricted to the genuine administrator of the database. Physical security for your servers comes next: place them in dedicated, secure server rooms with entry restricted to authorized staff. In the case of an outsourced database service, any user accounts supplied to developers, testers and others should be revoked or deleted immediately after the task or project is completed. The majority of DBaaS providers have firewalls that block malicious query injections, DoS attacks, application-level intrusions and the like. Your application must itself be secure so that attackers cannot use it to gain access to the database. Perform security audits periodically to gauge the security strength of not only the database but your entire infrastructure.
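The query-based attacks mentioned above are worth a concrete sketch. Parameterized queries are the standard defence against SQL injection, shown here with SQLite and an invented `account` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (username TEXT, secret TEXT)")
conn.execute("INSERT INTO account VALUES ('alice', 's3cret')")

# Hostile input crafted to break out of a string literal if it is
# naively concatenated into the SQL text.
user_input = "nobody' OR '1'='1"

# Unsafe: string concatenation lets the input rewrite the query,
# so the OR clause matches every row.
unsafe = conn.execute(
    "SELECT username FROM account WHERE username = '" + user_input + "'"
).fetchall()

# Safe: a parameterized query treats the input as a plain value,
# never as SQL, so nothing matches.
safe = conn.execute(
    "SELECT username FROM account WHERE username = ?", (user_input,)
).fetchall()

print(len(unsafe), len(safe))  # 1 0
```

Combined with least-privilege accounts, parameterization closes off most injection-style breaches at the application layer.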
Perform monitoring, maintenance and optimization rigorously
Monitoring is the tool that lets you understand the pulse of your database and move along with it. Monitor database traffic, performance, the query cache, the slow query log, the error log, the access log and so on to understand how your database is functioning, and use this information to further optimize and fine-tune it. Occasionally empty the caches and logs and defragment the memory pool to refresh the system. Optimize slow queries for speed and performance. Periodically rebuild indexes, drop those found to be of little use, and identify the indexes that really boost performance.
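Most databases can report their query plans, which is the quickest way to tell a useful index from a useless one. A hedged sketch using SQLite's `EXPLAIN QUERY PLAN` (the `event` table and index are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (id INTEGER PRIMARY KEY, user_id INTEGER, ts TEXT)"
)
conn.execute("CREATE INDEX idx_event_user ON event(user_id)")

# Ask the planner how it would run the query: a useful index appears
# in the plan detail as "SEARCH ... USING INDEX", while a missing or
# ignored one leaves a full-table "SCAN".
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM event WHERE user_id = ?", (42,)
).fetchall()
print(plan[0][3])
```

Other systems expose the same idea as `EXPLAIN` (MySQL, PostgreSQL) or execution-plan views; reviewing these outputs for your slow-log queries is routine tuning work.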
Prepare for change and scale out
Every business strives for growth, and the database is one of the first places that growth shows. So, from the planning stage onward, leave room for growth and scalability: prepare for data growth and traffic increases, and use resources that allow such expansion. It is essential that your selected database technology has built-in provisions for scaling, vertically and, where possible, horizontally. Irrespective of the DBMS, the data must be formatted or backed up in a multi-platform, multi-architecture-compatible fashion so that when the time arrives for a major shift, you can carry your data for re-deployment in the new system without any issues. This ensures the business is capable of growth and that the data, along with the database, always stays with the business.
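One common portable format is a plain-text SQL dump, which nearly every system can produce and replay. A small sketch with SQLite's `iterdump()` (the `setting` table is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE setting (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO setting VALUES ('theme', 'dark')")

# iterdump() emits plain SQL text: a platform-neutral, human-readable
# format that can be replayed to rebuild the database on a new system.
dump = "\n".join(conn.iterdump())
print(dump)
```

Tools such as `mysqldump` and `pg_dump` serve the same role for their respective servers; keeping such dumps alongside binary backups preserves the migration path the paragraph above calls for.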