Archive for the ‘Performance’ Category
Some real bad design
Posted February 14, 2014
on:But even after looking at the latencies people wish to go with “Web Services” even for applications which are hosted in the same data center and have been developed by the same team and in the same technology, just because it is the “in thing”.
To use web services one needs make the following conversion
1. On the client before making the request convert the native format to strings to form the XML to be sent.
2. On the server on receiving the XML request convert to native format for better processing
3. On the server after getting the results convert the native back to string so that it can be sent over XML
4. On the client after receiving convert the string in the XML to native format for consumption.
The steps 1 and step 4 may not be mandatory in some scenarios. Steps 2 and 3 can almost not be avoided. These steps need to be executed irrespective of the protocol that is being used. These conversions are expensive and should be avoided when one is writing applications in the same technology and more so when the applications are hosted in the same data center (read LAN)
In an extreme, one of the projects had the following data flow:
1. User from the Desktop Browser makes a request
2. The Application Server converts this into a Web service request and posts it to another server.
3. This Web Service server makes an EJB call to another server.
4. The EJB in turn invokes one or more stored procedures which actually have the business logic, data access and everything else.
5. The data is obviously returned through the same path.
There is no technology other than Java involved in any of the layers above and they are all hosted in the same data center.
Now the same functionality over a mobile has been implemented as follows:
1. User from the Mobile Application makes a request
2. This is sent to the server as a REST request.
3. This REST request from mobile is converted and another internal REST request is made.
4. This internal REST request is converted into a Web service request and posted to another server.
5. This Web Service Server makes an EJB call to another server.
6. The EJB in turn invokes one or more stored procedures which actually have the business logic, data access and everything else.
7. The data is obviously returned through the same path.
Is it any surprise that this application is facing performance, stability, availability and reliability issues?
How to make SQL Perform
Posted February 12, 2014
on:SQL Performance – 101
Some Design Considerations/Optimizations
Summary
Perils of Stored Procedures
Posted October 7, 2013
on:Introduction
The topic will seem to be a sacrilege to quite a few members of the architect and designer community. Stored Procedures have been considered to be a panacea for all performance problems in the RDBMS world. Of late they also have been considered as the panacea to prevent SQL injection in applications. Given this background it will not be surprising to see many architects and designers react with shock and awe looking at the heading of the topic.
Advantages/Benefits of Stored Procedures
First let us explore some of the advantages of Stored Procedures.
1. Usage of Stored Procedures prevents SQL injection, unless one uses dynamic queries within the Procedures themselves.
2. The data is processed within the database server and only the data that is really required to be sent to the client travels over the network. This optimizes the network traffic between the Database Server and Client and makes the performance better.
3. It is easier to leverage features like cursors and temporary tables to optimise the processing of the data.
4. All the applications that can connect to the database server can execute the business processes by executing the stored procedures in the database. This leads to the belief that stored procedures eases in reusability of code across applications.
The Past
Given these advantages it is not surprising that many consider stored procedures to be a panacea to many of the problems that they face. These benefits are(read were) genuine and were a boon in the days past.
Till a decade ago the scalability requirements of the applications were limited. A bulk of the processing happened in batch processes and the users of the Online Transaction Processing transactions were limited. Applications typically followed the client server paradigm where the client managed the User Interactions and the Database managed the Business Processes.
There was a lot going in the favour of stored procedures in this scenario. A big sized server could manage the load that was generated by the online transaction processing.
The Present
The scale of operations in today’s applications has drastically changed. The applications today are not exposed to a limited number of users like the applications of the bygone era. Today’s applications are exposed on the internet and are accessed via browers, tablets and mobile phones. If one considers a banking application, in the past, this access and usage of this application was limited to the employees of the bank. Today the same banking application is exposed on the internet to the customers of the bank. A variety of functionality of the application are exposed through the internet. The customers can enquire their balance, they can make a payment to a vendor via net banking, they can transfer funds to another person and for a variety of other features are available to the customers. This has tremendously increased the load on the application.
Before we proceed it is important to remember the fact that stored procedures execute within the database server and can leverage resources in that server.
If now, in an application a majority of the business logic is in stored procedures, every user who will be using the application will be using the resources in the database server. So as the number of users and correspondingly the number of transactions in the applications go up the load in the database servers go up. Soon we will reach a stage where the database server reaches its capacity. The only way now to scale the application will be to increase the sizing of the server on which the database is running. Databases scale only VERTICALLY and not HORIZONTALLY (Yes, I hear you, we will come to Oracle RAC in a short while)
Instead of having all the logic or majority of the logic in the databases, one were to put them in the “Application Servers” and leverage features like distributed caching it will be possible to scale the application horizontally. The database will be used primarily for storing and retrieving the data required for the data. To scale the application it will be possible to increase the number of “Application Servers” on which the application runs. In this way one can almost scale indefinitely.
The retrieval of data from the database server to the application server will add some inefficiency of the processing, but this will offset by the fact that one will be able to scale the application support a much larger load.
The cost of ownership also comes down as one can use normal servers as “Application Servers”. One does not require high cost specialized servers for this purpose.
To get the efficiency of a stored procedure one can use PreparedStatement in the Java world and Prepared Commands in the .Net world. Using PreparedStatements and Prepared Commands, the right way, one can also shield the application from SQL injection.
Other perils of stored procedures
1. If one makes a heavy usage of stored procedures then one will be stuck with the particular database for the life of the application. Migrating to another database will be an expensive proposition.
2. This becomes an even bigger limitation in case of products. The customers wish to have the ability to be able to run the product using a database of their choice rather than the vendor of the application deciding the underlying database. If we use stored procedures in the product we will end up maintaining stored procedures per database and this will be an expensive proposition.
3. The code in stored procedures is not conducive for efficient documentation even if one uses good coding practices and we end up having bad code smells which are not easy to wish or whiff off.
4. The code in stored procedures is not conducive for refactoring which means that the code will continue to smell.
5. The newer techniques of using interfaces and/or web services to expose features of an application to another application are a much better way of reusing the business logic rather than using the stored procedures from multiple applications. These mechanisms give a more elegant mechanism to loosely couple applications than stored procedures.
Note on Oracle RAC
Oracle RAC has the capability of scaling horizontally. But it comes with its own set of complexities and limitations. Databases leverage the memory on the servers to cache data to give improved performance. This is possible and it is very efficient when the database server is running on a single server as all the changes to the data pass through the same process and it is possible to update the cache with the changed data or invalidate the right section of the cache.
When one starts running the database servers on more than one physical server it becomes important to ensure that any change to the data is conveyed to all the caches on the various servers or the corresponding caches are invalidated on all the servers or before using the data in the cache one will need to validate if the data has changed through the process in the other server. This is not easy and adds to the overhead of processing thus reducing the benefit that ones gets from the caching mechanisms used by the traditional databases.
A note on NoSQL
Today’s applications have reached a scale where even the most powerful RDBMS is unable to scale to the levels to traffic experienced in these applications. Even if it scales, the cost of the hardware required to support these loads, is prohibitive.
Google, Facebook, Twitter and many sites which deal with large data (read Big Data) all use RDBMS only to a limited extent. They instead use a new breed of database servers classified as NoSQL. The term NoSQL itself has no expansion. It was used as a twitter hashtag to denote the set of database applications which do not use SQL to provide access to the data that is stored. Also they do not store the data in the traditional way where the data is organized in tables and columns. NoSQLs will be a topic for another post sometime in future.
This further strengthens the reasons to use stored procedures only in really exceptional situations.
Conclusion
There are situations where stored procedures are still a possible solution. But given the scale of operations of today’s applications it is advisable to consider stored procedures only in really, really exceptional situations. It is time we understand and learn better ways of developing applications that perform and are secure rather than sticking to the older practice of using stored procedures to achieve performance.
In the CMG meet at TCS, Mumbai Mr. B. Gopal on 4th October used an apt phrase “Conservation of Minimum Complexity” to mean that any system has a minimum level of complexity and it not possible to reduce this complexity any further. The aim should be distribute the complexity across the various layers of the system. When one makes excessive usage of stored procedures one is increasing the complexity of the database layer and this will is not be good in the long run both from a performance as well as from a maintenance perspective.
- In: Cache Memory | Garbage Collection | JSR | JVM | Memory Management | Performance
- Leave a Comment
One of the highly recommended techniques for improving performance is to cache as much data as possible.
And it is possible to allocate large memory to processes running in the current machines (read 64 bit operating systems on 64 bit CPUs). But the one road block that holds technicians from providing a very high heap memory to Java is Garbage Collection.
Java introduced the concept of Garbage Collection to reduce the pains of developers of managing memory and does a fairly good job. But it fails to deliver when it has to deal with large memory cleanups. Every JVM that exists today pauses the applications one time or the other to clean up the garbage that has been generated and during these pauses mean the response to the requests of the users are slowed down drastically. This is an unacceptable situation for most applications.
At a high level there are two ways the heap is treated
1. The whole heap is treated as one and memory is allocated and freed as required.
2. The heap is split into multiple regions based on the age of the object. Objects are allocated in the young region and are promoted to the tenured (Old generation) region as their life increases. This is also called the generational heap and the three major JVMs today prefer this mechanism to manage the heap with some variations. Coincidentally the .Net CLR also uses a similar mechanism to manage the heap.
There are different types of collectors that exist
1. Serial – This is the one with the most pause time
2. Parallel – This reduces the pause times based on the number of CPUs available
3. Concurrent Mark and Sweep – This further reduces the pauses as it tries to do some work during the free cycles of the application threads and starts working before the memory gets exhausted.
Most JVMs and most collection mechanisms do well in collecting the memory in the young generation regions. The problem starts when they have to collect data for the tenured generation as the objects are typically inextricably linked by the time it comes to this generation. Also this generation is of the biggest size and it contains more number of objects than the other generations. This means that it takes longer to trace the roots of the objects in this generation.
Consider the scenario in which the application caches data. This would typically get allocated in the your generation and would slowly progress to the tenured generation. If the right objects are cached then these objects will not change often. At the same time there will be other objects which will come to the tenured generation and these may not be of this nature. Now when a tenured generation collection is triggered the JVM not only has to walk through the ones that will potentially die, but it also has to go through the objects which are very unlikely to die. It is possible that the ones that will not die will occupy a larger space and will be more in number than the ones that are likely to die. This means that most of the work done in the Garbage Collection Cycle is redundant.
Now instead of this if the JVM provided a separate memory section for caching then what will happen is that the tenured will be only those objects that have stayed long for some right or wrong reason. And in a well written program these will be very few. The collection objects in this tenured area should be faster as now the objects in cache will be excluded from this Garbage Collection Cycle and this should as per our assumption be the biggest contributor to the objects in the tenured area.
The added advantage will be that the objects identified for caching will directly go to the cache region and will never trouble the young generation collection or the tenured generation collection. To handle references of the cache objects in the other areas we could maintain a reference counting algorithm which can decrement the reference counts at the time of Garbage Collection Cycles in those areas. This will ensure that one does not have to walk the young and tenured regions when running a Garbage Collection cycle in the tenured region. It will be better if we do not allow reference of these objects to be maintained by objects allocated in the other regions other than within a method.
The memory management of this area can be further enhanced by dividing this section into the following subareas:
1. Area with objects that never die
2. Area with objects that can be removed using LRU algorithm
3. Area with objects that can be removed using LFU algorithm
With ability to tune the sizes of these regions.
Now with this it will be possible for the application developers to allocate and manage a huge cache within a single JVM which is difficult today given the limitations of Garbage Collection Cycle.