This whitepaper describes programming techniques that improve the performance of transferring large amounts of data over HTTP on the Java platform.
HTTP is increasingly used to transfer large volumes of data, so efficient data handling has become essential. The basic idea is to load data into memory in chunks, process each chunk, and then discard it so that it becomes eligible for garbage collection as soon as possible. This whitepaper discusses techniques for handling data in chunks in HTTP requests and responses and quantifies the resulting performance benefit.
The performance figures were obtained by running programs against a plain Java HTTP server deployed in the Jetty (version 7.4.5) servlet container. The programs were executed in the Eclipse IDE and monitored with the jvisualvm tool shipped with the Oracle JDK, on the Windows 7 platform.
Reading the HTTP Request:
The data read from the request must be validated and processed. Reading the entire payload from the request is a bad idea when the data is large: it consumes a great deal of memory, which in turn affects the server's ability to handle other requests and can lead to an OutOfMemoryError. A better technique is to work with streams: read one chunk of data from the request input stream, process it, and then read the next chunk. This avoids ever holding the full payload in memory.
To illustrate this point, two test cases were executed on the Jetty server to show the difference between reading the whole payload and reading it in chunks. The first test case reads the entire payload from the request into memory and then writes it to a file. The second reads the payload from the request in chunks and writes each chunk to the file as it arrives. The same client program was used to send data to both test cases. The results below show that reading the data in chunks performs considerably better.
Test Case 1: load the entire 200 MB into memory for processing. Execution time: 8845 ms.
Test Case 2: load 5 MB chunks into memory for processing. Execution time: 2862 ms.
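A minimal sketch of the chunked reading used in Test Case 2 is shown below, assuming a servlet-based handler; the class name ChunkedUploadServlet and the output file upload.dat are illustrative placeholders, not taken from the original tests.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ChunkedUploadServlet extends HttpServlet {
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        byte[] buffer = new byte[5 * 1024 * 1024]; // 5 MB chunk, as in Test Case 2
        InputStream in = req.getInputStream();
        OutputStream out = new FileOutputStream("upload.dat");
        try {
            int read;
            // Only the fixed-size buffer is ever resident in memory;
            // the full 200 MB payload never is.
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } finally {
            out.close();
        }
        resp.setStatus(HttpServletResponse.SC_OK);
    }
}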
Returning the HTTP Response:
When returning large data in the response, writing it in chunks rather than as a whole saves memory. The HTTP response OutputStream object is used to return the data in chunks: the data in the file is read chunk by chunk, and each chunk is written to the response output stream. Most mature frameworks support this style of streaming.
Two test cases were executed on the Jetty server to show the performance difference between returning the whole payload at once and returning it in chunks. The first test case loads the entire content into memory before returning it. The second returns the data to the response in 5 MB chunks. A client program was used to receive the data from both test cases. Both test cases return 200 MB of data, and both were monitored with JVisualVM.
Test Case 1 - Execution time: 3889 ms
Test Case 2 - Execution time: 1796 ms
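Below is a minimal sketch of the chunked variant (Test Case 2); the class name ChunkedDownloadServlet and the source file bigfile.dat are illustrative placeholders.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ChunkedDownloadServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("application/octet-stream");
        byte[] buffer = new byte[5 * 1024 * 1024]; // 5 MB chunks
        InputStream in = new FileInputStream("bigfile.dat");
        try {
            OutputStream out = resp.getOutputStream();
            int read;
            // Each chunk is written straight to the client and then discarded.
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            out.flush();
        } finally {
            in.close();
        }
    }
}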
Thus the HTTP request and response can be handled efficiently on the server side. The following sections cover the client-side performance of handling the request and response.
Sending the HTTP Request:
The same scenario arises when sending large data from the client side, and the same technique applies: do not hold the whole payload in memory, but send it in chunks. The HttpURLConnection object offers a couple of APIs for this: setChunkedStreamingMode and setFixedLengthStreamingMode. The setChunkedStreamingMode method enables streaming of an HTTP request body without internal buffering when the content length is not known in advance; setFixedLengthStreamingMode does the same when the content length is known in advance.
The first test case loads the whole payload into memory and then sends it to the server. The second test case uses chunked mode to send the data. Both test cases use the same server, which receives the data in chunks since that performs better. CPU usage, memory usage, and elapsed time were captured with JVisualVM.
Test Case 1 - Execution time: 3916 ms
Test Case 2 - Execution time: 2613 ms
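A minimal sketch of the chunked client from Test Case 2, using HttpURLConnection, follows; the endpoint URL and file name are placeholders.

import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ChunkedUploadClient {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/upload"); // placeholder endpoint
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // Stream the request body in 5 MB chunks instead of buffering it all;
        // use setFixedLengthStreamingMode(...) when the length is known up front.
        conn.setChunkedStreamingMode(5 * 1024 * 1024);
        byte[] buffer = new byte[5 * 1024 * 1024];
        InputStream in = new FileInputStream("bigfile.dat");
        OutputStream out = conn.getOutputStream();
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } finally {
            out.close();
            in.close();
        }
        System.out.println("Response code: " + conn.getResponseCode());
    }
}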
Receiving the HTTP Response:
Finally, consider the client reading a response. Performance was measured with one test case that reads the entire 200 MB response into memory before writing it to a file, and another that reads the data in chunks and writes each chunk to the file as it arrives. Reading and writing in chunks performs markedly better.
Test Case 1 - Execution time: 8621 ms
Test Case 2 - Execution time: 1645 ms
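A minimal sketch of the chunked client-side read from Test Case 2 is shown below; the endpoint URL and file name are placeholders.

import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ChunkedDownloadClient {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8080/download"); // placeholder endpoint
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        byte[] buffer = new byte[5 * 1024 * 1024]; // 5 MB chunks
        InputStream in = conn.getInputStream();
        OutputStream out = new FileOutputStream("result.dat");
        try {
            int read;
            // Write each chunk to disk immediately; never hold 200 MB in memory.
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } finally {
            out.close();
            in.close();
        }
    }
}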
The performance test results are tabulated below. Loading data in chunks obviously performs better; the table highlights how large the difference is.
Side   | Operation                             | Chunked (Y/N) | Memory (MB) | CPU avg (%) | Execution Time (ms)
Server | Reading 200 MB of data from request   | N             | 500         | 40          | 8845
Server | Reading 200 MB of data from request   | Y             | 14          | 25          | 2862
Server | Returning 200 MB of data to response  | N             | 505         | 50          | 3889
Server | Returning 200 MB of data to response  | Y             | 12          | 20          | 1796
Client | Sending 200 MB of data to server      | N             | 500         | 45          | 3916
Client | Sending 200 MB of data to server      | Y             | 11          | 10          | 2613
Client | Receiving 200 MB of data from server  | N             | 490         | 45          | 8621
Client | Receiving 200 MB of data from server  | Y             | 10          | 35          | 1645
The following guidelines can also be useful for handling input streams efficiently.
readLine API usage:
In large data transfer scenarios, the readLine API (for example, on BufferedReader or DataInputStream) can create a performance bottleneck. If the payload contains a huge amount of data on a single line, readLine will end up holding many megabytes in memory. The central idea of this whitepaper is to handle data in chunks, and this API does not guarantee chunked loading in all scenarios. The better approach is to read from the stream with the read method and a buffer sized from about 100 KB to a few MB, depending on the application's maximum heap size.
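A minimal sketch of this bounded-buffer read follows; the class and method names are illustrative.

import java.io.IOException;
import java.io.InputStream;

public class BoundedReader {
    // Read with a bounded buffer instead of readLine(), so a single very long
    // line can never force hundreds of megabytes into memory at once.
    static long consume(InputStream in) throws IOException {
        byte[] buffer = new byte[100 * 1024]; // 100 KB; tune against the max heap
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            // process buffer[0..read) here; only this buffer is ever resident
            total += read;
        }
        return total;
    }
}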
SequenceInputStream usage:
It is easy to prefix or append data to an input stream without reading the whole stream: SequenceInputStream does the job. This can be used, for example, to log a few characters from a stream and then put the consumed bytes back, keeping the stream reusable for further processing without awkward logic in the business layer. Below is an example of prefixing data to a stream; a complete helper appears after the snippets.
// Both classes are in java.io.
// in        = the original input stream
// readBytes = a byte array that has already been read from the stream
SequenceInputStream sq =
        new SequenceInputStream(new ByteArrayInputStream(readBytes), in);

Below is an example of appending data to a stream.

SequenceInputStream sq =
        new SequenceInputStream(in, new ByteArrayInputStream(bytes));
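For completeness, a self-contained peek-and-restore helper might look like this; the class and method names are illustrative.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

public class PeekExample {
    // Log the first few bytes of a stream, then return a stream that still
    // delivers the full original content.
    static InputStream logAndRestore(InputStream in) throws IOException {
        byte[] head = new byte[16];
        int read = in.read(head); // consume a small prefix for logging
        if (read <= 0) {
            return in; // nothing was read; the stream is unchanged
        }
        System.out.println("head: " + new String(head, 0, read, "UTF-8"));
        // Prefix the consumed bytes back onto the remaining stream.
        return new SequenceInputStream(
                new ByteArrayInputStream(head, 0, read), in);
    }
}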
Note that copying an entire payload into a ByteArrayInputStream in order to reuse the stream is not a good idea for a large data stream, since ByteArrayInputStream keeps all of the data in memory and may cause an OutOfMemoryError.
Conclusion:
In summary, processing huge data in chunks and discarding each chunk as soon as possible makes the data available for garbage collection and yields better performance. This keeps the overall memory footprint, and especially the old generation, as small as possible, making the application more scalable, reliable, stable, and available.