Hibernate provides several strategies to improve performance. One common strategy is batch-fetching. A one-to-many relationship between two entities generally means that entity A has a set of entities B. Take the following LibrePlan relationship between WorkReport and WorkReportLine:
<set name="workReportLines" cascade="all-delete-orphan" inverse="true">
    <!-- key and one-to-many elements omitted for brevity -->
</set>
This means a WorkReport entity has many WorkReportLines. When the workReportLines are attached to the current session, a SQL query is executed for each of them. If the set contains many elements, this means many queries. Batch-fetching allows prefetching a certain number of elements, significantly reducing the number of queries.
<set name="workReportLines" cascade="all-delete-orphan"
     inverse="true" batch-size="10">
    <!-- batch-size value is illustrative; key and one-to-many
         elements omitted for brevity -->
</set>
The book “Java Persistence with Hibernate” recommends using batch sizes between 3 and 15. Hibernate also provides a global parameter (hibernate.default_batch_fetch_size) to turn on batch-fetching for all collections; however, I wouldn’t recommend using it. Instead, turn on batch-fetching only for large collections. Lastly, Hibernate also provides other fetching strategies, such as join fetching, select fetching and subselect fetching.
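For completeness, this is roughly how the global setting would look in hibernate.cfg.xml (the value 8 is just an illustration within the recommended range; as said above, per-collection batch-size is usually the better choice):

```xml
<!-- hibernate.cfg.xml: enables batch-fetching globally for all
     lazy collections and proxies -->
<property name="hibernate.default_batch_fetch_size">8</property>
```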
Another mechanism to improve performance in Hibernate is the second-level cache. But what does that mean? Well, to understand what the second-level cache is, I should first explain what the first-level cache is.
The first-level cache is tied to the session’s lifespan, and it’s active by default. While a transaction is being executed, all the objects retrieved are cached in the session. So, think of the first-level cache as the cache attached to a session (a transaction-scoped cache); it allows reusing objects within a session.
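As a minimal sketch of this behavior (using the Hibernate 3 API of that era; Label is the entity cached later in this post, and labelId is a hypothetical identifier), loading the same entity twice within one session only hits the database once:

```java
import org.hibernate.Session;

Session session = sessionFactory.openSession();
// First call hits the database and stores the object in the session cache
Label first = (Label) session.get(Label.class, labelId);
// Second call is served from the first-level cache: no SQL is issued,
// and the very same instance is returned
Label second = (Label) session.get(Label.class, labelId);
assert first == second;
session.close();
```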
But how can we cache objects retrieved across different sessions? That’s what the second-level cache allows. Think of the second-level cache as a process-scoped cache. The second-level cache is pluggable, meaning it can be turned on or off. In addition, it can be configured on a per-class and per-collection basis.
To activate the second-level cache, first modify your Hibernate default settings.
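For example, with EhCache (the provider LibrePlan uses, as explained below), the Hibernate 3 configuration would look roughly like this:

```xml
<!-- hibernate.cfg.xml: turn on the second-level cache
     and plug in EhCache as the provider -->
<property name="hibernate.cache.use_second_level_cache">true</property>
<property name="hibernate.cache.provider_class">
    net.sf.ehcache.hibernate.EhCacheProvider
</property>
```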
Then activate caching for a specific class. I’ll do it for Label in LibrePlan.
<class name="Label" table="label">
    <cache usage="read-write"/>
    <!-- rest of the mapping unchanged -->
</class>
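As mentioned above, caching can also be enabled per collection; a hypothetical sketch for the workReportLines set from the beginning of the post would be:

```xml
<set name="workReportLines" cascade="all-delete-orphan" inverse="true">
    <cache usage="read-write"/>
    <!-- rest of the collection mapping unchanged -->
</set>
```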
One useful tip to check whether the second-level cache is working is to add a log appender to the Log4j configuration.
<appender name="second-level-cache-file" class="org.apache.log4j.RollingFileAppender">
    <param name="file" value="/tmp/libreplan-second-level-cache.log"/>
    <param name="MaxFileSize" value="5000KB"/>
    <param name="MaxBackupIndex" value="4"/>
    <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="%d [%t] %-5p %l - %m%n"/>
    </layout>
</appender>
<!-- assuming the appender is attached to Hibernate's cache logger -->
<logger name="org.hibernate.cache" additivity="false">
    <level value="debug"/>
    <appender-ref ref="second-level-cache-file" />
</logger>
When checking the log, you should see something like this:
2012-05-28 12:05:52,523 [19765316@qtp-4334864-0] DEBUG org.hibernate.cache.ReadWriteCache.get(ReadWriteCache.java:85) - Cache hit: org.libreplan.business.calendars.entities.BaseCalendar#202
2012-05-28 12:06:15,049 [19765316@qtp-4334864-0] DEBUG org.hibernate.cache.ReadWriteCache.put(ReadWriteCache.java:148) - Caching: org.libreplan.business.resources.entities.Resource#1718
2012-05-28 12:06:15,050 [19765316@qtp-4334864-0] DEBUG org.hibernate.cache.ReadWriteCache.put(ReadWriteCache.java:169) - Item was already cached: org.libreplan.business.resources.entities.Resource#1718
Going back to the per-class cache configuration, there are four possible values for the usage attribute: transactional, read-write, nonstrict-read-write and read-only. Use transactional and read-only for read-mostly data; nonstrict-read-write doesn’t guarantee consistency between the cache and the database. Lastly, use read-write for read-mostly data with occasional writes.
Another important aspect of the second-level cache configuration is the cache provider. Different cache providers support different cache operations and features; I won’t go deeper into this. In LibrePlan we use EhCache, which is the most popular open-source second-level cache provider and is widely used in Hibernate projects. I recommend the article Hibernate Caching to learn more about fetching, caching and second-level cache providers.
So, after configuring all these settings, it was time to do some benchmarking and see what the real gain was. I used JMeter, which we had already used in LibrePlan some time ago for measuring performance. The benchmark consisted of a large dataset with 10 use cases. After executing 60 samples for each use case, I stopped the benchmark and got the following results:
- Average. The average response time. On average there was no gain.
- Aggregate_report_min. The minimum response time. With cache the min is 0; without cache, 1.
- Aggregate_report_max. The maximum response time. With cache the max is 9; without cache, 34, so there’s roughly a 75% gain.
- Aggregate_report_stddev. The standard deviation. Without cache it was high because the average is 2 while the min and max are 1 and 34 respectively.
Summing up, the second-level cache and the new batch-fetching strategies produced a big drop in maximum response time, which also dramatically reduced the standard deviation. Without any doubt, a big gain.
Note that benchmark times are measured in milliseconds. To learn more about aggregate reports in JMeter, check JMeter Aggregate Report.
And that’s all. If you’re a LibrePlan user, I hope you enjoyed learning more about the kind of things we do to make LibrePlan run faster. If you’re a LibrePlan developer, or a Java developer in general, I hope you found this information useful and that it helps you in your future projects.