We've covered a lot of topics on Hibernate already on this blog in a nice sequential format. If you haven't had the pleasure of going through all the tutorials, I'd suggest going to this page and having a flip through all the topics.
How to Avoid Duplicate Records from Hibernate Queries
This problem was the bane of my existence when I first started using Hibernate because I had no idea where the problem was coming from.
If you're executing a query and finding that you have a bunch of duplicate records and haven't a clue why, then you're in the right place.
You see the problem is typically caused by having left joins (or optional joins) in your objects. When you have a base object, like say User
and it joins to another table/object in an optional One-to-Many or optional Many-to-Many format, then you may get duplicates.
Consider this scenario… A User
objects joins to the LoginHistory
object, which keeps track of all the times a particular User
has logged into the system. And let's say our user has logged in many times. You'll have a situation where you have many records in the LoginHistory
table.
So what happens when you run a query that joins to the LoginHistory
table? Well it will return as many rows as there are entries for that User
in the LoginHistory
table.
So because of this, Hibernate doesn't massage the data for you, it just returns exactly what it got from the database. The ball is in your court to tell Hibernate what to do with records it has retrieved.
There are two solutions to this problem:
- Declare your joining object as a
Set
- Make use of Distinct Root Entity Results Transformer
The Problem at a Glance
So here's an example of the problem in action. Below you'll see an outline of the optional one-to-many join between the User
class and the LoginHistory
class.
User.java
import java.util.List; import javax.persistence.CascadeType; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.FetchType; import javax.persistence.GeneratedValue; import javax.persistence.GenerationType; import javax.persistence.Id; import javax.persistence.OneToMany; import javax.persistence.Table; @Entity @Table(name="users") public class User { private Long userId; private String username; private String password; private List<LoginHistory> loginHistory; @Id @GeneratedValue(strategy=GenerationType.AUTO) @Column(name="user_id") public Long getUserId() { return userId; } public void setUserId(Long userId) { this.userId = userId; } public String getUsername() { return username; } public void setUsername(String username) { this.username = username; } public String getPassword() { return password; } public void setPassword(String password) { this.password = password; } @OneToMany(cascade=CascadeType.ALL, mappedBy="user", fetch=FetchType.EAGER) public List<LoginHistory> getLoginHistory() { return loginHistory; } public void setLoginHistory(List<LoginHistory> loginHistory) { this.loginHistory = loginHistory; } }
LoginHistory.java
package com.howtoprogramwithjava.example.persistence; import java.util.Date; import javax.persistence.CascadeType; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.FetchType; import javax.persistence.GeneratedValue; import javax.persistence.GenerationType; import javax.persistence.Id; import javax.persistence.JoinColumn; import javax.persistence.ManyToOne; import javax.persistence.Table; @Entity @Table(name="login_history") public class LoginHistory { private Long loginHistoryId; private Date loggedIn; private User user; @Id @GeneratedValue(strategy=GenerationType.AUTO) @Column(name="login_history_id") public Long getLoginHistoryId() { return loginHistoryId; } public void setLoginHistoryId(Long loginHistoryId) { this.loginHistoryId = loginHistoryId; } @Column(name="logged_in") public Date getLoggedIn() { return loggedIn; } public void setLoggedIn(Date loggedIn) { this.loggedIn = loggedIn; } @ManyToOne(cascade=CascadeType.ALL, optional=true, fetch=FetchType.EAGER) @JoinColumn(name="user_id") public User getUser() { return user; } public void setUser(User user) { this.user = user; } }
And then we have the entry in a DAO class that will query for User
s by passing in a username
.
public List<User> getUserByUsername (String username) { Session session = sessionFactory.getCurrentSession(); return session.createCriteria(User.class).add(Restrictions.eq("username", username)).list(); }
So in the code directly above, we are querying the database for User
s that have a username
matching the String
that's being passed in.
The problem with this is that once it runs this query, it will return multiple rows if the User
it finds has logged into the system more than once.
Again, it does this because it's an optional (left) join. If you were to take a look at the query built by Hibernate, it could look something like this (assuming the username
being passed in is tpage
):
select * from users left join login_history on login_history.user_id = users.user_id where users.username = 'tpage';
This query returns multiple results (three results to be exact, in my database) as I've inserted three separate rows into the Login_history
table that point back to the tpage
user.
Alright, so hopefully you fully understand the problem, now let's talk about some solutions to this problem.
Declaring the Join as a Set
On the parent side of the relationship, you'll be declaring a collection of objects that embodies your *-to-Many relationship. What you'll need to do here is to use a Set
as the backing collection as opposed to something like a List
(which is what we were using in the User
class above).
Here's an example of how to set it up correctly:
import java.util.Set; import javax.persistence.CascadeType; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.GeneratedValue; import javax.persistence.GenerationType; import javax.persistence.Id; import javax.persistence.OneToMany; import javax.persistence.Table; @Entity @Table(name="users") public class User { private Long userId; private String username; private String password; private Set<LoginHistory> loginHistory; @Id @GeneratedValue(strategy=GenerationType.AUTO) @Column(name="user_id") public Long getUserId() { return userId; } public void setUserId(Long userId) { this.userId = userId; } public String getUsername() { return username; } public void setUsername(String username) { this.username = username; } public String getPassword() { return password; } public void setPassword(String password) { this.password = password; } @OneToMany(cascade=CascadeType.ALL, mappedBy="user") public Set<LoginHistory> getLoginHistory() { return loginHistory; } public void setLoginHistory(Set<LoginHistory> loginHistory) { this.loginHistory = loginHistory; } }
Again, the key is to use a Set
as the collection of child entities. So in our example above we used Set<LoginHistory>
Distinct Root Entity Results Transformer
This sounds like pure gibberish, but thankfully it's not hard to implement. All you need to do is make sure that you set a ResultsTransformer
when you're building your query.
Remember that DAO query we talked about near the beginning of this article? I'll show you that query again for the sake of completion:
public List<User> getUserByUsername (String username) { Session session = sessionFactory.getCurrentSession(); return session.createCriteria(User.class) .add(Restrictions.eq("username", username)) .list(); }
This query is alright, but as we've seen, it doesn't work well when we have an option join and we're dealing with a List
of entities.
In the event that we NEED to use a List
of entities, we can alter this query to use the ResultsTransformer
. Let's see what the corrected code looks like:
public List<User> getUserByUsername (String username) { Session session = sessionFactory.getCurrentSession(); return session.createCriteria(User.class) .add(Restrictions.eq("username", username)) .setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY) .list(); }
You see the difference? We just added one additional line to the query: setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY)
and voila! You will no longer have duplicates in your returned result set.