First and foremost, let's talk about why you should learn how to use streams.
Streams in Java 8 are essentially the solution to aggregating data easily.
What the heck do I mean by aggregating data?
Consider a database query that uses aggregates.
In SQL, the aggregates we have access to are operations that are performed on groups of data like:
- counting rows
- summing up values
- finding a min or a max
- getting an average
All of these operations can be performed on data that is grouped into buckets.
If you're not at all familiar with the concept of aggregate functions and grouping, I'd highly suggest reading my articles / listening to my podcasts on the topics: SQL Aggregate Functions and SQL Group By
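As a quick preview of where we're headed, here's a minimal sketch of those same aggregates (count, sum, min, max, average) computed in Java 8. The `AggregatePreview` class and the scores passed to it are made up purely for illustration.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

class AggregatePreview {

    // Collects count, sum, min, max and average in a single pass over the values
    static IntSummaryStatistics summarize(int... scores) {
        return IntStream.of(scores).summaryStatistics();
    }

    public static void main(String[] args) {
        // Hypothetical scores, not real data
        IntSummaryStatistics stats = summarize(5048, 2400, 1450, 3205);
        System.out.println("count = " + stats.getCount());   // 4
        System.out.println("sum   = " + stats.getSum());     // 12103
        System.out.println("min   = " + stats.getMin());     // 1450
        System.out.println("max   = " + stats.getMax());     // 5048
        System.out.println("avg   = " + stats.getAverage()); // 3025.75
    }
}
```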
Data Aggregation without Streams
Okay, so if streams are great for data aggregation, then let's see some examples.
Let's say we have a game. In this game there are Players, and Players have high scores. They also have location information (i.e. city and state).
Now, if we wanted to get the top high scores for players in all 50 states in the USA, how would we do this with our Java 7 level coding skills (read: not using streams)?
Let's see how we would get aggregate information without using streams:
(brace yourself, code is coming!)
Player.class
package com.coderscampus;

public class Player {

    private Long id;
    private Integer highScore;
    private String state;
    private String name;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public Integer getHighScore() {
        return highScore;
    }

    public void setHighScore(Integer highScore) {
        this.highScore = highScore;
    }

    public String getState() {
        return state;
    }

    public void setState(String state) {
        this.state = state;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return "\nname=" + name + ", state=" + state + ", highScore=" + highScore;
    }
}
WorkingWithStreams.class
package com.coderscampus;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WorkingWithStreams {

    static List<Player> players = new ArrayList<>();

    public static void main (String[] args) {
        System.out.println(getHighScoresWithoutStreams());
    }

    private static List<Player> getHighScoresWithoutStreams() {
        // without the use of streams, we'll need to break down each step
        // of the process for finding the high scores in each state

        // Step 0: make sure our players List is populated with data... normally
        // we would have a database, but for our purposes, we'll just hard code
        // some example data from the populatePlayerData method.
        populatePlayerData();

        // Step 1: Group the Player data by state
        Map<String, List<Player>> groupedPlayerDataByState = new HashMap<>();
        for (Player player : players) {
            if (groupedPlayerDataByState.containsKey(player.getState())) {
                groupedPlayerDataByState.get(player.getState()).add(player);
            } else {
                groupedPlayerDataByState.put(player.getState(), new ArrayList<Player>(Arrays.asList(player)));
            }
        }

        // Step 2 & 3: Sort grouped Player data by high score and return the highest score
        List<Player> highScores = new ArrayList<>();
        for (Map.Entry<String, List<Player>> entry : groupedPlayerDataByState.entrySet()) {
            Collections.sort(entry.getValue(), new Comparator<Player> () {
                @Override
                public int compare(Player player1, Player player2) {
                    return player2.getHighScore().compareTo(player1.getHighScore());
                }
            });
            highScores.add(entry.getValue().get(0));
        }
        return highScores;
    }

    private static void populatePlayerData () {
        players.add(createPlayer(1L, "John Doe", 5048, "Arizona"));
        players.add(createPlayer(2L, "Jane Doe", 2400, "Arizona"));
        players.add(createPlayer(3L, "Super Man", 1450, "Washington"));
        players.add(createPlayer(4L, "Bat Man", 3205, "Washington"));
        players.add(createPlayer(5L, "Frodo Baggins", 100, "Washington"));
        players.add(createPlayer(6L, "Daenerys Targaryen", 10000, "Colorado"));
        players.add(createPlayer(7L, "John Snow", 9800, "Colorado"));
        players.add(createPlayer(8L, "Arya Stark", 6050, "Colorado"));
        players.add(createPlayer(9L, "Sansa Stark", 7220, "California"));
        players.add(createPlayer(10L, "Tyrion Lannister", 4680, "California"));
    }

    private static Player createPlayer (Long id, String name, Integer highScore, String state) {
        Player aPlayer = new Player();
        aPlayer.setHighScore(highScore);
        aPlayer.setId(id);
        aPlayer.setName(name);
        aPlayer.setState(state);
        return aPlayer;
    }
}
The output of all that code above will be this list of Player high scores by state:
[name=Daenerys Targaryen, state=Colorado, highScore=10000,
name=Sansa Stark, state=California, highScore=7220,
name=John Doe, state=Arizona, highScore=5048,
name=Bat Man, state=Washington, highScore=3205]
Now that's a CRAP load of code just to get this output… but without the use of Streams and Lambdas in Java 8, this is more or less what you would have to do to get the correct output.
Data Aggregation with Streams
Thankfully the alternative is stupidly simple.
First let's have a look at what the code would be to get our desired output using Streams, then I'll explain it.
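What I'm going to show you is the one method that will do the work we need to return the top Player for each state. This method can be called from the main method in our code example above (just make sure populatePlayerData() has been called first, otherwise the players list will be empty).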
// additional imports needed: java.util.Collection, java.util.Optional, java.util.stream.Collectors
private static Collection<Optional<Player>> getHighScoresWithStreams() {
    return players.stream()
        .collect(Collectors.groupingBy(Player::getState,
            Collectors.maxBy(Comparator.comparing(Player::getHighScore))))
        .values();
}
The Optional Class
Obviously, there's a LOT going on inside of this code, but that's why I'm here to try my best to explain what's going on.
Starting from the top, the first thing you may see and scratch your head over is Optional. Optional isn't actually a keyword, it's a class (java.util.Optional) that was also introduced in Java 8. I'll cover that topic in-depth in the next post.
For now, you just need to know that each Optional is a container object that may or may not have a Player object inside of it.
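To make that a little more concrete, here's a minimal sketch of how you might peek inside an Optional. The `OptionalSketch` class and its `describe` helper are hypothetical names I made up for illustration; the Optional methods themselves (`of`, `empty`, `isPresent`, `map`, `orElse`) are standard Java 8.

```java
import java.util.Optional;

class OptionalSketch {

    // Returns a message built from the wrapped name, or a fallback when the Optional is empty
    static String describe(Optional<String> maybeName) {
        return maybeName.map(name -> "Top player: " + name)
                        .orElse("No player found");
    }

    public static void main(String[] args) {
        Optional<String> present = Optional.of("John Doe");
        Optional<String> empty = Optional.empty();

        System.out.println(present.isPresent()); // true
        System.out.println(empty.isPresent());   // false
        System.out.println(describe(present));   // Top player: John Doe
        System.out.println(describe(empty));     // No player found
    }
}
```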
Stream() Method
Next up is the actual stream method. This is the method you call when you want to start the process of creating a stream from your Collection.
In this case, we have a collection of Player objects, and this is the collection that we want to start streaming.
Once we've started our stream, there are a few things that we can do to our data. Sometimes you'd want to filter your data so as to eliminate data points that you don't care about, but that's not what we're doing in this example (as we want to consider all data points). I'll show an example with filtering next.
What we do want to do is group our players together by some criteria. In our example, we want to group by the Player's state.
This is accomplished by executing the collect method. The collect method is a terminal operation, so it's usually the last operation that you perform on a stream. (In our example the values() call comes after it, but that's a call on the resulting Map, not on the stream.) Again, I'll show more examples after this.
In order to collect our data such that it will be returned as a collection of Players, we need to tell Java how we'd like to collect our data. So we tell Java that we'd like to group our data together using the Collectors.groupingBy method.
You can think of Collectors as a utility class. It's got a whole bunch of static methods that can be used to help us tell Java how we want to collect our data.
So when we call the Collectors.groupingBy method, we need to tell it how we'd like to group our data… so we say groupingBy(Player::getState).
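Here's a rough sketch of what grouping alone (with no second parameter) gives you back: a Map of state to a List of Players. The nested `Player` class is a trimmed-down stand-in for the article's Player, and `GroupingOnlySketch`, `samplePlayers` and `groupByState` are names I've assumed for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingOnlySketch {

    // Trimmed-down stand-in for the article's Player class
    static class Player {
        private final String name;
        private final String state;

        Player(String name, String state) {
            this.name = name;
            this.state = state;
        }

        String getName() { return name; }
        String getState() { return state; }
    }

    static List<Player> samplePlayers() {
        return Arrays.asList(
            new Player("John Doe", "Arizona"),
            new Player("Jane Doe", "Arizona"),
            new Player("Super Man", "Washington"),
            new Player("Bat Man", "Washington"),
            new Player("Frodo Baggins", "Washington"));
    }

    // With a single argument, groupingBy puts every Player of a state into a List
    static Map<String, List<Player>> groupByState(List<Player> players) {
        return players.stream()
            .collect(Collectors.groupingBy(Player::getState));
    }

    public static void main(String[] args) {
        groupByState(samplePlayers()).forEach((state, group) ->
            System.out.println(state + " -> " + group.size() + " players"));
    }
}
```

This is essentially "Step 1" from the Java 7 version, squeezed into one call.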
Now, we're not done with the grouping function just yet. If all we wanted to do was to group our data by the Player's state, then sure, we'd be done… but we want to get the MAX value for each state.
Luckily the Collectors.groupingBy method can also take a second parameter. This second parameter is the “downstream” collector, which performs a reduction on the data in each group. All this means is that it will boil each bucket down to a single result in some way. For our purposes this reduction is going to reduce each bucket to just the Player with the maximum high score.
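The downstream collector doesn't have to be a max. As a sketch (not part of the original example), here's how the SQL-style count and average per state might look using Collectors.counting() and Collectors.averagingInt(). The `DownstreamSketch` class, its helper methods and the trimmed-down `Player` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DownstreamSketch {

    // Trimmed-down stand-in for the article's Player class
    static class Player {
        private final String state;
        private final int highScore;

        Player(String state, int highScore) {
            this.state = state;
            this.highScore = highScore;
        }

        String getState() { return state; }
        int getHighScore() { return highScore; }
    }

    static List<Player> samplePlayers() {
        return Arrays.asList(
            new Player("Arizona", 5048),
            new Player("Arizona", 2400),
            new Player("Washington", 1450),
            new Player("Washington", 3205),
            new Player("Washington", 100));
    }

    // Roughly: SELECT state, COUNT(*) ... GROUP BY state
    static Map<String, Long> countByState(List<Player> players) {
        return players.stream()
            .collect(Collectors.groupingBy(Player::getState, Collectors.counting()));
    }

    // Roughly: SELECT state, AVG(high_score) ... GROUP BY state
    static Map<String, Double> averageScoreByState(List<Player> players) {
        return players.stream()
            .collect(Collectors.groupingBy(Player::getState,
                Collectors.averagingInt(Player::getHighScore)));
    }

    public static void main(String[] args) {
        System.out.println(countByState(samplePlayers()).get("Arizona"));           // 2
        System.out.println(averageScoreByState(samplePlayers()).get("Washington")); // 1585.0
    }
}
```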
So how do we tell Java that we'd like to reduce our data set to just the maximum values? Well lucky for us, we have a Collectors.maxBy utility method.
The Collectors.maxBy method takes one parameter, a Comparator.
Now from our last lesson, we learned about Lambdas and how to use them with a Comparator. You could just pass in a lambda expression like we already learned about… or, you could make use of yet another utility method: Comparator.comparing.
With the Comparator.comparing method, you just pass in the method of the class you'd like to compare on. So for our example we pass in Player::getHighScore. The resulting Comparator orders values in their natural order (lowest to highest), and Collectors.maxBy picks whichever element that ordering considers the greatest. If you wanted to get the reverse order, then you would just append .reversed() to the Comparator.comparing() method call.
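To see the ordering for yourself, here's a small sketch that sorts a few players with Comparator.comparing and then with .reversed(). The `ComparatorSketch` class, the `namesSortedBy` helper and the cut-down `Player` are hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class ComparatorSketch {

    // Trimmed-down stand-in for the article's Player class
    static class Player {
        private final String name;
        private final Integer highScore;

        Player(String name, Integer highScore) {
            this.name = name;
            this.highScore = highScore;
        }

        String getName() { return name; }
        Integer getHighScore() { return highScore; }
    }

    static List<Player> samplePlayers() {
        return Arrays.asList(
            new Player("John Doe", 5048),
            new Player("Jane Doe", 2400),
            new Player("Daenerys Targaryen", 10000));
    }

    // Sorts the players with the given Comparator and returns just their names
    static List<String> namesSortedBy(List<Player> players, Comparator<Player> order) {
        return players.stream()
            .sorted(order)
            .map(Player::getName)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Comparator<Player> ascending = Comparator.comparing(Player::getHighScore);
        Comparator<Player> descending = ascending.reversed();

        // [Jane Doe, John Doe, Daenerys Targaryen]
        System.out.println(namesSortedBy(samplePlayers(), ascending));
        // [Daenerys Targaryen, John Doe, Jane Doe]
        System.out.println(namesSortedBy(samplePlayers(), descending));
    }
}
```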
And finally, now that we've told our collect method call how we'd like to collect our data, we're left with a Map with Strings as the keys (the Player's state) and an Optional<Player> as each value. Remember… we grouped Players by state and then reduced each group down to its top scorer, so that's what we're getting after all our collector calls ;)
So what we truly want to end up with is not a Map, but just the collection of top players. So how do we get the values from a Map? We invoke the .values() method on our Map, which hands back a Collection of those Optional<Player> values.
And voila! We have our result set.
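One catch: what comes back is a Collection<Optional<Player>>, not a List<Player>. If you'd rather end up with the same return type as the Java 7 version, one possible approach (a sketch under my own assumptions, not the only way to do it) is to stream the values and unwrap each Optional. `UnwrapSketch`, `topPlayerPerState` and the reduced `Player` class are names invented for this example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class UnwrapSketch {

    // Trimmed-down stand-in for the article's Player class
    static class Player {
        private final String name;
        private final String state;
        private final Integer highScore;

        Player(String name, String state, Integer highScore) {
            this.name = name;
            this.state = state;
            this.highScore = highScore;
        }

        String getName() { return name; }
        String getState() { return state; }
        Integer getHighScore() { return highScore; }
    }

    static List<Player> samplePlayers() {
        return Arrays.asList(
            new Player("John Doe", "Arizona", 5048),
            new Player("Jane Doe", "Arizona", 2400),
            new Player("Super Man", "Washington", 1450),
            new Player("Bat Man", "Washington", 3205));
    }

    static List<Player> topPlayerPerState(List<Player> players) {
        return players.stream()
            .collect(Collectors.groupingBy(Player::getState,
                Collectors.maxBy(Comparator.comparing(Player::getHighScore))))
            .values().stream()
            // groupingBy never creates an empty group, so every Optional here holds a value;
            // the isPresent check is only a safety net before calling get()
            .filter(Optional::isPresent)
            .map(Optional::get)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Order depends on the underlying HashMap, so it isn't guaranteed
        for (Player top : topPlayerPerState(samplePlayers())) {
            System.out.println(top.getState() + ": " + top.getName());
        }
    }
}
```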
So, with four lines of code, it took me about 40 lines of text to explain what was going on. That's both the upside AND the downside of Java streams. They're efficient in terms of conciseness, but there's a lot going on in the background that you need to understand.
Another Java Streams Example
As promised, I'd like to talk about another example.
This time, let's just say that we want to get a list of Players who are over the age of 20. Maybe the reason for this is that you want to throw an adults only party?
First, let's assume that we have the appropriate age property coded into our Player object. I'm too lazy to re-post the entire Player object with this new property, but we should all be familiar with how to add a property to an object at this point, right?
Now, how do we get a list of Players who are older than 20 with streams? Let's have a look:
players.stream()
    .filter(p -> p.getAge() > 20)
    .collect(Collectors.toList());
Here we can see the use of the filter method.
Filtering is pretty straightforward. The filter method takes one parameter, a Predicate object.
A Predicate is a special (functional) interface provided by Java as of version 8. Recall that we talked about another one of these special interfaces in the last lesson… it was called the Consumer interface.
The difference between a Consumer interface and a Predicate interface is that the Consumer doesn't return anything, but the Predicate returns a boolean.
This is important, because if we were to code a method for filtering, we would need to return a boolean value to say if we should or shouldn't include a particular data point.
So, this is why the filter method takes a Predicate.
Here's a peek at what the Predicate looks like (default methods omitted):
@FunctionalInterface
public interface Predicate<T> {
    boolean test(T t);
}
So using Lambda expressions, we can concisely write out an appropriate filter like we did above with: filter(p -> p.getAge() > 20).
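Since a Predicate is just an object, you can also store one in a variable and call it yourself. Here's a minimal sketch; the `PredicateSketch` class and the `OLDER_THAN_20` constant are made-up names, while `test`, `and` and `negate` are standard methods on java.util.function.Predicate.

```java
import java.util.function.Predicate;

class PredicateSketch {

    // The same rule as the filter above, but working on a plain age value
    static final Predicate<Integer> OLDER_THAN_20 = age -> age > 20;

    public static void main(String[] args) {
        System.out.println(OLDER_THAN_20.test(25)); // true
        System.out.println(OLDER_THAN_20.test(20)); // false (20 is not "over" 20)

        // Predicates can be combined or flipped
        Predicate<Integer> under65 = age -> age < 65;
        System.out.println(OLDER_THAN_20.and(under65).test(70)); // false
        System.out.println(OLDER_THAN_20.negate().test(18));     // true
    }
}
```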
Then finally, we fire off our collect method, which will allow us to return all of our filtered data. In this case we use the Collectors.toList() utility method to convert our data points into a List of Players (very handy).
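If you'd like to run the filter yourself, here's a self-contained sketch. The `Player` class below is a hypothetical cut-down version with just a name and an age, the ages are invented, and `FilterSketch` and `namesOlderThan` are names I've assumed for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FilterSketch {

    // Hypothetical cut-down Player with the age property the article assumes
    static class Player {
        private final String name;
        private final int age;

        Player(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        int getAge() { return age; }
    }

    static List<Player> samplePlayers() {
        // Ages are made up for the example
        return Arrays.asList(
            new Player("John Doe", 34),
            new Player("Jane Doe", 19),
            new Player("Bat Man", 20),
            new Player("Arya Stark", 21));
    }

    // Keeps only the players strictly older than minAge and returns their names
    static List<String> namesOlderThan(List<Player> players, int minAge) {
        return players.stream()
            .filter(p -> p.getAge() > minAge)
            .map(Player::getName)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(namesOlderThan(samplePlayers(), 20)); // [John Doe, Arya Stark]
    }
}
```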
Example Using Map Functions
So, we've seen how to use some neat utility methods with the Collectors class, and we've seen how to filter results. Now let's have a look at how we can use the mapping functionality with streams.
With the mapping functions, you can translate each element of your data set into something else (often something smaller, like a single property) and return that instead.
A great example can be shown with the getPlayerIds() method that we'll build.
Let's assume that we want to get a List of the Player IDs. We can stream the players and return just their IDs with the map function:
public List<Long> getPlayerIds() {
    return players.stream()
        .map(Player::getId) // instead of returning all the players, just return the player IDs
        .collect(Collectors.toList());
}
Hopefully this one is fairly straightforward to understand. The key here is that we want to return a Player's ID, so we use the map function to facilitate the translation from Player to Long.
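Here's a runnable sketch of the same idea, which also shows that map changes the type of each element but not how many elements there are. `MapSketch`, `samplePlayers` and the two-field `Player` are stand-ins I've assumed for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapSketch {

    // Trimmed-down stand-in for the article's Player class
    static class Player {
        private final Long id;
        private final String name;

        Player(Long id, String name) {
            this.id = id;
            this.name = name;
        }

        Long getId() { return id; }
        String getName() { return name; }
    }

    static List<Player> samplePlayers() {
        return Arrays.asList(
            new Player(1L, "John Doe"),
            new Player(2L, "Jane Doe"),
            new Player(3L, "Super Man"));
    }

    // Player -> Long: three players in, three IDs out
    static List<Long> playerIds(List<Player> players) {
        return players.stream()
            .map(Player::getId)
            .collect(Collectors.toList());
    }

    // Player -> String works the same way
    static List<String> playerNames(List<Player> players) {
        return players.stream()
            .map(Player::getName)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(playerIds(samplePlayers()));   // [1, 2, 3]
        System.out.println(playerNames(samplePlayers())); // [John Doe, Jane Doe, Super Man]
    }
}
```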
Another Random Mapping Example
There are other mapping functions that we can make use of, like the mapToInt method. Let's have a look at a simple example:
Stream.of("a1", "a2", "a3")
    .map(s -> s.substring(1))        // returns a stream of Strings: ["1", "2", "3"]
    .mapToInt(Integer::parseInt)     // returns an IntStream of primitive ints: [1, 2, 3]
    .max()                           // returns an OptionalInt: 3
    .ifPresent(System.out::println); // prints 3
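To tie mapToInt back to our Players, here's a sketch of how you might total, average and max out the high scores once you have an IntStream. The `MapToIntSketch` class, its helper methods and the one-field `Player` are assumptions made for this example; `sum`, `average` and `max` are standard IntStream methods.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;
import java.util.OptionalInt;

class MapToIntSketch {

    // Trimmed-down stand-in for the article's Player class
    static class Player {
        private final int highScore;

        Player(int highScore) {
            this.highScore = highScore;
        }

        int getHighScore() { return highScore; }
    }

    static List<Player> samplePlayers() {
        return Arrays.asList(new Player(5048), new Player(2400));
    }

    static int totalScore(List<Player> players) {
        return players.stream().mapToInt(Player::getHighScore).sum();
    }

    // average() returns an OptionalDouble because an empty stream has no average
    static OptionalDouble averageScore(List<Player> players) {
        return players.stream().mapToInt(Player::getHighScore).average();
    }

    // max() returns an OptionalInt for the same reason
    static OptionalInt maxScore(List<Player> players) {
        return players.stream().mapToInt(Player::getHighScore).max();
    }

    public static void main(String[] args) {
        System.out.println(totalScore(samplePlayers()));                  // 7448
        System.out.println(averageScore(samplePlayers()).getAsDouble()); // 3724.0
        System.out.println(maxScore(samplePlayers()).getAsInt());        // 5048
    }
}
```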