Most of the time I have used the “relatively new” streams in Java, it was for straightforward tasks involving operations on a single stream. I sometimes use more sophisticated collectors, such as Collectors.toMap, but recently I had to use more complicated constructs to solve a real problem. I thought it might be interesting to write about it, as the examples one usually reads about are synthetic and lack the focus of solving a real problem.
In this case, my problem was gathering data from some Jira reports and producing statistics from it. I gathered the data simply by exporting some Jira filters to Excel files and reading them with the Apache POI library (also using streams). I was only interested in part of the data (issues, components, versions and the time reported by users), so I created two simple classes to hold it:
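Here is a minimal sketch of what those two classes might look like. The class names come from the text. The field names, the accessors, and the choice of storing the reported time as double hours are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical data holders: field names, accessors and the use of double
// hours for the reported time are assumptions, not the original classes.
class Report {
    private final String user;   // Jira username that logged the time
    private final double hours;  // time logged on the issue, assumed to be in hours

    Report(String user, double hours) {
        this.user = user;
        this.hours = hours;
    }

    String getUser() { return user; }
    double getHours() { return hours; }
}

class Issue {
    private final String component;
    private final String version;
    private final List<Report> reports;

    Issue(String component, String version, List<Report> reports) {
        this.component = component;
        this.version = version;
        // Defensive, read-only copy: an issue's reports do not change after loading
        this.reports = Collections.unmodifiableList(new ArrayList<>(reports));
    }

    String getComponent() { return component; }
    String getVersion() { return version; }
    List<Report> getReports() { return reports; }
}
```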
Basically, an issue holds the component and version attributes and a list of reports, each of which includes the username and the time reported on that task. In this case, I wasn’t interested in the dates of the reports, as they had already been filtered at the Jira level.
First, I read the data from several Excel files: one with the issue definitions, and others with the time reports for a given date interval. I ended up with a “List<Issue> nonEmptyIssues” structure holding the issues that have at least one useful report.
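Assuming the time reports have already been attached to their issues, keeping only the issues with useful reports is a single filter. This sketch reuses minimal versions of the hypothetical Issue and Report classes. IssueFilter and nonEmptyIssues are illustrative names.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal stand-ins for the hypothetical Issue and Report classes
class Report {
    private final String user;
    private final double hours; // assumed to be in hours
    Report(String user, double hours) { this.user = user; this.hours = hours; }
    String getUser() { return user; }
    double getHours() { return hours; }
}

class Issue {
    private final String component;
    private final String version;
    private final List<Report> reports;
    Issue(String component, String version, List<Report> reports) {
        this.component = component;
        this.version = version;
        this.reports = reports;
    }
    String getComponent() { return component; }
    String getVersion() { return version; }
    List<Report> getReports() { return reports; }
}

class IssueFilter {
    // Keeps only the issues that have at least one time report
    static List<Issue> nonEmptyIssues(List<Issue> issues) {
        return issues.stream()
                .filter(issue -> !issue.getReports().isEmpty())
                .collect(Collectors.toList());
    }
}
```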
Once the data was loaded, the questions to answer were:
· How much time was reported by each user?
· How much time was reported for each component?
· How much time was reported for each version?
The user report is the simplest, because the user attribute is at the report level. I just had to convert the stream of issues into a stream of reports and then add up the time of the reports for each user. In “stream-speak”, that means flattening the stream, grouping by user and then reducing each list of “reports by user”. The final code looks like this:
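Here is a minimal sketch of that pipeline. It assumes hypothetical Issue/Report classes with getReports(), getUser() and getHours() accessors, with time stored as double hours. Collectors.summingDouble performs the per-user reduction here. That is a choice made for this sketch: Collectors.reducing(0.0, Report::getHours, Double::sum) would express the same reduction more literally.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Minimal stand-ins for the hypothetical Issue and Report classes
class Report {
    private final String user;
    private final double hours; // assumed to be in hours
    Report(String user, double hours) { this.user = user; this.hours = hours; }
    String getUser() { return user; }
    double getHours() { return hours; }
}

class Issue {
    private final String component;
    private final String version;
    private final List<Report> reports;
    Issue(String component, String version, List<Report> reports) {
        this.component = component;
        this.version = version;
        this.reports = reports;
    }
    String getComponent() { return component; }
    String getVersion() { return version; }
    List<Report> getReports() { return reports; }
}

class UserStats {
    static Map<String, Double> timeByUser(List<Issue> issues) {
        return issues.stream()
                // Flatten: Stream<Issue> -> Stream<Report>
                .flatMap(issue -> issue.getReports().stream())
                // Group the reports by username and sum the hours of each group
                .collect(Collectors.groupingBy(
                        Report::getUser,
                        Collectors.summingDouble(Report::getHours)));
    }
}
```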
The component and version reports are a bit more complicated. Those are attributes at the Issue level, but we want to aggregate the time at the _Report_ level. In this case, we have to group first. Then, for the values of each group (a list of issues), we transform them into a stream of reports and aggregate them. Fortunately, Collectors.groupingBy has an overload that accepts a downstream collector, which lets you operate further on the values associated with each key. The final code looks like this:
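Here is a sketch of the grouping with Collectors.flatMapping (Java 9 or later), again on the hypothetical Issue/Report classes. The generic timeBy helper takes the issue-level key as a Function. That is an illustrative design choice that lets the component and version reports share one pipeline.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Minimal stand-ins for the hypothetical Issue and Report classes
class Report {
    private final String user;
    private final double hours; // assumed to be in hours
    Report(String user, double hours) { this.user = user; this.hours = hours; }
    String getUser() { return user; }
    double getHours() { return hours; }
}

class Issue {
    private final String component;
    private final String version;
    private final List<Report> reports;
    Issue(String component, String version, List<Report> reports) {
        this.component = component;
        this.version = version;
        this.reports = reports;
    }
    String getComponent() { return component; }
    String getVersion() { return version; }
    List<Report> getReports() { return reports; }
}

class GroupStats {
    // Groups issues by an issue-level key, then flattens each group's issues
    // into their reports and sums the hours (flatMapping needs Java 9+)
    static Map<String, Double> timeBy(List<Issue> issues, Function<Issue, String> key) {
        return issues.stream()
                .collect(Collectors.groupingBy(
                        key,
                        Collectors.flatMapping(
                                issue -> issue.getReports().stream(),
                                Collectors.summingDouble(Report::getHours))));
    }
}
```

For example, timeBy(nonEmptyIssues, Issue::getComponent) gives the hours per component, and timeBy(nonEmptyIssues, Issue::getVersion) gives the hours per version.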
Unfortunately, the collector we need to turn the collection of issues into a stream of reports (Collectors.flatMapping) was not added to Java until version 9. With a bit of StackOverflow and a small extension, though, voilà, you can backport it to Java 8:
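Here is one way to write such a backport, modelled on the behaviour of the JDK 9 Collectors.flatMapping. It wraps the downstream collector and, in the accumulator, feeds every element of each mapped stream to the downstream accumulator. The class name MoreCollectors is hypothetical, and this is a sketch rather than the exact code from any particular StackOverflow answer.

```java
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Stream;

// Hypothetical utility class: a Java 8 backport of Collectors.flatMapping
final class MoreCollectors {
    private MoreCollectors() {}

    static <T, U, A, R> Collector<T, ?, R> flatMapping(
            Function<? super T, ? extends Stream<? extends U>> mapper,
            Collector<? super U, A, R> downstream) {
        BiConsumer<A, ? super U> downstreamAccumulator = downstream.accumulator();
        return Collector.of(
                downstream.supplier(),
                (container, element) -> {
                    // Map the element to a stream, push each item into the
                    // downstream container, and close the stream afterwards
                    try (Stream<? extends U> mapped = mapper.apply(element)) {
                        if (mapped != null) {
                            mapped.sequential()
                                  .forEach(u -> downstreamAccumulator.accept(container, u));
                        }
                    }
                },
                // Container type is unchanged, so reuse the downstream's
                // combiner, finisher and characteristics as they are
                downstream.combiner(),
                downstream.finisher(),
                downstream.characteristics().toArray(new Collector.Characteristics[0]));
    }
}
```

On Java 8, replacing Collectors.flatMapping with MoreCollectors.flatMapping in the grouping pipeline is the only change needed.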
And that’s it: you can run these pieces of code to get aggregated stats for each user, component and version, and the code is much simpler than it used to be. This is the kind of thing that makes streams really shine.
PS: As a side note, if you encapsulate some functions properly, reading the Excel sheets can also look quite elegant:
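One such encapsulated function is a small helper that turns any Iterable into a Stream via StreamSupport. Apache POI’s Sheet extends Iterable<Row> and Row extends Iterable<Cell>, so the helper applies to sheets and rows directly. To keep this sketch runnable without POI, it is demonstrated on rows represented as List<String>. StreamHelpers and column are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class; with POI, a Sheet (Iterable<Row>) or a
// Row (Iterable<Cell>) can be passed to stream(...) directly
final class StreamHelpers {
    private StreamHelpers() {}

    // Wraps any Iterable in a sequential Stream without copying it
    static <T> Stream<T> stream(Iterable<T> iterable) {
        return StreamSupport.stream(iterable.spliterator(), false);
    }

    // Reads one column below a header row; with POI the rows would be Row
    // objects and the lookup would go through Row.getCell(index) instead
    static List<String> column(Iterable<List<String>> rows, int index) {
        return stream(rows)
                .skip(1)                     // drop the header row
                .map(row -> row.get(index))  // pick the requested column
                .collect(Collectors.toList());
    }
}
```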