Hi all,

This will be a bit long-winded, but I beg you to hear me out...

I have a table containing information about financial reports that have been produced. It basically looks like the following, under the best circumstances:

Code:
============================================================
| id | acct | month | year | date_created | routing_status |
|----------------------------------------------------------|
|  1 |    A |     5 | 2014 |   time stamp |           NULL |
|  2 |    B |     5 | 2014 |   time stamp |           NULL |
|  3 |    C |     5 | 2014 |   time stamp |           NULL |
|  4 |    D |     5 | 2014 |   time stamp |           NULL |
|  5 |    E |     5 | 2014 |   time stamp |           NULL |
============================================================
There are more columns in the real table, but for my purposes here, these are the columns we will be worried about.

From this, I am trying to find all reports that do not have a "routing_status" value of 1. NULL is the default state of this column. All newly-run reports have the routing_status value set to default (NULL). When a report is routed sucessfully it is updated to an integer value of 1. Routing errors and any retracted routing status will be flagged with other integers, which aren't important to note except that they will NOT be 1. Getting the unrouted reports from the basic data above is simple as pie with only one instance of a given account number.

BUT...I will have some (most) of the same accounts showing up with different month/year values AND I may have more than one version of any given financial report even in the same month/year if a report is re-run (these reports are done in batches, so re-running a report should not mean that it shows up on an "unrouted reports" list which is what I am trying to build). So I need a SELECT query that will essentially do the following:

  1. SELECT all reports and group them by year, month, and acct (these reports are done monthly, so most accounts will repeat in subsequent months until the account is closed).
  2. For each of those year/month/report combinations, return - at most - only one record (either the whole row or at least the report's "id" column value).
  3. The one record returned for each of those combinations should be either one with a "routing_status" value of 1, or if no rows qualify then we should return the record with the greatest time stamp value in the "date_created" column.


If I can get a query working that does all of that, I can make it into a subquery and filter out the already-routed reports in the next step. I don't need help with that final part, I just need a hand on how to basically get either 1) a previously-routed report id, if present; else 2) the report id of the row with the max creation date for each combination of year/month/acct.

As the table gets more populated things will get to look more like this:

Code:
============================================================
| id | acct | month | year | date_created | routing_status |
|----------------------------------------------------------|
|  1 |    A |     5 | 2014 |   time stamp |           NULL |
|  2 |    B |     5 | 2014 |   time stamp |              1 |
|  3 |    C |     5 | 2014 |   time stamp |              1 |
|  4 |    D |     5 | 2014 |   time stamp |           NULL |
|  5 |    E |     5 | 2014 |   time stamp |           NULL |
|  6 |    A |     4 | 2014 |   time stamp |           NULL |
|  7 |    C |     4 | 2014 |   time stamp |           NULL |
|  8 |    D |     4 | 2014 |   time stamp |           NULL |
|  9 |    F |     4 | 2014 |   time stamp |           NULL |
| 10 |    A |     5 | 2014 |   time stamp |              1 |
| 11 |    A |     5 | 2014 |   time stamp |              0 |
| 12 |    B |     5 | 2014 |   time stamp |           NULL |
| 13 |    C |     5 | 2014 |   time stamp |           NULL |
| 14 |    D |     5 | 2014 |   time stamp |              1 |
============================================================
So, for example, I would want the above table to be pared down to just the following:

Code:
============================================================
| id | acct | month | year | date_created | routing_status |
|----------------------------------------------------------|
|  2 |    B |     5 | 2014 |   time stamp |              1 |
|  3 |    C |     5 | 2014 |   time stamp |              1 |
|  5 |    E |     5 | 2014 |   time stamp |           NULL |
|  6 |    A |     4 | 2014 |   time stamp |           NULL |
|  7 |    C |     4 | 2014 |   time stamp |           NULL |
|  8 |    D |     4 | 2014 |   time stamp |           NULL |
|  9 |    F |     4 | 2014 |   time stamp |           NULL |
| 10 |    A |     5 | 2014 |   time stamp |              1 |
| 14 |    D |     5 | 2014 |   time stamp |              1 |
============================================================
Here you can see that we have (for each year/month/acct combination) only the routed reports or those with the latest creation date. For the most complex situation I have highlighted account "A" for a 5/2014 report so you can see what I am trying to achieve. The record returned for that month/year report on account "A" was not the oldest or newest record, but it was marked as routed. The other two records for that report period and account are now ommitted, but without impacting in any way the report record for that same account for the 4/2014 report period.

Hopefully somebody is following me here...To me, it seemed simple. All I have to do is group them by "year,month,acct" and then do some WHERE clauses to pick out the correct record from each group. But that hasn't panned out the way I had hoped. I can't even get to the point of implementing WHERE clauses because my GROUP BY result is not coming back as expected.

This is the closest I have been able to get, but it isn't doing the job:

Code:
SELECT test.* FROM (
	SELECT status_specific.* FROM (
		SELECT date_specific.* FROM (
			SELECT * FROM reports
			GROUP BY year,month,acct #shouldn't this be grouping?
		) date_specific
		ORDER BY date_specific.routing_status DESC, date_specific.acct ASC
	) status_specific
	GROUP BY status_specific.acct
) test ORDER BY test.acct ASC
That gets me some weird half and half mix of 5/2014 and 4/2014 reports (which are the only two reporting periods in my test set currently). It correctly gives me only one row per account, but it fails to give me one row, per account, PER PERIOD, and to make sure it's either a routed report or else the newest report version.

Anyway, it's probably obvious that I'm pretty new to SQL syntax. I have been trying this with the GROUP BY expression. I'm starting to think GROUP BY isn't what I need. I really just can't get them to group into year/month/acct sets but perhaps I am fundamentally misunderstanding the abilities/purpose of the GROUP BY expression.

I can't seem to wrap my brain around how to do this. Can anyone lend a hand?