This is a problem on HackerRank that I spent a good deal of time trying to solve. I decided to write up my process for solving the problem in the interest of reinforcing what I’ve learned.
The problem description states that you’re to determine whether an expenditure is fraudulent activity by comparing a day’s spending to the median of a previous number of days spending.
You’re provided with an array
expenditures[i] represents the amount of money spent on day
i and an integer
d representing the number of days to look back in the expenditures array to determine the median spending.
Essentially, the goal is to use a sliding window of size
d in the
expenditures array to determine a number,
n = median x 2 and if
expenditures[i] is greater than or equal to this number, classify it as fraudulent. Return an integer representing all the days in the array that could be classified as fraudulent.
My first solution was a bit of a brute force approach. I used a sliding window to capture the previous
d days of expenditures, sorted the days and determined the median. This was fine for test cases 0, 6 and 7 but timed out for test cases 1, 2, 3, 4 and 5.
def activityNotifications(expenditure, d): total = 0 for i in range(d, len(expenditure)): days = expenditure[i-d:i] arr = sorted(days) if d%2 != 0: median = arr[(d+1)//2 - 1] else: median = (arr[(d)//2 - 1] + arr[(d+1)//2])/2 if expenditure[i] >= 2*median: total +=1 return total
While technically correct, this solution times out on HackerRanks larger test inputs. I attribute most of the time complexity to the
sorted() function which has a worst case performance of
O( n log n). So what to do?
After reading around a bit in the discussions, I noticed someone had mentioned counting-sort. I was not previously familiar with this technique, but after some reading I learned that it’s an effective sorting algorithm in cases where the variation in integer keys is not significantly greater than the number of items being sorted. Perfect for our use case where
1 ≤ d ≤ n and
0 ≤ expenditure[i] ≤ 200. So how do we use it?
We build a frequency array for the range 0 through 201 and use it to store the frequency of each expenditure for the previous
d days. The frequency array can be traversed until the sum of
frequency[i] is greater than or equal to
In the following solution, I use
counts to track these frequencies and I’ve written a separate function
get_median to return the median.
# Complete the activityNotifications function below. def activityNotifications(expenditure, d): total = 0 counts =  * 201 for day in range(d): counts[expenditure[day]] += 1 if d%2 == 0: #even odd = False else: #odd odd = True oldest_day = 0 for i in range(d, len(expenditure)): median = get_median(counts, odd, d) if expenditure[i] >= 2*median: total += 1 counts[expenditure[oldest_day]] -= 1 counts[expenditure[i]] += 1 oldest_day += 1 return total def get_median(counts, odd, d): temp = 0 left, right = -1, -1 for i, v in enumerate(counts): temp += v if odd: if temp >= ((d//2)+1): return i else: if temp >= (d//2): left = i if temp > (d//2) and left != -1: right = i return (left + right) // 2 if temp > (d//2) and left == -1: return i
This solution passes test cases 0, 2, 5, 6 and 7 but fails 1, 3 and 4. I’m unable to determine exactly why the failure at this time, but I plan to revisit this to determine the cause.
If you find any errors in my code, let me know. I’d love to find a solution that passes all test cases!
Here are some other interesting solutions to this problem, not necessarily in Python: