This is a problem on HackerRank that I spent a good deal of time trying to solve. I decided to write up my process for solving the problem in the interest of reinforcing what I’ve learned.
The problem description states that you’re to determine whether an expenditure is fraudulent activity by comparing a day’s spending to the median of a previous number of days spending.
You’re provided with an array expenditures
where expenditures[i]
represents the amount of money spent on day i
and an integer d
representing the number of days to look back in the expenditures array to determine the median spending.
Essentially, the goal is to use a sliding window of size d
in the expenditures
array to determine a number, n = median x 2
and if expenditures[i]
is greater than or equal to this number, classify it as fraudulent. Return an integer representing all the days in the array that could be classified as fraudulent.
My first solution was a bit of a brute force approach. I used a sliding window to capture the previous d
days of expenditures, sorted the days and determined the median. This was fine for test cases 0, 6 and 7 but timed out for test cases 1, 2, 3, 4 and 5.
First Solution
def activityNotifications(expenditure, d):
total = 0
for i in range(d, len(expenditure)):
days = expenditure[i-d:i]
arr = sorted(days)
if d%2 != 0:
median = arr[(d+1)//2 - 1]
else:
median = (arr[(d)//2 - 1] + arr[(d+1)//2])/2
if expenditure[i] >= 2*median:
total +=1
return total
While technically correct, this solution times out on HackerRanks larger test inputs. I attribute most of the time complexity to the sorted()
function which has a worst case performance of O( n log n)
. So what to do?
After reading around a bit in the discussions, I noticed someone had mentioned counting-sort. I was not previously familiar with this technique, but after some reading I learned that it’s an effective sorting algorithm in cases where the variation in integer keys is not significantly greater than the number of items being sorted. Perfect for our use case where 1 ≤ d ≤ n
and 0 ≤ expenditure[i] ≤ 200
. So how do we use it?
We build a frequency array for the range 0 through 201 and use it to store the frequency of each expenditure for the previous d
days. The frequency array can be traversed until the sum of frequency[i]
is greater than or equal to d
.
In the following solution, I use counts
to track these frequencies and I’ve written a separate function get_median
to return the median.
Second Solution
# Complete the activityNotifications function below.
def activityNotifications(expenditure, d):
total = 0
counts = [0] * 201
for day in range(d):
counts[expenditure[day]] += 1
if d%2 == 0: #even
odd = False
else: #odd
odd = True
oldest_day = 0
for i in range(d, len(expenditure)):
median = get_median(counts, odd, d)
if expenditure[i] >= 2*median:
total += 1
counts[expenditure[oldest_day]] -= 1
counts[expenditure[i]] += 1
oldest_day += 1
return total
def get_median(counts, odd, d):
temp = 0
left, right = -1, -1
for i, v in enumerate(counts):
temp += v
if odd:
if temp >= ((d//2)+1):
return i
else:
if temp >= (d//2):
left = i
if temp > (d//2) and left != -1:
right = i
return (left + right) // 2
if temp > (d//2) and left == -1:
return i
This solution passes test cases 0, 2, 5, 6 and 7 but fails 1, 3 and 4. I’m unable to determine exactly why the failure at this time, but I plan to revisit this to determine the cause.
If you find any errors in my code, let me know. I’d love to find a solution that passes all test cases!
Here are some other interesting solutions to this problem, not necessarily in Python:
- https://fizzbuzzer.com/fraudulent-activity-notifications-challenge/
- https://codereview.stackexchange.com/questions/230294/timeout-error-in-fraudulent-activity-notification-hackerrank
- https://www.youtube.com/watch?v=46V6tnxy_Vs
Check the test case when there is an even number of days.
I was able to fix your solution as follows:
def activityNotifications(expenditure, d):
total = 0
counts = [0] * 201
for day in range(d):
counts[expenditure[day]] += 1
if d%2 == 0: #even
odd = False
else: #odd
odd = True
oldest_day = 0
for i in range(d, len(expenditure)):
median = get_median(counts, odd, d)
print(median)
if expenditure[i] >= 2*median:
total += 1
counts[expenditure[oldest_day]] -= 1
counts[expenditure[i]] += 1
oldest_day += 1
return total
def get_median(counts, odd, d):
temp = 0
left, right = -1, -1
if odd:
for i, v in enumerate(counts):
temp += v
if temp > ((d//2)):
return i
else:
for i, v in enumerate(counts):
temp += v
if temp == (d//2):
left = i
right = i+1
return (left+right) / 2
if temp > (d//2):
return i