Fraudulent Activity Notifications – HackerRank Python Solution

This is a problem on HackerRank that I spent a good deal of time trying to solve. I decided to write up my process for solving the problem in the interest of reinforcing what I’ve learned.

The problem description states that you’re to determine whether an expenditure is fraudulent activity by comparing a day’s spending to the median of a previous number of days spending.

You’re provided with an array expenditures where expenditures[i] represents the amount of money spent on day i and an integer d representing the number of days to look back in the expenditures array to determine the median spending.

Essentially, the goal is to use a sliding window of size d in the expenditures array to determine a number, n = median x 2 and if expenditures[i] is greater than or equal to this number, classify it as fraudulent. Return an integer representing all the days in the array that could be classified as fraudulent.

My first solution was a bit of a brute force approach. I used a sliding window to capture the previous d days of expenditures, sorted the days and determined the median. This was fine for test cases 0, 6 and 7 but timed out for test cases 1, 2, 3, 4 and 5.

First Solution

def activityNotifications(expenditure, d):
    total = 0
    for i in range(d, len(expenditure)):
        days = expenditure[i-d:i]
        arr = sorted(days)
        if d%2 != 0:
            median = arr[(d+1)//2 - 1]
            median = (arr[(d)//2 - 1] + arr[(d+1)//2])/2
        if expenditure[i] >= 2*median:
            total +=1
    return total

While technically correct, this solution times out on HackerRanks larger test inputs. I attribute most of the time complexity to the sorted() function which has a worst case performance of O( n log n). So what to do?

After reading around a bit in the discussions, I noticed someone had mentioned counting-sort. I was not previously familiar with this technique, but after some reading I learned that it’s an effective sorting algorithm in cases where the variation in integer keys is not significantly greater than the number of items being sorted. Perfect for our use case where 1 ≤ d ≤ n and 0 ≤ expenditure[i] ≤ 200. So how do we use it?

We build a frequency array for the range 0 through 201 and use it to store the frequency of each expenditure for the previous d days. The frequency array can be traversed until the sum of frequency[i] is greater than or equal to d.

In the following solution, I use counts to track these frequencies and I’ve written a separate function get_median to return the median.

Second Solution

# Complete the activityNotifications function below.
def activityNotifications(expenditure, d):
    total = 0
    counts = [0] * 201

    for day in range(d):
        counts[expenditure[day]] += 1

    if d%2 == 0: #even
        odd = False
    else: #odd
        odd = True

    oldest_day = 0
    for i in range(d, len(expenditure)):
        median = get_median(counts, odd, d)
        if expenditure[i] >= 2*median:
            total += 1
        counts[expenditure[oldest_day]] -= 1
        counts[expenditure[i]] += 1
        oldest_day += 1
    return total

def get_median(counts, odd, d):
    temp = 0
    left, right = -1, -1
    for i, v in enumerate(counts):
        temp += v
        if odd:
            if temp >= ((d//2)+1):
                return i
            if temp >= (d//2):
                left = i
            if temp > (d//2) and left != -1:
                right = i
                return (left + right) // 2
            if temp > (d//2) and left == -1:
                return i

This solution passes test cases 0, 2, 5, 6 and 7 but fails 1, 3 and 4. I’m unable to determine exactly why the failure at this time, but I plan to revisit this to determine the cause.

If you find any errors in my code, let me know. I’d love to find a solution that passes all test cases!

Here are some other interesting solutions to this problem, not necessarily in Python:


Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.