When we’re collecting forensic information, it’s not always clear who it belongs to. We might be trying to trace and identify e-mails from unknown senders, or to understand the source of a stream of message data. All too often, the binary comparison operators provided by our standard tool sets fall short when we try to describe and understand the vagueness of the real world. That’s where the fun begins: fuzzy logic, which works with degrees of match rather than strict true or false, typically offers more graceful alternatives.

You’ll be provided with a randomized, interlaced dataset composed of user data drawn from the command histories of an unknown number of UNIX computer users over a multi-year period. The data has been sanitized to remove identifying attributes such as user names, file names, and directory structures. Additionally, the identifiers #BOF# and #EOF# have been inserted into the dataset to mark the beginning and end of each shell session. The sessions have been concatenated into a single stream in date order, but no timestamps are provided. Your program should output an estimate of how many users contributed to the dataset, together with a degree of certainty for that estimate.
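
As a starting point, here is a minimal Python sketch of one way to approach the stream described above. It is an illustration under stated assumptions, not the intended solution. It assumes the dataset has already been read into a sequence of whitespace-separated tokens, and it uses the Jaccard index as a simple graded similarity score in place of a full fuzzy-logic model. The function names, the greedy grouping strategy, and the 0.5 threshold are all hypothetical choices made for this example.

```python
from typing import Iterable, List, Set

BOF, EOF = "#BOF#", "#EOF#"  # session markers named in the problem statement


def split_sessions(tokens: Iterable[str]) -> List[List[str]]:
    """Split a token stream into sessions delimited by #BOF# and #EOF#."""
    sessions: List[List[str]] = []
    current = None  # None means we are currently outside any session
    for tok in tokens:
        if tok == BOF:
            current = []
        elif tok == EOF:
            if current is not None:
                sessions.append(current)
            current = None
        elif current is not None:
            current.append(tok)
    return sessions


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Graded similarity in [0, 1]: shared vocabulary over total vocabulary."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def estimate_user_count(sessions: List[List[str]], threshold: float = 0.5) -> int:
    """Greedily group sessions by vocabulary overlap; one group per guessed user.

    The threshold is an assumed tuning knob, not a value from the challenge.
    """
    clusters: List[Set[str]] = []
    for session in sessions:
        best, best_score = None, 0.0
        for cluster in clusters:
            score = jaccard(session, cluster)
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best.update(session)  # fold the session into the closest group
        else:
            clusters.append(set(session))  # no close match: assume a new user
    return len(clusters)


sample = "#BOF# ls cd vi #EOF# #BOF# gcc make gdb #EOF# #BOF# ls cd vi less #EOF#".split()
sample_sessions = split_sessions(sample)
sample_estimate = estimate_user_count(sample_sessions)
```

On this toy stream, the first and third sessions share most of their commands and end up in one group, while the second forms its own. Set overlap ignores command order and frequency, so a serious attempt would likely need richer features, and the similarity scores themselves could feed the certainty figure the challenge asks for.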