3 Ways to Use External Databases to Enhance Nuix
The Nuix Engine really is a Swiss Army Knife. It can be used as a stand-alone investigation platform by a single investigator, or it can be injected into a massively scaled and completely automated data processing pipeline. A very common activity is to enhance Nuix by using external databases – here are three of my favourite ways that people have done just that:
- Enhanced Reporting
- Global Deduplication
- Artifact Data Warehouse
1. Enhanced Reporting
People need to generate reports for many different reasons. Standard implementations of the Nuix Engine have struggled in this area – many of the data points wanted for reporting are hidden away, or are spread across multiple locations and indexes.
For example, it's relatively straightforward to identify how many GB were indexed for a specific task. But what if someone needs to show how many GB were processed this month for a particular client? Manually finding the individual cases and repeating the steps for each one is tedious, difficult and error-prone. Now imagine doing that for every client the team has supported over that month.
Rampiva helps address this issue in two ways – first, all jobs processed through Rampiva Automate are automatically logged into a central database for operational reporting. This makes it very easy to understand everything that’s been done, who did it, and for what Case, Matter, and Client.
Rampiva also provides a utility that can scan an entire server, find Nuix indexes and logs, and extract and upload their metrics into a database for reporting, which can open up a new world of understanding of a Nuix environment. This helps teams gather historical metrics for a baseline analysis, and compare “before” and “after.” We’ve seen great results from clients planning adoption milestones and forecasting ROI with this approach.
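To make the "GB per client per month" question concrete, here is a minimal sketch of the kind of query a central job database makes possible. It uses SQLite purely for illustration, and the `jobs` table, its columns, the sample rows and the `gb_per_client` helper are all hypothetical; they are not the actual Rampiva or Nuix schema.

```python
import sqlite3

# Hypothetical schema: one row per processing job. Table and column names
# are illustrative assumptions, not the real Rampiva or Nuix schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE jobs (
           client TEXT,
           matter TEXT,
           case_name TEXT,
           finished_on TEXT,        -- ISO date, e.g. 2024-03-14
           bytes_processed INTEGER
       )"""
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?, ?)",
    [
        ("Acme", "M-001", "Case A", "2024-03-02", 5 * 1024**3),
        ("Acme", "M-001", "Case B", "2024-03-20", 3 * 1024**3),
        ("Acme", "M-002", "Case C", "2024-04-01", 7 * 1024**3),
        ("Globex", "M-010", "Case D", "2024-03-15", 2 * 1024**3),
    ],
)


def gb_per_client(conn, month):
    """Return {client: GB processed} for a month given as 'YYYY-MM'."""
    rows = conn.execute(
        """SELECT client, SUM(bytes_processed) / (1024.0 * 1024 * 1024)
           FROM jobs
           WHERE strftime('%Y-%m', finished_on) = ?
           GROUP BY client
           ORDER BY client""",
        (month,),
    )
    return dict(rows.fetchall())


report = gb_per_client(conn, "2024-03")
for client, gb in report.items():
    print(f"{client}: {gb:.1f} GB")
```

Once every job lands in one table, the monthly report for all clients is a single grouped query rather than a case-by-case manual exercise.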
2. Global Deduplication
Now, deduplication was always going to be on this list, but it is one of those topics that can feel like falling into a rabbit hole with many twists and turns. Multiple approaches exist for handling cross-case deduplication, such as using static digest lists, or relying on a Nuix Elastic backend to build very large indexes, but those approaches come with challenges of their own.
A brilliant use of external databases is to manage a list of item hashes (unique document fingerprints) across multiple cases. When Nuix’s worker-side scripting is incorporated, each item's hash can be checked against that list during processing, so duplicates are identified and discarded before they ever reach a case index.
The advantage this brings is pretty obvious – if 50% of the files we are processing are duplicates, only half of them need full processing, so we will be able to process almost twice as fast (not quite, because there are a few other factors such as the time spent reading bytes from disk, but you get the picture). And we end up with a smaller population that we have to work with.
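The hash-list idea can be sketched in a few lines. This is not Nuix worker-side script code; it is a standalone illustration that assumes a shared table of digests, with SQLite standing in for the external database. The `seen_hashes` table, the `is_new_item` helper and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import sqlite3

# SQLite stands in for the shared external database here; in practice this
# would be a server that the workers of every case can reach.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE seen_hashes (digest TEXT PRIMARY KEY, first_case TEXT)"
)


def is_new_item(db, content, case_name):
    """Record the item's digest; return True only the first time it is seen."""
    # SHA-256 is an assumption for this sketch; use whichever digest the
    # cases already share.
    digest = hashlib.sha256(content).hexdigest()
    # The primary key makes the insert a no-op for a digest that is already
    # listed, so a single statement both checks and records the item.
    cur = db.execute(
        "INSERT OR IGNORE INTO seen_hashes VALUES (?, ?)",
        (digest, case_name),
    )
    db.commit()
    return cur.rowcount == 1


# Case A sees the same contract three times and keeps it once.
items = [b"contract v1", b"invoice 42", b"contract v1", b"contract v1"]
kept = [item for item in items if is_new_item(db, item, "Case A")]

# Case B later meets an item that Case A already indexed.
seen_again = is_new_item(db, b"invoice 42", "Case B")
```

Doing the check and the write in one statement matters when many workers run in parallel: the database's uniqueness constraint, rather than the script, decides which copy is the first.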
3. Artifact Data Warehouse
I’ve been lucky to have had interesting conversations in my career with several police agencies around the world, giving rise to this next enhancement. A common theme has been the desire to leverage the large amounts of information held across multiple, separate Nuix cases from different investigations. Nuix has a fantastic ability to automatically extract artifacts such as credit card numbers, licence plates or people’s names. Interestingly, this ability goes right back to the roots of Nuix.
One European police agency is currently building a data warehouse solution, whereby artifacts are collated into a central database. Artifacts found in new investigations can be automatically flagged if they already exist in the central data warehouse, leading investigators to new insights. The original case indexes can then be discarded (some jurisdictions require the deletion of the original source evidence after some period of time has passed).
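As a rough illustration of the flagging step, the sketch below stores artifacts in a central table and reports any that were already seen in an earlier investigation. Everything here is hypothetical: the `artifacts` table, the `normalise` and `flag_and_store` helpers, the sample values, and the crude normalisation rule are assumptions for the example, not a description of the agency's system or of Nuix's artifact extraction.

```python
import sqlite3

# SQLite stands in for the central data warehouse.
wh = sqlite3.connect(":memory:")
wh.execute(
    """CREATE TABLE artifacts (
           kind TEXT,
           value TEXT,
           investigation TEXT,
           PRIMARY KEY (kind, value, investigation)
       )"""
)


def normalise(value):
    """Crude assumption: drop punctuation and spaces, then upper-case."""
    return "".join(ch for ch in value if ch.isalnum()).upper()


def flag_and_store(wh, investigation, artifacts):
    """Return earlier sightings of each artifact, then record the new ones."""
    hits = {}
    for kind, raw in artifacts:
        value = normalise(raw)
        prior = [
            row[0]
            for row in wh.execute(
                "SELECT investigation FROM artifacts "
                "WHERE kind = ? AND value = ? AND investigation <> ? "
                "ORDER BY investigation",
                (kind, value, investigation),
            )
        ]
        if prior:
            hits[(kind, value)] = prior
        wh.execute(
            "INSERT OR IGNORE INTO artifacts VALUES (?, ?, ?)",
            (kind, value, investigation),
        )
    wh.commit()
    return hits


first = flag_and_store(
    wh,
    "INV-1",
    [("licence_plate", "AB-123-CD"), ("credit_card", "4111 1111 1111 1111")],
)
second = flag_and_store(
    wh,
    "INV-2",
    [("licence_plate", "ab 123 cd"), ("licence_plate", "ZZ-999-ZZ")],
)
```

Because only the extracted artifacts and the investigation they came from are stored, the warehouse can keep flagging matches after the original case indexes are gone.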