eDiscovery at LegalTech – Is data the new oil?

Feb 3, 2017 by Alan Byrne

For me, the second and third days of Legal Tech were all about data, artificial intelligence and automation. I’ve touched on this topic before, and on how the future of technology and business depends on accurate interpretation and analysis of data. The legal profession is no different, particularly as most legal practices and processes are founded on the treatment of evidence, which is, of course, itself data.

If law-based TV shows have taught me anything, it’s that any single piece of data may make or break a case. This means it’s important that every document is analysed and scrutinised to determine whether it is relevant to winning or defending a case. I’ve discovered recently that lawyers aren’t cheap, so having them trawl through all this data during a legal case can cost an organisation a significant amount of money, and that cost rises with time, complications and all kinds of other factors. On the flip side, if important documents or data are missed, it can cost even more money in settling litigation.

So how do organisations search through these ever-increasing data sets to identify which documents are important and which are not?

I mentioned in my first Legal Tech blog this week that more and more eDiscovery solutions are only able to accommodate this amount of data when backed by cloud computing systems – and these systems are able to unleash an entirely new class of automation through machine learning and artificial intelligence (AI).

The best eDiscovery solutions I came across at the Legal Tech conference this week all leveraged machine learning and AI in some way, both to reduce the work that humans have to do and to cut the costs incurred by in-house or external counsel when reviewing all this information.

These tools usually work in a similar way. First, you enter the keywords you are looking for into the eDiscovery system, for example “Enron”. It then searches all the available data, usually using a powerful cloud back end, finding emails, documents, instant messages and anything else that contains that keyword. A machine learning system may then be invoked. It may detect that many documents containing the keyword “Enron” also regularly contain other keywords such as “Power station”, so the system creates this association and brings those documents into the case as well.
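To make that keyword association idea a little more concrete, here is a minimal sketch in Python. It is not how any particular eDiscovery product works; it is just one simple way to find words that turn up unusually often alongside a seed keyword. The function names (`related_terms`, `expanded_search`) and the thresholds are my own assumptions for illustration, and the sketch only looks at single words, so “Power station” is treated as “power” and “station”.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and return its set of distinct words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def related_terms(documents, seed, min_count=2, min_lift=1.2):
    """Find words that appear unusually often alongside the seed keyword.

    Hypothetical helper: 'lift' compares how often a word appears in the
    documents matching the seed with how often it appears overall.
    """
    seed = seed.lower()
    token_sets = [tokenize(doc) for doc in documents]
    hits = [tokens for tokens in token_sets if seed in tokens]
    if not hits:
        return []

    overall = Counter(word for tokens in token_sets for word in tokens)
    in_hits = Counter(word for tokens in hits for word in tokens)

    scored = []
    for word, count in in_hits.items():
        if word == seed or count < min_count:
            continue
        lift = (count / len(hits)) / (overall[word] / len(token_sets))
        if lift >= min_lift:
            scored.append((word, lift))
    # Strongest associations first; ties broken alphabetically.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in scored]


def expanded_search(documents, seed):
    """Return indexes of documents matching the seed or any related term."""
    terms = {seed.lower(), *related_terms(documents, seed)}
    return [i for i, doc in enumerate(documents) if terms & tokenize(doc)]
```

Calling `expanded_search(documents, "Enron")` on a list of document strings returns the positions of every document that mentions the seed keyword or one of the associated words. A real system would work at a far larger scale and with much smarter statistics, but the shape of the idea is the same.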

You now end up with a huge amount of data returned to you, some of which is relevant and most of which is not. Rather than give everything to the legal team, you can take advantage of another piece of machine learning to try to weed out the irrelevant items.

To get the best results, you first need to train this system with a small subset of the data, maybe 50-100 items. Each item in the subset is presented in a preview window and you are asked whether it is relevant to the case or not. This is a standard way to train a machine learning model, and once trained it has a pretty good idea of how to decide automatically whether a piece of data is relevant. Cogmotive will eventually be using a similar machine learning and AI methodology in its Discover & Audit product to determine whether a user’s activity is normal or suspicious.
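For the curious, the training step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: I have picked a simple naive Bayes text classifier because it is easy to follow, not because any vendor told me that is what they use, and the `RelevanceModel` class and its method names are made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into words."""
    return re.findall(r"[a-z]+", text.lower())


class RelevanceModel:
    """Hypothetical naive Bayes classifier trained on reviewer yes/no answers."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}
        self.vocab = set()

    def train(self, text, relevant):
        """Record one reviewed item and the reviewer's relevant / not relevant answer."""
        tokens = tokenize(text)
        self.word_counts[relevant].update(tokens)
        self.doc_counts[relevant] += 1
        self.vocab.update(tokens)

    def _log_score(self, tokens, label):
        """Log probability of the label given the words, with add-one smoothing."""
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for token in tokens:
            if token in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def is_relevant(self, text):
        """Predict whether an unreviewed item looks relevant to the case."""
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)
```

In use, you would call `train(text, True)` or `train(text, False)` once for each of the 50-100 reviewed items, then call `is_relevant(text)` on everything else. With so few training items the model will make mistakes, which is why these tools usually let reviewers keep correcting it as they go.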

Now that cloud computing provides the power to search through huge swathes of data in a short period and machine learning reduces the work that humans need to do, eDiscovery cases are not only more comprehensive, but also have the potential to be cheaper than ever before.

The keynote on the third day raised a great point on this that resonated with me. One of the panellists, Zev J. Eigen, mentioned a quote he had heard about data being the new oil. Oil powered the industrial revolution, and data will power the information revolution. The companies with the best access to oil were the most profitable in the first half of the 20th century, whereas companies with the best access to data will own this half of the 21st century.

Eigen also mentioned that most companies have huge reserves of data trapped in parts of their organisations that are not currently accessible. This can be on paper in archive boxes in basements, in the memories of employees who have worked in the industry for a long time, or in legacy file systems and database formats like PST files and mainframe systems. For organisations and eDiscovery tools to harness this legacy data, it needs to be brought into a modern system where it can be analysed and mined by these new tools.

All in all, I had a great time at the Legal Tech conference and it has been fascinating to see how technology, cloud computing and innovation are affecting an industry that I’m otherwise not all that familiar with.