eDiscovery skills

If you're curious about eDiscovery, what skills should you learn?

Legal technology is one subset of legal innovation.  I'd file eDiscovery under legaltech, though it may be better described as a service.  eDiscovery is an aspect of most major litigation.  The basic goal is for defendants to 'produce' (turn over) electronic information in accordance with the court's requirements, and for plaintiffs to find useful evidence within that data.  Recently, the Duke Conference and the 2015 amendments to the Federal Rules of Civil Procedure (FRCP; pay special attention to Rules 16, 26, and 37(e), the last of which covers sanctions for failing to preserve electronically stored information) explicitly addressed electronic evidence.  I have to say, linking to supremecourt.gov is always fun.

I asked my professor for 'homework' to do over winter break, because I'm that sort of person.  Here is what he recommended for eDiscovery.

Learn Python

To generalize, Python is a high-level language (you write more abstract, and therefore less, code to execute any given task) which has a large community and is widely considered a good language for working with data.

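One benefit of Python: if you have a Mac, you may already have it installed (older versions of macOS bundled it; newer ones may prompt you to install it first).  Search for "Terminal" in Spotlight, then type python3 --version to check.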

If you've never written a line of software code, I recommend starting with Zed Shaw's Learn Python the Hard Way (promise me you'll do the terminal exercises), but I've also seen people have great outcomes (i.e., zero experience with software to a six-figure job) using Treehouse.

Read about Information Retrieval

Information Retrieval (IR for short) is the discipline of using technology to access information.  Google and other search engines are obviously the logical extension of IR.  It's a bit funny - the intro of the book mentioned below points out: "people preferred getting information from other people rather than from information retrieval systems."  My generation of course prefers to interact with a machine, as it's more option-rich, usually quicker, and arguably more accurate for the bulk of retrieval activities, i.e., getting facts or data.

Here is the recommended book: Introduction to Information Retrieval by Manning, Raghavan, and Schütze, available online from Stanford's NLP group.  (NB: pay attention to the prerequisites.)
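
To make IR a little more concrete, here is a minimal sketch of an inverted index, the basic structure the book opens with: a lookup from each word to the documents that contain it.  The sample "documents", their names, and the helper functions build_inverted_index and search are all made up for illustration; real eDiscovery tools do far more (stemming, ranking, metadata and so on).

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            # Strip simple punctuation so "Friday." and "Friday?" both match "friday".
            index[word.strip('.,;:!?"\'')].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical documents standing in for a production set.
documents = {
    "email_1": "Please review the merger contract before Friday.",
    "email_2": "Lunch on Friday?",
    "memo_1": "The contract terms for the merger are attached.",
}

index = build_inverted_index(documents)
print(sorted(search(index, "merger contract")))  # ['email_1', 'memo_1']
print(sorted(search(index, "friday")))           # ['email_1', 'email_2']
```

The index is built once, so each search only has to intersect a few small sets rather than reread every document.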

Take a look at Machine Learning

My understanding of machine learning (ML) is that it's almost the "opposite" of programming.

Programming requires a person to use a specific language to tell a machine what to do.  For example, in generic terms, a programmer could tell the machine how they would like something done (e.g., when I invoke the "double" function, multiply my input by 2), and the machine would later provide an output on request (e.g., run "double" on 3: input = 3; double multiplies input by 2; so calculate 3*2; output is 6).
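
In Python, the "double" example might look like the sketch below (the names double, value, and result are just my choices):

```python
def double(value):
    """Return the input multiplied by 2."""
    return value * 2

result = double(3)
print(result)  # prints 6
```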

Machine learning instead supplies the machine with a set of example inputs and outputs, and the programmer asks the computer to create a program (a "model") that conforms to them.  This is "training" a model, and as far as I'm currently aware ML is backwards-looking: it learns from data that has already been collected.  In simple terms, it's suitable when you want to understand which inputs tend to correlate with which outputs (e.g., does a yellow background increase or decrease advertising clickthroughs?), bearing in mind that correlation alone doesn't establish cause and effect.
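
To contrast with the "double" example, here is a toy sketch of the ML approach: instead of telling the machine to multiply by 2, we hand it example inputs and outputs and let it estimate the multiplier.  This uses a one-parameter least-squares fit in plain Python; the function names and training data are invented for illustration, and real ML work would normally use a library and far messier data.

```python
def fit_multiplier(inputs, outputs):
    """Estimate w in output ~ w * input by least squares (no intercept)."""
    numerator = sum(x * y for x, y in zip(inputs, outputs))
    denominator = sum(x * x for x in inputs)
    return numerator / denominator

# Training data: example inputs and the outputs we observed for them.
training_inputs = [1, 2, 3, 4]
training_outputs = [2, 4, 6, 8]

# "Training": the machine works out the rule from the examples.
learned_multiplier = fit_multiplier(training_inputs, training_outputs)

def learned_double(value):
    """Apply the learned rule to a new input."""
    return learned_multiplier * value

print(learned_multiplier)  # 2.0
print(learned_double(5))   # 10.0
```

Nobody wrote "multiply by 2" anywhere; the rule came out of the past examples, which is the sense in which the approach looks backwards.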

Learn on Coursera