OCR is very good nowadays, so that's not so big a problem - and for big lawsuits (e.g. patent fights) there are literally hundreds of lawyers hired to comb through the vast array of documents, deciding whether each one is responsive or not.
The pre-processing is largely automated, but there's certainly a portion (maybe 5%?) of documents that need to be hand-classified in the database before they get to the lawyers. It's an interesting field - lots of money to be made, intense competition for it, relatively simple technology requirements, but a legal industry which has been resistant to technology for quite a long time. Autonomy seems to be the leading software company in this space.