Extracting Text from Documents and Entity Recognition | S02 E01 | Winging It

Brian and Darko use Machine Learning to sort documents

Published Aug 16, 2023
Screenshot of the code generated by Amazon CodeWhisperer

Today, Brian and Darko continue their adventure in document sorting. Yes, document sorting. Imagine this: you have hundreds, NO, thousands of documents littered around in various storage systems and devices. What can you make of them? Do you know where that specific note you got from school is? Can you find that invoice you badly need? Well, what if there was a way to extract the text from each of those docs and comprehend what is in it? And maybe, just maybe, store it all in a central place for you to access, and do all that (and more) with the power of Machine Learning? THIS is what we worked on in this stream. We extracted text from PDFs using Amazon Textract, passed it on to Amazon Comprehend, and worked out how to do all that in Python. (Thank you, CodeWhisperer ❤️.)
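If you want a feel for the Textract-to-Comprehend flow before watching, here is a minimal sketch. It is not the code from the stream. The helper names (`lines_from_textract`, `group_entities`, `analyze_document`, `analyze_file`) and the 0.8 confidence cutoff are our own assumptions for illustration. It uses the synchronous `detect_document_text` call, which handles single-page documents; a multi-page PDF would need Textract's asynchronous API with the file in S3. Comprehend also limits how much text a single call accepts, so long documents would need to be split up first.

```python
from collections import defaultdict


def lines_from_textract(response):
    """Join the LINE blocks of a Textract response into plain text."""
    return "\n".join(
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    )


def group_entities(response, min_score=0.8):
    """Group Comprehend entities by type, keeping only confident matches.

    min_score is an arbitrary cutoff picked for this sketch.
    """
    grouped = defaultdict(list)
    for entity in response.get("Entities", []):
        if entity["Score"] >= min_score:
            grouped[entity["Type"]].append(entity["Text"])
    return dict(grouped)


def analyze_document(document_bytes, textract, comprehend, language="en"):
    """Run OCR on one document, then entity recognition on its text."""
    ocr = textract.detect_document_text(Document={"Bytes": document_bytes})
    text = lines_from_textract(ocr)
    if not text:
        # Nothing was recognized, so there is nothing to send to Comprehend.
        return text, {}
    found = comprehend.detect_entities(Text=text, LanguageCode=language)
    return text, group_entities(found)


def analyze_file(path):
    """Hypothetical convenience wrapper: analyze a local single-page file."""
    # Imported here so the helpers above can be tried without AWS set up.
    import boto3

    with open(path, "rb") as f:
        document_bytes = f.read()
    return analyze_document(
        document_bytes,
        boto3.client("textract"),
        boto3.client("comprehend"),
    )
```

With AWS credentials and a region configured, `text, entities = analyze_file("invoice.pdf")` (the file name is just a placeholder) would give you the extracted text plus a dictionary such as organizations and dates found in it. That dictionary is the kind of thing you could then store in a central place and use to sort documents.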

