Extracting Text from documents and Entity recognition | S02 E01

Screenshot of the code generated by Amazon CodeWhisperer

Today, Brian and Darko continue their adventure in document sorting. Yes, document sorting. Imagine this: you have hundreds, NO, thousands of documents scattered across various storage systems and devices. What can you make of them? Do you know where that specific note you got from school is? Can you find that invoice you badly need? Well, what if there was a way to extract the text from each of those docs and comprehend what is in it? And maybe, just maybe, store it all in a central place for you to access, all of that (and more) with the power of Machine Learning. THIS is what we worked on in this stream. We extracted text from PDFs using Amazon Textract, passed it on to Amazon Comprehend, and worked out how to do all that in Python. (Thank you, CodeWhisperer ❤️.)
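If you want a feel for the flow before watching, here is a minimal sketch of the Textract-to-Comprehend pipeline in Python. It is not the code from the stream: the function names (`extract_lines`, `find_entities`, `describe_document`) and the `min_score` threshold are our own assumptions for illustration. It uses the boto3 calls `detect_document_text` and `detect_entities`, and assumes a single-page document small enough for the synchronous APIs (multi-page PDFs would need Textract's asynchronous operations, and long text would need to be split before sending it to Comprehend).

```python
def extract_lines(textract_client, document_bytes):
    """Return the text of every LINE block Textract finds in the document."""
    response = textract_client.detect_document_text(
        Document={"Bytes": document_bytes}
    )
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def find_entities(comprehend_client, text, language_code="en", min_score=0.8):
    """Return (type, text) pairs for entities at or above min_score.

    min_score is an illustrative cutoff, not a recommended value.
    """
    response = comprehend_client.detect_entities(
        Text=text, LanguageCode=language_code
    )
    return [
        (entity["Type"], entity["Text"])
        for entity in response.get("Entities", [])
        if entity.get("Score", 0.0) >= min_score
    ]


def describe_document(path, textract_client=None, comprehend_client=None):
    """Read a local file, extract its text, and list the entities in it."""
    if textract_client is None or comprehend_client is None:
        # Imported lazily so the helpers above can be tried with stub clients.
        import boto3

        textract_client = textract_client or boto3.client("textract")
        comprehend_client = comprehend_client or boto3.client("comprehend")

    with open(path, "rb") as handle:
        document_bytes = handle.read()

    text = "\n".join(extract_lines(textract_client, document_bytes))
    # Comprehend rejects empty text, so skip the call for blank documents.
    entities = find_entities(comprehend_client, text) if text else []
    return {"text": text, "entities": entities}
```

Calling `describe_document("invoice.pdf")` (a hypothetical file name) with AWS credentials and a region configured would return the extracted text along with entity pairs such as organizations, dates, and quantities, which is the raw material you need to start sorting documents.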

If you are interested in learning all this with us, check out the recording below 👇

Links from today's episode

🐦 Reach out to the hosts and guests:

Darko: https://twitter.com/darkosubotica
Brian: https://twitter.com/bketelsen

Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.


Extracting Text from documents and Entity recognition | S02 E01 | Winging It

Brian and Darko use Machine Learning to sort documents
