public:gsoc:poormantextract

Poor Man's Textract

Introduction
Amazon Textract is a (paid) service that “automatically extracts text and data from scanned documents. Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables.” We want to build a free alternative that produces output of similar quality.
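To illustrate the difference between plain OCR text and the structured output (form fields and tables) mentioned in the quote, here is a minimal sketch in Python. It assumes an OCR step has already produced text lines. The sample lines, the "label: value" convention, the "|" column separator, and the function names are all assumptions made for illustration; this is not how Amazon Textract or the existing PMT project work.

```python
import re

# Hypothetical OCR output: one string per recognized text line.
OCR_LINES = [
    "Name: Ada Lovelace",
    "Date of birth: 1815-12-10",
    "Item | Qty | Price",
    "Pen | 2 | 1.50",
    "Notebook | 1 | 3.20",
]

# Assumed convention: a form field looks like "label: value" on a single line.
FIELD_RE = re.compile(r"^\s*([^:|]+?)\s*:\s*(.+?)\s*$")


def extract_fields(lines):
    """Return {label: value} for every line shaped like 'label: value'."""
    fields = {}
    for line in lines:
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def extract_table(lines, sep="|"):
    """Return rows (lists of cell strings) for lines containing the separator."""
    return [
        [cell.strip() for cell in line.split(sep)]
        for line in lines
        if sep in line
    ]


if __name__ == "__main__":
    print(extract_fields(OCR_LINES))
    print(extract_table(OCR_LINES))
```

Real scanned documents rarely come with such clean separators, so a usable tool would more likely have to infer fields and table cells from the position of each word on the page; the sketch only shows the shape of the desired output.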

Your job
Improve upon the existing Poor Man's Textract (PMT) project: https://github.com/kenAlparslan/Texttract

Previous Google Code-in (GCI) tasks that did something similar, albeit simpler:

Musab Kılıç's exam analyzer
RobOHt's exam analyzer
knightron0's exam analyzer

Qualification tasks
Take a look at this page.

  • Last modified: 2021/03/14 21:07