public:gsoc:poormantextract

public:gsoc:poormantextract [2020/02/29 17:42]
thealphadollar styling
public:gsoc:poormantextract [2021/03/14 21:07] (current)
 +====== Poor Man's Textract ======
 +
 +**Introduction**
 +Amazon Textract is a (paid) service that “automatically extracts text and data from scanned documents. Amazon Textract goes beyond simple optical character recognition (OCR) to also identify the contents of fields in forms and information stored in tables.” We want to build a free alternative that produces output of similar quality.
 +
 +**Your job**
 +Improve upon the existing Poor Man's Textract (PMT) project: https://github.com/kenAlparslan/Texttract
 +
 +
 +Previous (GCI) tasks that did something similar, albeit simpler:
 +
 +[[https://github.com/musabkilic/ExamAnalyzer|Musab Kılıç's exam analyzer]]\\
 +[[https://github.com/RobOHt/AutomataQP-Decomposer|RobOHt's exam analyzer]]\\
 +[[https://github.com/knightron0/exam-analyzer|knightron0's exam analyzer]]
 +
 +**Qualification tasks**\\
 +Take a look at [[https://ccextractor.org/public:gsoc:takehome|this page]].
  
  • Last modified: 2021/03/14 21:07