We are looking for a skilled Python developer to assist in the development of an intelligent resume parser for our ATS.
Key requirements for the resume parser include:
- Accurate and efficient parsing of a predetermined set of structured fields based on a resume's main sections and sub-sections
- Extraction of clean, disaggregated, and normalized data for each field, e.g. 'Bachelor of Arts in Economics' --> {'degree': 'Bachelor of Arts', 'major': 'Economics'}
- Ability to automatically handle documents of multiple formats, including doc, docx and PDF. This includes both text- and image-based documents (using OCR), as well as multi-columned documents
- Output of parsed resume data in a standardized JSON format
- Ability to programmatically test the accuracy of the parser against an existing sample resume dataset for continuous improvement
- Ability to programmatically train the parser with a growing sample resume dataset to continually increase its accuracy
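To make the normalization, JSON output, and accuracy-testing requirements above more concrete, here is a minimal sketch using only the standard library. The regular expression, the function names `split_degree` and `field_accuracy`, and the sample data are all hypothetical illustrations, not part of the existing parser; a production version would likely rely on a trained model rather than a single pattern.

```python
import json
import re

# Hypothetical pattern for "<degree> in <major>" strings,
# e.g. "Bachelor of Arts in Economics". Assumed for illustration only.
DEGREE_PATTERN = re.compile(
    r"^(?P<degree>(?:Bachelor|Master|Doctor)\s+of\s+[A-Za-z ]+?)\s+in\s+(?P<major>.+)$",
    re.IGNORECASE,
)


def split_degree(text):
    """Split a raw education string into 'degree' and 'major' fields."""
    cleaned = text.strip()
    match = DEGREE_PATTERN.match(cleaned)
    if match is None:
        # Fall back to the raw text when the pattern does not apply.
        return {"degree": cleaned, "major": None}
    return {
        "degree": match.group("degree").strip(),
        "major": match.group("major").strip(),
    }


def field_accuracy(samples):
    """Return the fraction of (raw_text, expected_dict) pairs parsed exactly."""
    if not samples:
        return 0.0
    correct = sum(1 for raw, expected in samples if split_degree(raw) == expected)
    return correct / len(samples)


if __name__ == "__main__":
    parsed = split_degree("Bachelor of Arts in Economics")
    # Standardized JSON output for a single field.
    print(json.dumps(parsed))

    # Tiny hand-labeled sample set (made up for this sketch).
    labeled = [
        ("Bachelor of Arts in Economics",
         {"degree": "Bachelor of Arts", "major": "Economics"}),
        ("Master of Science in Computer Science",
         {"degree": "Master of Science", "major": "Computer Science"}),
        ("BSc Physics", {"degree": "BSc", "major": "Physics"}),
    ]
    print(f"exact-match accuracy: {field_accuracy(labeled):.2f}")
```

The third labeled sample is deliberately one the pattern cannot handle, which shows how an exact-match score against a labeled dataset would surface gaps as the sample set grows.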
Desirable skills and qualifications for the task include:
- Excellent command of the Python programming language
- Solid understanding of and practical experience with document parsing / data extraction
- Experience with natural language processing (NLP) and relevant Python libraries such as NLTK and/or spaCy
Work on the parser has already begun. The current version can:
- Load and read a variety of document types with Python's Textract library
- Break the resume into sections based on a data dictionary of common section headings
- Extract basic fields such as name, email, phone number, and skills
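For candidates unfamiliar with this approach, the section-splitting and basic field extraction steps listed above can be sketched roughly as follows. This is not the existing code: the heading dictionary `SECTION_HEADINGS`, the email pattern, and the function names are assumptions made for illustration, and the input is assumed to be plain text already extracted from the document.

```python
import re

# Hypothetical data dictionary: canonical section name -> heading variants.
SECTION_HEADINGS = {
    "experience": {"experience", "work experience", "employment history"},
    "education": {"education", "academic background"},
    "skills": {"skills", "technical skills"},
}

# Simple email pattern; deliberately permissive and assumed for this sketch.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def canonical_heading(line):
    """Return the canonical section name if the line is a known heading."""
    key = line.strip().rstrip(":").lower()
    for name, variants in SECTION_HEADINGS.items():
        if key in variants:
            return name
    return None


def split_sections(text):
    """Group non-empty lines under the most recent recognized heading."""
    sections = {"header": []}  # lines before the first heading
    current = "header"
    for line in text.splitlines():
        name = canonical_heading(line)
        if name is not None:
            current = name
            sections.setdefault(current, [])
        elif line.strip():
            sections[current].append(line.strip())
    return sections


def extract_emails(text):
    """Return all email-like strings found in the text."""
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    resume = (
        "Jane Doe\n"
        "jane.doe@example.com\n\n"
        "Education\n"
        "Bachelor of Arts in Economics\n\n"
        "Technical Skills:\n"
        "Python, SQL\n"
    )
    print(split_sections(resume))
    print(extract_emails(resume))
```

An exact-match heading lookup like this is easy to maintain but misses headings that are not in the dictionary, which is one reason multi-column and unusually formatted resumes need extra handling.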
Entities such as skills are currently extracted with a keyword search against a local database. However, this method is both time- and resource-intensive, which is why a greater emphasis on machine learning and NLP will be necessary going forward.
Keywords: Python, resume parser, Textract, PDFMiner, OCR, natural language processing, NLP, spaCy, NLTK, named entity recognition, NER, data extraction
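As a rough picture of what a keyword-search baseline looks like, the sketch below matches word n-grams from the resume text against a set of known skills. The `KNOWN_SKILLS` set is a hypothetical stand-in for the local database, and `find_skills` is an illustrative name; the actual implementation may differ.

```python
import re

# Hypothetical stand-in for the local skills database.
KNOWN_SKILLS = {
    "python",
    "sql",
    "machine learning",
    "natural language processing",
}

# Longest skill phrase, in words, so we know how many n-gram sizes to try.
MAX_NGRAM = max(len(skill.split()) for skill in KNOWN_SKILLS)


def find_skills(text):
    """Return the sorted known skills that appear as word n-grams in the text."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    found = set()
    for size in range(1, MAX_NGRAM + 1):
        for start in range(len(tokens) - size + 1):
            candidate = " ".join(tokens[start:start + size])
            if candidate in KNOWN_SKILLS:
                found.add(candidate)
    return sorted(found)


if __name__ == "__main__":
    print(find_skills("Experienced in Python, SQL and machine learning."))
```

Every n-gram in the document is checked against the skill list, and the list itself has to be curated by hand, so unseen or differently worded skills are missed. A trained NER model is one way to address both issues.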
About the recruiter: Member since Sep 5, 2017. Cooper, from California, United States.