Presenting at ICDAR 2019
Venue: University of Technology Sydney, Australia
Date: September 2019


NLP Techniques to Build an Automatic Question Answering System

Have you ever wondered how machines understand natural language? How “Google Translate” works? How “Siri”, a robotic voice, responds to your voice commands? Or how a piece of software understands a text document and automatically summarizes it or extracts the relevant sentences? The answers to all these questions lie in this workshop, where we explore the astounding domain of Natural Language Processing (NLP). We will introduce the core concepts of NLP by building on your own intuition of how you understand human language. With vast amounts of text data produced every day and considerable effort going into building a voice-controlled world, NLP is attracting rapidly growing attention, alongside advances in Machine Learning and Deep Learning.



Combination of theory and practical examples

The session combines basic theory with practical examples of the building blocks of NLP, followed by a demo and hands-on session on a real-world application.
The tutorial runs for 3 hours in total.
  • What is Machine Learning?
  • What is Natural Language Processing?
  • Brief insights on how natural language is studied.
  • Why Machine Learning over rule-based methods?
  • Understanding the NLP Pipeline!
  • Preprocessing
       ➢ Regular Expression
       ➢ Paragraph Detection
       ➢ Sentence Boundary Detection
       ➢ Sentence to Words

  • Feature Engineering for ML techniques
       ➢ Words: meanings, synonyms, antonyms, part of speech (verb, adverb, cardinal), etc.
       ➢ Named Entity Recognition, Dependency Parsing, Coreference Resolution, etc.
       ➢ Word Normalization: Lemmatization and Stemming
       ➢ Keyword recognition
       ➢ WordNet, Synsets, Stanford CoreNLP Parser, spaCy, NLTK

  • Vector representation and Word Embeddings
       ■ Why is a vector or embedding required?
       ■ Bag of Words, n-gram model
       ■ Skip-gram model
       ■ Count Vectorizer
       ■ Term Frequency-Inverse Document Frequency (TF-IDF) Vectorizer
       ■ Hashing Vectorizer
       ■ Automatic feature selection and vector representation in DL techniques

  • How the scikit-learn (sklearn) library comes to the rescue!

  • Data and ML kickstart!

       ■ Corpus
       ■ Training, Testing and Validation Phase
       ■ K-fold Cross Validation

  • Training the Machine Learning Model
       ■ Evaluation Metrics

  • How to make your model better and improve its performance?
       ■ Error Analysis
  • Demo and code walkthrough
  • Build an Automatic Question Answering (AQuA) system that acts as a quick document analyzer, returning relevant answers to questions about the document.

    Two simple examples:
    Given a document about Roger Federer and asked about his last Wimbledon Championship title, the system will answer “2017, against Marin Cilic.”
    Given the Wikipedia article on Google and asked about the founder(s), the system will answer “Larry Page and Sergey Brin.”

    AQuA can be trained and applied across a range of domains and applications, and can save the considerable time otherwise spent reading a long document.
  • Questions and Answers (15 mins)
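
The outline above names sentence boundary detection, splitting sentences into words, TF-IDF vectors, and a question answering demo. As a rough illustration of how those pieces could fit together, here is a minimal sketch in plain Python. It is not the presenters' AQuA implementation: the function names (split_sentences, tokenize, tfidf_vectors, cosine, answer), the regex-based splitting, the smoothed IDF formula, and the toy document are all assumptions made for this example. It only retrieves the sentence most similar to the question and does not extract an exact answer phrase.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence boundary detection: split after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and pull out alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (a dict) per token list, plus the IDF table."""
    n = len(token_lists)
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
    idf = {tok: math.log((1 + n) / (1 + df)) + 1 for tok, df in doc_freq.items()}
    vectors = [
        {tok: count * idf[tok] for tok, count in Counter(toks).items()}
        for toks in token_lists
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(document, question):
    """Return the sentence of `document` most similar to `question`."""
    sentences = split_sentences(document)
    if not sentences:
        return ""
    vectors, idf = tfidf_vectors([tokenize(s) for s in sentences])
    # Question words that never occur in the document carry no signal, so drop them.
    q_counts = Counter(tok for tok in tokenize(question) if tok in idf)
    q_vec = {tok: count * idf[tok] for tok, count in q_counts.items()}
    scores = [cosine(q_vec, vec) for vec in vectors]
    best = max(range(len(sentences)), key=scores.__getitem__)
    return sentences[best]


# Toy document written for this sketch (not taken from the tutorial material).
DOC = (
    "Google is a technology company based in California. "
    "Google was founded by Larry Page and Sergey Brin. "
    "The company is known for its search engine."
)

if __name__ == "__main__":
    print(answer(DOC, "Who founded Google?"))
```

A sentence-retrieval baseline like this is usually only a starting point. The feature engineering and model training steps listed in the outline are what would be needed to pick out a short answer span rather than a whole sentence.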


Abhishek Parikh, Tutorial Presenter
Dhara Kotecha, Tutorial Co-Presenter
Nisarg Vyas, Tutorial Advisor
Vinish Lonhare, UI Designer
Akash Shah, UI Developer