Q.1
Which library is based on tagging and is statistical rather than rule-based?
  • NLTK
  • spaCy
Q.2
From the sentence “Fintech Online Course”, how many bigrams can be created?
  • 1
  • 2
  • 3
  • 4
  • 5
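
Bigram construction can be sketched in plain Python. This is a minimal illustration that assumes simple whitespace tokenization; the helper name `make_bigrams` and the example sentence are hypothetical and deliberately differ from the sentence in the question.

```python
def make_bigrams(sentence):
    """Return the adjacent word pairs of a whitespace-tokenized sentence."""
    tokens = sentence.split()
    # Pair each token with its successor; n tokens yield n - 1 bigrams.
    return list(zip(tokens, tokens[1:]))


example = "natural language processing with Python"  # hypothetical sentence
pairs = make_bigrams(example)
print(pairs)
print(len(pairs))
```

The same counting rule applies to any sentence: count the tokens, then subtract one.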
Q.3
A VADER compound score of 1 evaluates to
  • positive sentiment
  • neutral sentiment
  • negative sentiment
  • none of the above
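
As a rough sketch of how a compound score is usually turned into a label: the compound score is normalized to the range -1 to 1, and a commonly used convention treats scores at or above 0.05 as positive and at or below -0.05 as negative. The function name `label_compound` and the sample scores are hypothetical, and the cutoff is a convention that can be tuned, not a fixed rule.

```python
def label_compound(score, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("compound scores are expected to lie in [-1, 1]")
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical scores, chosen only to exercise each branch.
for sample in (0.6, 0.0, -0.4):
    print(sample, label_compound(sample))
```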
Q.4
Why do we use named entity recognition in NLP?
  • Classify entities into predefined labels
  • Creating a set of vocabularies
  • Breaking sentences into words
  • None
Q.5
How do we get from NLP text analysis to stock price correlation?
  • Transform some NLP results into features.
  • Convert parts of speech to categorical variables
  • Recognize some named entities
  • VADER it
Q.6
Which are included in named entity recognition?
  • Time and dates
  • Nouns
  • Currency
  • All of the above
Q.7
What does spaCy tagging do?
  • Identifies more frequent words
  • Identifies importance and relevance
  • Identifies word order relationships
  • Identifies parts of speech
Q.8
Between NLTK and spaCy, which is faster and better for larger datasets?
  • NLTK
  • spaCy
Q.9
Between NLTK and spaCy, which is based on tagging and is statistical rather than rule-based?
  • NLTK
  • spaCy
Q.10
Which is the main Python package we use for NLP?
  • Scikit-Learn
  • NLTK
  • NLP-LIB
  • PyNLP
Q.11
Which is the process of turning different morphologies (i.e. versions) of a word into its base form?
  • Lemmatization
  • Tokenization
  • Ngrams
  • Stopwords
  • Corpus
Q.12
Which step is the process of breaking down documents into smaller units of analysis?
  • Lemmatization
  • Tokenization
  • Ngrams
  • Stopwords
  • Corpus
Q.13
Which are multiple word sequences?
  • Lemmatization
  • Tokenization
  • Ngrams
  • Stopwords
  • Corpus
Q.14
Which are common words usually removed in an NLP analysis?
  • Lemmatization
  • Tokenization
  • Ngrams
  • Stopwords
  • Corpus
Q.15
Which is a collection of documents?
  • Lemmatization
  • Tokenization
  • Ngrams
  • Stopwords
  • Corpus
Q.16
What does a high term frequency combined with a low document frequency produce?
  • A high weight in TF-IDF
  • A low weight in TF-IDF
  • A bag of words
  • A corpus
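
The interaction between term frequency and document frequency can be explored with a small sketch. It assumes the textbook form tf × log(N / df); libraries such as scikit-learn apply smoothing and normalization, so their numbers will differ. The toy corpus and the function name `tf_idf` are hypothetical.

```python
import math

# Hypothetical toy corpus: each document is already tokenized.
docs = [
    ["stock", "price", "rises"],
    ["stock", "price", "falls"],
    ["fintech", "fintech", "stock"],
]


def tf_idf(term, doc, corpus):
    """Textbook TF-IDF: relative term frequency times log(N / document frequency)."""
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0  # term never appears in the corpus
    tf = doc.count(term) / len(doc)
    idf = math.log(len(corpus) / df)
    return tf * idf


rare = tf_idf("fintech", docs[2], docs)   # frequent in one document, absent elsewhere
common = tf_idf("stock", docs[2], docs)   # appears in every document
print(rare, common)
```

Comparing `rare` and `common` shows how a term concentrated in one document is weighted relative to a term spread across the whole corpus.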
Q.17
Which company's tone analyzer service did we discuss?
  • Amazon
  • Apple
  • Google
  • IBM
Q.18
Which is the most useful metric from VADER for sentiment analysis?
  • Positivity
  • Compound
  • Negative
  • Intensity
Q.19
Which function would you use to implement a bag of words by creating a matrix of token counts?
  • CountVectorizer()
  • fit_transform()
  • get_feature_names()
  • download()
Q.20
Which function would you use to retrieve the list of unique words?
  • CountVectorizer()
  • fit_transform()
  • get_feature_names()
  • download()
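
To make the idea of a token-count matrix and its vocabulary concrete, here is a plain-Python stand-in. It is not scikit-learn code and does not reproduce that library's tokenization rules; the documents and variable names are hypothetical. As a side note, recent scikit-learn releases expose the vocabulary through `get_feature_names_out()`, which replaced `get_feature_names()`.

```python
from collections import Counter

# Hypothetical mini-corpus.
docs = ["fintech online course", "online course on fintech and finance"]

# Lowercase and split on whitespace (a simplification of real tokenizers).
tokenized = [d.lower().split() for d in docs]

# Sorted list of unique words across all documents.
vocabulary = sorted({tok for doc in tokenized for tok in doc})

# One row per document, one column per vocabulary word.
matrix = []
for doc in tokenized:
    counts = Counter(doc)
    matrix.append([counts[term] for term in vocabulary])

print(vocabulary)
for row in matrix:
    print(row)
```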