Hello all, this is episode 2 of “All About Classification Model’s Accuracy”. Please read episode 1 before going forward.

Episode 2

7. What is Specificity?

8. What is False Positive Rate?

9. What is False Negative Rate?

10. What is Type I Error?

11. What is Type II Error?

12. What is ROC Curve?

13. What is AUC (Area Under the ROC Curve)?

What is Specificity?

Among all actual negative cases, how many were correctly predicted as negative.


Specificity = True Negatives / (True Negatives + False Positives)
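
As a minimal sketch of the formula above, the helper below computes specificity from raw counts. The function name and the example counts are hypothetical, chosen only for illustration.

```python
def specificity(true_negatives: int, false_positives: int) -> float:
    """Return TN / (TN + FP): the share of actual negatives predicted as negative."""
    actual_negatives = true_negatives + false_positives
    if actual_negatives == 0:
        raise ValueError("Specificity is undefined when there are no actual negatives.")
    return true_negatives / actual_negatives


# Hypothetical counts: 100 actual negatives, 90 of them predicted as negative.
example_specificity = specificity(true_negatives=90, false_positives=10)
print(example_specificity)  # 0.9
```

The guard for zero actual negatives is a design choice of this sketch; without it the division would fail on a dataset that contains only positive cases.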

What is False Positive Rate?

Among all actual negative cases, how many were incorrectly predicted as positive.


False Positive Rate = 1 - Specificity, or equivalently False Positive Rate = False Positives / (False Positives + True Negatives)
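
The link between the false positive rate and specificity can be checked numerically. The sketch below uses a hypothetical helper and made-up counts. It computes the rate directly from the counts and compares it with one minus the specificity.

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Return FP / (FP + TN): the share of actual negatives predicted as positive."""
    actual_negatives = false_positives + true_negatives
    if actual_negatives == 0:
        raise ValueError("False positive rate is undefined without actual negatives.")
    return false_positives / actual_negatives


# Hypothetical counts: 100 actual negatives, 10 of them wrongly predicted as positive.
tn, fp = 90, 10
fpr = false_positive_rate(false_positives=fp, true_negatives=tn)
spec = tn / (tn + fp)

print(fpr)  # 0.1
# Both routes give the same value, up to floating point rounding.
print(abs(fpr - (1 - spec)) < 1e-12)  # True
```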




Hello all, we are going to look at one of the most confusing concepts related to classification problems in the data science field, the “Confusion Matrix”.

Below is the index of this article.

Episode 1

1. What is a confusion matrix?

2. What is the accuracy score?

3. Why do we need Precision and Recall if we already have the Confusion Matrix?

4. What is Precision?

5. What is Recall or Sensitivity or True Positive Rate?

6. What is the F1 Score or Dice similarity coefficient (DSC)?

Episode 2

7. What is Specificity?

8. What is False Positive Rate?

9. What is False Negative Rate?

10. What…


Random: made, done, happening, or chosen without method or conscious decision.

What is the use of random numbers?

Ans: Selecting winners, shuffling data, and shuffling music are some of the use cases of random numbers.

Here we use Python and NumPy to generate different kinds of random numbers as required. Below are the methods the NumPy library provides for generating random numbers.

Scene 1 : Generate single random number

Scene 2 : Generate 1D random numbers

Scene 3 : Generate 2D random numbers


Part 1 : Generate integer numbers

Part 2 : Generate float numbers


Part 1 : Generate 1D integer numbers

Part 2 : Generate 1D float numbers


Part 1 : Generate 2D integer numbers

Part 2…
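
The three scenes above can be sketched as follows. This is an assumption about how they might be written, not necessarily the calls used in the full article: it relies on NumPy's `default_rng` generator, and the seed, ranges, and shapes are arbitrary illustrative choices.

```python
import numpy as np

# A seeded generator makes the example repeatable; the seed value is arbitrary.
rng = np.random.default_rng(seed=42)

# Scene 1: a single random number
single_int = rng.integers(low=0, high=10)   # integer in [0, 10)
single_float = rng.random()                 # float in [0.0, 1.0)

# Scene 2: 1D random numbers
ints_1d = rng.integers(low=0, high=10, size=5)
floats_1d = rng.random(size=5)

# Scene 3: 2D random numbers (3 rows, 4 columns)
ints_2d = rng.integers(low=0, high=10, size=(3, 4))
floats_2d = rng.random(size=(3, 4))

print(ints_1d.shape, floats_1d.shape)  # (5,) (5,)
print(ints_2d.shape, floats_2d.shape)  # (3, 4) (3, 4)
```

The `size` argument is what changes between the scenes: omitted for a single value, an integer for 1D, and a tuple for 2D.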


We often get confused while reading a book that does not have an index.

This article will be beneficial for all beginners and for those already working in the data field. My problem while learning data science was that I started somewhere and kept going, but later found that I needed an index to keep me on the path of effective and productive learning. That's why, if you want to enter the data field, my advice is to start with the very basics and get a thorough knowledge of data analysis first.



In NLP-based applications we have to work with many different input files. So how do we identify a document's language? Here we worked with .png, .jpg, .jpeg, .tif, .htm, .html, .doc, .docx, .pdf, .txt and .msg input documents.

How to identify the language of the input document?

Approach :

As we already know, every language has a set of common words that are used frequently in that language. We call this set of common words stopwords. We extract the common words from the input source and then compare them against each language's set to see which one they match the most.
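
The approach can be sketched in a few lines. Everything here is illustrative: the tiny stopword sets and the `detect_language` helper are hypothetical stand-ins, and a real application would load much larger stopword lists from a library or a file.

```python
import re
from typing import Optional

# Tiny illustrative stopword sets (assumption: real lists are far longer).
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "french": {"le", "la", "et", "est", "de", "les", "des", "un"},
    "german": {"der", "die", "und", "ist", "das", "nicht", "ein", "zu"},
}


def detect_language(text: str) -> Optional[str]:
    """Return the language whose stopword set overlaps most with the text."""
    # Lowercase the text and keep only runs of letters as words.
    words = set(re.findall(r"[^\W\d_]+", text.lower()))
    # Score each language by how many of its stopwords appear in the text.
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # If no stopword matched at all, report that no language was identified.
    return best if scores[best] > 0 else None


print(detect_language("The cat is in the garden and it is happy"))  # english
print(detect_language("Le chat est dans le jardin et la maison"))   # french
```

Counting distinct matched stopwords keeps the sketch simple. Counting every occurrence instead would weight longer documents more heavily, which is another reasonable choice.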


Step 1: Read input…


There is one problem with the PDF extraction method. When we use common libraries like PyPDF2, pdfplumber, pdftotext, etc. to extract text from PDF documents that contain scanned images, we get an error.

Hey guys, this part is very basic and important. Before performing any action on a dataset, we should know some rules of indexing and how indexing works in Python.

Here we are using the Anaconda distribution to perform some actions on a dataset stored as a .csv file. Using data slicing, we can perform data operations on a limited portion of the dataset.
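
Before loading a real file, the indexing rules can be tried on a plain Python list. The rows below are made-up sample data standing in for a .csv file. The same zero-based, end-exclusive convention applies to position-based slicing in pandas with `iloc`.

```python
# Hypothetical rows standing in for the contents of a .csv file.
rows = [
    ["name", "age", "city"],
    ["Asha", 31, "Pune"],
    ["Ravi", 27, "Mumbai"],
    ["Meera", 45, "Nagpur"],
    ["John", 38, "Delhi"],
]

header = rows[0]         # indexing starts at 0
data = rows[1:]          # everything after the header row
first_two = data[:2]     # the end index is excluded: rows 0 and 1
last_row = data[-1]      # negative indexes count from the end
every_other = data[::2]  # a step of 2 takes rows 0 and 2
ages = [row[1] for row in data]  # column 1 of every data row

print(first_two)  # [['Asha', 31, 'Pune'], ['Ravi', 27, 'Mumbai']]
print(last_row)   # ['John', 38, 'Delhi']
print(ages)       # [31, 27, 45, 38]
```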

Import library to load a dataset


When working with a .csv file, we typically use the pandas library to import the dataset into Spyder.

Import dataset into Spyder

Hello guys, have you heard or seen the term data preprocessing above? Anyone who wants to be a Data Scientist or Data Wrangler must be aware of this term.

This is the most important part of Machine Learning, Artificial Intelligence, Data Science, Computer Vision, Deep learning, etc…

Data preprocessing :

This is the first step for any machine learning algorithm, or any algorithm in general. Because algorithms accept only properly formatted data, we need to provide the data in the format they require.

For example,

Suppose we are filling in an online scholarship form and, among its various fields, one asks us to upload a photo (size < 2 MB)…

Kishan Tongrao

Data Science Engineer
