Implementing and comparing Bag of Words and TF-IDF to build a model that detects fake news

Photo by Matthew Guay on Unsplash

The advent of the World Wide Web and the rapid adoption of social media platforms (such as Facebook and Twitter) paved the way for information dissemination on a scale never before witnessed in human history. With the current usage of social media platforms, consumers are creating and sharing more information than ever before, some of which is misleading and has no relevance to reality.

The following program helps identify programmatically whether a news article is fake or not. Let us first understand the two feature extraction techniques I have used to build the model:

  • Bag of…
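
To make the two feature extraction techniques concrete, here is a minimal sketch in plain Python. It is an illustration, not the article's implementation: the helper names (`tokenize`, `bag_of_words`, `tf_idf`) and the two sample sentences are made up for this example, and the TF-IDF formula is the unsmoothed textbook form (term frequency times `log(N / document frequency)`). Libraries often apply smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also strip punctuation.
    return text.lower().split()

def bag_of_words(docs):
    # Build a sorted vocabulary, then count each vocabulary word per document.
    vocab = sorted({word for doc in docs for word in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

def tf_idf(docs):
    # tf = count / document length; idf = log(N / documents containing the word).
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    weights = []
    for row in counts:
        length = sum(row)
        weights.append([
            (row[j] / length) * math.log(n_docs / doc_freq[j]) if length else 0.0
            for j in range(len(vocab))
        ])
    return vocab, weights

# Hypothetical sample documents, for illustration only.
docs = ["the moon landing was real", "the moon is made of cheese"]
vocab, bow = bag_of_words(docs)
_, weights = tf_idf(docs)
```

Bag of Words keeps raw counts, so words shared by both sentences ("the", "moon") count as much as any other word. TF-IDF gives those shared words a weight of zero here, because a word that appears in every document does not help tell the documents apart.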


A Beginner’s Guide To Implement Natural Language Processing Using NLTK & TensorFlow

Photo by NOAA on Unsplash

Twitter has become an important communication channel in times of emergency. The ubiquitousness of smartphones enables people to announce an emergency they’re observing in real time. Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter.

However, it’s not always clear whether a person’s words are actually announcing a disaster. The following program helps identify programmatically whether a tweet conveys disaster information or not.

Before we build the model, it is important to understand a few concepts in NLP (Natural Language Processing):

  • Applying Regular Expressions
    Before we begin ‘text’ processing…
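
As a rough illustration of applying regular expressions before text processing, the sketch below cleans a tweet with Python's built-in `re` module. The function name `clean_tweet`, the choice of what to strip (URLs, @mentions, non-letter characters), and the sample tweet are assumptions made for this example; the article's own cleaning steps may differ.

```python
import re

def clean_tweet(text):
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"@\w+", " ", text)           # drop @mentions
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # keep letters and whitespace only
    # Collapse runs of whitespace, trim the ends, and lowercase.
    return re.sub(r"\s+", " ", text).strip().lower()

# Hypothetical tweet, for illustration only.
cleaned = clean_tweet("Forest fire near La Ronge! @user http://t.co/abc #wildfire")
```

Removing URLs before the non-letter filter matters: otherwise the filter would break a link into stray word fragments such as "http" and "co".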


Implementing a CNN on the Fruit 360 dataset available on Kaggle

Photo by ja ma on Unsplash

While trying to improve the accuracy of a neural network, we often end up overfitting the model to the training data. This leads to poor predictions when we run the model on the test data. Hence I take a dataset and apply techniques that not only improve accuracy but also handle overfitting.

In this article, we’ll use the following techniques to train a state-of-the-art model in less than 5 minutes to achieve over 95% accuracy in classifying images from the Fruit 360 dataset:

  1. Data Augmentation
    Data augmentation in data analysis are…
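
One common form of image data augmentation is a random horizontal flip, sketched below in plain Python on an image stored as a list of pixel rows. This is only an assumed example of the idea: the function names, the flip probability `p`, and the tiny sample image are hypothetical, and the article may use different transformations.

```python
import random

def horizontal_flip(image):
    # image: list of rows; reversing every row mirrors the image left to right.
    return [row[::-1] for row in image]

def random_flip(image, rng, p=0.5):
    # Flip with probability p, otherwise return the image unchanged.
    return horizontal_flip(image) if rng.random() < p else image

# Hypothetical 2x3 single-channel image, for illustration only.
image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(0)  # seeded so runs are repeatable
augmented = [random_flip(image, rng) for _ in range(4)]
```

Because each epoch sees a slightly different version of every image, the model has a harder time memorizing the training set, which is why augmentation is often used against overfitting.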


Dataset — CIFAR 10

Photo by Matthew Henry from Burst

While developing Convolutional Neural Networks (CNNs) to classify images, it is often observed that the model starts overfitting when we try to improve the accuracy. This is very frustrating, so I list the following techniques, which improve model performance without overfitting the model to the training data.

  1. Data normalization
    We normalized the image tensors by subtracting the mean and dividing by the standard deviation of pixels across each channel. Normalizing the data prevents the pixel values from any one channel from disproportionately affecting the losses and gradients.
  2. Data augmentation
    We applied random transformations while loading images…
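
The data normalization step in item 1 can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: images are lists of `(r, g, b)` pixels already scaled to `[0, 1]`, the helper names `channel_stats` and `normalize` are hypothetical, and the four sample pixels are made up. A real pipeline would compute the statistics over the whole training set with a tensor library.

```python
import math

def channel_stats(images):
    # images: list of images, each a list of (r, g, b) pixels scaled to [0, 1].
    n_channels = len(images[0][0])
    pixels = [pixel for image in images for pixel in image]
    means, stds = [], []
    for c in range(n_channels):
        values = [pixel[c] for pixel in pixels]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        means.append(mean)
        stds.append(math.sqrt(var))
    return means, stds

def normalize(image, means, stds):
    # Subtract the channel mean and divide by the channel standard deviation.
    return [tuple((pixel[c] - means[c]) / stds[c] for c in range(len(means)))
            for pixel in image]

# Hypothetical two-pixel images, for illustration only.
images = [
    [(0.0, 0.2, 0.4), (1.0, 0.4, 0.8)],
    [(0.5, 0.6, 0.0), (0.5, 0.8, 0.4)],
]
means, stds = channel_stats(images)
normalized = [normalize(image, means, stds) for image in images]
```

After this step every channel has mean 0 and standard deviation 1 over the dataset, so no single channel dominates the losses and gradients simply because its raw values are larger.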


Dataset — CIFAR 10 (acc > 75%)

Photo by JJ Ying on Unsplash

In my previous blog, I developed a feed-forward neural network and trained it on the CIFAR 10 dataset. Because a feed-forward neural network is not powerful on image datasets, we achieved an accuracy of 50%. Here I will build a CNN model from scratch and validate its performance on the CIFAR 10 dataset. But before we get started, I will try to answer a few fundamental questions.

  1. What is a CNN?
    A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image and be able to differentiate one from…


Dataset — CIFAR 10

If you want to get started with FFNNs (feed-forward neural networks) but are not quite sure which dataset to begin with, then you are at the right place. We see neural network implementations everywhere from classical machine learning to deep neural networks. Today, neural networks are used for solving many business problems such as sales forecasting, customer research, data validation, and risk management. Let us start by asking a couple of fundamental questions:

  1. What is an FFNN?
    A feedforward neural network is an artificial neural network wherein connections between the nodes do not form a cycle. As…
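
The "no cycles" property can be seen in a minimal forward pass, sketched here in plain Python. The layer sizes, weights, and activation choices are arbitrary values picked for illustration, and the helper names `dense` and `forward` are hypothetical; a practical network would use a framework and learn its weights from data.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def dense(inputs, weights, biases, activation):
    # One fully connected layer: each output is activation(w . x + b).
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    # Data flows through the layers in order and never loops back: no cycles.
    for weights, biases, activation in layers:
        inputs = dense(inputs, weights, biases, activation)
    return inputs

# Arbitrary example weights: 2 inputs -> 2 hidden units (ReLU) -> 1 output (sigmoid).
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0], relu),
    ([[1.0, -1.0]], [0.0], sigmoid),
]
output = forward([1.0, 2.0], layers)
```

Each layer only reads the output of the layer before it, which is what distinguishes a feed-forward network from recurrent architectures where outputs are fed back in.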


Build On Dataset — Wheat Seed Species Prediction

Photo by Evi Radauscher on Unsplash

If you want to get started with PyTorch but are not quite sure which dataset to begin with, then you are at the right place. We see PyTorch implementations everywhere from classical machine learning to deep neural networks. I can’t wait to get started, but first let us answer a couple of fundamental questions:

  1. What is PyTorch?
    PyTorch is an open-source, community-driven deep learning framework developed by Facebook’s artificial intelligence research group. …


Photo by Tim Mossholder on Unsplash

If you are getting started with machine learning and looking for a dataset to test your skills and understanding, then you are at the right place. The Swedish auto insurance dataset is ideal for beginners, as the volume of data is low (just 63 records) and you only have to do minimal feature engineering to understand its relation to the labels (or the final output).

Table of Contents

  1. Introduction
  2. Loading the data
  3. Feature Analysis
  4. Data cleaning
  5. Applying Train, Test and Split
  6. Training on ML model
  7. Cross validation to select best ML model
  8. Model performance

Introduction

The Swedish Auto Insurance Dataset involves predicting the…


COVID 19 Dataset — Using matplotlib, seaborn and plotly

Photo by Fusion Medical Animation on Unsplash

About Dataset

The analysis below is performed on a Covid 19 dataset that is freely available on GitHub. It is a near-real-time dataset that is updated daily.

In this dataset, I have tried analyzing various features to understand the spread of the virus across various geographies and how individual countries have been impacted economically.

Table of Contents

  1. Data Preparation and Cleaning
  2. Preparing and Cleaning Raw Covid Data
  3. Exploratory Analysis and Visualization
    - GDP trend of affected countries
    - Medical infrastructure
    - Human Factor
  4. Asking and Answering Questions
    - Q1- Which countries have the highest number of cases and deaths per million…


Photo by Anna Pelzer on Unsplash

I am fascinated by the good-quality food served in restaurants and would like to help the community find the best cuisines in their area.

About Dataset

Zomato API Analysis is one of the most useful analyses for foodies who want to taste the best cuisines from every part of the world within their budget. This analysis is also for those who want to find value-for-money restaurants in various parts of the country for these cuisines. …

Hargurjeet

Data Science Practitioner | Machine Learning | Deep Learning
