Category: Big Data

Top Ten Places where AI and Machine Learning make our Life Easier


Automating our surroundings has become one of our main creative goals. Day by day, AI and machine learning are automating more and more parts of our lives.

We have all heard about AI, thanks to the movies that introduced it, but what about machine learning (ML)? For most of us, ML is just a buzzword. Basically, ML lets a computer learn from data.

In a nutshell, ML resembles the very first learning we do in childhood. We have a book containing many pictures of fruits, animals, vegetables, and trees. These pictures are the training data set for a child, and that data is later used to answer questions. For example, a child is shown a picture and has to identify it based on the pictures stored in his or her mind. That is what ML does. ML keeps updating its training data based on what it identified correctly or incorrectly, and so gets smarter at completing its tasks over time. If you have used Google, Netflix, Amazon, or Gmail, then you have interacted with machine learning (ML).
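The picture-book analogy can be sketched in a few lines of Python. This is only an illustration, not how any of the services named above work: the `PictureBook` class, its feature tuples (roundness, redness) and the fruit labels are all made up for the example, and the learner is a plain 1-nearest-neighbour lookup.

```python
import math


class PictureBook:
    """Toy 1-nearest-neighbour learner (hypothetical, for illustration only)."""

    def __init__(self):
        # Each entry pairs a feature tuple with the label the "child" was taught.
        self.examples = []

    def teach(self, features, label):
        """Add one labelled picture to the training data."""
        self.examples.append((tuple(features), label))

    def identify(self, features):
        """Return the label of the stored picture closest to `features`."""
        if not self.examples:
            raise ValueError("nothing has been taught yet")
        closest = min(self.examples, key=lambda ex: math.dist(ex[0], features))
        return closest[1]

    def feedback(self, features, true_label):
        """Check a guess; on a mistake, store the corrected example."""
        correct = self.identify(features) == true_label
        if not correct:
            self.teach(features, true_label)
        return correct


book = PictureBook()
# Made-up features: (roundness, redness), both between 0 and 1.
book.teach((0.9, 0.9), "apple")
book.teach((0.2, 0.1), "banana")

first_guess = book.identify((0.85, 0.3))            # nearest stored picture wins
was_correct = book.feedback((0.85, 0.3), "orange")  # the teacher corrects the guess
second_guess = book.identify((0.85, 0.3))           # answered from the new example
```

The first guess is wrong because the book has never seen an orange; after the correction is stored, the same picture is identified correctly. That is the "update the training data and get smarter" loop in miniature.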

  1. Recommendations
    We have all seen recommendations if we use services like YouTube, Amazon or Netflix. Every click is monitored and recorded. Driven by machine learning, these sites analyze our activity and compare it with that of millions of other users to “recommend” or “suggest” similar videos, products or films that we might like.
  2. Online Search
    AI is transforming the results of Google and other search engines by watching how we respond to the results displayed. If we click a result on the very first page and stop there, we found what we were looking for. If we instead go to the second page or refine our query, the search engine assumes it did not understand what we wanted, learns from the mistake, and shows better results in the future.
  3. In Hospitals
    Because it can analyze vast amounts of data, ML is well placed to process information and spot patterns, such as signs of cancer or eye disease, at a scale several orders of magnitude beyond what a human can manage. Computer-aided diagnosis (CAD) can help radiologists find early-stage breast cancers that might otherwise be missed, and it can identify 52% of these missed cancers roughly a year before they were actually detected. Zebra Medical Vision is an Israeli company that applies advanced machine learning techniques to the field of radiology. It has amassed a huge training set of medical images along with categorization technology that will allow computers to predict multiple diseases with better-than-human accuracy. In 2016, the company unveiled two new software algorithms to help predict, and even prevent, cardiovascular events such as heart attacks.
  4. Data Security
    According to Kaspersky, between January and September 2016 ransomware attacks on businesses increased from one every 2 minutes to one every 40 seconds. Symantec also reported high levels of ransomware attacks, with over 50,000 in March 2016 alone. A report by Osterman Research indicates that 47% of organizations in the US had been targeted at least once in 2016. A survey in the UK suggested that 54% of businesses had been attacked at least once. Friday, May 12, 2017 saw one of the largest and most widespread attacks to date: the WannaCry ransomware. According to Deep Instinct, new malware tends to have almost the same code as previous versions, with only 2 to 10% of it changed. Because the changes are so slight, ML can predict with great accuracy which files are malware.
  5. Email spam filtering
    According to Computer World magazine, the average employee gets 13 spam messages a day, and over 80 percent of all the email messages zipping around the Internet are spam. Microsoft founder Bill Gates is the most spammed man in the world, with 4m emails arriving in his inbox each day. Credit goes to ML, which filters emails and classifies them as spam or not spam.
  6. Marketing Personalization
    Personalized marketing is the ultimate form of targeted marketing. To sell more we have to serve better, and to serve better we have to understand customers. This is the basic idea behind marketing personalization. Companies can personalize the emails customers receive, the products shown as recommended, the offers and coupons they see, and so on; these are just the tip of the iceberg. All of this is achieved with advanced ML algorithms.
  7. Fraud Detection
    ML and AI are getting better day by day at spotting potential fraud and other anomalies across many different fields. The Royal Bank of Scotland (RBS), for example, is using machine learning to fight money laundering. Companies hold a lot of data, and they use ML to compare millions of transactions and distinguish precisely between legitimate and fraudulent transactions between buyers and sellers.
  8. Natural Language Processing (NLP)
    Virtual personal assistants such as Siri, Alexa, Cortana and Google Assistant are able to follow instructions because of voice recognition, which relies on NLP. NLP processes human speech, matches it to the best-fitting command, and responds in a natural way.
  9. Financial Trading
    At its heart, financial trading is no different from any other form of trading: it is about buying and selling in the hope of making a profit. Its appeal lies in the challenge: “Predict what the stock market will do on any given day”. Here again ML wins the prediction game, if only by a close margin. ML helps many prestigious trading firms execute trades at very high speed and high volume. ML far outpaces humans at consuming vast amounts of data quickly.
  10. Smart Cars
    Of all the uses for machine learning, one of the most exciting is smart cars. A recent IBM survey of top auto executives found that some 74% of them expected there would be smart cars on the roads by 2025. Smart cars integrate IoT, ML and AI, which help the car do many things on its own: learn about its owner and environment, adjust internal settings, report and even fix problems, offer real-time advice about traffic and road conditions, and in extreme cases even take evasive action to avoid a potential collision.
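As a rough illustration of the recommendation idea in item 1 (comparing our activity with that of other users), here is a minimal user-based sketch in Python. The viewing histories, the user names and the `recommend` helper are invented for the example; real services work with far larger data and more elaborate models.

```python
def jaccard(a, b):
    """Overlap between two sets of watched titles, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(target, histories, top_n=2):
    """Suggest titles the target user has not seen, weighted by how similar
    each other user's history is to the target's."""
    seen = histories[target]
    scores = {}
    for user, watched in histories.items():
        if user == target:
            continue
        similarity = jaccard(seen, watched)
        for title in watched - seen:
            scores[title] = scores.get(title, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]


# Hypothetical click histories.
histories = {
    "asha": {"Spark intro", "Python basics", "Pandas tips"},
    "ben": {"Spark intro", "Python basics", "Kafka 101"},
    "chen": {"Cooking pasta", "Pandas tips", "Gardening"},
}

suggestions = recommend("asha", histories)
```

Here the user whose history overlaps most with the target's contributes the strongest suggestion, which is the same "people like you also watched" intuition described above.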

The biggest big data challenges



We all know that necessity is the mother of invention, and we don’t want to stop at any point in our lives because it’s in our genes.

The complex business environment led to the invention of the concept of big data. Nowadays, data and how a company uses it set companies apart and, most importantly, keep them in business. To that end, companies transform as much data as possible into meaningful products with data-driven discoveries for their users. The right analytics on data maximise revenue, improve operations and mitigate risks. According to Demirkan and Dal, big data has six “V” characteristics: Volume, Velocity, Variety, Veracity, Variability and Value. The biggest big data challenges are not easy to see.

IDC predicts that big data revenue will increase by more than 50%, from nearly $122 billion in 2015 to more than $187 billion in 2019. Nearly 73% of companies are increasing their investment in analytics to transform data into gold, but 60 percent of them feel they don’t have the proper tools to get insight from data. Research predicts that half of all big data projects will fail to deliver the desired output.
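The “more than 50%” figure can be checked directly from the two revenue numbers quoted above. The short Python calculation below uses only those two values, which the source itself gives as approximate; the helper name `percent_increase` is ours.

```python
def percent_increase(old, new):
    """Percentage growth from `old` to `new`."""
    return (new - old) / old * 100


# IDC figures quoted above, in billions of US dollars (approximate).
revenue_2015 = 122
revenue_2019 = 187

growth = percent_increase(revenue_2015, revenue_2019)
print(f"Growth 2015-2019: {growth:.1f}%")  # prints "Growth 2015-2019: 53.3%"
```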

When Gartner asked what the biggest big data challenges were, the responses suggested that while all the companies plan to move ahead with big data projects, they still don’t have a good idea about what they’re doing and why. The second major concern is the failure to establish data governance and management. Thomas Schutz, SVP and General Manager of Experian Data Quality, says that “The biggest problem organisations face around data management today actually comes from within,” and “Businesses get in their own way by refusing to create a culture around data and not prioritising the proper funding and staffing for data management.”

There are many challenges, but data-related issues are the biggest challenges in big data.


$19 Trillion Tech Industry Heading Our Way (IoT)


The “Internet of Things” is an emerging technology, and a phrase that 87% of people haven’t heard of. The Internet of Things (IoT) is the inter-networking of physical devices, vehicles (also referred to as “connected devices” and “smart devices”), buildings, and other items embedded with electronics, software, sensors, actuators (MEMS), and network connectivity that enable these objects to collect and exchange data via the Internet.

It may be surprising to learn that ATMs were some of the first IoT objects, dating back to 1974.
The IoT market has been growing at a parabolic rate that the world has never witnessed. IoT could dwarf every technology before it, and it has only just begun. Experts estimate that the IoT will consist of almost 50 billion objects by 2020, in a span of only 4 years. It took 40 years to sell 1 billion personal computers, 20 years to reach nearly 7 billion cellphone users and 5 years to reach 1 billion tablets. So we can say that the IoT is the future of all technology. It is literally on the pulse of everything shaping the new Internet.

In order to work, every IoT device has a piece of software and sensors, also known as MEMS (Micro-Electro-Mechanical Systems). Software and MEMS make IoT sense, think and act. We can say that MEMS are the eyes and ears of IoT, and big data is the “fuel” that powers it.

What made IoT come out of nowhere?

To answer this question we have to go through the different views of tech luminaries. The inventor of Ethernet, Bob Metcalfe, thinks “It’s a media phenomenon. Technologies and standards and products and markets emerge slowly, but then suddenly, chaotically, the media latches on and BOOM! It’s the year of IoT.” The Chief Economist at Google, Hal Varian, believes that it has something to do with Moore’s Law: “The price of processors, sensors, and networking has come way down. Since WiFi is now widely deployed, it is relatively easy to add new networked devices to the home and office”. The father of sensors, Janusz Bryzek, thinks there are multiple factors. First, there is the new version of the Internet Protocol, IPv6, “enabling the almost unlimited number of devices connected to networks.” Another factor is that four major network providers (Cisco, IBM, GE and Amazon) have decided “to support IoT with network modification, adding Fog layer and planning to add Swarm layer, facilitating dramatic simplification and cost reduction for network connectivity.” Last but not least, new forecasts cast IoT as the future king of all technologies. IoT will add trillions to global GDP. “This is the largest growth in the history of humans,” says Bryzek.

Who is feeding the sleeping giant?

A recent GE survey reveals that 90% of companies rank implementing IoT among their top 3 priorities. In an active move to accommodate new and emerging technological innovation, the UK Government allocated £40,000,000 in its 2015 budget towards research into the Internet of Things. Warren Buffett alone has invested $13 billion. Barcelona, Spain has invested in it, and its water system alone saves $58 million annually. Glasgow, Scotland is investing $37 million in this innovation. The “DIGIT” Act (Developing Innovation and Growing the Internet of Things Act) is the bill representing the American government’s investment in IoT.

IoT in our daily life and in future with limitless opportunities.


The London School of Economics said that “The future is now and [this revolution] is going to disrupt most of the traditional industries”.



It will boost productivity and save companies millions, likely billions, each and every year.

1. Facebook said that this breakthrough innovation has already saved 38% in its data centre.
2. UPS uses IoT to reduce its fuel consumption by 9 million gallons, a $31 million saving.


3. Driverless cars will generate $1.3 trillion in annual savings in the United States, with over $5.6 trillion in savings worldwide. The number of cars connected to the Internet worldwide will grow more than sixfold, to 152 million in 2020 from 23 million in 2013.


4. The global wearable device market grew 223% in 2015. It exceeded $1.5 billion in 2014, double its value in 2013.

5. The connected kitchen saves the food and beverage industry as much as 15% annually.


6. Consumer electronics M2M connections will top 7 billion in 2023, generating $700 billion in annual revenue. By 2020, there will be over 100 million Internet-connected wireless light bulbs and lamps worldwide, up from 2.4 million in 2013.

7. Health care industry: the University of California Medical centre has a robot-controlled pharmacy that has dispensed 350,000 prescriptions without making a single error.

8. The legal system is also using robots. The New York Times reported on Blackstone Discovery, an e-discovery company that uses electronic data discovery software. It can analyse 1.5 million documents for less than $100,000. A team of paralegals would have charged $2.1 million for the review.


9. Fast food: At Eatsa, a futuristic San Francisco-based vegetarian fast food restaurant, there are no employees. Customers use a touch screen to order their food, and the meal is ready in a matter of minutes.


10. Retail industry: RFID technology helps tag products so that they can be tracked. With IoT and RFID tags, retailers can expect 99% accuracy in inventory and a 2 to 7% increase in sales.

The examples above are just the tip of the iceberg. These machines are getting smarter every day and are starting to essentially think on their own.

This technology is not just a game changer. It is THE game changer. So get ready… The Internet of Things is here to stay.

5 pillars of data scientist career


  1. Genetically Modified Leadership
  2. Great Power Great Responsibility
  3. Big Picture in Big Data
  4. KISS
  5. Feedback


  1. Genetically Modified Leadership:

You and you alone are there to guide and lead yourself. To get noticed by the world, you have to modify yourself well, much like GMO food. Tackling your weaknesses and polishing your strengths leads you to greatness. By concurrently being the best mentor and the best student you can possibly be, you will bring forth your GM leadership skills. Leadership will remove self-doubt and motivate you with heightened self-worth. Control and create your own views to be more positive and productive. The daily intensity of the data scientist role can be increasingly stressful, because data scientists hold data, and data is power.

2. Great Power Great Responsibility:
Power and responsibility always exist in parallel. Nowadays data is power, and with it comes responsibility. So make staying calm your highest priority; then, and only then, will everything else go well.

3. Big Picture in Big Data:
In the age of big data, it is essential to condense your environment into manageable points so that you can keep a big-picture perspective.

4. KISS (keep it short and simple):
The KISS principle states that most systems work best if they are kept simple rather than made complicated. So keep the rules and procedures of any model short and simple, so that they do not come back to burn you later. And always keep in mind the quotes attributed to Albert Einstein: “If you can’t explain it simply, you don’t understand it well” and “Make things as simple as possible, but not simpler.”

5. Feedback:
Feedback is the lifeblood of a data scientist. After countless efforts, a data scientist waits for feedback to come. Feedback acts like oxygen or blood, the things that keep you alive every moment. The more feedback you get, the more powerful you and your work will be.

How should I integrate Pyspark with Jupyter notebook on Ubuntu 16.04?


These are the prerequisites for getting Apache Spark (pyspark) fully working with Jupyter, i.e. how to integrate Jupyter notebook and pyspark.

Step 1: – Download and install.

  1. Download and install Anaconda. (Anaconda comes with many packages such as Jupyter, ipython and python3, so there is no need to install these explicitly.)
  2. Download and install Java if it is not already installed (Spark uses the JVM to run).
    To check whether Java is installed, run one of these commands in a terminal: $ java -version or $ which java (the latter returns the path of the java executable).
  3. Download Spark, untar it, and move it to your desired location; it is better to rename the folder to spark.
  4. Have some data (in CSV format) ready to check that Apache Spark is working properly.

Step 2: – Setting up environment variables.

  • Copy the path of your preferred installation, then open /etc/environment using nano or your favorite text editor. Note that an environment variable takes the path of the folder, not of the executable file.
    $ sudo nano /etc/environment
  • JAVA_HOME="/usr/lib/jvm/java-8-oracle"
  • PATH=/path/of/Anaconda/bin:$PATH   # (Anaconda bin directory contains jupyter, ipython, python3 )
    To see PATH: echo $PATH
    Note again: executables are searched for and run in the order in which their directories appear in the output of echo $PATH.
  • Reload the environment variable file by running this command
    source /etc/environment
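After reloading the environment, it can help to confirm that the tools are actually visible. The following is a small Python sketch of such a check, not part of the original setup; the function names `check_tools` and `path_entries` are ours, and the tool list simply mirrors the programs mentioned in Steps 1 and 2.

```python
import os
import shutil


def check_tools(tools=("java", "jupyter", "python3")):
    """Map each tool name to the executable found on PATH, or None."""
    return {tool: shutil.which(tool) for tool in tools}


def path_entries():
    """Directories on PATH, in the order the shell searches them."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]


if __name__ == "__main__":
    for tool, location in check_tools().items():
        print(f"{tool:8} -> {location or 'not found on PATH'}")
    print("JAVA_HOME =", os.environ.get("JAVA_HOME", "(not set)"))
```

If `python3` or `jupyter` resolves to a location outside the Anaconda bin directory, the Anaconda entry probably sits too late in PATH, since the first match wins.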

Step 3: – Configure Apache Spark file in conf folder

  • cd /path/of/your/spark/folder/spark/conf/
  • cp
  • nano
  • add these lines:
    export PYSPARK_PYTHON=/Path/of/anaconda//bin/python3
    export PYSPARK_DRIVER_PYTHON=/Path/of/anaconda//bin/jupyter
    JAVA_HOME=/path/of/java/usr/lib/jvm/java8-oracle

Step 4: – Configure Apache Spark pyspark file in bin folder

  • go to line 85 and add this:
    export PYSPARK_DRIVER_PYTHON=jupyter
  • go to line 86 and add this:
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
  • Save all

Step 5: – To launch pyspark in Jupyter, which is a web-browser-based version of IPython, use:
    PYSPARK_DRIVER_PYTHON_OPTS="notebook" /path/of/spark//spark-1/bin/pyspark
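Steps 3 to 5 boil down to a handful of environment variables plus one launcher path. A minimal Python sketch of the same setup follows; the helper names `pyspark_notebook_env` and `pyspark_command` are hypothetical, and the paths are placeholders in the style used above, so substitute your own.

```python
import os


def pyspark_notebook_env(anaconda_home, java_home, base_env=None):
    """Return a copy of the environment with the variables from Steps 3-5 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = java_home
    env["PYSPARK_PYTHON"] = os.path.join(anaconda_home, "bin", "python3")
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    return env


def pyspark_command(spark_home):
    """Argument list for the pyspark launcher inside a Spark folder."""
    return [os.path.join(spark_home, "bin", "pyspark")]


# Placeholder locations; replace them with your own installation paths.
env = pyspark_notebook_env("/path/of/anaconda", "/usr/lib/jvm/java-8-oracle")
command = pyspark_command("/path/of/spark")

# To actually start the notebook server, uncomment:
# import subprocess
# subprocess.run(command, env=env, check=True)
```

Setting the variables per launch like this leaves the Spark scripts untouched, which can be convenient when trying the setup before editing files in conf or bin.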

Spark Standalone Installation – Install Spark to Local Cluster



Apache Spark can easily be deployed in standalone mode; all you need to do is install Spark on a local cluster. First download the pre-built Spark and extract it. Then open your terminal, navigate to the extracted Spark directory, and start the master from sbin. After that, start a worker, passing it the master Spark URL, which you can obtain at localhost:8080. You have now started a cluster manually.

After that, you can start spark-shell (for Scala), pyspark (for Python) or SparkR (for R) from bin.

  1. Download pre-built Spark.
  2. Extract the downloaded Spark build (you can extract it either from the terminal or manually).
  3. From your terminal, navigate to the extracted folder and start the master from sbin. command: sbin/
  4. After the master, start a worker followed by the master Spark URL, which you get by opening localhost:8080 in a browser. command: sbin/ <URL>
  5. After performing steps 3 and 4, you have successfully started the cluster manually.
  6. Now you can start applications such as spark-shell, pyspark or SparkR (for Scala, Python and R respectively) from bin. command: bin/spark-shell
  7. Start writing your code or application.
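The two start commands in steps 3 and 4 can be sketched as follows. The steps above do not spell out the script names, so the ones used here are an assumption: `start-master.sh` and `start-slave.sh` are the scripts shipped in Spark's sbin folder in the releases of that period (newer releases name the worker script `start-worker.sh`). The helper names and the `/path/of/spark` placeholder are ours, and 7077 is Spark's default master port.

```python
import os


def master_url(host, port=7077):
    """Format a standalone master URL; 7077 is Spark's default master port."""
    return f"spark://{host}:{port}"


def cluster_commands(spark_home, url, worker_script="start-slave.sh"):
    """Return the commands for steps 3 and 4 as argument lists.

    The script names are assumptions (the steps above do not name them);
    newer Spark releases ship the worker script as start-worker.sh.
    """
    sbin = os.path.join(spark_home, "sbin")
    return [
        [os.path.join(sbin, "start-master.sh")],
        [os.path.join(sbin, worker_script), url],
    ]


url = master_url("localhost")
for cmd in cluster_commands("/path/of/spark", url):
    print(" ".join(cmd))
```

In practice the exact URL should be copied from the master's web page at localhost:8080, since the host part is often the machine's hostname instead of localhost. The same URL can then be given to the shells in bin through their `--master` option.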






For a quick basic tutorial, refer to the official guide.