If a machine learning model does not generalise, then its predictions contain some kind of error.
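Error = difference between the actual and predicted values/classes
Formula: Error = sum of (actual output - predicted output) over all samples; in practice each difference is squared or taken as an absolute value so that positive and negative differences do not cancel. The total error is also the sum of reducible and irreducible error.
Reducible Error = bias + variance (for squared error, the bias term enters squared: bias² + variance)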
Bias is how far the predicted values/classes are from the actual values/classes. If the predicted values are too far away from the actual values then the model is highly biased.
If the values are not too far away then the model has low bias.
If the model is highly biased then it won’t be able to capture complex patterns in the data and hence it UNDERFITS (underfitting).
If the model performs well on the training dataset but does not perform well on testing or validation data that is new to the model, it is said to have high variance. Variance is how much the predicted values scatter when the training data changes. If the model has high variance then it overfits (OVERFITTING). This is often described as the model having learned the noise.
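To make the two terms concrete, here is a minimal sketch that estimates bias² and variance by retraining two toy models on many freshly sampled training sets. Everything in it is an assumption made for illustration, not something from the text above: the sine target, the noise level, and the two models (`fit_mean`, which always predicts the training mean, and `fit_1nn`, which copies the nearest training label).

```python
import math
import random

random.seed(0)  # fixed seed so repeated runs give the same numbers


def true_fn(x):
    # Assumed ground-truth relationship between input and output.
    return math.sin(2 * math.pi * x)


def sample_training_set(n=30, noise_sd=0.3):
    # Draw a fresh noisy training set; the noise is the irreducible part.
    xs = [random.random() for _ in range(n)]
    ys = [true_fn(x) + random.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    # Very simple model: ignores x and always predicts the training mean.
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_1nn(xs, ys):
    # Very flexible model: returns the label of the nearest training point.
    def predict(x):
        nearest = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[nearest]
    return predict


def bias_variance(fit, x_test, n_trials=200):
    # Train on many independent training sets and collect the predictions.
    preds = []
    for _ in range(n_trials):
        model = fit(*sample_training_set())
        preds.append([model(x) for x in x_test])

    bias_sq = 0.0
    variance = 0.0
    for k, x in enumerate(x_test):
        column = [p[k] for p in preds]
        mean_pred = sum(column) / n_trials
        # Bias: how far the average prediction is from the true value.
        bias_sq += (mean_pred - true_fn(x)) ** 2
        # Variance: how much predictions scatter around their own average.
        variance += sum((v - mean_pred) ** 2 for v in column) / n_trials
    return bias_sq / len(x_test), variance / len(x_test)


x_test = [i / 20 for i in range(1, 20)]
mean_bias, mean_var = bias_variance(fit_mean, x_test)
nn_bias, nn_var = bias_variance(fit_1nn, x_test)

print(f"mean model: bias^2={mean_bias:.3f} variance={mean_var:.3f}")
print(f"1-NN model: bias^2={nn_bias:.3f} variance={nn_var:.3f}")
```

Under these assumptions the constant model should show the larger bias (it underfits) and the nearest-neighbour model the larger variance (it follows the noise), which is the trade-off described above.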
While solving problems in Python, many of us have run into trouble with kernels: a specific package supports only Python 2.7 while the rest of the work requires Python 3, and there are a lot of issues with installing both Python kernels and running them side by side in Jupyter Notebook. Here’s my solution for running Python 2 and 3 on the same machine.
System Overview: I ran the kernels on macOS Mojave version 10.14.5
General Transit Feed Specification (GTFS) is a common format for public transportation schedules and associated geographic information. It is the data format that Google Maps uses for transit information.
2. Details of GTFS
GTFS is a set of text files that represent a snapshot of scheduled transit services:
a. agency.txt – Details of the agency publishing the data
b. routes.txt – Details of route names and types
c. trips.txt – Details of trips and services
d. stops.txt – Details of stop locations and names
e. stop_times.txt – Details of arrival and departure times
f. calendar.txt – Details of service availability by day and date
A feed can also include additional optional files such as calendar dates, fare attributes, fare rules, shapes, frequencies, transfers, and feed info.
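Because every GTFS file is plain comma-separated text, the standard library is enough to read one. The sketch below assumes the feed is packaged as a zip archive, which is how feeds are usually distributed, and builds a tiny made-up feed in memory so that it runs on its own. The agency, stops and times are hypothetical sample data, and `read_gtfs_table` is an illustrative helper name, not part of any GTFS library.

```python
import csv
import io
import zipfile

# Hypothetical miniature feed; real feeds contain many more rows and files.
SAMPLE_FILES = {
    "agency.txt": (
        "agency_id,agency_name,agency_url,agency_timezone\n"
        "A1,Demo Transit,https://example.com,Asia/Kolkata\n"
    ),
    "stops.txt": (
        "stop_id,stop_name,stop_lat,stop_lon\n"
        "S1,Central Station,12.9716,77.5946\n"
        "S2,Market Road,12.9750,77.6000\n"
    ),
    "stop_times.txt": (
        "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
        "T1,08:10:00,08:11:00,S2,2\n"
        "T1,08:00:00,08:01:00,S1,1\n"
    ),
}


def build_sample_feed():
    # Pack the sample text files into an in-memory zip archive.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in SAMPLE_FILES.items():
            archive.writestr(name, text)
    return buffer.getvalue()


def read_gtfs_table(feed_bytes, filename):
    # Return the rows of one GTFS text file as a list of dicts.
    with zipfile.ZipFile(io.BytesIO(feed_bytes)) as archive:
        with archive.open(filename) as raw:
            # utf-8-sig also strips a byte-order mark if the file has one.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


feed_bytes = build_sample_feed()
stops = {row["stop_id"]: row["stop_name"]
         for row in read_gtfs_table(feed_bytes, "stops.txt")}
stop_times = read_gtfs_table(feed_bytes, "stop_times.txt")

# Join stop_times.txt with stops.txt to list one trip in stop order.
timetable = [
    (row["arrival_time"], stops[row["stop_id"]])
    for row in sorted(stop_times, key=lambda r: int(r["stop_sequence"]))
]
for arrival, stop_name in timetable:
    print(arrival, stop_name)
```

To try this on a real feed, read the downloaded zip file's bytes and pass them to `read_gtfs_table` in place of `build_sample_feed()`.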
When you start working with NoSQL databases, sooner or later you come across the CAP theorem, published by Eric Brewer in 2000, which describes the trade-offs in any distributed system.
In a distributed database system, C stands for Consistency, A stands for Availability and P stands for Partition Tolerance.
Consistency – Every node in the system has the same view of the data.
Availability – Users can read and write from any node.
Partition Tolerance – Your system still operates even if a node or server fails or the network link between nodes breaks.
So from the above illustration, while selecting your NoSQL database you can choose only two of these characteristics, i.e. A-P, A-C or C-P.
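The trade-off is easiest to see when the network between nodes breaks. The following is only a toy sketch, not how any of the databases named here is implemented: `TwoNodeStore`, its `mode` argument and the `partitioned` flag are hypothetical names invented for the illustration.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeStore:
    """Toy store with two replicas; mode is 'CP' or 'AP' (illustrative only)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True means the replicas cannot talk

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write rather than diverge.
                raise RuntimeError("unavailable during partition")
            # Availability first: accept locally; replicas may now disagree.
            replica.value = value
            return
        # Healthy network: the write reaches both replicas.
        replica.value = value
        other.value = value

    def consistent(self):
        return self.a.value == self.b.value


# C-P choice: stays consistent but rejects the write during the partition.
cp = TwoNodeStore("CP")
cp.write(cp.a, 1)
cp.partitioned = True
try:
    cp.write(cp.a, 2)
    cp_available = True
except RuntimeError:
    cp_available = False

# A-P choice: keeps accepting writes, but the replicas drift apart.
ap = TwoNodeStore("AP")
ap.write(ap.a, 1)
ap.partitioned = True
ap.write(ap.a, 2)

print("CP available:", cp_available, "| consistent:", cp.consistent())
print("AP available: True | consistent:", ap.consistent())
```

Once a partition happens, the store has to give up either availability (the C-P run rejects the write) or consistency (the A-P run ends with two different values).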
A-P Databases: Voldemort, Cassandra, Riak, CouchDB, Dynamo