Hello and welcome from 360DigiTMG. This video playlist explains the life cycle of a data science project. In the previous video we talked about Data Understanding. Now let us discuss the next phase in the data science life cycle: Data Preparation.


Data Preparation: Once we have narrowed down the business problem and understood what data is available to address it, we begin to prepare the data for analysis. Most real-world data is highly noisy, and it often comes from many disparate sources. ML models are only as good as the data used to train them, so after the data is collected, its integration, annotation, preparation, and processing are critical. An essential characteristic of suitable training data is that it is provided in a form optimized for learning and generalization. Data preparation should start with a small, statistically valid sample and be improved iteratively with different preparation strategies, while data integrity is maintained throughout. This phase covers how the dataset is assembled from the relevant data sources by blending them together in the way that makes the most sense for the problem. It also covers data cleansing activities such as handling missing and null values, removing duplicate values, and removing outliers. Data preparation contains the following sub-modules:

1. Exploratory Data Analysis and Visualization
2. Feature Engineering
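The cleansing activities mentioned above (duplicates, missing values, outliers) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not 360DigiTMG's prescribed method. The `clean` helper and the sample records are hypothetical. Median imputation and the 1.5 * IQR outlier rule (Tukey's fences) were chosen for this example. In practice a library such as pandas is more commonly used for this work.

```python
from statistics import median, quantiles


def clean(records, field):
    """Deduplicate rows, impute missing values, and drop outliers for one numeric field."""
    # 1. Remove exact duplicate rows, keeping the first occurrence and the original order.
    seen = set()
    unique = []
    for row in records:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 2. Replace missing (None) values with the median of the observed values.
    fill = median(r[field] for r in unique if r[field] is not None)
    imputed = [{**r, field: fill if r[field] is None else r[field]} for r in unique]

    # 3. Keep only rows inside Tukey's fences: [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1, _, q3 = quantiles([r[field] for r in imputed], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in imputed if low <= r[field] <= high]


# Hypothetical customer records, for illustration only.
rows = [
    {"id": 1, "income": 42.0},
    {"id": 2, "income": 45.0},
    {"id": 2, "income": 45.0},   # exact duplicate
    {"id": 3, "income": None},   # missing value
    {"id": 4, "income": 47.0},
    {"id": 5, "income": 44.0},
    {"id": 6, "income": 400.0},  # suspiciously large value
]

cleaned = clean(rows, "income")
print(cleaned)
```

The median is used for imputation because, unlike the mean, it is barely affected by the 400.0 value. Whether to impute before or after screening for outliers is a judgment call that depends on the data.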

Let us first talk about Exploratory Data Analysis and Visualization:

Exploratory data analysis (EDA) is not one of the CRISP-DM phases, but over the years it has become a crucial part of the data science life cycle. It involves identifying correlations and relationships between variables using both visual and statistical techniques. For example, when building a regression model you need to check for highly correlated predictor variables and check whether the residuals have constant variance (homoscedasticity) or not (heteroscedasticity).
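The correlation check can be illustrated with a short sketch that flags pairs of variables whose Pearson correlation coefficient exceeds a threshold. The feature names, values, and the 0.9 cut-off are hypothetical. Python 3.10+ also offers `statistics.correlation`, which computes the same coefficient, but the formula is written out here so it is visible. Checking residuals for heteroscedasticity requires a fitted model, so this sketch does not cover it.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length numeric sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def highly_correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical housing features, for illustration only.
features = {
    "area_sqft": [850, 900, 1200, 1500, 1800, 2100],
    "rooms": [2, 2, 3, 3, 4, 5],
    "age_years": [30, 5, 12, 40, 8, 20],
}

print(highly_correlated_pairs(features))
```

On this made-up data only `area_sqft` and `rooms` are flagged. A common response is to drop one of them or combine them before fitting a regression model.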

A key part of understanding your data is identifying patterns. These patterns are often not evident when you only look at data in tables, and the right visualization can help you quickly gain a deeper understanding of your data. Before creating any chart or graph, you must decide what you want to show. For example, charts can convey key performance indicators (KPIs), relationships, comparisons, distributions, or compositions.
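The "decide what you want to show first" step can be sketched as a simple lookup from the purpose of a chart to candidate chart types, made before any plotting library is involved. The `CHART_GUIDE` table and `suggest_charts` helper are hypothetical. The pairings follow common charting guidance rather than a fixed standard.

```python
# Hypothetical guide: each purpose listed in the text maps to commonly used chart types.
CHART_GUIDE = {
    "kpi": ["single-value card", "gauge", "bullet chart"],
    "relationship": ["scatter plot", "bubble chart", "heat map"],
    "comparison": ["bar chart", "column chart", "line chart"],
    "distribution": ["histogram", "box plot"],
    "composition": ["stacked bar chart", "pie chart"],
}


def suggest_charts(goal):
    """Return candidate chart types for a purpose such as 'distribution' or 'KPI'."""
    key = goal.strip().lower()
    if key not in CHART_GUIDE:
        raise ValueError(f"unknown goal {goal!r}; expected one of {sorted(CHART_GUIDE)}")
    return CHART_GUIDE[key]


print(suggest_charts("Distribution"))
```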

That concludes this video. In the next video we will talk about the next sub-module of Data Preparation, Feature Engineering, and the best practices for this stage.



SUBSCRIBE TO 360DigiTMG’s YOUTUBE CHANNEL NOW
https://www.youtube.com/channel/UCNGIDQ466bNY87eEeKeQuzA

We have created a Facebook Group specifically for all our Data Science aspirants. You can use the link below to join.
In addition, we host 2 FREE training sessions every single month on various topics inside this group.

Join the FREE Data Science Facebook Group
https://www.facebook.com/groups/DataScience.MachineLearning.ArtificialIntellegence/


CONNECT WITH 360DigiTMG ON SOCIAL MEDIA

Facebook: https://www.facebook.com/360Digitmg/
Linkedin: https://www.linkedin.com/company/360digitmg/
Instagram: https://www.instagram.com/360digitmgindia/
YouTube: https://www.youtube.com/channel/UCNGIDQ466bNY87eEeKeQuzA

About 360DigiTMG
360DigiTMG is a 5-year-old training and consulting organization led by industry stalwarts who are alumni of premier institutions such as the Indian Institute of Technology, the Indian Institute of Management and the Indian School of Business. Since its inception, 360DigiTMG has been a forerunner in management and niche programs that help executives across various levels and domains upskill and cross-skill. 360DigiTMG conducts training programs across the globe for corporates and individuals alike.
360DigiTMG is a one-stop solution for training in emerging technologies such as Artificial Intelligence, Machine Learning, Big Data, Project Management, Quality Management, etc. 360DigiTMG is a training company and a division of the analytics consulting firm Innodatatics Inc.

For more information, contact us:
India : +91 99899 94319
Malaysia: +603 2092 9488

Web: https://360digitmg.com/
