Decision tree could not convert string to float

12/14/2023

Introduction to Supervised Classification Models

Machine Learning is a field that focuses on getting a mathematical equation to make predictions. However, not all Machine Learning models work the same way.

Which types of Machine Learning models can we distinguish so far?

- Classifiers to predict Categorical Variables
- Regressors to predict Numerical Variables

The previous chapter covered the explanation of a Regressor model: Linear Regression. This chapter covers the explanation of a Classification model: the Decision Tree.

The Machine wants to get the best numbers of a mathematical equation such that the difference between reality and predictions is minimum:

- Classifier evaluates the model based on prediction success rate: y =? ŷ
- Regressor evaluates the model based on the distance between real data and predictions (residuals): y − ŷ

There are many Machine Learning Models of each type. You don't need to know the process behind each model because, in code, they all follow the same procedure. In the end, you will choose the one that makes better predictions.

This tutorial will show you how to develop a Decision Tree to calculate the probability of a person surviving the Titanic and the different evaluation metrics we can calculate on Classification Models.

- How to preprocess/clean the data to fit a Machine Learning model?
- How to visualize a Decision Tree model in Python step by step?
- How to interpret the nodes and leaf's values of a Decision Tree plot?
- How to evaluate Classification models?
- How to compare Classification models to choose the best one?

Load the Data

This dataset represents people (rows) aboard the Titanic and their sociological characteristics (columns).

    import seaborn as sns
    import pandas as pd

    df_titanic = sns.load_dataset(name='titanic')[['survived', 'sex', 'age', 'embarked', 'class']]

How do we compute a Decision Tree Model in Python?

We should know from the previous chapter that we need a function accessible from a Class in the library sklearn.

Import the Class

    from sklearn.tree import DecisionTreeClassifier

Instantiate the Class

We create a copy of the original's code blueprint so that we don't "modify" the source code.

    model_dt = DecisionTreeClassifier()

Access the Function

The theoretical action we'd like to perform is the same as we executed in the previous chapter. Therefore, the function should be called the same way:

    model_dt.fit()

    TypeError: fit() missing 2 required positional arguments: 'X' and 'y'

Why is it asking for two parameters: y and X?

- y: target ~ dependent ~ label ~ class variable
- X: explanatory ~ independent ~ feature variables

Separate the Variables

    target = df_titanic['survived']
    explanatory = df_titanic.drop(columns='survived')

Fit the Model

    model_dt.fit(X=explanatory, y=target)

    ValueError: could not convert string to float: 'male'

Most of the time, the data isn't prepared to fit the model. So let's dig into why we got the previous error in the following sections.

Dummy Variables

The error says:

    ValueError: could not convert string to float: 'male'

From which we can interpret that the function .fit() does not accept values of string type like the ones in the sex column of df_titanic. Therefore, we need to convert the categorical columns to dummies (0s & 1s):

    df_titanic = pd.get_dummies(df_titanic, drop_first=True)

We separate the variables again to take into account the latest modification:

    explanatory = df_titanic.drop(columns='survived')
    target = df_titanic['survived']

Now we should be able to fit the model:

    model_dt.fit(X=explanatory, y=target)

    ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Missing Data

The data passed to the function contains missing data (NaN). Precisely, there are 177 people for whom we don't have the age:

    df_titanic.isna().sum()

    survived          0
    age             177
    sex_male          0
    embarked_Q        0
    embarked_S        0
    class_Second      0
    class_Third       0
    dtype: int64

Who are the people who lack the information?

    mask_na = df_titanic.isna().sum(axis=1) > 0
    df_titanic[mask_na]

We have two options to deal with them:

1. Drop the people (rows) who miss the age from the dataset.
2. Fill the age by the average age of other combinations (like males who survived).

We'll choose option 1 to simplify the tutorial. Therefore, we go from 891 people to 714 (891 − 177):

    df_titanic = df_titanic.dropna()

We separate the variables again to take into account the latest modification:

    explanatory = df_titanic.drop(columns='survived')
    target = df_titanic['survived']

Now we shouldn't have any more trouble with the data to fit the model:

    model_dt.fit(X=explanatory, y=target)

We don't get any errors because we correctly preprocessed the data for the model.
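The preprocessing this tutorial walks through (strings cause a ValueError, so encode them as dummies, drop rows with missing values, then fit) can be sketched end to end. This is a minimal sketch rather than the tutorial's exact code: it assumes a tiny hand-made DataFrame with made-up values in place of the seaborn Titanic dataset, so it runs without downloading anything, and the names `df`, `df_clean`, `string_error` and `accuracy` are hypothetical. Some recent scikit-learn releases can handle NaN inside decision trees, so the NaN error may not reproduce everywhere; dropping the rows before fitting works either way.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data shaped like the Titanic columns (values are made up).
df = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1, 0],
    "sex": ["male", "female", "female", "male", "female", "male"],
    "age": [22.0, 38.0, None, 35.0, 27.0, 54.0],
    "embarked": ["S", "C", "S", "Q", "S", "S"],
    "class": ["Third", "First", "Third", "Second", "First", "Third"],
})

model_dt = DecisionTreeClassifier()

# Step 1: fitting on raw string columns fails with a ValueError.
try:
    model_dt.fit(X=df.drop(columns="survived"), y=df["survived"])
    string_error = None
except ValueError as err:
    string_error = str(err)

# Step 2: convert categorical columns to dummies (0s & 1s).
# Step 3: drop the rows that still contain missing values (here, a missing age).
df_clean = pd.get_dummies(df, drop_first=True).dropna()

# Step 4: separate the variables again and fit the model.
explanatory = df_clean.drop(columns="survived")
target = df_clean["survived"]
model_dt.fit(X=explanatory, y=target)

# Share of correct predictions on the same rows used for fitting.
accuracy = model_dt.score(explanatory, target)

print("Error on raw strings:", string_error)
print("Columns after preprocessing:", list(df_clean.columns))
print("Rows kept:", len(df_clean), "of", len(df))
print("Training accuracy:", accuracy)
```

Separating `explanatory` and `target` only after the last transformation keeps both aligned on the same rows, which is why the variables are re-separated after each preprocessing step.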
