Machine Learning: Diagnosing and Predicting Autism with Logistic Regression, Support Vector Machines, and XGBClassifier Using Sklearn and XGBoost, Part 4: Model Training

最编程 2024-05-06 13:43:47

...

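Now we separate the features from the target variable and split them into training and validation data. We will use these splits to pick the model that performs best on the validation data.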

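# Columns excluded from the feature set (names are kept exactly as the code spells them).
removal = ['ID', 'age_desc', 'used_app_before', 'austim']
# Features: everything except the excluded columns and the label.
features = df.drop(removal + ['Class/ASD'], axis=1)
# Target: the label column.
target = df['Class/ASD']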

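Let's split the data into training and validation sets. The classes are imbalanced, so we then balance the training set with a random oversampler: it draws rows from the minority class and repeats them until both classes have the same number of rows.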

X_train, X_val, Y_train, Y_val = train_test_split(
    features, target,
    test_size=0.2, random_state=10)

# The data is highly imbalanced, so we balance it by adding repeated rows of the minority class.
# Only the training split is resampled; the validation split is left untouched.
ros = RandomOverSampler(sampling_strategy='minority', random_state=0)
X, Y = ros.fit_resample(X_train, Y_train)
X.shape, Y.shape

Output

((1026, 20), (1026,))
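To make the resampling step concrete, here is a minimal pure-Python sketch of random oversampling. It is not the imbalanced-learn implementation: the `random_oversample` helper and the toy data are hypothetical, and the helper tops up every non-majority class, which matches `sampling_strategy='minority'` only when there are two classes.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    majority_size = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Sample with replacement to cover the shortfall (zero for the majority class).
        for i in rng.choices(idx, k=majority_size - n):
            out_rows.append(rows[i])
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (illustrative only): 6 negatives and 2 positives.
rows = [[i] for i in range(8)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Because the added rows are exact copies, no new information enters the training set. The 1026 rows reported above are consistent with 513 rows per class after balancing.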

Now let's standardize the features (zero mean, unit variance) so that training is stable and fast.

# Standardizing the features for stable and fast training.
scaler = StandardScaler()
# Fit the scaler on the training data only, then reuse its statistics on the validation data.
X = scaler.fit_transform(X)
X_val = scaler.transform(X_val)
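The fit/transform split above matters: the mean and standard deviation come from the training data and are then reused on the validation data. The following is a minimal sketch of that idea for a single column, using only the standard library. The helper names `fit_scaler` and `transform` and the numbers are hypothetical, not part of scikit-learn.

```python
from statistics import fmean, pstdev


def fit_scaler(column):
    """Return the mean and population standard deviation of a training column."""
    return fmean(column), pstdev(column)


def transform(column, mean, std):
    """Standardize values with statistics that were learned on the training column."""
    return [(x - mean) / std for x in column]


# Illustrative values only.
train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
val = [5.0, 11.0]

mean, std = fit_scaler(train)                # learned from the training data only
train_scaled = transform(train, mean, std)
val_scaled = transform(val, mean, std)       # reuses the training mean and std
print(mean, std)
print(val_scaled)
```

Fitting the scaler on the validation data as well would leak information about it into preprocessing, which is why only `transform` is applied there.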

Now let's train several popular machine learning models and compare how they perform on our data.

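models = [LogisticRegression(), XGBClassifier(), SVC(kernel='rbf')]

for model in models:
    # Fit on the oversampled, standardized training data.
    model.fit(X, Y)

    # roc_auc_score is given hard class predictions here, so the printed
    # values are ROC AUC scores, not accuracies.
    print(f'{model} : ')
    print('Training ROC AUC : ', metrics.roc_auc_score(Y, model.predict(X)))
    print('Validation ROC AUC : ', metrics.roc_auc_score(Y_val, model.predict(X_val)))
    print()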
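One detail worth noting: the loop above passes the output of `model.predict`, which is a hard class label, to `metrics.roc_auc_score`. For a binary problem, ROC AUC computed on hard labels reduces to balanced accuracy, the mean of the recall on each class. The sketch below illustrates this with a small hand-written AUC function; the `roc_auc` helper, the scores, and the 0.5 threshold are all hypothetical and stand in for a real model.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the chance that a random positive outranks a random negative.

    Ties count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.9]          # hypothetical model scores
hard = [1 if s >= 0.5 else 0 for s in scores]    # thresholded class labels

auc_scores = roc_auc(y_true, scores)   # AUC on continuous scores
auc_hard = roc_auc(y_true, hard)       # AUC on hard labels

# Balanced accuracy: the mean of the recall on each class.
tpr = sum(h == 1 for y, h in zip(y_true, hard) if y == 1) / y_true.count(1)
tnr = sum(h == 0 for y, h in zip(y_true, hard) if y == 0) / y_true.count(0)
balanced_accuracy = (tpr + tnr) / 2
print(auc_scores, auc_hard, balanced_accuracy)
```

If a threshold-independent AUC is wanted, one option is to pass continuous scores instead, for example `model.predict_proba(X_val)[:, 1]` or, for `SVC`, `model.decision_function(X_val)`. That would change the numbers the loop reports, so it is a suggestion rather than what the original code does.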

