Machine Learning Decision Trees: Classifying Discrete Variables
Abstract
This article implements, in Python, the computation of information gain and information gain ratio, builds a decision-tree model for discrete variables, and wraps the code in classes so readers can call it easily.
Computing information gain and gain ratio
The class below computes, for discrete variables, the entropy, conditional entropy, information gain (mutual information), and information gain ratio:
.cal_entropy(): computes entropy
.cal_conditional_entropy(): computes conditional entropy
.cal_entropy_gain(): computes information gain (mutual information)
.cal_entropy_gain_ratio(): computes the information gain ratio
Usage: create the object with the features and labels, then call the relevant method.
Features and labels are best passed as a DataFrame, Series, or list.
To compute the entropy of a single variable, pass the same value as both features and labels.
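For reference, the entropy of a discrete variable X is H(X) = -Σ p(x)·log2 p(x), the information gain of Y given X is H(Y) - H(Y|X), and the gain ratio divides that gain by H(X). A minimal, self-contained sketch of the entropy computation (the helper name `entropy` is mine, not from the class below):

```python
import numpy as np
import pandas as pd

def entropy(values):
    """Shannon entropy (base 2) of a discrete sequence."""
    counts = pd.Series(values).value_counts().to_numpy()
    p = counts / counts.sum()
    return float(np.sum(-p * np.log2(p)) + 0.0)  # + 0.0 normalizes -0.0

print(entropy(["H", "T", "H", "T"]))  # a fair coin: 1.0 bit
print(entropy(["H", "H", "H", "H"]))  # a constant variable: 0.0 bits
```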
import numpy as np
import pandas as pd
import copy
from sklearn.preprocessing import LabelEncoder
from sklearn.datasets import load_wine, load_breast_cancer
class CyrusEntropy(object):
    """
    Computes the entropy, conditional entropy, information gain (mutual
    information), and information gain ratio of discrete variables.
    .cal_entropy(): computes entropy
    .cal_conditional_entropy(): computes conditional entropy
    .cal_entropy_gain(): computes information gain (mutual information)
    .cal_entropy_gain_ratio(): computes the information gain ratio
    Usage: create the object with the features and labels, then call the
    relevant method.
    Features and labels are best passed as a DataFrame, Series, or list.
    To compute the entropy of a single variable, pass the same value as
    both features and labels.
    """
    def __init__(self, x, y):
        # label-encode every feature column and the labels
        x = pd.DataFrame(x)
        y = pd.Series(y)
        x0 = copy.deepcopy(x)
        for i in range(x.shape[1]):
            x0.iloc[:, i] = LabelEncoder().fit_transform(x.iloc[:, i])
        self.X = x0
        self.Y = pd.Series(LabelEncoder().fit_transform(y))
    def cal_entropy(self):
        # entropy of each feature column and of the labels
        x_entropy = []
        for i in range(self.X.shape[1]):
            number = np.array(self.X.iloc[:, i].value_counts())
            p = number / number.sum()
            x_entropy.append(np.sum(-p * np.log2(p)))
        number = np.array(self.Y.value_counts())
        p = number / number.sum()
        y_entropy = np.sum(-p * np.log2(p))
        return x_entropy, y_entropy
    def cal_conditional_entropy(self):
        # H(Y|X_i) for every feature column i
        y_x_conditional_entropy = []
        for i in range(self.X.shape[1]):
            dict_flag = {}
            for j in range(self.X.shape[0]):
                key = self.X.iloc[j, i]
                dict_flag[key] = dict_flag.get(key, []) + [self.Y[j]]
            condition_value = 0
            for y_value in dict_flag.values():
                number = np.array(pd.Series(y_value).value_counts())
                p = number / number.sum()
                condition_value += np.sum(-p * np.log2(p)) * len(y_value) / self.X.shape[0]
            y_x_conditional_entropy.append(condition_value)
        return y_x_conditional_entropy
    def cal_entropy_gain(self):
        # information gain: H(Y) - H(Y|X_i)
        return list(np.array(self.cal_entropy()[1]) - np.array(self.cal_conditional_entropy()))

    def cal_entropy_gain_ratio(self):
        # gain ratio: information gain divided by the feature's own entropy H(X_i)
        return list(np.array(self.cal_entropy_gain()) / np.array(self.cal_entropy()[0]))
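To make conditional entropy and information gain concrete before running the class on real data, here is a hedged, standalone sketch on a made-up toy dataset (the helper functions and data are illustrative, not part of the class above):

```python
import numpy as np
import pandas as pd

def entropy(values):
    p = pd.Series(list(values)).value_counts(normalize=True).to_numpy()
    return float(np.sum(-p * np.log2(p)) + 0.0)  # + 0.0 normalizes -0.0

def conditional_entropy(x, y):
    """H(Y|X): entropy of y inside each x group, weighted by group size."""
    df = pd.DataFrame({"x": list(x), "y": list(y)})
    return sum(entropy(g["y"]) * len(g) / len(df) for _, g in df.groupby("x"))

x = ["sunny", "sunny", "rain", "rain"]
y = ["yes", "yes", "no", "no"]
print(entropy(y))                              # H(Y) = 1.0 bit
print(conditional_entropy(x, y))               # 0.0: x separates y perfectly
print(entropy(y) - conditional_entropy(x, y))  # information gain = 1.0
```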
Results: information gain and gain ratio
A discrete-variable dataset from Kaggle is used to validate the model. Kaggle describes the data as follows (parts of the description were lost in the source and are marked with [...]):
The Lifetime reality television show and social experiment, Married at First Sight, features men and women who sign up to marry a complete stranger they [...] marriage, couples have only a few [...] There have been 10 full seasons so far which provides interesting data to look at what factors may or may not play a role in their decisions at the end of eight weeks as well as longer-term outcomes since the show aired.
if __name__ == "__main__":
    data = pd.read_csv("./", header=0)  # dataset path (filename truncated in the source)
    Y = data["Status"]                  # "Status" (Married/Divorced) is the label
    X = data.drop(labels="Couple", axis=1)
    X = X.drop(labels="Status", axis=1)
    print(X.head(2))
# create the entropy calculator
entropy_model = CyrusEntropy(X, Y)
# entropy of each feature and of the labels
entropy = entropy_model.cal_entropy()
print(entropy)
([3.29646716508619, 3.08955, 6.342, 3.522, 1.0, 6.342, 0.873981, 0.0, 0.833764907210665, 0.833764907210665, 0.833764907210665, 0.833764907210665, 0.6722948170756379, 0.873981, 0.833764907210665], 0.833764907210665)
# conditional entropy of the labels given each feature
condition_entropy = entropy_model.cal_conditional_entropy()
print(condition_entropy)
[0.6655644259732555, 0.699248162082863, 0.0, 0.7352336969711815, 0.833764907210665, 0.0, 0.67371811971174, 0.833764907210665, 0.7982018075321516, 0.7982, 0.7982, 0.7982, 0.825515, 0.8736, 0.8276667497383372]
# information gain (mutual information) of the labels with respect to each feature
entropy_gain = entropy_model.cal_entropy_gain()
print(entropy_gain)
[0.1682, 0.780198, 0.833764907210665, 0.48352, 0.0, 0.833764907210665, 0.16498, 0.0, 0.035563, 0.513344, 0.513344, 0.513344, 0.55338, 0.091432, 0.027781]
# information gain ratio of the labels with respect to each feature
entropy_gain_rate = entropy_model.cal_entropy_gain_ratio()
print(entropy_gain_rate)
[0.64376, 0.63559, 0.39145, 0.42902, 0.0, 0.39145, 0.3287, nan, 0.35057, 0.35057, 0.35057, 0.35057, 0.381867, 0.22165, 0.007314]
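The `nan` in the gain-ratio output is worth a note: the ratio divides the information gain by the feature's own entropy H(X), so a feature that is constant has H(X) = 0 and the division becomes 0/0. A tiny illustration with NumPy:

```python
import numpy as np

gain = np.array([0.5, 0.0])  # second feature is constant: zero gain...
h_x = np.array([1.0, 0.0])   # ...and zero entropy of its own
with np.errstate(invalid="ignore"):
    ratio = gain / h_x       # 0/0 produces nan
print(ratio)
```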
Decision-tree model for discrete variables
The class below builds a decision-tree model for classification problems with discrete variables:
.fit(): fits/trains the model
.predict(): predicts with the trained model
.tree_net: the decision-tree network
Usage: create an instance, call fit to train the model, then call predict;
the tree structure can be inspected through the tree_net attribute.
Features and labels are best passed as a DataFrame, Series, or list.
class CyrusDecisionTreeDiscrete(object):
    """
    Builds a decision-tree model for classification problems with
    discrete variables.
    .fit(): fits/trains the model
    .predict(): predicts with the trained model
    .tree_net: the decision-tree network
    Usage: create an instance, call fit to train the model, then call
    predict; the tree structure can be inspected through the tree_net
    attribute.
    Features and labels are best passed as a DataFrame, Series, or list.
    """
    X = None
    Y = None

    def __init__(self, algorithm="ID3"):
        self.algorithm = algorithm
        self.tree_net = {}
    def tree(self, x, y, dict_):
        # split on the feature with the largest information gain
        entropy_model = CyrusEntropy(x, y)
        index = np.argmax(entropy_model.cal_entropy_gain())
        dict_[index] = {}
        dict_x_flag = {}
        dict_y_flag = {}
        for i in range(x.shape[0]):
            key = x.iloc[i, index]
            dict_x_flag[key] = dict_x_flag.get(key, []) + [list(x.iloc[i, :])]
            dict_y_flag[key] = dict_y_flag.get(key, []) + [y.iloc[i]]
        key_list = []
        for key in dict_x_flag.keys():
            if pd.Series(dict_y_flag[key]).value_counts().shape[0] == 1:
                # pure branch: store the class label as a leaf
                dict_[index][key] = dict_y_flag[key][0]
            else:
                key_list.append(key)
                dict_[index][key] = {}
        # recurse into the branches that are not yet pure
        for key in key_list:
            self.tree(pd.DataFrame(dict_x_flag[key]),
                      pd.Series(dict_y_flag[key]),
                      dict_[index][key])
    def fit(self, x, y):
        self.X = pd.DataFrame(x)
        self.Y = pd.Series(y)
        self.tree(self.X, self.Y, self.tree_net)
    def cal_label(self, x, dict_):
        # walk the nested dict until a leaf (non-dict) label is reached
        index = list(dict_.keys())[0]
        if not isinstance(dict_[index][x[index]], dict):
            return dict_[index][x[index]]
        else:
            return self.cal_label(x, dict_[index][x[index]])
    def predict(self, x):
        x = pd.DataFrame(x)
        y = []
        for i in range(x.shape[0]):
            se = list(x.iloc[i, :])
            y.append(self.cal_label(se, self.tree_net))
        return y
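The nested-dictionary tree that `tree_net` holds, of the shape `{feature_index: {feature_value: subtree_or_label}}`, can be sketched standalone. The following is a simplified, hypothetical ID3-style builder on toy data, not the author's exact implementation:

```python
import numpy as np
import pandas as pd

def entropy(values):
    p = pd.Series(list(values)).value_counts(normalize=True).to_numpy()
    return float(np.sum(-p * np.log2(p)) + 0.0)

def gain(column, labels):
    """Information gain of splitting `labels` on the discrete `column`."""
    n = len(labels)
    cond = 0.0
    for v in set(column):
        subset = [labels[i] for i in range(n) if column[i] == v]
        cond += entropy(subset) * len(subset) / n
    return entropy(labels) - cond

def build(rows, labels):
    if len(set(labels)) == 1:  # pure node: return the class label as a leaf
        return labels[0]
    best = max(range(len(rows[0])),
               key=lambda j: gain([r[j] for r in rows], labels))
    node = {best: {}}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        node[best][v] = build([rows[i] for i in idx], [labels[i] for i in idx])
    return node

def classify(tree, row):
    while isinstance(tree, dict):
        j = next(iter(tree))   # feature index stored at this node
        tree = tree[j][row[j]]
    return tree

rows = [["sunny", "hot"], ["sunny", "cool"], ["rain", "hot"], ["rain", "cool"]]
labels = ["no", "no", "yes", "yes"]
tree = build(rows, labels)
print(tree)
print(classify(tree, ["rain", "hot"]))  # 'yes'
```

Like the class above, this sketch recurses only into impure branches and stores a class label at each pure leaf.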
Results: the decision-tree model
# build the decision-tree model
tree_model = CyrusDecisionTreeDiscrete()
# train and fit the model
tree_model.fit(X, Y)
# predict with the model
y_pre = tree_model.predict(X)
print(y_pre)
['Married', 'Married', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Married', 'Married', 'Married', 'Married', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Married', 'Married', 'Divorced', 'Divorced', 'Married', 'Married', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Married', 'Married', 'Divorced', 'Divorced', 'Married', 'Married', 'Married', 'Married', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Married', 'Married', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced', 'Divorced']
# accuracy check
result = [1 if y_pre[i] == Y[i] else 0 for i in range(len(y_pre))]
print("Accuracy:", np.array(result).sum() / len(result))
Accuracy: 1.0
(Note that the model is evaluated on its own training data, so a fully grown tree fitting it perfectly is expected rather than evidence of generalization.)
by CyrusMay, 2020-05-20

If time could flow backward
I think I would still
squander it with all my might
Anyway, so be it
I know that I
have tried

(Mayday, "An Apple")