《机器学习》（西瓜书）课后习题第一章

作者: print("") 分类: 未分类发布时间: 2026-01-13 00:11 阅读次数: 26 次

1.1 表1.1中若只包含编号为1和4的两个样例，试给出相应的版本空间

表 1.1 西瓜数据集

编号	色泽	根蒂	敲声	好瓜
1	青绿	蜷缩	浊响	是
4	乌黑	稍蜷	沉闷	否

首先是需要把所有的假设组成汇总成一张表得到了假设空间还需要注意需要+一个空，因为可能好瓜不成立

上表有3个属性，每个属性有两个取值。那么假设空间大小为3×3×3+1=28

版本空间=假设空间-不成立的条件

不成立的条件如下：

1.删除与正例不一致的假例

2.删除与反面样例相符的例子

得到所有的假设

1. (‘青绿’, ‘蜷缩’, ‘浊响’)
2. (‘青绿’, ‘蜷缩’, ‘沉闷’)
3. (‘青绿’, ‘蜷缩’, ‘*’)
4. (‘青绿’, ‘稍蜷’, ‘浊响’)
5. (‘青绿’, ‘稍蜷’, ‘沉闷’)
6. (‘青绿’, ‘稍蜷’, ‘*’)
7. (‘青绿’, ‘*’, ‘浊响’)
8. (‘青绿’, ‘*’, ‘沉闷’)
9. (‘青绿’, ‘*’, ‘*’)
10. (‘乌黑’, ‘蜷缩’, ‘浊响’)
11. (‘乌黑’, ‘蜷缩’, ‘沉闷’)
12. (‘乌黑’, ‘蜷缩’, ‘*’)
13. (‘乌黑’, ‘稍蜷’, ‘浊响’)
14. (‘乌黑’, ‘稍蜷’, ‘沉闷’)
15. (‘乌黑’, ‘稍蜷’, ‘*’)
16. (‘乌黑’, ‘*’, ‘浊响’)
17. (‘乌黑’, ‘*’, ‘沉闷’)
18. (‘乌黑’, ‘*’, ‘*’)
19. (‘*’, ‘蜷缩’, ‘浊响’)
20. (‘*’, ‘蜷缩’, ‘沉闷’)
21. (‘*’, ‘蜷缩’, ‘*’)
22. (‘*’, ‘稍蜷’, ‘浊响’)
23. (‘*’, ‘稍蜷’, ‘沉闷’)
24. (‘*’, ‘稍蜷’, ‘*’)
25. (‘*’, ‘*’, ‘浊响’)
26. (‘*’, ‘*’, ‘沉闷’)
27. (‘*’, ‘*’, ‘*’)
28. (‘∅’, ‘∅’, ‘∅’)

【版本空间】共 7 个假设
（与训练样例一致的假设集合）

1. (‘青绿’, ‘蜷缩’, ‘浊响’) – 通配符数量: 0
2. (‘青绿’, ‘蜷缩’, ‘*’) – 通配符数量: 1
3. (‘青绿’, ‘*’, ‘浊响’) – 通配符数量: 1
4. (‘青绿’, ‘*’, ‘*’) – 通配符数量: 2
5. (‘*’, ‘蜷缩’, ‘浊响’) – 通配符数量: 1
6. (‘*’, ‘蜷缩’, ‘*’) – 通配符数量: 2
7. (‘*’, ‘*’, ‘浊响’) – 通配符数量: 2

脚本如下：

# -*- coding: utf-8 -*-
"""
版本空间学习算法
基于西瓜数据集样例1和样例4
"""
from itertools import product

class VersionSpace:
    def __init__(self):
        # 定义属性及其可能的取值
        self.attributes = {
            '色泽': ['青绿', '乌黑'],
            '根蒂': ['蜷缩', '稍蜷'],
            '敲声': ['浊响', '沉闷']
        }
        
        # 训练样例
        self.positive_examples = [
            ('青绿', '蜷缩', '浊响')  # 样例1：正例
        ]
        
        self.negative_examples = [
            ('乌黑', '稍蜷', '沉闷')  # 样例4：反例
        ]
        
    def generate_hypothesis_space(self):
        """生成完整的假设空间"""
        hypotheses = []
        attr_names = list(self.attributes.keys())
        
        # 为每个属性生成可能的取值：具体值或通配符*
        for attr_values in self.attributes.values():
            attr_values.append('*')  # 添加通配符
        
        # 生成所有可能的组合
        all_combinations = product(
            self.attributes['色泽'],
            self.attributes['根蒂'],
            self.attributes['敲声']
        )
        
        for combo in all_combinations:
            hypotheses.append(combo)
        
        # 添加空假设
        hypotheses.append(('∅', '∅', '∅'))
        
        return hypotheses
    
    def is_consistent_with_positive(self, hypothesis, example):
        """检查假设是否与正例一致（能覆盖正例）"""
        if '∅' in hypothesis:
            return False
        
        for h_val, e_val in zip(hypothesis, example):
            if h_val != '*' and h_val != e_val:
                return False
        return True
    
    def is_consistent_with_negative(self, hypothesis, example):
        """检查假设是否与反例一致（不覆盖反例）"""
        if '∅' in hypothesis:
            return True
        
        # 至少有一个属性不匹配，则不会分类反例为正例
        for h_val, e_val in zip(hypothesis, example):
            if h_val != '*' and h_val != e_val:
                return True
        return False
    
    def find_version_space(self, hypotheses):
        """找出版本空间（与所有训练样例一致的假设集合）"""
        version_space = []
        
        for hyp in hypotheses:
            # 检查是否与所有正例一致
            consistent_with_positives = all(
                self.is_consistent_with_positive(hyp, pos) 
                for pos in self.positive_examples
            )
            
            # 检查是否与所有反例一致
            consistent_with_negatives = all(
                self.is_consistent_with_negative(hyp, neg) 
                for neg in self.negative_examples
            )
            
            if consistent_with_positives and consistent_with_negatives:
                version_space.append(hyp)
        
        return version_space
    
    def find_s_boundary(self, version_space):
        """找出特殊边界S（最特殊的假设）"""
        s_boundary = []
        
        for hyp in version_space:
            # 最特殊的假设：没有通配符*或通配符最少
            wildcard_count = hyp.count('*')
            if wildcard_count == 0:
                s_boundary.append(hyp)
        
        return s_boundary
    
    def find_g_boundary(self, version_space):
        """找出一般边界G（最一般的假设）"""
        g_boundary = []
        max_wildcards = 0
        
        # 找出通配符最多的假设
        for hyp in version_space:
            wildcard_count = hyp.count('*')
            max_wildcards = max(max_wildcards, wildcard_count)
        
        # 获取通配符数量为最大值的假设
        for hyp in version_space:
            if hyp.count('*') == max_wildcards:
                g_boundary.append(hyp)
        
        return g_boundary
    
    def print_results(self):

        # 生成假设空间
        hypotheses = self.generate_hypothesis_space()
        print(f"\n【假设空间】共 {len(hypotheses)} 个假设")
        
        print("\n所有假设：")
        for i, hyp in enumerate(hypotheses, 1):
            print(f"  {i:2d}. {hyp}")
        
        # 找出版本空间
        version_space = self.find_version_space(hypotheses)
        print(f"\n【版本空间】共 {len(version_space)} 个假设")
        print()
        for i, hyp in enumerate(version_space, 1):
            wildcard_count = hyp.count('*')
            print(f"  {i}. {hyp} - 通配符数量: {wildcard_count}")
    

if __name__ == "__main__":
    vs = VersionSpace()
    vs.print_results()

1.2 与使用单个合取式来进行假设表示相比，使用“析合范式”将使得假设空间具有更强的表示能力。若使用最多包含k个合取式的析合范式来表达1.1的西瓜分类问题的假设空间，试估算有多少种可能的假设。

合取范式析合范式

如果觉得我的文章对您有用，请随意打赏。您的支持将鼓励我继续创作！

《机器学习》（西瓜书）课后习题 第一章

1.1 表1.1中若只包含编号为1和4的两个样例，试给出相应的版本空间

1.2 与使用单个合取式来进行假设表示相比，使用“析合范式”将使得假设空间具有更强的表示能力。若使用最多包含k个合取式的析合范式来表达1.1的西瓜分类问题的假设空间，试估算有多少种可能的假设。

发表回复 取消回复

《机器学习》（西瓜书）课后习题第一章

发表回复取消回复