Introduction to Data Mining (English Edition) / Turing Original Computer Science Series

Book details

ISBN: 9787115141446
Authors: Pang-Ning Tan, Michael Steinbach, Vipin Kumar
Publication date: 2006-01
Pages: 516
Price: 50.10
Paper: offset paper
Binding: paperback
Language: English
Series: Turing Original Computer Science Series

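Synopsis:

This book provides a comprehensive introduction to data mining and is designed to give readers the knowledge needed to apply data mining to real problems. It covers five topics: data, classification, association analysis, clustering, and anomaly detection. Except for anomaly detection, each topic has two chapters: the first covers basic concepts, representative algorithms, and evaluation techniques, while the second discusses advanced concepts and algorithms in more depth. The aim is to give readers a thorough understanding of the foundations of data mining while also introducing important advanced topics. The book also includes a large number of examples, figures, and exercises.

The book is suitable as a textbook for data mining courses for senior undergraduates and graduate students in related fields, and as a reference for technical professionals engaged in data mining research and application development.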

Table of contents:

1　Introduction　1

1.1　What Is Data Mining?　2

1.2　Motivating Challenges　3

1.3　The Origins of Data Mining　4

1.4　Data Mining Tasks　5

1.5　Scope and Organization of the Book　8

1.6　Bibliographic Notes　9

1.7　Exercises　12

2　Data　13

2.1　Types of Data　15

2.1.1　Attributes and Measurement　15

2.1.2　Types of Data Sets　20

2.2　Data Quality　25

2.2.1　Measurement and Data Collection Issues　26

2.2.2　Issues Related to Applications　31

2.3　Data Preprocessing　32

2.3.1　Aggregation　32

2.3.2　Sampling　34

2.3.3　Dimensionality Reduction　36

2.3.4　Feature Subset Selection　37

2.3.5　Feature Creation　39

2.3.6　Discretization and Binarization　41

2.3.7　Variable Transformation　45

2.4　Measures of Similarity and Dissimilarity　47

2.4.1　Basics　47

2.4.2　Similarity and Dissimilarity between Simple Attributes　49

2.4.3　Dissimilarities between Data Objects　50

2.4.4　Similarities between Data Objects　52

2.4.5　Examples of Proximity Measures　53

2.4.6　Issues in Proximity Calculation　58

2.4.7　Selecting the Right Proximity Measure　60

2.5　Bibliographic Notes　61

2.6　Exercises　64

3　Exploring Data　71

3.1　The Iris Data Set　71

3.2　Summary Statistics　72

3.2.1　Frequencies and the Mode　72

3.2.2　Percentiles　73

3.2.3　Measures of Location: Mean and Median　73

3.2.4　Measures of Spread: Range and Variance　75

3.2.5　Multivariate Summary Statistics　76

3.2.6　Other Ways to Summarize the Data　77

3.3　Visualization　77

3.3.1　Motivations for Visualization　77

3.3.2　General Concepts　78

3.3.3　Techniques　81

3.3.4　Visualizing Higher-Dimensional Data　90

3.3.5　Do's and Don'ts　94

3.4　OLAP and Multidimensional Data Analysis　95

3.4.1　Representing Iris Data as a Multidimensional Array　95

3.4.2　Multidimensional Data: The General Case　97

3.4.3　Analyzing Multidimensional Data　98

3.4.4　Final Comments on Multidimensional Data Analysis　101

3.5　Bibliographic Notes　102

3.6　Exercises　103

4　Classification: Basic Concepts, Decision Trees, and Model Evaluation　105

4.1　Preliminaries　105

4.2　General Approach to Solving a Classification Problem　107

4.3　Decision Tree Induction　108

4.3.1　How a Decision Tree Works　108

4.3.2　How to Build a Decision Tree　110

4.3.3　Methods for Expressing Attribute Test Conditions　112

4.3.4　Measures for Selecting the Best Split　114

4.3.5　Algorithm for Decision Tree Induction　119

4.3.6　An Example: Web Robot Detection　120

4.3.7　Characteristics of Decision Tree Induction　122

4.4　Model Overfitting　125

4.4.1　Overfitting Due to Presence of Noise　127

4.4.2　Overfitting Due to Lack of Representative Samples　129

4.4.3　Overfitting and the Multiple Comparison Procedure　129

4.4.4　Estimation of Generalization Errors　131

4.4.5　Handling Overfitting in Decision Tree Induction　134

4.5　Evaluating the Performance of a Classifier　135

4.5.1　Holdout Method　136

4.5.2　Random Subsampling　136

4.5.3　Cross-Validation　136

4.5.4　Bootstrap　137

4.6　Methods for Comparing Classifiers　137

4.6.1　Estimating a Confidence Interval for Accuracy　138

4.6.2　Comparing the Performance of Two Models　139

4.6.3　Comparing the Performance of Two Classifiers　140

4.7　Bibliographic Notes　141

4.8　Exercises　144

5　Classification: Alternative Techniques　151

5.1　Rule-Based Classifier　151

5.1.1　How a Rule-Based Classifier Works　153

5.1.2　Rule-Ordering Schemes　154

5.1.3　How to Build a Rule-Based Classifier　155

5.1.4　Direct Methods for Rule Extraction　155

5.1.5　Indirect Methods for Rule Extraction　161

5.1.6　Characteristics of Rule-Based Classifiers　163

5.2　Nearest-Neighbor Classifiers　163

5.2.1　Algorithm　165

5.2.2　Characteristics of Nearest-Neighbor Classifiers　165

5.3　Bayesian Classifiers　166

5.3.1　Bayes Theorem　166

5.3.2　Using the Bayes Theorem for Classification　168

5.3.3　Naïve Bayes Classifier　169

5.3.4　Bayes Error Rate　175

5.3.5　Bayesian Belief Networks　176

5.4　Artificial Neural Network (ANN)　181

5.4.1　Perceptron　181

5.4.2　Multilayer Artificial Neural Network　184

5.4.3　Characteristics of ANN　187

5.5　Support Vector Machine (SVM)　188

5.5.1　Maximum Margin Hyperplanes　188

5.5.2　Linear SVM: Separable Case　190

5.5.3　Linear SVM: Nonseparable Case　195

5.5.4　Nonlinear SVM　198

5.5.5　Characteristics of SVM　203

5.6　Ensemble Methods　203

5.6.1　Rationale for Ensemble Method　203

5.6.2　Methods for Constructing an Ensemble Classifier　204

5.6.3　Bias-Variance Decomposition　206

5.6.4　Bagging　209

5.6.5　Boosting　211

5.6.6　Random Forests　215

5.6.7　Empirical Comparison among Ensemble Methods　216

5.7　Class Imbalance Problem　217

5.7.1　Alternative Metrics　218

5.7.2　The Receiver Operating Characteristic Curve　220

5.7.3　Cost-Sensitive Learning　223

5.7.4　Sampling-Based Approaches　225

5.8　Multiclass Problem　226

5.9　Bibliographic Notes　228

5.10　Exercises　233

6　Association Analysis: Basic Concepts and Algorithms　241

6.1　Problem Definition　242

6.2　Frequent Itemset Generation　244

6.2.1　The Apriori Principle　246

6.2.2　Frequent Itemset Generation in the Apriori Algorithm　247

6.2.3　Candidate Generation and Pruning　249

6.2.4　Support Counting　252

6.2.5　Computational Complexity　255

6.3　Rule Generation　257

6.3.1　Confidence-Based Pruning　258

6.3.2　Rule Generation in Apriori Algorithm　258

6.3.3　An Example: Congressional Voting Records　259

6.4　Compact Representation of Frequent Itemsets　260

6.4.1　Maximal Frequent Itemsets　260

6.4.2　Closed Frequent Itemsets　262

6.5　Alternative Methods for Generating Frequent Itemsets　264

6.6　FP-Growth Algorithm　268

6.6.1　FP-Tree Representation　268

6.6.2　Frequent Itemset Generation in FP-Growth Algorithm　270

6.7　Evaluation of Association Patterns　273

6.7.1　Objective Measures of Interestingness　274

6.7.2　Measures beyond Pairs of Binary Variables　282

6.7.3　Simpson's Paradox　283

6.8　Effect of Skewed Support Distribution　285

6.9　Bibliographic Notes　288

6.10　Exercises　298

7　Association Analysis: Advanced Concepts　307

7.1　Handling Categorical Attributes　307

7.2　Handling Continuous Attributes　309

7.2.1　Discretization-Based Methods　310

7.2.2　Statistics-Based Methods　312

7.2.3　Non-discretization Methods　314

7.3　Handling a Concept Hierarchy　316

7.4　Sequential Patterns　318

7.4.1　Problem Formulation　318

7.4.2　Sequential Pattern Discovery　320

7.4.3　Timing Constraints　323

7.4.4　Alternative Counting Schemes　327

7.5　Subgraph Patterns　328

7.5.1　Graphs and Subgraphs　329

7.5.2　Frequent Subgraph Mining　330

7.5.3　Apriori-like Method　332

7.5.4　Candidate Generation　333

7.5.5　Candidate Pruning　338

7.5.6　Support Counting　340

7.6　Infrequent Patterns　340

7.6.1　Negative Patterns　341

7.6.2　Negatively Correlated Patterns　342

7.6.3　Comparisons among Infrequent Patterns, Negative Patterns, and Negatively Correlated Patterns　343

7.6.4　Techniques for Mining Interesting Infrequent Patterns　344

7.6.5　Techniques Based on Mining Negative Patterns　345

7.6.6　Techniques Based on Support Expectation　347

7.7　Bibliographic Notes　350

7.8　Exercises　353

8　Cluster Analysis: Basic Concepts and Algorithms　363

8.1　Overview　365

8.1.1　What Is Cluster Analysis?　365

8.1.2　Different Types of Clusterings　366

8.1.3　Different Types of Clusters　368

8.2　K-means　370

8.2.1　The Basic K-means Algorithm　371

8.2.2　K-means: Additional Issues　378

8.2.3　Bisecting K-means　380

8.2.4　K-means and Different Types of Clusters　381

8.2.5　Strengths and Weaknesses　383

8.2.6　K-means as an Optimization Problem　383

8.3　Agglomerative Hierarchical Clustering　385

8.3.1　Basic Agglomerative Hierarchical Clustering Algorithm　385

8.3.2　Specific Techniques　387

8.3.3　The Lance-Williams Formula for Cluster Proximity　391

8.3.4　Key Issues in Hierarchical Clustering　391

8.3.5　Strengths and Weaknesses　393

8.4　DBSCAN　393

8.4.1　Traditional Density: Center-Based Approach　393

8.4.2　The DBSCAN Algorithm　394

8.4.3　Strengths and Weaknesses　398

8.5　Cluster Evaluation　398

8.5.1　Overview　399

8.5.2　Unsupervised Cluster Evaluation Using Cohesion and Separation　401

8.5.3　Unsupervised Cluster Evaluation Using the Proximity Matrix　406

8.5.4　Unsupervised Evaluation of Hierarchical Clustering　408

8.5.5　Determining the Correct Number of Clusters　409

8.5.6　Clustering Tendency　410

8.5.7　Supervised Measures of Cluster Validity　411

8.5.8　Assessing the Significance of Cluster Validity Measures　414

8.6　Bibliographic Notes　416

8.7　Exercises　419

9　Cluster Analysis: Additional Issues and Algorithms　427

9.1　Characteristics of Data, Clusters, and Clustering Algorithms　427

9.1.1　Example: Comparing K-means and DBSCAN　428

9.1.2　Data Characteristics　429

9.1.3　Cluster Characteristics　430

9.1.4　General Characteristics of Clustering Algorithms　431

9.2　Prototype-Based Clustering　433

9.2.1　Fuzzy Clustering　433

9.2.2　Clustering Using Mixture Models　437

9.2.3　Self-Organizing Maps (SOM)　446

9.3　Density-Based Clustering　451

9.3.1　Grid-Based Clustering　451

9.3.2　Subspace Clustering　454

9.3.3　DENCLUE: A Kernel-Based Scheme for Density-Based Clustering　457

9.4　Graph-Based Clustering　460

9.4.1　Sparsification　461

9.4.2　Minimum Spanning Tree (MST) Clustering　462

9.4.3　OPOSSUM: Optimal Partitioning of Sparse Similarities Using METIS　463

9.4.4　Chameleon: Hierarchical Clustering with Dynamic Modeling　464

9.4.5　Shared Nearest Neighbor Similarity　468

9.4.6　The Jarvis-Patrick Clustering Algorithm　471

9.4.7　SNN Density　472

9.4.8　SNN Density-Based Clustering　473

9.5　Scalable Clustering Algorithms　475

9.5.1　Scalability: General Issues and Approaches　476

9.5.2　BIRCH　477

9.5.3　CURE　479

9.6　Which Clustering Algorithm?　482

9.7　Bibliographic Notes　484

9.8　Exercises　488

10　Anomaly Detection　491

10.1　Preliminaries　492

10.1.1　Causes of Anomalies　492

10.1.2　Approaches to Anomaly Detection　493

10.1.3　The Use of Class Labels　494

10.1.4　Issues　495

10.2　Statistical Approaches　496

10.2.1　Detecting Outliers in a Univariate Normal Distribution　497

10.2.2　Outliers in a Multivariate Normal Distribution　499

10.2.3　A Mixture Model Approach for Anomaly Detection　500

10.2.4　Strengths and Weaknesses　502

10.3　Proximity-Based Outlier Detection　502

10.3.1　Strengths and Weaknesses　503

10.4　Density-Based Outlier Detection　504

10.4.1　Detection of Outliers Using Relative Density　505

10.4.2　Strengths and Weaknesses　506

10.5　Clustering-Based Techniques　506

10.5.1　Assessing the Extent to Which an Object Belongs to a Cluster　507

10.5.2　Impact of Outliers on the Initial Clustering　509

10.5.3　The Number of Clusters to Use　509

10.5.4　Strengths and Weaknesses　509

10.6　Bibliographic Notes　510

10.7　Exercises　513

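About the authors:

Pang-Ning Tan is an assistant professor in the Department of Computer Science and Engineering at Michigan State University, where he mainly teaches courses such as data mining and database systems. Previously, he was a research associate at the Army High Performance Computing Research Center at the University of Minnesota (2002-2003).

Michael Steinbach is a researcher in the Department of Computer Science and Engineering at the University of Minnesota, where he is also a PhD candidate.

Vipin Kumar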

Selected passages:

Data mining is an integral part of knowledge discovery in databases (KDD), which is the overall process of converting raw data into useful information.

Precision is often measured by the standard deviation of a set of values.
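As a small illustration of the statement above, the sketch below computes the precision of repeated measurements as their sample standard deviation. The measurement values and the variable names are hypothetical and chosen only for the example.

```python
import statistics

# Hypothetical repeated measurements (in grams) of the same quantity.
measurements = [1.015, 0.990, 1.013, 1.001, 0.986]

# Precision expressed as the sample standard deviation of the values:
# the tighter the repeated measurements cluster, the smaller this number.
precision = statistics.stdev(measurements)

print(f"precision (standard deviation) = {precision:.3f}")
```

A smaller standard deviation means the repeated measurements agree more closely with one another; it says nothing about whether they are centered on the true value.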

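Example 27: Australian precipitation

Let x and y be two points, where y is the original point and x is some distortion or approximation of it; for example, x may have been produced by adding random noise to y. The purpose of the loss function is to measure the distortion or loss incurred by approximating y with x. Naturally, the more similar x and y are, the smaller the distortion or loss, so a Bregman divergence can be used as a dissimilarity function.

It is the difference between a function and a linear approximation of that function.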
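The description above can be made concrete with a minimal sketch. It assumes the standard form D(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>, in which the linear approximation of phi is taken around y. The function names and the sample points are illustrative, not taken from the book. With phi chosen as the squared Euclidean norm, the divergence reduces to the squared Euclidean distance.

```python
def bregman_divergence(phi, grad_phi, x, y):
    """Difference between phi(x) and the linear approximation of phi around y."""
    linear_term = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - linear_term


def squared_norm(v):
    """phi(v) = ||v||^2, a strictly convex generating function."""
    return sum(vi * vi for vi in v)


def squared_norm_gradient(v):
    """Gradient of ||v||^2, namely 2v."""
    return [2.0 * vi for vi in v]


y = [1.0, 2.0]   # original point (illustrative values)
x = [1.5, 1.0]   # distorted approximation of y

divergence = bregman_divergence(squared_norm, squared_norm_gradient, x, y)
squared_euclidean = sum((xi - yi) ** 2 for xi, yi in zip(x, y))

print(divergence, squared_euclidean)
```

Both printed values agree for this choice of phi, and the divergence is zero when x equals y.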

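Lazy learners such as nearest-neighbor classifiers do not need to build a model; however, classifying a test example is expensive, because the similarity between the test example and each training example must be computed individually.

Nearest-neighbor classifiers make predictions based on local information, whereas decision trees and rule-based classifiers try to find a global model that fits the entire input space. Precisely because of these local classification decisions, nearest-neighbor classifiers (when k is small) are very sensitive to noise.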
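The two points above can be sketched in a few lines: there is no training step, every prediction scans all training examples, and a single mislabeled neighbor decides the outcome when k is 1. The data set, the labels, and the function name `knn_predict` are hypothetical, and Euclidean distance with majority voting is an assumed (though common) choice.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training examples.

    No model is built in advance: each call computes the distance from the
    query to every training example, which is what makes prediction costly.
    """
    neighbors = sorted(train, key=lambda example: math.dist(example[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training set: two clusters plus one mislabeled (noisy) point.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((0.9, 1.1), "A"),
    ((3.0, 3.0), "B"), ((3.1, 2.9), "B"), ((2.9, 3.2), "B"),
    ((1.1, 1.0), "B"),  # noise: lies inside the "A" region
]

query = (1.12, 1.02)
prediction_k1 = knn_predict(train, query, k=1)
prediction_k3 = knn_predict(train, query, k=3)

print(prediction_k1, prediction_k3)
```

With k=1 the noisy point is the single nearest neighbor and determines the label; with k=3 the two correctly labeled neighbors outvote it.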

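Nearest-neighbor classifiers can produce decision boundaries of arbitrary shape, which provide a more flexible model representation than the rectilinear decision boundaries to which decision trees and rule-based classifiers are usually confined.

Unless an appropriate proximity measure and data preprocessing are used, a nearest-neighbor classifier may make wrong decisions.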
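One common way this goes wrong is when attributes are measured on very different scales, so that one attribute dominates the distance. The sketch below uses hypothetical (height in meters, weight in pounds) records and assumes min-max scaling as the preprocessing step; other normalizations would serve equally well.

```python
import math


def min_max_scale(rows):
    """Rescale each attribute to [0, 1]; assumes no attribute is constant."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        tuple((value - lo) / (hi - lo) for value, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def nearest_index(rows, query_index):
    """Index of the row closest (Euclidean distance) to rows[query_index]."""
    query = rows[query_index]
    candidates = [i for i in range(len(rows)) if i != query_index]
    return min(candidates, key=lambda i: math.dist(rows[i], query))


# Hypothetical records: (height in meters, weight in pounds).
people = [(1.50, 150.0), (1.85, 152.0), (1.52, 158.0), (1.70, 200.0)]

raw_neighbor = nearest_index(people, 0)
scaled_neighbor = nearest_index(min_max_scale(people), 0)

print(raw_neighbor, scaled_neighbor)
```

On the raw data, weight dominates and the nearest neighbor of the first person is someone 35 cm taller; after scaling, the nearest neighbor is the person of similar height and weight.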

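Other content:

Editor's recommendation

"This is a brand-new data mining textbook that deserves to be highly recommended."

Jiawei Han, Professor, University of Illinois

The book provides a comprehensive introduction to data mining and covers five topics: data, classification, association analysis, clustering, and anomaly detection. Except for anomaly detection, each topic has two chapters: the first covers basic concepts, representative algorithms, and evaluation techniques, while the second discusses advanced concepts and algorithms. Readers thus gain a thorough understanding of the foundations of data mining while also learning about important advanced topics.

The book is the textbook for data mining courses at the University of Minnesota and Michigan State University. Owing to its distinctive approach, it had already been adopted by Stanford University, the University of Texas at Austin, and many other well-known universities before its official publication.

Features of the book:

· Unlike many similar books, this one focuses on how to use data mining knowledge to solve a variety of practical problems.

· Only minimal prerequisites are required: no database background is needed, and only a modest background in statistics or mathematics.

· The book contains many figures and tables, comprehensive examples, and a rich set of exercises, and it uses examples, concise descriptions of key algorithms, and exercises to focus as directly as possible on the main concepts of data mining.

· The supplementary teaching materials are extensive, including lecture slides, suggestions for student projects, data mining resources (such as data mining algorithms and data sets), and online tutorials (which use real data sets and data analysis software to give worked examples of some of the data mining techniques introduced in the book).

· Solutions to the exercises are provided for instructors who adopt the book as a textbook.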
