


Material Type: Exam; Professor: Liu; Class: Data Mining and Text Mining; Subject: Computer Science; University: University of Illinois - Chicago; Term: Unknown 1989;
(b) (3%) How do you describe overfitting in classification?
(c) (3%) Given the following decision tree, generate all the rules from the tree. Note that
we have two classes, Yes and No.
(d) List three objective interestingness measures of rules, and list two subjective
interestingness measures of rules. No need to explain.
(e) (5) To build a naïve Bayesian classifier, we can make use of association rule mining.
How to compute P(A_i = a_j | C = c_k) from association rules, where A_i is an attribute and a_j is a value of A_i, and c_k is a class value of the class attribute C?
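One way to see the connection (a sketch of the idea, not necessarily the intended answer): if the mining step records the supports of the itemsets {A_i = a_j, C = c_k} and {C = c_k}, then

P(A_i = a_j | C = c_k) = support({A_i = a_j, C = c_k}) / support({C = c_k}),

i.e., the confidence of the rule C = c_k -> A_i = a_j.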
[Data table with attribute columns a1, a2, … not recovered from the extraction.]
We want to mine all the large (or frequent) itemsets in the data. Assume the minimum
support is 30%. Following the Apriori algorithm, give the set of large itemsets L1, L2, … and candidate itemsets C2, C3, … (after the join step and the prune step). What additional
pruning can be done in candidate generation and how?
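As an illustration of the join and prune steps, here is a short Python sketch over hypothetical items (not the exam's data); it does not answer the additional-pruning part of the question:

from itertools import combinations

def apriori_gen(frequent_k_minus_1):
    # Candidate generation for Apriori: join step, then prune step.
    # frequent_k_minus_1 is a set of frozensets, each of size k-1.
    prev = sorted(sorted(s) for s in frequent_k_minus_1)
    candidates = set()
    # Join step: merge two (k-1)-itemsets that agree on their first k-2 items.
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidates.add(frozenset(a) | {b[-1]})
    # Prune step: drop a candidate if any of its (k-1)-subsets is not frequent.
    return {c for c in candidates
            if all(frozenset(s) in frequent_k_minus_1
                   for s in combinations(c, len(c) - 1))}

# Hypothetical L2 (illustrative only):
L2 = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]}
print(apriori_gen(L2))   # {frozenset({'a', 'b', 'c'})}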
[Decision tree figure (for part (c)): internal nodes test Age (>= 40 / < 40), Sex (M / F), income (>= 50k / < 50k), and job (y / n); leaves are labeled Yes or No. The exact branching did not survive extraction.]
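Each root-to-leaf path of such a tree becomes one IF-THEN rule whose conditions are the tests along the path. A short Python sketch of the idea follows; the tree literal is a hypothetical stand-in assembled from the attribute names above, not the exam's actual tree:

tree = ("Age", {
    "< 40":  ("Sex", {"M": "Yes", "F": "No"}),
    ">= 40": ("income", {">= 50k": "Yes",
                         "< 50k": ("job", {"y": "Yes", "n": "No"})}),
})

def extract_rules(node, conditions=()):
    # A leaf is a plain class label; emit one rule per root-to-leaf path.
    if isinstance(node, str):
        print("IF " + " AND ".join(conditions) + " THEN class = " + node)
        return
    attribute, branches = node
    for value, child in branches.items():
        extract_rules(child, conditions + (attribute + " " + value,))

extract_rules(tree)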
In this question, each item is assigned its own minimum support, called its minimum item support (MIS). We define that an itemset, {item1, item2, …}, is large (or frequent) if its support is greater than or equal to
min(MIS(item1), MIS(item2), …)
Given the transaction data:
{Beef, Bread}
{Bread, Cloth}
{Bread, Cloth, Milk}
{Cheese, Boots}
{Beef, Bread, Cheese, Shoes}
{Beef, Bread, Cheese, Milk}
{Bread, Milk, Cloth}
If we have the following minimum item support assignments for the items in the transaction
data,
MIS(Milk) = 50%,
MIS(Bread) = 70%.
The MIS values for the rest of the items in the data are all 25%.
Following the MSapriori algorithm, give the set of large (or frequent) itemsets L1, L2, ….
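Below is a brute-force Python check of the definition above, using the transaction data and MIS values given. It only tests which itemsets meet min(MIS(...)); it is not the level-wise MSapriori procedure itself:

from itertools import combinations

transactions = [
    {"Beef", "Bread"},
    {"Bread", "Cloth"},
    {"Bread", "Cloth", "Milk"},
    {"Cheese", "Boots"},
    {"Beef", "Bread", "Cheese", "Shoes"},
    {"Beef", "Bread", "Cheese", "Milk"},
    {"Bread", "Milk", "Cloth"},
]
n = len(transactions)

MIS = {"Milk": 0.50, "Bread": 0.70}   # per-item minimum supports from the question
DEFAULT_MIS = 0.25                    # all other items

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

def is_frequent(itemset):
    # Frequent iff support >= the minimum of its items' MIS values.
    return support(itemset) >= min(MIS.get(i, DEFAULT_MIS) for i in itemset)

items = sorted(set().union(*transactions))
print([i for i in items if is_frequent({i})])                              # frequent 1-itemsets
print([set(c) for c in combinations(items, 2) if is_frequent(set(c))])     # frequent 2-itemsets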
Given the data table below (attributes A and B, class attribute C), compute all the probability values required to build a naïve Bayesian classifier. Ignore smoothing.
Answer:
P(C = y) =
P(C = n) =
P(A=m | C=y) =
P(A=g | C=y) =
P(A=h | C=y) =
P(A=m | C=n) =
P(A=g | C=n) =
P(A=h | C=n) =
P(B=t | C=y) =
P(B=s | C=y) =
P(B=q | C=y) =
P(B=t | C=n) =
P(B=s | C=n) =
P(B=q | C=n) =
Given the data points 11, 20, 23, 27, 30, 34, 100, 120, 130, perform hierarchical clustering. You are required to draw the cluster tree and write the value of the cluster center represented by each node next to the node.
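A small Python sketch that merges the two closest cluster centers step by step, which is one way to trace the cluster tree and its centers by hand (the exam may intend a different linkage):

points = [11, 20, 23, 27, 30, 34, 100, 120, 130]
clusters = [[p] for p in points]

def center(c):
    return sum(c) / len(c)

while len(clusters) > 1:
    # Find the pair of clusters whose centers are closest together.
    i, j = min(((i, j) for i in range(len(clusters))
                       for j in range(i + 1, len(clusters))),
               key=lambda ij: abs(center(clusters[ij[0]]) - center(clusters[ij[1]])))
    merged = clusters[i] + clusters[j]
    print(f"merge {clusters[i]} + {clusters[j]} -> center {center(merged):.2f}")
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]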
Data table for the naïve Bayesian classifier question above:
A  B  C
m  t  y
m  s  y
g  q  y
h  s  y
g  q  y
g  q  n
g  s  n
h  t  n
h  q  n
m  t  n
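A minimal Python sketch that tallies the probabilities asked for above directly from this table (no smoothing); treat the printed fractions as a cross-check, not an official answer key:

from collections import Counter

# Rows of the table above: (A, B, C).
rows = [
    ("m", "t", "y"), ("m", "s", "y"), ("g", "q", "y"), ("h", "s", "y"), ("g", "q", "y"),
    ("g", "q", "n"), ("g", "s", "n"), ("h", "t", "n"), ("h", "q", "n"), ("m", "t", "n"),
]

class_counts = Counter(c for _, _, c in rows)
n = len(rows)

# Class priors P(C = c).
for c, cnt in class_counts.items():
    print(f"P(C={c}) = {cnt}/{n}")

# Conditional probabilities P(A = a | C = c) and P(B = b | C = c).
for idx, name in ((0, "A"), (1, "B")):
    pair_counts = Counter((r[idx], r[2]) for r in rows)
    for (value, c), cnt in sorted(pair_counts.items()):
        print(f"P({name}={value} | C={c}) = {cnt}/{class_counts[c]}")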
Compute the accuracy, precision, and recall scores of the positive data, using the confusion matrix given below.
[Data table with attribute columns a1, a2, … not recovered from the extraction.]
We want to mine all the large (or frequent) itemsets using the multiple minimum support
technique. If we have the following minimum item support assignments for the items,
MIS(a2=F) = 60%.
The MIS values for the rest of the items in the data are all 30%.
Following the MSapriori algorithm, give the set of large (or frequent) itemsets L1, L2, … and candidate itemsets C2, C3, … (after the join step and the prune step).
                     Classified as
                     Positive   Negative
Correct   Positive       50         10
          Negative        5        200
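A minimal Python sketch reading the counts off the matrix above, using the standard definitions of accuracy, precision, and recall for the positive class:

# Confusion matrix above (rows = correct class, columns = classified as).
TP, FN = 50, 10    # positive examples classified as positive / negative
FP, TN = 5, 200    # negative examples classified as positive / negative

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)   # of the positive class
recall = TP / (TP + FN)      # of the positive class

print(f"accuracy  = {accuracy:.4f}")   # 250/265
print(f"precision = {precision:.4f}")  # 50/55
print(f"recall    = {recall:.4f}")     # 50/60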