Help wanted: data warehousing and data mining problems, part 1

tianxuepiao 2007-01-09 01:41:23

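Sorry, the questions are in English. If the points on offer are too few, I can add more! Could the experts here please help? Thanks!
10 points per question!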

I. Data warehouse design
(1) Enumerate three classes of schemas that are popularly used for modeling data warehouses.
(2) Draw a snowflake schema diagram for the Big_University data warehouse, which consists of four dimensions (student, course, semester, and instructor) and two measures (count and avg_grade). At the lowest concept layer, avg_grade is the actual grade of the student; at higher concept layers, avg_grade is the average grade for the given student, course, semester, and instructor.
(3) Starting with the base cuboid (student, course, semester, instructor), what specific OLAP operations should be performed in order to list the average grade of each student who has taken the “CS” course (e.g., roll up from “semester” to “year”)?
(4) If each dimension contains 5 layers (including all), e.g., student < major < status < university < all, then how many cuboids does this data cube contain (including the base cuboid and the apex cuboid)?
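
For part (4), a quick sanity check can be sketched in Python. It assumes the usual counting rule for a cube with concept hierarchies: each cuboid picks one level per dimension, so the total is the product of the per-dimension level counts, with the virtual level all counted among them. The dictionary name below is hypothetical; the level counts come from the question.

```python
from math import prod

# Levels per dimension as stated in the question, with "all" already included.
levels_including_all = {
    "student": 5,
    "course": 5,
    "semester": 5,
    "instructor": 5,
}

# One cuboid per combination of (level of student, level of course, ...).
total_cuboids = prod(levels_including_all.values())
print(total_cuboids)
```

If the levels were counted without all, the rule would be the product of (levels + 1) over the dimensions instead.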

II. Data cube computation
Suppose a base cuboid has 3 dimensions, (A, B, C), with the number of cells shown below: |A| = 1,000,000, |B| = 100, and |C| = 1,000. Suppose each dimension is partitioned evenly into 10 portions for chunking.
(1) Assuming each dimension has only one level, draw the complete lattice of the cube.
(2) If each cube cell stores one measure with 4 bytes, what is the total size of the computed cube if the cube is dense?
(3) If the cube is very sparse, describe an effective multidimensional array structure to store the sparse cube.
(4) State the order for computing the chunks in the cube which requires the least amount of space, and compute the total amount of main memory space required for computing the 2-D planes.
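
Parts (2) and (4) are arithmetic, so a rough sketch may help with checking an answer. It rests on two assumptions that are not stated in the question itself: the dense full cube has (|A|+1)(|B|+1)(|C|+1) cells, because every dimension gains one extra all value, and the memory estimate follows the usual multiway array aggregation reasoning (keep the whole plane of the two fastest-varying dimensions, one row of the fastest/slowest plane, and one chunk of the remaining plane). All names in the code are hypothetical.

```python
from itertools import permutations

sizes = {"A": 1_000_000, "B": 100, "C": 1_000}
PARTS = 10            # each dimension is split into 10 equal portions
BYTES_PER_CELL = 4

# (2) Dense full cube: sum over all 8 cuboids equals the product of (size + 1).
total_cells = 1
for n in sizes.values():
    total_cells *= n + 1
total_bytes = total_cells * BYTES_PER_CELL

def plane_cells(order):
    """Cells held in memory for the three 2-D planes, for a chunk scan order
    given as (fastest-varying, middle, slowest-varying) dimension."""
    fast, mid, slow = order
    chunk = {d: sizes[d] // PARTS for d in sizes}
    whole_plane = sizes[fast] * sizes[mid]   # full fast-mid plane
    one_row = sizes[fast] * chunk[slow]      # one row of the fast-slow plane
    one_chunk = chunk[mid] * chunk[slow]     # one chunk of the mid-slow plane
    return whole_plane + one_row + one_chunk

# (4) Try every scan order and keep the one needing the fewest cells.
best_order = min(permutations(sizes), key=plane_cells)
best_cells = plane_cells(best_order)

print(total_bytes)
print(best_order, best_cells, best_cells * BYTES_PER_CELL)
```

Under these assumptions the cheapest order scans the smallest dimension fastest and the largest dimension slowest, which matches the intuition that the largest planes should never be held in memory whole.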

III. Mining association rules
Suppose we have the following transactional data.
TID Items_bought
T100 {K, A, D, B}
T200 {D, A, C, E, B}
T300 {C, A, B, E}
T400 {B, A, D}
Assume that the minimum support and minimum confidence thresholds are 60% and 80%, respectively.
(1) Find the set of frequent itemsets using the Apriori algorithm and the FP-tree, respectively. Show the derivation of Ck and Lk for each iteration k of the Apriori algorithm, and show the “conditional pattern base, conditional FP-tree, frequent patterns” for each item in the FP-tree, as shown in Table 6-1 of the textbook.
(2) Generate strong association rules from the frequent itemsets (with support and confidence) found above.
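
For the Apriori half of this question, a minimal sketch in Python is given below. It is not the textbook's pseudocode, just a small level-wise implementation under stated assumptions: support is the fraction of the four transactions containing an itemset, confidence of X => Y is support(X ∪ Y) / support(X), and both thresholds are treated as "greater than or equal". The function and variable names are hypothetical. It does not cover the FP-tree part.

```python
from itertools import combinations

transactions = {
    "T100": {"K", "A", "D", "B"},
    "T200": {"D", "A", "C", "E", "B"},
    "T300": {"C", "A", "B", "E"},
    "T400": {"B", "A", "D"},
}
MIN_SUPPORT = 0.6
MIN_CONFIDENCE = 0.8
N = len(transactions)

def support_count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)

def apriori():
    """Return {frequent itemset: support count}, built level by level."""
    frequent = {}
    # C1: every single item that occurs in the data.
    candidates = {frozenset([i]) for items in transactions.values() for i in items}
    k = 1
    while candidates:
        counts = {c: support_count(c) for c in candidates}
        level = {c: n for c, n in counts.items() if n / N >= MIN_SUPPORT}  # Lk
        frequent.update(level)
        k += 1
        # Join step: unions of two Lk-1 itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

def strong_rules(frequent):
    """Return (lhs, rhs, support, confidence) for rules meeting MIN_CONFIDENCE."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((lhs, itemset - lhs, count / N, confidence))
    return rules

frequent = apriori()
rules = strong_rules(frequent)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
for lhs, rhs, sup, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

With a 60% threshold over four transactions, an itemset needs to appear in at least three of them, which is why C, E, and K drop out after the first pass.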


9 replies


guxiangdefeng 2007-10-22

Here to learn. Keep it up!

w75251455 2007-02-01

pengruihua 2007-01-31

thanks

menmang 2007-01-25

I think I can help you come up with the answers, if I still remember the material.
You've gotta give me 1-2 days; some of the answers will require drawing.

tianxuepiao 2007-01-11

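Thanks, but can the poster above do the other questions too? If you solve them all, all the points are yours, and I'll add more if that's not enough!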

tianxuepiao 2007-01-10

Boss, I'm not asking you to translate. I'm asking you to give me the answers, preferably in English! Thanks! ^_^!

amu0528 2007-01-10

(1) Enumerate three classes of schemas that are popularly used for modeling data warehouses.
A:
star schema, snowflake schema
I can't remember the rest.
I suggest reading the Data Warehousing Guide in the Oracle documentation; it covers all of this.

amu0528 2007-01-09