DATA WAREHOUSE

Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars on building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon: a way to keep customers by learning more about
their needs.

“So,” you may ask, full of intrigue, “what exactly is a data warehouse?” Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.

According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process.” This short but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.

(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.

(3) Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every
key structure in the data warehouse contains, either implicitly or explicitly, an element of time.

(4) Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does
not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.

In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting,
and decision making.

“OK,” you now ask, “what, then, is data warehousing?” Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. These allow “knowledge workers” (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing” to refer only to the process of data warehouse construction, while the term “warehouse DBMS” is used to refer to the management and utilization of data warehouses. We will not make this distinction here.

“How are organizations using
the information from data warehouses?” Many organizations are using this information to support business decision-making activities, including:

(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending);
(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic region, in order to fine-tune production strategies;
(3) analyzing operations and looking for sources of profit;
(4) managing customer relationships, making environmental corrections, and managing the cost of corporate assets.

Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. Integrating such data and providing easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community toward achieving this goal.

The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. Examples include IBM's Data Joiner and Informix's DataBlade products. When a query is posed at a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.

Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system, since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.

1. Differences between operational database systems and data warehouses

Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.

The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers” in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of different users. These systems are known as on-line analytical processing (OLAP) systems.

The major distinguishing features between OLTP and OLAP are summarized as follows.

(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical
data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.

(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or a snowflake model and a subject-oriented database design.

(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different
organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.

(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many of them may be complex queries.

Other features that distinguish between OLTP and OLAP systems include database size, frequency of operations, and performance metrics.

2. But, why have a separate data warehouse?

“Since operational databases store huge amounts of data,” you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?”

A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned for known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned” queries. On the other hand, data
warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.

Moreover, an operational database supports the concurrent processing of multiple transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query
often needs read-only access to data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied to such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.

Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational
databases, though abundant, are usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high-quality, cleansed, and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.
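The contrast drawn in this section between the query-driven (mediator) approach and the update-driven approach of data warehousing, together with the cleaning of inconsistent names and measures described under (2) Integrated, can be made concrete with a small in-memory sketch. This is a toy illustration under stated assumptions, not a description of how any real warehouse product works: the two operational sources, their field names, the dollars-versus-cents mismatch, and the helper names (`clean_a`, `clean_b`, `query_driven_total`, `Warehouse`) are all hypothetical. The mediator function rewrites a regional sales query for each source every time it is asked. The `Warehouse` class instead integrates the cleaned rows once, keeps them append-only (mirroring the nonvolatile property: only loading and access), and precomputes totals by region and quarter (the time element).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# All source, field, and helper names here are hypothetical illustrations.
# A cleaned, integrated fact: (customer, region, quarter, amount in dollars).
Fact = Tuple[str, str, str, float]

# Two operational sources with inconsistent conventions: source A stores
# dollars and upper-case region codes; source B stores cents and lower-case
# region names under different column names.
SOURCE_A = [
    {"cust": "C1", "region": "EAST", "date": "2023-02-10", "amount_usd": 120.0},
    {"cust": "C2", "region": "WEST", "date": "2023-05-03", "amount_usd": 80.0},
]
SOURCE_B = [
    {"customer_id": "C1", "area": "east", "day": "2023-02-28", "amount_cents": 3000},
    {"customer_id": "C3", "area": "west", "day": "2023-08-15", "amount_cents": 4550},
]


def quarter(iso_date: str) -> str:
    """Map 'YYYY-MM-DD' to 'YYYY-Qn', the time element attached to each fact."""
    month = int(iso_date[5:7])
    return f"{iso_date[:4]}-Q{(month - 1) // 3 + 1}"


def clean_a(row: dict) -> Fact:
    """Bring source A into the warehouse's naming and encoding conventions."""
    return (row["cust"], row["region"].lower(), quarter(row["date"]), row["amount_usd"])


def clean_b(row: dict) -> Fact:
    """Rename source B's columns and convert cents to dollars."""
    return (row["customer_id"], row["area"].lower(), quarter(row["day"]),
            row["amount_cents"] / 100)


def query_driven_total(region: str) -> float:
    """Mediator style: rewrite the query for each source at query time, then merge."""
    total_a = sum(r["amount_usd"] for r in SOURCE_A if r["region"] == region.upper())
    total_b = sum(r["amount_cents"] for r in SOURCE_B if r["area"] == region.lower())
    return total_a + total_b / 100


class Warehouse:
    """Update-driven style: integrate and summarize in advance, then query locally."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []  # detailed history; only ever appended to
        self.summary: Dict[Tuple[str, str], float] = defaultdict(float)

    def load(self, facts: Iterable[Fact]) -> None:
        """Append cleaned facts and update the (region, quarter) summary."""
        for fact in facts:
            self.facts.append(fact)
            _, region, qtr, amount = fact
            self.summary[(region, qtr)] += amount

    def total(self, region: str, qtr: Optional[str] = None) -> float:
        """Read-only aggregate answered from the precomputed summary."""
        return sum(amount for (r, q), amount in self.summary.items()
                   if r == region and (qtr is None or q == qtr))


wh = Warehouse()
wh.load(map(clean_a, SOURCE_A))
wh.load(map(clean_b, SOURCE_B))
print(query_driven_total("east"), wh.total("east"))  # 150.0 150.0
print(wh.total("west", "2023-Q3"))                   # 45.5

# A sale recorded after the load reaches the mediator immediately, but the
# warehouse does not see it until the next load: it is not the most current copy.
SOURCE_A.append({"cust": "C4", "region": "EAST", "date": "2023-11-20", "amount_usd": 10.0})
print(query_driven_total("east"), wh.total("east"))  # 160.0 150.0
```

The last lines show the trade-off described in the text: the warehouse answers aggregate queries from its own precomputed, integrated copy without touching the sources, while the mediator pays the translation and merging cost on every query but always sees the latest source data.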