Star Schema

The experts below are selected from a list of 1908 experts worldwide ranked by the ideXlab platform.

Il-yeol Song - One of the best experts on this subject based on the ideXlab platform.

  • ER - SAMStar: An Automatic Tool for Generating Star Schemas from an Entity-Relationship Diagram
    Lecture Notes in Computer Science, 2008
    Co-Authors: Il-yeol Song, Ritu Khare, Suan Lee, Sang-pil Kim, Jinho Kim, Yangsae Moon
    Abstract:

    While online transaction processing (OLTP) databases are modeled with Entity-Relationship Diagrams (ERDs), data warehouses constructed from these OLTP databases are usually represented as star schemas. Designing data warehouse schemas, however, is very time-consuming. We present a prototype system, SAMStar, which automatically generates star schemas from an ERD. The system takes an ERD drawn with ERwin Data Modeler as input and generates star schemas. SAMStar uses the Connection Topology Value [1], which is syntactic structural information embedded in an ERD. SAMStar displays the resulting star schemas graphically on screen. By generating star schemas automatically, the system helps designers reduce the time and effort of building data warehouse schemas.

  • DOLAP - SAMStar: a semi-automated lexical method for generating Star Schemas from an entity-relationship diagram
    Proceedings of the ACM tenth international workshop on Data warehousing and OLAP - DOLAP '07, 2007
    Co-Authors: Il-yeol Song, Ritu Khare, Bing Dai
    Abstract:

    The star schema is widely accepted as the de facto data model for data warehouse design. A popular approach is to derive a star schema from an entity-relationship diagram using heuristics. Most existing approaches analyze the semantics of an ERD to generate a star schema. In this paper, we present the SAMStar method, which semi-automatically generates star schemas from an ERD by analyzing its semantics as well as its structure. The novel features of SAMStar are (1) the use of the Connection Topology Value (CTV) to identify candidate facts and dimensions and (2) the use of Annotated Dimensional Design Patterns (A_DDP) as well as WordNet to extend the list of dimensions. We illustrate our method by applying it to examples from the existing literature. We prove that the outputs of our method are a superset of those of the existing methods. The SAMStar method simplifies the work of experienced designers and gives novices a smooth head start.

  • DMDW - An Analysis of Many-to-Many Relationships Between Fact and Dimension Tables in Dimensional Modeling
    2001
    Co-Authors: Il-yeol Song, William Rowen, Carl E. Medsker, Edward Ewen
    Abstract:

    The star schema, which maintains one-to-many relationships between dimensions and a fact table, is widely accepted as the most viable data representation for dimensional analysis. Real-world data warehouse schemas, however, frequently include many-to-many relationships between a dimension and a fact table. Having those relationships in a dimensional model causes several difficult issues, such as losing the simplicity of the star schema structure, increasing complexity in forming queries, and degrading query performance by adding more joins. Therefore, it is desirable to represent the many-to-many relationships with correct semantics while still keeping the structure of the star schema. In this paper, we analyze many-to-many relationships between a dimension table and a fact table in dimensional modeling. We illustrate six different approaches and show the advantages and disadvantages of each. We propose two ad-hoc methods that maintain a star schema structure by denormalizing the dimensions to avoid many-to-many relationships. These methods allow quick query processing by using a concatenated attribute with minimal overhead. Other issues addressed are data redundancy, weighting factors, storage requirements, and performance concerns.

  • ER (Workshops) - Data Warehouse Design for E-Commerce Environments
    Lecture Notes in Computer Science, 1999
    Co-Authors: Il-yeol Song, Kelly LeVan-Shultz
    Abstract:

    Data warehousing and electronic commerce are two of the most rapidly expanding fields in recent information technologies. In this paper, we discuss the design of data warehouses for e-commerce environments. We discuss requirement analysis, logical design, and aggregation in e-commerce environments. We have collected an extensive set of interesting OLAP queries for e-commerce environments and classified them into categories. Based on these OLAP queries, we illustrate our design with a data warehouse bus architecture, dimension table structures, a base star schema, and an aggregation star schema. Finally, we present various physical design considerations for implementing the dimensional models. We believe that our collection of OLAP queries and dimensional models will be very useful in developing real-world data warehouses in e-commerce environments.
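
The many-to-many analysis listed above (DMDW, 2001) mentions keeping the star shape by denormalizing a dimension and querying a concatenated attribute. The following is only a minimal sketch of that general idea, not the paper's actual design: the tables `dim_account` and `fact_balance`, the `holders` column, and the `|` delimiter are all hypothetical names chosen for illustration.

```python
import sqlite3

# In-memory database holding one dimension table and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_account (
    account_key  INTEGER PRIMARY KEY,
    account_name TEXT,
    holders      TEXT   -- concatenated attribute (assumed format: '|alice|bob|')
);
CREATE TABLE fact_balance (
    account_key INTEGER REFERENCES dim_account(account_key),
    month       TEXT,
    balance     REAL
);
INSERT INTO dim_account VALUES (1, 'joint-checking', '|alice|bob|');
INSERT INTO dim_account VALUES (2, 'savings',        '|carol|');
INSERT INTO fact_balance VALUES (1, '2001-01', 500.0);
INSERT INTO fact_balance VALUES (1, '2001-02', 700.0);
INSERT INTO fact_balance VALUES (2, '2001-01', 900.0);
""")


def total_balance_for(holder):
    """Sum fact rows for every account whose concatenated attribute names `holder`."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(f.balance), 0.0)
        FROM fact_balance AS f
        JOIN dim_account AS d ON f.account_key = d.account_key
        WHERE d.holders LIKE ?
        """,
        (f"%|{holder}|%",),
    ).fetchone()
    return row[0]


print(total_balance_for("alice"))
print(total_balance_for("carol"))
```

Each fact row still joins to exactly one dimension row, so the one-to-many star structure is preserved. The cost is that a joint account is counted once for each holder, which is where the weighting factors mentioned in the abstract would come in.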

Bolong Zheng - One of the best experts on this subject based on the ideXlab platform.

  • Efficiently Distributed Clustering Algorithms on Star-Schema Heterogeneous Graphs
    IEEE Transactions on Knowledge and Data Engineering, 2020
    Co-Authors: Lu Chen, Yunjun Gao, Xingrui Huang, Christian S. Jensen, Bolong Zheng
    Abstract:

    Clustering graphs can provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be considered, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized PageRank (PPR) as a unified distance measure that captures both structural and attribute similarities. We employ DBSCAN for clustering and update edge weights iteratively to balance the importance of different attributes. The rapidly growing volume of data challenges traditional clustering algorithms, so a distributed method is required. Hence, we adopt Blogel, a popular distributed graph computing system, on which we develop four exact and approximate approaches that enable efficient PPR score computation when edge weights are updated. To improve the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. We also present a game-theory-based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets demonstrate the effectiveness and efficiency of our proposals.

  • ICDE - Efficient and Incremental Clustering Algorithms on Star-Schema Heterogeneous Graphs
    2019 IEEE 35th International Conference on Data Engineering (ICDE), 2019
    Co-Authors: Lu Chen, Yunjun Gao, Christian S. Jensen, Yuanliang Zhang, Bolong Zheng
    Abstract:

    Many datasets, including social media data and bibliographic data, can be modeled as graphs. Clustering such graphs can provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be taken into account, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized PageRank (PPR) as a unified distance measure that captures both structural and attribute similarity. We employ DBSCAN for clustering, and we update edge weights iteratively to balance the importance of different attributes. To improve the efficiency of the clustering, we develop two incremental approaches that enable efficient PPR score computation when edge weights are updated. To boost the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. In addition, we present a game-theory-based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets offer insight into the effectiveness and efficiency of our proposals, compared with existing methods.
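
To make the idea in the abstracts above concrete, here is a small sketch of personalized PageRank on a graph where attribute values are modeled as their own nodes. It is an illustration under assumptions, not the authors' algorithm: the node names (`p1`, `venue:ICDE`, and so on), the unit edge weights, and the plain power iteration are all chosen for this example, and the DBSCAN step, the entropy-based weight updates, and the incremental and distributed computation are not shown.

```python
from collections import defaultdict


def add_edge(adj, u, v, weight=1.0):
    """Add an undirected weighted edge between nodes u and v."""
    adj[u][v] = weight
    adj[v][u] = weight


def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Power iteration for a random walk that restarts at `source` with probability alpha."""
    nodes = list(adj)
    scores = {n: 0.0 for n in nodes}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] = alpha  # restart mass returns to the source node
        for u in nodes:
            total = sum(adj[u].values())
            for v, w in adj[u].items():
                # The remaining mass is spread over neighbors in proportion to edge weight.
                nxt[v] += (1.0 - alpha) * scores[u] * w / total
        scores = nxt
    return scores


adj = defaultdict(dict)
# Structural edges between object nodes (hypothetical papers).
add_edge(adj, "p1", "p2")
add_edge(adj, "p1", "p3")
# Attribute values become nodes of their own type, linked to the objects that have them.
add_edge(adj, "p1", "venue:ICDE")
add_edge(adj, "p2", "venue:ICDE")
add_edge(adj, "p3", "venue:VLDB")

scores = personalized_pagerank(adj, "p1")
print(scores["p2"] > scores["p3"])
```

Both `p2` and `p3` are direct structural neighbors of `p1`, but `p2` also shares an attribute node with it, so the walk reaches `p2` along an extra path and its score is higher. This is the sense in which a single PPR score can reflect structural and attribute similarity together.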

Stephen Revilak - One of the best experts on this subject based on the ideXlab platform.

  • TPCTC - The Star Schema Benchmark and Augmented Fact Table Indexing
    Lecture Notes in Computer Science, 2009
    Co-Authors: Patrick O'Neil, Xuedong Chen, Elizabeth O'Neil, Stephen Revilak
    Abstract:

    We provide a benchmark measuring star schema queries that retrieve data from a fact table with WHERE clause column restrictions on dimension tables. Clustering is crucial to performance with modern disk technology, since retrievals with filter factors down to 0.0005 are now performed most efficiently by sequential table search rather than by indexed access. DB2's Multi-Dimensional Clustering (MDC) provides methods to "dice" the fact table along a number of orthogonal "dimensions", but only when these dimensions are columns in the fact table. The diced cells cluster fact rows on several of these "dimensions" at once, so queries restricting several such columns can access crucially localized data, with much faster query response. Unfortunately, columns of dimension tables of a star schema are not usually represented in the fact table. In this paper, we show a simple way to adjoin physical copies of dimension columns to the fact table, dicing data to effectively cluster query retrieval, and explain how such dicing can be achieved on database products other than DB2. We provide benchmark measurements to show successful use of this methodology on three commercial database products.
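
The abstract above describes adjoining copies of dimension columns to the fact table so that fact rows can be clustered on them. The sketch below is a rough in-memory analogy of that idea, not DB2 MDC and not the paper's procedure: the `customer_dim` mapping, the `fact` rows, and the `region` column are hypothetical, and a sorted Python list stands in for physical clustering on disk.

```python
from bisect import bisect_left, bisect_right

# Hypothetical dimension table: customer key -> region.
customer_dim = {1: "ASIA", 2: "EUROPE", 3: "ASIA", 4: "AMERICA"}

# Hypothetical fact rows: (customer key, revenue).
fact = [(1, 10), (2, 20), (3, 30), (4, 40), (1, 5), (2, 15)]

# Adjoin a physical copy of the dimension column to each fact row,
# then order the rows by that column so equal regions sit together.
augmented = sorted(
    ((customer_dim[cust], cust, revenue) for cust, revenue in fact),
    key=lambda row: row[0],
)
regions = [row[0] for row in augmented]


def revenue_for_region(region):
    """Return (total revenue, rows touched) using only the contiguous slice for `region`."""
    lo = bisect_left(regions, region)
    hi = bisect_right(regions, region)
    total = sum(revenue for _, _, revenue in augmented[lo:hi])
    return total, hi - lo


print(revenue_for_region("ASIA"))
```

Because the restriction is now on a column stored with the fact rows, a query on `region` reads one contiguous run of rows instead of joining to the dimension table and picking matching rows from all over the fact table.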

Lu Chen - One of the best experts on this subject based on the ideXlab platform.

  • Efficiently Distributed Clustering Algorithms on Star-Schema Heterogeneous Graphs
    IEEE Transactions on Knowledge and Data Engineering, 2020
    Co-Authors: Lu Chen, Yunjun Gao, Xingrui Huang, Christian S. Jensen, Bolong Zheng
    Abstract:

    Clustering graphs can provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be considered, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized PageRank (PPR) as a unified distance measure that captures both structural and attribute similarities. We employ DBSCAN for clustering and update edge weights iteratively to balance the importance of different attributes. The rapidly growing volume of data challenges traditional clustering algorithms, so a distributed method is required. Hence, we adopt Blogel, a popular distributed graph computing system, on which we develop four exact and approximate approaches that enable efficient PPR score computation when edge weights are updated. To improve the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. We also present a game-theory-based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets demonstrate the effectiveness and efficiency of our proposals.

  • ICDE - Efficient and Incremental Clustering Algorithms on Star-Schema Heterogeneous Graphs
    2019 IEEE 35th International Conference on Data Engineering (ICDE), 2019
    Co-Authors: Lu Chen, Yunjun Gao, Christian S. Jensen, Yuanliang Zhang, Bolong Zheng
    Abstract:

    Many datasets, including social media data and bibliographic data, can be modeled as graphs. Clustering such graphs can provide useful insights into the structure of the data. To improve the quality of clustering, node attributes can be taken into account, resulting in attributed graphs. Existing attributed graph clustering methods generally consider attribute similarity and structural similarity separately. In this paper, we represent attributed graphs as star-schema heterogeneous graphs, where attributes are modeled as different types of graph nodes. This enables the use of personalized PageRank (PPR) as a unified distance measure that captures both structural and attribute similarity. We employ DBSCAN for clustering, and we update edge weights iteratively to balance the importance of different attributes. To improve the efficiency of the clustering, we develop two incremental approaches that enable efficient PPR score computation when edge weights are updated. To boost the effectiveness of the clustering, we propose a simple yet effective edge weight update strategy based on entropy. In addition, we present a game-theory-based method that enables trading efficiency for result quality. Extensive experiments on real-life datasets offer insight into the effectiveness and efficiency of our proposals, compared with existing methods.

Marcin Gorawski - One of the best experts on this subject based on the ideXlab platform.

  • PPAM (1) - Extended cascaded Star Schema for distributed spatial data warehouse
    Parallel Processing and Applied Mathematics, 2010
    Co-Authors: Marcin Gorawski
    Abstract:

    In this paper, several new aspects of spatial data warehouse modeling are presented. The extended cascaded star schema in a distributed spatial data warehouse (DSDW) is defined. Research has shown that there is a strong need for building many SDW extended cascaded star schemas as an outcome of separate spatio-temporal telemetric conceptual models. For one of these new data schemas, the definitions of cascaded ECOLAP operations are presented. These operations are based on relational algebra and make it possible to execute ad-hoc queries.

  • IDEAL - Extended cascaded Star Schema and ECOLAP operations for spatial data warehouse
    Intelligent Data Engineering and Automated Learning - IDEAL 2009, 2009
    Co-Authors: Marcin Gorawski
    Abstract:

    In this paper, several new aspects of spatial data warehouse modeling are presented. The extended cascaded star schema in a spatial telemetric data warehouse, SDW(t), is defined. Research has shown that there is a strong need for building many SDW extended cascaded star schemas as an outcome of separate spatio-temporal conceptual models. For one of these new data schemas, the definitions of cascaded ECOLAP operations are presented. These operations are based on relational algebra and make it possible to execute ad-hoc queries.

  • Distributed spatial data warehouse
    2004
    Co-Authors: Marcin Gorawski, Rafal Malczok
    Abstract:

    Data warehouses are used to store large amounts of data. A data model makes it possible to separate data categories and establish relations between them. In this paper, we introduce for the first time the concept of a distributed spatial data warehouse based on the multi-dimensional data model called the cascaded star schema [1]. We use a new aggregation tree that indexes our model in order to fully exploit the capabilities of the cascaded star. After a close discussion of the cascaded star schema and the aggregation tree, we introduce the new idea of distributing a data warehouse based on the cascaded star schema. Using Java, we implemented both a system running on a single computer and a distributed system. We then carried out tests whose results allow us to compare the performance of both systems. The test results show that distribution can improve the performance of a spatial data warehouse.