Attribute Record

The Experts below are selected from a list of 18 Experts worldwide ranked by ideXlab platform

J. Srivastava - One of the best experts on this subject based on the ideXlab platform.

  • ICDE - Performance evaluation of grid based multi-Attribute Record declustering methods
    Proceedings of 1994 IEEE 10th International Conference on Data Engineering, 1994
    Co-Authors: B. Himatsingka, J. Srivastava
    Abstract:

    We focus on multi-Attribute declustering methods which are based on some type of grid-based partitioning of the data space. Theoretical results are derived which show that no declustering method can be strictly optimal for range queries if the number of disks is greater than 5. A detailed performance evaluation is carried out to see how various declustering schemes perform under a wide range of query and database scenarios (both relative to each other and to the optimal). Parameters that are varied include shape and size of queries, database size, number of Attributes and the number of disks. The results show that information about common queries on a relation is very important and ought to be used in deciding the declustering for it, and that this is especially crucial for small queries. Also, there is no clear winner, and as such parallel database systems must support a number of declustering methods.
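
As a rough illustration of the kind of scheme evaluated above (not necessarily one of the paper's specific methods), the sketch below declusters a two-attribute grid across several disks with the classic Disk Modulo rule, which maps cell (i, j) to disk (i + j) mod M; the grid size and the sample range query are made up for the example.

```python
# A minimal sketch of grid-based multi-attribute declustering: each grid cell
# of a two-attribute data space is assigned to one of M disks with the classic
# Disk Modulo rule, cell (i, j) -> (i + j) mod M.

def build_grid(num_cells_x, num_cells_y, num_disks):
    """Return a dict mapping each grid cell to the disk that stores it."""
    return {
        (i, j): (i + j) % num_disks
        for i in range(num_cells_x)
        for j in range(num_cells_y)
    }

def cells_for_range(x_lo, x_hi, y_lo, y_hi):
    """Grid cells touched by a rectangular range query (inclusive bounds)."""
    return [(i, j) for i in range(x_lo, x_hi + 1) for j in range(y_lo, y_hi + 1)]

if __name__ == "__main__":
    disks = 4
    grid = build_grid(8, 8, disks)
    query_cells = cells_for_range(2, 4, 1, 3)        # a 3 x 3 range query
    touched = [grid[c] for c in query_cells]
    # With the cells spread over disks, the query's parallel response time is
    # driven by the most heavily loaded disk.
    per_disk = [touched.count(d) for d in range(disks)]
    print("cells per disk:", per_disk, "-> parallel cost:", max(per_disk))
```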

B. Himatsingka - One of the best experts on this subject based on the ideXlab platform.

  • ICDE - Performance evaluation of grid based multi-Attribute Record declustering methods
    Proceedings of 1994 IEEE 10th International Conference on Data Engineering, 1994
    Co-Authors: B. Himatsingka, J. Srivastava
    Abstract:

    We focus on multi-Attribute declustering methods which are based on some type of grid-based partitioning of the data space. Theoretical results are derived which show that no declustering method can be strictly optimal for range queries if the number of disks is greater than 5. A detailed performance evaluation is carried out to see how various declustering schemes perform under a wide range of query and database scenarios (both relative to each other and to the optimal). Parameters that are varied include shape and size of queries, database size, number of Attributes and the number of disks. The results show that information about common queries on a relation is very important and ought to be used in deciding the declustering for it, and that this is especially crucial for small queries. Also, there is no clear winner, and as such parallel database systems must support a number of declustering methods.
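
The strict-optimality notion used in the abstract can also be made concrete: a declustering is strictly optimal for a range query if the busiest disk holds no more than ceil(cells touched / number of disks) of the query's cells. The brute-force check below tests that condition over every rectangular range query on a small grid; the Disk Modulo mapping it tests is an illustrative assumption, not the paper's construction.

```python
# Brute-force check of strict optimality for a given cell-to-disk assignment:
# for every rectangular range query, the busiest disk may hold at most
# ceil(#cells_touched / #disks) of the query's cells.
from math import ceil

def max_load(assign, cells, num_disks):
    """Number of the query's cells that land on the busiest disk."""
    counts = [0] * num_disks
    for c in cells:
        counts[assign[c]] += 1
    return max(counts)

def is_strictly_optimal(assign, nx, ny, num_disks):
    """Test every rectangular range query on an nx x ny grid."""
    for x_lo in range(nx):
        for x_hi in range(x_lo, nx):
            for y_lo in range(ny):
                for y_hi in range(y_lo, ny):
                    cells = [(i, j)
                             for i in range(x_lo, x_hi + 1)
                             for j in range(y_lo, y_hi + 1)]
                    if max_load(assign, cells, num_disks) > ceil(len(cells) / num_disks):
                        return False
    return True

if __name__ == "__main__":
    nx = ny = 6
    for disks in (2, 3, 4, 5, 8):
        dm = {(i, j): (i + j) % disks for i in range(nx) for j in range(ny)}
        verdict = "strictly optimal" if is_strictly_optimal(dm, nx, ny, disks) else "not strictly optimal"
        print(f"Disk Modulo on a {nx}x{ny} grid with {disks} disks: {verdict}")
```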

Peng Zhen - One of the best experts on this subject based on the ideXlab platform.

  • Applications of GIS-based cable resource management system in power communication network
    Electric Power, 2020
    Co-Authors: Peng Zhen
    Abstract:

    In traditional cable resource management, the Attribute Record forms such as tower coordinates, connected equipment, and way of installation are mainly updated manually. The daily maintenance workload is high and accuracy cannot be guaranteed. By exploiting the strengths of Geographic Information Systems (GIS) in spatial data management, a GIS-based cable resource management system is established. The system not only provides basic data management for the fiber network's spatial data and the related Attribute data, but also supports comprehensive data analysis for operation and maintenance management and network planning. The system enables lean management of cable resources, improves the utilization rate of fiber optic cables, and raises the level of operation and maintenance.
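
As a loose illustration only (the paper does not give a schema), the sketch below shows one way such a system might keep a cable segment's spatial geometry and its descriptive Attribute Record together; all field names are hypothetical.

```python
# Hypothetical sketch of a cable-segment record in a GIS-backed resource
# management system: the spatial part (tower coordinates) and the descriptive
# Attribute Record live in one structure and can be exported for map display.
from dataclasses import dataclass

@dataclass
class CableSegmentRecord:
    segment_id: str
    tower_coordinates: list      # [(longitude, latitude), ...] along the route
    connected_equipment: list    # e.g. ODF ports, joint closures
    installation_method: str     # e.g. "overhead", "duct", "direct-buried"
    fiber_count: int = 0

    def to_geojson_feature(self) -> dict:
        """Expose the record as a GeoJSON Feature: the geometry can be drawn
        on a map while the Attribute Record rides along as properties."""
        return {
            "type": "Feature",
            "geometry": {"type": "LineString",
                         "coordinates": [list(c) for c in self.tower_coordinates]},
            "properties": {
                "segment_id": self.segment_id,
                "connected_equipment": self.connected_equipment,
                "installation_method": self.installation_method,
                "fiber_count": self.fiber_count,
            },
        }

if __name__ == "__main__":
    seg = CableSegmentRecord(
        segment_id="OPGW-001",
        tower_coordinates=[(116.40, 39.90), (116.41, 39.91)],
        connected_equipment=["ODF-A-01", "JointClosure-17"],
        installation_method="overhead",
        fiber_count=48,
    )
    print(seg.to_geojson_feature())
```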

Tobias Vogel - One of the best experts on this subject based on the ideXlab platform.

  • Grundlagen von Datenbanken - Self-Adaptive Data Quality Web Services.
    2020
    Co-Authors: Tobias Vogel
    Abstract:

    Data Quality Web Services are services that enhance the quality of data in the sense of making it fit for use. This paper concentrates on duplicate detection, which identifies multiple representations of real-world objects within large datasets, where these representations are similar to a certain degree. Measures to estimate this similarity have been one of the major research efforts in the community for many years. For Web Services, however, these heuristics differ in the amount of underlying meta information available, which is usually much poorer. This paper analyzes and classifies the types of missing meta data and examines different ways of making the duplicate-containing data available to the Data Quality Web Service.

    1. DATA QUALITY
    Data quality plays an important role in entrepreneurial success. However, as recent studies show, many companies do not recognize the importance of data quality in their ERP or CRM systems. Many different technical measures can be employed to increase data quality, e.g., data normalization, duplicate detection, and data fusion (Fig. 1: data cleansing workflow). While the need to clean data is ubiquitous, it is particularly the big enterprises that spend money on dedicated data cleansing techniques and policies. To achieve this, they buy data cleansing frameworks and install and maintain them within their environments, not only spending money on license fees but also paying consultants or IT staff permanently, even if batch processing of their data only occurs once a month. However, smaller enterprises share the same need for good data quality. They need cleansing actions just as frequently; the amount of data might be smaller, but it is still too much for manual processing. For example, checking a dataset of, say, 1,000 customer profiles for duplicate (or near-duplicate) entries requires about 500,000 pairwise comparisons. Due to the aforementioned difficulties and costs, measures ensuring data quality are often omitted. This can be formulated as a need for ad-hoc, fair-priced, simple, low-configuration data cleansing. The Software as a Service paradigm promises to be a valid solution for this need, since it employs services (most frequently in the shape of Web Services) as basic building blocks.

    2. DUPLICATE DETECTION
    Duplicate detection is the process of identifying multiple representations of the same real-world objects. Traditional duplicate detection (also called deduplication or Record linkage) employs well-established algorithms and heuristics; see Elmagarmid [2] and Winkler [11] for surveys. Typically, those algorithms concentrate on (a) the selection of duplicate candidates and/or (b) a measure to estimate the similarity between two items. For (a), the goal is to find an elaborate selection of candidate pairs in order to avoid comparing all O(n^2) pairs of elements, since the overwhelming majority of comparisons will not be promising. For (b), the goal is to compare elements efficiently, i.e., with a good estimate of the actual similarity, or in short time.

    2.1 Web Services for Duplicate Detection
    The comparison of two elements is based on the data type and value of their Attributes, together with additional information needed to identify a pair as a possible duplicate. Sometimes, however, the amount of available information is restricted or – as in Web Services – simply not available: the schema might not be up to date, the field mapping is unclear, privacy issues prevent full access to all the data, etc. Thus, the question is how a good similarity measure can be created under the described conditions while still achieving appropriate results and remaining as general as possible. It therefore has to be examined which information is essential for a duplicate detection process and which information has to be inferred from the data or retrieved from other sources. Web Service implementations of data cleansing methods – duplicate detection in particular – (Data Quality Web Services) are invoked on demand with exactly the information that is to be decided about, e.g., when only a small number of items shall be tested for similarity in an ad-hoc manner. Further, they provide a clearly specified functionality while remaining as general as possible, to serve a broad range of possible service requesters. These properties make Web Services an ideal foundation for evaluating duplicate detection algorithms under the limitations described above.

    2.2 Large-Scale Duplicate Detection
    Without the assumption that a Web Service deduplicates only a small number of elements, further problems arise. The issue of how the data have to be made accessible to the services is covered by Faruquie et al. [3]. Furthermore, without this restriction, goal (a) becomes relevant in a Web Service scenario, too.

    3. SIMILARITY MEASURES
    Successful duplicate detection within unforeseeable data and structure requires some effort before the actual detection of duplicates can take place. Different pieces of information may be missing, i.e., the "semantics" of the data, the Attribute names, the mapping between fields, and the Attribute separators. All these terms are explained in the following. The lack of information can be seen as different levels of challenge for a good similarity measure; these levels are presented below.

    3.1 Challenges for Similarity Measure Selection
    Level 0. This represents the "traditional" scenario for similarity measures, as depicted in Table 1. The Record's Attributes have proper names, the mapping is reliable (the last name of the first Record has to be compared to the last name of the second Record), and, of course, separators are available (represented by the table grid). Finally, the "semantics" are also clear, because the database managers are at hand and can provide this kind of information. Therefore, specific measures such as a fine-grained heuristic for birthdates would detect that 1955|10|05 and 10.05.1955 mean the same date. Furthermore, it might be clear that a person's title is often guessed or omitted, and thus Mr. and an empty entry would match. The only requirement here is that these specialized similarity measures are available.

    Level 1. If the data are provided without expert knowledge, e.g. in the XML-like shape illustrated in Figure 2, specialized similarity measures cannot be applied directly. First, Attributes have to be classified. The result of this classification is some degree of certainty that a similarity measure is appropriate. To gain this knowledge, the occurring values as well as the Attribute names can be used.

    Table 1: Homer Simpson (relational data)

        Attribute     Record 1      Record 2
        Title         Mr.           (empty)
        First Name    Homer Jay     H. J.
        Last Name     Simpson       Simpspn
        Birthday      1955|10|05    10.05.1955
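
To make the Level 0 discussion concrete, the sketch below compares the two Records from Table 1 using a specialized birthday measure (normalizing 1955|10|05 and 10.05.1955 to the same date) and a generic string similarity for the remaining Attributes; the date formats, equal weighting, and 0.75 threshold are illustrative assumptions, not the paper's heuristics.

```python
# Minimal sketch of a "Level 0" similarity measure over the Table 1 records:
# birthdays in different notations are normalized before comparison, all other
# fields fall back to a generic string similarity.
from datetime import datetime
from difflib import SequenceMatcher

DATE_FORMATS = ("%Y|%m|%d", "%m.%d.%Y")   # assumed notations for this example

def birthday_similarity(a: str, b: str) -> float:
    """1.0 if both strings denote the same calendar date, else 0.0."""
    def parse(s):
        for fmt in DATE_FORMATS:
            try:
                return datetime.strptime(s, fmt).date()
            except ValueError:
                pass
        return None
    da, db = parse(a), parse(b)
    return 1.0 if da is not None and da == db else 0.0

def string_similarity(a: str, b: str) -> float:
    """Generic fallback; an omitted value (e.g. a missing title) is not penalized."""
    if not a or not b:
        return 1.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

MEASURES = {"Title": string_similarity,
            "First Name": string_similarity,
            "Last Name": string_similarity,
            "Birthday": birthday_similarity}

def record_similarity(r1: dict, r2: dict) -> float:
    scores = [MEASURES[attr](r1.get(attr, ""), r2.get(attr, "")) for attr in MEASURES]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    record_1 = {"Title": "Mr.", "First Name": "Homer Jay",
                "Last Name": "Simpson", "Birthday": "1955|10|05"}
    record_2 = {"Title": "", "First Name": "H. J.",
                "Last Name": "Simpspn", "Birthday": "10.05.1955"}
    score = record_similarity(record_1, record_2)
    print(f"similarity = {score:.2f} ->", "duplicate" if score >= 0.75 else "distinct")
```

Note that the specialized birthday measure is what lets the two notations match at all; a purely generic string comparison would score them near zero, which is exactly the Level 0 point made above.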

William K Kuykendall - One of the best experts on this subject based on the ideXlab platform.

  • Update of the Non-State Trunk Inventory
    2000
    Co-Authors: Christian Collier, Somitra Saxena, William K Kuykendall
    Abstract:

    Geographical Information System (GIS)/Trans Ltd. provided consultant services to integrate the existing Non-State Trunk Roadway Inventory System (NSTRI) database with updated information from a recently completed Global Positioning System (GPS) inventory. The first step was to review the current environment of the NSTRI and the data items contained in it, as well as the data items contained in the GPS inventory. Recommendations were made regarding the suitability of retaining certain data items, and all data items were ranked in order of importance. Since there was no common key field in the databases, they could not be joined using standard database techniques. It was determined that using a GIS to join the databases based on the location of the Attribute sections was the most appropriate method. The GPS inventory collected the road centerlines (basemap) with accurate road lengths (measures) and certain road characteristics as Attributes. The GPS Attributes could be placed directly on the basemap since their measures matched those of the basemap. The NSTRI method of locating Attribute sections, however, had to be converted into the same measurement system as the GPS. A Linear Referencing System (LRS) was created for the NSTRI, which allowed the researchers to place the NSTRI data on the same basemap as the GPS data. Finally, the two databases could be joined by matching the route identifiers and the measures for each Attribute Record. Upon determining that the databases could in fact be joined, procedures for maintaining and updating the combined database were recommended, along with the resources required to maintain it.
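
A minimal sketch of the location-based join described above, assuming both datasets have already been expressed in the same linear referencing system (route identifier plus begin/end measures); the field names and the overlap rule are assumptions for illustration, not the project's actual procedure.

```python
# Minimal sketch: once NSTRI sections are re-expressed in the GPS basemap's
# linear referencing system, records are matched on route identifier plus
# overlapping measure ranges, yielding a combined Attribute Record per section.
from collections import defaultdict

def join_by_lrs(gps_records, nstri_records):
    """Yield (gps, nstri) pairs whose route matches and whose measure
    ranges [begin_m, end_m] overlap on that route."""
    by_route = defaultdict(list)
    for rec in nstri_records:
        by_route[rec["route_id"]].append(rec)

    for gps in gps_records:
        for nstri in by_route.get(gps["route_id"], []):
            overlap = min(gps["end_m"], nstri["end_m"]) - max(gps["begin_m"], nstri["begin_m"])
            if overlap > 0:
                yield gps, nstri

if __name__ == "__main__":
    gps = [{"route_id": "CTH-A", "begin_m": 0.0, "end_m": 1.2, "surface": "asphalt"}]
    nstri = [{"route_id": "CTH-A", "begin_m": 0.8, "end_m": 2.0, "lanes": 2},
             {"route_id": "CTH-B", "begin_m": 0.0, "end_m": 1.0, "lanes": 4}]
    for g, n in join_by_lrs(gps, nstri):
        print({**g, **n})   # combined Attribute Record for the overlapping section
```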