Terabyte

14,000,000 Leading Edge Experts on the ideXlab platform

The Experts below are selected from a list of 11,871 Experts worldwide, ranked by the ideXlab platform

Mary Brady - One of the best experts on this subject based on the ideXlab platform.

  • Background intensity correction for Terabyte-sized time-lapse images
    Journal of Microscopy, 2015
    Co-Authors: Joe Chalfoun, Michael Majurski, Kiran Bhadriraju, Steven P. Lund, Peter Bajcsy, Mary Brady
    Abstract:

    Summary: Several computational challenges associated with large-scale background image correction of Terabyte-sized fluorescent images are discussed and analysed in this paper. Dark current, flat-field and background correction models are applied over a mosaic of hundreds of spatially overlapping fields of view (FOVs) taken over the course of several days, during which the background diminishes as cell colonies grow. The motivation of our work comes from the need to quantify the dynamics of OCT-4 gene expression via a fluorescent reporter in human stem cell colonies. Our approach to background correction is formulated as an optimization problem over two image partitioning schemes and four analytical correction models. The optimization objective function is evaluated in terms of (1) the minimum root mean square (RMS) error remaining after image correction, (2) the maximum signal-to-noise ratio (SNR) reached after downsampling and (3) the minimum execution time. Based on the analyses with measured dark current noise and flat-field images, the optimal GFP background correction is obtained by partitioning the data into a set of submosaic images and fitting a polynomial surface background model to each. The corrected image is characterized by an RMS of about 8 and an SNR above 5 (by the Rose criterion) after 4 × 4 downsampling. The new technique generates an image with half the RMS value and double the SNR value compared to an approach that assumes a constant background throughout the mosaic. We show that the background noise in Terabyte-sized fluorescent image mosaics can be corrected computationally with the optimized triplet (data partition, model, SNR-driven downsampling) such that the total RMS value from background noise does not exceed the magnitude of the measured dark current noise. In this case, the dark current noise serves as a benchmark for the lowest noise level that an imaging system can achieve. In comparison, previous fluorescent image background correction methods were designed for a single FOV and have not been applied to Terabyte-sized images with large mosaic FOVs, low SNR and diminishing access to background information over time as cell colonies come to span multiple FOVs entirely. The code is available as open source at https://isg.nist.gov/.

    Lay Description: In this paper we present background intensity correction for Terabyte-sized time-lapse fluorescent images. The motivation of our work comes from the need to quantify the dynamics of OCT-4 gene expression via a fluorescent reporter in human stem cell colonies. The challenge lies in correcting time-lapse fluorescent images, each about 462 Megapixels, assembled from hundreds of spatially overlapping smaller fields of view (FOVs) taken over the course of several days. Furthermore, the background diminishes as cell colonies grow over time, as observed during the acquisition of three time-lapse replicates totalling about 2.6 Terabytes. Our approach to background correction is formulated as an optimization problem where the objective function is evaluated in terms of (1) the remaining error after image correction, (2) the maximum Signal-to-Noise Ratio (SNR) reached after binning the image, and (3) the minimum execution time. The new technique generates an image with half the Root-Mean-Square (RMS) value and double the SNR value compared to a typical approach that assumes a constant background throughout the mosaic. We show that the background noise after correction does not exceed the magnitude of the measured dark current noise. In this case, the dark current noise serves as a benchmark for the lowest noise level that an imaging system can achieve.
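
    The submosaic polynomial-surface correction described in this abstract can be illustrated with a short sketch. The fragment below is not the NIST open-source code (available at https://isg.nist.gov/); it only shows, under assumed names and parameters, how a low-order 2D polynomial surface might be least-squares fitted to the background pixels of one submosaic, subtracted, and scored by the residual RMS. The degree-2 default and the masking strategy are illustrative assumptions.

```python
# Minimal sketch (not the NIST release): fit a low-order 2D polynomial surface
# to pixels flagged as background, subtract it, and report the residual RMS.
# The degree, masking strategy, and all function names are illustrative assumptions.
import numpy as np

def polynomial_terms(y, x, degree=2):
    """Design-matrix columns 1, x, y, x^2, x*y, y^2, ... up to the given degree."""
    return np.column_stack([(x ** i) * (y ** j)
                            for i in range(degree + 1)
                            for j in range(degree + 1 - i)])

def fit_background_surface(image, background_mask, degree=2):
    """Least-squares fit of a polynomial surface to background pixels only."""
    yy, xx = np.nonzero(background_mask)
    A = polynomial_terms(yy.astype(float), xx.astype(float), degree)
    coeffs, *_ = np.linalg.lstsq(A, image[yy, xx].astype(float), rcond=None)
    ys, xs = np.mgrid[0:image.shape[0], 0:image.shape[1]]
    surface = polynomial_terms(ys.ravel().astype(float),
                               xs.ravel().astype(float), degree) @ coeffs
    return surface.reshape(image.shape)

def correct_and_score(image, background_mask, degree=2):
    """Subtract the fitted surface and compute the residual background RMS."""
    surface = fit_background_surface(image, background_mask, degree)
    corrected = image.astype(float) - surface
    rms = np.sqrt(np.mean(corrected[background_mask] ** 2))
    return corrected, rms
```

    Applied per submosaic over the chosen data partition, such surfaces approximate the spatially varying, diminishing background that the paper models.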

  • Background intensity correction for Terabyte‐sized time‐lapse images
    Journal of Microscopy, 2014
    Co-Authors: Joe Chalfoun, Michael Majurski, Kiran Bhadriraju, Steven P. Lund, Peter Bajcsy, Mary Brady
    Abstract:

    Summary: Several computational challenges associated with large-scale background image correction of Terabyte-sized fluorescent images are discussed and analysed in this paper. Dark current, flat-field and background correction models are applied over a mosaic of hundreds of spatially overlapping fields of view (FOVs) taken over the course of several days, during which the background diminishes as cell colonies grow. The motivation of our work comes from the need to quantify the dynamics of OCT-4 gene expression via a fluorescent reporter in human stem cell colonies. Our approach to background correction is formulated as an optimization problem over two image partitioning schemes and four analytical correction models. The optimization objective function is evaluated in terms of (1) the minimum root mean square (RMS) error remaining after image correction, (2) the maximum signal-to-noise ratio (SNR) reached after downsampling and (3) the minimum execution time. Based on the analyses with measured dark current noise and flat-field images, the optimal GFP background correction is obtained by partitioning the data into a set of submosaic images and fitting a polynomial surface background model to each. The corrected image is characterized by an RMS of about 8 and an SNR above 5 (by the Rose criterion) after 4 × 4 downsampling. The new technique generates an image with half the RMS value and double the SNR value compared to an approach that assumes a constant background throughout the mosaic. We show that the background noise in Terabyte-sized fluorescent image mosaics can be corrected computationally with the optimized triplet (data partition, model, SNR-driven downsampling) such that the total RMS value from background noise does not exceed the magnitude of the measured dark current noise. In this case, the dark current noise serves as a benchmark for the lowest noise level that an imaging system can achieve. In comparison, previous fluorescent image background correction methods were designed for a single FOV and have not been applied to Terabyte-sized images with large mosaic FOVs, low SNR and diminishing access to background information over time as cell colonies come to span multiple FOVs entirely. The code is available as open source at https://isg.nist.gov/.

    Lay Description: In this paper we present background intensity correction for Terabyte-sized time-lapse fluorescent images. The motivation of our work comes from the need to quantify the dynamics of OCT-4 gene expression via a fluorescent reporter in human stem cell colonies. The challenge lies in correcting time-lapse fluorescent images, each about 462 Megapixels, assembled from hundreds of spatially overlapping smaller fields of view (FOVs) taken over the course of several days. Furthermore, the background diminishes as cell colonies grow over time, as observed during the acquisition of three time-lapse replicates totalling about 2.6 Terabytes. Our approach to background correction is formulated as an optimization problem where the objective function is evaluated in terms of (1) the remaining error after image correction, (2) the maximum Signal-to-Noise Ratio (SNR) reached after binning the image, and (3) the minimum execution time. The new technique generates an image with half the Root-Mean-Square (RMS) value and double the SNR value compared to a typical approach that assumes a constant background throughout the mosaic. We show that the background noise after correction does not exceed the magnitude of the measured dark current noise. In this case, the dark current noise serves as a benchmark for the lowest noise level that an imaging system can achieve.

  • Terabyte-sized image computations on Hadoop cluster platforms
    International Conference on Big Data, 2013
    Co-Authors: Peter Bajcsy, Joe Chalfoun, Antoine Vandecreme, Julien Amelot, Phuong Nguyen, Mary Brady
    Abstract:

    We present a characterization of four basic Terabyte-sized image computations on a Hadoop cluster in terms of their relative efficiency according to the modified Amdahl's law. The work is motivated by the lack of standard benchmarks and stress tests for big image processing operations on a Hadoop computer cluster platform. Our benchmark design and evaluations were performed on one of three microscopy image sets, each over half a Terabyte in size. All image processing benchmarks executed with Hadoop on the NIST Raritan cluster were compared against baseline measurements: the TeraSort/TeraGen benchmarks previously designed for Hadoop testing, image processing executions on a multiprocessor desktop, and executions on the NIST Raritan cluster using Java Remote Method Invocation (RMI) with multiple configurations. By applying our methodology to assess the efficiency of computations across cluster configurations, we could rank the configurations and aid scientists in measuring the benefits of running image processing on a Hadoop cluster.
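
    The abstract ranks cluster configurations by relative efficiency under a modified Amdahl's law; the exact modification is not stated here, so the sketch below uses the classical form S(n) = 1 / ((1 - p) + p / n) and E(n) = S(n) / n purely to illustrate how such a ranking could be computed. The configuration labels, parallel fractions and node counts are made-up examples, not measurements from the paper.

```python
# Hedged sketch of ranking cluster configurations by Amdahl-style efficiency.
# The paper uses a *modified* Amdahl's law whose exact form is not given in the
# abstract; the classical form below only illustrates the ranking methodology.

def speedup(parallel_fraction: float, nodes: int) -> float:
    """Classical Amdahl's law: S(n) = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / nodes)

def relative_efficiency(parallel_fraction: float, nodes: int) -> float:
    """Efficiency E(n) = S(n) / n, i.e. how well added nodes are utilised."""
    return speedup(parallel_fraction, nodes) / nodes

# Hypothetical configurations: (label, measured parallel fraction, node count).
configs = [("Hadoop, 8 nodes", 0.95, 8),
           ("Hadoop, 16 nodes", 0.95, 16),
           ("Java RMI, 8 nodes", 0.90, 8)]

for label, p, n in sorted(configs,
                          key=lambda c: relative_efficiency(c[1], c[2]),
                          reverse=True):
    print(f"{label}: speedup {speedup(p, n):.2f}, "
          f"efficiency {relative_efficiency(p, n):.2%}")
```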

  • BigData Conference - Re-projection of Terabyte-sized images
    2013 IEEE International Conference on Big Data, 2013
    Co-Authors: Peter Bajcsy, Antoine Vandecreme, Mary Brady
    Abstract:

    This work addresses the problem of re-projecting a Terabyte-sized 3D data set represented as a set of 2D Deep Zoom pyramids. In general, a re-projection for small 3D data sets is executed directly in RAM. However, RAM becomes a limiting factor for Terabyte-sized 3D volumes formed by a stack of hundreds of megapixel to gigapixel 2D frames. We have benchmarked three methods to perform the re-projection computation in order to overcome the RAM limitation.
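
    The RAM limitation motivates out-of-core processing. The sketch below is not one of the three benchmarked methods (which operate on Deep Zoom tiles); it only illustrates the general idea of streaming a large stack one 2D frame at a time, here for a simple maximum-intensity re-projection over a memory-mapped raw volume. The file name, shape and dtype are assumptions.

```python
# Hedged sketch: re-project a stack too large for RAM by streaming one 2D frame
# at a time from a memory-mapped file. The paper benchmarks three methods over
# Deep Zoom pyramids; this maximum-intensity projection only illustrates the
# out-of-core idea, and the file name, dtype and shape are assumptions.
import numpy as np

def max_projection(path, frames, height, width, dtype=np.uint16):
    stack = np.memmap(path, dtype=dtype, mode="r",
                      shape=(frames, height, width))
    projection = np.zeros((height, width), dtype=dtype)
    for z in range(frames):                 # only one frame resident at a time
        np.maximum(projection, stack[z], out=projection)
    return projection

# Example (hypothetical file):
# projection = max_projection("volume.raw", frames=500, height=20000, width=20000)
```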

  • BigData Conference - Terabyte-sized image computations on Hadoop cluster platforms
    2013 IEEE International Conference on Big Data, 2013
    Co-Authors: Peter Bajcsy, Joe Chalfoun, Antoine Vandecreme, Julien Amelot, Phuong Nguyen, Mary Brady
    Abstract:

    We present a characterization of four basic Terabyte-sized image computations on a Hadoop cluster in terms of their relative efficiency according to the modified Amdahl's law. The work is motivated by the lack of standard benchmarks and stress tests for big image processing operations on a Hadoop computer cluster platform. Our benchmark design and evaluations were performed on one of three microscopy image sets, each over half a Terabyte in size. All image processing benchmarks executed with Hadoop on the NIST Raritan cluster were compared against baseline measurements: the TeraSort/TeraGen benchmarks previously designed for Hadoop testing, image processing executions on a multiprocessor desktop, and executions on the NIST Raritan cluster using Java Remote Method Invocation (RMI) with multiple configurations. By applying our methodology to assess the efficiency of computations across cluster configurations, we could rank the configurations and aid scientists in measuring the benefits of running image processing on a Hadoop cluster.

Peter Wilkins - One of the best experts on this subject based on the ideXlab platform.

  • TREC - Dublin City University at the TREC 2006 Terabyte Track
    2006
    Co-Authors: Paul Ferguson, Alan F. Smeaton, Peter Wilkins
    Abstract:

    For the 2006 Terabyte track in TREC, Dublin City University's participation was focussed on the ad hoc search task. As per the previous two years [7, 4], our experiments on the Terabyte track have concentrated on the evaluation of a sorted inverted index, the aim of which is to sort the postings within each posting list in such a way that only a limited number of postings needs to be processed from each list, while at the same time minimising the loss of effectiveness.
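
    The sorted-index idea in this abstract can be sketched as impact-ordered posting lists with early termination: each list is pre-sorted by a per-document term score, and only a fixed prefix of each list is scanned at query time. The toy index, scores and cutoff below are illustrative assumptions, not the DCU implementation.

```python
# Hedged sketch of top-subset retrieval over impact-sorted posting lists: each
# list is pre-sorted by a per-document term score, and at query time only the
# first `cutoff` postings of each list are scanned. The toy index layout and
# scores are assumptions, not the DCU/Fisreal index.
from collections import defaultdict
from heapq import nlargest

# postings[term] = [(doc_id, score), ...] sorted by score, highest first.
postings = {
    "terabyte": [(3, 9.1), (7, 6.4), (1, 2.2), (9, 0.8)],
    "track":    [(7, 5.0), (3, 3.3), (5, 1.1), (2, 0.4)],
}

def top_subset_search(query_terms, k=3, cutoff=2):
    accumulators = defaultdict(float)
    for term in query_terms:
        # Stop early in each list: only the highest-impact postings are read.
        for doc_id, score in postings.get(term, [])[:cutoff]:
            accumulators[doc_id] += score
    return nlargest(k, accumulators.items(), key=lambda item: item[1])

print(top_subset_search(["terabyte", "track"]))  # docs 3 and 7 rank highest
```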

  • TREC - Dublin City University at the TREC 2005 Terabyte Track
    2005
    Co-Authors: Paul Ferguson, Alan F. Smeaton, Cathal Gurrin, Peter Wilkins
    Abstract:

    For the 2005 Terabyte track in TREC, Dublin City University participated in all three tasks: Ad hoc, Efficiency and Named Page Finding. Our runs for TREC in all tasks were primarily focussed on the application of "Top Subset Retrieval" to the Terabyte Track. This retrieval utilises different types of sorted inverted indices so that fewer documents are processed in order to reduce query times, in a way that minimises the loss of effectiveness in terms of query precision. We also compare a distributed version of our Fisreal search system [1][2] against the same system deployed on a single machine.

  • Dublin City University at the TREC 2005 Terabyte Track
    Text REtrieval Conference, 2005
    Co-Authors: Paul Ferguson, Alan F. Smeaton, Cathal Gurrin, Peter Wilkins
    Abstract:

    For the 2005 Terabyte track in TREC, Dublin City University participated in all three tasks: Ad hoc, Efficiency and Named Page Finding. Our runs for TREC in all tasks were primarily focussed on the application of "Top Subset Retrieval" to the Terabyte Track. This retrieval utilises different types of sorted inverted indices so that fewer documents are processed in order to reduce query times, in a way that minimises the loss of effectiveness in terms of query precision. We also compare a distributed version of our Fisreal search system [1][2] against the same system deployed on a single machine.

  • Físréal: a low-cost Terabyte search engine
    European Conference on Information Retrieval, 2005
    Co-Authors: Paul Ferguson, Peter Wilkins, Cathal Gurrin, Alan F. Smeaton
    Abstract:

    In this poster we describe the development of a distributed search engine, referred to as Fisreal, which utilises inexpensive workstations, yet attains fast retrieval performance for Terabyte-sized collections. We also discuss the process of leveraging additional meaning from the structure of HTML, as well as the use of anchor text documents to increase retrieval performance.

  • ECIR - Físréal: a low cost Terabyte search engine
    Lecture Notes in Computer Science, 2005
    Co-Authors: Paul Ferguson, Peter Wilkins, Cathal Gurrin, Alan F. Smeaton
    Abstract:

    In this poster we describe the development of a distributed search engine, referred to as Fisreal, which utilises inexpensive workstations, yet attains fast retrieval performance for Terabyte-sized collections. We also discuss the process of leveraging additional meaning from the structure of HTML, as well as the use of anchor text documents to increase retrieval performance.

Bruce W Croft - One of the best experts on this subject based on the ideXlab platform.

  • Indri at TREC 2005: Terabyte Track
    Text REtrieval Conference, 2005
    Co-Authors: Donald Metzler, Trevor Strohman, Yun Zhou, Bruce W Croft
    Abstract:

    This work details the experiments carried out using the Indri search engine during the TREC 2005 Terabyte Track. Results are presented for each of the three tasks: efficiency, ad hoc, and named page finding. Our efficiency runs focused on query optimization techniques, our ad hoc runs look at the importance of term proximity and document quality, and our named-page finding runs investigate the use of document priors and document structure.

  • Indri at TREC 2004: Terabyte Track
    Text REtrieval Conference, 2004
    Co-Authors: Donald Metzler, Trevor Strohman, Howard R. Turtle, Bruce W Croft
    Abstract:

    This paper provides an overview of experiments carried out at the TREC 2004 Terabyte Track using the Indri search engine. Indri is an efficient, effective distributed search engine. Like INQUERY, it is based on the inference network framework and supports structured queries, but unlike INQUERY, it uses language modeling probabilities within the network which allows for added flexibility. We describe our approaches to the Terabyte Track, all of which involved automatically constructing structured queries from the title portions of the TREC topics. Our methods use term proximity information and HTML document structure. In addition, a number of optimization procedures for efficient query processing are explained.
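
    The automatic construction of structured queries from topic titles with term-proximity evidence can be sketched with Indri's query language, whose #weight, #combine, #1 (ordered window) and #uw8 (unordered window) operators are used below. The exact template and the 0.85/0.10/0.05 weights are illustrative assumptions rather than the submitted TREC runs.

```python
# Hedged sketch of turning a TREC title into an Indri structured query that adds
# term-proximity evidence, in the spirit of the approach described above. The
# #combine/#1/#uw8 operators are standard Indri query-language operators; the
# weights and this exact template are illustrative assumptions.
def title_to_indri_query(title: str,
                         w_terms: float = 0.85,
                         w_ordered: float = 0.10,
                         w_unordered: float = 0.05) -> str:
    terms = title.lower().split()
    single = " ".join(terms)
    if len(terms) < 2:                      # no bigrams for one-word titles
        return f"#combine({single})"
    ordered = " ".join(f"#1({a} {b})" for a, b in zip(terms, terms[1:]))
    unordered = " ".join(f"#uw8({a} {b})" for a, b in zip(terms, terms[1:]))
    return (f"#weight( {w_terms} #combine({single}) "
            f"{w_ordered} #combine({ordered}) "
            f"{w_unordered} #combine({unordered}) )")

print(title_to_indri_query("terabyte track retrieval"))
```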

Donald Metzler - One of the best experts on this subject based on the ideXlab platform.

  • Indri at TREC 2006: Lessons Learned From Three Terabyte Tracks
    2006
    Co-Authors: Donald Metzler, Trevor Strohman, W. B. Croft
    Abstract:

    This report describes the lessons learned using the Indri search system during the 2004-2006 TREC Terabyte Tracks. We provide an overview of Indri and, for the ad hoc and named page finding tasks, discuss our general approach to the problem, what worked, what did not work, and what could possibly work in the future.

  • Indri at TREC 2005: Terabyte Track
    Text REtrieval Conference, 2005
    Co-Authors: Donald Metzler, Trevor Strohman, Yun Zhou, Bruce W Croft
    Abstract:

    This work details the experiments carried out using the Indri search engine during the TREC 2005 Terabyte Track. Results are presented for each of the three tasks: efficiency, ad hoc, and named page finding. Our efficiency runs focused on query optimization techniques, our ad hoc runs look at the importance of term proximity and document quality, and our named-page finding runs investigate the use of document priors and document structure.

  • TREC - Indri at TREC 2005: Terabyte Track.
    2005
    Co-Authors: Donald Metzler, Trevor Strohman, Yun Zhou, W. Bruce Croft
    Abstract:

    This work details the experiments carried out using the Indri search engine during the TREC 2005 Terabyte Track. Results are presented for each of the three tasks: efficiency, ad hoc, and named page finding. Our efficiency runs focused on query optimization techniques, our ad hoc runs look at the importance of term proximity and document quality, and our named-page finding runs investigate the use of document priors and document structure.

  • TREC - Indri at TREC 2004: Terabyte Track
    2004
    Co-Authors: Donald Metzler, Trevor Strohman, Howard R. Turtle, W. Bruce Croft
    Abstract:

    This paper provides an overview of experiments carried out at the TREC 2004 Terabyte Track using the Indri search engine. Indri is an efficient, effective distributed search engine. Like INQUERY, it is based on the inference network framework and supports structured queries, but unlike INQUERY, it uses language modeling probabilities within the network which allows for added flexibility. We describe our approaches to the Terabyte Track, all of which involved automatically constructing structured queries from the title portions of the TREC topics. Our methods use term proximity information and HTML document structure. In addition, a number of optimization procedures for efficient query processing are explained.

  • Indri at TREC 2004: Terabyte Track
    Text REtrieval Conference, 2004
    Co-Authors: Donald Metzler, Trevor Strohman, Howard R. Turtle, Bruce W Croft
    Abstract:

    This paper provides an overview of experiments carried out at the TREC 2004 Terabyte Track using the Indri search engine. Indri is an efficient, effective distributed search engine. Like INQUERY, it is based on the inference network framework and supports structured queries, but unlike INQUERY, it uses language modeling probabilities within the network which allows for added flexibility. We describe our approaches to the Terabyte Track, all of which involved automatically constructing structured queries from the title portions of the TREC topics. Our methods use term proximity information and HTML document structure. In addition, a number of optimization procedures for efficient query processing are explained.

Paul Ferguson - One of the best experts on this subject based on the ideXlab platform.

  • TREC - Dublin City University at the TREC 2006 Terabyte Track
    2006
    Co-Authors: Paul Ferguson, Alan F. Smeaton, Peter Wilkins
    Abstract:

    For the 2006 Terabyte track in TREC, Dublin City University's participation was focussed on the ad hoc search task. As per the previous two years [7, 4], our experiments on the Terabyte track have concentrated on the evaluation of a sorted inverted index, the aim of which is to sort the postings within each posting list in such a way that only a limited number of postings needs to be processed from each list, while at the same time minimising the loss of effectiveness.

  • TREC - Dublin City University at the TREC 2005 Terabyte Track
    2005
    Co-Authors: Paul Ferguson, Alan F. Smeaton, Cathal Gurrin, Peter Wilkins
    Abstract:

    For the 2005 Terabyte track in TREC, Dublin City University participated in all three tasks: Ad hoc, Efficiency and Named Page Finding. Our runs for TREC in all tasks were primarily focussed on the application of "Top Subset Retrieval" to the Terabyte Track. This retrieval utilises different types of sorted inverted indices so that fewer documents are processed in order to reduce query times, in a way that minimises the loss of effectiveness in terms of query precision. We also compare a distributed version of our Fisreal search system [1][2] against the same system deployed on a single machine.

  • Dublin City University at the TREC 2005 Terabyte Track
    Text REtrieval Conference, 2005
    Co-Authors: Paul Ferguson, Alan F. Smeaton, Cathal Gurrin, Peter Wilkins
    Abstract:

    For the 2005 Terabyte track in TREC, Dublin City University participated in all three tasks: Ad hoc, Efficiency and Named Page Finding. Our runs for TREC in all tasks were primarily focussed on the application of "Top Subset Retrieval" to the Terabyte Track. This retrieval utilises different types of sorted inverted indices so that fewer documents are processed in order to reduce query times, in a way that minimises the loss of effectiveness in terms of query precision. We also compare a distributed version of our Fisreal search system [1][2] against the same system deployed on a single machine.

  • Físréal: a low-cost Terabyte search engine
    European Conference on Information Retrieval, 2005
    Co-Authors: Paul Ferguson, Peter Wilkins, Cathal Gurrin, Alan F. Smeaton
    Abstract:

    In this poster we describe the development of a distributed search engine, referred to as Fisreal, which utilises inexpensive workstations, yet attains fast retrieval performance for Terabyte-sized collections. We also discuss the process of leveraging additional meaning from the structure of HTML, as well as the use of anchor text documents to increase retrieval performance.

  • ECIR - Físréal: a low cost Terabyte search engine
    Lecture Notes in Computer Science, 2005
    Co-Authors: Paul Ferguson, Peter Wilkins, Cathal Gurrin, Alan F. Smeaton
    Abstract:

    In this poster we describe the development of a distributed search engine, referred to as Fisreal, which utilises inexpensive workstations, yet attains fast retrieval performance for Terabyte-sized collections. We also discuss the process of leveraging additional meaning from the structure of HTML, as well as the use of anchor text documents to increase retrieval performance.