Image Collections

The experts below are selected from a list of 35,592 experts worldwide, ranked by the ideXlab platform.

Jan Zahálka - One of the best experts on this subject based on the ideXlab platform.

  • II-20: Intelligent and Pragmatic Analytic Categorization of Image Collections
    IEEE Transactions on Visualization and Computer Graphics, 2021
    Co-Authors: Jan Zahálka, Marcel Worring, Jarke J Van Wijk
    Abstract:

    In this paper, we introduce II-20 (Image Insight 2020), a multimedia analytics approach for analytic categorization of image collections. Advanced visualizations for image collections exist, but they need tight integration with a machine model to support the task of analytic categorization. Directly employing computer vision and interactive learning techniques gravitates towards search. Analytic categorization, however, is not machine classification (the difference between the two is called the pragmatic gap): a human adds, redefines, and deletes categories of relevance on the fly to build insight, whereas the machine classifier is rigid and non-adaptive. Analytic categorization that truly brings the user to insight requires a flexible machine model that allows dynamic sliding on the exploration-search axis, as well as semantic interactions: a human thinks about image data mostly in semantic terms. II-20 brings three major contributions to multimedia analytics on image collections and towards closing the pragmatic gap. Firstly, a new machine model that closely follows the user's interactions and dynamically models her categories of relevance. II-20's machine model, in addition to matching and exceeding the state of the art's ability to produce relevant suggestions, allows the user to dynamically slide on the exploration-search axis without any additional input from her side. Secondly, the dynamic, one-image-at-a-time Tetris metaphor, which synergizes with the model: it allows a well-trained model to analyze the collection by itself with minimal interaction from the user and complements the classic grid metaphor. Thirdly, the fast-forward interaction, which lets the user harness the model to quickly expand ("fast-forward") the categories of relevance, extending the multimedia analytics semantic interaction dictionary. Automated experiments show that II-20's machine model outperforms the existing state of the art and also demonstrate the Tetris metaphor's analytic quality. User studies further confirm that II-20 is an intuitive, efficient, and effective multimedia analytics tool.
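The "dynamic sliding on the exploration-search axis" can be illustrated with a deliberately simplified sketch. This is not II-20's actual model (the abstract does not specify it); only the slider idea is taken from the text: a relevance model scores images, and a slider blends highest-score suggestions (search) with random sampling (exploration), so the user moves along the axis without retraining anything.

```python
import random

def suggest(scores, slider, k=3, rng=random):
    """Pick k image ids to show the user.

    scores: dict image_id -> model relevance score for the active category.
    slider: 0.0 = pure exploration (uniform sampling),
            1.0 = pure search (highest-scoring images first).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    pool = list(scores)
    picks = []
    for _ in range(k):
        if rng.random() < slider:
            # exploit: best-scoring image not yet shown
            choice = next(i for i in ranked if i not in picks)
        else:
            # explore: uniform over the remaining images
            choice = rng.choice([i for i in pool if i not in picks])
        picks.append(choice)
    return picks

# slider at 1.0 behaves like ranked retrieval:
suggest({"a": 3, "b": 2, "c": 1}, slider=1.0, k=2)  # → ['a', 'b']
```

At slider 0.0 the same call degenerates to random browsing; intermediate values interleave the two behaviours per suggestion.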

  • II-20: Intelligent and Pragmatic Analytic Categorization of Image Collections
    arXiv: Multimedia, 2020
    Co-Authors: Jan Zahálka, Marcel Worring, Jarke J Van Wijk
    Abstract:

    We introduce II-20 (Image Insight 2020), a multimedia analytics approach for analytic categorization of image collections. Advanced visualizations for image collections exist, but they need tight integration with a machine model to support analytic categorization. Directly employing computer vision and interactive learning techniques gravitates towards search. Analytic categorization, however, is not machine classification (the difference between the two is called the pragmatic gap): a human adds, redefines, and deletes categories of relevance on the fly to build insight, whereas the machine classifier is rigid and non-adaptive. Analytic categorization that brings the user to insight requires a flexible machine model that allows dynamic sliding on the exploration-search axis, as well as semantic interactions. II-20 brings three major contributions to multimedia analytics on image collections and towards closing the pragmatic gap. Firstly, a machine model that closely follows the user's interactions and dynamically models her categories of relevance. II-20's model, in addition to matching and exceeding the state of the art w.r.t. relevance, allows the user to dynamically slide on the exploration-search axis without additional input from her side. Secondly, the dynamic, one-image-at-a-time Tetris metaphor, which synergizes with the model: it allows the model to analyze the collection by itself with minimal interaction from the user and complements the classic grid metaphor. Thirdly, the fast-forward interaction, which lets the user harness the model to quickly expand ("fast-forward") the categories of relevance, extending the multimedia analytics semantic interaction dictionary. Automated experiments show that II-20's model outperforms the state of the art and also demonstrate Tetris's analytic quality. User studies confirm that II-20 is an intuitive, efficient, and effective multimedia analytics tool.

  • Multimedia Pivot Tables for Multimedia Analytics on Image Collections
    IEEE Transactions on Multimedia, 2016
    Co-Authors: Marcel Worring, Dennis Koelma, Jan Zahálka
    Abstract:

    We propose a multimedia analytics solution for gaining insight into image collections by extending the powerful analytic capabilities of pivot tables, found in ubiquitous spreadsheets, to multimedia. We formalize the concept of multimedia pivot tables and give design rules and methods for the multimodal summarization, structuring, and browsing of the collection based on these tables, all optimized to support an analyst in gaining structural and conclusive insights. Our proposed solution provides truly interactive analytics on the visual content of image collections through concept detection results, as well as tags, geolocation, time, and other metadata. We have performed user experiments with novice users on a dataset from Flickr to improve the initial design, and with expert users in marketing and multimedia analysis on two domain-specific datasets collected from Instagram. The results show that analysts are indeed capable of deriving structural and conclusive insights using the proposed multimedia analytics solution. Videos of the system in action are available on our website.
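The pivot-table idea applied to image metadata can be sketched in miniature (a toy cross-tabulation, not the paper's system; the `tag` and `year` fields are invented for the example): grouping a collection along two metadata dimensions yields the count cells an analyst would then drill into.

```python
from collections import Counter

def pivot(records, row_key, col_key):
    """Cross-tabulate image metadata: image counts per (row, col) bucket."""
    table = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({k[0] for k in table})
    cols = sorted({k[1] for k in table})
    # Counter returns 0 for absent cells, so empty buckets show up as 0
    return {r: {c: table[(r, c)] for c in cols} for r in rows}

photos = [
    {"tag": "beach", "year": 2015},
    {"tag": "beach", "year": 2016},
    {"tag": "city",  "year": 2015},
]
pivot(photos, "tag", "year")
# → {'beach': {2015: 1, 2016: 1}, 'city': {2015: 1, 2016: 0}}
```

A real multimedia pivot table would summarize each cell visually (e.g. with representative images) rather than with a count, but the grouping structure is the same.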

Marcel Worring - One of the best experts on this subject based on the ideXlab platform.

  • II-20: Intelligent and Pragmatic Analytic Categorization of Image Collections
    IEEE Transactions on Visualization and Computer Graphics, 2021
    Co-Authors: Jan Zahálka, Marcel Worring, Jarke J Van Wijk
    Abstract:

    In this paper, we introduce II-20 (Image Insight 2020), a multimedia analytics approach for analytic categorization of image collections. Advanced visualizations for image collections exist, but they need tight integration with a machine model to support the task of analytic categorization. Directly employing computer vision and interactive learning techniques gravitates towards search. Analytic categorization, however, is not machine classification (the difference between the two is called the pragmatic gap): a human adds, redefines, and deletes categories of relevance on the fly to build insight, whereas the machine classifier is rigid and non-adaptive. Analytic categorization that truly brings the user to insight requires a flexible machine model that allows dynamic sliding on the exploration-search axis, as well as semantic interactions: a human thinks about image data mostly in semantic terms. II-20 brings three major contributions to multimedia analytics on image collections and towards closing the pragmatic gap. Firstly, a new machine model that closely follows the user's interactions and dynamically models her categories of relevance. II-20's machine model, in addition to matching and exceeding the state of the art's ability to produce relevant suggestions, allows the user to dynamically slide on the exploration-search axis without any additional input from her side. Secondly, the dynamic, one-image-at-a-time Tetris metaphor, which synergizes with the model: it allows a well-trained model to analyze the collection by itself with minimal interaction from the user and complements the classic grid metaphor. Thirdly, the fast-forward interaction, which lets the user harness the model to quickly expand ("fast-forward") the categories of relevance, extending the multimedia analytics semantic interaction dictionary. Automated experiments show that II-20's machine model outperforms the existing state of the art and also demonstrate the Tetris metaphor's analytic quality. User studies further confirm that II-20 is an intuitive, efficient, and effective multimedia analytics tool.

  • II-20: Intelligent and Pragmatic Analytic Categorization of Image Collections
    arXiv: Multimedia, 2020
    Co-Authors: Jan Zahálka, Marcel Worring, Jarke J Van Wijk
    Abstract:

    We introduce II-20 (Image Insight 2020), a multimedia analytics approach for analytic categorization of image collections. Advanced visualizations for image collections exist, but they need tight integration with a machine model to support analytic categorization. Directly employing computer vision and interactive learning techniques gravitates towards search. Analytic categorization, however, is not machine classification (the difference between the two is called the pragmatic gap): a human adds, redefines, and deletes categories of relevance on the fly to build insight, whereas the machine classifier is rigid and non-adaptive. Analytic categorization that brings the user to insight requires a flexible machine model that allows dynamic sliding on the exploration-search axis, as well as semantic interactions. II-20 brings three major contributions to multimedia analytics on image collections and towards closing the pragmatic gap. Firstly, a machine model that closely follows the user's interactions and dynamically models her categories of relevance. II-20's model, in addition to matching and exceeding the state of the art w.r.t. relevance, allows the user to dynamically slide on the exploration-search axis without additional input from her side. Secondly, the dynamic, one-image-at-a-time Tetris metaphor, which synergizes with the model: it allows the model to analyze the collection by itself with minimal interaction from the user and complements the classic grid metaphor. Thirdly, the fast-forward interaction, which lets the user harness the model to quickly expand ("fast-forward") the categories of relevance, extending the multimedia analytics semantic interaction dictionary. Automated experiments show that II-20's model outperforms the state of the art and also demonstrate Tetris's analytic quality. User studies confirm that II-20 is an intuitive, efficient, and effective multimedia analytics tool.

  • Multimedia Pivot Tables for Multimedia Analytics on Image Collections
    IEEE Transactions on Multimedia, 2016
    Co-Authors: Marcel Worring, Dennis Koelma, Jan Zahálka
    Abstract:

    We propose a multimedia analytics solution for gaining insight into image collections by extending the powerful analytic capabilities of pivot tables, found in ubiquitous spreadsheets, to multimedia. We formalize the concept of multimedia pivot tables and give design rules and methods for the multimodal summarization, structuring, and browsing of the collection based on these tables, all optimized to support an analyst in gaining structural and conclusive insights. Our proposed solution provides truly interactive analytics on the visual content of image collections through concept detection results, as well as tags, geolocation, time, and other metadata. We have performed user experiments with novice users on a dataset from Flickr to improve the initial design, and with expert users in marketing and multimedia analysis on two domain-specific datasets collected from Instagram. The results show that analysts are indeed capable of deriving structural and conclusive insights using the proposed multimedia analytics solution. Videos of the system in action are available on our website.

  • A Multimedia Analytics Framework for Browsing Image Collections in Digital Forensics
    2013
    Co-Authors: Marcel Worring, Andreas Engl, Camelia Smeria
    Abstract:

    Searching through large collections of images to find patterns of use or sets of relevant items is difficult, especially when the information to consider is not only the content of the images themselves, but also the associated metadata. Multimedia analytics is a new approach to such problems. We consider the case of forensic experts facing image collections of growing size during digital forensic investigations. We answer the forensic challenge by developing specialised novel interactive visualisations that employ content-based image clusters in both the analysis and all visualisations. Their synergy makes the task of manually browsing these collections more effective and efficient. Evaluation of such multimedia analytics is a notoriously hard problem, as many factors influence the result. As a controlled evaluation, we developed a user simulation framework to create image collections with time and directory information as metadata. We apply it in a number of scenarios to illustrate its use. The simulation tool is available to other researchers via our website.

  • Interactive access to large Image Collections using similarity-based visualization
    Journal of Visual Languages & Computing, 2008
    Co-Authors: G. P. Nguyen, Marcel Worring
    Abstract:

    Image collections are getting larger and larger. To access those collections, systems for managing, searching, and browsing are necessary. Visualization plays an essential role in such systems. Existing visualization systems do not address all the problems that occur when dealing with large visual collections. In this paper, we make these problems explicit. From there, we establish three general requirements: overview, visibility, and structure preservation. Solutions for each requirement are proposed, as well as functions balancing the different requirements. We present an optimal visualization scheme, supporting users in interacting with large image collections. Experimental results with a collection of 10,000 Corel images, using simulated user actions, show that the proposed scheme significantly improves performance for a given task compared to the 2D grid-based visualizations commonly used in content-based image retrieval.
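The visibility requirement (every image must remain discernible on screen) is commonly met by de-overlapping a similarity-based 2D layout while disturbing positions as little as possible, which also serves structure preservation. A minimal sketch of that idea, assuming images have already been projected to 2D; the greedy spiral search here is illustrative, not the scheme from the paper:

```python
def snap_to_grid(points, cell=1.0):
    """Greedy de-overlap: snap each 2D point to the nearest free grid cell,
    keeping coarse neighbourhood structure while guaranteeing that every
    image occupies its own cell (and thus stays visible)."""
    taken, placed = set(), []
    for x, y in points:
        cx, cy = round(x / cell), round(y / cell)
        r = 0
        while True:
            # candidate cells on the square ring at Chebyshev radius r
            ring = [(cx + dx, cy + dy)
                    for dx in range(-r, r + 1) for dy in range(-r, r + 1)
                    if max(abs(dx), abs(dy)) == r]
            free = [c for c in ring if c not in taken]
            if free:
                # among free cells, pick the one closest to the true position
                best = min(free, key=lambda c: (c[0] * cell - x) ** 2
                                             + (c[1] * cell - y) ** 2)
                taken.add(best)
                placed.append(best)
                break
            r += 1  # ring exhausted: widen the search
    return placed

# two near-duplicates get adjacent cells instead of overlapping:
snap_to_grid([(0, 0), (0.1, 0), (5, 5)])  # → [(0, 0), (1, 0), (5, 5)]
```

Processing points in order of layout importance (rather than arbitrary order) is a common refinement, since earlier points get their preferred cells.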

Jan-Michael Frahm - One of the best experts on this subject based on the ideXlab platform.

  • Geo-Registered 3D Models from Crowdsourced Image Collections
    Geo-spatial Information Science, 2013
    Co-Authors: Jan-Michael Frahm, Jared Heinly, Enliang Zheng, Enrique Dunn, Pierre Fite-Georgel, Marc Pollefeys
    Abstract:

    In this article we present our system for scalable, robust, and fast city-scale reconstruction from Internet photo collections (IPC), obtaining geo-registered dense 3D models. The major achievement of our system is the efficient use of coarse appearance descriptors combined with strong geometric constraints to reduce the computational complexity of the image overlap search. This unique combination of recognition and geometric constraints allows our method to reduce from quadratic complexity in the number of images to almost linear complexity in the IPC size. Accordingly, our 3D-modeling framework is inherently more scalable than other state-of-the-art methods and in fact is currently the only method to support modeling from millions of images. In addition, we propose a novel mechanism to overcome the inherent scale ambiguity of the reconstructed models by exploiting geo-tags of the Internet photo collection images and readily available StreetView panoramas for fully automatic geo-registration of the 3D...
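The quadratic-to-near-linear reduction in the overlap search can be sketched as follows: bucket images by a coarse code derived from a global appearance descriptor, then run the expensive pairwise geometric verification only within buckets. This is a toy stand-in for the paper's actual pipeline (the one-bit-per-dimension sketch is an illustrative simplification):

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(descriptors, threshold=0.0):
    """Avoid testing all O(n^2) image pairs: hash each image's coarse global
    descriptor to a binary code and only propose pairs within a bucket.

    descriptors: dict image_id -> list of floats (global appearance vector).
    """
    buckets = defaultdict(list)
    for img, vec in descriptors.items():
        code = tuple(v > threshold for v in vec)  # 1 bit per dimension
        buckets[code].append(img)
    pairs = set()
    for members in buckets.values():
        # only these pairs go on to expensive geometric verification
        pairs.update(combinations(sorted(members), 2))
    return pairs

candidate_pairs({"a": [1, -1], "b": [2, -3], "c": [-1, 1]})
# → {('a', 'b')}   (c lands in a different bucket, so no pair with it)
```

With roughly even buckets, the number of proposed pairs grows almost linearly in the collection size instead of quadratically, which is the effect the abstract describes.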

  • Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs
    International Journal of Computer Vision, 2011
    Co-Authors: Rahul Raguram, Changchang Wu, Jan-Michael Frahm, Svetlana Lazebnik
    Abstract:

    This article presents an approach for modeling landmarks based on large-scale, heavily contaminated image collections gathered from the Internet. Our system efficiently combines 2D appearance and 3D geometric constraints to extract scene summaries and construct 3D models. In the first stage of processing, images are clustered based on low-dimensional global appearance descriptors, and the clusters are refined using 3D geometric constraints. Each valid cluster is represented by a single iconic view, and the geometric relationships between iconic views are captured by an iconic scene graph. Using structure-from-motion techniques, the system then registers the iconic images to efficiently produce 3D models of the different aspects of the landmark. To improve coverage of the scene, these 3D models are subsequently extended using additional, non-iconic views. We also demonstrate the use of iconic images for recognition and browsing. Our experimental results demonstrate the ability to process datasets containing up to 46,000 images in less than 20 hours, using a single commodity PC equipped with a graphics card. This is a significant advance towards Internet-scale operation.
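Picking an iconic view per appearance cluster can be approximated by a medoid: the cluster member closest, in total, to all other members. A sketch under the assumption that a pairwise appearance distance is available; the clustering itself and the paper's real iconic-view criterion are outside this snippet:

```python
def iconic_views(clusters, dist):
    """For each cluster, pick the 'iconic' member: the medoid, i.e. the
    member with minimum total distance to the rest of its cluster.

    clusters: dict cluster_label -> list of members (any hashable items).
    dist:     symmetric distance function over members.
    """
    iconic = {}
    for label, members in clusters.items():
        iconic[label] = min(
            members,
            key=lambda m: sum(dist(m, other) for other in members),
        )
    return iconic

# toy 1-D 'appearance' values: 2.0 is central, so it becomes iconic
iconic_views({"statue": [1.0, 2.0, 9.0]}, lambda a, b: abs(a - b))
# → {'statue': 2.0}
```

The iconic scene graph would then be built over these representatives only, which is why the summary stays compact even for very large collections.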

  • Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs
    European Conference on Computer Vision, 2008
    Co-Authors: Christopher Zach, Svetlana Lazebnik, Jan-Michael Frahm
    Abstract:

    This paper presents an approach for modeling landmark sites such as the Statue of Liberty based on large-scale contaminated image collections gathered from the Internet. Our system combines 2D appearance and 3D geometric constraints to efficiently extract scene summaries, build 3D models, and recognize instances of the landmark in new test images. We start by clustering images using low-dimensional global "gist" descriptors. Next, we perform geometric verification to retain only the clusters whose images share a common 3D structure. Each valid cluster is then represented by a single iconic view, and geometric relationships between iconic views are captured by an iconic scene graph. In addition to serving as a compact scene summary, this graph is used to guide structure from motion to efficiently produce 3D models of the different aspects of the landmark. The set of iconic images is also used for recognition, i.e., determining whether new test images contain the landmark. Results on three datasets consisting of tens of thousands of images demonstrate the potential of the proposed approach.

Pietro Perona - One of the best experts on this subject based on the ideXlab platform.

  • Unsupervised Learning of Categorical Segments in Image Collections
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012
    Co-Authors: M. Andreetto, Lihi Zelnik-Manor, Pietro Perona
    Abstract:

    Which one comes first: segmentation or recognition? We propose a unified framework for carrying out the two simultaneously and without supervision. The framework combines a flexible probabilistic model, for representing the shape and appearance of each segment, with the popular “bag of visual words” model for recognition. If applied to a collection of images, our framework can simultaneously discover the segments of each image and the correspondence between such segments, without supervision. Such recurring segments may be thought of as the “parts” of corresponding objects that appear multiple times in the image collection. Thus, the model may be used for learning new categories, detecting/classifying objects, and segmenting images, without using expensive human annotation.

  • Unsupervised Organization of Image Collections: Taxonomies and Beyond
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011
    Co-Authors: E Bart, Max Welling, Pietro Perona
    Abstract:

    We introduce a nonparametric Bayesian model, called TAX, which can organize image collections into a tree-shaped taxonomy without supervision. The model is inspired by the Nested Chinese Restaurant Process (NCRP) and associates each image with a path through the taxonomy. Similar images share initial segments of their paths and thus share some aspects of their representation. Each internal node in the taxonomy represents information that is common to multiple images. We explore the properties of the taxonomy through experiments on a large (~10^4 images) collection with a number of users trying to quickly locate a given image. We find that the main benefits are easier navigation through image collections and reduced description length. A natural question is whether a taxonomy is the optimal form of organization for natural images. Our experiments indicate that although taxonomies can organize images in a useful manner, more elaborate structures may be even better suited for this task.

  • Indexing in Large-Scale Image Collections: Scaling Properties and Benchmark
    Workshop on Applications of Computer Vision, 2011
    Co-Authors: Mohamed Aly, Mario E Munich, Pietro Perona
    Abstract:

    Indexing quickly and accurately in a large collection of images has become an important problem with many applications. Given a query image, the goal is to retrieve matching images in the collection. We compare the structure and properties of seven different methods based on the two leading approaches: voting from matching of local descriptors vs. matching histograms of visual words, including some new methods. We derive theoretical estimates of how the memory and computational cost scale with the number of images in the database. We evaluate these properties empirically on four real-world datasets with different statistics. We discuss the pros and cons of the different methods and suggest promising directions for future research.
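Both families of methods rest on an inverted index over quantized local descriptors ("visual words"): each word maps to the images containing it, and matches accumulate votes. A minimal sketch with toy image and word ids, not one of the seven benchmarked methods:

```python
from collections import defaultdict, Counter

class InvertedIndex:
    """Toy visual-word inverted index: an image is a bag of quantized local
    descriptors; query images retrieve database images by vote count."""

    def __init__(self):
        self.postings = defaultdict(list)  # word id -> [image ids]

    def add(self, image_id, words):
        for w in set(words):               # one posting per distinct word
            self.postings[w].append(image_id)

    def query(self, words, top=1):
        votes = Counter()
        for w in set(words):
            for img in self.postings[w]:
                votes[img] += 1            # one vote per shared word
        return [img for img, _ in votes.most_common(top)]

idx = InvertedIndex()
idx.add("a", [1, 2, 3])
idx.add("b", [3, 4])
idx.query([1, 2])  # → ['a']
```

The cost of a query scales with the lengths of the posting lists touched, not with the database size directly, which is what makes the scaling analysis in the paper interesting.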

  • Searching Large-Scale Image Collections
    2011
    Co-Authors: Pietro Perona, Mohamed Aly
    Abstract:

    Searching quickly and accurately in a large collection of images has become an increasingly important problem. The ultimate goal is to make visual search possible: allow users to search using images in addition to typing text. The typical approach is to index all the images of interest (e.g., images of landmarks, books, or DVDs) in a database and let users query the system with query images. Such a database can reach billions of images, which poses challenges in terms of memory and computational requirements and recognition performance. In this work we provide an in-depth study of systems used for searching large-scale image collections. Specifically, we provide a thorough comparison of the two leading image search approaches: Full Representation (FR) vs. Bag of Words (BoW). We derive theoretical estimates of how the memory and computational cost scale with the number of images in the database, and empirically evaluate the performance and run time on four real-world datasets. Our experiments suggest that FR provides better recognition performance than BoW, though it requires more memory. Therefore, we address these shortcomings by presenting novel methods that increase the recognition performance of BoW and decrease the memory requirements of FR. Finally, we present a novel way to parallelize FR on multiple machines and scale up database sizes to 100 million images with interactive run time.
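The FR/BoW memory trade-off can be made concrete with a back-of-the-envelope cost model (all numbers here are illustrative defaults, not the paper's measurements): FR stores every local descriptor in full, while BoW keeps only a quantized word id per feature.

```python
def memory_per_image(approach, n_features=1000, dim=128,
                     bytes_per_float=4, vocab_bits=20):
    """Rough per-image memory cost in bytes for the two approaches.

    FR:  n_features descriptors of `dim` floats each, stored in full.
    BoW: one `vocab_bits`-bit visual-word id per feature.
    """
    if approach == "FR":
        return n_features * dim * bytes_per_float
    if approach == "BoW":
        return n_features * vocab_bits // 8
    raise ValueError(approach)

memory_per_image("FR")   # → 512000 bytes (~500 KB per image)
memory_per_image("BoW")  # → 2500 bytes  (~200x smaller)
```

The two-orders-of-magnitude gap is why FR's better recognition performance comes at a real cost, and why the paper's methods for shrinking FR's footprint matter at the 100-million-image scale.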

  • Distributed Kd-Trees for Retrieval from Very Large Image Collections
    2011
    Co-Authors: Mohamed Aly, Mario E Munich, Pietro Perona
    Abstract:

    Distributed Kd-Trees is a method for building image retrieval systems that can handle hundreds of millions of images. It is based on dividing the Kd-Tree into a “root subtree” that resides on a root machine and several “leaf subtrees”, each residing on a leaf machine. The root machine handles incoming queries and farms out feature matching to an appropriate small subset of the leaf machines. Our implementation employs the MapReduce architecture to efficiently build and distribute the Kd-Tree for millions of images. It can run on thousands of machines and provides orders of magnitude more throughput than the state of the art, with better recognition performance. We show experiments with up to 100 million images running on 2048 machines, with a run time of a fraction of a second per query image.
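The root/leaf division can be sketched in miniature: a "root" that holds only split values and routes each query to the "leaf machine" owning that range. This 1-D toy routes to exactly one leaf (the real system farms queries out to a small subset of leaves and uses full Kd-Trees on each), so it can miss the true nearest neighbour near bucket boundaries; it only illustrates the division of labour.

```python
import bisect

class DistributedIndex:
    """Toy root/leaf split over a 1-D feature: the root keeps split values,
    each leaf holds one contiguous range and answers by local search."""

    def __init__(self, images, n_leaves=4):
        # images: list of (image_id, feature_value) pairs
        srt = sorted(images, key=lambda p: p[1])
        size = -(-len(srt) // n_leaves)  # ceil division: items per leaf
        self.leaves = [srt[i:i + size] for i in range(0, len(srt), size)]
        # the 'root subtree': just the boundary value of each later leaf
        self.splits = [leaf[0][1] for leaf in self.leaves[1:]]

    def query(self, value):
        # root: route to the single leaf owning this range
        leaf = self.leaves[bisect.bisect_right(self.splits, value)]
        # leaf: local nearest-neighbour search
        return min(leaf, key=lambda p: abs(p[1] - value))[0]

idx = DistributedIndex(
    [("a", 0.0), ("b", 1.0), ("c", 2.0), ("d", 3.0)], n_leaves=2)
idx.query(0.9)  # → 'b'
```

The root does O(log leaves) work per query regardless of collection size; all heavy matching happens on the leaves, which is what makes the throughput scale with the number of machines.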

Svetlana Lazebnik - One of the best experts on this subject based on the ideXlab platform.

  • Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs
    International Journal of Computer Vision, 2011
    Co-Authors: Rahul Raguram, Changchang Wu, Jan-Michael Frahm, Svetlana Lazebnik
    Abstract:

    This article presents an approach for modeling landmarks based on large-scale, heavily contaminated image collections gathered from the Internet. Our system efficiently combines 2D appearance and 3D geometric constraints to extract scene summaries and construct 3D models. In the first stage of processing, images are clustered based on low-dimensional global appearance descriptors, and the clusters are refined using 3D geometric constraints. Each valid cluster is represented by a single iconic view, and the geometric relationships between iconic views are captured by an iconic scene graph. Using structure-from-motion techniques, the system then registers the iconic images to efficiently produce 3D models of the different aspects of the landmark. To improve coverage of the scene, these 3D models are subsequently extended using additional, non-iconic views. We also demonstrate the use of iconic images for recognition and browsing. Our experimental results demonstrate the ability to process datasets containing up to 46,000 images in less than 20 hours, using a single commodity PC equipped with a graphics card. This is a significant advance towards Internet-scale operation.

  • Modeling and Recognition of Landmark Image Collections Using Iconic Scene Graphs
    European Conference on Computer Vision, 2008
    Co-Authors: Christopher Zach, Svetlana Lazebnik, Jan-Michael Frahm
    Abstract:

    This paper presents an approach for modeling landmark sites such as the Statue of Liberty based on large-scale contaminated image collections gathered from the Internet. Our system combines 2D appearance and 3D geometric constraints to efficiently extract scene summaries, build 3D models, and recognize instances of the landmark in new test images. We start by clustering images using low-dimensional global "gist" descriptors. Next, we perform geometric verification to retain only the clusters whose images share a common 3D structure. Each valid cluster is then represented by a single iconic view, and geometric relationships between iconic views are captured by an iconic scene graph. In addition to serving as a compact scene summary, this graph is used to guide structure from motion to efficiently produce 3D models of the different aspects of the landmark. The set of iconic images is also used for recognition, i.e., determining whether new test images contain the landmark. Results on three datasets consisting of tens of thousands of images demonstrate the potential of the proposed approach.