Multiple Correspondence Analysis

The Experts below are selected from a list of 16923 Experts worldwide ranked by ideXlab platform

Subasish Das - One of the best experts on this subject based on the ideXlab platform.

  • Factor Association with Multiple Correspondence Analysis in Vehicle–Pedestrian Crashes
    Transportation Research Record, 2015
    Co-Authors: Subasish Das, Xiaoduan Sun
    Abstract:

    In the United States, about 14% of total crash fatalities are pedestrian related. In 2012, 4,743 pedestrians were killed, and 76,000 pedestrians were injured in vehicle–pedestrian crashes in the United States. Vehicle–pedestrian crashes have become a key concern in Louisiana as a result of the high percentage of fatalities there in recent years. In 2012, pedestrians accounted for 17% of total crash fatalities in the state. This study used Multiple Correspondence Analysis (MCA), an exploratory data analysis method used to detect and represent underlying structures in a categorical data set, to analyze 8 years (2004 to 2011) of vehicle–pedestrian crashes in Louisiana. Pedestrian crash data are best represented as transactions of multiple categorical variables, so the use of MCA was a unique choice to determine the relationship of the variables and their significance. The findings indicated several nontrivial focus groups (e.g., drivers with high-occupancy vehicles, female drivers in bad weather conditions, ...

Xiaoduan Sun - One of the best experts on this subject based on the ideXlab platform.

  • Factor Association with Multiple Correspondence Analysis in Vehicle–Pedestrian Crashes
    Transportation Research Record, 2015
    Co-Authors: Subasish Das, Xiaoduan Sun
    Abstract:

    In the United States, about 14% of total crash fatalities are pedestrian related. In 2012, 4,743 pedestrians were killed, and 76,000 pedestrians were injured in vehicle–pedestrian crashes in the United States. Vehicle–pedestrian crashes have become a key concern in Louisiana as a result of the high percentage of fatalities there in recent years. In 2012, pedestrians accounted for 17% of total crash fatalities in the state. This study used Multiple Correspondence Analysis (MCA), an exploratory data analysis method used to detect and represent underlying structures in a categorical data set, to analyze 8 years (2004 to 2011) of vehicle–pedestrian crashes in Louisiana. Pedestrian crash data are best represented as transactions of multiple categorical variables, so the use of MCA was a unique choice to determine the relationship of the variables and their significance. The findings indicated several nontrivial focus groups (e.g., drivers with high-occupancy vehicles, female drivers in bad weather conditions, ...

Shu-ching Chen - One of the best experts on this subject based on the ideXlab platform.

  • MCA-NN: Multiple Correspondence Analysis Based Neural Network for Disaster Information Detection
    2017 IEEE Third International Conference on Multimedia Big Data (BigMM), 2017
    Co-Authors: Haiman Tian, Shu-ching Chen
    Abstract:

    This paper proposes a semantic content analysis framework for reliable video event detection. In this work, we aim to improve the concept detection results by feeding the learned results from individual shallow learning models into a generic model to uncover similarities in deeper layers. Compared to deep learning models, shallow learning models memorize rather than understand the features. The proposed framework tackles this issue in shallow learning by integrating the strengths of Multiple Correspondence Analysis (MCA) and a Multilayer Perceptron (MLP) neural network. The low-level features are taken as the initial inputs for MCA-based models to abstract higher-level feature values. The output values then interact further in the neural network for better understanding. It earns the ability to put forward the arguments. The framework provides final decisions of video classifications by analyzing the decisions of every single frame from the network outputs.

  • Semantic Concept Detection Using Weighted Discretization Multiple Correspondence Analysis for Disaster Information Management
    2016 IEEE 17th International Conference on Information Reuse and Integration (IRI), 2016
    Co-Authors: Samira Pouyanfar, Shu-ching Chen
    Abstract:

    Multimedia semantic concept detection has emerged as a research area in recent years. One of the prominent challenges in multimedia concept detection is data imbalance. In this study, a multimedia data mining framework for interesting concept detection in videos is presented. First, the Minimum Description Length (MDL) discretization algorithm is extended to handle imbalanced data. Thereafter, a novel Weighted Discretization Multiple Correspondence Analysis (WD-MCA) algorithm based on the Multiple Correspondence Analysis (MCA) approach is proposed to maximize the correlation between the feature-value pairs and concept classes by incorporating the discretization information captured from the MDL module. The proposed framework achieves promising performance on videos containing disaster events. The experimental results demonstrate the effectiveness of the WD-MCA algorithm, specifically for imbalanced datasets, compared to several existing methods.

  • Temporal Multiple Correspondence Analysis for Big Data Mining in Soccer Videos
    2015 IEEE International Conference on Multimedia Big Data, 2015
    Co-Authors: Yimin Yang, Shu-ching Chen, Mei-ling Shyu
    Abstract:

    A multimedia big data mining framework consisting of two phases for interesting event detection in soccer videos is proposed in this paper. In the pre-processing phase, it utilizes multi-modal, multi-filtering content analysis techniques for shot boundary detection and feature extraction. A pre-filtering process based on domain knowledge analysis is then applied to clean the noise and obtain a candidate set. In the event detection phase, a temporal Multiple Correspondence Analysis (TMCA) algorithm that adopts an indicator weighting scheme is proposed to efficiently and effectively incorporate the temporal semantic information for improving the detection results. Furthermore, another enhanced MCA (EN-MCA) approach is presented to better capture the correspondence between feature items and classes by thoroughly utilizing the pair-wise principal components. Finally, a re-ranking procedure is performed to retrieve missed interesting events. Our proposed semantic re-ranking framework is evaluated on a large collection of soccer videos for interesting event detection. The experimental results demonstrate the effectiveness of the proposed framework.

  • Correlation-based Video Semantic Concept Detection using Multiple Correspondence Analysis
    Tenth IEEE International Symposium on Multimedia, 2008
    Co-Authors: Lin Lin, G. Ravitz, Mei-ling Shyu, Shu-ching Chen
    Abstract:

    Semantic concept detection has emerged as an intriguing topic in multimedia research recently. The ability to interpret high-level semantics from low-level features has been the long desired goal of many researchers. In this paper, we propose a novel framework that utilizes the ability of Multiple Correspondence Analysis (MCA) to explore the correlation between different items (feature-value pairs) and classes (concepts) to bridge the gap between the extracted low-level features and high-level semantic concepts. Using the concepts and benchmark data identified and provided by the TRECVID project, we have shown that our proposed framework demonstrates promising results and performs better than the Decision Tree (DT), Support Vector Machine (SVM), and Naive Bayesian (NB) classifiers that are commonly applied to the TRECVID datasets.

Julie Josse - One of the best experts on this subject based on the ideXlab platform.

  • Multiple Correspondence Analysis and the multilogit bilinear model
    Journal of Multivariate Analysis, 2017
    Co-Authors: William Fithian, Julie Josse
    Abstract:

    Multiple Correspondence Analysis is a dimension reduction technique which plays a large role in the analysis of tables with categorical nominal variables, such as survey data. Though it is usually motivated and derived using geometric considerations, we prove that in fact, it can be seen as a single proximal Newton step of a natural bilinear exponential family model for categorical data: the multinomial logit bilinear model. We compare and contrast the behavior of Multiple Correspondence Analysis with that of this model on simulated data, and discuss new insights into both approaches and their cognate models. Consequently, Multiple Correspondence Analysis can be used to approximate the parameters of the multilogit model. Indeed, estimating the model’s parameters is non-trivial, whereas Multiple Correspondence Analysis has the advantage of being easily solved by a singular value decomposition, and scalable to large data sets. We illustrate the methods on a survey of the drinking habits in France in the context of European policies against the harmful effects of alcohol on society.
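
The remark that MCA is "easily solved by a singular value decomposition" can be illustrated with a minimal sketch. This is the textbook formulation of MCA as correspondence analysis of an indicator matrix, not code from the paper; the function name `mca_from_indicator` and the toy matrix `Z` are assumptions made for illustration, and NumPy is assumed to be installed.

```python
import numpy as np

def mca_from_indicator(Z):
    """Correspondence analysis of an indicator matrix Z (N individuals x J categories).

    Assumes every category is observed at least once, so no column mass is zero.
    """
    Z = np.asarray(Z, dtype=float)
    P = Z / Z.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)                # independence model r c^T
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    inertias = sing ** 2                     # principal inertias (eigenvalues)
    row_coords = (U * sing) / np.sqrt(r)[:, None]     # individuals
    col_coords = (Vt.T * sing) / np.sqrt(c)[:, None]  # categories
    return inertias, row_coords, col_coords

# Hypothetical toy data: 3 individuals, 2 variables with 2 categories each,
# already one-hot encoded (columns: var1=no, var1=yes, var2=old, var2=young).
Z = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 1, 0]]
inertias, row_coords, col_coords = mca_from_indicator(Z)
```

For an indicator matrix with J categories spread over Q variables, the principal inertias sum to J/Q - 1, which gives a quick sanity check on the decomposition.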

  • Multinomial Multiple Correspondence Analysis
    arXiv: Methodology, 2016
    Co-Authors: Patrick J. F. Groenen, Julie Josse
    Abstract:

    Relations between categorical variables can be analyzed conveniently by Multiple Correspondence Analysis (MCA). It is well suited to discover relations that may exist between categories of different variables. The graphical representation of MCA results in so-called biplots makes it easy to interpret the most important associations. However, a major drawback of MCA is that it does not have an underlying probability model for an individual selecting a category on a variable. In this paper, we propose such a probability model, called multinomial Multiple Correspondence Analysis (MMCA), that combines the underlying low-rank representation of MCA with maximum likelihood. An efficient majorization algorithm that uses an elegant bound for the second derivative is derived to estimate the parameters. The proposed model can easily lead to overfitting, causing some of the parameters to wander off to infinity. We add the nuclear norm penalty to counter this issue and discuss ways of selecting regularization parameters. The proposed approach is well suited to study and visualize the dependences for high-dimensional data.

  • Multiple Correspondence Analysis & the Multilogit Bilinear Model
    2016
    Co-Authors: William Fithian, Julie Josse
    Abstract:

    Multiple Correspondence Analysis (MCA) is a dimension reduction method which plays a large role in the analysis of tables with categorical nominal variables such as survey data. Though it is usually motivated and derived using geometric considerations, we prove that it in fact amounts to a single proximal Newton step of a natural bilinear exponential family model for categorical data: the multinomial logit bilinear model. We compare and contrast the behavior of MCA with that of the model on simulations and discuss new insights on the properties of both exploratory multivariate methods and their cognate models. One main conclusion is that MCA can be recommended for approximating the multilogit model parameters. Indeed, estimating the parameters of the model is not a trivial task, whereas MCA has the great advantage of being easily solved by singular value decomposition and scalable to large data.

  • MIMCA: Multiple imputation for categorical variables with Multiple Correspondence Analysis
    arXiv: Methodology, 2015
    Co-Authors: Vincent Audigier, François Husson, Julie Josse
    Abstract:

    We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal components method dedicated to categorical data: Multiple Correspondence Analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating a small number of parameters due to the dimensionality reduction property of MCA. It allows the user to impute a large range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest works on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method shows good performance in terms of bias and coverage for an analysis model such as a main effects logistic regression model. In addition, MIMCA has the great advantage that it is substantially less time consuming on data sets of high dimensions than the other multiple imputation methods.

François Husson - One of the best experts on this subject based on the ideXlab platform.

  • MIMCA: Multiple imputation for categorical variables with Multiple Correspondence Analysis
    arXiv: Methodology, 2015
    Co-Authors: Vincent Audigier, François Husson, Julie Josse
    Abstract:

    We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal components method dedicated to categorical data: Multiple Correspondence Analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating a small number of parameters due to the dimensionality reduction property of MCA. It allows the user to impute a large range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest works on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method shows good performance in terms of bias and coverage for an analysis model such as a main effects logistic regression model. In addition, MIMCA has the great advantage that it is substantially less time consuming on data sets of high dimensions than the other multiple imputation methods.

  • Multiple Correspondence Analysis
    2014
    Co-Authors: François Husson, Julie Josse
    Abstract:

    Multiple Correspondence Analysis (MCA) is a method from the French analyse des données (data analysis) tradition used to describe, explore, summarize, and visualize information contained within a data table of N individuals described by Q categorical variables. This method is often used to analyse questionnaire data. It can be seen as an analogue of principal component analysis (PCA) for categorical variables (rather than quantitative variables) or even as an extension of Correspondence Analysis (CA) to the case of more than two categorical variables. The main objectives of MCA can be defined as follows: (1) to provide a typology of the individuals, that is, to study the similarities between the individuals from a multidimensional perspective; (2) to assess the relationships between the variables and study the associations between the categories; and (3) to link together the study of individuals and that of variables in order to characterize the individuals using the variables.
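
The data table of N individuals described by Q categorical variables is usually recoded as an indicator (complete disjunctive) matrix before MCA is applied. The sketch below shows one way to build that matrix in plain Python; the helper name `indicator_matrix` and the toy questionnaire are hypothetical and serve only as an illustration.

```python
def indicator_matrix(rows):
    """One-hot encode a table of N individuals by Q categorical variables.

    Returns the list of (variable index, category) column labels and the
    N x J matrix of 0/1 entries, where J is the total number of categories.
    """
    n_vars = len(rows[0])
    # Sorted category levels of each variable, in column order.
    levels = [sorted({row[q] for row in rows}) for q in range(n_vars)]
    columns = [(q, level) for q in range(n_vars) for level in levels[q]]
    matrix = [[1 if row[q] == level else 0 for q, level in columns]
              for row in rows]
    return columns, matrix

# Hypothetical questionnaire with Q = 2 questions: (drinks_wine, age_group)
answers = [("yes", "young"), ("no", "old"), ("yes", "old")]
columns, Z = indicator_matrix(answers)
```

Each row of `Z` contains exactly Q ones, one per variable, which is the property that distinguishes an indicator matrix from an arbitrary contingency table.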

  • Handling Missing Values with Regularized Iterative Multiple Correspondence Analysis
    Journal of Classification, 2012
    Co-Authors: Julie Josse, Marie Chavent, Benoit Liquet, François Husson
    Abstract:

    A common approach to deal with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements, which can be achieved by EM-type algorithms where an iterative imputation of the missing values is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative Multiple Correspondence Analysis, to handle missing values in Multiple Correspondence Analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, performances of the regularized iterative MCA algorithm (implemented in the R-package named missMDA) are assessed from both simulations and a real dataset. Results are promising with respect to other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi’s Homogeneity Analysis framework.
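
The EM-type idea of alternating between a low-rank fit and re-imputation of the missing cells can be sketched as follows. This is a deliberately simplified, unregularized illustration using a plain truncated SVD on an indicator matrix; it omits the MCA row and column weighting and the regularization the paper proposes, and it is not the missMDA implementation. The function name `iterative_svd_impute`, its parameters, and the toy data are assumptions; NumPy is assumed to be installed.

```python
import numpy as np

def iterative_svd_impute(X, rank=1, n_iter=50):
    """Fill NaN cells by alternating a rank-`rank` SVD fit and re-imputation.

    Simplified sketch: unweighted and unregularized, so it can overfit.
    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the observed column means (category proportions).
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mean
        # Observed cells are always kept; only missing cells are updated.
        filled = np.where(missing, low_rank, X)
    return filled

# Hypothetical indicator matrix: the last individual did not answer variable 2.
nan = float("nan")
Z = np.array([[1, 0, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 1, nan, nan]])
Z_completed = iterative_svd_impute(Z, rank=1)
```

The imputed cells are fuzzy values rather than 0/1 codes, which is also how iterative imputation of an indicator matrix is usually described; turning them into categories requires a further step not shown here.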

  • Multiple Correspondence Analysis with missing values
    2011
    Co-Authors: Julie Josse, François Husson, Marie Chavent, Benoit Liquet
    Abstract:

    Multiple Correspondence Analysis with missing values