The Experts below are selected from a list of 16923 Experts worldwide ranked by ideXlab platform
Subasish Das - One of the best experts on this subject based on the ideXlab platform.
-
Factor Association with Multiple Correspondence Analysis in Vehicle–Pedestrian Crashes
Transportation Research Record, 2015. Co-Authors: Subasish Das, Xiaoduan Sun. Abstract: In the United States, about 14% of total crash fatalities are pedestrian related. In 2012, 4,743 pedestrians were killed, and 76,000 pedestrians were injured in vehicle–pedestrian crashes in the United States. Vehicle–pedestrian crashes have become a key concern in Louisiana as a result of the high percentage of fatalities there in recent years. In 2012, pedestrians accounted for 17% of total crash fatalities in the state. This study used Multiple Correspondence Analysis (MCA), an exploratory data analysis method used to detect and represent underlying structures in a categorical data set, to analyze 8 years (2004 to 2011) of vehicle–pedestrian crashes in Louisiana. Pedestrian crash data are best represented as transactions of multiple categorical variables, so the use of MCA was a unique choice to determine the relationship of the variables and their significance. The findings indicated several nontrivial focus groups (e.g., drivers with high-occupancy vehicles, female drivers in bad weather conditions, ...
Xiaoduan Sun - One of the best experts on this subject based on the ideXlab platform.
-
Factor Association with Multiple Correspondence Analysis in Vehicle–Pedestrian Crashes
Transportation Research Record, 2015. Co-Authors: Subasish Das, Xiaoduan Sun. Abstract: In the United States, about 14% of total crash fatalities are pedestrian related. In 2012, 4,743 pedestrians were killed, and 76,000 pedestrians were injured in vehicle–pedestrian crashes in the United States. Vehicle–pedestrian crashes have become a key concern in Louisiana as a result of the high percentage of fatalities there in recent years. In 2012, pedestrians accounted for 17% of total crash fatalities in the state. This study used Multiple Correspondence Analysis (MCA), an exploratory data analysis method used to detect and represent underlying structures in a categorical data set, to analyze 8 years (2004 to 2011) of vehicle–pedestrian crashes in Louisiana. Pedestrian crash data are best represented as transactions of multiple categorical variables, so the use of MCA was a unique choice to determine the relationship of the variables and their significance. The findings indicated several nontrivial focus groups (e.g., drivers with high-occupancy vehicles, female drivers in bad weather conditions, ...
Shu-ching Chen - One of the best experts on this subject based on the ideXlab platform.
-
MCA-NN: Multiple Correspondence Analysis Based Neural Network for Disaster Information Detection
2017 IEEE Third International Conference on Multimedia Big Data (BigMM), 2017. Co-Authors: Haiman Tian, Shu-ching Chen. Abstract: This paper proposes a semantic content analysis framework for reliable video event detection. In this work, we aim to improve the concept detection results by feeding the learnt results from individual shallow learning models into a generic model to dig out the similarities in deeper layers. Compared to the deep learning models, the shallow learning models are memorizing rather than understanding the features. The proposed framework tackles the issue in shallow learning by integrating the strength of Multiple Correspondence Analysis (MCA) and Multilayer Perceptron (MLP) neural network. The low-level features are taken as the initial inputs for MCA-based models to abstract higher-level feature values. The output values further involve interaction in the neural network for better understanding. It earns the ability to put forward the arguments. The framework provides final decisions of video classifications by analyzing the decisions of every single frame from the network outputs.
-
Semantic Concept Detection Using Weighted Discretization Multiple Correspondence Analysis for Disaster Information Management
2016 IEEE 17th International Conference on Information Reuse and Integration (IRI), 2016. Co-Authors: Samira Pouyanfar, Shu-ching Chen. Abstract: Multimedia semantic concept detection has been an emerging research area in recent years. One of the prominent challenges in multimedia concept detection is data imbalance. In this study, a multimedia data mining framework for interesting concept detection in videos is presented. First, the Minimum Description Length (MDL) discretization algorithm is extended to handle the imbalanced data. Thereafter, a novel Weighted Discretization Multiple Correspondence Analysis (WD-MCA) algorithm based on the Multiple Correspondence Analysis (MCA) approach is proposed to maximize the correlation between the feature value pairs and concept classes by incorporating the discretization information captured from the MDL module. The proposed framework achieves promising performance on videos containing disaster events. The experimental results demonstrate the effectiveness of the WD-MCA algorithm, specifically for imbalanced datasets, compared to several existing methods.
-
Temporal Multiple Correspondence Analysis for Big Data Mining in Soccer Videos
2015 IEEE International Conference on Multimedia Big Data, 2015. Co-Authors: Yimin Yang, Shu-ching Chen, Mei-ling Shyu. Abstract: A multimedia big data mining framework consisting of two phases for interesting event detection in soccer videos has been proposed in this paper. In the pre-processing phase, it utilizes the multi-modal multi-filtering content analysis techniques for shot boundary detection and feature extraction. A pre-filtering process based on domain knowledge analysis is then applied to clean the noise and obtain a candidate set. In the event detection phase, a temporal Multiple Correspondence Analysis (TMCA) algorithm that adopts an indicator weighting scheme is proposed to efficiently and effectively incorporate the temporal semantic information for improving the detection results. Furthermore, another enhanced MCA (EN-MCA) approach is presented to better capture the correspondence between feature items and classes by thoroughly utilizing the pair-wise principal components. Finally, a re-ranking procedure is performed to retrieve the missed interesting event. Our proposed semantic re-ranking framework is evaluated on a large collection of soccer videos for interesting event detection. The experimental results demonstrate the effectiveness of the proposed framework.
-
Correlation-based Video Semantic Concept Detection using Multiple Correspondence Analysis
Tenth IEEE International Symposium on Multimedia, 2008. Co-Authors: Lin Lin, G. Ravitz, Mei-ling Shyu, Shu-ching Chen. Abstract: Semantic concept detection has emerged as an intriguing topic in multimedia research recently. The ability to interpret high-level semantics from low-level features has been the long desired goal of many researchers. In this paper, we propose a novel framework that utilizes the ability of Multiple Correspondence Analysis (MCA) to explore the correlation between different items (feature-value pairs) and classes (concepts) to bridge the gap between the extracted low-level features and high-level semantic concepts. Using the concepts and benchmark data identified and provided by the TRECVID project, we have shown that our proposed framework demonstrates promising results and performs better than the Decision Tree (DT), Support Vector Machine (SVM), and Naive Bayesian (NB) classifiers that are commonly applied to the TRECVID datasets.
Julie Josse - One of the best experts on this subject based on the ideXlab platform.
-
Multiple Correspondence Analysis and the multilogit bilinear model
Journal of Multivariate Analysis, 2017. Co-Authors: William Fithian, Julie Josse. Abstract: Multiple Correspondence Analysis is a dimension reduction technique which plays a large role in the analysis of tables with categorical nominal variables, such as survey data. Though it is usually motivated and derived using geometric considerations, we prove that in fact, it can be seen as a single proximal Newton step of a natural bilinear exponential family model for categorical data: the multinomial logit bilinear model. We compare and contrast the behavior of Multiple Correspondence Analysis with that of this model on simulated data, and discuss new insights into both approaches and their cognate models. Consequently, Multiple Correspondence Analysis can be used to approximate the parameters of the multilogit model. Indeed, estimating the model’s parameters is non-trivial, whereas Multiple Correspondence Analysis has the advantage of being easily solved by a singular value decomposition, and scalable to large data sets. We illustrate the methods on a survey of the drinking habits in France in the context of European policies against the harmful effects of alcohol on society.
-
Multinomial Multiple Correspondence Analysis
arXiv: Methodology, 2016. Co-Authors: Patricia J T A Groenen, Julie Josse. Abstract: Relations between categorical variables can be analyzed conveniently by Multiple Correspondence Analysis (MCA). It is well suited to discover relations that may exist between categories of different variables. The graphical representation of MCA results in so-called biplots makes it easy to interpret the most important associations. However, a major drawback of MCA is that it does not have an underlying probability model for an individual selecting a category on a variable. In this paper, we propose such a probability model, called multinomial Multiple Correspondence Analysis (MMCA), that combines the underlying low-rank representation of MCA with maximum likelihood. An efficient majorization algorithm that uses an elegant bound for the second derivative is derived to estimate the parameters. The proposed model can easily lead to overfitting, causing some of the parameters to wander off to infinity. We add the nuclear norm penalty to counter this issue and discuss ways of selecting regularization parameters. The proposed approach is well suited to study and visualize the dependences for high dimensional data.
-
Multiple Correspondence Analysis & the Multilogit Bilinear Model
2016. Co-Authors: William Fithian, Julie Josse. Abstract: Multiple Correspondence Analysis (MCA) is a dimension reduction method which plays a large role in the analysis of tables with categorical nominal variables such as survey data. Though it is usually motivated and derived using geometric considerations, in fact we prove that it amounts to a single proximal Newton step of a natural bilinear exponential family model for categorical data: the multinomial logit bilinear model. We compare and contrast the behavior of MCA with that of the model on simulations and discuss new insights on the properties of both exploratory multivariate methods and their cognate models. One main conclusion is that we recommend approximating the multilogit model parameters using MCA. Indeed, estimating the parameters of the model is not a trivial task, whereas MCA has the great advantage of being easily solved by singular value decomposition and scalable to large data.
-
MIMCA: Multiple imputation for categorical variables with Multiple Correspondence Analysis
arXiv: Methodology, 2015. Co-Authors: Vincent Audigier, François Husson, Julie Josse. Abstract: We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal components method dedicated to categorical data: Multiple Correspondence Analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating a small number of parameters due to the dimensionality reduction property of MCA. It allows the user to impute a large range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest works on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method shows good performances in terms of bias and coverage for an analysis model such as a main effects logistic regression model. In addition, MIMCA has the great advantage that it is substantially less time consuming on data sets of high dimensions than the other multiple imputation methods.
François Husson - One of the best experts on this subject based on the ideXlab platform.
-
MIMCA: Multiple imputation for categorical variables with Multiple Correspondence Analysis
arXiv: Methodology, 2015. Co-Authors: Vincent Audigier, François Husson, Julie Josse. Abstract: We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal components method dedicated to categorical data: Multiple Correspondence Analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Multiple imputation using MCA (MIMCA) requires estimating a small number of parameters due to the dimensionality reduction property of MCA. It allows the user to impute a large range of data sets. In particular, a high number of categories per variable, a high number of variables or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest works on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method shows good performances in terms of bias and coverage for an analysis model such as a main effects logistic regression model. In addition, MIMCA has the great advantage that it is substantially less time consuming on data sets of high dimensions than the other multiple imputation methods.
-
Multiple Correspondence Analysis
2014. Co-Authors: François Husson, Julie Josse. Abstract: Multiple Correspondence Analysis (MCA) is a method of data analysis (analyse des données) used to describe, explore, summarize, and visualize information contained within a data table of N individuals described by Q categorical variables. This method is often used to analyse questionnaire data. It can be seen as an analogue of principal components analysis (PCA) for categorical variables (rather than quantitative variables) or even as an extension of Correspondence Analysis (CA) to the case of more than two categorical variables. The main objectives of MCA can be defined as follows: (1) to provide a typology of the individuals, that is, to study the similarities between the individuals from a multidimensional perspective; (2) to assess the relationships between the variables and study the associations between the categories; and (3) to link together the study of individuals and that of variables in order to characterize the individuals using the variables.
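As a rough illustration of the description above, MCA can be computed as a correspondence analysis of the indicator (one-hot) matrix built from the N x Q table, using a singular value decomposition. The sketch below assumes NumPy and a complete table with no missing values. The function name `mca`, its arguments, and the toy data are hypothetical; they are not taken from the listed papers or from any existing package.

```python
import numpy as np

def mca(data, n_components=2):
    """Sketch of MCA on an (N x Q) table of category labels.

    Returns row (individual) coordinates, column (category) coordinates,
    and the principal inertias (squared singular values).
    """
    data = np.asarray(data)
    n, q = data.shape

    # Build the indicator matrix Z: one block of 0/1 columns per variable.
    blocks = []
    for j in range(q):
        cats = np.unique(data[:, j])
        blocks.append((data[:, j][:, None] == cats[None, :]).astype(float))
    z = np.hstack(blocks)

    p = z / z.sum()          # correspondence matrix (z.sum() equals N * Q)
    r = p.sum(axis=1)        # row masses, each equal to 1 / N
    c = p.sum(axis=0)        # column masses, proportional to category counts

    # Standardized residuals from the independence model r c^T.
    expected = np.outer(r, c)
    s = (p - expected) / np.sqrt(expected)

    u, sv, vt = np.linalg.svd(s, full_matrices=False)
    k = n_components
    row_coords = (u[:, :k] * sv[:k]) / np.sqrt(r)[:, None]
    col_coords = (vt[:k].T * sv[:k]) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2

# Toy questionnaire: 6 individuals, 3 categorical variables (made-up data).
toy = [["a", "x", "u"], ["a", "y", "u"], ["b", "z", "v"],
       ["b", "x", "v"], ["a", "y", "v"], ["b", "z", "u"]]
rows, cols, inertia = mca(toy, n_components=2)
```

With J categories in total across the Q variables, the principal inertias of the indicator matrix sum to J/Q - 1, which gives a quick sanity check on the decomposition.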
-
Handling Missing Values with Regularized Iterative Multiple Correspondence Analysis
Journal of Classification, 2012. Co-Authors: Julie Josse, Marie Chavent, Benoit Liquet, François Husson. Abstract: A common approach to deal with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements, which can be achieved by EM-type algorithms where an iterative imputation of the missing values is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative Multiple Correspondence Analysis, to handle missing values in Multiple Correspondence Analysis (MCA). The algorithm, based on an iterative PCA algorithm, is described and its properties are studied. We point out the overfitting problem and propose a regularized version of the algorithm to overcome this major issue. Finally, performances of the regularized iterative MCA algorithm (implemented in the R-package named missMDA) are assessed from both simulations and a real dataset. Results are promising with respect to other methods such as the missing-data passive modified margin method, an adaptation of the missing passive method used in Gifi’s Homogeneity Analysis framework.
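The EM-type idea in this abstract (impute, fit a low-rank model, re-impute, repeat) can be sketched on a plain numeric matrix. This is only an illustration of generic iterative PCA imputation, not the missMDA implementation and not the paper's algorithm, which operates on a weighted indicator matrix. The function name `iterative_pca_impute`, the `shrink` parameter, and the use of soft-thresholding as a stand-in for the paper's regularization are all assumptions made for this example.

```python
import numpy as np

def iterative_pca_impute(x, rank=1, shrink=0.0, n_iter=200, tol=1e-8):
    """Fill NaN entries of x by alternating low-rank fits and re-imputation.

    shrink > 0 soft-thresholds the singular values, a simple (assumed)
    way to damp overfitting; it is not the rule used in the paper.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)

    # Initial guess: column means computed over the observed entries.
    filled = np.where(missing, np.nanmean(x, axis=0), x)

    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        u, s, vt = np.linalg.svd(filled - mu, full_matrices=False)
        s_k = np.maximum(s[:rank] - shrink, 0.0)
        fitted = (u[:, :rank] * s_k) @ vt[:rank] + mu

        # Observed cells are never changed; only missing cells are updated.
        new = np.where(missing, fitted, x)
        change = np.sum((new - filled) ** 2)
        filled = new
        if change < tol:
            break
    return filled
```

Setting `shrink=0.0` gives the unregularized iteration; a positive value pulls the imputed cells toward the column means, which is the general direction in which a regularized variant acts.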
-
Multiple Correspondence Analysis with missing values
2011. Co-Authors: Julie Josse, François Husson, Marie Chavent, Benoit Liquet.