Small Data Set

14,000,000 Leading Edge Experts on the ideXlab platform

The experts below are selected from a list of 240,672 experts worldwide ranked by the ideXlab platform

Tung-i Tsai - One of the best experts on this subject based on the ideXlab platform.

  • Considering Relationship of Proteins for Radiotherapy Prognosis of Bladder Cancer Cells in Small Data Set.
    Methods of Information in Medicine, 2018
    Co-Authors: Tung-i Tsai, Yaofeng Zhang, Zhigang Zhang, Gy-yi Chao, Cheng-chieh Tsai
    Abstract:

    Background: Radiotherapy has serious side effects and thus requires prudent and cautious evaluation. However, obtaining protein expression profiles is expensive and time-consuming, making it necessary to develop a theoretical and rational procedure for predicting the radiotherapy outcome for bladder cancer when working with limited data. Objective: A procedure for estimating the performance of radiotherapy is proposed in this research. The population domain (range of the population) of proteins and the relationships among proteins are considered to increase prediction accuracy. Methods: This research uses modified extreme value theory (MEVT) to estimate the population domain of proteins, and correlation coefficients and prediction intervals to overcome the lack of knowledge regarding relationships among proteins. Results: When the size of the training data set was 5 samples, the mean absolute percentage error (MAPE) was 31.6200%; MAPE fell to 13.5505% when the number of samples was increased to 30. The standard deviation (SD) of forecasting error fell from 3.0609% for 5 samples to 1.2415% for 30 samples. These results show that the proposed procedure yields accurate and stable results, and is suitable for use with small data sets. Conclusions: The results show that considering the relationships among proteins is necessary when predicting the outcome of radiotherapy.

  • Utilize bootstrap in small data set learning for pilot run modeling of manufacturing systems
    Expert Systems with Applications, 2008
    Co-Authors: Tung-i Tsai
    Abstract:

    If the production process, production equipment, or material changes, it becomes necessary to execute pilot runs before mass production in manufacturing systems. Using the limited data obtained from pilot runs to shorten the lead time to predict future production is thus worthy of study. Although artificial neural networks are widely utilized to extract management knowledge from acquired data, they assume that sufficient training data is available. Unfortunately, this is often not achievable for pilot runs because few data are obtained during trial stages, which theoretically means that the knowledge obtained is fragile. The purpose of this research is to utilize bootstrap to generate virtual samples to fill the information gaps of sparse data. The results of this research indicate that the prediction error rate can be significantly decreased by applying the proposed method to a very small data set.

  • Using mega-trend-diffusion and artificial samples in small data set learning for early flexible manufacturing system scheduling knowledge
    Computers & Operations Research, 2007
    Co-Authors: Tung-i Tsai, Yao-san Lin
    Abstract:

    Neural networks are widely utilized to extract management knowledge from acquired data, but having enough real data is not always possible. In the early stages of dynamic flexible manufacturing system (FMS) environments, only a little data is obtained, and this means that the scheduling knowledge is often unreliable. The purpose of this research is to utilize data expansion techniques on the small data set obtained to improve the accuracy of machine learning for FMS scheduling. This research proposes a mega-trend-diffusion technique to estimate the domain range of a small data set and produce artificial samples for training the modified backpropagation neural network (BPNN). The tool used is the Pythia software. The results of the FMS simulation model indicate that learning accuracy can be significantly improved when the proposed method is applied to a very small data set.

  • Using mega-fuzzification and data trend estimation in small data set learning for early FMS scheduling knowledge
    Computers & Operations Research, 2006
    Co-Authors: Tung-i Tsai, Fengming M. Chang
    Abstract:

    Provided with plenty of data (experience), data mining techniques are widely used to extract suitable management skills from the data. Nevertheless, in the early stages of a manufacturing system, only scarce data can be obtained, and the scheduling knowledge built from it is usually fragile. The purpose of this research is to improve the accuracy of machine learning for flexible manufacturing system (FMS) scheduling using small data sets. The study develops a data trend estimation technique and combines it with mega-fuzzification and adaptive-network-based fuzzy inference systems (ANFIS). The results of the simulated FMS scheduling problem indicate that learning accuracy can be significantly improved by applying the proposed method to a very small data set.
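
The 2008 pilot-run abstract above describes using bootstrap to generate virtual samples from sparse data. The abstract does not give the procedure, so the following is only a minimal sketch of plain bootstrap resampling (drawing rows with replacement), not the authors' method. The function name `bootstrap_virtual_samples` and the `pilot_runs` records are hypothetical and exist only for illustration.

```python
import random
from statistics import mean


def bootstrap_virtual_samples(samples, n_virtual, seed=0):
    """Draw n_virtual rows from `samples` uniformly at random, with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.choice(samples) for _ in range(n_virtual)]


# Hypothetical pilot-run records: (process setting, measured output).
pilot_runs = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]

# Enlarge the training set with 50 resampled ("virtual") rows.
virtual = bootstrap_virtual_samples(pilot_runs, n_virtual=50)
training_set = pilot_runs + virtual

print(len(training_set))
print(mean(y for _, y in training_set))
```

Plain resampling only repeats observed rows; it does not create values outside the observed range, which is one reason the other abstracts in this list estimate a wider domain before generating artificial samples.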

Wen-chih Chen - One of the best experts on this subject based on the ideXlab platform.

  • A novel data transformation model for small data-set learning
    International Journal of Production Research, 2016
    Co-Authors: I-hsiang Wen, Wen-chih Chen
    Abstract:

    In most highly competitive manufacturing industries, the sample sizes are usually very small in pilot runs, in order to quickly launch new products. However, it is always difficult for engineers to improve the quality in mass production runs based on the limited data obtained in this way. Past research has demonstrated that adding artificial samples can be an effective approach when learning with small data-sets. However, a prior analysis of the data is needed to deduce the appropriate sample distributions within which the artificial samples are generated. Johnson transformation is one of the well-known models that can be applied to bring data close to a normal distribution with the satisfaction of certain statistical assumptions. The sample size required for such data transformation methods is usually large, and this thus motivates the efforts of the current study to develop a new method which is suitable for small data-sets. Accordingly, this research proposes the Small Johnson Data Transformation metho...

  • A multi-model approach to determine early manufacturing parameters for small data set prediction
    International Journal of Production Research, 2012
    Co-Authors: Chiao Wen Liu, Wen-chih Chen
    Abstract:

    Constructing an accurate prediction model from a small training data set is an important but difficult task in the field of forecasting. This is because when the data size is small, the incomplete data may mean that the model produced cannot sufficiently represent the true data structure or cause the model training to be overfitted. To address this issue, this paper presents an approach that combines multiple prediction models to extract data information in multiple facets. In the multi-model approach, a compromise weight method is proposed to determine the relative reliability of each of the prediction models. The methods used include multiple regression, artificial neural network, and support vector machines for regression. A thin-film transistor liquid crystal display manufacturing case study is used to illustrate the details of this research. The empirical results not only show that the proposed multi-model can reduce the manufacturing variation and increase the production yield, but also can propose a...

  • A grey-based fitting coefficient to build a hybrid forecasting model for small data sets
    Applied Mathematical Modelling, 2012
    Co-Authors: Che-jung Chang, Chien Chih Chen, Wen-chih Chen
    Abstract:

    In the current rapidly changing manufacturing conditions, controlling manufacturing systems effectively and efficiently is a critical issue for enterprises, especially in their early stages. However, it is often difficult to make correct decisions with the insufficient information available at such times. We thus develop a two-stage modeling procedure to build a predictive model using few samples. We first use three conventional approaches to establish forecasting models, and then implement pre-testing with the proposed grey-based fitness measuring index to determine the weights to create a hybrid model. Two data sets, including color filter manufacturing data and the Asia-Pacific Economic Cooperation energy database, are evaluated in the experiment, and the results show that the proposed method not only has good forecasting performance, but also reduces the influence of forecasting errors. Accordingly, the proposed procedure is thus considered a feasible approach for small-data-set forecasting.
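
Two of the abstracts above combine several single forecasting models into one hybrid model using weights obtained from a pre-test. The papers' own weighting schemes (the compromise weight method and the grey-based fitness index) are not spelled out in the abstracts, so the sketch below swaps in a simple, commonly used stand-in: weights proportional to the inverse of each model's pre-test error. All names and numbers are hypothetical.

```python
def inverse_error_weights(errors):
    """Turn per-model pre-test errors into weights that sum to 1.

    Assumption: every error is strictly positive; a smaller error
    gives a larger weight.
    """
    if any(e <= 0 for e in errors):
        raise ValueError("pre-test errors must be strictly positive")
    inverses = [1.0 / e for e in errors]
    total = sum(inverses)
    return [inv / total for inv in inverses]


def hybrid_forecast(forecasts, weights):
    """Weighted average of the individual model forecasts."""
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical pre-test errors (for example MAPE in %) of three single models.
pretest_errors = [5.0, 10.0, 20.0]
weights = inverse_error_weights(pretest_errors)

# Hypothetical next-period forecasts from the same three models.
forecasts = [102.0, 98.0, 110.0]
print(weights)
print(hybrid_forecast(forecasts, weights))
```

The design choice here is only that better pre-test performance earns more influence; any other normalised weighting could be substituted without changing `hybrid_forecast`.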

Chiao Wen Liu - One of the best experts on this subject based on the ideXlab platform.

  • A multi-model approach to determine early manufacturing parameters for small data set prediction
    International Journal of Production Research, 2012
    Co-Authors: Chiao Wen Liu, Wen-chih Chen
    Abstract:

    Constructing an accurate prediction model from a small training data set is an important but difficult task in the field of forecasting. This is because when the data size is small, the incomplete data may mean that the model produced cannot sufficiently represent the true data structure or cause the model training to be overfitted. To address this issue, this paper presents an approach that combines multiple prediction models to extract data information in multiple facets. In the multi-model approach, a compromise weight method is proposed to determine the relative reliability of each of the prediction models. The methods used include multiple regression, artificial neural network, and support vector machines for regression. A thin-film transistor liquid crystal display manufacturing case study is used to illustrate the details of this research. The empirical results not only show that the proposed multi-model can reduce the manufacturing variation and increase the production yield, but also can propose a...

  • Extending attribute information for small data set classification
    IEEE Transactions on Knowledge and Data Engineering, 2012
    Co-Authors: Der-chiang Li, Chiao Wen Liu
    Abstract:

    Data quantity is the main issue in the small data set problem, because usually insufficient data will not lead to a robust classification performance. How to extract more effective information from a small data set is thus of considerable interest. This paper proposes a new attribute construction approach which converts the original data attributes into a higher dimensional feature space to extract more attribute information by a similarity-based algorithm using the classification-oriented fuzzy membership function. Seven data sets with different attribute sizes are employed to examine the performance of the proposed method. The results show that the proposed method has a superior classification performance when compared to principal component analysis (PCA), kernel principal component analysis (KPCA), and kernel independent component analysis (KICA) with a Gaussian kernel in the support vector machine (SVM) classifier.

  • A neural network weight determination model designed uniquely for small data set learning
    Expert Systems with Applications, 2009
    Co-Authors: Chiao Wen Liu
    Abstract:

    Environment characteristics are dynamic and changeable. In customized or flexible manufacturing systems, the collected data used for analysis is often small. There are many studies on small data set problems. However, most papers attack the problem by developing data pre-treatment methods which normally require abstruse mathematical knowledge, deterring engineers from applying the methods in practice. This paper develops a unique neural network to accurately predict small data sets. This neural network is developed based on the concept of the data central location tracking method (CLTM) to determine net weights as the learning rules. It not only makes accurate forecasts using small data sets but it also facilitates knowledge learning for engineers.

Chien Chih Chen - One of the best experts on this subject based on the ideXlab platform.

  • A novel gray forecasting model based on the box plot for small manufacturing data sets
    Applied Mathematics and Computation, 2015
    Co-Authors: Che-jung Chang, Yihsiang Huang, Chien Chih Chen
    Abstract:

    Efficiently controlling the early stages of a manufacturing system is an important issue for enterprises. However, the number of samples collected at this point is usually limited due to time and cost issues, making it difficult to understand the real situation in the production process. One of the ways to solve this problem is to use a small data set forecasting tool, such as the various gray approaches. The gray model is a popular forecasting technique for use with small data sets, and while it has been successfully adopted in various fields, it can still be further improved. This paper thus uses a box plot to analyze data features and proposes a new formula for the background values in the gray model to improve forecasting accuracy. The new forecasting model is called BGM(1,1). In the experimental study, one public data set and one real case are used to confirm the effectiveness of the proposed model, and the experimental results show that it is an appropriate tool for small data set forecasting.

    Highlights:
      • The small-data-set forecasting problem is difficult for most manufacturing environments.
      • A forecasting tool using limited data is more effective and efficient for engineers and managers.
      • The proposed method, based on the box plot, can analyze data features to improve forecasting accuracy with small data sets.
      • The proposed method is considered an appropriate procedure in general to forecast manufacturing outputs based on small samples.

  • A Novel Procedure for Multimodel Development Using the Grey Silhouette Coefficient for Small-Data-Set Forecasting
    Journal of the Operational Research Society, 2015
    Co-Authors: Che-jung Chang, Wen Li Dai, Chien Chih Chen
    Abstract:

    Small-data-set forecasting problems are a critical issue in various fields, with the early stage of a manufacturing system being a good example. Manufacturers require sufficient knowledge to minimize overall production costs, but this is difficult to achieve due to the limited number of samples available at such times. This research was thus conducted to develop a modelling procedure to assist managers or decision makers in acquiring stable prediction results from small data sets. The proposed method is a two-stage procedure. First, we assessed some single models to determine whether the tendency of a real sequence can be reflected using grey incidence analysis, and we then evaluated their forecasting stability based on the relative ratio of error range. Second, a grey silhouette coefficient was developed to create an applicable hybrid forecasting model for small samples. Two real cases were analysed to confirm the effectiveness and practical value of the proposed method. The empirical results showed that the multimodel procedure can minimize forecasting errors and improve forecasting results with limited data. Consequently, the proposed procedure is considered a feasible tool for small-data-set forecasting problems.

  • A latent information function to extend domain attributes to improve the accuracy of small-data-set forecasting
    Neurocomputing, 2014
    Co-Authors: Che-jung Chang, Wen Li Dai, Chien Chih Chen
    Abstract:

    In the current highly competitive manufacturing environment, it is important to have effective and efficient control of manufacturing systems to obtain and maintain competitive advantages. However, developing appropriate forecasting models for such systems can be challenging in their early stages, as the sample sizes are usually very small, and thus there is limited data available for analysis. The technique of virtual sample generation is one way to address this issue, but this method is usually not directly applied to time series data. This research thus develops a Latent Information function to analyze data features and extract hidden information, in order to learn from small data sets considering timing factors. The experimental results obtained using the Synthetic Control Chart Time Series and aluminum price data sets show that the proposed method can significantly improve forecasting accuracy, and thus is considered an appropriate procedure to forecast manufacturing outputs based on small samples.

  • A grey-based fitting coefficient to build a hybrid forecasting model for small data sets
    Applied Mathematical Modelling, 2012
    Co-Authors: Che-jung Chang, Chien Chih Chen, Wen-chih Chen
    Abstract:

    In the current rapidly changing manufacturing conditions, controlling manufacturing systems effectively and efficiently is a critical issue for enterprises, especially in their early stages. However, it is often difficult to make correct decisions with the insufficient information available at such times. We thus develop a two-stage modeling procedure to build a predictive model using few samples. We first use three conventional approaches to establish forecasting models, and then implement pre-testing with the proposed grey-based fitness measuring index to determine the weights to create a hybrid model. Two data sets, including color filter manufacturing data and the Asia-Pacific Economic Cooperation energy database, are evaluated in the experiment, and the results show that the proposed method not only has good forecasting performance, but also reduces the influence of forecasting errors. Accordingly, the proposed procedure is thus considered a feasible approach for small-data-set forecasting.
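
Several abstracts in this list build on the gray (grey) model for short series, and the box-plot paper modifies its background values. That modified formula is not given in the abstract, so the sketch below shows only the conventional GM(1,1) model, with background values taken as the mean of consecutive accumulated values. It assumes a series of at least three positive points that is not constant (otherwise the fitted coefficient `a` is zero and the closed form breaks down). The function names and the sample series are hypothetical.

```python
import math


def gm11_fit(x0):
    """Estimate the conventional GM(1,1) parameters (a, b) by least squares."""
    # Accumulated generating operation: running sums of the raw series.
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Conventional background values: mean of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    # Grey equation x0(k) + a*z(k) = b, i.e. a linear regression y = -a*z + b.
    n = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    slope = (n * szy - sz * sy) / (n * szz - sz * sz)
    intercept = (sy - slope * sz) / n
    return -slope, intercept


def gm11_predict(x0, steps):
    """Return fitted values for x0 followed by `steps` out-of-sample forecasts."""
    a, b = gm11_fit(x0)

    def x1_hat(k):
        # Time response of the accumulated series; k = 0 is the first point.
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    fitted = [x0[0]]
    for k in range(1, len(x0) + steps):
        # Inverse accumulation: difference of consecutive accumulated values.
        fitted.append(x1_hat(k) - x1_hat(k - 1))
    return fitted


# Hypothetical short series growing by about 10% per period.
series = [100.0, 110.0, 121.0, 133.1, 146.41]
print(gm11_predict(series, steps=1)[-1])
```

Swapping in a different background-value formula would only change the line that builds `z`, which is where the box-plot paper reports making its modification.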

Cheng-chieh Tsai - One of the best experts on this subject based on the ideXlab platform.

  • Considering Relationship of Proteins for Radiotherapy Prognosis of Bladder Cancer Cells in Small Data Set.
    Methods of Information in Medicine, 2018
    Co-Authors: Tung-i Tsai, Yaofeng Zhang, Zhigang Zhang, Gy-yi Chao, Cheng-chieh Tsai
    Abstract:

    Background: Radiotherapy has serious side effects and thus requires prudent and cautious evaluation. However, obtaining protein expression profiles is expensive and time-consuming, making it necessary to develop a theoretical and rational procedure for predicting the radiotherapy outcome for bladder cancer when working with limited data. Objective: A procedure for estimating the performance of radiotherapy is proposed in this research. The population domain (range of the population) of proteins and the relationships among proteins are considered to increase prediction accuracy. Methods: This research uses modified extreme value theory (MEVT) to estimate the population domain of proteins, and correlation coefficients and prediction intervals to overcome the lack of knowledge regarding relationships among proteins. Results: When the size of the training data set was 5 samples, the mean absolute percentage error (MAPE) was 31.6200%; MAPE fell to 13.5505% when the number of samples was increased to 30. The standard deviation (SD) of forecasting error fell from 3.0609% for 5 samples to 1.2415% for 30 samples. These results show that the proposed procedure yields accurate and stable results, and is suitable for use with small data sets. Conclusions: The results show that considering the relationships among proteins is necessary when predicting the outcome of radiotherapy.
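
The radiotherapy abstract reports its results as a mean absolute percentage error (MAPE) and a standard deviation (SD) of forecasting error. As a reminder of what those two figures measure, here is a minimal sketch. The abstract does not say how the SD was computed, so taking the sample standard deviation of the per-sample absolute percentage errors is an assumption, and the numbers below are hypothetical, not the paper's data.

```python
from statistics import mean, stdev


def absolute_percentage_errors(actual, predicted):
    """Per-sample |actual - predicted| / |actual|, in percent.

    Assumption: no actual value is zero.
    """
    return [abs((a - p) / a) * 100.0 for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return mean(absolute_percentage_errors(actual, predicted))


def error_sd(actual, predicted):
    """Sample standard deviation of the absolute percentage errors (assumed definition)."""
    return stdev(absolute_percentage_errors(actual, predicted))


# Hypothetical observed and predicted values.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(mape(actual, predicted))
print(error_sd(actual, predicted))
```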