Data Preparation Process

The experts below are selected from a list of 66,843 experts worldwide ranked by the ideXlab platform.

Yibo Ren - One of the best experts on this subject based on the ideXlab platform.

Mohammed Alsuwaiket - One of the best experts on this subject based on the ideXlab platform.

  • Formulating Module Assessment for Improved Academic Performance Predictability in Higher Education
    arXiv: Computers and Society, 2020
    Co-Authors: Mohammed Alsuwaiket, Anas H. Blasi, Ra'fat Al-msie'deen
    Abstract:

    Various studies have shown that students tend to get higher marks when assessed through coursework-based assessment methods, whether modules are fully assessed through coursework or through a mixture of coursework and examinations, than when assessed by examination alone. A large number of educational data mining (EDM) studies preprocess data through conventional data mining processes, including the data preparation process, but they use transcript data as it stands, without considering how examination and coursework results are weighted, which could affect prediction accuracy. This paper proposes a different data preparation process, investigating more than 230,000 student records in order to prepare students' marks based on the assessment methods of the enrolled modules. The data have been processed through several stages in order to extract a categorical factor through which students' module marks are refined during the data preparation process. The results of this work show that students' final marks should not be isolated from the nature of the enrolled modules' assessment methods; rather, these methods must be investigated thoroughly and considered during the EDM data preprocessing phases. More generally, it is concluded that educational data should not be prepared in the same way as other data types, owing to differences in data sources, applications, and the types of errors they contain. Therefore, an attribute, the coursework assessment ratio (CAR), is proposed to take the different modules' assessment methods into account while preparing student transcript data. The effect of CAR on the prediction process using the random forest classification technique has been investigated. It is shown that considering CAR as an attribute increases the accuracy of predicting students' second-year averages based on their first-year results.

  • Refining Student Marks based on Enrolled Modules’ Assessment Methods using Data Mining Techniques
    arXiv: Computers and Society, 2020
    Co-Authors: Mohammed Alsuwaiket, Anas H. Blasi, Khawla Altarawneh
    Abstract:

    Choosing the right and effective way to assess students is one of the most important tasks of higher education. Many studies have shown that students tend to receive higher scores when assessed by coursework-based methods, including units that are fully assessed by coursework or by a combination of coursework and exams, than when assessed by exams alone. Many educational data mining (EDM) studies preprocess data through conventional data mining processes, including the data preparation process. In this paper, we propose a different data preparation process, investigating more than 230,000 student records to prepare students' scores. The data have been processed through several stages in order to extract a categorical factor through which students' module marks are refined during the data preparation stage. The results of this work show that students' final marks should not be isolated from the nature of the enrolled modules' assessment methods; rather, these methods must be investigated thoroughly and considered during the EDM data preprocessing stage. More generally, educational data should not be prepared in the same way as other data, owing to differences in data sources, applications, and error types. The effect of the Module Assessment Index (MAI) on the prediction process was investigated using Random Forest and Naive Bayes classification techniques. It was shown that considering MAI as an attribute increases the accuracy of predicting students' second-year averages based on their first-year averages.

  • Refining Student Marks based on Enrolled Modules’ Assessment Methods using Data Mining Techniques
    Engineering Technology & Applied Science Research, 2020
    Co-Authors: Mohammed Alsuwaiket, Anas H. Blasi, Khawla Altarawneh
    Abstract:

    Choosing the right and effective way to assess students is one of the most important tasks of higher education. Many studies have shown that students tend to receive higher scores when assessed by coursework-based methods, including units that are fully assessed by coursework or by a combination of coursework and exams, than when assessed by exams alone. Many Educational Data Mining (EDM) studies preprocess data through conventional data mining processes, including the data preparation process. In this paper, we propose a different data preparation process, investigating more than 230,000 student records to prepare students’ scores. The data have been processed through several stages in order to extract a categorical factor through which students’ module marks are refined during the data preparation stage. The results of this work show that students’ final marks should not be isolated from the nature of the enrolled module’s assessment methods; rather, these methods must be investigated thoroughly and considered during the EDM data preprocessing stage. More generally, educational data should not be prepared in the same way as other data, owing to differences in data sources, applications, and error types. The effect of the Module Assessment Index (MAI) on the prediction process was investigated using Random Forest and Naive Bayes classification techniques. It was shown that considering MAI as an attribute increases the accuracy of predicting students’ second-year averages based on their first-year averages.

  • Formulating Module Assessment for Improved Academic Performance Predictability in Higher Education
    Engineering Technology & Applied Science Research, 2019
    Co-Authors: Mohammed Alsuwaiket, Anas H. Blasi, Ra'fat Al-msie'deen
    Abstract:

    The choice of an effective student assessment method is an issue of interest in Higher Education. Various studies [1] have shown that students tend to get higher marks when assessed through coursework-based assessment methods, whether modules are fully assessed through coursework or through a mixture of coursework and examinations, than when assessed by examination alone. A large number of educational data mining (EDM) studies preprocess data through conventional data mining processes, including the data preparation process, but they use transcript data as it stands, without considering how examination and coursework results are weighted, which could affect prediction accuracy. This paper proposes a different data preparation process, investigating more than 230,000 student records in order to prepare students’ marks based on the assessment methods of the enrolled modules. The data have been processed through several stages in order to extract a categorical factor through which students’ module marks are refined during the data preparation process. The results of this work show that students’ final marks should not be isolated from the nature of the enrolled module’s assessment methods; rather, these methods must be investigated thoroughly and considered during the EDM data preprocessing phases. More generally, it is concluded that educational data should not be prepared in the same way as other data types, owing to differences in data sources, applications, and the types of errors they contain. Therefore, an attribute, the coursework assessment ratio (CAR), is proposed to take the different modules’ assessment methods into account while preparing student transcript data. The effect of CAR on the prediction process using the random forest classification technique has been investigated. It is shown that considering CAR as an attribute increases the accuracy of predicting students’ second-year averages based on their first-year results.

  • Measuring academic performance of students in Higher Education using Data mining techniques
    2018
    Co-Authors: Mohammed Alsuwaiket
    Abstract:

    Educational Data Mining (EDM) is a developing discipline concerned with extending classical Data Mining (DM) methods and developing new methods for exploring the data that originate from educational systems. It aims to use those methods to achieve a logical understanding of students and of the educational environment they need for better learning. These data are characterized by their large size and randomness, which can make it difficult for educators to extract knowledge from them. Additionally, knowledge extracted from data by counting the occurrences of certain events is not always reliable, since the counting process does not always take into consideration other factors and parameters that could affect the extracted knowledge. Student attendance in Higher Education has always been dealt with in a classical way: educators count occurrences of attendance or absence and build their knowledge about students, as well as about modules, on this count. This method is neither credible nor does it necessarily give a real indication of a student's performance. On the other hand, the choice of an effective student assessment method is an issue of interest in Higher Education. Various studies (Romero, et al., 2010) have shown that students tend to get higher marks when assessed through coursework-based assessment methods, whether modules are fully assessed through coursework or through a mixture of coursework and examinations, than when assessed by examination alone. A large number of EDM studies have preprocessed data through conventional DM processes, including the data preparation process, but they use transcript data as it stands, without considering how examination and coursework results are weighted, which could affect prediction accuracy.
This thesis explores the above problems and attempts to formulate the extracted knowledge in a way that guarantees accurate and credible results. Student attendance data gathered from the educational system were first cleaned to remove randomness and noise; then various attributes were studied to identify the most significant ones affecting students' real attendance. The next step was to derive an equation that measures Student Attendance's Credibility (SAC) using the attributes chosen in the previous step. The reliability of the newly developed measure was then evaluated to examine its consistency. In terms of transcript data, this thesis proposes a different data preparation process, investigating more than 230,000 student records in order to prepare students' marks based on the assessment methods of the enrolled modules. The data have been processed through several stages in order to extract a categorical factor through which students' module marks are refined during the data preparation process. The results of this work show that students' final marks should not be isolated from the nature of the enrolled module's assessment methods; rather, these methods must be investigated thoroughly and considered during the EDM data preprocessing phases. More generally, it is concluded that educational data should not be prepared in the same way as other data, owing to differences such as data sources, applications, and the types of errors they contain. Therefore, an attribute, the Coursework Assessment Ratio (CAR), is proposed to take the different modules' assessment methods into account while preparing student transcript data. The effect of CAR and SAC on the prediction process has been investigated using data mining classification techniques such as Random Forest, Artificial Neural Networks, and k-Nearest Neighbors.
The results were generated by applying the DM techniques to our dataset and evaluated by measuring the statistical differences in Classification Accuracy (CA) and Root Mean Square Error (RMSE) across all models. A comprehensive evaluation was carried out to compare the results of all DM techniques, and Random Forest (RF) was found to have the highest CA and the lowest RMSE. The importance of SAC and CAR in increasing prediction accuracy is demonstrated in Chapter 5. Finally, the results have been compared with previous studies that predicted students' final marks from their marks at earlier stages of study. The comparisons used similar data and attributes, first excluding the average CAR and SAC and then including them, and measured the prediction accuracy in both cases. The aim of this comparison is to ensure that the new preparation stage positively affects the final results.
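
The CAR attribute and the two evaluation measures mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstracts do not give the CAR formula, so the code assumes CAR is the fraction of a module's mark that comes from coursework, that the categorical factor has three illustrative bands, and that a student's average CAR is credit-weighted. The names `Module`, `coursework_assessment_ratio`, `car_category`, `student_average_car`, `classification_accuracy` and `rmse` are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence


@dataclass(frozen=True)
class Module:
    """Hypothetical module record: assessment weights plus credit value."""
    code: str
    coursework_weight: float  # share of the module mark from coursework (e.g. 40 for 40%)
    exam_weight: float        # share of the module mark from examinations
    credits: float = 10.0


def coursework_assessment_ratio(module: Module) -> float:
    """Assumed CAR definition: fraction of the module mark that comes from coursework."""
    total = module.coursework_weight + module.exam_weight
    if total <= 0:
        raise ValueError(f"module {module.code} has no assessment weights")
    return module.coursework_weight / total


def car_category(car: float) -> str:
    """Map a CAR value to an illustrative three-band categorical factor (assumption)."""
    if car == 0.0:
        return "exam_only"
    if car == 1.0:
        return "coursework_only"
    return "mixed"


def student_average_car(modules: Sequence[Module]) -> float:
    """Credit-weighted mean CAR over a student's enrolled modules (assumed weighting)."""
    total_credits = sum(m.credits for m in modules)
    if total_credits == 0:
        raise ValueError("student has no credit-bearing modules")
    weighted = sum(coursework_assessment_ratio(m) * m.credits for m in modules)
    return weighted / total_credits


def classification_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """CA: share of predicted class labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between true and predicted numeric marks."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

For example, a student enrolled on an exam-only 20-credit module and two 10-credit modules with CAR 0.5 and 1.0 gets an average CAR of (0 × 20 + 0.5 × 10 + 1.0 × 10) / 40 = 0.375. Credit weighting is one plausible choice; an unweighted mean over modules would be the simpler alternative.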

Taowei Wang - One of the best experts on this subject based on the ideXlab platform.

Cyrille Artho - One of the best experts on this subject based on the ideXlab platform.

  • Automated Dataset Construction from Web Resources with Tool Kayur
    International Journal of Networking and Computing, 2017
    Co-Authors: Alexander Kohan, Mitsuharu Yamamoto, Cyrille Artho
    Abstract:

    Many text mining tools cannot be applied directly to documents available on web pages. There are tools for fetching and preprocessing textual data, but combining them with the data processing tool into one working tool chain can be time-consuming. The preprocessing task is even more labor-intensive if documents are located on multiple remote sources with different storage formats. In this paper, we propose a simplification of the data preparation process for cases where data come from a wide range of web resources. We developed an open-source tool, called Kayur, that greatly reduces the time and effort required for routine data preprocessing steps, allowing users to proceed quickly to the main task of data analysis. The datasets generated by the tool are ready to be loaded into a data mining workbench, such as WEKA or Carrot2, to perform classification, feature prediction, and other data mining tasks.

  • CANDAR - Automated Dataset Construction from Web Resources with Tool Kayur
    2016 Fourth International Symposium on Computing and Networking (CANDAR), 2016
    Co-Authors: Alexander Kohan, Mitsuharu Yamamoto, Cyrille Artho
    Abstract:

    Many text mining tools cannot be applied directly to documents available on web pages. There are tools for fetching and preprocessing textual data, but combining them into one working tool chain can be time-consuming. The preprocessing task is even more labor-intensive if documents are located on multiple remote sources with different storage formats. In this paper, we propose a simplification of the data preparation process for cases where data come from a wide range of web resources. We developed an open-source tool, called Kayur, that greatly reduces the time and effort required for routine data preprocessing steps, allowing users to proceed quickly to the main task of data analysis. The datasets generated by the tool are ready to be loaded into a data mining workbench, such as WEKA or Carrot2, to perform classification, feature prediction, and other data mining tasks.

Anas H. Blasi - One of the best experts on this subject based on the ideXlab platform.