Data Preparation Process

The experts below are selected from a list of 66,843 experts worldwide ranked by the ideXlab platform.

Yibo Ren - One of the best experts on this subject based on the ideXlab platform.

Study on personalized recommendation based on collaborative filtering

2009

Co-Authors: Taowei Wang, Aimin Yang, Yibo Ren

Abstract:

Collaborative filtering is the most successful technology for building personalized recommendation systems and is used extensively in many fields. In this paper, a system architecture for personalized recommendation using collaborative filtering based on web logs is proposed, and the data preparation process is described in detail. The paper also gives an improved k-means algorithm for clustering user transactions. Experimental results show that our proposed algorithm could increase recommendation precision.
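The abstract does not describe the improved k-means algorithm itself, so the following is only a minimal sketch of plain (unimproved) k-means applied to user transactions. It assumes each transaction is a binary vector marking which pages a user visited; the function names, the choice of the first k points as initial centroids, and the sample data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Sequence[float]], k: int,
           iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means; the first k points serve as initial centroids (an assumption)."""
    centroids = [list(map(float, p)) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each transaction to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical transactions: one row per user session, one column per page.
transactions = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]
labels, centers = kmeans(transactions, k=2)
print(labels)
```

Sessions that share most of their visited pages end up in the same cluster, and a recommender could then suggest pages that are popular within a user's cluster.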

Research on personalized recommendation based on web usage mining using collaborative filtering technique

WSEAS Transactions on Information Science and Applications archive, 2009

Co-Authors: Taowei Wang, Yibo Ren

Abstract:

Collaborative filtering is the most successful technology for building personalized recommendation systems and is used extensively in many fields. This paper presents a system architecture for personalized recommendation using collaborative filtering based on web usage mining and describes the data preparation process in detail. To improve recommending quantity, a new personalized recommendation model is proposed that takes URL-related analysis into account and combines it with the K-means algorithm. Experimental results show that our proposed model is effective and can enhance recommendation performance.


Mohammed Alsuwaiket - One of the best experts on this subject based on the ideXlab platform.

Formulating Module Assessment for Improved Academic Performance Predictability in Higher Education

arXiv: Computers and Society, 2020

Co-Authors: Mohammed Alsuwaiket, Anas H. Blasi, Ra'fat Al-msie'deen

Abstract:

Various studies have shown that students tend to get higher marks when assessed through coursework-based assessment methods, which include either modules that are fully assessed through coursework or a mixture of coursework and examinations, than when assessed by examination alone. A large number of educational data mining (EDM) studies preprocess data through conventional data mining processes, including the data preparation process, but they use transcript data as they stand without looking at the weighting of examination and coursework results, which could affect prediction accuracy. This paper proposes a different data preparation process, investigating more than 230,000 student records in order to prepare students' marks based on the assessment methods of enrolled modules. The data have been processed through different stages in order to extract a categorical factor through which students' module marks are refined during the data preparation process. The results of this work show that students' final marks should not be isolated from the nature of the enrolled modules' assessment methods. They must rather be investigated thoroughly and considered during EDM's data preprocessing phases. More generally, it is concluded that educational data should not be prepared in the same way as other data types, due to differences in data sources, applications, and types of errors. Therefore, an attribute, the coursework assessment ratio (CAR), is proposed in order to take the different modules' assessment methods into account while preparing student transcript data. The effect of CAR on the prediction process using the random forest classification technique has been investigated. It is shown that considering CAR as an attribute increases the accuracy of predicting students' second-year averages based on their first-year results.

Refining Student Marks based on Enrolled Modules Assessment Methods using Data Mining Techniques

arXiv: Computers and Society, 2020

Co-Authors: Mohammed Alsuwaiket, Anas H. Blasi, Khawla Altarawneh

Abstract:

Choosing the right and effective way to assess students is one of the most important tasks of higher education. Many studies have shown that students tend to receive higher scores during their studies when assessed by different study methods, which include units that are fully assessed by varying the duration of study or a combination of courses and exams, than by exams alone. Many Educational Data Mining (EDM) studies process data in advance through traditional data extraction, including the data preparation process. In this paper, we propose a different data preparation process by investigating more than 230,000 student records for the preparation of scores. The data have been processed through diverse stages in order to extract a categorical factor through which students' module marks are refined during the data preparation stage. The results of this work show that students' final marks should not be isolated from the nature of the enrolled modules' assessment methods. They must rather be investigated thoroughly and considered during the EDM data preprocessing stage. More generally, educational data should not be prepared in the same way as normal data, due to the differences in data sources, applications, and error types. The effect of the Module Assessment Index (MAI) on the prediction process using Random Forest and Naive Bayes classification techniques was investigated. It was shown that considering MAI as an attribute increases the accuracy of predicting students' second-year averages based on their first-year averages.

Refining Student Marks based on Enrolled Modules’ Assessment Methods using Data Mining Techniques

Engineering Technology & Applied Science Research, 2020

Co-Authors: Mohammed Alsuwaiket, Anas H. Blasi, Khawla Altarawneh

Abstract:

Choosing the right and effective way to assess students is one of the most important tasks of higher education. Many studies have shown that students tend to receive higher scores during their studies when assessed by different study methods, which include units that are fully assessed by varying the duration of study or a combination of courses and exams, than by exams alone. Many Educational Data Mining (EDM) studies process data in advance through traditional data extraction, including the data preparation process. In this paper, we propose a different data preparation process by investigating more than 230,000 student records for the preparation of scores. The data have been processed through diverse stages in order to extract a categorical factor through which students’ module marks are refined during the data preparation stage. The results of this work show that students’ final marks should not be isolated from the nature of the enrolled module’s assessment methods. They must rather be investigated thoroughly and considered during EDM’s data pre-processing stage. More generally, educational data should not be prepared in the same way as normal data, due to the differences in data sources, applications, and error types. The effect of the Module Assessment Index (MAI) on the prediction process using Random Forest and Naive Bayes classification techniques was investigated. It was shown that considering MAI as an attribute increases the accuracy of predicting students’ second-year averages based on their first-year averages.

Formulating Module Assessment for Improved Academic Performance Predictability in Higher Education

Engineering Technology & Applied Science Research, 2019

Co-Authors: Mohammed Alsuwaiket, Anas H. Blasi, Ra'fat Al-msie'deen

Abstract:

The choice of an effective student assessment method is an issue of interest in Higher Education. Various studies [1] have shown that students tend to get higher marks when assessed through coursework-based assessment methods, which include either modules that are fully assessed through coursework or a mixture of coursework and examinations, than when assessed by examination alone. A large number of educational data mining (EDM) studies pre-process data through conventional data mining processes, including the data preparation process, but they use transcript data as they stand without looking at the weighting of examination and coursework results, which could affect prediction accuracy. This paper proposes a different data preparation process, investigating more than 230,000 student records in order to prepare students’ marks based on the assessment methods of enrolled modules. The data have been processed through different stages in order to extract a categorical factor through which students’ module marks are refined during the data preparation process. The results of this work show that students’ final marks should not be isolated from the nature of the enrolled module’s assessment methods. They must rather be investigated thoroughly and considered during EDM’s data pre-processing phases. More generally, it is concluded that educational data should not be prepared in the same way as other data types, due to differences in data sources, applications, and types of errors. Therefore, an attribute, the coursework assessment ratio (CAR), is proposed in order to take the different modules’ assessment methods into account while preparing student transcript data. The effect of CAR on the prediction process using the random forest classification technique has been investigated. It is shown that considering CAR as an attribute increases the accuracy of predicting students’ second-year averages based on their first-year results.

Measuring academic performance of students in Higher Education using Data mining techniques

2018

Co-Authors: Mohammed Alsuwaiket

Abstract:

Educational Data Mining (EDM) is a developing discipline, concerned with expanding the classical Data Mining (DM) methods and developing new methods for discovering the data that originate from educational systems. It aims to use those methods to achieve a logical understanding of students and of the educational environment they should have for better learning. These data are characterized by their large size and randomness, which can make it difficult for educators to extract knowledge from them. Additionally, knowledge extracted from data by counting the occurrence of certain events is not always reliable, since the counting process sometimes does not take into consideration other factors and parameters that could affect the extracted knowledge. Student attendance in Higher Education has always been dealt with in a classical way, i.e. educators rely on counting the occurrences of attendance or absence and build their knowledge about students as well as modules on this count. This method is neither credible nor does it necessarily provide a real indication of a student's performance. On the other hand, the choice of an effective student assessment method is an issue of interest in Higher Education. Various studies (Romero, et al., 2010) have shown that students tend to get higher marks when assessed through coursework-based assessment methods, which include either modules that are fully assessed through coursework or a mixture of coursework and examinations, than when assessed by examination alone. A large number of EDM studies have pre-processed data through the conventional data mining processes, including the data preparation process, but they use transcript data as it stands without looking at the weighting of examination and coursework results, which could affect prediction accuracy.
This thesis explores the above problems and tries to formulate the extracted knowledge in a way that guarantees accurate and credible results. Student attendance data, gathered from the educational system, were first cleaned in order to remove any randomness and noise, then various attributes were studied so as to highlight the most significant ones that affect the real attendance of students. The next step was to derive an equation that measures the Student Attendance's Credibility (SAC) considering the attributes chosen in the previous step. The reliability of the newly developed measure was then evaluated in order to examine its consistency. In terms of transcript data, this thesis proposes a different data preparation process, investigating more than 230,000 student records in order to prepare students' marks based on the assessment methods of enrolled modules. The data have been processed through different stages in order to extract a categorical factor through which students' module marks are refined during the data preparation process. The results of this work show that students' final marks should not be isolated from the nature of the enrolled module's assessment methods; rather, they must be investigated thoroughly and considered during EDM's data pre-processing phases. More generally, it is concluded that educational data should not be prepared in the same way as other data, due to differences such as sources of data, applications, and types of errors. Therefore, an attribute, the Coursework Assessment Ratio (CAR), is proposed in order to take the different modules' assessment methods into account while preparing student transcript data. The effect of CAR and SAC on the prediction process using data mining classification techniques such as Random Forest, Artificial Neural Networks and k-Nearest Neighbors has been investigated.
The results were generated by applying the DM techniques to our data set and evaluated by measuring the statistical differences between the Classification Accuracy (CA) and Root Mean Square Error (RMSE) of all models. A comprehensive evaluation has been carried out for all results in the experiments to compare the results of all DM techniques, and it has been found that Random Forest (RF) has the highest CA and lowest RMSE. The importance of SAC and CAR in increasing the prediction accuracy has been proved in Chapter 5. Finally, the results have been compared with previous studies that predicted students' final marks based on students' marks at earlier stages of their study. The comparisons have taken into consideration similar data and attributes, first excluding average CAR and SAC and then including them, and measuring the prediction accuracy in both cases. The aim of this comparison is to ensure that the new preparation process stage will positively affect the final results.
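The two evaluation measures named in the abstract, Classification Accuracy (CA) and Root Mean Square Error (RMSE), have standard definitions, sketched below in plain Python. The function names and the sample labels and marks are hypothetical and are not taken from the thesis data.

```python
import math
from typing import Sequence


def classification_accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the true class labels."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Square root of the mean squared difference between true and predicted values."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical example: grade bands for CA, numeric averages for RMSE.
true_bands = ["pass", "fail", "pass", "pass"]
pred_bands = ["pass", "pass", "pass", "pass"]
print(classification_accuracy(true_bands, pred_bands))

true_marks = [60.0, 70.0, 80.0]
pred_marks = [62.0, 68.0, 80.0]
print(rmse(true_marks, pred_marks))
```

A model with higher CA and lower RMSE than its competitors is the better one on both measures, which is the sense in which the abstract reports Random Forest as the best performer.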


Taowei Wang - One of the best experts on this subject based on the ideXlab platform.

Study on personalized recommendation based on collaborative filtering

2009

Co-Authors: Taowei Wang, Aimin Yang, Yibo Ren

Research on personalized recommendation based on web usage mining using collaborative filtering technique

WSEAS Transactions on Information Science and Applications archive, 2009

Co-Authors: Taowei Wang, Yibo Ren


Cyrille Artho - One of the best experts on this subject based on the ideXlab platform.

Automated Dataset Construction from Web Resources with Tool Kayur

International Journal of Networking and Computing, 2017

Co-Authors: Alexander Kohan, Mitsuharu Yamamoto, Cyrille Artho

Abstract:

Many text mining tools cannot be applied directly to documents available on web pages. There are tools for fetching and preprocessing textual data, but combining them with the data processing tool into one working tool chain can be time consuming. The preprocessing task is even more labor-intensive if documents are located on multiple remote sources with different storage formats. In this paper, we propose a simplification of the data preparation process for cases when data come from a wide range of web resources. We developed an open-source tool, called Kayur, that greatly reduces the time and effort required for routine data preprocessing steps, allowing users to proceed quickly to the main task of data analysis. The datasets generated by the tool are ready to be loaded into a data mining workbench, such as WEKA or Carrot2, to perform classification, feature prediction, and other data mining tasks.

CANDAR - Automated Dataset Construction from Web Resources with Tool Kayur

2016 Fourth International Symposium on Computing and Networking (CANDAR), 2016

Co-Authors: Alexander Kohan, Mitsuharu Yamamoto, Cyrille Artho

Abstract:

Many text mining tools cannot be applied directly to documents available on web pages. There are tools for fetching and preprocessing textual data, but combining them in one working tool chain can be time consuming. The preprocessing task is even more labor-intensive if documents are located on multiple remote sources with different storage formats. In this paper we propose a simplification of the data preparation process for cases when data come from a wide range of web resources. We developed an open-source tool, called Kayur, that greatly reduces the time and effort required for routine data preprocessing steps, allowing users to proceed quickly to the main task of data analysis. The datasets generated by the tool are ready to be loaded into a data mining workbench, such as WEKA or Carrot2, to perform classification, feature prediction, and other data mining tasks.


Anas H. Blasi - One of the best experts on this subject based on the ideXlab platform.

Formulating Module Assessment for Improved Academic Performance Predictability in Higher Education

arXiv: Computers and Society, 2020

Co-Authors: Mohammed Alsuwaiket, Anas H. Blasi, Ra'fat Al-msie'deen

Refining Student Marks based on Enrolled Modules Assessment Methods using Data Mining Techniques

arXiv: Computers and Society, 2020

Co-Authors: Mohammed Alsuwaiket, Anas H. Blasi, Khawla Altarawneh

Refining Student Marks based on Enrolled Modules’ Assessment Methods using Data Mining Techniques

Engineering Technology & Applied Science Research, 2020

Co-Authors: Mohammed Alsuwaiket, Anas H. Blasi, Khawla Altarawneh

Formulating Module Assessment for Improved Academic Performance Predictability in Higher Education

Engineering Technology & Applied Science Research, 2019

Co-Authors: Mohammed Alsuwaiket, Anas H. Blasi, Ra'fat Al-msie'deen

