Rico-Juan, J. R., Peña-Acuña, B., and Navarro-Martinez, O. (2024b). Holistic Exploration of Reading Comprehension Skills, Technology and Socioeconomic Factors in Spanish Teenagers. Heliyon, 10(12), (Impact Factor: 4 (4.1) - Q2 - 23/73). [ bib | DOI | http ]

The intricate relationship between teenagers' literacy and technology underscores the need for a comprehensive understanding, particularly in the Spanish context. This study employs explainable artificial intelligence (AI) to delve into this complex interplay, focusing on the pivotal role of reading comprehension skills in the personal and career development of Spanish teenagers. With a sample of 22,400 15-year-olds from the PISA dataset, we investigate the impact of socioeconomic factors, technology habits, parental education, residential location, and school type on reading comprehension skills. Utilizing machine learning techniques, our analysis reveals a nuanced connection between autonomy, technological proficiency, and academic performance. Notably, family oversight of technology use emerges as a crucial factor in managing the impact of digital technology and the Internet on reading comprehension skills. The study emphasizes the necessity for a balanced and supervised introduction to technology from an early age. Contrary to current trends, our findings indicate that online gaming may not contribute positively to reading comprehension skills, while moderate daily Internet use (1-4 hours) proves beneficial. Furthermore, the study underscores the ongoing nature of acquiring reading comprehension and technological skills, emphasizing the need for continuous attention and guidance from childhood. Parental education levels are identified as partial predictors of children's performance, emphasizing the importance of a holistic educational approach that considers autonomy and technological literacy. This study advocates for addressing socio-economic and gender inequalities in education and highlights the crucial role of cooperation between schools and families, particularly those with lower educational levels.

Rico-Juan, J. R., Cachero, C., and Macià, H. (2024a). Study regarding the influence of a student’s personality and an LMS usage profile on learning performance using machine learning techniques. Applied Intelligence, pages 1–23, (Impact Factor: 5.3 (5.2) - Q2 - 48/145). [ bib | DOI | http ]

Academic performance (AP) is crucial for lifelong success. Unfortunately, many students fail to meet expected academic benchmarks, leading to altered career paths or university dropouts. This issue is particularly pronounced in the early stages of higher education, highlighting the need for the instructors of these foundational courses to have access to simple yet effective tools for the early identification of students at high risk of academic failure. In this study, we propose a streamlined conceptual model inspired by the Model of Human Behavior (MHB) into which we have incorporated two dimensions: capacity and willingness. These dimensions are assessed through the definition of three variables: Prior Academic Performance (PAP), Personality and Academic Engagement, whose measurements can easily be obtained by the instructors. Furthermore, we outline a Machine Learning (ML) process that higher education instructors can use to create their own tailored models in order to predict AP and identify risk groups with high levels of transparency and interpretability. The application of our approach to a sample of 322 Spanish undergraduates studying two mathematical subjects at a Spanish university demonstrates its potential to detect failure early in the semester with a precision comparable to that of more complex models found in the literature. Our tailored model identified that capacity was the primary predictor of AP, with a gain-to-baseline improvement of 21%, which the willingness variables increased to 27%. This approach is consistent over time. Implications for instructors are discussed and an open prediction and analysis tool is developed.

Valero-Mas, J. J., Gallego, A. J., and Rico-Juan, J. R. (2024). An overview of ensemble and feature learning in few-shot image classification using siamese networks. Multimedia Tools and Applications, 83(7):19929–19952, (Impact Factor: 3.6 (3.1) - Q2 - 31/108). [ bib | DOI | http ]

Siamese Neural Networks (SNNs) constitute one of the most representative approaches for addressing Few-Shot Image Classification. These schemes comprise a set of Convolutional Neural Network (CNN) models whose weights are shared across the network, which results in fewer parameters to train and less tendency to overfit. This fact eventually leads to better convergence capabilities than standard neural models when considering scarce amounts of data. Based on a contrastive principle, the SNN scheme jointly trains these inner CNN models to map the input image data to an embedded representation that may be later exploited for the recognition process. However, in spite of their extensive use in the related literature, the representation capabilities of SNN schemes have neither been thoroughly assessed nor combined with other strategies for boosting their classification performance. Within this context, this work experimentally studies the capabilities of SNN architectures for obtaining a suitable embedded representation in scenarios with severe data scarcity, assesses the use of training data augmentation for improving the feature learning process, introduces the use of transfer learning techniques for further exploiting the embedded representations obtained by the model, and uses test data augmentation for boosting the performance capabilities of the SNN scheme by mimicking an ensemble learning process. The results obtained with different image corpora report that the combination of the commented techniques achieves classification rates ranging from 69% to 78% with just 5 to 20 prototypes per class, whereas the CNN baseline considered is unable to converge. Furthermore, once the baseline model converges given a sufficient amount of data, the adequate use of the studied techniques still improves the accuracy by figures ranging from 4% to 9%.
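
A rough illustration of the shared-weight, contrastive idea described above is sketched below (assuming PyTorch; the architecture, image size and data are toy placeholders, not the paper's setup): the same small CNN is applied to both inputs of a pair and trained with a contrastive loss.

    # Minimal siamese sketch: two inputs go through the *same* CNN (shared weights);
    # a contrastive loss pulls same-class embeddings together and pushes the rest apart.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EmbeddingCNN(nn.Module):
        """Small CNN mapping a 28x28 image to an embedding vector (toy architecture)."""
        def __init__(self, emb_dim=64):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.head = nn.Linear(32 * 7 * 7, emb_dim)

        def forward(self, x):
            return self.head(self.features(x).flatten(1))

    def contrastive_loss(z1, z2, same, margin=1.0):
        """same = 1 for pairs of the same class, 0 otherwise."""
        d = F.pairwise_distance(z1, z2)
        return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

    # One toy training step on random "images"; a real run would iterate over pairs.
    net = EmbeddingCNN()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    x1, x2 = torch.randn(8, 1, 28, 28), torch.randn(8, 1, 28, 28)
    same = torch.randint(0, 2, (8,)).float()
    loss = contrastive_loss(net(x1), net(x2), same)  # same network => shared weights
    opt.zero_grad(); loss.backward(); opt.step()
    print(float(loss))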

Navarro-Soria, I., Rico-Juan, J. R., Juárez-Ruiz de Mier, R., and Lavigne-Cervan, R. (2024). Prediction of attention deficit hyperactivity disorder based on explainable artificial intelligence. Applied Neuropsychology: Child, 0(0):1–14, (Impact Factor: 1.7 (1.7) - Q3 - 58/81). [ bib | DOI | http ]

Accurate assessment of Attention Deficit Hyperactivity Disorder (ADHD) is crucial for the effective treatment of affected individuals. Traditionally, psychometric tests such as the WISC-IV have been utilized to gather evidence and identify patterns or factors contributing to ADHD diagnosis. However, in recent years, the use of machine learning (ML) models in conjunction with post-hoc eXplainable Artificial Intelligence (XAI) techniques has improved our ability to make precise predictions and provide transparent explanations. The objective of this study is twofold: firstly, to predict the likelihood of an individual receiving an ADHD diagnosis using ML algorithms, and secondly, to offer interpretable insights into the decision-making process of the ML model. The dataset under scrutiny comprises 694 cases collected over the past decade in Spain, including information on age, gender, and WISC-IV test scores. The outcome variable is the professional diagnosis. Diverse ML algorithms representing various learning styles were rigorously evaluated through a stratified 10-fold cross-validation, with performance assessed using key metrics, including accuracy, area under the receiver operating characteristic curve, sensitivity, and specificity. Models were compared using both the full set of initial features and a well-suited wrapper-type feature selection algorithm (Boruta). Following the identification of the most suitable model, Shapley additive values were computed to assign weights to each predictor based on their additive contribution to the outcome and to elucidate the predictions. Strikingly, a reduced set of 8 out of the initial 20 variables produced results comparable to using the full feature set. Among the ML models tested, the Random Forest algorithm outperformed others on most metrics (ACC = 0.90, AUC = 0.94, Sensitivity = 0.91, Specificity = 0.92). Notably, the principal predictors, ranked by importance, included GAI – CPI, WMI, CPI, PSI, VCI, WMI – PSI, PRI, and LN. Individual case examples exhibit variations in predictions depending on unique characteristics, including instances of false positives and negatives. Our ML model adeptly predicted ADHD diagnoses in 90% of cases, with potential for further enhancement by expanding our database. Furthermore, the use of XAI techniques enables the elucidation of salient factors in individual cases, thereby aiding inexperienced professionals in the diagnostic process and facilitating comparison with expert assessments. It is important to note that this tool is designed to support the ADHD diagnostic process, where the medical professional always has the final say in decision-making.
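
The evaluation pipeline described above can be illustrated with a minimal sketch (assuming scikit-learn and the shap package; the data below are synthetic stand-ins for the 694 cases and 20 variables, and the Boruta selection step is omitted): a Random Forest is assessed with stratified 10-fold cross-validation and then explained with Shapley additive values.

    # Sketch: stratified 10-fold CV of a Random Forest, then SHAP explanations.
    import numpy as np
    import shap  # assumes the 'shap' package is installed
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    # Synthetic stand-in for the 694 cases x 20 WISC-IV-derived variables.
    X, y = make_classification(n_samples=694, n_features=20, n_informative=8, random_state=0)

    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
    print("mean ROC AUC over 10 folds:", round(auc.mean(), 3))

    # Shapley additive values for the fitted model; shap's output layout differs
    # across versions, so the reduction keeps only the feature axis (size 20).
    clf.fit(X, y)
    sv = np.asarray(shap.TreeExplainer(clf).shap_values(X))
    other_axes = tuple(i for i in range(sv.ndim) if sv.shape[i] != X.shape[1])
    importance = np.abs(sv).mean(axis=other_axes)
    print("features ranked by mean |SHAP|:", np.argsort(importance)[::-1])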

Cachero, C., Rico-Juan, J. R., and Macià, H. (2023). Influence of personality and modality on peer assessment evaluation perceptions using Machine Learning techniques. Expert Systems with Applications, 213:119150, (Impact Factor: 8.665 (8.093) - Q1 - 21/144(Open Access)). [ bib | DOI ]

The successful instructional design of self and peer assessment in higher education poses several challenges that instructors need to be aware of. One of these is the influence of students’ personalities on their intention to adopt peer assessment. This paper presents a quasi-experiment in which 85 participants, enrolled in the first year of a Computer Engineering programme, were assessed regarding their personality and their acceptance of three modalities of peer assessment (individual, in pairs, in threes). Following a within-subjects design, the students applied the three modalities, in a different order, with three different activities. An analysis of the resulting 1195 observations using ML techniques shows that the Random Forest algorithm yields significantly better predictions for three out of the four adoption variables included in the study. Additionally, the application of a set of eXplainable Artificial Intelligence (XAI) techniques shows that Agreeableness is the best predictor of Usefulness and Ease of Use, while Extraversion is the best predictor of Compatibility, and Neuroticism has the greatest impact on global Intention to Use. The discussion highlights how, as happens with other innovations in educational processes, a low level of Conscientiousness is the most consistent predictor of resistance to the introduction of peer assessment processes in the classroom. It also stresses the value of peer assessment for augmenting the positive feelings of students scoring high on Neuroticism, which could lead to better performance. Finally, the low impact of the peer assessment modality on student perceptions, compared with personality variables, is debated.

Llorca-Schenk, J., Rico-Juan, J. R., and Sanchez-Lozano, M. (2023). Designing porthole aluminium extrusion dies on the basis of eXplainable Artificial Intelligence. Expert Systems with Applications, page 119808, (Impact Factor: 8.665 (8.093) - Q1 - 21/144). [ bib | DOI ]

This paper shows the development of a tool with which to solve the most critical aspect of the porthole die design problem using a predictive model based on machine learning (ML). The model relies on a large amount of geometrical data regarding successful porthole die designs, information on which was obtained thanks to a collaboration with a leading extrusion company. In all cases, the dies were made of H-13 hot work steel and the billet material was 6063 aluminium alloy. The predictive model was chosen from a series of probes with different algorithms belonging to various ML families, which were applied to the analysis of geometrical data corresponding to 596 ports from 88 first trial dies. Algorithms based on the generation of multiple decision trees together with the boosting technique obtained the most promising results, the best by far being the CatBoost algorithm. The explainability of this model is based on a post-hoc approach using the SHAP (SHapley Additive exPlanations) tool. The results obtained with this ML-based model are notably better than those of a previous model based on linear regression as regards both the R2 metric and the results obtained with the application examples. An additional practical advantage is its explainability, which is a great help when deciding the best way in which to adjust an initial design to the predictive model. This ML-based model is, therefore, an optimal means of integrating the experience and know-how accumulated through many designs over time in order to apply it to new designs. It will also provide an aid in generating the starting point for the design of high-difficulty dies, in order to minimise the number of FEM (finite element method) simulation/correction iterations required until an optimal solution is achieved. It is not intended to eliminate FEM simulation from the design tasks, but rather to help improve and accelerate the whole process of designing porthole dies. The work presented herein addresses a validation model for a very common porthole die typology: four-cavity, four-port-per-cavity dies for 6xxx series aluminium alloys. However, a wide range of research regarding the generalisation of this model or its extension to other porthole die typologies must still be carried out.
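
A minimal sketch of this kind of pipeline, not the paper's actual model: a CatBoost regressor is fitted on invented placeholder descriptors of die ports and explained post hoc with the library's built-in per-sample Shapley values (assumes the catboost package).

    # Sketch: gradient-boosted trees (CatBoost) on toy geometric descriptors,
    # explained with built-in Shapley values.
    import numpy as np
    from catboost import CatBoostRegressor, Pool  # assumes catboost is installed

    rng = np.random.default_rng(0)
    X = rng.normal(size=(596, 6))          # 596 ports, 6 invented geometric features
    y = 2.0 * X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=596)

    model = CatBoostRegressor(iterations=300, depth=6, verbose=False, random_seed=0)
    model.fit(X, y)
    print("R2 on training data:", round(model.score(X, y), 3))

    # Per-sample Shapley values: the last column is the expected (base) value.
    shap_vals = model.get_feature_importance(Pool(X, y), type="ShapValues")
    mean_abs = np.abs(shap_vals[:, :-1]).mean(axis=0)
    print("mean |SHAP| per feature:", mean_abs.round(3))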

Rico-Juan, J. R., Sánchez-Cartagena, V. M., Valero-Mas, J. J., and Gallego, A. J. (2023). Identifying Student Profiles Within Online Judge Systems Using Explainable Artificial Intelligence. IEEE Transactions on Learning Technologies, 16(6):955–969, (Impact Factor: 4.433 (4.414) - Q2 - 41/112). [ bib | DOI ]

Online Judge (OJ) systems are typically considered within programming-related courses as they yield fast and objective assessments of the code developed by the students. Such an evaluation generally provides a single decision based on a rubric, most commonly whether the submission successfully accomplished the assignment. Nevertheless, since in an educational context such information may be deemed insufficient, it would be beneficial for both the student and the instructor to receive additional feedback about the overall development of the task. This work aims to tackle this limitation by considering the further exploitation of the information gathered by the OJ and automatically inferring feedback for both the student and the instructor. More precisely, we consider the use of learning-based schemes—particularly, Multi-Instance Learning and classical Machine Learning formulations—to model student behaviour. Besides, Explainable Artificial Intelligence is contemplated to provide human-understandable feedback. The proposal has been evaluated considering a case of study comprising 2,500 submissions from roughly 90 different students from a programming-related course in a Computer Science degree. The results obtained validate the proposal: the model is capable of significantly predicting the user outcome (either passing or failing the assignment) solely based on the behavioural pattern inferred by the submissions provided to the OJ. Moreover, the proposal is able to identify prone-to-fail student groups and profiles as well as other relevant information, which eventually serves as feedback to both the student and the instructor.

Oliver-Roig, A., Rico-Juan, J. R., Richart-Martínez, M., and Cabrero-García, J. (2022). Predicting exclusive breastfeeding in maternity wards using Machine Learning techniques. Computer Methods and Programs in Biomedicine, page 106837, (Impact Factor: 5.428 - Q1 - 22/111 - JCR). [ bib | DOI ]

Background and Objective: Adequate support in maternity wards is decisive for breastfeeding outcomes during the first year of life. Quality improvement interventions require the identification of the factors influencing hospital benchmark indicators. Machine Learning (ML) models and post-hoc eXplainable Artificial Intelligence (XAI) techniques allow accurate predictions and allow them to be explained. This study aimed to predict exclusive breastfeeding during the in-hospital postpartum stay with ML algorithms and to explain the ML model's behaviour to support decision making. Methods: The dataset included 2042 mothers giving birth in 18 hospitals in Eastern Spain. We obtained information on demographics, mothers' breastfeeding experiences, clinical variables, and participating hospitals' support conditions. The outcome variable was exclusive breastfeeding during the in-hospital postpartum stay. We tested algorithms from different ML families. To evaluate the ML models, we applied 10-fold stratified cross-validation. We used the following metrics: area under the receiver operating characteristic curve (ROC AUC), area under the precision-recall curve (PR AUC), accuracy, and Brier score. After selecting the best fitting model, we calculated Shapley additive values to assign weights to each predictor depending on its additive contribution to the outcome and to explain the predictions. Results: The XGBoost algorithm showed the best metrics (ROC AUC=0.78, PR AUC=0.86, accuracy=0.75, Brier=0.17). The main predictors of the model included, in order of importance, pacifier use, the degree of breastfeeding self-efficacy, previous breastfeeding experience, birth weight, the admission of the baby to a neonatal care unit after birth, the moment of the first skin-to-skin contact between mother and baby, and the Baby-Friendly Hospital Initiative accreditation of the hospital. Specific examples of linear and non-linear relations between the main predictors and the outcome, and of heterogeneity of effects, are presented. We also describe diverse individual cases showing how the prediction varies depending on individual characteristics. Conclusions: The ML model adequately predicted exclusive breastfeeding during the in-hospital stay. Our results pointed to opportunities for improving care related to support for specific groups of mothers, defined by current and previous infant feeding experiences and the clinical conditions of the newborns, and to the participating hospitals' support conditions. Also, XAI techniques allowed the identification of non-linear relations and effect heterogeneity, explaining risk variations in specific cases.

Gallego, A. J., Rico-Juan, J. R., and Valero-Más, J. J. (2022). Efficient k-nearest neighbor search based on clustering and adaptive k values. Pattern Recognition, 122:108356, (Impact Factor: 7.740 - Q1 - 17/140 - JCR). [ bib | DOI | http ]

The k-Nearest Neighbor (kNN) algorithm is widely used in the supervised learning field and, particularly, in search and classification tasks, owing to its simplicity, competitive performance, and good statistical properties. However, its inherent inefficiency prevents its use in most modern applications due to the vast amount of data that the current technological evolution generates, thus making the optimization of kNN-based search strategies of particular interest. This paper introduces the caKD+ algorithm, which tackles this limitation by combining the use of feature learning techniques, clustering methods, adaptive search parameters per cluster, and pre-calculated K-Dimensional Tree structures, resulting in a highly efficient search method. This proposal has been evaluated using 10 datasets and the results show that caKD+ significantly outperforms 16 state-of-the-art efficient search methods while still achieving an accuracy as high as that of the exhaustive kNN search.
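
The following is a deliberately simplified sketch of the search idea, not the actual caKD+ algorithm: k-means partitions the training set, one KD-Tree is pre-built per cluster, and each query is answered by searching only the tree of its closest centroid (the per-cluster adaptive k and the feature learning stage are omitted; the data are random placeholders).

    # Sketch: k-means partition + one KD-Tree per cluster; each query searches
    # only the tree of its nearest centroid.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neighbors import KDTree

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(10_000, 16))
    y_train = rng.integers(0, 5, size=10_000)

    km = KMeans(n_clusters=32, n_init=10, random_state=0).fit(X_train)
    trees, cluster_labels = {}, {}
    for c in range(km.n_clusters):
        idx = np.where(km.labels_ == c)[0]
        trees[c] = KDTree(X_train[idx])
        cluster_labels[c] = y_train[idx]

    def predict(query, k=5):
        """Approximate kNN label vote restricted to the nearest cluster."""
        c = int(km.predict(query[None, :])[0])
        k = min(k, len(cluster_labels[c]))          # guard against tiny clusters
        _, nn = trees[c].query(query[None, :], k=k)
        return np.bincount(cluster_labels[c][nn[0]]).argmax()

    print(predict(rng.normal(size=16)))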

Cabrero-García, J., Rico-Juan, J. R., and Oliver-Roig, A. (2021). Does the global activity limitation indicator measure participation restriction? Data from the European Health and Social Integration Survey in Spain. Quality of Life Research, (Impact Factor: 4.147 (4.072) - Q1 - JCR). [ bib | DOI | http ]

Purpose The global activity limitation indicator (GALI) is the only internationally agreed and harmonised participation restriction measure. We examine whether GALI, as intended, is a reflective measure of the domains of participation; furthermore, we determine the relative importance of these domains. We also investigated the consistency of response to GALI by age and gender and compared the performance of GALI with that of self-rated health (SRH). Methods We used Spanish data from the European Health and Social Integration Survey and selected adults aged 18 and over (N=13,568). Data analyses, based on logistic regression models and Shapley value decomposition, were also stratified by age. The predictors of the models were demographic variables and restrictions in participation domains: studies, work, mobility, leisure and social activities, domestic life, and self-care. GALI and SRH were the response variables. Results GALI was strongly associated with all participation domains (e.g. for domestic life, adjusted OR 24.34 (95% CI 18.53–31.97) in adults under 65) and performed differentially with age (e.g. for domestic life, adjusted OR 13.33 (95% CI 10.42–17.03) in adults over 64), but not with gender. The relative importance of domains varied with age (e.g. work was the most important domain for younger adults and domestic life for older adults). The results with SRH were parallel to those of GALI, but the association of SRH with participation domains was weaker. Conclusions GALI reflects restrictions in multiple participation domains well and performs differently with age, probably because older people lower their standards of good functioning.

Ortega-Bastida, J., Gallego, A. J., Rico-Juan, J. R., and Albarrán, P. (2021). A multimodal approach for regional GDP prediction using social media activity and historical information. Applied Soft Computing, 111:107693, (Impact Factor: 5.472 - Q1 - 20/137 - JCR). [ bib | DOI | http ]

This work proposes a multimodal approach with which to predict the regional Gross Domestic Product (GDP) by combining historical GDP values with the information embodied in Twitter messages concerning the current economic condition. This proposal is of great interest, since it delivers forecasts at higher frequencies than both the official statistics (published only annually at the regional level in Spain) and the existing unofficial quarterly predictions (which rely on economic indicators that are available only after months of delay). The proposed method is based on a two-stage architecture. In the first stage, a multi-task autoencoder is initially used to obtain a GDP-related representation of tweets, which are then filtered to remove outliers and to obtain the GDP prediction from the consensus of opinions. In a second stage, this result is combined with the historical GDP values of the region using a multimodal network. The method is evaluated in four different regions of Spain using the tweets written by the most relevant economists, politicians, newspapers and institutions in each one. The results show that our approach successfully learns the evolution of the GDP using only historical information and tweets, thus making it possible to provide earlier forecasts about the regional GDP. This method also makes it possible to establish which opinions are the most or least influential regarding this prediction. As an additional exercise, we have assessed how well our method predicted the effect of the COVID-19 pandemic.

Rico-Juan, J. R., Cachero, C., and Macià, H. (2021). Influence of individual versus collaborative peer assessment on score accuracy and learning outcomes in higher education: an empirical study. Assessment & Evaluation in Higher Education, pages 1–18, (Impact Factor: 4.984 - Q1 - 17/264 - JCR (Social Science)). [ bib | DOI | http ]

Maximising the accuracy and learning of self and peer assessment activities in higher education requires instructors to make several design decisions, including whether the assessment process should be individual or collaborative, and, if collaborative, determining the number of members of each peer assessment team. In order to support this decision, a quasi-experiment was carried out in which 82 first-year students used three peer assessment modalities. A total of 1574 assessments were obtained. The accuracy of both the students’ self-assessment and their peer assessment was measured. Results show that students’ self-assessment significantly improved when groups of three were used, provided that those with the 20% poorest performances were excluded from the analysis. This suggests that collaborative peer assessment improves learning. Peer assessment scores were more accurate than self-assessment, regardless of the modality, and the accuracy improved with the number of assessments received. Instructors need to consider the trade-off between students’ improved understanding, which favours peer assessment using groups of three, and a higher number of assessments, which, under time constraints, favours individual peer assessment.

Rico-Juan, J. R. and Taltavull de La Paz, P. (2021). Machine learning with explainability or spatial hedonics tools? An analysis of the asking prices in the housing market in Alicante, Spain. Expert Systems with Applications, 171:114590, (Impact Factor: 6.954 - Q1 - 23/140 - JCR (Open Access)). [ bib | DOI | http ]

Two sets of modelling tools are used to evaluate the precision of housing-price forecasts: machine learning and hedonic regression. Evidence on the prediction capacity of a range of methods points to the superiority of the random forest, as it can calculate real-estate values with an error of less than 2%. This method also ranks the attributes that are most relevant to determining housing prices. Hedonic regression models are less precise but more robust as they can identify the housing attributes that most affect the level of housing prices. This empirical exercise adds new knowledge to the literature as it investigates the capacity of the random forest to identify the three dimensions of non-linearity which, from an economic theoretical point of view, would identify the reactions of different market agents. The intention of the robustness test is to check for these non-linear relationships using hedonic regression. The quantile tools also highlight non-linearities, depending on the price levels. The results show that a combination of techniques would add information on the unobservable (non-linear) relationships between housing prices and housing attributes on the real-estate market.
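
A synthetic sketch, not the paper's data or models, contrasting the two tool families discussed above: a hedonic linear regression and a random forest are fitted to the same invented housing attributes and compared on out-of-sample error (assumes a recent scikit-learn for the MAPE metric).

    # Sketch: hedonic (linear) regression vs. random forest on invented attributes.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_percentage_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 5000
    size, rooms, dist = rng.uniform(40, 200, n), rng.integers(1, 6, n), rng.uniform(0, 10, n)
    # Non-linear "true" price so the forest has something to gain over the linear fit.
    price = 1500 * size + 8000 * rooms + 30000 * np.exp(-dist) + rng.normal(0, 5000, n)
    X = np.column_stack([size, rooms, dist])

    X_tr, X_te, y_tr, y_te = train_test_split(X, price, random_state=0)
    for name, model in [("hedonic (linear)", LinearRegression()),
                        ("random forest", RandomForestRegressor(n_estimators=200, random_state=0))]:
        model.fit(X_tr, y_tr)
        err = mean_absolute_percentage_error(y_te, model.predict(X_te))
        print(f"{name}: test MAPE = {err:.3f}")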

Gallego, A. J., Calvo-Zaragoza, J., and Rico-Juan, J. R. (2020). Insights into efficient k-Nearest Neighbor classification with Convolutional Neural Codes. IEEE Access, 8:99312–99326, (Impact Factor: 3.367 - Q2 - 65/162 - JCR (Open Access)). [ bib | DOI | http ]

The increasing consideration of Convolutional Neural Networks (CNN) has not prevented the use of the k-Nearest Neighbor (kNN) method. In fact, a hybrid CNN-kNN approach is an interesting option in which the network specializes in feature extraction through its activations (Neural Codes), while the kNN has the advantage of performing a retrieval by means of similarity. However, this hybrid approach also has the disadvantages of the kNN search, and especially its high computational cost which is, in principle, undesirable for large-scale data. In this paper, we present the first comprehensive study of efficient kNN search algorithms using this hybrid CNN-kNN approach. This has been done by considering up to 16 different algorithms, each of which is evaluated with a different parametrization, in 7 datasets of heterogeneous composition. Our results show that no single algorithm is capable of covering all aspects, but rather that each family of algorithms is better suited to specific aspects of the problem. This signifies that Fast Similarity Search algorithms maintain their performance, but do not reduce the cost as much as the Data Reduction family does. In turn, the Approximated Similarity Search family is postulated as a good option when attempting to balance accuracy and efficiency. The experiments also suggest that considering statistical transformation algorithms such as Linear Discriminant Analysis might be useful in certain cases.
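
A minimal sketch of the hybrid CNN-kNN idea described above (assuming PyTorch and scikit-learn; the network is untrained and the data are random placeholders): the penultimate-layer activations act as Neural Codes on which a kNN classifier is fitted.

    # Sketch: penultimate-layer activations ("neural codes") feed a kNN classifier.
    import torch
    import torch.nn as nn
    from sklearn.neighbors import KNeighborsClassifier

    class SmallCNN(nn.Module):
        def __init__(self, n_classes=10):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Flatten(), nn.Linear(16 * 14 * 14, 64), nn.ReLU(),
            )
            self.classifier = nn.Linear(64, n_classes)

        def forward(self, x):
            return self.classifier(self.backbone(x))

        def neural_codes(self, x):
            """Penultimate-layer activations used as the kNN feature space."""
            with torch.no_grad():
                return self.backbone(x).numpy()

    # Toy data standing in for a trained network and a real corpus.
    net = SmallCNN()
    X_train, y_train = torch.randn(100, 1, 28, 28), torch.randint(0, 10, (100,))
    X_test = torch.randn(5, 1, 28, 28)

    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(net.neural_codes(X_train), y_train.numpy())
    print(knn.predict(net.neural_codes(X_test)))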

Rico-Juan, J. R., Valero-Mas, J. J., and Iñesta, J. M. (2020). Bounding Edit Distance for similarity-based sequence classification on Structural Pattern Recognition. Applied Soft Computing, 97:106778, (Impact Factor: 6.725 - Q1 - 11/112 - JCR). [ bib ]

Pattern Recognition tasks in the structural domain generally exhibit high accuracy results, but their time efficiency is quite low. Furthermore, this low performance is more pronounced when dealing with instance-based classifiers since, for each query, the entire corpus must be evaluated to find the closest prototype. In this work we address this efficiency issue for the Nearest Neighbor classifier when data are encoded as two-dimensional code sequences, more precisely strings and sequences of vectors. For this, a set of bounds on the distance metric is proposed that avoids the calculation of unnecessary distances. The results obtained prove the effectiveness of the proposal, as it reduces the classification time by between 80% and 90% for string representations and between 60% and 80% for data codified as sequences of vectors, with respect to the corresponding non-optimized version of the classifier.
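
The pruning principle can be illustrated with a minimal sketch that uses a classic, easy-to-verify lower bound on the Levenshtein distance (the length difference) instead of the specific bounds proposed in the paper: the full edit distance is only computed when the bound cannot rule the prototype out.

    # Sketch: skip the full Levenshtein computation whenever the length-difference
    # lower bound already exceeds the best distance found so far.
    def levenshtein(a: str, b: str) -> int:
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]

    def nearest_neighbour(query, prototypes):
        """prototypes: list of (string, label) pairs. Returns (label, #full distances)."""
        best_dist, best_label, computed = float("inf"), None, 0
        for s, label in prototypes:
            if abs(len(s) - len(query)) >= best_dist:   # lower bound prunes this prototype
                continue
            d = levenshtein(query, s)
            computed += 1
            if d < best_dist:
                best_dist, best_label = d, label
        return best_label, computed

    protos = [("abcde", 0), ("abcdf", 0), ("zzzzzzzzzz", 1), ("a", 1)]
    print(nearest_neighbour("abcdx", protos))  # prunes the prototypes the bound rules out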

Calvo-Zaragoza, J., Rico-Juan, J. R., and Gallego, A. J. (2020). Ensemble classification from deep predictions with test data augmentation. Soft Computing, 24(2):1423–1433, (Impact Factor: 3.643 - Q2 - 49/140 - JCR). [ bib | DOI | http ]

Data augmentation has become a standard step to improve the predictive power and robustness of Convolutional Neural Networks by means of the synthetic generation of new samples depicting different deformations. This step has been traditionally considered to improve the network at the training stage. In this work, however, we study the use of data augmentation at classification time. That is, the test sample is augmented, following the same procedure considered for training, and the decision is taken with an ensemble prediction over all these samples. We present comprehensive experimentation with several datasets and ensemble decisions, considering a rather generic data augmentation procedure. Our results show that performing this step is able to boost the original classification, even when the room for improvement is limited.
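
A minimal sketch of the test-time augmentation procedure described above (assuming PyTorch and a recent torchvision; the model is an untrained placeholder): the test image is perturbed several times, the model predicts on every copy, and the class probabilities are averaged into an ensemble decision.

    # Sketch: average the model's predictions over several augmented copies of
    # the test image (same transformation family as in training).
    import torch
    import torch.nn as nn
    import torchvision.transforms as T

    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10), nn.Softmax(dim=1))
    augment = T.RandomAffine(degrees=10, translate=(0.1, 0.1))

    def predict_with_tta(model, image, n_augment=10):
        """Ensemble decision: mean class probabilities over perturbed copies."""
        model.eval()
        with torch.no_grad():
            copies = torch.stack([augment(image) for _ in range(n_augment)])
            probs = model(copies).mean(dim=0)
        return int(probs.argmax())

    image = torch.rand(1, 28, 28)   # toy single-channel image in [0, 1]
    print(predict_with_tta(model, image))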

Rico-Juan, J. R., Valero-Mas, J. J., and Calvo-Zaragoza, J. (2019). Extensions to rank-based prototype selection in k-Nearest Neighbour classification. Applied Soft Computing, 85:105803, (Impact Factor: 4.873 - Q1 - JCR). [ bib | DOI | http ]

The k-nearest neighbour rule is commonly considered for classification tasks given its straightforward implementation and good performance in many applications. However, its efficiency represents an obstacle in real-case scenarios because the classification requires computing a distance to every single prototype of the training set. Prototype Selection (PS) is a typical approach to alleviate this problem, which focuses on reducing the size of the training set by selecting the most interesting prototypes. In this context, rank methods have been postulated as a good solution: following some heuristics, these methods perform an ordering of the prototypes according to their relevance in the classification task, which is then used to select the most relevant ones. This work presents a significant improvement of existing rank methods by proposing two extensions: i) greater robustness against noise at the label level by considering the parameter `k' of the classification in the selection process; and ii) a new parameter-free rule to select the prototypes once they have been ordered. The experiments performed in different scenarios and datasets demonstrate the effectiveness of these extensions. Moreover, the new full approach is reported to be competitive with respect to existing PS algorithms.

Rico-Juan, J. R., Gallego, A. J., and Calvo-Zaragoza, J. (2019). Automatic detection of inconsistencies between numerical scores and textual feedback in peer-assessment processes with machine learning. Computers & Education, 140:103609, (Impact Factor: 4.538 - Q1 - JCR). [ bib | DOI | http ]

The use of peer assessment for open-ended activities has advantages for both teachers and students. Teachers might reduce the workload of the correction process and students achieve a better understanding of the subject by evaluating the activities of their peers. In order to ease the process, it is advisable to provide the students with a rubric with which to perform the assessment of their peers; however, restricting them to providing only numerical scores is detrimental, as it prevents them from giving valuable feedback to their peers. Since this assessment produces two modalities of the same evaluation, namely a numerical score and textual feedback, it is possible to apply automatic techniques to detect inconsistencies in the evaluation, thus minimizing the teachers' workload for supervising the whole process. This paper proposes a machine learning approach for the detection of such inconsistencies. To this end, we consider two different approaches, each of which is tested with different algorithms, in order to both evaluate the approach itself and find appropriate models to make it successful. The experiments carried out with 4 groups of students and 2 types of activities show that the proposed approach is able to yield reliable results, thus representing a valuable approach for ensuring a fair operation of the peer assessment process.

Rico-Juan, J. R., Gallego, A. J., Valero-Mas, J. J., and Calvo-Zaragoza, J. (2018). Statistical semi-supervised system for grading multiple peer-reviewed open-ended works. Computers & Education, 126(1):264–282, (Impact Factor: 4.538 - Q1 - JCR). [ bib | DOI | http ]

In the education context, open-ended works generally entail a series of benefits, such as the possibility of developing original ideas and a more productive learning process for the student, compared with closed-answer activities. Nevertheless, such works impose a significant correction workload on the teacher, in contrast to the latter, which can be self-corrected. Furthermore, this workload becomes intractable with large groups of students. In order to maintain the advantages of open-ended works with a reasonable amount of correction effort, this article proposes a novel methodology: students perform the corrections using a rubric (closed Likert scale) as a guideline in a peer-review fashion; then, their markings are automatically analyzed with statistical tools to detect possible biased scorings; finally, in the event that the statistical analysis detects a biased case, the teacher is required to intervene and manually correct the assignment. This methodology has been tested on two different assignments with two heterogeneous groups of people to assess the robustness and reliability of the proposal. As a result, we obtain values over 95% in the confidence of the intra-class correlation test (ICC) between the grades computed by our proposal and those directly resulting from the manual correction of the teacher. These figures confirm that the evaluation obtained with the proposed methodology is statistically similar to that of the manual correction of the teacher, with a remarkable decrease in terms of effort.

Castellanos, F. J., Valero-Mas, J. J., Calvo-Zaragoza, J., and Rico-Juan, J. R. (2018). Oversampling imbalanced data in the string space. Pattern Recognition Letters, 103:32–38, (Impact Factor: 1.995 - Q2 - JCR). [ bib | DOI | http ]

Imbalanced data is a typical problem in the supervised classification field, which occurs when the different classes are not equally represented. This fact typically results in the classifier biasing its performance towards the class representing the majority of the elements. Many methods have been proposed to alleviate this scenario, yet all of them assume that data is represented as feature vectors. In this paper we propose a strategy to balance a dataset whose samples are encoded as strings. Our approach is based on adapting the well-known Synthetic Minority Over-sampling Technique (SMOTE) algorithm to the string space. More precisely, data generation is achieved with an iterative approach to create artificial strings within the segment between two given samples of the training set. Results with several datasets and imbalance ratios show that the proposed strategy properly deals with the problem in all cases considered.
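
A minimal sketch of the over-sampling idea, not the paper's exact algorithm: a synthetic minority string is created between two minority samples by applying only a fraction of the edit operations that transform one into the other, loosely mirroring SMOTE's interpolation in the string space.

    # Sketch: build the a->b edit script by backtracking the Levenshtein table,
    # then apply only part of it to obtain a string "between" a and b.
    def edit_script(a: str, b: str):
        n, m = len(a), len(b)
        D = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1): D[i][0] = i
        for j in range(m + 1): D[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                D[i][j] = min(D[i-1][j] + 1, D[i][j-1] + 1, D[i-1][j-1] + (a[i-1] != b[j-1]))
        ops, i, j = [], n, m                       # backtrack from the end of both strings
        while i > 0 or j > 0:
            if i > 0 and j > 0 and D[i][j] == D[i-1][j-1] + (a[i-1] != b[j-1]):
                if a[i-1] != b[j-1]:
                    ops.append(("sub", i - 1, b[j - 1]))
                i, j = i - 1, j - 1
            elif i > 0 and D[i][j] == D[i-1][j] + 1:
                ops.append(("del", i - 1, None)); i -= 1
            else:
                ops.append(("ins", i, b[j - 1])); j -= 1
        return ops                                 # ordered from the end of `a` backwards

    def synthetic_string(a: str, b: str, fraction=0.5):
        """Apply the first `fraction` of the a->b edit script to a."""
        ops = edit_script(a, b)
        s = list(a)
        for kind, pos, ch in ops[: round(len(ops) * fraction)]:   # right-to-left keeps indices valid
            if kind == "sub": s[pos] = ch
            elif kind == "del": del s[pos]
            else: s.insert(pos, ch)
        return "".join(s)

    print(synthetic_string("karolin", "kathrin", fraction=0.5))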

Gallego, A. J., Calvo-Zaragoza, J., Valero-Mas, J. J., and Rico-Juan, J. R. (2018). Clustering-based k-Nearest Neighbor Classification for Large-Scale Data with Neural Codes Representation. Pattern Recognition, 74(1):531–543, (Impact Factor: 5.898 - Q1 - 14/134 - JCR). [ bib | DOI | http ]

While standing as one of the most widely considered and successful supervised classification algorithms, the k-Nearest Neighbor (kNN) classifier generally depicts a poor efficiency due to being an instance-based method. In this sense, Approximated Similarity Search (ASS) stands as a possible alternative for improving those efficiency issues at the expense of typically lowering the performance of the classifier. In this paper we take as initial point an ASS strategy based on clustering. We then improve its performance by solving issues related to instances located close to the cluster boundaries (by enlarging the cluster sizes) and by considering the use of Deep Neural Networks for learning a suitable representation for the classification task at issue. Results using a collection of eight different datasets show that the combined use of these two strategies entails a significant improvement in the accuracy performance, with a considerable reduction in the number of distances needed to classify a sample in comparison to the basic kNN rule.

Calvo-Zaragoza, J., Valero-Mas, J. J., and Rico-Juan, J. R. (2017b). Recognition of Handwritten Music Symbols using Meta-features Obtained from Weak Classifiers based on Nearest Neighbor. In ICPRAM, pages 96–104. [ bib | DOI | http ]

Valero-Mas, J. J., Calvo-Zaragoza, J., and Rico-Juan, J. R. (2017b). Selecting promising classes from generated data for an efficient multi-class Nearest Neighbor classification. Soft Computing, 21(20):6183–6189, (Impact Factor: 2.367 - Q2 - 45/132 - JCR). [ bib | DOI | http ]

The Nearest Neighbor rule is one of the most considered algorithms for supervised learning because of its simplicity and fair performance in most cases. However, this technique has a number of disadvantages, the most prominent one being its low computational efficiency. This paper presents a strategy to overcome this obstacle in multi-class classification tasks. This strategy proposes the use of Prototype Reduction algorithms that are capable of generating a new training set from the original one in order to gather the same information with fewer samples. Over this reduced set, it is estimated which classes are the closest ones to the input sample. These classes are referred to as promising classes. Classification is finally performed with the Nearest Neighbor rule over the original training set, but restricted to the promising classes. Our experiments with several datasets and significance tests show that a classification accuracy similar to that of using the original training set can be obtained, with significantly higher efficiency.
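
A minimal sketch of the strategy described above, with a deliberately crude reduction step: class centroids play the role of the reduced set, the classes whose centroids lie closest to the query are taken as promising, and the final Nearest Neighbor decision uses the original training set restricted to them (assumes scikit-learn).

    # Sketch: class centroids act as the reduced set; the final NN decision only
    # considers training prototypes of the "promising" (closest) classes.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])   # crude reduced set

    def classify(query, n_promising=3, k=1):
        d = np.linalg.norm(centroids - query, axis=1)
        promising = classes[np.argsort(d)[:n_promising]]              # closest classes
        mask = np.isin(y, promising)                                  # restrict the full set
        knn = KNeighborsClassifier(n_neighbors=k).fit(X[mask], y[mask])
        return knn.predict(query[None, :])[0]

    print(classify(X[0]), "true:", y[0])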

Valero-Mas, J. J., Calvo-Zaragoza, J., and Rico-Juan, J. R. (2017a). An Experimental Study on Rank Methods for Prototype Selection. Soft Computing, 21(19):5703–5715, (Impact Factor: 2.367 - Q2 - JCR). [ bib | DOI | http ]

Prototype Selection is one of the most popular approaches for addressing the low efficiency issue typically found in the well-known k-Nearest Neighbour classification rule. These techniques select a representative subset from an original collection of prototypes with the premise of maintaining the same classification accuracy. Most recently, rank methods have been proposed as an alternative to develop new selection strategies. Following a certain heuristic, these methods sort the elements of the initial collection according to their relevance and then select the best possible subset by means of a parameter representing the amount of data to maintain. Due to the relative novelty of these methods, their performance and competitiveness against other strategies is still unclear. This work performs an exhaustive experimental study of such methods for prototype selection. A representative collection of both classic and sophisticated algorithms are compared to the aforementioned techniques in a number of datasets, including different levels of induced noise. Results report the remarkable competitiveness of these rank methods as well as their excellent trade-off between prototype reduction and achieved accuracy.

Calvo-Zaragoza, J., Valero-Mas, J. J., and Rico-Juan, J. R. (2017a). Prototype Generation on Structural Data using Dissimilarity Space Representation. Neural Computing and Applications, 28(9):2415–2424, (Impact Factor: 4.213 - Q1 - JCR). [ bib | DOI | http ]

Data Reduction techniques play a key role in instance-based classification to lower the amount of data to be processed. Among the different existing approaches, Prototype Selection (PS) and Prototype Generation (PG) are the most representative ones. These two families differ in the way the reduced set is obtained from the initial one: while the former aims at selecting the most representative elements from the set, the latter creates new data out of it. Although PG is considered to delimit decision boundaries more efficiently, the operations required are not so well defined in scenarios involving structural data such as strings, trees or graphs. This work studies the possibility of using Dissimilarity Space (DS) methods as an intermediate process for mapping the initial structural representation to a statistical one, thereby allowing the use of PG methods. A comparative experiment over string data is carried out in which our proposal is compared with PS methods on the original space. Results show that the proposed strategy is able to achieve results significantly similar to those of PS in the initial space, thus standing as a clear alternative to the classic approach, with some additional advantages derived from the DS representation.

Valero-Mas, J. J., Calvo-Zaragoza, J., and Rico-Juan, J. R. (2016). On the suitability of Prototype Selection methods for kNN classification with distributed data. Neurocomputing, 203:150–160, (Impact Factor: 2.392 - Q1 - JCR). [ bib | DOI | http ]

In the current Information Age, data production and processing demands are ever increasing. This has motivated the appearance of large-scale distributed information. This phenomenon also applies to Pattern Recognition, so that classic and common algorithms, such as the k-Nearest Neighbour, cannot be used directly. To improve the efficiency of this classifier, Prototype Selection (PS) strategies can be used. Nevertheless, current PS algorithms were not designed to deal with distributed data, and their performance is therefore unknown under these conditions. This work is devoted to carrying out an experimental study on a simulated framework in which PS strategies can be compared under classical conditions as well as those expected in distributed scenarios. Our results report a general behaviour that degrades as conditions approach more realistic scenarios. However, our experiments also show that some methods are able to achieve a performance fairly similar to that of the non-distributed scenario. Thus, although there is a clear need to develop specific PS methodologies and algorithms for tackling these situations, those that reported a higher robustness against such conditions may be good candidates from which to start.

Calvo-Zaragoza, J., Valero-Mas, J. J., and Rico-Juan, J. R. (2015). Improving kNN multi-label classification in Prototype Selection scenarios using class proposals. Pattern Recognition, 48(5):1608–1622, (Impact Factor: 3.096 - Q1 - JCR). [ bib | DOI | http ]

Prototype Selection (PS) algorithms allow a faster Nearest Neighbor classification by keeping only the most profitable prototypes of the training set. In turn, these schemes typically lower the classification accuracy. In this work a new strategy for multi-label classification tasks is proposed to solve this accuracy drop without the need to use the whole training set. For that, given a new instance, the PS algorithm is used as a fast recommender system which retrieves the most likely classes. Then, the actual classification is performed considering only the prototypes from the initial training set belonging to the suggested classes. Results show that this strategy provides a large set of trade-off solutions which fill the gap between PS-based classification efficiency and conventional kNN accuracy. Furthermore, this scheme is not only able to, at best, reach the performance of conventional kNN with barely a third of the distances computed, but it also outperforms the latter in noisy scenarios, proving to be a much more robust approach.

Rico-Juan, J. R. and Calvo-Zaragoza, J. (2015). Improving classification using a Confidence Matrix based on weak classifiers applied to OCR. Neurocomputing, 151:1354–1361, (Impact Factor: 2.392 - Q1 - JCR). [ bib | http ]

This paper proposes a new feature representation method based on the construction of a Confidence Matrix (CM). This representation consists of posterior probability values provided by several weak classifiers, each one trained and used on a different set of features extracted from the original sample. The CM allows the final classifier to abstract itself from discovering underlying groups of features. In this work the CM is applied to isolated character image recognition, for which several sets of features can be extracted from each sample. Experimentation has shown that the use of the CM permits a significant improvement in accuracy in most cases, while in the remaining cases accuracy stays the same. The results were obtained after experimenting with four well-known corpora, using evolved meta-classifiers with the k-Nearest Neighbor rule as weak classifier and applying statistical significance tests.
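
A minimal sketch of the Confidence Matrix idea described above (assuming scikit-learn; the two feature subsets are arbitrary halves of the digits descriptors, not the paper's feature sets): weak kNN classifiers produce posterior probabilities that are concatenated into a new representation for a final classifier.

    # Sketch: posterior probabilities from weak kNN classifiers, each on its own
    # feature subset, are concatenated into the Confidence Matrix representation.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    splits = [slice(0, 32), slice(32, 64)]        # two arbitrary feature subsets
    weak = [KNeighborsClassifier(n_neighbors=5).fit(X_tr[:, s], y_tr) for s in splits]

    def confidence_matrix(X_part):
        """Concatenate the posterior probabilities of every weak classifier."""
        return np.hstack([clf.predict_proba(X_part[:, s]) for clf, s in zip(weak, splits)])

    # Note: a rigorous setup would use out-of-fold probabilities for the training
    # part to avoid the optimistic bias of kNN scoring its own training data.
    final = LogisticRegression(max_iter=1000).fit(confidence_matrix(X_tr), y_tr)
    print("test accuracy:", round(final.score(confidence_matrix(X_te), y_te), 3))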

Rico-Juan, J. R. and Iñesta, J. M. (2014). Adaptive training set reduction for nearest neighbor classification. Neurocomputing, 38(1):316–324, (Impact Factor: 2.083 - Q2 - JCR). [ bib | http ]

The research community related to the human-interaction framework is becoming increasingly more interested in interactive pattern recognition, taking direct advantage of the feedback information provided by the user in each interaction step in order to improve raw performance. The application of this scheme requires learning techniques that are able to adaptively re-train the system and tune it to user behavior and the specific task considered. Traditional static editing methods filter the training set by applying certain rules in order to eliminate outliers or maintain those prototypes that can be beneficial in classification. This paper presents two new adaptive rank methods for selecting the best prototypes from a training set in order to establish its size according to an external parameter that controls the adaptation process, while maintaining the classification accuracy. These methods estimate the probability of each prototype of correctly classifying a new sample. This probability is used to sort the training set by relevance in classification. The results show that the proposed methods are able to maintain the error rate while reducing the size of the training set, thus allowing new examples to be learned with a few extra computations.

Abreu, J. and Rico-Juan, J. R. (2014). A New Iterative Algorithm for Computing a Quality Approximated Median of Strings based on Edit Operations. Pattern Recognition Letters, 36:74–80, (Impact Factor: 1.551 - Q2 - JCR). [ bib ]

This paper presents a new algorithm that can be used to compute an approximation to the median of a set of strings. The approximate median is obtained through the successive improvements of a partial solution. The edit distance from the partial solution to all the strings in the set is computed in each iteration, thus accounting for the frequency of each of the edit operations in all the positions of the approximate median. A goodness index for edit operations is later computed by multiplying their frequency by the cost. Each operation is tested, starting from that with the highest index, in order to verify whether applying it to the partial solution leads to an improvement. If successful, a new iteration begins from the new approximate median. The algorithm finishes when all the operations have been examined without a better solution being found. Comparative experiments involving Freeman chain codes encoding 2D shapes and the Copenhagen chromosome database show that the quality of the approximate median string is similar to benchmark approaches but achieves a much faster convergence.
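
A minimal sketch of the iterative-improvement idea, far simpler than the paper's frequency-and-goodness-index scheme: starting from the set string with the lowest summed distance, single-character edits are tried and kept whenever they reduce the total edit distance to the set.

    # Sketch: greedy hill-climbing on the summed edit distance to the whole set.
    def levenshtein(a, b):
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = cur
        return prev[-1]

    def total_distance(s, strings):
        return sum(levenshtein(s, t) for t in strings)

    def approximate_median(strings, alphabet):
        best = min(strings, key=lambda s: total_distance(s, strings))
        best_cost, improved = total_distance(best, strings), True
        while improved:                       # stop when no single edit helps
            improved = False
            for i in range(len(best) + 1):
                candidates = [best[:i] + c + best[i:] for c in alphabet]           # insertions
                if i < len(best):
                    candidates += [best[:i] + c + best[i + 1:] for c in alphabet]  # substitutions
                    candidates += [best[:i] + best[i + 1:]]                        # deletion
                for cand in candidates:
                    cost = total_distance(cand, strings)
                    if cost < best_cost:
                        best, best_cost, improved = cand, cost, True
        return best

    strings = ["karolin", "kathrin", "karlin", "carolin"]
    print(approximate_median(strings, alphabet=set("".join(strings))))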

Abreu, J. and Rico-Juan, J. R. (2013). An improved fast edit approach for two-string approximated mean computation applied to OCR. Pattern Recognition Letters, 34(5):496–504, (Impact Factor: 1.226 - Q2 - JCR). [ bib | http ]

This paper presents a new fast algorithm for computing an approximation to the mean of two strings of characters representing a 2D shape and its application to a new Wilson-based editing procedure. The approximate mean is built up by including some symbols from the two original strings. In addition, a greedy approach to this algorithm is studied, which allows us to reduce the time required to compute an approximate mean. The new dataset editing scheme relaxes the criterion for deleting instances proposed by the Wilson editing procedure. In practice, not all instances misclassified by their near neighbors are pruned. Instead, an artificial instance is added to the dataset in the hope of successfully classifying the instance in the future. The new artificial instance is the approximated mean of the misclassified sample and its same-class nearest neighbor. Experiments carried out over three widely known databases of contours show that the proposed algorithm performs very well when computing the mean of two strings, and outperforms methods proposed by other authors. In particular, the low computational time required by the heuristic approach makes it very suitable when dealing with long length strings. Results also show that the proposed preprocessing scheme can reduce the classification error in about 83% of trials. There is empirical evidence that using the greedy approximation to compute the approximated mean does not affect the performance of the editing procedure.

Rico-Juan, J. R. and Iñesta, J. M. (2012a). Confidence voting method ensemble applied to off-line signature verification. Pattern Analysis and Applications, 15(2):113–120, (Impact Factor: 0.814 - Q3 - JCR). [ bib ]

In this paper, a new approach to off-line signature verification is proposed, based on two-class classifiers using an ensemble of expert decisions. Different methods for extracting sets of local and global features from the target sample are detailed. A normalisation by confidence voting is also used in order to decrease the final equal error rate (EER). Each set of features is processed by a single expert and, in the other approach proposed, the decisions of the individual classifiers are combined using weighted votes. Experimental results are given using a subcorpus of the large MCYT signature database for random and skilled forgeries. The results show that the weighted combination significantly outperforms the individual classifiers. The best EERs obtained were 6.3% in the case of skilled forgeries and 2.3% in the case of random forgeries.

Rico-Juan, J. R. and Iñesta, J. M. (2012b). New rank methods for reducing the size of the training set using the nearest neighbor rule. Pattern Recognition Letters, 33(5):654–660, (Impact Factor: 1.226 - Q2 - JCR). [ bib | http ]

Some new rank methods to select the best prototypes from a training set are proposed in this paper in order to establish its size according to an external parameter, while maintaining the classification accuracy. The traditional methods that filter the training set in a classification task, like editing or condensing, apply certain rules to the set in order to remove outliers or keep those prototypes that help in the classification. In our approach, new voting methods are proposed to compute the probability of each prototype helping to classify a new sample correctly. This probability is the key to sorting the training set, so a relevance factor from 0 to 1 is used to select the best candidates for each class whose accumulated probabilities are less than that parameter. This approach makes it possible to select the number of prototypes necessary to maintain or even increase the classification accuracy. The results obtained in different high dimensional databases show that these methods maintain the final error rate while reducing the size of the training set.

Abreu, J. and Rico-Juan, J. R. (2011). Characterization of contour regularities based on the Levenshtein edit distance. Pattern Recognition Letters, 32:1421–1427, (Impact Factor: 1.034 - Q3 - JCR). [ bib ]

This paper describes a new method for quantifying the regularity of contours and comparing them (when encoded by Freeman chain codes) in terms of a similarity criterion which relies on information gathered from the Levenshtein edit distance computation. The criterion used allows subsequences to be found from the minimal cost edit sequence that specifies an alignment of contour segments which are similar. Two external parameters adjust the similarity criterion. The information about each similar part is encoded by strings that represent an average contour region. An explanation of how to construct a prototype based on the identified regularities is also reviewed. The reliability of the prototypes is evaluated by replacing contour groups (samples) with new prototypes used as the training set in a classification task. In this way, the size of the data set can be reduced without significantly affecting its representational power for classification purposes. Experimental results show that this scheme achieves a reduction in the size of the training data set of about 80% while the classification error only increases by 0.45% in one of the three data sets studied.

Rico-Juan, J. R. and Abreu, J. I. (2010). A new editing scheme based on a fast two-string median computation applied to OCR. In Hancock, E. R., Wilson, R. C., Ilkay, T. W., and Escolano, F., editors, Structural, Syntactic, and Statistical Pattern Recognition, number 6218 in Lecture Notes in Computer Science, pages 748–756. Springer. [ bib ]

This paper presents a new fast algorithm to compute an approximation to the median between two strings of characters representing a 2D shape, and its application to a new classification scheme to decrease its error rate. The median string results from the application of certain edit operations from the minimum cost edit sequence to one of the original strings. The new dataset editing scheme relaxes the criterion used to delete instances proposed by the Wilson Editing Procedure. In practice, not all instances misclassified by their near neighbors are pruned. Instead, an artificial instance is added to the dataset expecting to successfully classify the instance in the future. The new artificial instance is the median of the misclassified sample and its same-class nearest neighbor. The experiments over two widely used datasets of handwritten characters show that this preprocessing scheme can reduce the classification error in about 78% of trials.

Rico-Juan, J. R. (2009). Creating synchronised presentations for mobile devices using open source tools. In Proceedings of the 2009 International Conference on e-Learning, e-Business, Enterprise Information Systems and e-Government, pages 50–52, Las Vegas, Nevada, USA. CSREA Press. [ bib | .pdf ]

In this paper, we describe a way to create synchronized presentations for mobile devices using only open-source tools. In the framework of higher education, it is important to provide students with flexible and interactive resources as the time assigned to laboratories or lectures decreases. Nowadays, students often have one or more mobile devices such as mobile phones, smartphones, PDAs (Personal Digital Assistants), etc. This gives teachers the opportunity to create resources for this kind of device. On the other hand, open-source software offers an interesting alternative for creating educational resources, using either a single tool or a combination of them. The main idea here is to describe a procedure to create presentations combining PDF files as slides, audio files with detailed explanations and Flash video files (.swf) for demos. We describe in detail how to integrate these individual components to create a high-quality presentation, based on vectorial components, with small result files, which can also be played on mobile devices. In contrast to commercial tools, our approach does not use special interfaces or formats and it allows presentations to be exported to formats compatible with other tools in the future. Our proposal also allows one to work with conventional tools to create slides (such as PowerPoint, OpenOffice.org Impress or LaTeX), since the final slides are exported to PDF, and to use standard audio tools to create audio (WAV, OGG and MP3 are supported). Video can be included just by converting the original file to SWF (Flash video) format. In order to make use of the educational resources, we just need a mobile device with a web browser and a Flash plug-in installed; therefore, the result can be easily distributed through a web server or as a package stored locally on the device.

Abreu, J. I. and Rico-Juan, J. R. (2009). Contour regularity extraction based on string edit distance. In Pattern Recognition and Image Analysis. IbPRIA 2009, Lecture Notes in Computer Science, pages 160–167, Póvoa de Varzim, Portugal. Springer. [ bib | .pdf ]

In this paper, we present a new method for constructing prototypes that represent a set of contours encoded by Freeman chain codes. Our method builds new prototypes taking into account similar segments shared between contour instances. The similarity criterion is based on the Levenshtein edit distance. We also outline how to apply our method to reduce a data set without appreciably affecting its representational power for classification purposes. Experimental results show that our scheme can achieve a compression of about 50% while the classification error increases only by 0.75%.

Rico-Juan, J. R. and Iñesta, J. M. (2007). Normalisation of Confidence Voting Methods Applied to a Fast Handwritten OCR Classification. In Kurzynski, M., Puchala, E., Wozniak, M., and Zolnierek, A., editors, Computer Recognition Systems 2, number 45 in Advances in Soft Computing, pages 405–412, Wroclaw, Poland. Springer. [ bib | .pdf ]

In this work, a normalisation of the weights used to combine classifier decisions, based on Euclidean distance as a similarity measure, is presented. This normalisation is used by confidence voting methods to decrease the final error rate in an OCR task. Different features are extracted from the characters. Each set of features is processed by a single classifier, and the decisions of the individual classifiers are then combined using weighted votes, following different techniques. The error rates obtained are as good as, or slightly better than, those obtained using Freeman chain codes as contour representation and the string edit distance as similarity measure, but the complexity and classification time decrease dramatically.
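
A minimal sketch of this kind of combination scheme, assuming each base classifier reports one Euclidean distance per class; the inverse-distance normalisation shown is one plausible choice among the confidence voting variants the paper considers, not the exact scheme used.

    # Minimal sketch: each base classifier reports one Euclidean distance per
    # class (e.g. the distance to its nearest prototype of that class);
    # distances are turned into confidences, normalised to sum to one per
    # classifier, and then added across classifiers. The inverse-distance
    # normalisation is one plausible choice, not the only scheme studied.
    import numpy as np

    def normalised_confidences(distances, eps=1e-9):
        """distances: per-class Euclidean distances reported by one classifier."""
        conf = 1.0 / (np.asarray(distances, dtype=float) + eps)
        return conf / conf.sum()

    def combine(classifier_distances):
        """classifier_distances: one per-class distance list per classifier."""
        total = sum(normalised_confidences(d) for d in classifier_distances)
        return int(np.argmax(total))    # index of the winning class

    # Two classifiers (two feature sets), three classes:
    print(combine([[2.0, 5.0, 9.0], [4.0, 3.5, 8.0]]))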

Rico-Juan, J. R. and Carrasco, R. C. (2007b). How to do easy video presentations using open source tools. In International Technology, Education and Development Conference (INTED), volume 1, pages 30–31, Valencia, (Spain). International Association of Technology, Education and Development (IATED). [ bib | http ]

In this paper, we describe a flexible approach to creating video presentations using open-source tools. In the new ECTS framework, the time that students spend in a laboratory or classroom is reduced and, therefore, it is important to support them with materials that are more flexible and interactive than classical electronic papers or books. Open-source programs are a good alternative for creating educational resources. It is often not possible to produce a whole video presentation with a single tool, but it is possible to combine different tools to do so. We implement a method to create video presentations from a PDF file (slides), audio files and Flash (.swf) video files. We describe in detail how to integrate these individual components to automatically create a high-quality video presentation with small output files. With commercial tools, we may only use the special interfaces and formats supported by the tool; as a consequence, we cannot export every presentation to files compatible with other presentation tools, and if we cannot export our previous presentations it becomes difficult to switch to a new, better tool. The method described here solves these problems. It allows us to work with traditional tools to create slides (PowerPoint, OpenOffice Impress or LaTeX), provided that the final slides are exported to PDF. We can use any audio tool to create the audio for each slide in different formats (WAV, OGG and MP3 are supported). If we want to include a video, we need to convert it to SWF (Shockwave Flash) format. The result requires only a web browser with a Flash plug-in, so it can be distributed on standard media such as a web server, a CD or a DVD.

Rico-Juan, J. R. and Carrasco, R. C. (2007a). How to create an efficient audiovisual slide presenter. In IADAT-e2007, 4th IADAT International Conference on Education, volume 1, pages 40–43, Palma de Mallorca, (Spain). International Association for the Development of Advances in Technology (IADAT). [ bib ]

In this paper, we describe a flexible tool to create synchronised presentations using only open-source tools. In the framework of the new European Credit Transfer System (ECTS), it is even more important to provide students with flexible and interactive resources as the time assigned to laboratories or lectures decreases. Open-source software offers an interesting alternative for creating educational resources, sometimes using a single tool but often using a combination of them. Here, we describe a procedure to create audiovisual presentations combining PDF files (slides), audio files, Flash video files (.swf) and Flash video streaming (.flv). We describe in detail how to integrate these individual components to automatically create a high-quality presentation, that is, one based on vectorial components with small output files. It also allows video or video streaming to be integrated into single slides. In contrast to commercial tools, this tool does not use special interfaces or formats, and it allows presentations to be exported to formats compatible with other (future) presentation tools. Our tool also allows one to work with traditional tools to create slides (such as PowerPoint, OpenOffice Impress or LaTeX), provided that the final slides are exported to PDF, and to use standard audio tools to create the audio (WAV, OGG and MP3 are supported). Video can be included simply by converting the original file to SWF (Flash video) or FLV (Flash video streaming) format. To make use of the educational resource, we only need a web browser with a Flash plug-in installed; therefore, the result can easily be distributed through a web server, a CD or a DVD.

Rico-Juan, J. R. and Iñesta, J. M. (2006a). Edit Distance for Ordered Vector Sets: A Case of Study. In Yeung, D., Kwok, J. T., Fred, A., Roli, F., and de Ridder, D., editors, Structural, Syntactic, and Statistical Pattern Recognition, number 4109 in Lecture Notes in Computer Science, pages 200–207, Hong Kong, China. Springer. [ bib | .pdf ]

Digital contours in a binary image can be described as an ordered vector set. In this paper, an extension of the string edit distance is defined to compute it between a pair of ordered sets of vectors. In this way, the differences between shapes can be computed in terms of editing costs. In order to achieve efficiency, a dominant point detection algorithm should be applied, removing redundant data before coding shapes into vectors. This edit distance can be used in nearest neighbour classification tasks. The advantages of this method applied to isolated handwritten character classification are shown, compared to similar methods based on string or tree representations of the binary image.
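
The sketch below shows one plausible instantiation of such an extension, with substitution cost equal to the Euclidean distance between the two vectors and insertion/deletion cost equal to the norm of the affected vector; these particular weights are illustrative, not necessarily the ones used in the paper.

    # Illustrative edit distance between two ordered sets of 2D vectors:
    # substitution costs the Euclidean distance between the two vectors, while
    # insertion/deletion costs the norm of the affected vector. These weights
    # are a plausible choice, not necessarily the ones used in the paper.
    import math

    def vec_edit_distance(A, B):
        norm = lambda v: math.hypot(v[0], v[1])
        diff = lambda u, v: math.hypot(u[0] - v[0], u[1] - v[1])
        n, m = len(A), len(B)
        d = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = d[i - 1][0] + norm(A[i - 1])
        for j in range(1, m + 1):
            d[0][j] = d[0][j - 1] + norm(B[j - 1])
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d[i][j] = min(d[i - 1][j] + norm(A[i - 1]),                   # delete A[i-1]
                              d[i][j - 1] + norm(B[j - 1]),                   # insert B[j-1]
                              d[i - 1][j - 1] + diff(A[i - 1], B[j - 1]))     # substitute
        return d[n][m]

    # Two small contours described as ordered vectors between dominant points:
    print(vec_edit_distance([(1, 0), (0, 1), (-1, 1)], [(1, 0), (0, 2), (-1, 1)]))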

Rico-Juan, J. R. and Iñesta, J. M. (2006b). An edit distance for ordered vector sets with application to character recognition, volume 1, chapter 4, pages 54–62. Computer Vision Center. [ bib | .ps ]

In this paper, a new algorithm to describe a binary image as an ordered vector set is presented. An extension of the string edit distance is defined to compute it between a pair of ordered sets of vectors. This edit distance can be used in nearest neighbour classification tasks. The advantages of this method applied to isolated handwritten character classification are shown, compared to similar methods based on string or tree representations of the binary image.

Rico-Juan, J. R., Calera-Rubio, J., and Carrasco, R. C. (2005). Smoothing and Compression with Stochastic k-testable Tree Languages. Pattern Recognition, 38(9):1420–1430, (Impact Factor: 2.607 - Q1 - JCR). [ bib ]

In this paper, we describe a generalization of k-gram models to stochastic tree languages. These models are based on the k-testable class, a subclass of the languages recognizable by ascending tree automata. One of the advantages of this approach is that the probabilistic model can be updated in an incremental fashion. Another feature is that backing-off schemes can be defined. As an illustration of their applicability, they have been used to compress tree data files at a better rate than string-based methods.
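
In the spirit of a 2-testable model, the toy sketch below estimates relative frequencies of tree "forks" (a node label together with the sequence of its children's labels) and uses them to score a tree; the full k-testable construction, with root and subtree statistics and backing-off smoothing, is not reproduced here.

    # Toy model in the spirit of a 2-testable stochastic tree language: count
    # "forks" (a node label plus the sequence of its children's labels) over a
    # sample and estimate expansion probabilities by relative frequency. Trees
    # are nested tuples (label, child, child, ...). Root/subtree statistics
    # and backing-off smoothing are not reproduced.
    from collections import defaultdict

    def forks(tree):
        label, children = tree[0], tree[1:]
        yield (label, tuple(c[0] for c in children))
        for c in children:
            yield from forks(c)

    def train(sample):
        counts, totals = defaultdict(int), defaultdict(int)
        for t in sample:
            for label, kids in forks(t):
                counts[(label, kids)] += 1
                totals[label] += 1
        return {f: c / totals[f[0]] for f, c in counts.items()}

    def probability(tree, model):
        p = 1.0
        for f in forks(tree):
            p *= model.get(f, 0.0)    # unseen forks get probability 0 (no smoothing)
        return p

    sample = [('S', ('a',), ('S', ('a',), ('b',))), ('S', ('a',), ('b',))]
    model = train(sample)
    print(probability(('S', ('a',), ('b',)), model))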

Rico-Juan, J. R. and Micó, L. (2004). Finding significant points for a handwritten classification task. In Campilho, A. and Kamel, M., editors, International Conference on Image Analysis and Recognition, number 3211 in Lecture Notes in Computer Science, pages 440–446, Porto, Portugal. Springer. [ bib | .pdf ]

When objects are represented by curves in a plane, highly useful information is conveyed by significant points. In this paper, we compare the use of different mobile windows to extract dominant points of handwritten characters. The error rate and classification time using an edit distance based nearest neighbour search algorithm are compared for two different cases: string and tree representation.
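
A minimal sketch of one such mobile-window criterion: a contour point is kept as dominant when the turning angle measured between its neighbours w positions behind and ahead is large enough. The window size and threshold are arbitrary illustrations, since the paper's contribution is precisely the comparison of different windows.

    # Sketch of a sliding ("mobile") window criterion: a contour point is kept
    # as dominant when the turning angle between the vectors to its neighbours
    # w positions behind and ahead deviates enough from a straight line. Window
    # size and threshold are arbitrary illustrations.
    import math

    def dominant_points(contour, w=3, angle_threshold=math.radians(30)):
        """contour: list of (x, y) points along a closed curve."""
        n, keep = len(contour), []
        for i in range(n):
            x0, y0 = contour[(i - w) % n]
            x1, y1 = contour[i]
            x2, y2 = contour[(i + w) % n]
            a = math.atan2(y1 - y0, x1 - x0)
            b = math.atan2(y2 - y1, x2 - x1)
            turn = abs(math.atan2(math.sin(b - a), math.cos(b - a)))   # wrapped to [0, pi]
            if turn >= angle_threshold:
                keep.append((x1, y1))
        return keep

    # A square sampled along its boundary: only the four corners survive.
    square = [(i, 0) for i in range(5)] + [(4, j) for j in range(1, 5)] + \
             [(i, 4) for i in range(3, -1, -1)] + [(0, j) for j in range(3, 0, -1)]
    print(dominant_points(square, w=2, angle_threshold=math.radians(60)))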

Rico-Juan, J. R. and Micó, L. (2003b). Some Results about the Use of Tree/String Edit Distances in a Nearest Neighbour Classification Task. In Goos, G., Hartmanis, J., and van Leeuwen, J., editors, Pattern Recognition and Image Analysis, number 2652 in Lecture Notes in Computer Science, pages 821–828, Puerto Andratx, Mallorca, Spain. Springer. [ bib | .pdf ]

In pattern recognition there is a variety of applications where patterns are classified using an edit distance. In this paper, we present some results comparing the use of tree and string edit distances in a handwritten character recognition task. Experiments with different numbers of classes and classifiers are carried out.
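
To make the tree side of the comparison concrete, here is a hedged sketch of a simple recursive distance between ordered labelled trees, where the children of compared nodes are aligned with a sequence edit whose substitution cost is the recursive subtree distance; this simplified alignment-style cost is illustrative only and is not the full tree edit distance used in the paper.

    # Hedged sketch of a simple recursive distance between ordered labelled
    # trees: the children of compared nodes are aligned with a sequence edit
    # whose substitution cost is the recursive subtree distance, and whose
    # insertion/deletion cost is the size of the inserted/deleted subtree.
    # This is a simplified alignment-style cost, not the full tree edit
    # distance used in the paper.
    def size(t):
        return 1 + sum(size(c) for c in t[1:])

    def tree_dist(t1, t2):
        label_cost = 0 if t1[0] == t2[0] else 1
        a, b = t1[1:], t2[1:]
        n, m = len(a), len(b)
        d = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = d[i - 1][0] + size(a[i - 1])    # delete whole subtree
        for j in range(1, m + 1):
            d[0][j] = d[0][j - 1] + size(b[j - 1])    # insert whole subtree
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d[i][j] = min(d[i - 1][j] + size(a[i - 1]),
                              d[i][j - 1] + size(b[j - 1]),
                              d[i - 1][j - 1] + tree_dist(a[i - 1], b[j - 1]))
        return label_cost + d[n][m]

    # Trees are nested tuples (label, child, child, ...):
    print(tree_dist(('a', ('b',), ('c', ('d',))), ('a', ('b',), ('c', ('e',)))))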

Carrasco, R. C. and Rico-Juan, J. R. (2003). A similarity between probabilistic tree languages: application to XML document families. Pattern Recognition, 36(9), (Impact Factor: 1.611 - Q1 - JCR). [ bib ]

We describe a general approach to compute a similarity measure between distributions generated by probabilistic tree automata, which may be used in a number of applications in the pattern recognition field. In particular, we show how this similarity can be computed for families of structured (XML) documents. In this case, the use of regular expressions to specify the right part of the expansion rules adds some complexity to the task.

Rico-Juan, J. R. and Micó, L. (2003a). Comparison of AESA and LAESA search algorithms using string and tree edit distances. Pattern Recognition Letters, 24(9):1427–1436, (Impact Factor: 0.809 - Q3 - JCR). [ bib | .pdf ]

Although the success rate of handwritten character recognition using a nearest neighbour technique together with an edit distance is satisfactory, the exhaustive search is expensive. Fast methods such as AESA and LAESA have been proposed to find nearest neighbours in metric spaces. The average number of distances computed by these algorithms is very low and does not depend on the number of prototypes in the training set. In this paper, we compare the behaviour of these algorithms when string and tree edit distances are used.
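
A rough sketch of the AESA idea, assuming a Euclidean distance for simplicity (with strings or trees one would plug in an edit distance instead): all prototype-to-prototype distances are precomputed, and the triangle-inequality lower bound |d(q, s) - d(p, s)| is used both to select the next candidate and to prune the rest, so only a few real distances to the query are computed.

    # Rough sketch of AESA-style nearest-neighbour search in a metric space:
    # all prototype-to-prototype distances are precomputed, and the triangle
    # inequality gives a lower bound |d(q, s) - d(p, s)| used both to pick the
    # next candidate and to prune the rest. Euclidean distance is used here for
    # simplicity; with strings or trees one would plug in an edit distance.
    import math

    def aesa_nn(prototypes, query, dist=math.dist):
        P = range(len(prototypes))
        pre = [[dist(prototypes[i], prototypes[j]) for j in P] for i in P]   # precomputed matrix
        lower = [0.0] * len(prototypes)                     # lower bound on d(query, p)
        alive = set(P)
        best, best_d, n_dist = None, float('inf'), 0
        while alive:
            p = min(alive, key=lambda i: lower[i])          # most promising candidate
            dp = dist(query, prototypes[p]); n_dist += 1    # one real distance computation
            alive.discard(p)
            if dp < best_d:
                best, best_d = p, dp
            for q in list(alive):                           # tighten bounds, prune hopeless ones
                lower[q] = max(lower[q], abs(dp - pre[p][q]))
                if lower[q] >= best_d:
                    alive.discard(q)
        return best, best_d, n_dist

    prototypes = [(0, 0), (1, 1), (5, 5), (6, 5), (9, 2)]
    print(aesa_nn(prototypes, (5.2, 4.8)))   # nearest index, its distance, real distances computed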

Rico-Juan, J. R., Calera-Rubio, J., and Carrasco, R. C. (2002). Stochastic k-testable Tree Languages and Applications. In Adriaans, P., Fernau, H., and van Zaanen, M., editors, Grammatical Inference: Algorithms and Applications. ICGI 2002, number 2484 in Lecture Notes in Artificial Intelligence, pages 199–212, Amsterdam (The Netherlands). Springer-Verlag. [ bib ]

In this paper, we present a natural generalization of k-gram models to stochastic tree languages based on the k-testable class. In this class of models, frequencies are estimated for a probabilistic regular tree grammar which is bottom-up deterministic. One of the advantages of this approach is that the model can be updated in an incremental fashion. This method is an alternative to costly learning algorithms (such as inside-outside-based methods) or algorithms that require larger samples (such as many state merging/splitting methods).

Rico-Juan, J. R. and Calera-Rubio, J. (2002). Evaluation of handwritten character recognizers using tree-edit-distance and fast nearest neighbour search. In Iñesta, J. M. and Micó, L., editors, Pattern Recognition in Information Systems, pages 326–335, Alicante (Spain). ICEIS PRESS. [ bib | .pdf ]

Although the rate of correctly classified prototypes using the tree edit distance is satisfactory, exhaustive classification is expensive. Fast methods such as AESA and LAESA have been proposed to find nearest neighbours in metric spaces. The average number of distances computed by these algorithms does not depend on the number of prototypes. In this paper, we apply these classification algorithms to the task of handwritten character recognition and obtain a low average error rate (2%) and fast classification.

Rico-Juan, J. R. (2001). Inferencia estocástica y aplicaciones de los lenguajes de árboles. PhD thesis, Universidad de Alicante, Departamento de Lenguajes y Sistemas Informáticos. [ bib | .pdf ]

One of the original contributions of this thesis is the definition of a stochastic inference model for k-testable tree languages and its application to compression and classification. Other probabilistic models are also contributed for the tasks of 3D surface compression and off-line handwritten word recognition.

Rico-Juan, J. R., Calera-Rubio, J., and Carrasco, R. C. (2000b). Probabilistic k-testable tree languages. In Oliveira, A. L., editor, Proceedings of the 5th International Colloquium on Grammatical Inference (ICGI 2000), volume 1891 of Lecture Notes in Computer Science, pages 221–228, Lisbon (Portugal). Springer-Verlag. [ bib | .ps.gz | .pdf ]

In this paper, we present a natural generalization of k-gram models to stochastic tree languages based on the k-testable class. In this class of models, frequencies are estimated for a probabilistic regular tree grammar which is bottom-up deterministic. One of the advantages of this approach is that the model can be updated in an incremental fashion. This method is an alternative to costly learning algorithms (such as inside-outside-based methods) or algorithms that require larger samples (such as many state merging/splitting methods).

Rico-Juan, J. R., Calera-Rubio, J., and Carrasco, R. C. (2000a). Lossless compression of surfaces described as points. In Ferri, F. J., Iñesta, J. M., Amin, A., and Pudil, P., editors, Advances in Pattern Recognition, volume 1876 of Lecture Notes in Computer Science, pages 457–461, Berlin. Springer-Verlag. [ bib ]

In many applications, objects are represented by a collection of unorganised points obtained by scanning the surface of the object. In such cases, an efficient way of storing this information is of interest. In this paper, we present an arithmetic compression scheme that uses a tree representation of the data set and allows for better compression rates than general-purpose methods.
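
As a hedged illustration of how an unorganised point set can be given a tree structure whose symbols are then entropy-coded, the sketch below builds an octree by recursive subdivision and emits one 8-bit occupancy code per internal node; the paper's actual tree representation and arithmetic coder are not reproduced here.

    # Hedged sketch of tree-structuring an unorganised point set: an octree
    # built by recursive subdivision, emitting one 8-bit occupancy code per
    # internal node. The resulting symbol stream is the kind of input an
    # arithmetic (entropy) coder would compress; the paper's actual tree
    # representation and coder are not reproduced here.
    def octree_codes(points, lo=(0, 0, 0), size=256, depth=8, codes=None):
        """points: integer (x, y, z) coordinates inside a cube of side `size`."""
        if codes is None:
            codes = []
        if depth == 0 or len(points) <= 1:
            return codes
        half = size // 2
        children = [[] for _ in range(8)]
        for x, y, z in points:
            idx = ((x >= lo[0] + half) << 2) | ((y >= lo[1] + half) << 1) | (z >= lo[2] + half)
            children[idx].append((x, y, z))
        codes.append(sum(1 << i for i in range(8) if children[i]))   # occupancy byte
        for i, pts in enumerate(children):
            if pts:
                child_lo = (lo[0] + half * ((i >> 2) & 1),
                            lo[1] + half * ((i >> 1) & 1),
                            lo[2] + half * (i & 1))
                octree_codes(pts, child_lo, half, depth - 1, codes)
        return codes

    surface = [(10, 20, 30), (10, 21, 30), (200, 15, 90)]
    print(octree_codes(surface))   # occupancy-code stream, ready for entropy coding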

Rico-Juan, J. R. (1999b). Off-line cursive handwritten word recognition based on tree extraction and an optimized classification distance. In Torres, M. I. and Sanfeliu, A., editors, Pattern Recognition and Image Analysis: Proceedings of the VII Symposium Nacional de Reconocimiento de Formas y Análisis de Imágenes, volume 3, pages 15–16, Bilbao (Spain). [ bib | .ps.gz | .pdf ]

This paper describes a geometric approach to the difficult off-line cursive handwritten word recognition problem. The method extracts and classifies feature trees from isolated handwritten words, measuring the distance between two trees.

Rico-Juan, J. R. (1999a). Esquemas Algorítmicos. Publicaciones de la Universidad de Alicante. [ bib ]

This book describes two algorithm design schemes: dynamic programming and branch and bound. Each scheme is described from a general point of view, extracting its representative characteristics and deriving a template that is then applied to concrete cases. The book contains a wide variety of examples and solved exercises.


This file was generated by bibtex2html 1.99.