Gait Classification Using Machine Learning for Foot Diseases Diagnosis

Abstract
Person recognition systems based on biometrics have recently attracted a lot of attention in the scientific community. Biometrics is an ever-evolving technology that aspires to perform recognition automatically, rapidly, precisely, and consistently. In recent decades, gait recognition has emerged as a type of biometric identification that focuses on recognizing individuals using personal measures and correlations, such as trunk and limb size, as well as spatio-temporal information linked to intrinsic patterns in individuals' motions. Lower-limb surgery is one of the leading causes of loss of autonomy in patients. An improved rehabilitation process is a vital aspect for care facilities, since it improves both the patient's quality of life and the associated costs of the post-surgery procedure. Proper progress monitoring is critical to the success of a rehabilitation program. In this paper, we employed machine learning methods as classifiers to classify foot diseases and then monitor the progress of the patient's case. Five classifiers were utilized to train and test the EMG dataset in the lower limb: K-Nearest Neighbours (KNN), Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), and Stochastic Gradient Descent (SGD). The experimental results show high accuracy, reaching 99% for both the KNN and RF classifiers and 97% for the DT classifier. The fundamental benefit of the suggested procedures is their high estimation accuracy, which leads to better therapeutic results.


Introduction
At present, well-known biometric recognition methods include fingerprint recognition, iris recognition, face recognition, and DNA identification [1]. In comparison to other biometric recognition methods already employed in the market, the gait recognition method is still in its early phases and offers a larger research space [2]. Furthermore, gait recognition systems have some advantages over other biometric identification models, because compromising such designs is difficult [3]. The challenge stems mostly from the concept's inherent properties, namely, identification based on the silhouette and its mobility, which is particularly difficult to reproduce [4]. The same cannot be said for other strategies, such as disguising one's face from the system [5]. Furthermore, unlike iris and fingerprint recognition models, gait recognition models do not require high-resolution photographs or specialized equipment to be successful [6]. In related work on the lower-limb muscles, reported trials have achieved a high estimation accuracy of 98.67%.
A tool for comparing the results of surface electromyography in healthy people and people with lower-limb pathologies was proposed by E. Meza P, M. Trujillo, and A. Acosta [18], where they employed a Support Vector Machine (SVM) to categorize electromyography signals, because SVM models are robust to overfitting. The EMG Dataset in Lower Limb from the UCI Machine Learning repository was used for the research. Preprocessing, feature extraction, training, and validation were the four stages of the analysis. The Support Vector Machine model obtained a high accuracy of 96.7%, based on results acquired by algorithms with different kernels.
Y. Choo et al. [19] examined the viability of machine learning approaches for predicting the need for an ankle-foot orthosis (AFO). In a retrospective analysis, they recruited 474 stroke patients. Demographic and clinical data were gathered as input data at the time of transfer to the rehabilitation centre (16.20 and 6.02 days) and six months after stroke onset. The Deep Neural Network (DNN) model obtained an area under the curve (AUC) of 0.887. The AUCs for the logistic regression and random forest models were 0.855 and 0.845, respectively. The outcomes imply that machine learning technologies, specifically DNN, can anticipate the requirement for an AFO in stroke patients during recovery.
Then, to determine medication status, S. Aich et al. [20] introduced a system that relies on wearable gait data. Random Forest, Support Vector Machine, K-Nearest Neighbour, and Naïve Bayes classifiers are used to process a combination of statistical and spatiotemporal gait features. The performance of the suggested approach in conjunction with the four aforementioned classifiers was evaluated on a total of 20 Parkinson's disease patients with definite motor fluctuations. The Random Forest classifier achieved an accuracy of 96.72%, surpassing the other classifiers.
Researchers conducted a study to see how well supervised machine learning classifiers performed in identifying sagittal gait patterns in CP children with spastic diplegia. Y. Zhang and Y. Ma [21] used 200 children with spastic diplegia CP to create gait parameters and characterize the major kinematic aspects of each child's gait. Seven supervised machine learning methods were used to classify gait: Naïve Bayes (NB), Artificial Neural Network (ANN), Decision Tree (DT), Random Forest (RF), K-Nearest Neighbours (KNN), Discriminant Analysis, and Support Vector Machine (SVM). According to the findings, the ANN technique has the highest prediction accuracy (93.5%), followed by the DT, SVM, and RF methods, all of which achieved an accuracy greater than 77.9%.
By utilizing a foot-pressure database gathered with the GAITRite walkway system, G. Guodong et al. [22] used computer vision technology to categorize pathology-related variations in gait in young children. They also looked into the possibility of estimating age based on foot placement, because it changes with children's development. According to the findings, the GAITRite data can be utilized to classify normal and problematic gaits. Support Vector Machine (SVM) and Random Forest (RF) classifiers were employed to categorize gait; SVM's accuracy was 94.36%, whereas RF's accuracy was 97.50%.

Research Method
This study uses machine learning techniques to automatically classify foot-drop rehabilitation data. K-Nearest Neighbours (KNN), Random Forest (RF), Decision Tree (DT), Logistic Regression (LR), and Stochastic Gradient Descent (SGD) were utilized as machine learning techniques to attain this goal. Machine learning-based classifiers are thus used to evaluate the input signals in order to determine which classifier is the best and most significant for this task. Fig. (1) and Algorithm (1) are diagrams of the proposed system. The proposed system consists of several phases, which are discussed in detail later: pre-processing, a feature extraction stage comprising six types of statistical features, a stage that reduces the features obtained in the previous step, and finally classification of the data using intelligent algorithms that rely on machine learning.

Dataset Description
The signals in this database come from 11 patients with knee abnormalities and 11 subjects previously classified as normal. Lower-limb activity is assessed through three actions: walking, leg extension from a seated posture, and leg flexion. This dataset is available at the following website: http://archive.ics.uci.edu/ml/datasets/emg+dataset+in+lower+limb#.

Protocol
The 11 male subjects had a variety of knee issues that had previously been identified by a physician. The activation of the knee muscles is assessed through three activities: walking, leg extension from a sitting position, and leg flexion. Data were collected using a goniometer on the knee and four electrodes (vastus medialis, semitendinosus, biceps femoris, and rectus femoris).

Instrumentation
These data were collected using Datalog MWX8 equipment by Biometrics, with 8 digital channels and 4 analog channels, of which 4 were used for sampling SEMG and 1 for goniometry, at 14-bit resolution and a 1000 Hz sampling frequency, and transmitted in real time to the Datalog software via a Bluetooth adapter.

Data Configuration
The total number of electrodes is four, one for each channel in the time series (1 to 4). Each subject performed five trials, or motion repetitions, in each series.

Pre-processing Phase
Raw data must be turned into well-formed data sets before data mining techniques can be applied; raw data are typically incomplete and poorly structured.

Apply Bandpass Filter
A bandpass filter is characterized by the centre (peak) frequency of the band it transmits; it blocks components at higher and lower frequencies. Applied to the raw sEMG signals, band-pass filtering suppresses low-frequency motion artifacts and baseline drift as well as high-frequency noise, giving better control over variations in recording conditions that may occur over time.
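As an illustration, a zero-phase Butterworth band-pass filter could be applied to each channel as follows. The 20–450 Hz band is a commonly used sEMG range and an assumption here, not a setting taken from the paper; the 1000 Hz sampling frequency matches the dataset description.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(signal, low_hz=20.0, high_hz=450.0, fs=1000.0, order=4):
    """Zero-phase Butterworth band-pass filter for one raw sEMG channel."""
    nyq = 0.5 * fs
    b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="band")
    return filtfilt(b, a, signal)

# A 100 Hz test tone plus a 2 Hz motion-artifact drift; filtering removes the drift.
fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
raw = np.sin(2 * np.pi * 100 * t) + 2.0 * np.sin(2 * np.pi * 2 * t)
clean = bandpass(raw, fs=fs)
```

`filtfilt` runs the filter forward and backward, so the filtered signal has no phase lag relative to the raw recording.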

Features Extraction Phase
This phase consists of six types of features extraction as follows:

Maximum and Minimum Features Extraction
The maximum and minimum values of each signal window are kept as dominant features: they bound the amplitude range of the signal and thus preserve much of its information.

Average Features Extraction
This is the average (mean) value of the data set, calculated as shown in Eq. (1):

mean = (1/N) Σ x_i, i = 1..N   (1)

where N is the number of samples and x_i is the i-th sample value.

Mode Features Extraction
The mode is the most frequently occurring value in the data set; it is always one of the values in the data set.

Variance Features Extraction
The variance measures how far the data set's values deviate from the mean on average; the population variance is the average of the squared deviations. Eq. (2) gives the calculation of the variance:

σ² = (1/N) Σ (x_i − mean)², i = 1..N   (2)

Standard Deviation Features Extraction
The standard deviation is the square root of the variance, as shown in Eq. (3):

σ = √(σ²)   (3)
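The statistical features above (maximum, minimum, average, mode, variance, and standard deviation) can be sketched for a single signal window; the helper and the window values below are illustrative, not taken from the paper.

```python
import numpy as np
from collections import Counter

def extract_features(window):
    """Statistical features of one signal window (illustrative helper)."""
    w = np.asarray(window, dtype=float)
    return {
        "max": w.max(),
        "min": w.min(),
        "mean": w.mean(),                               # Eq. (1)
        "mode": Counter(w.tolist()).most_common(1)[0][0],
        "variance": w.var(),                            # Eq. (2), population variance
        "std": w.std(),                                 # Eq. (3), sqrt of variance
    }

feats = extract_features([1.0, 2.0, 2.0, 3.0, 4.0])
```

Applying such a helper to every window of every channel turns the raw time series into fixed-length feature vectors for the classifiers.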

Features Reduction
Feature reduction, or dimensionality reduction, is the process of reducing the number of features in a resource-intensive calculation without losing crucial information. When the number of features is reduced, the number of variables decreases, making the computer's task easier and faster. Feature reduction comprises two operations: feature selection and feature extraction. It can be performed using a variety of methods; the methods used in this system are:

Principal Component Analysis (PCA)
PCA is used to decrease dimension vectors in order to improve recognition. PCA is a powerful statistical technique for reducing characteristics. It is frequently used within feature extraction to reduce raw data dimensionality to low-dimensional orthogonal features while retaining information about significant features and process variable correlation patterns [23].
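A minimal sketch of PCA-based feature reduction with scikit-learn; the matrix size and the number of retained components below are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 24))     # hypothetical 100 windows x 24 features

# Project onto the 5 orthogonal directions of greatest variance.
pca = PCA(n_components=5)
X_reduced = pca.fit_transform(X)
ratio = pca.explained_variance_ratio_   # variance kept per component
```

`explained_variance_ratio_` makes the trade-off explicit: one can choose the smallest number of components that still retains the desired share of the total variance.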

Singular Value Decomposition (SVD)
The SVD approach determines and sorts the dimensions along which data points fluctuate the most. This corresponds to the third method to look at SVD: once we've discovered where the most variation exists, we can acquire the best approximation of the original data points with fewer dimensions. As a result, SVD can be considered an approach for data minimization [24].
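A sketch of SVD-based data minimization with NumPy; the matrix size and the rank k kept here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))      # hypothetical feature matrix

# Thin SVD: X = U @ diag(S) @ Vt, singular values sorted descending.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Keeping only the top-k singular directions gives the best rank-k
# approximation of X, i.e. fewer dimensions with minimal information loss.
k = 3
X_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
```

Because the singular values are sorted, truncating after the first k of them keeps exactly the directions along which the data points fluctuate the most.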

Data splitting
It is the process of separating the given data into two parts, generally for the purpose of cross-validation. The first part of the data is used to build a prediction model, while the second is used to evaluate the model's results. Cross-validation is one of the strategies for guaranteeing proper generalization and avoiding overtraining. The primary concept is to divide the data set T into two subsets: one for training and the other for testing the final model. The major purpose of cross-validation is to get a consistent and trustworthy estimate of the model's outputs.
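The 70/30 holdout split used in this work can be sketched as follows; the toy feature matrix and labels are placeholders for the extracted EMG features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))          # placeholder feature matrix
y = rng.integers(0, 2, size=100)       # placeholder class labels

# 70% of the samples build the model, the held-out 30% evaluate it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
```

Fixing `random_state` makes the split reproducible, which matters when comparing several classifiers on the same partition.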

Classification Phase
The stage of data classification using machine learning is the most crucial in the process. This section explains the five classifiers employed in this work, which are as follows:

K-Nearest Neighbours (KNN) Classifier
The k-NN classification algorithm employs the training data whenever a fresh sample needs to be classified. The training samples are vectors in a multidimensional feature space, each with a class label, and the classifier decides the new sample's class by looking at a number of neighbours. As indicated in Fig. (2), a constant k determines the number of neighbouring samples considered. Consider the case below: the classifier looks at the five points nearest to the new instance (the dark dot) and classifies it by assigning the label of the most common class among the k training samples nearest to that query point, which in this case is blue, because the five closest neighbours contain more blue-labelled samples than red-labelled samples. This classifier is capable of handling both small and large datasets, and it has a strong tolerance for noisy training data. The first stage is to choose the optimal k value, followed by a distance computation technique. The most common k values are 1, 3, 5, 7, and so on [25].

Figure 2. KNN Classifier [25]
The Euclidean distance measure, given in Eq. (4), was used in this work:

d(x, y) = √( Σ (x_i − y_i)² ), i = 1..q   (4)

where x represents the new instance, y represents a stored training instance, and q represents the number of features.
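A minimal KNN sketch with k = 5 and the Euclidean metric, mirroring the figure's five-neighbour vote; the toy points and cluster positions are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two toy clusters: class 0 ("red") near the origin, class 1 ("blue") near (2, 2).
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                    [2.0, 2.0], [2.1, 1.9], [1.8, 2.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# k = 5 with the Euclidean metric (Minkowski distance with p = 2), as in Eq. (4).
knn = KNeighborsClassifier(n_neighbors=5, p=2)
knn.fit(X_train, y_train)
pred = knn.predict([[1.9, 2.1]])   # majority vote of the 5 nearest neighbours
```

For the query point near (2, 2), the five nearest neighbours contain three "blue" samples and two "red" ones, so the majority vote assigns the blue class.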

Random Forest (RF) Classifier
It is a versatile learning model that can address both regression and classification problems. During the training phase, it constructs numerous decision trees and aggregates the forecasts of all of them. The target variable is continuous in regression, while it is categorical in classification problems. Random Forest is a data analysis algorithm with a good accuracy score. The goal is to discover a function f(X) that can predict Y; the prediction function is determined by minimizing the expected value of the loss L(Y, f(X)), as in Eq. (5) [26]:

f = argmin E[ L(Y, f(X)) ]   (5)
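A sketch of a Random Forest classifier; the synthetic data and hyperparameters stand in for the EMG feature vectors and are not the paper's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class problem standing in for the EMG feature vectors.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# The forest builds many decision trees and aggregates their votes.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_tr, y_tr)
acc = rf.score(X_te, y_te)
```

Each tree sees a bootstrap sample of the data and a random subset of features, which is what makes the averaged ensemble more robust than a single tree.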

Decision Tree (DT) Classifier
Decision trees are a powerful tool that may be applied to a wide range of applications, including machine learning, image processing, and pattern detection. A DT is a sequential model that chains a group of fundamental tests, in which each test compares a numerical attribute to a threshold value in an effective and consistent manner. Such conceptual tests are far simpler to build than the numerical weights connecting the nodes of a neural network. Because of its ease of analysis and accuracy across a large range of data types, the decision tree has a broad range of applications. The construction of a basic decision tree is shown, for example, in Fig. (3), where the root attribute is pressure and the leaf nodes are of three types: Normal, FM1, and FM2. The classification rules can be read directly from the root-to-leaf paths [27].

Logistic Regression (LR) Classifier
Logistic regression models the likelihood of a discrete outcome given an input variable. A binary outcome, such as true/false or yes/no, is the most common logistic regression model; multinomial logistic regression can be used to model events with more than two discrete outcomes. In classification jobs, logistic regression is a useful analysis method for assessing whether a new sample fits best into a category. Because problems such as threat detection are classification problems, logistic regression is also a helpful analytic technique in cyber security [28].
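A small logistic-regression sketch for a binary outcome; the one-dimensional toy feature and its threshold are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary outcome over a single hypothetical input variable.
X = np.array([[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

lr = LogisticRegression()
lr.fit(X, y)

# The model outputs a likelihood for each discrete outcome.
proba = lr.predict_proba([[3.0]])[0]
```

Unlike a hard classifier, `predict_proba` exposes the modelled likelihood of each class, which is the quantity logistic regression actually estimates.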

Stochastic Gradient Descent (SGD) Classifier
Stochastic Gradient Descent (SGD) is the most commonly utilized method for training models on data. Training on distributed systems is prevalent due to the longer training times generated by larger networks and datasets, and distributed SGD versions, particularly asynchronous and synchronous SGD, are often utilized. In terms of communication, asynchronous SGD is efficient, but its accuracy suffers as a result of delayed parameter updates. Despite its benefits, synchronous SGD becomes more communication-intensive as the number of nodes grows [29].
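In scikit-learn, an SGD-trained linear classifier can be sketched as follows; the synthetic data are placeholders, and the feature scaling step is added because SGD is sensitive to feature magnitudes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# A linear model whose weights are updated one mini-batch at a time;
# standardizing the inputs keeps the stochastic updates well-conditioned.
sgd = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))
sgd.fit(X_tr, y_tr)
acc = sgd.score(X_te, y_te)
```

The same estimator also supports `partial_fit`, which is what makes SGD attractive for large or streaming datasets.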

Proposed System Implementation
The following are the two phases of the system's implementation:

Training
The first stage of the suggested method is to train the dataset using holdout splitting, in which the bulk of the data (70%) goes into the training phase and the remaining data (30%) goes into the testing phase.

Testing
The suggested system's testing phase is the second stage. The remaining data (30%) will be handled in the same way as the training data, as previously stated.

Performance Metrics
The performance of the recommended machine learning model may be measured using a variety of criteria. The metrics are as follows:

Precision
Precision is the percentage of all samples predicted to be from class i that are really from class i, calculated as:

Precision = TP / (TP + FP)

When class data are unbalanced, accuracy alone is not always sufficient to assess the model's effectiveness. If the model predicts everything as class 0, and there are 99 examples of class 0 and 1 example of class 1, the accuracy is 99%, but when precision is taken into consideration the model performs badly: the precision for class 1 will be zero in this case.

Recall
Sensitivity is another name for it. Recall is the proportion of all samples that are really class i that are predicted to be class i, defined as shown in Eq. (8):

Recall = TP / (TP + FN)   (8)

As a result, class 1 in the prior example will also have zero recall. The goal of our model is to maximize both precision and recall.

F-score
It is a combination of recall and precision, defined as their harmonic mean:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
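The imbalanced example discussed above can be checked numerically (scaled down here to a 9-to-1 class ratio): a model that predicts everything as class 0 gets zero precision, recall, and F-score for class 1.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# 9 samples of class 0, 1 sample of class 1; the model predicts everything as 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10

p = precision_score(y_true, y_pred, pos_label=1, zero_division=0)
r = recall_score(y_true, y_pred, pos_label=1, zero_division=0)
f = f1_score(y_true, y_pred, pos_label=1, zero_division=0)
```

Accuracy on this data is 90%, yet all three class-1 metrics are zero, which is exactly why precision, recall, and F-score are reported alongside accuracy for unbalanced classes.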

Results and Discussion
In this section, the results obtained by using machine learning classifiers to classify the medical data described in detail in section 3.1 are presented. The work aims to help diagnose the disease and then find solutions in a better and faster way, in order to rehabilitate the leg in many pathological cases. To achieve this, five classifiers were used: KNN, RF, DT, LR, and SGD; the results are shown in Table (1) and Fig. (4). The results indicate the superiority of the KNN and RF classifiers in classifying gait and distinguishing healthy subjects from patients: the accuracy of these two classifiers reached 99%, and the accuracy of the DT classifier reached 97%, while the accuracy of LR and SGD was 82%, the lowest among the system's classifiers. The high accuracy is due to the feature extraction and feature reduction methods explained in detail in the sections above; this process also makes the system fast at classification.

Figure (4). Evaluation Metrics of the Classifiers
A comparison can now be made between the research results and the results of previous studies that also relied on machine learning algorithms as classifiers. When compared to earlier research, it is clear that the suggested technique produced superior results; in particular, the method of [18] worked on the same set of data, and the proposed system achieved higher accuracy (99% versus 96.7%).

Conclusions
The goal of this work was to create and evaluate an automated gait analysis system that employs lower-body motion data and machine learning techniques to differentiate between healthy and sick patients. The detection and diagnosis of foot diseases are extremely accurate thanks to the employment of five machine learning algorithms as classifiers. The findings show the high accuracy of machine learning classifiers: K-Nearest Neighbours (KNN) and Random Forest (RF) give an accuracy rate of 99%, the Decision Tree (DT) classifier gives an accuracy of 97%, and both the Logistic Regression (LR) and Stochastic Gradient Descent (SGD) classifiers share the same accuracy rate of 82%, with a running time not exceeding a few seconds.