Materials
Dataset
As shown in Table 1, the dataset used in this study comprises five attack classes generated by attacks targeting IoT devices: DDoS, fuzzing, OS fingerprinting, port scanning, and DoS. Together with the normal instances, this gives six classes in total. The percentage share of each class and its size in megabytes are also provided34. Tables 2 and 3 give further information about the dataset, and each attack type is described below:
Table 2 Categorization of datasets and their size and percentage distribution.
Table 3 The features of the dataset and their description.
Denial of Service (DoS): DoS attacks were generated with Hping3 and launched by a malicious host against the server or an IoT device, using spoofed IP addresses among other methods. Spoofed IP addresses increase resource usage because each attack packet initiates a new flow rule. Different payload sizes (100, 500, and 1000 bytes) were combined with different packet transmission rates (5000, 6000, 8000, and 10,000).
Distributed DoS (DDoS): DDoS attacks, which are prevalent in SDN-enabled IoT networks, exhaust system resources and reduce system availability. They abuse the communication protocols used by authorized users to exhaust the network's computational resources, and they were generated under the same conditions as the DoS attacks.
Port scanning: Port-scanning attacks were generated with Nmap: a rogue host targeted IoT devices or servers and scanned all port numbers from 0 to 65,535.
OS fingerprinting: OS fingerprinting attacks were also generated with Nmap. A single rogue host targeted the server, first scanning for open ports and then launching the attack through those ports.
Fuzzing attacks: Fuzzing attacks were generated with the Boofuzz program, which sends random data to the victim until it fails in order to expose its weaknesses. HTTP- and FTP-based attacks were launched from one compromised host, and the randomly generated input fields respected the expected input format of both FTP and HTTP connections. For example, the HTTP version and random request URLs were fuzzed with the CONNECT, OPTIONS, TRACE, PUT, DELETE, and HEAD methods.
Data pre-processing
The first step of this study was to analyze and remove inconsistent data, which could slow the convergence of the learning algorithms, so that the data fed into the machine learning models is of sufficiently high quality. Two techniques were employed: the removal of lower and upper extreme values, and smoothing by locally weighted linear regression35.
Eliminating discrepant data
The rationale for removing inconsistent values is that extreme values often indicate erroneous data. Such outliers commonly stem from data acquisition problems, malfunctioning sensors, or communication interference. As a result, erroneous samples can enter the dataset, such as zero readings while the machine is idle or readings beyond the expected sensor ranges. Such samples are typically eliminated so that they do not impede the machine learning process.
In this study, limits were established for each variable, and samples falling outside them were replaced with values within the acceptable range. The limits were determined using the following equations:
$${Q}_\frac{1}{4}=\frac{1}{4}\left(n+1\right)$$
(1)
$${Q}_\frac{3}{4}=\frac{3}{4}\left(n+1\right)$$
(2)
$$IQR={Q}_\frac{3}{4}-{Q}_\frac{1}{4}$$
(3)
$${{\text{Down}} }_{{\text{limit}} }={Q}_\frac{1}{4}-K\times IQR$$
(4)
$$U{p}_{{\text{limit}} }={Q}_\frac{3}{4}+K\times IQR$$
(5)
Here, \(n\) is the number of samples of a variable, and Eqs. (1) and (2) give the positions of the first and third quartiles, \({Q}_\frac{1}{4}\) and \({Q}_\frac{3}{4}\), within its sorted samples; the interquartile range (IQR) is the difference \({Q}_\frac{3}{4}-{Q}_\frac{1}{4}\). The variable's acceptable lower limit, the "down limit," is obtained by subtracting \(K\) times the IQR from \({Q}_\frac{1}{4}\). Its upper limit, the "up limit," is obtained by adding \(K\) times the IQR to \({Q}_\frac{3}{4}\), where \(K\) is a constant that controls the width of the limits. The limits are computed separately for every variable, and data points whose values fall outside the range \(\left[Dow{n}_{{\text{limit}} },U{p}_{\text{limit}}\right]\) are replaced with the average.
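The limit computation in Eqs. (1) to (5) and the replacement step can be sketched with NumPy as follows. This is a minimal illustration under assumptions, not the study's code: fractional quartile positions are linearly interpolated, K defaults to the common value 1.5 (the text does not report K), and "the average" is taken to be the mean of the in-range samples of the same variable. The helper names quartile, replace_outliers, and clean_columns are hypothetical.

```python
import numpy as np


def quartile(sorted_x: np.ndarray, p: float) -> float:
    """Value at 1-based position p*(n+1) of sorted data (Eqs. 1 and 2).

    Fractional positions are linearly interpolated and clamped to [1, n];
    this interpolation rule is an assumption, not taken from the text.
    """
    n = len(sorted_x)
    pos = min(max(p * (n + 1), 1.0), float(n))
    lo = int(np.floor(pos))          # 1-based index of the lower neighbour
    frac = pos - lo
    if lo >= n:
        return float(sorted_x[-1])
    return float(sorted_x[lo - 1] + frac * (sorted_x[lo] - sorted_x[lo - 1]))


def replace_outliers(x, k: float = 1.5):
    """Replace values outside [Down_limit, Up_limit] (Eqs. 3 to 5) with an average.

    k = 1.5 is a hypothetical default; the mean of the in-range samples is
    used as "the average" (assumption).
    """
    x = np.asarray(x, dtype=float)
    s = np.sort(x)
    q1, q3 = quartile(s, 0.25), quartile(s, 0.75)
    iqr = q3 - q1                                  # Eq. (3)
    down, up = q1 - k * iqr, q3 + k * iqr          # Eqs. (4) and (5)
    outside = (x < down) | (x > up)
    cleaned = x.copy()
    cleaned[outside] = x[~outside].mean()
    return cleaned, (down, up)


def clean_columns(X, k: float = 1.5) -> np.ndarray:
    """Apply the limits separately to every variable (column) of a 2-D array."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([replace_outliers(X[:, c], k)[0] for c in range(X.shape[1])])


values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
cleaned, limits = replace_outliers(values)
print(limits)        # (-5.5, 16.5)
print(cleaned[-1])   # 5.0, the mean of the in-range values 1..9
```

With ten samples, the quartile positions are 2.75 and 8.25, so Q1 = 2.75, Q3 = 8.25, and IQR = 5.5; only the value 100 lies outside the limits and is replaced.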
Data smoothing
Locally weighted (or locally estimated) scatterplot smoothing, LOWESS/LOESS, introduced by Cleveland, is a nonparametric regression method. Robust locally weighted regression smooths the data points \(\left({x}_{i},{y}_{i}\right),i=1,\cdots ,n\): the fitted value \({z}_{k}\) at \({x}_{k}\) is the value at \({x}_{k}\) of a polynomial fitted to the data by weighted least squares, where the weight of \(\left({x}_{i},{y}_{i}\right)\) is large if \({x}_{i}\) is close to \({x}_{k}\) and small otherwise. The number of samples used in each local approximation \({z}_{k}\) is a model parameter, and the degree of the polynomial is another. The polynomial degree is frequently 1, in which case each local fit is a linear regression.
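A from-scratch sketch of robust, degree-1 LOWESS follows, using Cleveland's usual choices of tricube neighbourhood weights and bisquare robustness weights. The span frac (fraction of the n samples used in each local fit) and the number of robustness passes iters are assumptions, since the text does not report them; their defaults mirror those of statsmodels.nonparametric.smoothers_lowess.lowess, which a production pipeline would more likely call. The function name lowess here is illustrative.

```python
import numpy as np


def lowess(x, y, frac: float = 2 / 3, iters: int = 3) -> np.ndarray:
    """Robust degree-1 LOWESS: returns the fitted value z_k at every x_k.

    frac and iters are assumed defaults; the text does not report them.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = min(n, max(2, int(np.ceil(frac * n))))   # samples per local approximation
    robust_w = np.ones(n)                        # robustness weights, start at 1
    z = np.empty(n)
    for _ in range(iters + 1):
        for k in range(n):
            d = np.abs(x - x[k])
            h = max(np.sort(d)[r - 1], 1e-12)    # distance to the r-th nearest sample
            u = np.clip(d / h, 0.0, 1.0)
            w = (1 - u ** 3) ** 3 * robust_w      # tricube: large near x_k, 0 beyond h
            sw = w.sum()
            if sw == 0:
                z[k] = y[k]
                continue
            xm = (w * x).sum() / sw
            ym = (w * y).sum() / sw
            sxx = (w * (x - xm) ** 2).sum()
            slope = (w * (x - xm) * (y - ym)).sum() / sxx if sxx > 0 else 0.0
            z[k] = ym + slope * (x[k] - xm)       # weighted least-squares line at x_k
        resid = y - z
        s = np.median(np.abs(resid))
        if s <= 1e-12 * np.abs(y).max():         # residuals negligible: stop early
            break
        b = np.clip(resid / (6 * s), -1.0, 1.0)
        robust_w = (1 - b ** 2) ** 2              # bisquare: down-weight large residuals
    return z


x = np.arange(20.0)
y = 2 * x + 1
y[10] += 100                                      # one corrupted sample
plain = lowess(x, y, frac=0.8, iters=0)           # no robustness pass
robust = lowess(x, y, frac=0.8, iters=2)          # robust[10] returns close to 21
```

In the usage lines, the single corrupted sample pulls the non-robust fit toward it, whereas the first robustness pass gives that sample zero weight, so the smoothed value at x = 10 returns to the underlying line.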
Methods
ZOA for hyperparameter tuning
This section introduces the zebra optimization algorithm (ZOA), a nature-inspired optimisation technique, and describes it mathematically; here, ZOA is applied to choose the best features.
1. Idea and Concept
Zebras are equine species native to eastern and southern Africa. They are widely recognized for their distinctive black-and-white striped coats. The stripes on their bodies and necks, usually arranged vertically, serve two functions: they camouflage the animals from potential predators and deter biting flies from feeding on them. Zebras have a body length of 2.1 to 3 m, a tail length of 0.41 to 0.81 m, and a shoulder height of 1.1 to 1.6 m, and they weigh between 175 and 450 kg. Despite their size and weight, zebras can sprint quickly when necessary, thanks to their long, slender legs. Like other equids, they have long necks, a single toe on each foot, and a head shape suited to grazing on grass at ground level36.
Foraging and defending against predators are two behaviors crucial to zebras' social lives in the wild. The leader zebra guides the rest of the herd in its search for food, enabling the group to reach food sources more efficiently, and the herd follows this pioneering zebra as it migrates across the savanna37.
Fleeing in a zigzag pattern is zebras' first line of defense against predators. On rare occasions, however, they may group together to intimidate or confuse the predator. These two behavioral patterns are the main inspiration for the mathematical models of the proposed ZOA architecture.
2. Initialization
ZOA is a population-based approach whose population consists of zebras. Mathematically, each zebra represents a candidate solution to the problem, and the zebras' habitat represents the problem's search space.
Each zebra's position in the search space determines the values of the decision variables. Each zebra can therefore be represented as a vector whose elements are the values of these variables, and the whole zebra population can be represented mathematically as a matrix. The initial positions of the zebras in the search space are generated completely at random. The ZOA population matrix is defined in Eq. (6).
$$P={\left[\begin{array}{c}{P}_{1}\\ \vdots \\ {P}_{i}\\ \vdots \\ {P}_{N}\end{array}\right]}_{N\times m}={\left[\begin{array}{ccccc}{p}_{1,1}& \cdots & {p}_{1,j}& \cdots & {p}_{1,m}\\ \vdots & \ddots & \vdots & & \vdots \\ {p}_{i,1}& \cdots & {p}_{i,j}& \cdots & {p}_{i,m}\\ \vdots & & \vdots & \ddots & \vdots \\ {p}_{N,1}& \cdots & {p}_{N,j}& \cdots & {p}_{N,m}\end{array}\right]}_{N\times m}$$
(6)
where \(P\) represents the zebra population, \({P}_{i}\) is the \(i\)th zebra (candidate solution), \({p}_{i,j}\) is the value of the \(j\)th problem variable proposed by the \(i\)th zebra, \(N\) is the number of zebras (search agents), and \(m\) is the number of decision variables to be adjusted. Because every zebra represents a possible solution to the optimisation problem, the fitness function can be evaluated at the solution proposed by each zebra. The resulting fitness values are collected in the vector of Eq. (7).
$$\begin{array}{c}F={\left[\begin{array}{c}{F}_{1}\\ \vdots \\ {F}_{i}\\ \vdots \\ {F}_{N}\end{array}\right]}_{N\times 1}={\left[\begin{array}{c}F\left({P}_{1}\right)\\ \vdots \\ F\left({P}_{i}\right)\\ \vdots \\ F\left({P}_{N}\right)\end{array}\right]}_{N\times 1}\end{array}$$
(7)
where \(F\) is the column vector of fitness values and \({F}_{i}\) is the fitness value of the \(i\)th zebra. Comparing these values measures the quality of the candidate solutions and identifies the best one: for a minimization problem, the zebra with the lowest fitness value is the best candidate. Because the zebras' positions, and hence their fitness values, change at every iteration, the best solution must be re-identified at each iteration.
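The initialization and fitness evaluation of Eqs. (6) and (7) can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the uniform distribution, the box bounds lb and ub, the helper names init_population and evaluate, and the sphere test objective are all hypothetical.

```python
import numpy as np


def init_population(N: int, m: int, lb, ub, rng: np.random.Generator) -> np.ndarray:
    """Eq. (6): N zebras (rows) by m decision variables (columns).

    Positions are drawn uniformly in [lb, ub); the uniform distribution and
    the box bounds are assumptions, since the text only says "completely random".
    """
    lb = np.asarray(lb, dtype=float) * np.ones(m)
    ub = np.asarray(ub, dtype=float) * np.ones(m)
    return lb + rng.random((N, m)) * (ub - lb)


def evaluate(P: np.ndarray, objective) -> np.ndarray:
    """Eq. (7): fitness vector F with F_i = F(P_i)."""
    return np.array([objective(p) for p in P])


def sphere(x: np.ndarray) -> float:
    """Hypothetical test objective (minimisation); not from the text."""
    return float(np.sum(x ** 2))


rng = np.random.default_rng(0)
P = init_population(N=10, m=3, lb=-5.0, ub=5.0, rng=rng)
F = evaluate(P, sphere)
best = int(np.argmin(F))   # for minimisation, the lowest F_i marks the best zebra
```

For a maximisation problem, np.argmax would pick the best zebra instead; everything else stays the same.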
In each iteration, the members of the ZOA population are updated by simulating two natural behaviours of zebras:
i. Foraging activity.
ii. Defensive strategies against predators.
3. Stage I: Foraging activity
In the first stage, population members are updated by simulating the foraging behaviour of zebras. Zebras mainly eat grasses and sedges, but they also eat buds, fruits, bark, roots, and leaves when these resources are scarce. Depending on the type and amount of vegetation, zebras spend 60% to 80% of their time feeding. The plains zebra is a pioneer grazer: by eating the taller, less nutritious upper canopy of grasses, it exposes the shorter, more nutrient-dense grasses that other animals need. In ZOA, the best member of the population is called the “zebra leader,” and it guides the other members toward its position in the search space. The zebras' position updates during foraging are modeled by Eqs. (8) and (9).
$${p}_{i,j}^{\text{new },S1} ={p}_{i,j}+r\cdot \left(Z{L}_{j}-I\cdot {p}_{i,j}\right)$$
(8)
$${P}_{i} =\left\{\begin{array}{ll}{P}_{i}^{\text{new },S1},& {F}_{i}^{\text{new },S1}<{F}_{i}\\ {P}_{i},& \text{ else}\end{array}\right.$$
(9)
where \({P}_{i}^{\text{new},S1}\) is the new position of the \(i\)th zebra after the first stage, \({p}_{i,j}^{\text{new},S1}\) is its \(j\)th dimension value, \({F}_{i}^{\text{new},S1}\) is its fitness value, \(ZL\) is the zebra leader (the best member of the population), \(Z{L}_{j}\) is its \(j\)th dimension, \(r\) is a random number in \([0, 1]\), and \(I=\mathrm{round}\left(1+\mathrm{rand}\right)\), where rand is a random number in \([0, 1]\). Therefore, \(I\) takes the value 1 or 2. When \(I=2\), the population's movements change much more markedly.
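A minimal sketch of one Stage I pass, Eqs. (8) and (9), follows. It assumes a minimisation problem, draws \(r\) independently for each dimension (Eq. (8) does not say whether \(r\) is a scalar or a vector), keeps the leader fixed during the pass, and clips candidates to the search bounds; these choices, the function name foraging_stage, and the sphere objective are illustrative assumptions.

```python
import numpy as np


def foraging_stage(P, F, objective, lb, ub, rng):
    """One Stage I (foraging) pass of ZOA, Eqs. (8) and (9), for minimisation."""
    P, F = P.copy(), F.copy()
    N, m = P.shape
    ZL = P[np.argmin(F)].copy()            # zebra leader: best member before the stage
    for i in range(N):
        r = rng.random(m)                  # r in [0, 1), drawn per dimension (assumption)
        I = np.round(1 + rng.random())     # I = round(1 + rand), so I is 1 or 2
        cand = P[i] + r * (ZL - I * P[i])  # Eq. (8)
        cand = np.clip(cand, lb, ub)       # stay inside the search space (assumption)
        f_cand = objective(cand)
        if f_cand < F[i]:                  # Eq. (9): keep the move only if it improves F_i
            P[i], F[i] = cand, f_cand
    return P, F


def sphere(x):
    """Hypothetical test objective, not from the text."""
    return float(np.sum(x ** 2))


rng = np.random.default_rng(1)
P0 = rng.uniform(-5.0, 5.0, size=(8, 3))
F0 = np.array([sphere(p) for p in P0])
P1, F1 = foraging_stage(P0, F0, sphere, -5.0, 5.0, rng)
```

Because of the greedy rule in Eq. (9), no zebra's fitness can get worse during the pass; with \(I=1\) the leader's own candidate equals its current position, while with \(I=2\) it moves toward the origin.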
4. Stage II: Anti-predator defensive techniques
In this stage, the positions of the ZOA population members in the search space are updated by simulating the zebras' defensive strategies against predators. Lions are the main predators of zebras, and zebras approaching water also risk becoming prey for crocodiles. Zebras defend themselves differently against different kinds of predators. When a lion attacks, a zebra's best defence is to flee at full speed, with abrupt turns, in a zigzag pattern. Against smaller predators such as hyenas and dogs, zebras respond more aggressively, grouping together to confuse and frighten the attacker. Within the ZOA framework, the two strategies described below are assumed to occur with equal probability.
Exploitation (defensive technique against lions)
When attacked by a lion, a zebra escapes by moving from its current position to a nearby one, out of the lion's reach, so that the lion cannot catch it. This strategy is represented mathematically by Eq. (10).
$${p}_{i,j}^{\text{new},S2}={p}_{i,j}+R\cdot \left(2r-1\right)\cdot \left(1-\frac{t}{T}\right)\cdot {p}_{i,j},\quad {P}_{S}\le 0.5$$
(10)
where \({p}_{i,j}^{\text{new},S2}\) is the \(j\)th dimension value of the \(i\)th zebra's new position in the second stage, \(t\) is the current iteration, \(T\) is the maximum number of iterations, \(R\) is a constant equal to 0.01, \(r\) is a random number in \([0, 1]\), and \({P}_{S}\) is a random number in \([0, 1]\) that decides which of the two strategies is applied.
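A short derivation from Eq. (10) shows why this strategy is a local search that narrows as the iterations progress. Since \(0\le r\le 1\) implies \(\left|2r-1\right|\le 1\), each exploitation step is bounded by

$$\left|{p}_{i,j}^{\text{new},S2}-{p}_{i,j}\right|=R\left|2r-1\right|\left(1-\frac{t}{T}\right)\left|{p}_{i,j}\right|\le R\left(1-\frac{t}{T}\right)\left|{p}_{i,j}\right|$$

For example, with \(R=0.01\) at the halfway point \(t=T/2\), a zebra moves by at most \(0.005\left|{p}_{i,j}\right|\) in each dimension, and at \(t=T\) the step vanishes.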
Exploration (defensive techniques against other predators)
When a hungry predator attacks one of the zebras in the group, the other zebras approach it and try to form a defensive barrier to frighten and confuse the attacker. Equation (11) is the mathematical representation of this technique. A zebra's updated position is accepted only if it improves the value of the fitness function; this updating criterion is represented by Eq. (12).
$${p}_{i,j}^{\text{new },S2} ={p}_{i,j}+r\cdot \left(A{Z}_{j}-I\cdot {p}_{i,j}\right),{P}_{S}>0.5$$
(11)
$${P}_{i} =\left\{\begin{array}{ll}{P}_{i}^{\text{new },S2},& {F}_{i}^{\text{new },S2}<{F}_{i}\\ {P}_{i},& \text{ else}\end{array}\right.$$
(12)
where \({P}_{i}^{\text{new},S2}\) is the new position of the \(i\)th zebra in the second stage, \({F}_{i}^{\text{new},S2}\) is its fitness value, \(AZ\) denotes the position of the attacked zebra, and \(A{Z}_{j}\) is its \(j\)th dimension value. The pseudocode of ZOA is given in Algorithm 1.
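As a complementary, hedged illustration of the pseudocode in Algorithm 1, the sketch below strings Eqs. (6) to (12) together into one runnable loop for a minimisation problem. It is not the authors' implementation. Assumptions not stated in the text: candidates are clipped to the bounds, the attacked zebra \(AZ\) is a randomly chosen population member, \(r\) is drawn per dimension, and the greedy acceptance rule is applied to both Stage II strategies. The name zoa, its default parameters (N = 20, T = 200), and the sphere objective are hypothetical.

```python
import numpy as np


def zoa(objective, lb, ub, m, N=20, T=200, R=0.01, seed=0):
    """Minimal ZOA sketch for minimisation: Stage I (Eqs. 8-9), Stage II (Eqs. 10-12)."""
    rng = np.random.default_rng(seed)
    lb = np.asarray(lb, dtype=float) * np.ones(m)
    ub = np.asarray(ub, dtype=float) * np.ones(m)
    P = lb + rng.random((N, m)) * (ub - lb)          # Eq. (6): random initial herd
    F = np.array([objective(p) for p in P])           # Eq. (7)

    def accept(i, cand):
        """Greedy update of Eqs. (9) and (12): keep cand only if it lowers F_i."""
        cand = np.clip(cand, lb, ub)                  # bound handling (assumption)
        f_cand = objective(cand)
        if f_cand < F[i]:
            P[i], F[i] = cand, f_cand

    for t in range(1, T + 1):
        ZL = P[np.argmin(F)].copy()                   # zebra leader for this iteration
        for i in range(N):                            # Stage I: foraging, Eq. (8)
            I = np.round(1 + rng.random())
            accept(i, P[i] + rng.random(m) * (ZL - I * P[i]))
        for i in range(N):                            # Stage II: defence
            if rng.random() <= 0.5:                   # P_S <= 0.5: flee lions, Eq. (10)
                step = R * (2 * rng.random(m) - 1) * (1 - t / T) * P[i]
                accept(i, P[i] + step)
            else:                                     # P_S > 0.5: confront attacker, Eq. (11)
                AZ = P[rng.integers(N)].copy()        # attacked zebra, random (assumption)
                I = np.round(1 + rng.random())
                accept(i, P[i] + rng.random(m) * (AZ - I * P[i]))
    best = int(np.argmin(F))
    return P[best].copy(), float(F[best])


def sphere(x):
    """Hypothetical test objective, not from the text."""
    return float(np.sum(x ** 2))


x_best, f_best = zoa(sphere, lb=-5.0, ub=5.0, m=5)
```

Because every update passes through the greedy rule of Eqs. (9) and (12), the best fitness found never increases from one iteration to the next, which makes the sketch easy to sanity-check on a simple objective.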
Algorithm 1