An optimal guidance method for free-time orbital pursuit-evasion game

Published: 2025-06-14 11:36:24 Source: 心得體會


ZHANG Chengming, ZHU Yanwei, YANG Leping, and ZENG Xin

College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410000, China

Abstract: With the development of space rendezvous and proximity operations (RPO) in recent years, scenarios involving noncooperative spacecraft are attracting growing attention from researchers. A method based on the costate normalization technique and deep neural networks is presented to generate the optimal guidance law for the free-time orbital pursuit-evasion game. Firstly, the 24-dimensional problem given by differential game theory is transformed into a three-parameter optimization problem through a dimension-reduction method that guarantees the uniqueness of the solution for a specific scenario. Secondly, a closed-loop interactive mechanism involving feedback is introduced to the deep neural networks to generate a precise initial solution. The optimal guidance law is thus obtained efficiently and stably by an optimization algorithm initialized by the deep neural networks. Finally, a comparison with two other methods and Monte Carlo simulations demonstrate the efficiency and robustness of the proposed optimal guidance method.

Keywords: orbital pursuit-evasion, differential game, dimension reduction, deep neural networks.

The orbital pursuit-evasion game (OPEG) is an active research topic in fields involving non-cooperative spacecraft. In an OPEG, one spacecraft is called the pursuer and the other the evader, and both are able to maneuver and make decisions. The objectives of the two spacecraft are completely opposite and vary with the mission, for example the fuel consumption, the overall time, or multiple combined objectives. Generally, the OPEG can be seen as a continuous dynamic adversarial problem with bilateral control, and it leads to a two-point boundary value problem (TPBVP) [1]. Compared with rendezvous with a cooperative target, solving this problem requires considering the states and control strategies of both spacecraft, which increases the problem dimension and decreases the efficiency of generating the optimal guidance law. In this paper, a method is proposed to reduce the dimension of the problem and improve the accuracy of the initial guess so that the OPEG can be solved more efficiently and stably.

The theoretical basis of OPEG is differential game theory, pioneered by Isaacs [2]. Thereafter, differential game theory was widely applied and studied in air combat [3] and missile interception problems [4]. As for orbital pursuit-evasion games, Pontani and Conway [5] transformed the game into a nonlinear programming problem and presented a semi-direct collocation method based on the analytical necessary conditions. However, the equivalence between the proposed nonlinear programming problem and the original problem is unresolved [6]. Among recent studies, Ye et al. [7] studied the influence of the thrust configurations of the two spacecraft and concluded that the open-loop control law could fail for some specific initial states. Some studies focus on improving the computational efficiency of solving OPEG. Jagat and Sinclair [8] applied the state-dependent Riccati equation method to extend the standard linear-quadratic differential game theory and obtained a near-optimal nonlinear control law. The simulation results demonstrated that this method performed better than the linear control law. Hafer et al. [9] proposed a sensitivity analysis technique to obtain the state sensitivity matrix and the constraint sensitivity matrix, and then used a homotopy method, starting from the solution of the gravity-less problem, to obtain the solution more efficiently than the genetic algorithm (GA). However, this method was not successful in some cases due to singular behavior. Stupik et al. [10] used the particle swarm optimization (PSO) algorithm, which requires no initial guess and no gradient information, to obtain the solution, but the computation cost is up to several hours, which is unacceptable. To improve the performance of PSO, Kriging, a group of interpolation and extrapolation techniques, was introduced. However, many optimal trajectories near the original state with random perturbations were necessary, and only coplanar cases were demonstrated in [10]. In terms of dimension reduction, Li et al. [11] presented a dimension-reduction method that formulates a set of four-dimensional nonlinear equations and obtained the solution more efficiently based on the differential evolution (DE) algorithm and Newton's method. In this method, the four-dimensional solution is not unique for a given initial condition, which expands the solution space unnecessarily, and the initial guesses are provided by DE, which is random and inefficient. In this paper, a three-parameter optimization problem (3POP) is formulated to reconstruct the free-time OPEG and reduce the dimension based on the normalization of costates, which restricts the solution space of the costates to a unit spherical surface and makes the application of deep neural networks (DNNs) possible.

In addition to the reconstruction of OPEG, this paper also addresses initial guess generation for numerical methods, as in [10]. When the initial guess is very close to the exact solution, the optimization process is much more efficient. Using DNNs to supply initial guesses for trajectory optimization problems in aerospace dynamics has been studied preliminarily in recent years. Yin et al. [12] adopted a normalization technique in the optimal control problem, which reduced the solution space, and applied the homotopy technique to overcome nonsmoothness and discontinuity. Cheng et al. [13] introduced DNNs to generate good initial guesses for the indirect method in generating optimal asteroid landing trajectories, where a solution continuation process reconstructed the original problem and enabled real-time control. With regard to research on OPEG, Wu et al. [14] applied DNNs to fit the relationship between the real-time state and the control variables, which were output by the DNNs directly without any guarantee of convergence, and the cumulative errors increased over time. On the whole, using DNNs to obtain the control variables directly in this field is difficult and unreliable. In this paper, the DNNs are embedded in a closed-loop interactive mechanism involving feedback to generate a precise initial guess for the 3POP, and an optimization algorithm is then used to obtain the exact solution, which is much more reliable.

The remainder of this paper is organized as follows. The original free-time OPEG is first modeled in Section 2 based on the differential game. The dimension-reduction and DNNs (DRD) algorithm, which consists of the model reconstruction, the initial guess generation and the exact solution, is then presented in detail in Section 3. In Section 4, numerical simulations of different cases and a comparison among different methods are presented. In addition, the results of Monte Carlo simulations demonstrate the robustness of the method. Finally, conclusions are given in Section 5.

In this section, the dynamics of the spacecraft in the free-time orbital pursuit-evasion game are introduced first. Then, based on differential game theory, the free-time OPEG between two spacecraft is established and transformed into a 24-dimensional TPBVP. Finally, the necessary conditions for the solution are presented.

2.1 Dynamics model of relative motion

In the free-time orbital pursuit-evasion game, the two spacecraft are assumed to orbit close to a reference spacecraft that is on a circular orbit. Instead of an inertial coordinate system, the local vertical local horizontal (LVLH) coordinate system, which is anchored to the reference spacecraft, is introduced to describe the states of the two spacecraft. In Fig. 1 and Fig. 2, the LVLH coordinate system is plotted in red lines. The origin of the LVLH system is the reference spacecraft, which moves on the circular orbit. The x axis points from the center of the Earth to the reference spacecraft, the z axis is along the normal vector of the reference orbit plane, and the y axis is determined by the right-hand rule.

Fig.2 Thrust and control variables

The dynamics of the spacecraft in the LVLH coordinate system are described by the Clohessy-Wiltshire (C-W) equations in (1).

where n is the mean angular velocity of the reference orbit and (a_x, a_y, a_z)^T is the acceleration along the coordinate axes.
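For reference, the textbook form of the C-W equations under the axis convention above (x radial, y along-track, z along the orbit normal, taken here in the direction of the orbital angular momentum) can be written as follows. This is an assumption about the content of (1), stated with the symbols n and (a_x, a_y, a_z) defined above.

```latex
\begin{aligned}
\ddot{x} - 2n\dot{y} - 3n^{2}x &= a_x,\\
\ddot{y} + 2n\dot{x} &= a_y,\\
\ddot{z} + n^{2}z &= a_z.
\end{aligned}
```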

Taking the position vector and the velocity vector as the state of the spacecraft, (1) can be rewritten [6] as follows:

where x = (r_x, r_y, r_z, v_x, v_y, v_z)^T is the state of the spacecraft and u = (a_x, a_y, a_z)^T is the acceleration along the coordinate axes. A is the coefficient matrix corresponding to the reference orbit and B is the constant control matrix. The thrust of each spacecraft is assumed to be low, continuous and constant during the game, and the mass of each spacecraft is treated as constant. In practice, the mass of each spacecraft decreases over time, but according to the results in [15] the mass change has little influence on the solution.
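A minimal sketch of the state-space form, assuming the standard C-W coefficients for the axis convention described above (x radial, y along-track, z out-of-plane). The function names `cw_matrices` and `cw_derivative` are illustrative only and do not come from the paper.

```python
def cw_matrices(n):
    """Return (A, B) for x_dot = A x + B u with state (rx, ry, rz, vx, vy, vz).

    Assumes the textbook C-W model about a circular reference orbit
    with mean angular velocity n.
    """
    A = [
        [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
        [3.0 * n * n, 0.0, 0.0, 0.0, 2.0 * n, 0.0],   # radial acceleration
        [0.0, 0.0, 0.0, -2.0 * n, 0.0, 0.0],          # along-track acceleration
        [0.0, 0.0, -n * n, 0.0, 0.0, 0.0],            # out-of-plane acceleration
    ]
    # The control acts on the velocity rows only.
    B = [[0.0] * 3 for _ in range(3)] + [
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]
    return A, B


def cw_derivative(x, u, n):
    """Evaluate x_dot = A x + B u for one spacecraft."""
    A, B = cw_matrices(n)
    return [
        sum(a * xj for a, xj in zip(A[i], x))
        + sum(b * uk for b, uk in zip(B[i], u))
        for i in range(6)
    ]
```

Because A and B are the same for both players, the same function can be evaluated for the pursuer and the evader with their respective states and controls.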

In (3), T is the thrust-to-mass ratio of each spacecraft. α and β are the two angles that control the direction of the thrust, as shown in Fig. 2, where α ∈ [-π/2, π/2] and β ∈ [-π, π].
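The mapping from the two angles to the acceleration vector can be sketched as follows. The convention used here is an assumption suggested by the stated ranges: α is treated as the elevation above the x-y plane and β as the azimuth measured from the x axis. The name `thrust_acceleration` is hypothetical.

```python
import math


def thrust_acceleration(T, alpha, beta):
    """Acceleration (ax, ay, az) for thrust-to-mass ratio T and angles alpha, beta.

    Assumed convention: alpha is the elevation out of the x-y plane,
    beta is the azimuth in the x-y plane measured from the x axis.
    """
    if not (-math.pi / 2 <= alpha <= math.pi / 2):
        raise ValueError("alpha must lie in [-pi/2, pi/2]")
    if not (-math.pi <= beta <= math.pi):
        raise ValueError("beta must lie in [-pi, pi]")
    ca = math.cos(alpha)
    return (T * ca * math.cos(beta), T * ca * math.sin(beta), T * math.sin(alpha))
```

With this parameterization the magnitude of the acceleration is always T, so only the direction is left to the guidance law.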

2.2 Two-player free-time pursuit-evasion game

Based on differential game theory, the dynamic systems of the two players are generally described as

where the subscripts p and e indicate the pursuer and the evader respectively, x_i (i = p, e) is the state vector of the spacecraft and u_i is the control variable of each player. In this paper, (2) is the specific form of (4) and (5).

The initial state and the terminal state are denoted as x_i(t_0) and x_i(t_f), where t_0 is the initial time of the game and t_f is the terminal time. For the generic free-time OPEG, the initial state is given by the problem, but the terminal time and state are unknown. However, the terminal constraint is straightforward: the game ends as soon as the pursuer intercepts the evader. This terminal constraint is required in this paper. If the pursuer fails, there is no saddle-point solution for the two players, and the pursuit-evasion barrier would need further analysis, which is beyond the scope of this paper. The terminal constraint is as follows:

In this game model, the pursuer and the evader have completely opposite objectives: the pursuer aims to minimize the pursuit time and the evader aims to maximize it. Under the assumption that the propellant is sufficient, the terminal time t_f is taken as the payoff function J.

According to [16], the necessary conditions are introduced. First, the Hamiltonian function H and the generalized terminal condition φ are defined as

where λ_i (i = p, e) is the costate of each player. The costate has the same dimension as x_i (i = p, e), but it has no physical meaning and no boundary conditions. τ is the vector of Lagrange multipliers. According to (9), the state equations and the costate equations are

Equations (9)-(16) transform the free-time orbital pursuit-evasion game into a TPBVP in which the initial condition is given by the specific problem.

The dimension of the TPBVP in Section 2 is 24 because of the dimensions of the states and costates of the two players. Several numerical methods have been applied to search for its solutions. However, the 24-dimensional game model is too complex for numerical methods to find accurate solutions efficiently. In [11], this model is transformed into a four-dimensional problem by considering the relationship between λ_p and λ_e. However, the solutions are not unique and the random initial guess supplied by the evolutionary algorithm considerably increases the computational cost. Hence, an efficient and reliable approach named DRD for solving the free-time OPEG is introduced in this paper, and its technical details are described in this section.

Firstly, a 3POP is presented to reconstruct the game model by applying the costate normalization technique [17]. The solution for a specific scenario is then unique, which makes the application of DNNs possible. Then the DNNs are applied to generate a high-precision initial guess for the 3POP based on the interactive structure. Finally, the DRD is formed by combining both methods with an optimization algorithm from NLopt.

3.1 Model reconstruction

The necessary conditions for solving the TPBVP have been given in Section 2. From (15), the values of λ_p and λ_e at the terminal time are

where τ_1, τ_2, τ_3 are three Lagrange multipliers. They are unknown, unbounded and have no physical meaning. These three characteristics are the main difficulties for numerical algorithms.

Furthermore, the differential equations of the costate in (12) are linear and independent of the state. Thus the transition matrix for the costate is Φ(t_0, t) [18].

From (19), once λ is known at a given time, the value of the costate at any time can be expressed as

It is then clear from (17), (18) and (20) that λ_p + λ_e = 0 at any time of the game. Based on (13) and (14), the optimal thrust directions of the two spacecraft are therefore given by the costate at the current time.

where ĉ is the unit vector of the thrust pointing direction, φ is part of the transition matrix Φ and ξ = [τ_1, τ_2, τ_3]^T. The specific form of φ is

In (21), once t_f and ξ are fixed, ĉ is determined; that is, t_f and ξ are the solutions of the free-time OPEG problem.

Therefore, for the specific problem, ξ is the solution for the terminal costates. We assume that there exists another quantity, denoted as ξ′ in (22).

Substituting (22) into (21) gives

Equation (23) demonstrates that, for a fixed t_f, ξ′ and ξ provide the same solution to the problem. From another perspective, when the free-time OPEG is defined by the initial conditions and the problem is solvable, there is a unique solution for t_f but a solution domain of ξ containing infinitely many feasible solutions rather than a unique one, which makes the relationship between the original problem and its solutions intricate. To overcome this weakness, the normalization of ξ is introduced based on (23).

For any ξ in the solution domain of the problem, ξ_0 is defined as the characteristic value of ξ in (24).

For any ξ, ξ_0 can replace ξ in (21) without changing the value of ĉ. The 2-norm of ξ_0 is 1, so the solution domain of ξ_0 is restricted to the unit sphere. Then ξ_0 can be described as follows:

where θ_1 ∈ [-π/2, π/2] and θ_2 ∈ [-π, π].
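The normalization step can be illustrated with a short sketch. It assumes that the characteristic value is the multiplier vector divided by its 2-norm and that the two angles are the usual latitude and longitude on the unit sphere; both choices are consistent with the stated ranges but are assumptions, and the function names are hypothetical.

```python
import math


def normalize(xi):
    """Scale a nonzero multiplier vector to unit 2-norm."""
    norm = math.sqrt(sum(c * c for c in xi))
    if norm == 0.0:
        raise ValueError("the zero vector has no direction")
    return tuple(c / norm for c in xi)


def to_angles(xi0):
    """Unit vector -> (theta1, theta2), theta1 in [-pi/2, pi/2], theta2 in [-pi, pi]."""
    x, y, z = xi0
    theta1 = math.asin(max(-1.0, min(1.0, z)))  # clamp against rounding
    theta2 = math.atan2(y, x)
    return theta1, theta2


def from_angles(theta1, theta2):
    """(theta1, theta2) -> unit vector on the sphere."""
    c1 = math.cos(theta1)
    return (c1 * math.cos(theta2), c1 * math.sin(theta2), math.sin(theta1))
```

Any positive multiple of the same vector maps to the same pair of angles, which is the property that removes the scaling ambiguity of the terminal multipliers.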

Fig. 3 shows that the original solution space of the three costates is the whole three-dimensional space, whereas the new solution space is the unit spherical surface shown in red.

Fig.3 Original and new solution space of costates

Based on (21), ĉ is determined by θ_1, θ_2 and t_f, as in (26).

where η is denoted as

The above analysis shows that the solution of the free-time OPEG problem is η, so the dimension of the problem is reduced to three.

We now return to the TPBVP above. Equation (2) reveals that the dynamic system of the spacecraft is linear and that the control variables of the two players are the same. Thus a new state and a new control variable are introduced in (28) and (29) to simplify the problem, which is given in (30) and (31). The necessary conditions for solving this new problem are similar to those of the TPBVP in Section 2.

From (26), it is clear that the optimal control strategy for the new state at any time is determined by η. Then, based on (30), (31) and the specific initial state, the terminal state x(t_f) and H are determined. When the pursuer intercepts the evader successfully and the pursuit time is minimal, the following conditions hold:

Moreover, the shooting function of the optimization problem is

where k_1 and k_2 are the weighting coefficients, with k_1 > 0 and k_2 > 0. The shooting function corresponds to a nonlinear root-finding problem with three to-be-determined variables in η. In this problem, the parameters θ_1 and θ_2 in η determine the terminal costate. Together with t_f in η, the histories of the costate and control variables are then given by (20) and (23). The histories of the state and the Hamiltonian are thus obtained from (30) and (31), and the shooting function in (34) can be calculated.
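The evaluation chain described above (terminal costate from the two angles, costate history, thrust direction, state history) can be sketched for the miss-distance part of the shooting function. The sketch rests on several assumptions that are not taken from the paper: the new state is the pursuer state minus the evader state, the effective control magnitude `accel_gap` is the difference of the two thrust-to-mass ratios, the terminal costate is the unit vector on the position components with zeros on the velocity components, the thrust points opposite to the velocity costate, and a fixed-step Runge-Kutta scheme is accurate enough. The Hamiltonian term and the weights k_1 and k_2 are left out.

```python
import math


def _matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def _rk4(f, y, h, steps):
    """Fixed-step classical Runge-Kutta integration of y' = f(y)."""
    for _ in range(steps):
        k1 = f(y)
        k2 = f([a + 0.5 * h * b for a, b in zip(y, k1)])
        k3 = f([a + 0.5 * h * b for a, b in zip(y, k2)])
        k4 = f([a + h * b for a, b in zip(y, k3)])
        y = [a + h / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y


def terminal_miss(eta, x0, n, accel_gap, steps=400):
    """Terminal miss distance |r(tf)| of the relative state for eta = (theta1, theta2, tf)."""
    theta1, theta2, tf = eta
    xi0 = [math.cos(theta1) * math.cos(theta2),
           math.cos(theta1) * math.sin(theta2),
           math.sin(theta1)]
    A = [[0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1],
         [3 * n * n, 0, 0, 0, 2 * n, 0], [0, 0, 0, -2 * n, 0, 0],
         [0, 0, -n * n, 0, 0, 0]]
    At = [list(col) for col in zip(*A)]
    h = tf / steps
    # Costate at t = 0: integrate d(lam)/ds = A^T lam over the time-to-go s,
    # starting from the assumed terminal costate [xi0, 0].
    lam0 = _rk4(lambda lam: _matvec(At, lam), xi0 + [0.0, 0.0, 0.0], h, steps)

    def rhs(y):
        x, lam = y[:6], y[6:]
        norm = math.sqrt(sum(c * c for c in lam[3:]))
        xdot = _matvec(A, x)
        if norm > 1e-12:  # direction is undefined when the velocity costate vanishes
            for i in range(3):
                xdot[3 + i] -= accel_gap * lam[3 + i] / norm
        return xdot + [-c for c in _matvec(At, lam)]

    yf = _rk4(rhs, list(x0) + lam0, h, steps)
    return math.sqrt(sum(c * c for c in yf[:3]))
```

In the gravity-less limit n = 0 the thrust direction is constant, so a relative distance of 0.5 with unit `accel_gap` is closed in exactly one time unit, which gives a simple sanity check for the implementation.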

The original OPEG problem has thus been reconstructed into a 3POP whose parameters are defined in (27) and whose shooting function is (34). The shooting success rate and the computational efficiency of the 3POP depend heavily on a good initial guess. Hence, an initial guess generation method that can generate precise and stable guesses is required.

3.2 Initial guess generation

According to Stupik et al. [10], Kriging, which comprises interpolation and extrapolation techniques, was introduced to create a feedback controller whose performance is much better than that of PSO in [10]. However, many optimal results corresponding to the specific problem are necessary, and Kriging is sensitive to the number of data points, whether too few or too many. Thus the generation of the initial guess is neither precise nor stable. With the rapid development of data science and DNNs in recent years, the performance of DNNs in approximating and generalizing the relationships within data cannot be ignored. In this paper, the solutions are unique after the model reconstruction and are fully determined by the initial state of the specific OPEG problem. These two properties suit the application of DNNs well. Thus an initial guess generation method based on DNNs is proposed. Before presenting this method, some issues should be addressed.

First, to obtain an accurate guess, a large number of samples is needed to cover the problem space as fully as possible, which is impractical. Thus in this paper, the problem space is restricted to orbits near geosynchronous orbits, and all samples are generated in this space, as stated in Subsection 3.2.2. Second, the number of samples required by the DNNs is still very large, and generating enough samples by numerical methods costs too much time. To improve the efficiency of sample generation, an interactive structure that combines generation and training is introduced. In this structure, sample generation and network training are performed alternately, and the results of the 3POP are fed back to train the DNNs.

3.2.1 Interactive structure

Generally, the application of DNNs can be divided into two steps. The first step is generating the dataset of samples, which contains the initial states and the exact solutions. The second is training the DNNs with the dataset. For some problems that have been studied extensively in deep learning (DL), such as image recognition, many datasets already exist. However, applications of DNNs in the field of OPEG are rare and there is no mature dataset. Consequently, the samples must first be generated by the 3POP [19]. In this paper, sample generation and network training are merged into an interactive structure, and the data communication in the structure is bidirectional.

Fig. 4 displays the structure in detail. The initial state is generated randomly and input to the DNNs after preprocessing. In the first generation round, the DNNs output a random guess, which is used as the initial guess of the 3POP, and the exact solution is then obtained. When the number of samples reaches a specified value, the first training begins. Because of the limited number of samples, after the first training the DNNs can only provide a rough guess for the following generation round. This does not matter, because the guess is only used as an initial value for the following 3POP, which will converge to the threshold as long as the problem is solvable. The new samples are then generated, and the DNNs are trained again on the buffer, which has been supplemented with the new samples. Thus, as the number of generation and training rounds increases, the guesses become more precise and the generation becomes more and more efficient. After a predetermined number of training rounds, the well-trained DNNs finally supply accurate guesses.
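The alternation between sample generation and training can be outlined in a few lines. This is only a structural sketch: the callables `sample_state`, `guess`, `solve_3pop` and `train` are hypothetical placeholders for the random scenario generator, the current networks, the 3POP solver and one training pass, and the convention that an unsolvable scenario returns `None` is an assumption.

```python
def interactive_training(sample_state, guess, solve_3pop, train,
                         episodes, samples_per_episode, buffer_size):
    """Alternate between generating samples with the solver and training the nets."""
    buffer = []
    for _ in range(episodes):
        for _ in range(samples_per_episode):
            x0 = sample_state()
            eta_guess = guess(x0)                  # rough at first, better later
            eta_exact = solve_3pop(x0, eta_guess)  # refine the guess to the exact solution
            if eta_exact is None:                  # unsolvable scenario: skip it
                continue
            buffer.append((x0, eta_exact))
            if len(buffer) > buffer_size:
                buffer.pop(0)                      # discard the oldest sample
        train(buffer)                              # feed the solver results back
    return buffer
```

Since `guess` is called inside the loop, any improvement made by `train` immediately benefits the next generation round, which is the feedback path of the structure.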

Fig.4 The interactive structure

The networks consist of two independent nets, Net_θ and Net_t. They are established to fit the relationship between the state and [θ_1, θ_2], and between the state and t_f, respectively. Both nets have the same seven inputs corresponding to the states of the problem, denoted as x_0 in (35). In (35), the first six elements of x_0 are the state difference defined in (28) and the last element is the mean angular velocity of the reference orbit.

Compared with a single net with three outputs corresponding to the three parameters, two independent nets eliminate the influence of the loss produced by the other outputs. A mixture of three losses would reduce the precision of the guess for any single parameter. Furthermore, [θ_1, θ_2] reflects the relationship between the three costates at the terminal time, so it is better to place them together in one net, separate from the terminal time t_f.

3.2.2 Data preprocessing

Data preprocessing is a necessary operation to improve the learning performance of the DNNs. According to the aforementioned analysis, the problem space S has been restricted. Furthermore, for convenience of description, the evader's orbit at the initial time is taken as the reference orbit. The eccentricity and argument of perigee of all orbits are set to zero. Table 1 displays the range of S.

Table 1 The problem space S

In Table 1, a is the semi-major axis of the evader at the initial time, and all the other orbital elements of the evader are random within their physically meaningful ranges. Re is the mean radius of the Earth. Moreover, d is the distance between the pursuer and the evader: once the evader is determined, the pursuer is generated within a sphere of maximum radius d. The orders of magnitude of position, velocity and time are significantly different, which would confuse the DNNs, because the influence of the physical quantities on the DNNs' losses would differ greatly and could obliterate the effect of one or more inputs. To address this issue, normalization of the inputs is necessary. In particular, a length normalization coefficient DU and a time normalization coefficient TU are introduced, as shown in Table 2. The velocity normalization coefficient VU then follows, as also shown in Table 2.
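A minimal sketch of the input normalization, assuming that VU is defined as DU/TU and that the mean angular velocity is made dimensionless by multiplying by TU. The values of DU and TU are passed in as arguments rather than taken from Table 2, and the function name is hypothetical.

```python
def normalize_inputs(state_diff, n, DU, TU):
    """Return the seven dimensionless network inputs.

    state_diff: (rx, ry, rz, vx, vy, vz) of the pursuer relative to the evader.
    n: mean angular velocity of the reference orbit.
    """
    VU = DU / TU  # assumed definition of the velocity unit
    position = [c / DU for c in state_diff[:3]]
    velocity = [c / VU for c in state_diff[3:6]]
    return position + velocity + [n * TU]
```

After this step every input is of comparable magnitude, so no single physical quantity dominates the loss.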

Table 2 Normalization coefficients

3.2.3 Training and optimization of DNNs

In Subsection 3.2.1, the interactive structure has been explained in detail. In this subsection, the training algorithm and the optimization of the networks are presented.

The training algorithm is displayed in Algorithm 1. Each step of training is called one episode and the maximum number of episodes is M. The sample buffer R has length N and is stored as a stack. When the buffer is not full, a new sample is pushed to the bottom of the stack. When the buffer overflows, the new sample replaces the sample at the top of the stack, so the old samples are popped. This measure improves the diversity of samples to some extent. Batch is the number of samples chosen randomly from R, which are used to calculate the loss and then to update the parameters of the networks by gradient descent.
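The buffer behavior can be sketched with a bounded double-ended queue. This is a stand-in for the stack described above rather than the authors' data structure: it assumes that the sample discarded on overflow is the oldest one, and the class name `SampleBuffer` is hypothetical.

```python
import random
from collections import deque


class SampleBuffer:
    """Fixed-capacity sample buffer with uniform random mini-batches."""

    def __init__(self, capacity, seed=None):
        self._data = deque(maxlen=capacity)  # the oldest item is dropped when full
        self._rng = random.Random(seed)

    def push(self, sample):
        self._data.append(sample)

    def sample(self, batch):
        """Draw up to `batch` distinct samples uniformly at random."""
        k = min(batch, len(self._data))
        return self._rng.sample(list(self._data), k)

    def __len__(self):
        return len(self._data)
```

Drawing each mini-batch at random from the whole buffer mixes old and new samples, which is one way to obtain the sample diversity mentioned above.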

Feed-forward fully connected networks are employed to approximate the nonlinear relationship between x_0 and η. To avoid vanishing gradients as the number of layers increases, the ReLU nonlinear activation function is applied in the hidden layers. In addition, the ReLU function can drive some neurons to output 0, which results in network sparsity and alleviates overfitting. The output of each network has a fixed value range, so the Tanh function, with range [-1, 1], is applied in the output layer. The mean square error (MSE) of the networks is used as the loss function in training and validation. The Adam algorithm is employed to minimize the MSE, and the learning rate of Net_θ and Net_t is initialized to 10^-5.
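The forward pass of such a network and the mapping of its Tanh output back to a physical range can be sketched in plain Python. The linear rescaling from [-1, 1] to [lo, hi] is an assumption about how the bounded output is interpreted, and the names `forward` and `scale_output` are hypothetical; training with Adam is not shown.

```python
import math


def forward(layers, x):
    """Evaluate a fully connected net given as a list of (W, b) pairs.

    ReLU is applied on every hidden layer and tanh on the last layer.
    """
    for idx, (W, b) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]
        last = idx == len(layers) - 1
        x = [math.tanh(v) if last else max(0.0, v) for v in z]
    return x


def scale_output(y, lo, hi):
    """Map a tanh output y in [-1, 1] linearly onto [lo, hi]."""
    return lo + 0.5 * (y + 1.0) * (hi - lo)
```

For example, an output of Net_t could be mapped onto the admissible range of t_f, and the two outputs of Net_θ onto [-π/2, π/2] and [-π, π].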

The size of the network clearly affects the training result, and the network size is optimized based on the approximation performance on the validation data. In general, the larger the network, the better the fitting performance. However, beyond a certain size, the training time increases considerably while the error decreases only slightly. Thus in this study, networks with three, five, seven and nine layers are compared. The numbers of units in the hidden layers are shown in Table 3.

Table 3 The size of hidden layers

Fig. 5 and Fig. 6 display the MSE on the validation dataset for the four sizes of hidden layers. The network size is finally set to seven layers.

Fig.5 MSE of Netθ

Fig.6 MSE of Nett

3.3 Exact solution

The DNNs output η without iterations and without the threshold check, based on the real physical constraints, that is used in the optimization algorithm. The guess of η from the DNNs therefore cannot theoretically guarantee convergence, and the controller would not be robust enough. This phenomenon is prevalent in research on the application of DNNs in dynamics control [20]. To obtain the exact solution, a further optimization step is necessary, which is a conservative but more reliable approach.

In addition to ensuring the convergence of solutions, this approach has two other advantages. First, the terminal accuracy can be reset in the optimization algorithm. The interactive structure generates all samples with the same accuracy, which influences the outputs of the DNNs. In general, however, the accuracy requirements on the terminal state differ from mission to mission. The 3POP offers the option of changing the accuracy to adapt to different scenarios. Second, the outputs of the DNNs are used only as initial values, which relaxes the requirement on output precision to some extent, because the output errors of the networks are eliminated by the optimization algorithm. In other words, a huge number of samples and a large number of episodes are not needed to train the networks, which improves the efficiency of training.

In summary, by following the initial guess of the DNNs with the optimization algorithm, the DRD method combines the robustness of the 3POP with the efficiency of the DNNs.
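The combination of a network guess and a subsequent optimization can be outlined as below. The control flow is an assumption: a local optimizer is started from the network guess and a global optimizer is called only if the tolerance is not met. All callables (`net_guess`, `shooting`, `local_opt`, `global_opt`) are hypothetical placeholders and do not correspond to a specific library interface.

```python
def drd_solve(x0, net_guess, shooting, local_opt, global_opt, tol):
    """Refine a network guess with a local optimizer; fall back to a global one.

    local_opt(objective, eta0) and global_opt(objective) must each return
    a pair (eta, value); shooting(eta, x0) returns a non-negative residual.
    """
    def objective(eta):
        return shooting(eta, x0)

    eta0 = net_guess(x0)
    eta, value = local_opt(objective, eta0)
    used_global = False
    if value > tol:  # the guess was too far from the solution
        eta, value = global_opt(objective)
        used_global = True
    return eta, value, used_global
```

The tolerance `tol` is set by the caller, which reflects the point made above that the terminal accuracy can be chosen per mission independently of the accuracy used to generate the training samples.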

In this section, simulations are presented to show the robustness and efficiency of DRD in solving OPEG with different initial conditions. All the following computations are implemented in Python 3.7.4 under Windows 10 on a commodity computer with an Intel i7-9700F CPU @ 3.00 GHz and 16 GB of RAM. The thrust-to-mass ratios of the pursuer and the evader are assumed to be T_p = 0.04g and T_e = 0.03g respectively, where g is the standard gravitational acceleration at the surface of the Earth. The total pursuit time of the game t_f ranges from 0.1 TU to 10 TU.

4.1 Results in two different cases

In this subsection, the simulation results are intended to validate the effectiveness of DRD under different initial conditions by introducing two cases. All the orbits are prograde. In Case 1, the two spacecraft are on coplanar orbits and the initial distance is longer; in Case 2, the orbits are non-coplanar and the initial distance is shorter.

Case 1: Coplanar orbits

In this case, the state of the evader is given first. Then the initial position of the pursuer is determined by the relative distance and direction in LVLH, which makes the scenario easier to understand. In addition, the velocity of the pursuer is obtained from its circular orbit.

The pursuer is determined as follows:

where d is the distance between the two spacecraft, and σ and ψ determine the direction of the pursuer in LVLH. ψ = 0° implies that the two spacecraft are coplanar, as shown in Fig. 7.

Fig.7 The position of players in Case 1

Therefore, the initial state of the problem is

The trajectories, the miss distance and the two control angles are displayed in Fig. 8, Fig. 9 and Fig. 10 respectively. In Fig. 8, the red line and the blue line show the trajectories of the pursuer and the evader respectively, and the black line is the trajectory of the state difference during the pursuit. In Fig. 9, the label “DNNs” denotes the result obtained directly from the output of the DNNs, as distinct from “DRD”, and the dotted lines show the results when the spacecraft fly according to the control variables given by the DNNs. Compared with the trajectories given by DRD, the terminal miss distance is larger.

Fig.8 Trajectory results in Case 1

Fig.9 Miss distance results in Case 1

Fig.10 Control variable results in Case 1

Table 4 lists the parameter values and the terminal miss distances for the control sequences given by the DNNs and DRD respectively, where t_c is the time consumed by each method to obtain the solution. The outputs of the DNNs are close to the exact solutions, which demonstrates that the parameter guesses generated by the DNNs are accurate enough to serve as an initial guess.

Table 4 Simulation result in Case 1

Case 2: Non-coplanar orbits

In this case, the two spacecraft are not coplanar and the initial distance is shorter than that in Case 1. The initial state is as follows.

Then, the initial state of the problem is

The results are displayed in Figs. 11-13 and Table 5, in the same way as in Case 1.

Table 5 Simulation result in Case 2

Fig.11 Trajectory results in Case 2

Fig.12 Miss distance results in Case 2

Fig.13 Control variable results in Case 2

According to the simulation results of Case 1 and Case 2, the outputs of the DNNs are very close to the exact solutions of the parameters. Compared with Case 1, the initial guess performs better when the two orbits are non-coplanar, because in coplanar cases the inputs of the DNNs contain zeros, which influence the outputs significantly. In addition, Net_t performs better than Net_θ, which accords with the results displayed in Fig. 5 (b) and Fig. 6 (b). The computation time of DRD is approximately twice that of the DNNs alone because of the further optimization in the 3POP. However, the time cost of DRD is reduced by at least one order of magnitude compared with previous studies, which is discussed in more detail in Subsection 4.2.

4.2 Efficiency comparison

The effectiveness of DRD has been demonstrated preliminarily in the two cases above. In this subsection, the efficiency of DRD is compared with that of two other methods.

We replicate the dimension-reduction (DR) method in [11] and also use DNNs to supply the initial guess for the DR method in [11], as is done in this paper. These two methods are labeled “DR in [11]” and “DNNs” respectively, and are used to validate the efficiency and stability of DRD. To eliminate the influence of different optimization algorithms, the global optimization algorithm named improved stochastic ranking evolution strategy (ISRES) [21] and the local optimization algorithm named Nelder-Mead Simplex, both supplied in NLopt, are applied for all three methods when necessary. Furthermore, 100 trials have been run to remove the randomness of the calculation.

The statistical analysis of the time consumption of the three methods is given in Fig. 14 and Table 6. The results in Fig. 14 show that the time consumption of DRD is clearly lower and less dispersed than that of the other two methods. According to the statistics in Table 6, after 100 trials the mean values show that the DRD in this paper reduces the time consumption by approximately 97% compared with DR in [11], because of the model reconstruction and the application of DNNs in this paper. Furthermore, the efficiency of “DNNs” is about 69.8% better than that of DR in [11] alone. Apart from the mean value, Fig. 14 and the statistical analysis show that the dispersion of “DNNs” is the widest. This indicates that although most of the results of “DNNs” are better than those of “DR in [11]”, the method is not stable or reliable. The main reason is the non-uniqueness of solutions in “DR in [11]”. Sometimes the guess is far from the exact solution and the global optimization algorithm is needed to solve the problem, which wastes computation.

Fig.14 Comparison of efficiency between three methods

Table 6 Statistics of three methods (unit: s)
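
The improvement percentages quoted above can be read as relative reductions in mean time consumption. The following Python sketch shows that calculation under this assumption. The function name `relative_improvement` and the timing lists are hypothetical and are not the data behind Table 6.

```python
from statistics import mean, stdev


def relative_improvement(baseline_times, new_times):
    """Fractional reduction in mean run time relative to the baseline."""
    base = mean(baseline_times)
    return (base - mean(new_times)) / base


# Hypothetical run times in seconds (assumption, NOT the values of Table 6).
baseline = [10.0, 12.0, 8.0, 10.0]   # stands in for a slower reference method
candidate = [1.0, 1.5, 0.5, 1.0]     # stands in for a faster method

improvement = relative_improvement(baseline, candidate)

# Mean and sample standard deviation summarize value and dispersion.
print(f"baseline : mean = {mean(baseline):.2f} s, std = {stdev(baseline):.2f} s")
print(f"candidate: mean = {mean(candidate):.2f} s, std = {stdev(candidate):.2f} s")
print(f"relative improvement = {100 * improvement:.1f}%")
```

With this definition, an improvement of one order of magnitude in mean time corresponds to a value of 90%, so a figure near 97% means the mean time drops to roughly 3% of the baseline.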

In summary, the comparison results demonstrate the high efficiency and stability of DRD in the statistical analysis.

4.3 Monte Carlo simulations

In real scenarios, the relative state of the two spacecraft may not be known precisely because of differences in measuring equipment and measuring methods. To validate the robustness of the outputs of the DNNs and the proposed DRD, Monte Carlo simulations are introduced in this subsection.

The initial state of this scenario is generated randomly and is shown in Table 7. It is assumed that this initial state is the real and accurate one. Normally distributed noise, also shown in Table 7, where the value of σ is 1% of the corresponding value, is then added to simulate measurement errors, and 1 000 random initial states are generated in these Monte Carlo simulations. The following simulation results show the influence of the noise.

Table 7 Dispersion in Monte Carlo simulation
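
The noise model described above (zero-mean normal noise with σ equal to 1% of each state component, 1 000 samples) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the seed, and the numerical state vector are hypothetical and do not reproduce the values in Table 7.

```python
import random


def perturb_state(true_state, rel_sigma, rng):
    """Add normal noise to each component, with sigma = rel_sigma * |value|."""
    return [rng.gauss(x, rel_sigma * abs(x)) for x in true_state]


def monte_carlo_states(true_state, n_samples=1000, rel_sigma=0.01, seed=0):
    """Generate noisy copies of the true initial state."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [perturb_state(true_state, rel_sigma, rng) for _ in range(n_samples)]


# Hypothetical relative state: position (km) and velocity (km/s).
# These numbers are placeholders, NOT the entries of Table 7.
true_state = [50.0, -30.0, 10.0, 0.01, -0.02, 0.005]

samples = monte_carlo_states(true_state)
print(len(samples), "noisy initial states generated")
print("first sample:", [round(v, 4) for v in samples[0]])
```

Each noisy state would then be passed to the guidance method in place of the true state, while the trajectory itself is propagated from the true state.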

The 1 000 trajectories of all simulations are shown in Fig.15. All these trajectories are driven by the control sequences that DRD generates from the states with normal noise. The label “initial state” indicates the real initial state in Table 7. The miss distance and the control variables in three directions are shown in Fig.16 and Fig.17 respectively. The accumulated errors grow significantly over time. The statistical result in Table 8 indicates that the terminal error and the measurement error are approximately equal in distribution. Moreover, to demonstrate the robustness of the DNNs, the distributions of the parameters output by the networks are displayed in Figs.18-21. The labels “KDE” and “Normal” denote the fitting results of Gaussian kernel density estimation and of a normal distribution respectively. The fitting results show that the distributions of the DNN outputs are also close to normal. In particular, Fig.21 shows the joint distribution of the parameters θ1 and θ2 in the solution space. These results demonstrate that the DNNs can still output a close initial guess even in a noisy environment.

Table 8 Statistical results of terminal distance error (unit: m)

Fig.15 Trajectories in Monte Carlo

Fig.16 Miss distance results in Monte Carlo

Fig.17 Control variable results in Monte Carlo

Fig.18 Distribution of θ1

Fig.19 Distribution of θ2

Fig.20 Distribution of tf

Fig.21 Joint distribution of θ1 and θ2
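
The “KDE” and “Normal” curves in the distribution figures correspond to two standard ways of fitting a density to samples. The Python sketch below illustrates both on synthetic data. It is only an illustration: the sample values, the seed, and the bandwidth rule (a normal-reference rule of thumb) are assumptions, since the bandwidth used for the figures is not stated.

```python
import math
import random
from statistics import mean, stdev


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_kde(samples, bandwidth=None):
    """Return a density function built from Gaussian kernels on the samples."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * std * n^(-1/5).
        bandwidth = 1.06 * stdev(samples) * n ** (-0.2)

    def density(x):
        return sum(normal_pdf(x, s, bandwidth) for s in samples) / n

    return density


# Synthetic stand-in for 1 000 network outputs of one parameter (assumption).
rng = random.Random(1)
samples = [rng.gauss(0.3, 0.05) for _ in range(1000)]

mu, sigma = mean(samples), stdev(samples)  # "Normal" fit: sample mean and std
kde = gaussian_kde(samples)                # "KDE" fit: sum of Gaussian kernels

for x in (mu - sigma, mu, mu + sigma):
    print(f"x = {x:.3f}: KDE = {kde(x):.3f}, Normal = {normal_pdf(x, mu, sigma):.3f}")
```

When the two curves nearly coincide, as they should for this synthetic normal data, the samples are close to normally distributed, which is the kind of agreement the figures are used to show.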

The Monte Carlo simulation results demonstrate the robustness of the DNNs and DRD. When there is noise in the state, the initial guess and the solution remain statistically reliable.

4.4 More discussions

Based on the previous results and analysis, several other issues need further discussion. Firstly, the background of this study is set in orbits near the geostationary orbit (GEO), where the C-W equation is accurate enough to describe the relative motion. When the orbits are closer to the Earth, the errors caused by the linearization of the dynamics and the influence of perturbations such as the J2 perturbation cannot be neglected. Secondly, the objective function of the OPEG in this paper is the game time, whereas in some cases the fuel consumption or other multiple objectives may be of interest instead. To solve such problems better, the DRD in this paper can be combined with a solution continuation process such as the homotopy algorithm in future studies. Thirdly, the current studies focus on problems where the thrust is continuous and low. However, impulsive thrust is more common in satellites, and a study [22] has shown that the control strategies may be completely opposite. The theories and methods for such cases should be discussed further.

This paper addresses the free-time orbital pursuit-evasion game problem and presents a method, denoted as DRD, to generate the optimal guidance law efficiently and reliably. Firstly, a 3POP is derived from the traditional TPBVP based on the costate normalization technique to reduce the dimension of the problem. Then, the DNNs are applied in an interactive structure to obtain the initial guess reliably. Finally, DRD is developed to obtain the exact solution. The numerical results in different cases and the Monte Carlo simulations have demonstrated the robustness of DRD, and the comparison among three different methods shows the improvement of DRD over the previous work in computation cost.