Volume 173 | Theoretical and Natural Science

Research Article Open Access

Published 11 May 2026 DOI: 10.54254/2753-8818/2026.33448

Financial Higher-Order Interaction Network Analysis for Precious Metals and Stock Indices Based on Information Dynamics

Shuhao Hu

Against the backdrop of deepening global financial integration, the linkage between precious metals and stock markets has become a key issue in asset allocation and financial risk management. Existing studies mainly focus on lower-order interactions between the two asset classes, making it difficult to capture higher-order nonlinear interdependencies in multi-asset systems or to distinguish between redundancy and synergy effects. To address this gap, this paper investigates Gold, Silver, Platinum, the CSI 300 Index, and the Nasdaq Composite Index over the period from January 2016 to November 2025. Based on information dynamics, we construct a multi-resolution higher-order interaction (HOI) framework at the global, node, and link levels, derive the core O-information rate (OIR) measures under a vector autoregressive framework, and divide the sample into the Full Sample Period, Bull Market Period, Bear Market Period, and Recent Period. The results show that the five-dimensional cross-market system is overall dominated by redundancy effects. Precious metals are the main contributors to system redundancy, whereas stock indices are the primary sources of cross-market higher-order synergy. At the link level, intra-precious-metals pairs constitute the core redundancy channels, while cross-category pairs and intra-equity links are generally balanced between synergy and redundancy, except during the Bear Market Period, when significant cross-category synergy emerges. These findings provide a new perspective for understanding cross-market interactions and offer a useful framework for financial market analysis.
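The redundancy/synergy distinction drawn in this abstract can be illustrated with the static O-information of a jointly Gaussian system, a simpler relative of the O-information rate (OIR). The sketch below is not the paper's VAR-based estimator; the function names and the two toy covariance matrices are assumptions made for illustration. A positive value indicates redundancy-dominated interactions and a negative value indicates synergy-dominated ones.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`."""
    cov = np.atleast_2d(cov)
    k = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (k * np.log(2 * np.pi * np.e) + logdet)

def o_information(cov):
    """Static O-information: (n-2)H(X) + sum_j [H(X_j) - H(X_{-j})]."""
    n = cov.shape[0]
    idx = np.arange(n)
    total = (n - 2) * gaussian_entropy(cov)
    for j in range(n):
        rest = idx[idx != j]
        total += gaussian_entropy(cov[j, j])
        total -= gaussian_entropy(cov[np.ix_(rest, rest)])
    return total

# Toy case 1 (assumed): three series driven by one common factor plus noise.
cov_redundant = np.ones((3, 3)) + 0.5 * np.eye(3)

# Toy case 2 (assumed): X and Y independent, Z = X + Y + small noise.
cov_synergistic = np.array([[1.0, 0.0, 1.0],
                            [0.0, 1.0, 1.0],
                            [1.0, 1.0, 2.1]])

omega_redundant = o_information(cov_redundant)
omega_synergistic = o_information(cov_synergistic)
print(omega_redundant > 0, omega_synergistic < 0)
```

In the first toy system every variable repeats the same common signal, so the measure is positive; in the second, the relation between X and Y only becomes visible once Z is known, so it is negative.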

Research Article Open Access

Published 11 May 2026 DOI: 10.54254/2753-8818/2026.33465

Stability Analysis of Reinforcement Learning Training for Large Language Models

Jia Hu

As large language models have gradually acquired the ability to handle complex logical reasoning and long-text generation tasks, reinforcement learning has become a key technique for aligning model outputs with human preferences. However, during training, the model often suffers from policy collapse, performance degradation, or reward hacking. This paper therefore systematically discusses the training stability problems that large language models face in reinforcement learning. Based on a detailed analysis of issues such as the estimation error of the value function, it summarizes several key causes of training instability. It finds that misinitialization and signal attenuation of the value function in long chain-of-thought tasks lead to errors in the advantage estimates, and that, owing to a lack of random exploration, policy-gradient optimization easily becomes trapped in local optima. In summary, this paper discusses the shortcomings of current mainstream algorithms in dealing with these stability issues and outlines directions for future development.
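One way such signal attenuation can arise is sketched below, assuming generalized advantage estimation (GAE) with a decay parameter below 1, a single reward at the final token, and a value function initialized to zero. The abstract does not state which estimator the paper analyzes, so the function `gae`, its parameters, and the sequence length are illustrative assumptions.

```python
def gae(rewards, values, gamma=1.0, lam=0.95):
    """Generalized advantage estimation over one episode (terminal value = 0)."""
    advantages = [0.0] * len(rewards)
    next_value = 0.0
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at position t.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages

T = 1000                           # assumed length of a long chain of thought
rewards = [0.0] * (T - 1) + [1.0]  # reward only at the final token
values = [0.0] * T                 # uninformative (zero) value estimates

adv_decayed = gae(rewards, values, lam=0.95)
adv_undecayed = gae(rewards, values, lam=1.0)
print(adv_decayed[0], adv_decayed[-1], adv_undecayed[0])
```

Under these assumptions the advantage at position t equals (gamma * lam) ** (T - 1 - t), so the earliest tokens of a long response receive almost no learning signal unless lam is close to 1 or the value function is already accurate.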

Research Article Open Access

Published 11 May 2026 DOI: 10.54254/2753-8818/2026.33416

Analysis of LLM-as-a-Judge Position Bias Mechanism Based on Controlled Synthesis Data

Zhiyin Yang

With the widespread application of large language models (LLMs) in automatic evaluation tasks, LLM-as-a-Judge has gradually become an important paradigm for reward modeling and model alignment. Existing research mainly improves judgment ability through chain-of-thought reasoning and reinforcement learning, but pays less attention to how the distributional structure of the training data affects model learning behavior. To address the positional bias problem for answers generated by large language models, this paper constructs a controllable synthetic preference data environment from the perspective of data distribution and compares the learning differences of reward models under biased and unbiased training mechanisms. The experiment uses a logistic regression model as the reward function approximator and analyzes the model's accuracy, positional bias rate, and parameter structure changes while keeping the test set unbiased. The results show that when there is a statistical correlation between order and label in the training data, the positional feature weight is 1.047, significantly higher than the control group's weight of 0.303. This indicates that the model significantly increases its dependence on positional features, and even though the overall performance does not decrease significantly (94.67% and 98.33%), its internal decision-making mechanism still shifts. The study reveals that structural features of the data may alter the model's discrimination criteria without affecting surface performance, providing a mechanism-level reference for reward model design and data construction.
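The mechanism described here can be reproduced in miniature. The sketch below is not the paper's experiment and will not reproduce its reported weights: the data generator, the two features (a quality gap and a ±1 position indicator), the 80% order-label agreement rate, and the plain gradient-descent fit are all assumptions chosen for illustration.

```python
import numpy as np

def make_preferences(n, agree_prob, rng):
    """Synthetic pairs: label depends on quality only; position may track label."""
    quality_gap = rng.normal(size=n)
    label = (quality_gap + 0.5 * rng.normal(size=n) > 0).astype(float)
    # With probability agree_prob the position feature points at the label.
    agrees = rng.random(n) < agree_prob
    position = np.where(agrees, 2 * label - 1, 1 - 2 * label)
    return np.column_stack([quality_gap, position]), label

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Logistic regression fitted by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

rng = np.random.default_rng(0)
X_biased, y_biased = make_preferences(4000, agree_prob=0.8, rng=rng)
X_control, y_control = make_preferences(4000, agree_prob=0.5, rng=rng)

w_biased, _ = fit_logistic(X_biased, y_biased)
w_control, _ = fit_logistic(X_control, y_control)
print("position weight (biased): ", w_biased[1])
print("position weight (control):", w_control[1])
```

In this toy setting the label never depends causally on position, yet the model trained on order-correlated data assigns the position feature a clearly larger weight than the control model does.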

Research Article Open Access

Published 11 May 2026 DOI: 10.54254/2753-8818/2026.33417

Intelligent Traffic Light Control Based on Reinforcement Learning

Yubo Wang

With the advancement of technology, the number of vehicles has increased and traffic problems have become more severe; congestion during the morning rush hour is now a common occurrence. Traditional traffic lights switch on a fixed cycle and cannot adapt their timing to actual traffic, so they are no longer suited to current traffic conditions. This study investigates the feasibility of an intelligent traffic light control system based on deep reinforcement learning, specifically a Deep Q-Network (DQN). In a simplified simulation of a crossroads, a DQN agent was designed to adjust phase switching dynamically and was compared experimentally with a fixed-duration strategy. The aim is to reduce queuing time, time costs, and fuel consumption. The experimental results show that the DQN agent can autonomously learn the optimal control strategy, reducing the average queue length by approximately 28.6%, which verifies the application potential of reinforcement learning in intelligent traffic control. The study also systematically analyzes technical feasibility, data feasibility, and the author's own capabilities, providing a framework for the development of intelligent traffic lights.

Research Article Open Access

Published 11 May 2026 DOI: 10.54254/2753-8818/2026.33395

The Analysis of Solar-Powered Wireless Charging Technology and Its Optimization

Wanyue Yuan

Wireless charging, as a new charging method, offers distinct advantages such as safety, convenience, and no need to plug or unplug connectors. However, the technology still has shortcomings in efficiency and stability. Some researchers have turned to solar power generation, which is more environmentally friendly, and have tried to combine it with wireless charging to compensate for the shortcomings of both technologies. Based on the theory of wireless charging and solar power generation, this article discusses the feasibility of solar-powered wireless charging technology and analyzes reasonable optimization methods for it. Through examples, this article draws the following conclusions. First, the efficiency of solar-powered wireless charging can be improved by adjusting the circuit structure and pairing it with appropriate algorithms. Second, the coil material and the type of power supply can be flexibly matched to requirements and supplemented by the necessary compensation components, which can greatly reduce system cost. Third, adding functional components can broaden the usage scenarios of the technology. Furthermore, homes are a well-suited application scenario for solar-powered wireless charging technology.

Research Article Open Access

Published 11 May 2026 DOI: 10.54254/2753-8818/2026.33408

The Application and Practice of Reinforcement Learning in Recommendation Systems

Yiyang Wang

The application scenarios of recommendation systems are becoming increasingly complex, and traditional algorithms struggle to balance users' short-term click behaviors with their long-term value demands. Reinforcement learning, with its capacity for dynamic policy iteration through interaction between the agent and the environment, provides a core technical path for optimizing the long-term value of recommendation systems. This paper focuses on three typical recommendation scenarios: e-commerce, games, and knowledge retrieval. It systematically reviews the algorithm paradigms of reinforcement learning and their applicability in different scenarios, compares the differences in application logic between product recommendation and content recommendation, and summarizes the verification results of basic reinforcement learning recommendation models on public datasets. The research shows that value-based reinforcement learning algorithms are suitable for product recommendation scenarios, while policy-based algorithms are more suitable for content recommendation scenarios. These results provide a theoretical reference and a practical basis for applying reinforcement learning in industrial-scale recommendation scenarios.

Research Article Open Access

Published 18 May 2026 DOI: 10.54254/2753-8818/2026.33708

Optimal Decay Rate to the Contact Discontinuity for 1-D Compressible Radiation Hydrodynamics Model

Shaoping Peng, Dan Chen

The radiation hydrodynamics equations, which describe the interaction between fluid flow and radiation, play a fundamental and essential role in modeling a wide range of high-energy physical phenomena. In this paper, we consider the optimal decay rate of solutions to the initial value problem for the one-dimensional compressible radiative hydrodynamics model with respect to contact discontinuities. Specifically, for the case with viscosity but without considering heat conduction, under the non-zero mass condition, the optimal decay rate $(1+t)^{-1/2}$ of the solution to the viscous contact discontinuity in the $L^\infty$-norm is obtained, provided that the initial perturbations around the contact discontinuity and the strength of the contact discontinuity are sufficiently small. The absence of heat conduction effects poses substantial difficulties in establishing energy estimates and decay estimates of the solution. By employing the energy method together with a crucial transformation, we obtain refined energy estimates and derive the optimal time-decay rate of the solution.
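A heuristic for why $-1/2$ is the natural exponent under a non-zero mass condition comes from the scalar heat equation. This is only an analogy chosen for illustration, not the system or the argument of the paper; the symbols $u$, $u_0$, $m$, and $G$ are assumptions introduced here.

```latex
% Heat-equation analogy (illustrative assumption, not the paper's model):
%   u_t = u_{xx}, \quad u(x,0) = u_0(x) \in L^1(\mathbb{R}), \quad
%   m = \int_{\mathbb{R}} u_0(x)\,dx \neq 0.
\[
  G(x,t) = \frac{1}{\sqrt{4\pi t}}\, e^{-x^{2}/(4t)}, \qquad
  \|G(\cdot,t)\|_{L^\infty} = (4\pi t)^{-1/2},
\]
\[
  \|u(\cdot,t) - m\,G(\cdot,t)\|_{L^\infty} = o\!\left(t^{-1/2}\right)
  \quad \text{as } t \to \infty .
\]
```

Since the leading term $m\,G$ decays exactly like $t^{-1/2}$ in $L^\infty$ and $m \neq 0$, no faster rate is possible in this model problem, which is the sense in which such a rate is called optimal.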

Research Article Open Access

Published 18 May 2026 DOI: 10.54254/2753-8818/2026.33497

DQN for CartPole Inverted Pendulum Balance Control

Zhaoyuan Wang

Inverted pendulum stabilization is one of the classical problems used to measure the performance of deep reinforcement learning algorithms. In this paper, the Double Deep Q-Network (Double DQN) algorithm is implemented in the PyTorch framework and run in the CartPole-v1 environment provided by OpenAI Gym. The research also explores the key factors of the Double DQN model architecture. The key advantage of Double DQN lies in separating action selection from action evaluation, which prevents overestimation of Q-values and improves learning stability and the quality of control at the end of training. Specifically, the paper highlights key architectural features of the algorithm: the three-layer fully connected neural network, the ReLU activation function, the weight initialization strategy based on Xavier initialization, the update policy for the two networks, and the Adam optimizer. In addition, ablation studies were performed against the baseline DQN. The results show that after 160 training iterations, Double DQN reached an average score of 196.5 over the last 100 iterations. Compared with the baseline DQN, Double DQN converged 20% faster and trained more stably.
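The separation of action selection from action evaluation can be sketched in a few lines. The example uses NumPy arrays in place of network outputs and is not the paper's PyTorch implementation; the function names, the Q-value tables, and the discount factor are assumptions for illustration.

```python
import numpy as np

def dqn_target(reward, next_q_target, gamma, done):
    """Baseline DQN: the target network both selects and evaluates the action."""
    return reward + gamma * (1.0 - done) * next_q_target.max(axis=1)

def double_dqn_target(reward, next_q_online, next_q_target, gamma, done):
    """Double DQN: online network selects, target network evaluates."""
    best_action = next_q_online.argmax(axis=1)
    evaluated = next_q_target[np.arange(len(best_action)), best_action]
    return reward + gamma * (1.0 - done) * evaluated

# Assumed Q-values for a batch of two next states with two actions each.
next_q_online = np.array([[1.0, 2.0], [0.5, 0.1]])
next_q_target = np.array([[3.0, 1.0], [0.2, 0.9]])
reward = np.array([1.0, 0.0])
done = np.array([0.0, 0.0])

y_dqn = dqn_target(reward, next_q_target, 0.99, done)
y_double = double_dqn_target(reward, next_q_online, next_q_target, 0.99, done)
print(y_dqn, y_double)
```

Because the value the target network assigns to the online network's chosen action can never exceed the target network's own maximum, the Double DQN target is at most the baseline DQN target for non-negative discounting, which is how it curbs overestimation.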

Research Article Open Access

Published 18 May 2026 DOI: 10.54254/2753-8818/2026.33681

Optimized Design of Flexible Wireless Charging Coils Using Magnetic Coupling Resonance

Xingyu Sheng

At present, many devices still use wired charging, which involves many cables and fragile connectors and is hard to use in harsh environments. Ordinary wireless charging uses a rigid coil, which is constrained by the available space, is easily damaged when stretched, and then suffers a severe drop in charging efficiency. To improve on the rigid coil used in magnetic coupling resonance wireless charging, this paper first explains the working principle of the system and then summarizes existing research on snake-shaped wires, the island bridge structure, flexible substrates, and ferrite. The main purpose is to optimize the island bridge structure so that the flexible coil does not deform easily under stress, and to select more suitable materials. To this end, the paper proposes a structural scheme for a flexible and stretchable magnetic Snake Island Bridge coil and, after summarizing the existing research, concludes that this kind of coil undergoes less structural change in high-stress environments and can also meet people's needs for light everyday travel.

Research Article Open Access

Published 18 May 2026 DOI: 10.54254/2753-8818/2026.33527

Applications of Large Language Models in Data Analysis

Ruiqi Zou

As data volumes continue to grow, data analysis is becoming ever more important for scientific research and business decisions. However, conventional data analysis approaches usually require programming and statistical expertise, which restricts their range of application to a certain degree. In this paper, the author explores the use of large language models in data analysis and systematically examines the data analysis process in four phases: data processing, data planning, data reasoning, and data feedback. The study finds that large language models can automatically carry out tasks such as data cleaning and feature engineering, thereby making data processing more efficient. Moreover, they can break down more complicated analytical problems into systematic analytical workflows. The reasoning ability of the models has been greatly improved by the Transformer architecture and prompt-based learning techniques. In addition, with reinforcement learning from human feedback (RLHF), the model's performance is further optimized and its interpretation of results improves. Although issues remain with inference stability, the accuracy of results, and training costs, large language models offer substantial benefits in lowering the entry threshold for data analysis and improving the efficiency of the analysis process, which suggests a promising future.
