Optimizing Language Models and Text-Based Economic Systems with Proximal Policy Optimization
DOI:
https://doi.org/10.30812/bite.v7i1.5222

Keywords:
Large Language Model, Mean Absolute Error, Proximal Policy Optimization, Reward, Test

Abstract
Background: This study investigates the use of the Proximal Policy Optimization (PPO) algorithm in two text-based case studies: alignment of large language models (LLMs) with human preferences and dynamic pricing based on customer reviews. In the LLM case, PPO combined with preference-based learning significantly improves alignment, BLEU, and human-likeness scores.
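Preference-based learning of the kind described here is commonly implemented by fitting a reward model on pairwise human comparisons and then optimizing the language model against that reward with PPO. The sketch below shows the standard pairwise (Bradley-Terry) reward-model loss as a generic PyTorch illustration, not the authors' implementation; the names reward_model, chosen_ids, and rejected_ids are hypothetical.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen_ids, rejected_ids):
    """Pairwise (Bradley-Terry) loss: train the reward model to score the
    human-preferred response above the rejected one."""
    r_chosen = reward_model(chosen_ids)      # scalar reward per sequence
    r_rejected = reward_model(rejected_ids)
    # -log sigmoid(r_chosen - r_rejected) is minimized when the chosen
    # response receives the higher reward than the rejected one.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```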
Objective: This research aims to evaluate PPO’s effectiveness in text-based decision-making through these two cases.
Methods: The method employed is reinforcement-learning experimentation using the PPO approach. In the LLM case, PPO is integrated with preference learning to improve alignment, BLEU scores, and the human-likeness of the output. In the economic scenario, PPO produces adaptive pricing strategies with high accuracy, i.e., a low Mean Absolute Error (MAE), and the best cumulative reward, outperforming the A3C and DDPG algorithms. Cross-validation and ablation studies assess PPO's generalization capability and the contributions of the reward components, the clipping mechanism, and the exploration strategy.
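The clipping mechanism examined in the ablation is PPO's clipped surrogate objective, and MAE is the pricing-accuracy metric. A minimal sketch of both follows, again as a generic PyTorch illustration under assumed tensor inputs (log_probs, advantages, predicted_prices), not the paper's code.

```python
import torch

def ppo_clipped_loss(log_probs, old_log_probs, advantages, epsilon=0.2):
    """PPO clipped surrogate objective. Clipping the probability ratio
    keeps each policy update close to the behavior policy, which is the
    stability property the ablation isolates."""
    ratio = torch.exp(log_probs - old_log_probs)  # pi_theta / pi_theta_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - epsilon, 1 + epsilon) * advantages
    return -torch.min(unclipped, clipped).mean()  # negate to maximize

def mean_absolute_error(predicted_prices, true_prices):
    """MAE used to score the dynamic-pricing policy's price predictions."""
    return torch.mean(torch.abs(predicted_prices - true_prices))
```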
Result: The findings demonstrate that PPO excels in both distinct domains and offers a stable, efficient solution for text-based tasks.
Conclusion: The findings confirm PPO's flexibility for various NLP applications and intelligent decision-making systems.
License
Copyright (c) 2025 Irwan Darmawan, Nilam Ramadhani, Mohammad Nazir Arifin, Ubaidi, Nindian Puspa Dewi, Muhammad Innuddin

This work is licensed under a Creative Commons Attribution 4.0 International License.