This article examines the problems that high-frequency trading (HFT) data pose for empirical asset pricing models, along with possible solutions. The Capital Asset Pricing Model (CAPM) and the Fama-French three-factor model are validated on high-frequency data both before and after the data are cleaned with advanced outlier detection methods. The study uses computational tools such as Isolation Forest, DBSCAN, and Robust Principal Component Analysis (RPCA) to detect and correct the inaccurate data points that often distort financial models. Model accuracy and robustness improved substantially after cleaning: CAPM accuracy rose from 0.75 to 0.89, and Fama-French accuracy rose from 0.78 to 0.85. The paper also compares classical data-cleaning procedures with these computational methods and evaluates the efficiency of the latter. The implications for financial modeling and asset management are broad: better data leads to better decisions and more reliable models. These results underscore the importance of advanced data cleaning in high-frequency trading and its ability to improve decision making in financial markets.
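The abstract does not spell out the cleaning pipeline, so the following is only a minimal sketch, assuming synthetic returns, of how removing erroneous ticks changes a CAPM regression (asset excess return regressed on market excess return by ordinary least squares). To stay dependency-free, it uses a simple median-absolute-deviation (MAD) filter as a plainly named stand-in for Isolation Forest, DBSCAN, or RPCA. The helper names `simulate`, `ols_capm`, and `mad_flags`, the simulation parameters, and the 5.0 threshold are hypothetical. R² serves here only as an illustrative goodness-of-fit measure and is not necessarily the accuracy metric reported in the paper.

```python
import random
import statistics


def simulate(n=1000, beta=1.2, seed=42):
    """Hypothetical synthetic excess returns with injected bad ticks."""
    rng = random.Random(seed)
    mkt = [rng.gauss(0.0, 0.001) for _ in range(n)]
    asset = [beta * x + rng.gauss(0.0, 0.0005) for x in mkt]
    bad = list(range(0, n, 50))  # hypothetical positions of erroneous ticks
    for k, i in enumerate(bad):
        asset[i] += 0.02 if k % 2 == 0 else -0.02  # spurious price spike
    return mkt, asset, bad


def ols_capm(mkt, asset):
    """Fit asset = alpha + beta * mkt + eps by OLS; return (alpha, beta, R^2)."""
    x_bar = statistics.fmean(mkt)
    y_bar = statistics.fmean(asset)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(mkt, asset))
    sxx = sum((x - x_bar) ** 2 for x in mkt)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    ssr = sum((y - alpha - beta * x) ** 2 for x, y in zip(mkt, asset))
    sst = sum((y - y_bar) ** 2 for y in asset)
    return alpha, beta, 1.0 - ssr / sst


def mad_flags(series, threshold=5.0):
    """Indices whose robust z-score |x - median| / (1.4826 * MAD) exceeds threshold."""
    med = statistics.median(series)
    mad = statistics.median([abs(v - med) for v in series])
    scale = 1.4826 * mad  # makes MAD consistent with the std. dev. under normality
    return {i for i, v in enumerate(series) if abs(v - med) / scale > threshold}


mkt, asset, bad = simulate()
_, beta_raw, r2_raw = ols_capm(mkt, asset)

# Flag an observation if either return series looks anomalous, then refit.
flagged = mad_flags(mkt) | mad_flags(asset)
keep = [i for i in range(len(mkt)) if i not in flagged]
_, beta_clean, r2_clean = ols_capm([mkt[i] for i in keep], [asset[i] for i in keep])

print(f"raw:     beta={beta_raw:.3f}  R^2={r2_raw:.3f}")
print(f"cleaned: beta={beta_clean:.3f}  R^2={r2_clean:.3f}  removed={len(flagged)}")
```

On this synthetic data the handful of spikes inflates the total variance and drags R² far below its clean-data level, while the filtered refit recovers a beta close to the simulated 1.2. A Fama-French version would add the SMB and HML factors as further regressors. In practice, a multivariate detector such as Isolation Forest can catch anomalies that are unremarkable in each series on its own.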
Research Article
Open Access