K-mer analysis is a core technique in bioinformatics, widely used in genome assembly, variant detection, and metagenomic research. As sequencing data grows exponentially, processing massive k-mer datasets efficiently places increasing demands on computational performance, memory usage, and system scalability. This paper examines the implementation of k-mer analysis in Python and the key challenges it raises in algorithmic efficiency and resource management. By combining greedy strategies, sorting optimisations, parallel computing, and deep learning methods, Python-based workflows perform well in data compression, graph structure simplification, and feature extraction. With GPU acceleration and cloud platform deployment, the approach shows potential to scale to petabyte-scale genomic datasets, making it suitable for high-throughput bioinformatics tasks across a range of scenarios. It addresses current computational needs and also provides a reference for the development of bioinformatics computational models.
Research Article
Open Access