Introduction
Data curation and preservation face significant challenges in facilities, digital repository systems, and high performance computing (HPC). These challenges include the sustainability of infrastructure, metadata quality, governance gaps, and the scalability required to handle massive datasets. Addressing them is essential to ensure the long-term accessibility, usability, and trustworthiness of digital assets (Springer Nature, 2024).
Facilities
Institutions face persistent challenges in their facilities for data curation and preservation, which directly affect the sustainability and reliability of digital repositories. One of the most pressing issues is funding constraints. Many institutions rely on short-term, project-based funding rather than long-term financial planning, making it difficult to maintain servers, storage systems, and disaster recovery facilities. This lack of sustainable investment often results in inconsistent preservation practices and increases the risk of data loss (EOSC Association, 2024).
Another challenge is the lack of expertise among staff. Skilled professionals in metadata standards, repository management, and digital preservation are scarce, particularly in developing regions. Without adequate training and retention strategies, institutions struggle to implement best practices for long-term data stewardship. This skills gap undermines the effectiveness of preservation facilities and weakens institutional credibility (Kanyundo, 2022).
https://unidata.pro/wp-content/uploads/2024/07/data-curation-cover-768x568.webp
Digital repository systems
Digital library challenges (Sharma, n.d.).
Repositories often struggle with technological limitations, including outdated software and hardware, poor interoperability with open standards, and inconsistent metadata practices. These issues reduce the discoverability of content and hinder usability. Limited resources further restrict institutions from adapting to evolving tools, making repositories less accessible and effective for researchers and the public (Pinfield et al., 2017).
In addition to technological limitations, institutional repositories face challenges related to the scalability and sustainability of their infrastructure. As the volume of digital content grows, many repositories struggle to expand their systems to handle increasing storage and retrieval demands. This often results in performance bottlenecks, slower access times, and difficulties in maintaining long-term preservation. Furthermore, poor interoperability with emerging technologies and evolving standards makes it harder for repositories to integrate with global scholarly communication networks. These scalability issues, combined with limited institutional resources, hinder the ability of repositories to remain effective and relevant in supporting research dissemination and preservation (Rothfritz et al., 2025).
High performance computing
High performance computing: crossing the barriers between clouds (Wilkinson et al., 2016).
High performance computing (HPC) environments in institutions generate massive datasets that require careful curation and preservation. One of the primary issues is the sheer volume of data, which often reaches petabyte scale in scientific research. Without scalable storage and effective classification, institutions risk losing valuable information or incurring unsustainable costs. This challenge is compounded by the need to ensure metadata consistency and provenance tracking, which are essential for the reproducibility and long-term usability of research outputs. In many cases, institutions lack standardized frameworks for metadata, making it difficult to verify or reuse datasets effectively (Wilkinson et al., 2016).
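To make the point about metadata consistency concrete, here is a minimal sketch of a completeness check that a repository could run before accepting a dataset. The field names in REQUIRED_FIELDS and the function missing_fields are hypothetical and chosen only for illustration; a real repository would take its required fields from a published schema rather than from this list.

```python
# Hypothetical minimum set of fields (an assumption for this sketch);
# a real repository would derive these from its chosen metadata schema.
REQUIRED_FIELDS = ("identifier", "title", "creator", "date_created", "provenance")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        # Treat None and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Example record with an empty creator and no provenance statement.
example_record = {
    "identifier": "ds-001",
    "title": "Simulation run 42",
    "creator": "",
    "date_created": "2024-01-01",
}

if __name__ == "__main__":
    print(missing_fields(example_record))
```

A check like this does not guarantee good metadata, but it can catch records that would be impossible to verify or reuse later because provenance or authorship was never recorded.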
HPC systems in institutions struggle not only with managing vast amounts of data but also with ensuring the long-term sustainability and security of that data. As research outputs expand to petabyte and exabyte scales, the costs of maintaining reliable storage infrastructure rise significantly. Inefficient handling of inactive datasets often leads to clogged primary storage, which increases expenses and reduces system efficiency. To mitigate this, institutions are adopting tiered storage models and archival strategies that balance affordability with accessibility. At the same time, the integrity of curated data is constantly at risk from corruption or accidental loss. This makes redundancy and robust backup protocols essential to safeguard research outputs and preserve them for future use (Rothfritz et al., 2025).
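One common way to detect the corruption described above is a fixity check: a checksum is recorded when a file is archived and recomputed later to confirm that the content has not changed. The sketch below is only an illustration of that idea using Python's standard hashlib module; the function names sha256_of and verify_fixity are hypothetical, and the choice of SHA-256 is an assumption rather than something the cited sources prescribe.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks
    so that very large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Return True if the file still matches the digest recorded at archive time."""
    return sha256_of(path) == recorded_digest
```

In practice, the recorded digest would be stored alongside the dataset's metadata and compared against each backup copy on a regular schedule, so that a damaged copy can be replaced from a healthy one.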
REFERENCES
EOSC Association. (2024). EOSC Strategic Research and Innovation Agenda (SRIA) 2024. European Open Science Cloud Association. Retrieved from https://eosc.eu
Kanyundo, A. J. (2022). Knowledge management practices at Lilongwe University of Agriculture and Natural Resources (LUANAR) Bunda College [Master’s dissertation, Mzuzu University]. Mzuzu University Digital Repository.
Medium. (n.d.). Data curation for AI at scale: Overcoming challenges in cleaning & structuring large datasets. https://sodevelopment.medium.com/data-curation-for-ai-at-scale-overcoming-challenges-in-cleaning-structuring-large-datasets-3b3bce4f128d
Pinfield, S., Salter, J., & Bath, P. A. (2017). A "gold-centric" implementation of open access: Hybrid journals, the "total cost of publication," and policy development in the UK and beyond. Journal of the Association for Information Science and Technology, 68(9), 2248–2263. https://doi.org/10.1002/asi.23817
Rothfritz, L., Matthias, L., Pampel, H., & Wrzesinski, M. (2025). Current challenges and future directions for institutional repositories: A systematic literature review. Annual Review of Information Science and Technology (ARIST). https://doi.org/10.1002/asi.70016
Sharma, V. K. (n.d.). Digital repositories and knowledge management practices in academic institutions. Jaypee University of Information Technology, Waknaghat, Solan, Himachal Pradesh, India.
Springer Nature. (2024). Challenges for monitoring and data analytics in a leadership public data repository. ISC High Performance 2024 International Workshops.