Businesses today have a significant opportunity to use data in innovative ways. However, they must also weigh carefully which data they retain and how they use it to avoid legal challenges. Even amid the rise of generative AI, organizations are tasked not only with protecting personal data but also with strategically managing and deleting outdated information that poses more risk than business value.
Forrester projects that unstructured data will double by 2024, spurred by AI advancements. Given the shifting data landscape and the increasing costs of breaches and privacy violations, organizations must critically evaluate their data retention and deletion strategies.
The Rising Threat of Data Breaches
As data volumes surge, so do the expenses associated with data breaches and privacy violations. Ransomware and other attacks have targeted sensitive databases, including those of prominent entities such as 23andMe, Infosys, and Boeing. According to IBM, the average total cost of a breach rose to $4.45 million in 2023, a 15% increase since 2020.
To mitigate these risks, organizations need robust policies to delete obsolete data. While generative AI may raise questions about the necessity of data deletion, retaining data longer increases exposure to breaches and potential penalties for privacy law violations. The first step in reducing this risk is to thoroughly assess how a company utilizes its data, examining both the nuanced considerations and tangible benefits of a sound data retention strategy.
Why Eliminate Obsolete Data?
Many organizations must delete outdated data because of obligations in data protection laws. Regulations typically allow personal data to be retained only for as long as necessary, prompting companies to establish different retention periods across business areas. Besides minimizing legal liability, removing obsolete data can yield significant savings in storage costs.
Identifying Obsolete Data
To determine which data is outdated and which holds ongoing business value, companies should develop a data map detailing data sources, types, storage systems, and intended processing purposes. This comprehensive mapping ensures companies understand where personal data resides, what types of personal data are processed, and the applicable privacy laws in different geographic locations. A thorough data inventory and classification are essential for an effective privacy program, offering the data lineage necessary to track data flow within the organization.
Once a data map is established, legal and technical teams can collaborate with business stakeholders to assess the value of specific data, applicable regulatory restrictions, and the risks associated with data leaks or unnecessary retention.
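As an illustration, a data map of the kind described above could be kept as structured records rather than a static document. The sketch below is a minimal example under stated assumptions: the `DataMapEntry` fields, the sample entries, and the helper `personal_data_by_jurisdiction` are all hypothetical and would need to be adapted to an organization's actual systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataMapEntry:
    """One row of a hypothetical data map (field names are assumptions)."""
    source: str                      # where the data originates
    data_types: tuple[str, ...]      # kinds of data collected
    storage_system: str              # where the data is stored
    purpose: str                     # intended processing purpose
    jurisdictions: tuple[str, ...]   # locations whose privacy laws may apply
    contains_personal_data: bool


# Illustrative entries only; not drawn from any real inventory.
DATA_MAP = [
    DataMapEntry("web_signup_form", ("email", "name"), "crm_db",
                 "account creation", ("EU", "US-CA"), True),
    DataMapEntry("server_metrics", ("cpu_load",), "metrics_store",
                 "capacity planning", ("US-CA",), False),
    DataMapEntry("support_chat", ("email", "chat_transcript"), "ticket_db",
                 "customer support", ("EU",), True),
]


def personal_data_by_jurisdiction(entries):
    """Group sources holding personal data by the jurisdictions they touch."""
    grouped = {}
    for entry in entries:
        if not entry.contains_personal_data:
            continue
        for jurisdiction in entry.jurisdictions:
            grouped.setdefault(jurisdiction, []).append(entry.source)
    return grouped
```

Calling `personal_data_by_jurisdiction(DATA_MAP)` gives legal and technical teams a starting list of which sources to review under each jurisdiction's rules.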
The Challenge of Deleting Data
Business stakeholders often hesitate to delete data, concerned that it may prove valuable for emerging technologies. Discussions around data retention should focus on business utility. For example, a financial institution's data analytics team might want to train lending eligibility models on extensive datasets. However, relying on outdated data can conflict with data protection laws and can also undermine the models themselves. For instance, data from 20 years ago might not accurately reflect modern consumer behavior, especially in a rapidly changing economy.
The commercial real estate sector exemplifies this issue. Many risk-prediction models built on pre-pandemic data may lead to inaccurate forecasts in light of shifts toward online shopping and remote work. Educating stakeholders on how data can become stale and how it impacts decision-making is crucial.
Strategies for Handling Obsolete Data: Determine, Delete, or De-Identify
When deciding how long to retain data, consider legal obligations around financial records and sector-specific regulations. Consult statutes of limitations to determine the retention periods needed for potential litigation defense, and keep only the personal data that is essential, such as transaction logs or user consent records.
When it's time to dispose of less valuable data, organizations can delete it manually according to predefined retention schedules. Automating the process through a purge policy enhances reliability. Alternatively, organizations may opt for de-identification, which removes identifiable data, though this presents its own set of challenges.
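An automated purge policy of the kind mentioned above can be sketched as a retention schedule plus a selection rule. This is a minimal sketch, not a definitive implementation: the categories, the retention periods in `RETENTION_DAYS`, the record layout, and the `legal_hold` flag are all assumptions made for illustration, and real periods must come from legal review.

```python
from datetime import date, timedelta

# Hypothetical retention schedule (days per data category); values are
# placeholders, not legal guidance.
RETENTION_DAYS = {
    "marketing_leads": 365 * 2,
    "support_tickets": 365 * 3,
}


def is_expired(category, created_on, today):
    """Return True if a record has outlived its category's retention period."""
    days = RETENTION_DAYS.get(category)
    if days is None:
        # Categories missing from the schedule are never purged automatically.
        return False
    return today - created_on > timedelta(days=days)


def select_for_purge(records, today):
    """Pick expired records, skipping any that are under a legal hold."""
    return [
        record for record in records
        if is_expired(record["category"], record["created_on"], today)
        and not record.get("legal_hold", False)
    ]
```

The selection step is kept separate from the deletion itself so that the list of candidates can be reviewed or logged before anything is removed, which matters because deletion is usually irreversible.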
Properly de-identified data typically falls under exemptions in data protection laws, but achieving proper de-identification requires significant data reduction, which can mean losing valuable insights. For example, to meet the HIPAA Safe Harbor standard, an entity must remove 18 specific types of identifiers. While de-identified data can be useful for analytics and AI models, it is essential to weigh the pros and cons with stakeholders.
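To make the trade-off concrete, the sketch below shows the general shape of a de-identification step: direct identifiers are dropped and dates are reduced to the year. It is illustrative only. The field names are hypothetical, it covers just a few of the 18 identifier types, and it should not be read as a way to satisfy HIPAA Safe Harbor or any other legal standard.

```python
from datetime import date

# Hypothetical field names for direct identifiers; a real schema will differ.
DIRECT_IDENTIFIER_FIELDS = {"name", "email", "phone", "ssn", "street_address"}

# Hypothetical date fields, mapped to the coarser field that replaces them.
DATE_FIELDS = {"visit_date": "visit_year"}


def de_identify(record):
    """Return a copy of record with identifiers dropped and dates coarsened."""
    result = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # drop direct identifiers entirely
        if field in DATE_FIELDS and isinstance(value, date):
            result[DATE_FIELDS[field]] = value.year  # keep only the year
        else:
            result[field] = value
    return result
```

Even in this small example the loss of detail is visible: once `visit_date` is reduced to `visit_year`, seasonal or day-of-week analysis is no longer possible.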
Avoiding Common Pitfalls
One of the most significant mistakes companies make in handling obsolete data is rushing the process and neglecting comprehensive discussions. It is crucial to involve legal, privacy, and security teams, as well as business leaders, to gather diverse perspectives on which data is essential to retain. Companies should be prepared to shorten retention periods gradually, because once data is deleted, recovery is often impossible.
To navigate the complexities of data deletion, companies must prioritize thorough data mapping and lineage analysis, define retention criteria clearly, and implement these policies efficiently. By understanding the legal, cybersecurity, and financial implications, organizations can craft a robust data retention strategy that not only complies with regulations but also protects their digital assets effectively.
Seth Batey is the Data Protection Officer and Senior Managing Privacy Counsel at Fivetran.