Ensuring data integrity is at the heart of modern archival systems, especially for organizations managing critical or large-scale data workflows. We’re excited to announce a new feature in ScoutAM that strengthens its already robust data integrity capabilities: support for user-supplied checksums. This enhancement adds another layer of assurance to your workflows, ensuring data accuracy and reliability at every step.

Why Data Integrity Matters

In large-scale storage environments, data integrity is critical. Whether you’re managing petabytes of data for scientific research, archiving enterprise information, or supporting high-performance computing workloads, ensuring that file contents remain unchanged is non-negotiable. Any corruption or unintended modification of files can disrupt operations, introduce errors, or even lead to data loss.

ScoutAM has always prioritized data integrity with built-in checksum verification. For more background information on what a checksum is or how it works, check out our detailed explanation here.

ScoutAM’s Comprehensive Checksum Support

ScoutAM has long supported popular cryptographic hash algorithms, including:

  • MD5
  • SHA1
  • SHA256
  • SHA384
  • SHA512

These built-in checksums allow users to verify file integrity at various stages of the workflow. However, we recognize that many organizations rely on externally generated checksums to meet internal policies, regulatory requirements, or legacy system compatibility. This is where our new feature comes in.
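
As a minimal sketch of what "externally generated" can look like, the helper below produces a digest with standard GNU coreutils tools for each of the algorithms listed above. The function name `external_checksum` is hypothetical, and the mapping from the algorithm names to coreutils commands is an assumption made for illustration; it is not part of ScoutAM.

```shell
# Hypothetical helper: print the hex digest of a file for a given algorithm name.
# Assumes GNU coreutils (md5sum, sha1sum, sha256sum, sha384sum, sha512sum).
external_checksum() {
    algo=$1
    file=$2
    case "$algo" in
        MD5)    tool=md5sum ;;
        SHA1)   tool=sha1sum ;;
        SHA256) tool=sha256sum ;;
        SHA384) tool=sha384sum ;;
        SHA512) tool=sha512sum ;;
        *) echo "unsupported checksum type: $algo" >&2; return 1 ;;
    esac
    # Reading from stdin makes the tool print "<digest>  -"; keep only the digest.
    "$tool" < "$file" | awk '{print $1}'
}
```

For example, `external_checksum SHA256 /mnt/scoutfs/testfile` prints only the digest, which can then be stored or passed along with the file.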

Introducing User-Supplied Checksums

With this new capability, ScoutAM allows users to supply their own independently generated file-level checksums during file ingestion or migration. This provides an additional layer of validation and integrates with existing workflows.

How It Works:

  1. During File Ingestion: Users can supply pre-generated checksums (e.g., from an external system or prior validation process) alongside the files being ingested into ScoutAM.
  2. Checksum Verification: ScoutAM immediately verifies the supplied checksum against the ingested file to confirm its integrity.
  3. On-Demand Validation: At any point, users can initiate checksum verification via the CLI or API to ensure that file contents remain intact.
  4. Automated Verification During Data Movement: Checksums can be automatically re-validated during data transfers within ScoutAM, such as during replication, migration, or retrieval processes.

By allowing users to import their own checksums, ScoutAM enhances compatibility with other systems and strengthens its end-to-end data integrity features.

CLI Example:

# md5sum /mnt/scoutfs/testfile
29ddb9ac92635ced72af4cb9c66c6803  /mnt/scoutfs/testfile
# samcli file checksum --type MD5 --set 29ddb9ac92635ced72af4cb9c66c6803 /mnt/scoutfs/testfile
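
The same call can be scripted for many files. The sketch below loops over the regular files in one directory (non-recursively), computes an MD5 digest for each, and passes it to the `samcli file checksum` invocation shown above. The function name `supply_md5_checksums` and the `SAMCLI` override variable are assumptions added for illustration; setting `SAMCLI=echo` prints the commands instead of running them.

```shell
# Sketch: supply an externally generated MD5 for every regular file in a directory.
# SAMCLI is a hypothetical override so the loop can be dry-run with SAMCLI=echo.
supply_md5_checksums() {
    dir=$1
    for path in "$dir"/*; do
        # Skip subdirectories and anything that is not a regular file.
        [ -f "$path" ] || continue
        # md5sum reading stdin prints "<digest>  -"; keep only the digest.
        sum=$(md5sum < "$path" | awk '{print $1}')
        # Same invocation as the CLI example; stop at the first failure.
        ${SAMCLI:-samcli} file checksum --type MD5 --set "$sum" "$path" || return 1
    done
}
```

A dry run such as `SAMCLI=echo supply_md5_checksums /mnt/scoutfs` is a reasonable first step before letting the loop modify anything.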

API Example:

# TOKEN=$(curl -s -X POST -H 'Accept: application/json' -H 'Content-Type: application/json' --data '{"acct":"admin","pass":"versity"}' http://scoutam.domain:8080/v1/security/login | jq -r '.response')

# curl -s -k -X POST -H "Authorization: Bearer $TOKEN" -H "Accept: application/json" -H "Content-Type: application/json" --data '{"path":"testfile","type":"MD5","value":"29ddb9ac92635ced72af4cb9c66c6803"}' https://scoutam.domain:8080/v1/file/checksum

Use Cases for User-Supplied Checksums

  • Legacy System Migrations: When migrating from existing systems (like Versity Storage Manager or other platforms), organizations can bring their pre-existing checksums to verify data consistency.
  • Regulatory Compliance: Industries with strict compliance standards often require checksum verification at multiple stages. User-supplied checksums provide an extra layer of assurance.
  • Data Integrity Auditing: Users can cross-verify file contents against externally generated checksums for complete confidence in data accuracy.
  • Custom Validation Workflows: Organizations that generate checksums as part of their internal validation or ingest pipeline can now seamlessly integrate this process into ScoutAM.

Why This Matters

End-to-end data integrity is one of the core pillars of reliable storage management. ScoutAM has long upheld this standard with robust built-in checksum verification. Now, by enabling user-supplied checksums, ScoutAM addresses critical vulnerabilities that can emerge during data ingestion and retrieval.

These vulnerabilities arise when checksums are generated solely within the archival system, leaving potential gaps in protection at key stages, such as:

  1. Data being copied to the archive filesystem cache.
  2. Data being archived to media, where the checksum is generated.
  3. Data being released from the filesystem cache.
  4. Data being staged from archival media with checksum verification.
  5. Data being retrieved from the archive filesystem cache to primary storage.

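When checksums are first generated in step 2, the transfers in steps 1 and 5 remain unprotected: any corruption introduced before step 2 is captured in the checksum itself, and nothing verifies the data after it leaves the cache in step 5. Corruption originating at the data source, in network transmission, or in storage hardware could therefore go undetected.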

To close these gaps, ScoutAM empowers users or applications to generate and supply their own checksums at the point of ingestion. By validating these checksums immediately and throughout the data lifecycle, ScoutAM ensures unmatched end-to-end data integrity, giving you full confidence that your data remains accurate and reliable from ingestion to long-term storage.
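
The idea can be illustrated outside ScoutAM with plain coreutils: a digest is recorded at the data source before step 1, and the copy returned after step 5 is checked against it. This is a sketch under stated assumptions, not ScoutAM's implementation. The function names and the manifest file are hypothetical, and SHA256 is just one of the supported algorithms.

```shell
# Record a SHA256 digest at the data source, before the file enters the archive (before step 1).
record_source_checksum() {
    src=$1
    manifest=$2
    sha256sum < "$src" | awk '{print $1}' > "$manifest"
}

# After retrieval to primary storage (after step 5), recompute the digest and compare.
verify_retrieved_copy() {
    copy=$1
    manifest=$2
    expected=$(cat "$manifest")
    actual=$(sha256sum < "$copy" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $copy"
    else
        echo "MISMATCH: $copy" >&2
        return 1
    fi
}
```

Because the digest is created before the first copy and checked after the last one, a mismatch points to corruption somewhere along the whole path, including the stages an archive-generated checksum cannot see.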

By supporting user-supplied checksums, ScoutAM offers:

  • Greater Flexibility: Users can leverage their existing checksum workflows without any disruption.
  • Improved Assurance: Independent verification adds another layer of confidence in your data.
  • Seamless Integration: Checksums can be supplied, verified, and revalidated effortlessly through ScoutAM’s CLI and API.

Whether you’re moving data between systems, monitoring files over time, or ensuring compliance with industry standards, this new feature makes ScoutAM an even more powerful solution for managing data integrity.

Conclusion

With the introduction of user-supplied checksums, ScoutAM continues to set the standard for reliable, high-performance archive management. This new feature complements its existing suite of tools for data verification, providing users with maximum confidence that their data remains intact.

Learn more about ScoutAM and how it meets the evolving needs of modern storage management.
