Local Solution:

If you decide to preserve source code yourself, there are different levels of preservation you can aim for, depending on the resources and capacity you have to maintain your software. The following instructions are based on the National Digital Stewardship Alliance’s (NDSA) four Levels of Digital Preservation, which we have adapted to fit the needs of GitHub repository preservation. The methods for storing software are based on MackinationsAI’s guide (link).

Prerequisites:

Before you start, make sure you have:

• Git installed
    • On macOS with Homebrew: brew install git
    • On Ubuntu/Debian: sudo apt-get install git
• (Optional but recommended) Git LFS, if the repo uses large files
    • Install it with brew install git-lfs or sudo apt-get install git-lfs
    • Then run git lfs install
• Enough disk space on the target drive
    • On the repo’s GitHub page you can check the approximate size
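The checks in this list can be scripted before you start a large clone. The following is a minimal sketch, assuming a POSIX shell with df and awk available; the helper names have_tool, free_kb, and check_prereqs are illustrative and not part of Git or any other tool.

```shell
#!/bin/sh
# Sketch: check archive prerequisites before cloning.
# have_tool, free_kb and check_prereqs are hypothetical helper names.

# Succeeds if the named command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Prints the free space, in kilobytes, of the filesystem that holds the given directory.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Fails if Git is missing, warns if Git LFS is missing, then reports free space.
check_prereqs() {
    target_dir=$1
    have_tool git || { echo "git is not installed" >&2; return 1; }
    # Git LFS is optional, so a missing install only produces a note.
    if ! git lfs version >/dev/null 2>&1; then
        echo "note: Git LFS not found; LFS objects cannot be fetched" >&2
    fi
    echo "free space in $target_dir: $(free_kb "$target_dir") KB"
}

# Usage: check_prereqs /path/to/archive/folder
```

Compare the reported free space with the approximate repository size before cloning; a mirror with a long history or LFS objects can be considerably larger than a single checkout.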

Simple “mirror” clone (good for long-term storage)

If you mainly care about preserving the history and do not need to open the project in an editor right now, a mirror clone is the safest and most compact option.

    # 1. Choose a directory where you keep archives
    cd /path/to/archive/folder

    # 2. Mirror-clone the repository
    git clone --mirror https://github.com/username/repository.git

    # 3. Enter the bare repo
    cd repository.git

    # 4. (Optional) Fetch all Git LFS objects, if used
    git lfs fetch --all  # only works if Git LFS is installed

    # 5. Fetch all tags explicitly (usually already included)
    git fetch --all --tags

What this gives you

• a bare Git repository in repository.git/
• all branches and full commit history
• all tags and refs
• (if you ran git lfs fetch --all) all LFS objects

You cannot open this bare repo directly in most IDEs, but you can recreate a working copy from this archive in the future:

    # later, on another machine:
    git clone /path/to/archive/repository.git my-working-copy

This is a good choice if your Local Solution is mainly about long-term storage.
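If you want to see the round trip before trusting it with a real project, you can rehearse it on a throwaway repository. The sketch below needs no network access: it builds a small local repository, mirror-clones it, and restores a working copy. All paths, branch names, and the demo identity are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: rehearse mirror-and-restore on a throwaway local repository (no network).
# All paths and names below are illustrative.
set -eu

work=$(mktemp -d)

# Build a tiny source repository with two branches and one tag.
git init -q "$work/source"
git -C "$work/source" symbolic-ref HEAD refs/heads/main
echo "hello" > "$work/source/README"
git -C "$work/source" add README
git -C "$work/source" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "first commit"
git -C "$work/source" branch feature
git -C "$work/source" tag v1.0

# Archive it with a mirror clone, then restore a working copy from the archive.
git clone -q --mirror "$work/source" "$work/source.git"
git clone -q "$work/source.git" "$work/restored"

# List what the mirror holds: both branches and the tag.
mirror_refs=$(git -C "$work/source.git" for-each-ref --format='%(refname)')
echo "$mirror_refs"
```

The restored copy in $work/restored is an ordinary working tree that you can open in an editor, while the mirror itself stays bare.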

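Verification

Run these commands from inside repository.git. A mirror clone is a bare repository, so there is no .git subdirectory; the repository data sits directly in the current folder.

Check how many branches you have:

    git branch | wc -l

View all branches:

    git branch -a

Check disk usage:

    du -sh .

In Windows PowerShell, the equivalents are git branch | Measure-Object -Line for the branch count and Get-ChildItem . -Recurse | Measure-Object -Property Length -Sum for disk usage.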
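Counting branches confirms that refs exist, but not that the stored objects are intact. A minimal sketch of a slightly stronger check, assuming only standard Git commands (git fsck and git for-each-ref); the function name verify_mirror is hypothetical.

```shell
#!/bin/sh
# Sketch: verify a mirror archive. verify_mirror is a hypothetical helper name.

verify_mirror() {
    repo=$1

    # 1. Check that the objects in the archive are present and uncorrupted.
    git -C "$repo" fsck --full >/dev/null 2>&1 || {
        echo "fsck failed: $repo" >&2
        return 1
    }

    # 2. Report how many branches and tags the archive holds.
    branches=$(git -C "$repo" for-each-ref --format='%(refname)' refs/heads | wc -l | tr -d ' ')
    tags=$(git -C "$repo" for-each-ref --format='%(refname)' refs/tags | wc -l | tr -d ' ')
    echo "branches=$branches tags=$tags"
}

# Usage: verify_mirror /path/to/archive/repository.git
```

Recording the reported counts next to the archive gives you a baseline to compare against the next time you check it.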

How this fits into the Local Solution section

This bash guide is one concrete implementation of the Local Solution described in the main decision guide:

• It stays on your own storage (external disk, NAS, etc.).
• It does not depend on Software Heritage or any institution.
• It can be combined with other strategies (for example, you can mirror to two disks and also submit the repo to Software Heritage).

For many individual developers or small groups, this is the fastest way to gain real control over their GitHub code: a copy that lives on their own hardware, with all branches, tags, and history preserved.
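Mirroring to two disks can be done by cloning the first archive onto the second disk and confirming that both copies hold the same refs. This is a rough sketch under two assumptions: the repository does not use Git LFS (a plain clone does not carry LFS objects over), and the first archive still has its original origin remote. The function name replicate_mirror is hypothetical.

```shell
#!/bin/sh
# Sketch: make a second copy of a mirror archive and confirm both copies match.
# replicate_mirror is a hypothetical helper name.

replicate_mirror() {
    primary=$1      # existing mirror, e.g. /path/to/archive/repository.git
    secondary=$2    # destination on another disk; must not exist yet

    git clone -q --mirror "$primary" "$secondary" || return 1

    # Point the copy at the original remote so it can be refreshed independently.
    origin_url=$(git -C "$primary" remote get-url origin) || return 1
    git -C "$secondary" remote set-url origin "$origin_url" || return 1

    # Every ref must resolve to the same object in both copies.
    primary_refs=$(git -C "$primary" for-each-ref --format='%(objectname) %(refname)')
    secondary_refs=$(git -C "$secondary" for-each-ref --format='%(objectname) %(refname)')
    [ "$primary_refs" = "$secondary_refs" ]
}

# Usage: replicate_mirror /disk1/archive/repository.git /disk2/archive/repository.git
```

The function returns a non-zero status if the clone fails or the ref lists differ, so it can be used in a larger backup script.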


Levels of GitHub Repository Preservation

| | Level 1 (Know your content) | Level 2 (Protect your content) | Level 3 (Monitor your content) | Level 4 (Sustain your content) |
| --- | --- | --- | --- | --- |
| Storage | Have 2 complete copies of this repository in separate locations. Document all storage media where content is stored (a finding aid for your hard drives). Put content in stable storage. | Have 3 complete copies of this repository in separate locations. Document storage and storage media indicating the resources and dependencies they require to function. | Have at least one copy in a geographic location with a different disaster threat than the other copies. Have at least one copy on a different storage media type. Track the obsolescence of storage and media. | Have at least three copies in geographic locations, each with a different disaster threat. Maximize storage diversification to avoid single points of failure. Have a plan and execute actions to address obsolescence of storage hardware, software, and media. |
| Integrity | Verify integrity information if it has been provided with the content. Generate integrity information if not provided with the content. BagIt and generate a checksum (more info). Virus check all content; isolate content for quarantine as needed. | Verify integrity information when moving or copying content. Use write-blockers when working with original media. Back up integrity information and store copy in a separate location. | Verify integrity information of content at fixed intervals. Document integrity information verification processes and outcomes. Perform audit of integrity information on demand. | Verify integrity information in response to specific events or activities. Replace or repair corrupted content as necessary. |
| Control | Determine the human and software agents that should be authorized to read, write, move, and delete content. | Document the human and software agents authorized to read, write, move, and delete content and apply these. | Maintain logs and identify the human and software agents that performed actions on content. | Perform periodic review of actions/access logs. Refer to “Updating the archive” portion of the Local Solution. |
| Metadata | Create inventory of content, also documenting current storage locations. | Store enough metadata to know what the content is (this might include some combination of administrative, technical, descriptive, preservation, and structural). You could model your metadata document off of the Guggenheim’s Identity Report for Sun Yuan and Peng Yu’s piece I Can’t Help Myself or | Determine what metadata standards to apply. Refer to CodeMeta Vocabulary. Find and fill gaps in your metadata to meet those standards. | Record preservation actions associated with content and when those actions occur (VCS are well fitted for this). Implement metadata standards chosen. |
| Content | Document file formats, and other essential content characteristics including how and when these were identified. This documentation should also include dependencies. | Verify file formats and other essential content characteristics. Build relationships with content creators to encourage sustainable file choices. You could sign up for notifications and updates on changes to the software your repo is dependent on. Sign up for email notifications of | Monitor for obsolescence, and changes in technologies on which content is dependent. | Perform migrations, normalizations, emulation, and similar activities that ensure content can be accessed. |
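The Integrity row asks you to generate integrity information and verify it later. One simple way to approach this is a checksum manifest covering every file in the archive. The sketch below is not a BagIt implementation; it assumes GNU sha256sum is available (on macOS, shasum -a 256 is the usual substitute), and the function names write_manifest and check_manifest are hypothetical.

```shell
#!/bin/sh
# Sketch: generate and verify a SHA-256 manifest for an archive directory.
# write_manifest and check_manifest are hypothetical helper names.
# Assumes GNU sha256sum; keep the manifest file outside the archive directory.

# Writes one "checksum  ./relative/path" line per file, sorted by path.
write_manifest() {
    archive_dir=$1
    manifest=$2
    ( cd "$archive_dir" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2 ) > "$manifest"
}

# Re-hashes every listed file; fails if any file is missing or has changed.
check_manifest() {
    archive_dir=$1
    manifest=$2
    ( cd "$archive_dir" && sha256sum --quiet -c - ) < "$manifest"
}

# Usage:
#   write_manifest /path/to/archive/repository.git /path/to/manifests/repository.sha256
#   check_manifest /path/to/archive/repository.git /path/to/manifests/repository.sha256
```

Storing a copy of the manifest in a separate location corresponds to Level 2, and running check_manifest on a schedule corresponds to Level 3. Regenerate the manifest whenever you deliberately update the archive, since fetching new commits changes the files on disk.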