Customizing Git - Git Attributes



Git Attributes

Git attributes let us configure Git's behavior for particular files or folders in a repository.

We declare these attributes in a .gitattributes file that is committed along with the project. For local settings that should not be committed and shared, we use the .git/info/attributes file instead.

  • Using attributes, we can customize the merge strategy used for individual files.

  • We can tell Git how to diff and merge non-text files.

  • We can also set up filters that process file content before it is committed or checked out, so that Git fits the particular requirements of our project.

The .gitattributes file is usually placed at the root of the repository. We can also create it in subdirectories to apply rules to specific parts of the project.
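One way to see which rules apply to a path is the git check-attr command. The following is a minimal sketch in a throwaway repository; the file names (notes.txt, assets/blob.dat, build.log) and the rules themselves are hypothetical and chosen only for illustration.

```shell
# Work in a scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Rule at the repository root: applies to the whole project.
echo '*.txt text' > .gitattributes

# Rule in a subdirectory: applies only to paths under assets/.
mkdir assets
echo '*.dat binary' > assets/.gitattributes

# Local rule that is never committed or shared.
mkdir -p .git/info
echo '*.log -diff' >> .git/info/attributes

# Ask Git which value an attribute has for a given path.
root_rule=$(git check-attr text -- notes.txt)
subdir_rule=$(git check-attr diff -- assets/blob.dat)
outside_rule=$(git check-attr diff -- blob.dat)
local_rule=$(git check-attr diff -- build.log)
printf '%s\n' "$root_rule" "$subdir_rule" "$outside_rule" "$local_rule"
```

The paths do not have to exist for git check-attr to report on them, which makes it a convenient way to test a pattern before relying on it.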

Binary Files

Git attributes allow us to identify binary files and provide instructions to Git on how to handle them.

Attributes let us tell Git which files are binary in cases where it cannot work that out on its own, and how those files should be treated.

  • Some text files are machine generated and not meaningfully diffable, while some binary files can be diffed.

  • By setting these attributes, we make sure Git handles every kind of file properly.

  • As a result, Git handles and compares the various file types in our repository more effectively.

Identifying Binary Files

Some files look like text but should be treated as binary data because of how they are used.

For example, Xcode projects contain a file ending in .pbxproj, a UTF-8 text file written by the IDE that serves as a small database.

  • Because of their nature, these files are not appropriate for merging or diffing.

  • Even though they are text-based, their purpose is not human editing but machine consumption.

  • To prevent problems with merging and diffing, we need to set up Git so that it handles these files as binary.

To instruct Git to handle all .pbxproj files as binary data, add the following line to the .gitattributes file:

*.pbxproj binary

With this setting, Git no longer tries to convert or fix CRLF line endings in these files, and it no longer computes or prints diffs for them when we run git show or git diff.
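As a quick check of the binary attribute, the following sketch commits a small text file with a .pbxproj extension in a scratch repository, changes it, and captures what git diff reports. The file content and the commit identity are placeholders, not real Xcode data.

```shell
# Scratch repository used only for this demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo '*.pbxproj binary' > .gitattributes

# Placeholder content; a real .pbxproj file is written by Xcode.
printf 'a = 1;\n' > project.pbxproj
git add .gitattributes project.pbxproj
git -c user.name='Example' -c user.email='example@example.com' \
    commit -q -m 'Add project file'

# Change the file and look at the diff.
printf 'a = 22;\n' > project.pbxproj
diff_output=$(git diff)
echo "$diff_output"
```

Although the file is plain text, the diff contains only a "Binary files ... differ" line instead of the changed lines.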

Diffing Binary Files

Git attributes can be used to manage binary file diffing by converting binary data into a text format for comparison.

This method works well in Git for managing binary files such as Microsoft Word documents.

  • We can efficiently track changes and apply version control to these files by configuring Git to transform binary data.

  • With this method, we can use Git's diff tools even on files that are stored in a binary format.

  • We can more effectively manage and version binary files in a Git repository with proper configurations.

Running git diff on a binary file such as report.docx tells us only that the two versions differ; Git does not show what actually changed.

To compare versions of binary files such as .docx directly with git diff, Git attributes must be configured.

For example, we can set up a custom filter that converts .docx files to text before they are compared. This is how we can configure it:

Write the following in the .gitattributes file:

*.docx diff=word

This line instructs Git to show diffs for .docx files using the word filter.

  • The word filter still needs to be set up so that it transforms Word files into a readable text format.

  • For example, we can configure Git to use docx2txt or another tool to convert .docx files to text so that they can be diffed.

  • With this configuration, .docx versions can be compared easily, because each one is turned into text for diff analysis.

Using Git to diff .docx files:

Install docx2txt: Download it from SourceForge and follow its installation instructions.

Create a Wrapper Script: Put the following content into a script called docx2txt:

#!/bin/bash
# Convert the .docx file passed as the first argument to text on standard output.
docx2txt.pl "$1" -

Put this script in a directory located in the PATH of our system.

Make the Script Executable: To give the script execute permission, run the following command:

chmod a+x /path/to/docx2txt

Configure Git: Set the textconv option in Git's configuration to instruct Git to use the script for .docx files.

git config diff.word.textconv docx2txt

  • Git is now set up to use the docx2txt script for diffs involving .docx files.

  • When Git comes across a .docx file, it uses the word filter to turn the file into text.

  • As a result, Git can produce readable, text-based diffs of Word documents.
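To confirm that the attribute and the configuration are wired together, we can ask Git what it will use for a given path. This is a minimal sketch in a scratch repository; report.docx is only a path name here and does not need to exist, and docx2txt does not need to be installed for these two queries to work.

```shell
# Scratch repository used only for this check.
repo=$(mktemp -d)
cd "$repo"
git init -q

# The two pieces of configuration described above.
echo '*.docx diff=word' > .gitattributes
git config diff.word.textconv docx2txt

# Which diff driver does Git select for a Word document?
driver=$(git check-attr diff -- report.docx)
# Which converter is registered for that driver?
converter=$(git config --get diff.word.textconv)
echo "$driver"
echo "$converter"
```

If the first line does not name the word driver, or the second line is empty, git diff falls back to reporting only that the binary files differ.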

The problem of diffing image files can be solved in a similar way, by extracting and comparing their metadata instead of the actual image content.

  • We can convert the EXIF metadata of an image into text with a tool such as exiftool.

  • Git can then diff the textual metadata, which tells us about any changes made to the image file.

To set up Git to diff images by comparing their metadata:

Modify .gitattributes: Add the following line to the .gitattributes file to indicate that .png files should use the exif filter:

*.png diff=exif

Set up Git: Run this command to configure Git to use exiftool for the exif filter:

git config diff.exif.textconv exiftool

With this configuration, Git will be able to use exiftool to convert image metadata to text, enabling meaningful diffs based on changes in the metadata.

When we replace an image and run git diff, Git displays a diff of the metadata that exiftool extracted from the two versions.

Instead of comparing the image content directly, this gives a textual view of the change by showing differences in the image's metadata, including file size, modification date, and dimensions.
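The mechanism can be tried without exiftool. In the sketch below, a tiny script named size-dump (hypothetical, a stand-in for exiftool) prints a single line of "metadata", and image.png holds placeholder bytes rather than a real image. It assumes the temporary directory path contains no spaces.

```shell
repo=$(mktemp -d)
tools=$(mktemp -d)
cd "$repo"
git init -q

# Stand-in for exiftool: print the size of the file given as $1.
cat > "$tools/size-dump" <<'EOF'
#!/bin/sh
printf 'File Size: %s bytes\n' "$(wc -c < "$1" | tr -d ' ')"
EOF
chmod +x "$tools/size-dump"

echo '*.png diff=exif' > .gitattributes
git config diff.exif.textconv "$tools/size-dump"

# Placeholder bytes, not a real PNG; only the size matters here.
printf 'abc' > image.png
git add .gitattributes image.png
git -c user.name='Example' -c user.email='example@example.com' \
    commit -q -m 'Add image'

# "Replace" the image; Git diffs the converter output of both versions.
printf 'abcde' > image.png
diff_output=$(git diff)
echo "$diff_output"
```

The diff shows the converter's output for the old and new versions as removed and added lines, which is the same thing that happens with exiftool's much richer output.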

Keyword Expansion

Keyword expansion is a feature of SVN and CVS that embeds metadata (such as version numbers or revision IDs) within files.

Git checksums a file's content when it is committed, so a file cannot be modified with information about the commit after the commit has been made. This makes traditional keyword expansion impractical.

  • Instead, Git can inject metadata into files when they are checked out and remove it again before they are added to a commit.

  • This method uses Git attributes to control when and how metadata is added and removed.

  • Git attributes provide two ways to deal with keyword expansion: the built-in ident attribute and custom filters.

Using Git attributes, we can have Git automatically put the SHA-1 checksum of a file (its blob) into a specified field (such as $Id$).

  • The next time we check out the file, Git fills in this field with the file's checksum.

  • The checksum represents the content of the file (the blob), not the commit to which it belongs.

To have the SHA-1 checksum injected into files automatically, follow these steps:

Add the following line to the .gitattributes file:

*.txt ident

Create a test file that contains a reference to $Id$:

echo '$Id$' > test.txt

Stage the test file so that Git has a copy to restore, then remove it and check it out again:

git add test.txt
rm test.txt
git checkout -- test.txt

See the injected SHA-1 checksum by viewing the altered file:

cat test.txt
$Id: 42812b7653c7b88933f8a9d6cad0ca16714b9bb3 $

On its own, the checksum that Git's keyword substitution produces is not very helpful, because it says nothing about the age of the file.

CVS and Subversion keywords can include a datestamp, but a SHA-1 checksum gives no indication of whether one version is older or newer than another.
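To see that the injected value identifies the file's content rather than a commit, the following sketch compares it with the ID of the staged blob in a scratch repository where no commit has been made at all. It assumes a repository using the default SHA-1 object format.

```shell
# Scratch repository; note that nothing is ever committed here.
repo=$(mktemp -d)
cd "$repo"
git init -q

echo '*.txt ident' > .gitattributes
echo '$Id$' > test.txt
git add .gitattributes test.txt

# Force a fresh checkout so that the ident expansion is applied.
rm test.txt
git checkout -- test.txt

expanded=$(cat test.txt)
blob_id=$(git rev-parse :test.txt)   # ID of the staged blob
echo "$expanded"
echo "$blob_id"
```

The hash inside the expanded $Id$ field is the same as the blob ID printed on the second line.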

However, Git allows us to write our own smudge and clean filters to process files on checkout and on commit.

These filters are assigned to files in the .gitattributes file. The smudge filter runs just before a file is checked out, and the clean filter runs just before a file is staged.

A simple example is to use the indent program to automatically format C source code before committing.

To do this, add the following line to the .gitattributes file:

*.c filter=indent

Then run the following commands to specify how the indent filter behaves:

Specify that the indent program should be used for cleaning files (before staging):

git config --global filter.indent.clean indent

Specify that the cat command should be used for smudging files (before checkout):

git config --global filter.indent.smudge cat

With this configuration, Git runs C source files through the indent program before staging them, and through the cat program (which does nothing) before checking them out.

This ensures that all C files are formatted with indent before they are committed.
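The same clean and smudge wiring can be tried with a tool that is available almost everywhere. The sketch below swaps indent for the POSIX expand utility, which converts tabs to spaces; the filter name tabs is hypothetical, and the configuration is kept local to a scratch repository instead of using --global.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

echo '*.c filter=tabs' > .gitattributes
# Clean runs on staging: replace each tab with spaces (tab stops every 4 columns).
git config filter.tabs.clean 'expand -t 4'
# Smudge runs on checkout: pass the content through unchanged.
git config filter.tabs.smudge cat

# A working-tree file indented with a tab.
printf '\treturn 0;\n' > main.c
git add .gitattributes main.c

staged=$(git cat-file -p :main.c)   # content as stored by Git
working=$(cat main.c)               # content in the working tree
printf 'staged:  [%s]\nworking: [%s]\n' "$staged" "$working"
```

The staged copy has gone through the clean filter, while the working-tree file is left as it was until the next checkout.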

Exporting Our Repository

Git attributes also let us control how files are processed when we archive a project, so that we can create customized exports that leave out certain files or include additional metadata.

  • With the export-ignore attribute, we can specify which files or directories should be left out of an archive.

  • This is useful for files or directories that we want to keep in the repository but do not need in the exported archive.

  • Keeping such files in the repository while excluding them from exports helps us manage the contents of the archive.

If we have internal documentation in a docs/ directory that we do not want in the exported archive of our project, we can add the following line to our Git attributes file:

docs/ export-ignore

When we run git archive to create an archive of the project, the docs/ directory is not included.
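A short way to check the effect is to list the archive contents without extracting them. The following sketch uses a scratch repository; the file names docs/notes.md and src/main.c and the commit identity are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir docs src
echo 'internal notes' > docs/notes.md
echo 'int main(void) { return 0; }' > src/main.c
echo 'docs/ export-ignore' > .gitattributes

# git archive reads attributes from the tree being archived,
# so the rule has to be committed.
git add .
git -c user.name='Example' -c user.email='example@example.com' \
    commit -q -m 'Add sources and internal docs'

# List the entries of the generated tar archive.
archive_listing=$(git archive HEAD | tar -tf -)
echo "$archive_listing"
```

The listing contains src/main.c but no entries under docs/, even though docs/notes.md is still tracked in the repository.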

export-subst

With the export-subst attribute, we can apply Git's log formatting and keyword expansion to selected portions of files during export.

  • It makes it possible to substitute dynamic data, like version numbers or commit details, for placeholders in files.

  • It guarantees that the metadata in our exported files is up-to-date as it comes from the repository.

To automatically inject metadata into a file named LAST_COMMIT during archive creation, follow these steps:

Configure .gitattributes, appending the rule so that any existing rules are kept:

echo 'LAST_COMMIT export-subst' >> .gitattributes

Create the LAST_COMMIT file with metadata placeholders:

echo 'Last commit date: $Format:%cd by %aN$' > LAST_COMMIT

Add and commit the changes:

git add LAST_COMMIT .gitattributes
git commit -am 'Add LAST_COMMIT file for archives'

Create the archive, extract it into a test directory, and check the contents:

mkdir -p ../deployment-testing
git archive HEAD | tar xCf ../deployment-testing -
cat ../deployment-testing/LAST_COMMIT

The LAST_COMMIT file in the extracted archive now shows the date of the last commit and its author in place of the placeholder.

In this way, the export-subst attribute lets us include formatted commit metadata in a file whenever we create an archive.
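The same technique works for other placeholders, for example to record the exact commit an archive was built from. This is a minimal sketch in a scratch repository; the file name VERSION and the commit identity are hypothetical, and %H is the log format placeholder for the full commit hash.

```shell
repo=$(mktemp -d)
out=$(mktemp -d)
cd "$repo"
git init -q

echo 'VERSION export-subst' > .gitattributes
# Single quotes keep the shell from touching the $Format:...$ placeholder.
echo 'Built from commit $Format:%H$' > VERSION

git add .
git -c user.name='Example' -c user.email='example@example.com' \
    commit -q -m 'Add VERSION file for archives'

# Export the project and read the substituted file.
git archive HEAD | tar -xf - -C "$out"
head_id=$(git rev-parse HEAD)
exported=$(cat "$out/VERSION")
echo "$exported"
```

The file in the repository keeps its placeholder; only the exported copy contains the hash.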

Merge Strategies

Git attributes also let us apply different merge strategies to particular files.

  • We can tell Git not to try to merge specific files when they conflict, and instead to keep our version of the file over the other side's.

  • This ensures that crucial or specialized files do not change during merges.

  • By defining these merge strategies, we keep control over how files are handled in complex merge scenarios.

We can set up an attribute like this:

strings.json merge=ours

Then, configure a dummy ours merge strategy with:

git config --global merge.ours.driver true

When we merge in another branch, we no longer get merge conflicts in strings.json.

Instead, we see output like this:

$ git merge feature-branch
Auto-merging strings.json
Merge made by recursive.

In this case, strings.json stays at whatever version we had on our own branch; the changes from the other branch are not applied to it.
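The behavior can be reproduced end to end in a scratch repository. The sketch below sets the driver in the repository's local configuration instead of using --global, and the JSON content and commit identity are placeholders. Both branches change strings.json, which would normally produce a conflict.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
# Dummy driver: "true" exits successfully and leaves our version in place.
git config merge.ours.driver true

echo 'strings.json merge=ours' > .gitattributes
echo '{"greeting": "hello"}' > strings.json
git add .
git commit -q -m 'Initial strings'

# Change the file on a feature branch.
git checkout -q -b feature-branch
echo '{"greeting": "good morning"}' > strings.json
git commit -q -am 'Change greeting on feature branch'

# Go back to the original branch and change the same line differently.
git checkout -q -
echo '{"greeting": "hi"}' > strings.json
git commit -q -am 'Change greeting on our branch'

# Merge: the ours driver resolves strings.json without a conflict.
git merge -q --no-edit feature-branch
merged=$(cat strings.json)
echo "$merged"
```

After the merge, strings.json still holds the content from our branch, and the merge commit is created without any manual conflict resolution.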

Benefits of Using Git Attributes

The following are some of the benefits of using Git attributes:

  • They ensure consistent handling of files across various environments and operating systems.

  • They help manage diffing and merge conflicts for specific files.

  • They define clear and precise rules for file handling, which makes collaboration easier and more effective.
