
EXDL 802: Doctoral Seminar

What is deduplication?

There are two forms of deduplication to consider when conducting a systematic review. The first is the removal of identical records retrieved from multiple databases. The second is the identification of multiple articles published from the same data set. If undetected, either kind of duplication could bias the conclusions of your review.

Identifying and removing duplicate records is necessary because multiple databases often index the same journals. Your method of deduplication may depend on the number of records your searches retrieve: manual deduplication is realistic for smaller sets, whereas larger sets may require automatic tools. Automatic tools are not perfect, so follow any automatic pass with a manual check. Whichever process you follow, document it and report it accurately in your article.

Identifying multiple articles published from the same data set is more complicated. The Cochrane Handbook for Systematic Reviews of Interventions offers useful suggestions for authors. While the Cochrane Handbook focuses on the health sciences, its methods can be applied across disciplines. Chapter 4, Section 4.6.2, offers guidance on identifying multiple reports from the same study.

Track the number of duplicate articles you remove for either reason so that you can report it in your PRISMA flow diagram.

Adapted from https://guides.lib.byu.edu/systematicreviews/deduplication

Manual deduplication

Export your references to a CSV or Excel file. In most cases, you will need to first use conditional formatting in Excel to identify duplicates, then do a final scan manually.

Conditional formatting 

  • Sort the column alphabetically

    • Start with titles, though you can use this same process for any other columns you choose, such as DOI

  • Select the column, choose Conditional Formatting on the Home ribbon, go to Highlight Cells Rules, then Duplicate Values

  • Replace punctuation (dashes, periods, question marks, semicolons, colons) in titles with spaces using the find and replace tool, so that titles differing only in punctuation are highlighted as duplicates

  • For titles, truncating (to 30 characters, for example, though this number is arbitrary) will sometimes find more duplicates

    • Insert a blank column

    • Enter the formula =LEFT(C2,30), where C2 is the cell you are truncating

    • Copy the formula down the length of the column to truncate every title, then apply the duplicate highlighting to the new column
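
The punctuation and truncation steps above can also be sketched outside Excel. The following is a minimal Python sketch and not part of the original guide: the function names title_key and flag_duplicates, the sample titles, and the decision to lowercase titles are assumptions made for illustration. As in the steps above, the 30-character cut-off is arbitrary.

```python
import re
from collections import defaultdict

def title_key(title, length=30):
    """Build a comparison key from a title (illustrative helper)."""
    # Replace dashes, periods, question marks, semicolons, and colons with spaces.
    cleaned = re.sub(r"[-.?;:]", " ", title)
    # Collapse repeated spaces and ignore capitalization (an added assumption).
    cleaned = " ".join(cleaned.split()).lower()
    # Truncate, like =LEFT(C2,30) in Excel.
    return cleaned[:length]

def flag_duplicates(titles, length=30):
    """Return groups of row positions whose keys match. These are candidates only."""
    groups = defaultdict(list)
    for position, title in enumerate(titles):
        groups[title_key(title, length)].append(position)
    return [rows for rows in groups.values() if len(rows) > 1]

# Made-up sample titles, for illustration only.
titles = [
    "Sleep and memory: a systematic review.",
    "Sleep and memory - a systematic review",
    "Exercise and mood in adolescents",
]
print(flag_duplicates(titles))  # [[0, 1]]
```

Rows flagged this way are only candidates. Truncated keys can match titles that are not true duplicates, so each group still needs the manual check described below.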

Manual scan

  • Sort by title 

  • Scan through the list, looking for duplicate titles

  • Check the additional information (author, journal, volume, page number) to make sure it matches before designating a record as a duplicate

DO NOT delete duplicate records. Instead, move them to a separate sheet for duplicates so that you can count how many you removed.
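
The check-then-move approach above can be sketched in Python as follows. This is an illustrative sketch, not the guide's method: the function name split_duplicates, the field names, and the sample records are all hypothetical. A record counts as a duplicate only when the title, author, journal, volume, and pages all match an earlier record, and duplicates are kept in their own list rather than deleted.

```python
def split_duplicates(records, fields=("title", "author", "journal", "volume", "pages")):
    """Keep the first copy of each record; move later matching copies to a separate list."""
    seen = set()
    unique, duplicates = [], []
    for record in records:
        # Compare on every listed field, ignoring case and surrounding spaces.
        key = tuple(str(record.get(field, "")).strip().lower() for field in fields)
        if key in seen:
            duplicates.append(record)  # kept, not deleted, so the count can be reported
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates

# Made-up records, for illustration only.
records = [
    {"title": "Sleep and memory", "author": "Lee", "journal": "J Sleep",
     "volume": "12", "pages": "1-9", "source": "Database A"},
    {"title": "Sleep and memory", "author": "Lee", "journal": "J Sleep",
     "volume": "12", "pages": "1-9", "source": "Database B"},
    {"title": "Sleep and memory", "author": "Lee", "journal": "Mem Res",
     "volume": "3", "pages": "40-52", "source": "Database B"},
]
unique, duplicates = split_duplicates(records)
print(len(unique), len(duplicates))  # 2 1
```

The third sample record shares a title with the first two but differs in journal, volume, and pages, so it is not treated as a duplicate. Exact matching is strict: databases often format author names and page ranges differently, so a manual scan is still needed to catch duplicates that this comparison misses.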

Adapted from https://guides.lib.byu.edu/systematicreviews/deduplication

Deduplication with software

Most bibliographic management software, including Zotero, offers a deduplication option. For more information on how to deduplicate through Zotero, see the Duplicate Detection guide.
