Phenome Wide Association Studies

Phenome-wide association studies (PheWAS) analyze many phenotypes in relation to a single genetic variant (or other attribute). This method was originally described using electronic medical record (EMR) data linked to the Vanderbilt DNA biobank, BioVU, but can also be applied to other richly phenotyped data sets.



Phecode Maps

Phecode Maps | Description | Phecode count | Pre-release | Base terminology(s)
Phecode Map 1.2 (Combined) | Updated release | 1,866 | No | ICD9 and 10
Phecode Map X | Extended Phecode groupings (not based on prior versions) | 3,612 | Yes | ICD10

How to Cite the PheWAS catalog

Denny JC, Bastarache L, Ritchie MD et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013 Dec;31(12):1102-10.[Article]


Download PheWAS Software

PheWAS R package : this package contains methods for performing PheWAS.

PheTK - The Phenotype Toolkit : a fast python library for PheWAS utilizing both Phecode Map 1.2 and Phecode Map X

A shared notebook with a PheTK demo in the All of Us Research Hub : requires an All of Us Research Hub account to access


PheWAS Catalogs

The original PheWAS catalog contains the PheWAS results for 3,144 single-nucleotide polymorphisms (SNPs) present in the NHGRI GWAS Catalog as of 4/17/2012 in 13,835 European-ancestry individuals from five sites of the Electronic Medical Records and Genomics (eMERGE) network. A total of 1,358 EMR-derived phenotypes were analyzed for each SNP. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations, and 28% (210/751) of all prior GWAS associations. We also identified 63 potentially pleiotropic associations with p < 4.6 × 10^-6 (false discovery rate < 0.1); the strongest of these novel associations replicated in an independent cohort (n=7,406). The catalog contains all associations with p < 0.05 (uncorrected).


PheWAS References

Denny JC, Bastarache L, Ritchie MD et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013 Dec;31(12):1102-10.[Article][Pubmed]

Denny JC, Ritchie MD, Basford M, et al. PheWAS: Demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010 May 1;26(9):1205-10.[Article][Pubmed]

Denny JC, Crawford DC, Ritchie MD, et al. Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am J Hum Genet. 2011 Oct 7;89(4):529-42.[Article][Pubmed]

Ritchie MD, Denny JC, Zuvich RL, et al. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation. 2013 Apr 2;127(13):1377-85.[Article][Pubmed]

Simonti CN, Vernot B, Bastarache L, et al. The phenotypic legacy of admixture between modern humans and Neandertals. Science. 2016 Feb 12;351(6274):737-41. [Article][Pubmed]

Bastarache L. Using Phecodes for Research with the Electronic Health Record: From PheWAS to PheRS. Annu Rev Biomed Data Sci. 2021 Jul 20. [Article][Pubmed]


PhecodeX (Extended), version 1.0
Next-generation phenotyping: introducing phecodeX for enhanced discovery research in medical phenomics

Version X addresses the following issues:

*More phenotypes

Phecodes were designed to support replication of genotype/phenotype associations in the GWAS catalog. Thus, we focused on creating codes that capture common/complex diseases found in adults. Many specific ICD codes were aggregated into broad phecodes, particularly in chapters relating to pregnancy, congenital anomalies, and neonatology. PhecodeX adds granularity across the coding structure and includes 3612 phecodes, compared with 1851 phecodes in v1.2. These added phecodes are meant to facilitate new research applications, such as the study of Mendelian disease or pregnancy-related conditions.

*New Look

Each phecodeX label is prefixed by a two-letter label indicating the category, followed by an underscore and a three-digit root code. In contrast, v1.2 phecodes were labeled with three-digit root codes, similar to ICD-9s. The character prefixes make phecodeX visually distinct from v1.2 and ICD codes, and prevent programs like R and Excel from corrupting codes by interpreting them as integers (e.g., phecode “008” being transformed to “8”). The numeric component of each phecodeX code is unique, even without the prefix.
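The integer-corruption problem described above can be illustrated with a short Python sketch. The `naive_numeric_inference` helper is a hypothetical stand-in for the type guessing done by spreadsheet and data-frame readers, `XX_008` is an invented label rather than a real phecodeX entry, and the optional decimal suffix in the pattern is an assumption about how child codes are written.

```python
import re

# Assumed label shape: two uppercase letters, an underscore, a three-digit
# root code, and (assumption) an optional decimal suffix for child codes.
PHECODEX_PATTERN = re.compile(r"^([A-Z]{2})_(\d{3}(?:\.\d+)?)$")


def naive_numeric_inference(value):
    """Hypothetical stand-in for a reader that converts number-like text."""
    try:
        return int(value)
    except ValueError:
        return value  # not an integer literal, so the text is kept as-is


def split_phecodex(code):
    """Split a phecodeX-style label into (category prefix, numeric part)."""
    match = PHECODEX_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a phecodeX-style label: {code!r}")
    return match.group(1), match.group(2)


# A v1.2-style code loses its leading zeros once it is read as an integer.
corrupted = naive_numeric_inference("008")

# A prefixed label cannot be parsed as a number, so it survives unchanged.
preserved = naive_numeric_inference("XX_008")  # "XX_008" is an invented label

prefix, numeric_part = split_phecodex(preserved)
print(corrupted, preserved, prefix, numeric_part)
```

When loading v1.2 codes with a real reader, the usual safeguard is to request that the phecode column be read as text rather than relying on type inference.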

*New categories

PhecodeX introduces a new section for genetic conditions that includes 324 phecodes for specific genetic diagnoses and chromosomal anomalies (e.g., Rett syndrome, Trisomy 18, and DiGeorge syndrome). PhecodeX also includes a new neonatal section. The 'injuries/poisonings' section in v1.2 has been removed from phecodeX.

*Alignment with ICD-10

Phecodes were developed before the release of ICD-10, and their structure largely conforms to the ICD-9 coding system. ICD-10 introduced new, more granular concepts that were not captured in the previous system. PhecodeX adds 574 new phecodes that pertain to ICD-10-only codes. These phecodes are marked as “1” in the icd10_only column, and their phecode string ends with a “*”.
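As a rough sketch of how the icd10_only flag might be used, the Python below filters an in-memory table for ICD-10-only phecodes. Only the `icd10_only` column name and the trailing `*` come from the description above; the `phecode` and `phecode_string` column names and every row in the sample are invented for illustration.

```python
import csv
import io

# Invented sample rows. Only the `icd10_only` column and the trailing "*"
# convention are taken from the text; other column names are assumptions.
SAMPLE_INFO = """phecode,phecode_string,icd10_only
XX_001,Example condition A,0
XX_002,Example condition B*,1
XX_003,Example condition C*,1
"""


def icd10_only_rows(csv_text):
    """Return the rows flagged as pertaining to ICD-10-only codes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["icd10_only"] == "1"]


flagged = icd10_only_rows(SAMPLE_INFO)

# Cross-check the flag against the "*" suffix on the phecode string.
inconsistent = [r for r in flagged if not r["phecode_string"].endswith("*")]

print([r["phecode"] for r in flagged], len(inconsistent))
```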

*Multi-mapping

The original phecode structure was based on a 1-to-1 mapping (each ICD mapped to a unique phecode). To incorporate ICD-10s into the map, we needed to do away with the 1-to-1 convention. For version X, we created new phecodes that took advantage of this new flexibility, which is particularly helpful for infectious disease phenotypes. For example, v1.2 maps “Streptococcus pneumoniae” to pneumonia, in the respiratory section; version X maps “Streptococcus pneumoniae” to phecodes for pneumonia as well as streptococcus infections (in the ID section).
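A minimal Python sketch of the difference between the two conventions: under a 1-to-1 map each ICD code carries a single phecode, whereas a multi-map needs a collection of phecodes per ICD code. The phecode labels below are invented placeholders, and J13 (the ICD-10 code for pneumonia due to Streptococcus pneumoniae) is used only as an example key; the map files themselves should be consulted for the actual assignments.

```python
from collections import defaultdict

# Invented (ICD, phecode) pairs. "RE_901", "ID_902" and "RE_903" are
# placeholder labels, not real phecodeX codes.
MAP_ROWS = [
    ("J13", "RE_901"),  # placeholder for a pneumonia phecode
    ("J13", "ID_902"),  # placeholder for a streptococcus infection phecode
    ("J45", "RE_903"),  # placeholder for an ICD code with a single phecode
]


def build_multimap(rows):
    """Group phecodes by ICD code so one ICD can carry several phecodes."""
    icd_to_phecodes = defaultdict(set)
    for icd, phecode in rows:
        icd_to_phecodes[icd].add(phecode)
    return dict(icd_to_phecodes)


multimap = build_multimap(MAP_ROWS)

# ICD codes that would not fit the older 1-to-1 convention.
multi_mapped = sorted(icd for icd, codes in multimap.items() if len(codes) > 1)
print(multi_mapped)
```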

PheWAS R integration:

PhecodeX is compatible with the popular R PheWAS package. Instructions for using phecodeX are on GitHub.

Caveats:

Many of the phenotypes added to phecodeX are for rare conditions and symptoms, and many of the phecodes for common/complex disease are highly similar between v1.2 and phecodeX. Thus, both phecode versions may produce highly similar results in analyses relating to common diseases of adulthood. A large number of studies have been published using phecodes v1.2, including publicly available catalogs on this site and on pheweb, a site hosted by the University of Michigan. For this reason, researchers may choose to continue using phecodes v1.2, which has been shown to be suitable for the study of common/complex diseases. We encourage researchers to experiment with both maps to determine which is right for their project.

Projects that use phecodeX:

The added granularity of phecodeX was instrumental in several published studies, including work relating to perinatal risk factors, hereditary cancer syndromes, PheWAS analysis with PheTK [ PheTK GitHub ], and a knowledgebase designed to interpret PheWAS results.

Future work:

The utility of phecodes has always been grounded in their ability to reproduce known associations. With this mindset, we are working on studies that compare phecodes v1.2 and phecodeX in terms of their ability to replicate known genetic associations and to support the study of rare disease. Stay tuned.


Maps:

The following files contain a map of ICD-9 to phecodes, ICD-10 to phecodes, and phecode strings.

Downloads:

Phecode information file:

phecodeX_info.csv This file includes phecodes, phecode strings, category labels, and information relating to sex-specific codes.


We provide ICD-to-phecode map files that support both ICD-9 and ICD-10 Clinical Modification (CM), the extended version of ICD used in the United States (US), as well as the WHO version of ICD-10. If you are using ICD codes from the US, use the 'CM' files; otherwise, use the 'WHO' files.

ICD to phecode map, unrolled:

The unrolled map file maps ICD codes to phecodes. The relationships are 'unrolled' such that a child phecode is mapped to all of its parents (e.g., 'Type 2 diabetes' implies 'Diabetes mellitus'). The 'flag' column indicates whether the ICD is an ICD-9 code or an ICD-10 code. This file is useful for translating ICDs to phecodes, which can be done using a join in MySQL or merge in R (with ICD and vocabulary_id as keys).
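The join described above might look like the following in Python, using only the standard library. The key columns ICD and vocabulary_id follow the text; the `person_id` field, the `ICD10CM` vocabulary label, the placeholder phecodes, and all rows are assumptions made for illustration.

```python
from collections import defaultdict

# Invented unrolled map rows: (ICD, vocabulary_id, phecode). Because the map
# is unrolled, each ICD code appears against the child and the parent phecode.
UNROLLED_MAP = [
    ("E11.9", "ICD10CM", "XX_100.2"),  # placeholder child (Type 2 diabetes)
    ("E11.9", "ICD10CM", "XX_100"),    # placeholder parent (Diabetes mellitus)
    ("E10.9", "ICD10CM", "XX_100.1"),  # placeholder sibling child
    ("E10.9", "ICD10CM", "XX_100"),
]

# Invented patient records: (person_id, ICD, vocabulary_id).
PATIENT_CODES = [
    (1, "E11.9", "ICD10CM"),
    (2, "E10.9", "ICD10CM"),
    (3, "Z00.0", "ICD10CM"),  # has no match in the sample map
]


def translate_to_phecodes(patient_codes, unrolled_map):
    """Inner-join patient ICD records to phecodes on (ICD, vocabulary_id)."""
    lookup = defaultdict(list)
    for icd, vocabulary_id, phecode in unrolled_map:
        lookup[(icd, vocabulary_id)].append(phecode)
    return sorted(
        {
            (person_id, phecode)
            for person_id, icd, vocabulary_id in patient_codes
            for phecode in lookup.get((icd, vocabulary_id), [])
        }
    )


person_phecodes = translate_to_phecodes(PATIENT_CODES, UNROLLED_MAP)
print(person_phecodes)
```

Keying on both columns rather than on the ICD string alone guards against the same string being read under the wrong ICD revision.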

phecodeX_unrolled_ICD_CM.csv Compatible with the clinical modification (CM) ICD-9 and -10 codes.

phecodeX_unrolled_ICD_WHO.csv Compatible with the WHO's ICD-10.


ICD to phecode map, descriptive:

This is a highly descriptive mapping file that includes ICDs and phecodes along with their descriptive labels. It is useful for browsing how phecodes are defined. The file is 'flat', meaning the ICD->phecode relationships are not unrolled.

phecodeX_ICD_CM_map_flatv.csv Compatible with the clinical modification (CM) ICD-9 and -10 codes.

phecodeX_ICD_WHO_map_flat.csv Compatible with the WHO's ICD-10.

For more information about phecodeX files and formatting, see our GitHub repositories for phecodeX.


Share your comments and thoughts about phecodeX with lisa.bastarache@vumc.org


Expert consultation:

In creating the phecodes, we consulted with 21 clinicians, many of whom use phecodes regularly, for advice on how best to structure the phecodes. We are so grateful for their help! Crack team of clinicians who helped with phecodes:

April Barnado, Julie Bastarache, Elly Brokamp, Meredith Campbell, Jeff Goldstein, Beth Ann Malow, Johnathan Mosley, Travis Osterman, Dolly Padovani-Claudio, Andrea Ramirez, Dan Roden, Bryce Schuler, Eddie Siew, Bill Stead, Jen Sucre, Isaac Thomsen, Rory Tinker, Sara Van Driest, Colin Walsh, Jeremy Warner, Quinn Wells, Lee Wheless

Informatics expertise:

Megan Shuey, Ida Aka, Adam Lewis


Learn more:

If you'd like to read more about phecodes in general, this paper provides a detailed overview.


Phecode Map 1.2 with ICD-9 and ICD-10-CM Codes


Maps:

The following files contain a map of ICD-9 to phecodes, ICD-10 to phecodes, and phecode strings.

Downloads:

Phecode Definitions 1.2:

phecode_definitions1.2.csv This is the description of each phecode, including phecode string and exclude range.


Phecode 1.2 map of both ICD-9 and -10 Clinical modification (CM)

This map is useful for defining phecode case groups by joining the map to a table of individuals and ICD-9-CM or ICD-10-CM codes.
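A hedged Python sketch of defining case groups from such a join. The record layout, the vocabulary labels, the placeholder phecode, and the `min_count` threshold are all assumptions for illustration; the description above does not prescribe a minimum number of codes.

```python
from collections import Counter, defaultdict

# Invented map from (ICD, vocabulary label) to a v1.2-style phecode. Phecodes
# are kept as text so that leading zeros are not lost.
PHECODE_MAP = {
    ("250.00", "ICD9CM"): "000.1",  # placeholder phecode
    ("E11.9", "ICD10CM"): "000.1",  # placeholder phecode
}

# Invented records: (person_id, ICD, vocabulary label), one per coded event.
RECORDS = [
    (1, "250.00", "ICD9CM"),
    (1, "E11.9", "ICD10CM"),
    (2, "E11.9", "ICD10CM"),
]


def case_groups(records, phecode_map, min_count=1):
    """Return {phecode: set of person_ids with at least min_count codes}."""
    counts = Counter()
    for person_id, icd, vocabulary in records:
        phecode = phecode_map.get((icd, vocabulary))
        if phecode is not None:
            counts[(phecode, person_id)] += 1
    groups = defaultdict(set)
    for (phecode, person_id), n in counts.items():
        if n >= min_count:
            groups[phecode].add(person_id)
    return dict(groups)


any_code = case_groups(RECORDS, PHECODE_MAP)
two_codes = case_groups(RECORDS, PHECODE_MAP, min_count=2)  # assumed threshold
print(any_code, two_codes)
```

Raising `min_count` trades sensitivity for specificity: person 2 has a single qualifying code and drops out of the group when two codes are required.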

Phecode_map_v1_2_icd9_icd10cm.csv Compatible with the clinical modification (CM) ICD-9 and -10 codes.


Phecode 1.2 map of ICD-10 published by the World Health Organization (WHO)

This map is useful for defining phecode case groups by joining the map to a table of individuals and ICD-10-WHO codes.

Phecode_map_v1_2_icd10_WHO_beta.csv Compatible with the World Health Organization (WHO) ICD-10 codes.


Share your comments and thoughts about phecode 1.2 with lisa.bastarache@vumc.org


PheWAS of GWAS Catalog of SNPs


Download this Full Catalog:

phewas-catalog.csv


Citations:

Denny JC, Bastarache L, Ritchie MD et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013 Dec;31(12):1102-10.[ Article ] [ Pubmed ]


Share your comments and thoughts about the PheWAS catalog with lisa.bastarache@vumc.org


HLA PheWAS: Discovery & Replication Results


Download this Full Catalog:

hla-phewas-catalog.csv


Citations:

Karnes JH, Bastarache L, Shaffer CM, et al. Phenome-wide scanning identifies multiple diseases and disease severity phenotypes associated with HLA variants. Sci Transl Med. 2017 May 10;9(389).[ Article ] [ Pubmed ]


Share your comments and thoughts about HLA PheWAS with lisa.bastarache@vumc.org


Telomere PheWAS: Meta-analysis for Telomere Length, Marshfield Clinic and BioVU


Download this Full Catalog:

PheWASofTL_allresults_manuscript_Oct14th2022.csv


Share your comments and thoughts about Telomere PheWAS with lisa.bastarache@vumc.org