Activation of MHC class I antigen presentation in the lungs of COVID-19 patients is associated with the amount of SARS-CoV-2 RNA in the tissue
A certain proportion of COVID-19 patients admitted to hospitals develop acute respiratory distress syndrome . We thus decided to investigate the lung transcriptomes of patients with COVID-19 using gene-set enrichment analysis (GSEA) to identify differences between two groups of patients who presented different levels of SARS-CoV-2 virus in their tissues, as detected by in situ hybridization (ISH) of viral RNA. A cohort of 14 patients was screened using 45 different microarray experiments in lung samples and their SARS-COV2 levels were also assessed. So, a two-group stratification of GSE150316 transcriptome cohort was defined according to their level of SARS-COV2 in lung (threshold 5 percent). Patients processed in this cohort come from two different medical centers: a majority of them from the Massachusetts General Hospital and a minority from the Colombia university Medical center (Fig. 1A). Clinical analysis of this cohort stratification revealed a significant association of SARS-COV2 virus quantification between the two groups of patients with a positive association (p-value = 6.67E−4, Additional file 1: Table S1). Analysis performed on symptoms at admission showed that patients that harbored high levels of virus in the lung were older with less coughing and myalgia but had increased symptoms of lethargy, weakness and hypotension (Additional file 1: Table S1). The preexisting disease association analysis, showed a negative relation between hypertension and high levels of SARS-COV2 but suggested a positive relation between gastrointestinal reflex diseases and high level of SARS-COV2 (Additional file 1: Table S2). Time from admission to death was shorter for patients with high level of SARS-COV2 (3 days versus 14 days, p = 0.0029) showing that this stratification in two groups was clinically justified. Consequently these patients were hospitalized for a shorter period of time than patients with a low viral load (8 days versus 17 days, p = 0.0082). Therapy intervention analysis showed a tendency for a negative relation between administrations of atorvastatin and hydroxychloroquine suggesting that these treatments tend to be more administrated in group of patients with low level of virus (Additional file 1: Table S3). An analysis of gene set enrichment was conducted using MSigDB version 7.2 with the Reactome and Gene Ontology subset; this revealed a significant increase in MHC class I antigen processing and presentation in the cells of patients with higher viral loads (NES: normalized enrichment score, FDR: False Discovery Rate, Fig. 1B). A principal component analysis was carried out on the expression profiles of genes associated with MHC class I antigen processing and presentation, and the resulting clustering pattern significantly discriminated between lung samples with a low amount of virus and those with a high amount of virus (p-value = 0.018, Fig. 1C). Using these expression profiles, we were also able to build a highly connected gene (174 relations) network for MHC class I antigen processing and presentation (Fig. 1D, Additional file 1: Table S4). These results suggest that in the lungs of COVID-19 patients, the intensity of antigen processing and presentation by class I molecules is proportional to the level of detected virus.
Activation of genomic cluster associated with the immunoproteasome in the lungs of COVID-19 patients correlates with the amount of SARS-CoV-2 RNA in the tissue
During MHC antigen presentation, catalytic subunits of the immunoproteasome are tightly regulated in different cell types, leading to the production of distinct repertoires of presented peptides . To assess potential changes in the expression of such genes in COVID-19 patients, we first assembled a geneset of immunoproteasome-associated genes. Using the PubTator Central text-mining algorithm , which explores the full text of articles in Pubmed Central, we analyzed all gene citations in Ferrington and Gregerson’s  review of immunoproteasome literature (PMC4405001). The resulting immunoproteasome-related gene set was then used to conduct a GSEA analysis between the two groups of COVID-19 patients (GSE150316) with low and high levels of virus in the lung. With this, we detected an increase in immunoproteasome activation that followed the level of virus detected in the lung (NES = + 2.24, Fig. 2A). Unsupervised principal component analysis performed with immunoproteasome related genes confirmed stratification of patient groups with low and high levels of SARS-COV2 in the lungs (p-value = 0.00099, Fig. 2B). To further investigate immunoproteasome implication, we performed differential expression analysis of its main components between groups of patients with low and high levels of SARS-COV2 virus in the lung. Specifically, the amount of virus in lung tissue was found to be associated with an increase in the expression of the following genes: PSMB8 (p-value = 0.00035, Fig. 2C), PSMB8-AS1 (antisense of PSMB8, p-value = 0.00032, Fig. 2C), PSMB9 (p-value = 0.029, Fig. 2C), TAP1 (p-value = 0.0033, Fig. 2C), TAP2 (p-value = 0.0053, Fig. 2C) and CALR (p-value = 0.021, Fig. 2C). When we examined the expression of immunoproteasome-related genes in the transcriptome dataset, we observed a positive correlation between the expression of PSMB8 and TAP1, CALR and CANX, TAP1 and TAP2, and TAP1 and PSMB10 (Fig. 2D). Co-regulated genes of the immunoproteasome were found to map together to the 6p21.32 genomic cluster of the human genome (HG38), specifically, inside the HLA class II region, from 32.820 to 32.860 kb (Fig. 2E). These results suggest that in the lungs of COVID-19 patients, immunoproteasome components located in a 40-kb span of the 6p21.32 genomic region are activated in a manner that may be dependent on the viral load of SARS-CoV-2.
Single-cell transcriptomes of bronchoalveolar cells reveal immunoproteasome activation in M1 macrophage cells during mild cases of COVID-19
Next, patterns of immunoproteasome activation were investigated in a dataset of single-cell transcriptomes obtained from bronchoalveolar fluid lavages from six healthy donors, three patients with mild COVID-19, and three patients with severe COVID-19 . These data were merged together for the purpose of UMAP dimensionality reduction (Fig. 3A) using the R-package Seurat ; after canonical correlation and filtration, the merged transcriptome analysis comprised 90,696 cells, with a representative proportion of each cell subgroup. A dotplot of the relative expression of macrophage-related markers appears in Fig. 3B. In these bronchoalveolar samples, we found that the M1 macrophage marker CD68, was expressed with a higher level in a cell population smaller in size for mild COVID-19 patients as compared to severe COVID-19 patients. As well as the macrophage-related genes MYD88 and STAT1, were upregulated in bronchoalveolar cells from patients with mild COVID-19, while cells from patients with severe COVID-19 were characterized by upregulation of the CD163 marker of M2 macrophages (Fig. 3B). Mild cases of COVID-19 were also characterized by increased expression of immunoproteasome components (Fig. 3C). Specifically, along with CD68 (Fig. 3D), the immunoproteasome-related genes PSMB8 (Fig. 3E), TAP1, PSMB9, PSMB10, and calreticulin (CALR) (Additional file 1: Figure S1) were all found to be expressed in the CD14+/CD16+ subpopulation of cells during mild COVID-19 (Fig. 3A). A correlation study of single-cell expression patterns in these bronchoalveolar cells revealed positive correlations between CD68 and TAP1 (coefficient r = 0.34), CD68 and PSMB8 (coefficient r = 0.56), CD68 and PSMB9 (coefficient r = 0.52), CD68 and PSMB10 (coefficient r = 0.58), and CD68 and CALR (coefficient r = 0.72) (Fig. 3F). When we instead investigated the relationships between these genes and the M2 macrophage marker CD163, which was overexpressed in bronchoalveolar cells from patients with severe COVID-19 (Fig. 3B and Additional file 1: Figure S2A), we found correlations that were still positive but notably weaker (between CD163 and PSMB8, coefficient r = 0.45; CD163 and PSMB9, coefficient r = 0.46; CD163 and PSMB10, coefficient r = 0.47; CD163 and CALR, coefficient r = 0.58) (Additional file 1: Figure S2B). These results suggest that immunoproteasome activation occurs in bronchoalveolar M1 macrophages during mild COVID-19 disease.
Single-cell heterogeneity in PSMB8 expression shapes a cell trajectory in alveolar M1 macrophages during mild COVID-19
Alveolar macrophages play a critical role in regulating immune responses  and maintaining homeostasis in the lung [31, 32]. Macrophages have been categorized as being “polarized” towards either the M1 (CD68+) proinflammatory phenotype, which mediates resistance to pathogens, or the M2 (CD163+) anti-inflammatory phenotype, which promotes tissue remodeling . Using the same dataset of single-cell transcriptomes of bronchoalveolar cells in patients with mild and severe COVID-19 , we decided to examine the relationship between CD68+ macrophages and immunoproteasome components in more detail. With the R-package monocle , we constructed a cell trajectory of PSMB8 expression in the CD14+/CD68+ subpopulation of bronchoalveolar cells from COVID-19 patients (Fig. 4A–C). In the resulting pseudotime transformation of cell development, branch number 3 appears to differentiate between cells with low expression of PSMB8 (i.e. severe COVID-19) from those with medium and high levels of PSMB8 (mild COVID-19) (Fig. 4D). The pseudotime expression signature of this branch highlighted a major cluster of inflammation-associated genes that are activated in severe COVID-19 (bottom cluster, Fig. 4E) and a smaller cluster of molecules activated only in cells from mild cases of COVID-19 (top cluster, Fig. 4E). This latter group comprised both PSMB8 and the CD68 marker (Fig. 4E). When we analyzed functional enrichment in this gene cluster using the Toppgene internet application , we discovered that, in mild cases of COVID-19, CD68+ cells with high levels of PSMB8 appear to demonstrate enrichment in functions such as defense response (comprising PSMB8, CD68, GRN, APOE, FN1, MDK, NUPR1, IFIT2, MARCO, HLA-DQA2,CAPG, FABP4, and TUBB) and lipid homeostasis (lipid transport and lipid catabolism: APOC1, APOE, CES1, PLD3, FABP4, CES1, PLIN2, FABP4, RBP4, and CYP27A1) (Fig. 4F). Similarly, in the trajectory signature for alveolar macrophages from mild COVID-19 patients (Additional file 1: Table S5), CALR, MARCO and TNFSF13 followed the expression patterns of CD68 and PSMB8 (Fig. 4G). These results suggest that CD14+/CD68+ bronchoalveolar cell activation in mild COVID-19 is accompanied by enhanced expression of immunoproteasome-related components and functionalities implicated in the defense response and lipid homeostasis. Instead, all of these activities appeared to be impaired in the corresponding cell population in patients with severe COVID-19.
High diversity in Asian populations for strong MHC class-I presenters of SARS-CoV-2
One of the main consequences of the activation of M1 macrophages is the presentation of viral peptides to the immune system and stimulation of the Th1 cytotoxic cell response. The efficiency of this mode of antigen presentation is still not well understood, but may be influenced by the genetic heterogeneity of presenters belonging to HLA class I. Of these, the genes HLA-A, -B, and -C are expressed in most human cell types and are the most polymorphic genes in the human genome, with over 12,000 distinct alleles documented worldwide. Other HLA class I molecules, like HLA-E, -F, and -G, are expressed only in specialized cell types . We decided to examine the diversity of the most common HLA class I alleles (types A, B, and C) in order to predict which types might bind more effectively with SARS-CoV-2 peptides. First, we investigated the prevalence of the 69 most common HLA class I alleles (types A, B, and C only) worldwide . This was accomplished through use of the bone marrow registry dataset from the USA, which documents the HLA status of 2,790,874 donors, representing 20 ethnicities with worldwide distribution (Additional file 1: Table S6) . In total, 69 alleles were found to be prevalent in the 20 ethnic groups examined. An unsupervised principal component analysis based on the prevalence of these 69 alleles clustered the 20 ethnicities into three main groups, representing three major regions of the world: Africa, Asia, and Europe/Americas (Additional file 1: Figure S3A-B). These results suggest that these 69 alleles of HLA-A-B-C are representative of much of the world’s population diversity. Certain alleles demonstrated region-specific patterns of prevalence, such as HLA-C*08:01, with a prevalence of more than 0.15 in some Asian populations (Additional file 1: Figure S3C); HLA-A*25:01, with a prevalence of more than 0.20 in some Western populations (Additional file 1: Figure S3D); and HLA-B*53:01, with a prevalence of more than 0.10 in African populations (Additional file 1: Figure S3E). Other HLA alleles, instead, demonstrated more of a worldwide distribution (Additional file 1: Figure S3B). This pattern of clustering into three world regions based on allele prevalence was further validated by a Random Forest learning algorithm (100 trees, total error rate 0.23, Additional file 1: Figure S4A), which was successfully able to classify each allele/world region in the majority of cases (Additional file 1: Figure S4B–D). Furthermore, based on their respective scores for Accuracy and Gini Importance, ethnicities from each world region were found to contribute to the learning process (Additional file 1: Figure S4E), suggesting that these 69 HLA alleles were relatively evenly distributed worldwide.
Our next step was to predict the respective binding patterns of these alleles to SARS-CoV-2 peptides using the algorithm NetMHCpan-4.1 , which is able to predict peptide binding to any MHC class I molecule of known sequence using artificial neural networks trained with information from 850,000 peptides (data on binding affinity and/or from mass spectrometry). For this analysis, we collected 38 protein sequences encoded by the entire genome of SARS-CoV-2 Wuhan-Hu-1 from the NCBI database (Additional file 1: Table S7) . An analysis of these 38 protein sequences and the 69 tested alleles predicted 102,524 possible peptide binding events (Additional file 1: Table S8). As described by the literature associated with the NetMHCpan algorithm  and applied in the study of Barquera et al. , it is possible to distinguish three subgroups of MHC class I molecules based on their binding affinities: strong peptide binders, with a binding affinity between 0 and 50 nM; regular peptide binders, with a binding affinity between 50 and 500 nM; and weak peptide binders, with a binding affinity between 500 and 5000 nM. Peptide counts for the three binding subgroups and the 69 investigated HLA alleles are provided in Additional file 1: Table S9; based on their proportions, we were able to identify seven clusters of alleles (Fig. 5A). These seven clusters could be further grouped into three superclusters based on the proportion of strong binder peptides (Fig. 5A): cluster 1.LOW_C123 (red cluster), with low amounts of strong binder peptides; cluster 2.MEDIUM_C67 (blue cluster), with medium amounts of strong binder peptides; and cluster 3.HIGH_C45 (green cluster), with high amounts of strong binder peptides. When we examined the distribution of these clusters among world regions, we observed that clusters C6 and C7 (MEDIUM supercluster) were more prevalent in African and Western populations (Fig. 5B). In contrast, Asian populations harbored higher proportions of clusters C4 and C5, in the HIGH supercluster (Fig. 5B). This supercluster was largely composed of HLA type A alleles (Fig. 5C), and was particularly prevalent in Asian populations (Fig. 5D). These results suggest that, based on patterns of HLA class I diversity, populations in Asia appear to be well adapted to strong HLA class I antigen presentation of SARS-CoV-2 peptides.
Of the 10 strongest-binding HLA alleles, seven were predicted to be highly prevalent in Asia (Fig. 6A, B): HLA-A*11:01, predicted to bind 1593 peptides and with a prevalence > 0.020 in Chinese and Vietnamese populations (Fig. 6A); HLA-A*02:06, 3962 peptides, prevalence > 0.075 in Hawaiian, Japanese, and Korean populations (Fig. 6A); HLA-B*40:01, 559 peptides, prevalence > 0.16 in Hawaiian and Chinese populations (Fig. 6A); HLA-C*03:03, 3678 peptides, prevalence > 0.10 in Japanese and Korean populations (Fig. 6B); HLA-C*03:04, 2295 peptides, prevalence > 0.10 in Japanese, Hawaiian, and Chinese populations (Fig. 6B); HLA-C*03:02, 3678 peptides, prevalence > 0.075 in the Chinese population (Fig. 6B); and HLA-B*15:25, 3753 peptides, prevalence > = 0.05 in the Vietnamese population (Fig. 6B). Of the remaining three strong-binding alleles, one was most common in Europe/Americas (HLA-A*02:01, predicted to bind 2356 peptides, prevalence > 0.20 in American and Caucasian populations; Fig. 6A) and another was most common in African populations (HLA-A*68:02, predicted to bind 3199 peptides, prevalence > 0.06 in African populations; Fig. 6B).
With respect to the 10 poorest-presenting HLA alleles (Fig. 7), five had worldwide distributions: HLA-C*07:04, predicted to bind 351 peptides (Fig. 7A); HLA-B*52:01, 667 peptides (Fig. 7A); HLA-C*04:01, 166 peptides (Fig. 7B); HLA-B*37:01, 760 peptides (Fig. 7B); and HLA-B*46:01, 331 peptides (Fig. 7C). Two were found with higher prevalence in African populations—HLA-A*74:01, binding 714 peptides (prevalence > 0.05, Fig. 7B); and HLA-B*58:02, binding 132 peptides (prevalence > 0.03, Fig. 7C)—and one of the weakest-presenting globally distributed alleles was also particularly frequent in African populations (HLA-C*04:01, 166 peptides, prevalence > 0.20; Fig. 7B). In Western populations, two weakly presenting HLA alleles were common: HLA-B*14:02, binding 692 peptides, prevalence > 0.05 particularly in Hispanic populations (Fig. 7C); and HLA-B*27:02, which was predicted to bind only 92 peptides (prevalence > 0.002, Fig. 7C). In Asian populations, only one weak-binding allele was identified: HLA-B*46:01, binding 331 peptides, prevalence > 0.10 in Chinese and Vietnamese populations (Fig. 7C). Based on these results, it appears that Asian populations may harbor a higher proportion of HLA class I alleles that are able to strongly present peptides of SARS-CoV-2.