How to build reference genome?

Scenario 1: Building a reference genome that is compatible with single-cell data from different platforms.

If you have both single-cell data from 10X Genomics and SeekOne® products, it is recommended to use 10X CellRanger to build the reference genome. SeekSoulTools is compatible with the reference genome built by CellRanger. The code for processing gene annotation files (GTF files) is as follows:

The code for processing gene annotation files (GTF files) is as follows:

/path/to/cellranger mkgtf Homo_sapiens.GRCh38.ensembl.gtf Homo_sapiens.GRCh38.ensembl.filtered.gtf \
    --attribute=gene_biotype:protein_coding \
    --attribute=gene_biotype:lncRNA \
    --attribute=gene_biotype:antisense \
    --attribute=gene_biotype:IG_LV_gene \
    --attribute=gene_biotype:IG_V_gene \
    --attribute=gene_biotype:IG_V_pseudogene \
    --attribute=gene_biotype:IG_D_gene \
    --attribute=gene_biotype:IG_J_gene \
    --attribute=gene_biotype:IG_J_pseudogene \
    --attribute=gene_biotype:IG_C_gene \
    --attribute=gene_biotype:IG_C_pseudogene \
    --attribute=gene_biotype:TR_V_gene \
    --attribute=gene_biotype:TR_V_pseudogene \
    --attribute=gene_biotype:TR_D_gene \
    --attribute=gene_biotype:TR_J_gene \
    --attribute=gene_biotype:TR_J_pseudogene \
    --attribute=gene_biotype:TR_C_gene
cellranger mkref --genome=GRCh38 --fasta=GRCh38.fa --genes=GRCh38-filtered-ensembl.gtf
cd GRCh38/genes
gunzip -dc genes.gtf.gz > genes.gtf

Note

  • If the reference genome built by CellRanger is not compatible with the STAR version of SeekSoulTools, you can specify the STAR path of CellRanger for SeekSoulTools with --star_path /path/to/cellranger-5.0.1/lib/bin/STAR.

  • The chromosome names in fasta files must match the chromosome names in the gtf file. For example, if the name of chromosome 1 in fasta files is chr1, then the name of chromosome 1 in the gtf file must also be chr1.


Scenario 2: if you only have SeekOne® products, there is no need to consider platform compatibility.

The code for building genome index using STAR is as follows:

/demo/seeksoultools.1.2.0/bin/STAR \
  --runMode genomeGenerate \
  --runThreadN 16 \                        
  --genomeDir /path/to/star \             
  --genomeFastaFiles /path/to/genome.fa \  
  --sjdbGTFfile /path/to/genome.gtf \      
  --sjdbOverhang 149 \                     
  --limitGenomeGenerateRAM 17179869184     

Note

  • The chromosome names in fasta files must match the chromosome names in the gtf file. For example, if the name of chromosome 1 in fasta files is chr1, then the name of chromosome 1 in the gtf file must also be chr1.