How to build a reference genome?
Scenario 1: Building a reference genome that is compatible with single-cell data from different platforms.
If you have single-cell data from both 10X Genomics and SeekOne® products, we recommend building the reference genome with 10X CellRanger. SeekSoulTools is compatible with reference genomes built by CellRanger.
The code for filtering the gene annotation file (GTF file) and building the reference is as follows:
/path/to/cellranger mkgtf Homo_sapiens.GRCh38.ensembl.gtf Homo_sapiens.GRCh38.ensembl.filtered.gtf \
--attribute=gene_biotype:protein_coding \
--attribute=gene_biotype:lncRNA \
--attribute=gene_biotype:antisense \
--attribute=gene_biotype:IG_LV_gene \
--attribute=gene_biotype:IG_V_gene \
--attribute=gene_biotype:IG_V_pseudogene \
--attribute=gene_biotype:IG_D_gene \
--attribute=gene_biotype:IG_J_gene \
--attribute=gene_biotype:IG_J_pseudogene \
--attribute=gene_biotype:IG_C_gene \
--attribute=gene_biotype:IG_C_pseudogene \
--attribute=gene_biotype:TR_V_gene \
--attribute=gene_biotype:TR_V_pseudogene \
--attribute=gene_biotype:TR_D_gene \
--attribute=gene_biotype:TR_J_gene \
--attribute=gene_biotype:TR_J_pseudogene \
--attribute=gene_biotype:TR_C_gene
/path/to/cellranger mkref --genome=GRCh38 --fasta=GRCh38.fa --genes=Homo_sapiens.GRCh38.ensembl.filtered.gtf
cd GRCh38/genes
gunzip -dc genes.gtf.gz > genes.gtf
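Whether mkref writes a compressed genes.gtf.gz or an uncompressed genes.gtf may depend on the CellRanger version, so the decompression step above can be made safe to rerun. The following is a minimal sketch under that assumption. The function name ensure_plain_gtf is hypothetical and is not part of CellRanger or SeekSoulTools.

```shell
# Hypothetical helper (not part of CellRanger or SeekSoulTools).
# Makes sure <genes_dir>/genes.gtf exists, decompressing genes.gtf.gz if needed.
ensure_plain_gtf() {
    genes_dir="$1"
    if [ -f "$genes_dir/genes.gtf" ]; then
        # Already uncompressed: nothing to do.
        return 0
    fi
    if [ -f "$genes_dir/genes.gtf.gz" ]; then
        # Keep the .gz file and write an uncompressed copy next to it.
        gunzip -c "$genes_dir/genes.gtf.gz" > "$genes_dir/genes.gtf"
        return $?
    fi
    echo "No genes.gtf or genes.gtf.gz found in $genes_dir" >&2
    return 1
}

# Example call, assuming the mkref output directory is ./GRCh38:
# ensure_plain_gtf GRCh38/genes
```

The helper leaves the original .gz file in place, so the reference directory remains usable by CellRanger as well.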
Note
If the reference genome built by CellRanger is not compatible with the STAR version of SeekSoulTools, you can specify the STAR path of CellRanger for SeekSoulTools with:
--star_path /path/to/cellranger-5.0.1/lib/bin/STAR
The chromosome names in the FASTA file must match the chromosome names in the GTF file. For example, if chromosome 1 is named chr1 in the FASTA file, it must also be named chr1 in the GTF file.
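The chromosome-name requirement can be checked before building the reference. The sketch below lists sequence names that appear in the GTF file but not in the FASTA file; empty output suggests the names are consistent. The function name gtf_names_missing_from_fasta is hypothetical, and the sketch assumes an uncompressed FASTA file and a tab-separated GTF file.

```shell
# Hypothetical helper: print sequence names used in the GTF but absent from the FASTA.
# Assumes both files are uncompressed.
gtf_names_missing_from_fasta() {
    fasta="$1"
    gtf="$2"
    tmpdir=$(mktemp -d) || return 1
    # FASTA: header lines start with '>'; the name is the text up to the first whitespace.
    grep '^>' "$fasta" | sed 's/^>//' | awk '{print $1}' | LC_ALL=C sort -u > "$tmpdir/fasta_names.txt"
    # GTF: column 1 (tab-separated) is the sequence name; skip '#' comments and blank lines.
    awk -F'\t' '!/^#/ && NF > 0 {print $1}' "$gtf" | LC_ALL=C sort -u > "$tmpdir/gtf_names.txt"
    # comm -13 keeps only the lines unique to the second file (the GTF).
    LC_ALL=C comm -13 "$tmpdir/fasta_names.txt" "$tmpdir/gtf_names.txt"
    rm -rf "$tmpdir"
}

# Example call with placeholder paths:
# gtf_names_missing_from_fasta /path/to/genome.fa /path/to/genome.gtf
```

Names that exist only in the FASTA file (for example, unplaced scaffolds without annotated genes) are not reported, since they do not violate the requirement.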
Scenario 2: If you only have SeekOne® products, there is no need to consider platform compatibility.
The code for building a genome index using STAR is as follows:
/demo/seeksoultools.1.2.0/bin/STAR \
--runMode genomeGenerate \
--runThreadN 16 \
--genomeDir /path/to/star \
--genomeFastaFiles /path/to/genome.fa \
--sjdbGTFfile /path/to/genome.gtf \
--sjdbOverhang 149 \
--limitGenomeGenerateRAM 17179869184
Note
The chromosome names in the FASTA file must match the chromosome names in the GTF file. For example, if chromosome 1 is named chr1 in the FASTA file, it must also be named chr1 in the GTF file.
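The two numeric values in the STAR command above can be derived rather than copied. The STAR manual describes the ideal --sjdbOverhang as the read length minus 1, and --limitGenomeGenerateRAM is given in bytes. The sketch below assumes 150 bp reads and a 16 GiB memory budget. Both are assumptions inferred from the values 149 and 17179869184, not settings stated by SeekSoulTools, and the variable names are illustrative.

```shell
# Assumption: reads are 150 bp long, which is what --sjdbOverhang 149 implies.
read_length=150
sjdb_overhang=$((read_length - 1))

# Assumption: a 16 GiB memory budget for index generation; STAR expects bytes.
ram_gib=16
limit_ram_bytes=$((ram_gib * 1024 * 1024 * 1024))

echo "--sjdbOverhang $sjdb_overhang"
echo "--limitGenomeGenerateRAM $limit_ram_bytes"
```

With these inputs the script prints --sjdbOverhang 149 and --limitGenomeGenerateRAM 17179869184, matching the command above. Adjust read_length and ram_gib to fit your sequencing run and machine.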